
Survivor: Outwit, Outplay, Out...analyze? (Part 2) Looking at Reddit Mentions

Analyzing the Game of Survivor -- Looking at Fan Favorites via Reddit (2)¶

After reading through [my last post] on the ETL process, you may be interested in some of the data that I collected. Fear not, my dear data science enthusiast, for I have come bearing gifts of analysis!

In this second installment of the series, I will begin the analysis by digging into some of the Reddit data that I collected via the Pushshift.io API. For more information on how this data was collected, please check out the [first article in this series], where I describe the ETL process for the Pushshift (as well as other!) data.

Looking at Contestants Mentioned in Comments¶

The first query uses a few different tables. First, I use a CTE (Common Table Expression) that combines information from the contestant, contestant season, and episode performance stats tables. I also load the episode table into a separate dataframe, which may come in handy later.
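To make the shape of that query concrete, here is a minimal sketch that runs a similar CTE-plus-join against an in-memory SQLite database. The tables and columns are simplified stand-ins (assumptions, not the real `survivor.*` schema), and SQLite's `instr` plays the role of Postgres's `POSITION`.

```python
import sqlite3

# Toy, in-memory stand-in for the Postgres schema; table and column names
# are simplified assumptions, not the real survivor.* tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contestant (contestant_id INTEGER, first_name TEXT, last_name TEXT);
CREATE TABLE contestant_season (contestant_season_id INTEGER, contestant_id INTEGER, season_id INTEGER);
CREATE TABLE comments (id TEXT, body TEXT, within_season INTEGER);

INSERT INTO contestant VALUES (1, 'Tony', 'Vlachos'), (2, 'Yul', 'Kwon');
INSERT INTO contestant_season VALUES (10, 1, 40), (11, 2, 40);
INSERT INTO comments VALUES
    ('a', 'Tony played a great game', 40),
    ('b', 'Yul and Tony both voted together', 40),
    ('c', 'No names in this one', 40);
""")

query = """
WITH contestants_to_seasons AS (
    -- The CTE flattens contestant + season info into one named result set
    SELECT c.first_name, c.last_name, cs.contestant_season_id, cs.season_id
    FROM contestant c
    JOIN contestant_season cs ON c.contestant_id = cs.contestant_id
)
SELECT r.id, cts.first_name
FROM comments r
JOIN contestants_to_seasons cts
  ON (instr(r.body, cts.first_name) > 0 OR instr(r.body, cts.last_name) > 0)
 AND cts.season_id = r.within_season
ORDER BY r.id, cts.first_name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Comment `b` comes back twice (once per matched contestant) and comment `c` not at all, which is the same row-per-match behavior the real query produces.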

Then, we look at instances where the first or last name is contained inside of the body of the comment for comments made within a particular season. While there are cases where this will not work (for instance, when a contestant is best known by their nickname or when a shortened, or misspelled, version is used), it should give us a sense of the comments pertaining to particular players.
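As a rough illustration of where this matching can go wrong, the sketch below compares a plain case-sensitive substring test (the Python analogue of `POSITION(name IN body) > 0`) with a stricter whole-word test. The `whole_word_match` helper is hypothetical and is not used anywhere in this analysis.

```python
import re


def substring_match(name: str, body: str) -> bool:
    """Case-sensitive substring test, analogous to POSITION(name IN body) > 0."""
    return name in body


def whole_word_match(name: str, body: str) -> bool:
    """Stricter, hypothetical alternative: the name must appear as a whole word."""
    return re.search(rf"\b{re.escape(name)}\b", body) is not None


body = "Danni was robbed this episode"

dan_substring = substring_match("Dan", body)    # matches inside "Danni"
dan_whole_word = whole_word_match("Dan", body)  # no standalone "Dan"
rob_substring = substring_match("Rob", body)    # "robbed" is lowercase

print(dan_substring, dan_whole_word, rob_substring)
```

The substring test is simple and fast, but it can attribute a comment about "Danni" to "Dan", and being case-sensitive it also misses lowercase mentions of a name.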

It's worth noting that the comments only go back to 2011 -- and the number of comments has greatly increased over time. We try a few ways of normalizing based on this information, but some players (particularly older players) will not be considered in this analysis.

In [1]:
import os
from sqlalchemy import create_engine
import numpy as np
import pandas as pd

import plotly.graph_objects as go  
from copy import deepcopy

from plotly.express import line, bar
In [2]:
pg_un, pg_pw, pg_ip, pg_port = [os.getenv(x) for x in ['PG_UN', 'PG_PW', 'PG_IP', 'PG_PORT']]
In [3]:
def pg_uri(un, pw, ip, port):
    return f'postgresql://{un}:{pw}@{ip}:{port}'
In [4]:
eng = create_engine(pg_uri(pg_un, pg_pw, pg_ip, pg_port))
In [5]:
sql = '''
WITH contestants_to_seasons AS (
SELECT c.contestant_id, c.first_name, 
	   c.last_name, cs.contestant_season_id, 
	   cs.season_id, occupation, location, age, placement, 
	   days_lasted, votes_against, 
	   med_evac, quit, individual_wins, attempt_number, 
	   tribe_0, tribe_1, tribe_2, tribe_3, alliance_0, 
	   alliance_1, alliance_2,
	   challenge_wins, challenge_appearances, sitout, 
	   voted_for_bootee, votes_against_player, 
	   total_number_of_votes_in_episode, tribal_council_appearances, 
	   votes_at_council, number_of_jury_votes, total_number_of_jury_votes, 
	   number_of_days_spent_in_episode, days_in_exile, 
	   individual_reward_challenge_appearances, individual_reward_challenge_wins, 
	   individual_immunity_challenge_appearances, individual_immunity_challenge_wins, 
	   tribal_reward_challenge_appearances, tribal_reward_challenge_wins, 
	   tribal_immunity_challenge_appearances, tribal_immunity_challenge_wins, 
	   tribal_reward_challenge_second_of_three_place, tribal_immunity_challenge_second_of_three_place, 
	   fire_immunity_challenge, tribal_immunity_challenge_third_place, episode_id
FROM survivor.contestant c
RIGHT JOIN survivor.contestant_season cs
ON c.contestant_id = cs.contestant_id
JOIN survivor.episode_performance_stats eps
ON eps.contestant_id = cs.contestant_season_id
), matched_exact AS 
(
SELECT reddit.*, c.*
FROM survivor.reddit_comments reddit
JOIN contestants_to_seasons c
ON (POSITION(c.first_name IN reddit.body) > 0
OR POSITION(c.last_name IN reddit.body) > 0)
AND c.season_id = reddit.within_season
AND c.episode_id = reddit.most_recent_episode
WHERE within_season IS NOT NULL
)
SELECT * 
FROM matched_exact m
'''
In [6]:
reddit_df = pd.read_sql(sql, eng)
In [7]:
ep_df = pd.read_sql('SELECT * FROM survivor.episode', eng)
In [8]:
season_to_name = pd.read_sql('SELECT season_id, name AS season_name FROM survivor.season', eng)
In [9]:
reddit_df = reddit_df.merge(season_to_name, on='season_id')
In [10]:
reddit_df.rename(columns={'name': 'season_name'}, inplace=True)
In [11]:
reddit_df = reddit_df.merge(ep_df.drop(columns=['season_id']), on='episode_id')
In [12]:
reddit_df['created_dt'] = pd.to_datetime(reddit_df['created_dt'])
In [13]:
pd.options.display.max_columns = 100

Taking a Look At The Data¶

In [14]:
reddit_df.head()
Out[14]:
index_x author author_created_utc author_flair_css_class author_flair_text author_fullname body controversiality created_utc distinguished gilded id link_id nest_level parent_id reply_delay retrieved_on score score_hidden subreddit subreddit_id edited user_removed mod_removed stickied author_cakeday can_gild collapsed collapsed_reason is_submitter gildings permalink permalink_url updated_utc subreddit_type no_follow send_replies author_flair_template_id author_flair_background_color author_flair_richtext author_flair_text_color author_flair_type rte_mode subreddit_name_prefixed all_awardings associated_award author_patreon_flair author_premium awarders collapsed_because_crowd_control ... attempt_number tribe_0 tribe_1 tribe_2 tribe_3 alliance_0 alliance_1 alliance_2 challenge_wins challenge_appearances sitout voted_for_bootee votes_against_player total_number_of_votes_in_episode tribal_council_appearances votes_at_council number_of_jury_votes total_number_of_jury_votes number_of_days_spent_in_episode days_in_exile individual_reward_challenge_appearances individual_reward_challenge_wins individual_immunity_challenge_appearances individual_immunity_challenge_wins tribal_reward_challenge_appearances tribal_reward_challenge_wins tribal_immunity_challenge_appearances tribal_immunity_challenge_wins tribal_reward_challenge_second_of_three_place tribal_immunity_challenge_second_of_three_place fire_immunity_challenge tribal_immunity_challenge_third_place episode_id season_name index_y summary story challenges trivia image firstbroadcast viewership wiki_link season_episode_number overall_episode_number overall_slot_rating survivor_rating episode_name created_y updated_y
0 4540248 sampete1157 NaN None None t2_yvimwo1 Yul NaN 1585333566 None NaN flo8j0x t3_fpql0y None t1_flmiy69 NaN 1.585335e+09 1.0 None survivor t5_2qhu3 NaN None None false None None None None false {} /r/survivor/comments/fpql0y/yul/flo8j0x/ None NaN None true true None None [] None text None None [] None false false [] None ... 2.0 NaN NaN NaN NaN NaN NaN NaN 0.125 0.5 0.0 0.0 3.0 4.0 1.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 NaN NaN -2.0 NaN 695.0 Winners at War 695 We're in the Majors is the seventh episode of ... Getting voted out before the merge-- that's so... Challenge: (No Title)Two members of each tribe... * "Boa Constrictor at Yara" (Day 17): At Yara... https://vignette.wikia.nocookie.net/survivor/i... 2020-03-25 818000000.0 https://survivor.fandom.com/wiki/We%27re_in_th... 7.0 590.0 8.0 1.7 We%27re in the Majors 2020-07-11 01:03:00.566347+00:00 2020-07-19 00:48:27.943994+00:00
1 4538586 lvl4lapras NaN None None t2_54mpbhfb Yul NaN 1585316533 None NaN flne39u t3_fpql0y None t3_fpql0y NaN 1.585317e+09 1.0 None survivor t5_2qhu3 NaN None None false None None None None false {} /r/survivor/comments/fpql0y/yul/flne39u/ None NaN None true true None None [] None text None None [] None false false [] None ... 2.0 NaN NaN NaN NaN NaN NaN NaN 0.125 0.5 0.0 0.0 3.0 4.0 1.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 NaN NaN -2.0 NaN 695.0 Winners at War 695 We're in the Majors is the seventh episode of ... Getting voted out before the merge-- that's so... Challenge: (No Title)Two members of each tribe... * "Boa Constrictor at Yara" (Day 17): At Yara... https://vignette.wikia.nocookie.net/survivor/i... 2020-03-25 818000000.0 https://survivor.fandom.com/wiki/We%27re_in_th... 7.0 590.0 8.0 1.7 We%27re in the Majors 2020-07-11 01:03:00.566347+00:00 2020-07-19 00:48:27.943994+00:00
2 4539257 swells61 NaN 34Gold WS33W J.T. t2_fex1f Yul NaN 1585324546 None NaN flnrm96 t3_fpql0y None t1_flmiy69 NaN 1.585325e+09 1.0 None survivor t5_2qhu3 NaN None None false None None None None false {} /r/survivor/comments/fpql0y/yul/flnrm96/ None NaN None true true None None [{'e': 'text', 't': 'J.T.'}] dark richtext None None [] None false false [] None ... 2.0 NaN NaN NaN NaN NaN NaN NaN 0.125 0.5 0.0 0.0 3.0 4.0 1.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 NaN NaN -2.0 NaN 695.0 Winners at War 695 We're in the Majors is the seventh episode of ... Getting voted out before the merge-- that's so... Challenge: (No Title)Two members of each tribe... * "Boa Constrictor at Yara" (Day 17): At Yara... https://vignette.wikia.nocookie.net/survivor/i... 2020-03-25 818000000.0 https://survivor.fandom.com/wiki/We%27re_in_th... 7.0 590.0 8.0 1.7 We%27re in the Majors 2020-07-11 01:03:00.566347+00:00 2020-07-19 00:48:27.943994+00:00
3 4539655 ekwag NaN 40Gold WW Nick t2_fssfb Yul NaN 1585328171 None NaN flnyaj9 t3_fpql0y None t3_fpql0y NaN 1.585329e+09 1.0 None survivor t5_2qhu3 NaN None None false None None None None false {} /r/survivor/comments/fpql0y/yul/flnyaj9/ None NaN None true true None None [{'e': 'text', 't': 'Nick'}] dark richtext None None [] None false false [] None ... 2.0 NaN NaN NaN NaN NaN NaN NaN 0.125 0.5 0.0 0.0 3.0 4.0 1.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 NaN NaN -2.0 NaN 695.0 Winners at War 695 We're in the Majors is the seventh episode of ... Getting voted out before the merge-- that's so... Challenge: (No Title)Two members of each tribe... * "Boa Constrictor at Yara" (Day 17): At Yara... https://vignette.wikia.nocookie.net/survivor/i... 2020-03-25 818000000.0 https://survivor.fandom.com/wiki/We%27re_in_th... 7.0 590.0 8.0 1.7 We%27re in the Majors 2020-07-11 01:03:00.566347+00:00 2020-07-19 00:48:27.943994+00:00
4 4539815 Lunarmise NaN None None t2_1r7mqcjo Yul NaN 1585329676 None NaN flo13ml t3_fpql0y None t1_flo0u7q NaN 1.585330e+09 1.0 None survivor t5_2qhu3 NaN None None false None None None None false {} /r/survivor/comments/fpql0y/yul/flo13ml/ None NaN None true true None None [] None text None None [] None false false [] None ... 2.0 NaN NaN NaN NaN NaN NaN NaN 0.125 0.5 0.0 0.0 3.0 4.0 1.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 NaN NaN -2.0 NaN 695.0 Winners at War 695 We're in the Majors is the seventh episode of ... Getting voted out before the merge-- that's so... Challenge: (No Title)Two members of each tribe... * "Boa Constrictor at Yara" (Day 17): At Yara... https://vignette.wikia.nocookie.net/survivor/i... 2020-03-25 818000000.0 https://survivor.fandom.com/wiki/We%27re_in_th... 7.0 590.0 8.0 1.7 We%27re in the Majors 2020-07-11 01:03:00.566347+00:00 2020-07-19 00:48:27.943994+00:00

5 rows × 126 columns

In [15]:
reddit_df.shape
Out[15]:
(1149008, 126)

There is a wealth of data here -- the actual content of the message and other Reddit information (like the user, upvotes, flairs, etc.). For this part of the analysis, we will just be looking at the occurrences of the names in the body of the comment. In the next installment, we will look a bit deeper at some of the text inside the body and how that relates to the contestants. Additionally, we will take a look at some of the users and other information in later installments.

One thing to note is that the above query did the heavy lifting of finding the first and last names in the comment bodies. So this dataframe (at a short 1.1 M rows) represents only the comments that contained at least one of these. Comments can appear multiple times if they contain multiple names.
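Because of that duplication, the row count is a count of name matches, not of distinct comments. A minimal sketch of how the two differ, using a toy frame that mimics the `id` and `first_name` columns of `reddit_df` (the values are made up):

```python
import pandas as pd

# Toy stand-in for reddit_df: one row per (comment, matched contestant).
toy = pd.DataFrame({
    "id": ["a", "b", "b", "c"],
    "first_name": ["Tony", "Tony", "Yul", "Yul"],
})

n_rows = len(toy)                     # number of name matches
n_comments = toy["id"].nunique()      # number of distinct comments
names_per_comment = toy.groupby("id").size()

print(n_rows, n_comments)
print(names_per_comment[names_per_comment > 1])  # comments matched more than once
```

The same two lines (`len(reddit_df)` and `reddit_df['id'].nunique()`) should give a sense of how much double counting there is in the real data.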

Comments Per Season¶

The first -- and most obvious -- question we can answer is -- how many comments are there each season? And, which seasons are represented by the subreddit?

In [16]:
from plotly.express import bar, line
In [17]:
def plot_season_comments(df):
    comments_per_season = df.groupby('season_name').size().reset_index()
    comments_per_season.rename(columns={'season_name': 'Season', 0: 'Number of Comments (with names)'}, inplace=True)
    return bar(data_frame=comments_per_season.sort_values(by='Number of Comments (with names)'), 
               x='Season', y='Number of Comments (with names)')
In [18]:
plot_season_comments(reddit_df)
[Figure: bar chart of Number of Comments (with names) by Season, sorted in ascending order from Redemption Island (fewest) to Winners at War (most); y-axis runs from 0 to 200k.]

We can see that the seasons represented are those that aired from 2011 onward. We also see that certain seasons, in particular the more recent ones, have many more comments than others. This makes sense intuitively, as Reddit usage has grown a good deal over the years. Winners at War, the most recent season, has the most Reddit comments; it also drew a lot of TV viewership, as it was an "all-star" type game.

To see this increase a bit more clearly, we can look at this over time based on the broadcast date of the episodes:

In [19]:
def plot_season_comments_time(df):
    comments_per_season = df.sort_values(by='firstbroadcast').groupby([df['firstbroadcast'].sort_values().dt.year, 'season_name']).size().reset_index()
    comments_per_season['Season, year'] = comments_per_season['season_name'] + ', ' + comments_per_season['firstbroadcast'].astype(str)
    comments_per_season.rename(columns={0: 'Number of Comments (with names)'}, inplace=True)
    return line(data_frame=comments_per_season, x='Season, year', y='Number of Comments (with names)', )
In [20]:
plot_season_comments_time(reddit_df)
[Figure: line chart of Number of Comments (with names) by "Season, year", in broadcast order from Redemption Island, 2011 to Winners at War, 2020; y-axis runs from 0 to 200k.]

Not too different from the above sorted chart, with a few exceptions of dips and peaks during certain years. Interestingly, two confounding factors exist here -- the increased popularity of Reddit over time, and the popularity of some seasons over others. Something we must keep in mind throughout this analysis!

Comments Per Contestant¶

Now, to jump into the meat of the reason for this query -- to take a look at the number of comments about particular contestants.

First, we look at the contestants with the highest absolute count of comments on Reddit. Since different seasons may have more (or fewer) comments based on factors not related to the popularity of the season itself, this will not necessarily give us an unbiased answer. However, it will still be interesting to consider this in both absolute and relative terms.

For the absolute chart, we look at the number of mentions of each contestant and plot a bar chart with the top 20 contestants. For the relative, we consider how many comments they got relative to the total number of mentions that season.
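The relative measure is just each contestant's mention count divided by the total mentions in their season. Here is a small sketch of that arithmetic on made-up data; it is a simplified equivalent for illustration, not the plotting function used in this post.

```python
import pandas as pd

# Made-up mentions: one row per (comment, matched contestant).
toy = pd.DataFrame({
    "season_name": ["Cagayan"] * 4 + ["Winners at War"] * 6,
    "first_name": ["Tony", "Tony", "Tony", "Kass",
                   "Tony", "Tony", "Tony", "Tony", "Yul", "Yul"],
})

# Absolute: mentions per contestant per season
counts = (toy.groupby(["season_name", "first_name"])
             .size()
             .rename("mentions")
             .reset_index())

# Relative: divide by the season's total mentions and express as a percent
season_totals = counts.groupby("season_name")["mentions"].transform("sum")
counts["pct_of_season"] = 100 * counts["mentions"] / season_totals

print(counts)
```

In this toy data Tony has more absolute mentions in Winners at War (4 vs. 3), but a larger share of the conversation in Cagayan (75% vs. roughly 67%), which is the kind of reordering the relative chart is meant to surface.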

In [21]:
def plot_sorted_n_alltime(df, total=True, n=20, top=True):
    
    grper = ['contestant_season_id', 'first_name', 'last_name']
    
    measured = 'Number of Mentions' if total else 'Percent of Season Mentions'
    abs_rel = 'Absolute' if total else 'Relative'
    top_bot = 'Top' if top else 'Bottom'
    title = f'{top_bot} {n} by {abs_rel} Number of Mentions'
    
    totals = df.groupby('season_name').apply(lambda x: x.groupby(grper).size() / (x.shape[0] if not total else 1)).sort_values()
    totals_clipped = totals.tail(n) if top else totals.head(n)
    totals_clipped = totals_clipped.reset_index()
    
    if not total:
        totals_clipped[0] *= 100
    totals_clipped['full_name'] = totals_clipped['first_name'] + \
                                 ' ' + totals_clipped['last_name'] + \
                                 ', ' + totals_clipped['season_name'].astype(str)
    totals_clipped.rename(columns={0: measured, 'full_name': 'Contestant'}, inplace=True)
    return bar(data_frame=totals_clipped, x=measured, y='Contestant', title=title)
    
In [22]:
plot_sorted_n_alltime(reddit_df)
[Figure: horizontal bar chart, "Top 20 by Absolute Number of Mentions"; x-axis: Number of Mentions (0 to 25k); y-axis: Contestant, running from Adam Klein, Winners at War up to Tony Vlachos, Winners at War. Fourteen of the 20 entries are from Winners at War.]
In [23]:
plot_sorted_n_alltime(reddit_df, False)
[Figure: horizontal bar chart, "Top 20 by Relative Number of Mentions"; x-axis: Percent of Season Mentions (0 to 35); y-axis: Contestant, running from Lisa Whelchel, Philippines up to Rob Mariano, Redemption Island.]

We see some interesting results. First, some of the big names are at the top of this list -- Tony is a very popular player, as well as a two-time winner of the game. The first chart is made up mainly of contestants from Winners at War, which makes sense since that season has many more comments than the others. Even Adam Klein makes this list, although he may have been one of the least popular contestants on that season.

The relative chart shows a bit more of a holistic view -- the top 8 contestants are immediately recognizable to me as interesting contestants from past seasons. Rob Mariano or "Boston Rob" is one of the most popular players the show has ever had. As is the nerd figure, John Cochran.

We could look at this a few different ways of course -- if we looked at the bottom of this list, I'm sure we'd see a lot of people who we've never heard of who were voted out in the first episode. Out of curiosity let's check it out!

In [24]:
plot_sorted_n_alltime(reddit_df, False, top=False)
[Figure: horizontal bar chart, "Bottom 20 by Relative Number of Mentions"; x-axis: Percent of Season Mentions (0 to 0.5); y-axis: Contestant, running from Roxanne Morris, Philippines to Katrina Radke, Heroes vs. Healers vs. Hustlers, and including Francesca Hogi, Caramoan; Rupert Boneham, Blood vs. Water; and J.T. Thomas, Game Changers.]

Hm, the results here are somewhat interesting! On one end, there are some names that I have certainly never heard of. On the other, there are some who are quite popular -- J.T., for instance. In his case, I think it's a data error (one that, unfortunately for JT, I don't think is worth diving into in this analysis): "J.T." is probably not written out much in the comments (people more likely write "JT"). Others are contestants who were popular overall, but were probably unpopular or voted out quickly during that particular season (like Rupert).
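For the curious, one possible way to handle names like J.T. is to normalize both the name and the comment before matching. This is only a sketch of the idea, with hypothetical helper names; it is not applied anywhere in this analysis.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and drop periods so that 'J.T.' and 'JT' compare equal."""
    return re.sub(r"\.", "", text).lower()


def mentions(name: str, body: str) -> bool:
    """Whole-word match on the normalized name and body (hypothetical helper)."""
    pattern = rf"\b{re.escape(normalize(name))}\b"
    return re.search(pattern, normalize(body)) is not None


exact_hit = "J.T." in "JT got blindsided"               # plain substring test
normalized_hit = mentions("J.T.", "JT got blindsided")  # after normalization

print(exact_hit, normalized_hit)
```

Stripping every period is a blunt instrument (it also touches sentence-ending punctuation), so a real fix would probably want a per-contestant list of aliases instead.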

Then, you have people who are notoriously unlucky, like Francesca Hogi, who was voted out in the first episode of both seasons she played. Hate to say it, but she's exactly who you'd expect to be at the bottom of this list!

Season Breakdown¶

While there were definitely some interesting takeaways from looking at this in the aggregate, the next step is to drill down into individual seasons -- when did people get the most comments? Were some people popular (and then voted out)?

The next plots look into this.

In [25]:
from plotly.express import bar
In [26]:
def create_count_from_episode_df(ep_df, unique_idxs):
    grper = ['season_id', 'contestant_season_id', 'first_name', 'last_name']
    reindexed = ep_df.groupby(grper).size().reindex(unique_idxs)
    reindexed.name = 'count'
    reindexed.fillna({'count': 0}, inplace=True)
    reindexed.drop(columns=['episode_name'], inplace=True)
    reindexed = reindexed.reset_index()
    return reindexed



def create_episode_counts(df):
    grper = ['season_id', 'contestant_season_id', 'first_name', 'last_name']
    
    idx = df[grper].drop_duplicates()
    ep_counts = df.groupby(['episode_id', 'episode_name']).apply(lambda x: create_count_from_episode_df(x, idx))
    ep_counts = ep_counts.reset_index()
    ep_counts['cumulative_player_counts'] = ep_counts.groupby('contestant_season_id')['count'].transform(lambda x: x.cumsum())
    ep_counts.sort_values(by=['episode_id', 'cumulative_player_counts'], inplace=True)
    ep_counts['total_counts'] = ep_counts.groupby('contestant_season_id')['count'].transform('sum')
    return ep_counts


def create_racetrack_by_episode(df, *args, **kwargs):
    ep_counts = create_episode_counts(df)
    fig = bar(y='first_name', x='cumulative_player_counts', 
               data_frame=ep_counts,  animation_frame='episode_name', 
               range_x=[0, ep_counts['cumulative_player_counts'].max() * 1.05],
               *args, **kwargs)
    fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 2000
    return fig

def create_racetrack_by_episode_for_season(df, season_id, *args, **kwargs):
    subset_df = df[df['season_id'] == season_id]
    season_name = subset_df['season_name'].iloc[0]
    set_kwargs = dict(title = f'Cumulative Comment Counts for Season: <b>{season_name}</b>',
                      labels=
                     {'first_name': 'First Name', 
                      'cumulative_player_counts': 'Number of Comments',
                      'episode_name': 'Episode Name'})
    set_kwargs.update(kwargs)
    return create_racetrack_by_episode(subset_df, *args, **set_kwargs)
    

The plots below are animated, one for each of the seasons covered by the Reddit data. These are what I'm calling racecar plots: animated bar plots where the categories (in this case the contestants) race against one another to get to the right side of the chart.

In this case, to make things interesting, I only consider a contestant while they are still getting comments on Reddit -- once you are voted out of the game, your total goes down to zero. You may notice that a contestant's bar tends to disappear right after the episode in which they were voted out. This makes sense -- for most players, once they are voted out they are rarely, if ever, mentioned again on Reddit in the discussion of later episodes. People have short memories.

Plotly Express makes this plot very easy to make, at the cost of some customization. The aspect of the plot mentioned above was actually unavoidable without some big workarounds. Still, the animation capabilities of Plotly Express are worth mentioning. You can press the play button to start the animation, and it will play episode by episode until the season is "over" (the last episode of the season). After each episode, the remaining contestants are sorted. Keep an eye on the players, they can be a bit hard to track!

In [27]:
for season in reddit_df['season_id'].unique():
    fig = create_racetrack_by_episode_for_season(reddit_df, season, height=1000, width=1000)
    fig.show()  # display each season's animation; without this the loop renders nothing

Episode by Episode Breakdown¶

While the last plots did show us how the cumulative comment counts grew over time for each contestant, we'd like to be able to visualize the difference between episodes a bit more clearly. Additionally, it would be nice to have the bars stick around even after that player no longer has additional comments. For this, we will use our own custom plotly animation function below to generate similar plots for each of these seasons.
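The idea of bars that "stick around" boils down to treating a missing episode as zero new comments, so the running total stays flat instead of vanishing. A minimal sketch with made-up counts (the names and numbers are purely illustrative):

```python
from itertools import accumulate

episodes = [1, 2, 3, 4]

# Hypothetical per-episode mention counts; a contestant who was voted out
# simply has no entry for later episodes.
per_episode = {
    "Tony": {1: 5, 2: 7, 3: 4, 4: 9},
    "Yul": {1: 6, 2: 3},
}

# Missing episodes count as 0, so the cumulative total is carried forward.
cumulative = {
    name: list(accumulate(counts.get(ep, 0) for ep in episodes))
    for name, counts in per_episode.items()
}

print(cumulative)
```

Stacking one bar segment per episode, as the custom function does, gives the same visual result: the eliminated player's bar stops growing but stays on the chart.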

In [28]:
def plot_episode_by_episode_breakdown_by_season(df, season_id, *args, **kwargs):
    reduced = df[df['season_id'] == season_id]
    season_name = reduced['season_name'].iloc[0]
    
    set_kwargs = dict(title = f'Cumulative Comment Counts by Episode for Season <b>{season_name}</b>',
                      xaxis=dict(title='Number of Comments', autorange=False), 
                      yaxis=dict(title='First Name'))
    
    set_kwargs.update(kwargs)
    return plot_episode_by_episode_breakdown(reduced, *args, **set_kwargs)
In [29]:
def plot_episode_by_episode_breakdown(df, *args, **kwargs):
    ep_counts = create_episode_counts(df)
    ep_counts.sort_values(by='total_counts', inplace=True)
    episodes = ep_counts['episode_id'].unique()
    episodes.sort()

    empty = dict(type='bar', orientation='h', 
             y=ep_counts['first_name'].unique(),
             x=[None] * ep_counts['first_name'].nunique())


    traces = [empty.copy() for i in range(len(episodes))] 
    frames = []
    

    sliders_dict = {
        "active": 0,
        "yanchor": "top",
        "xanchor": "left",
        "currentvalue": {
            "font": {"size": 20},
            "prefix": "Episode: ",
            "visible": True,
            "xanchor": "right"
        },
        "transition": {"duration": 300, "easing": "cubic-in-out"},
        "pad": {"b": 10, "t": 50},
        "len": 0.9,
        "x": 0.1,
        "y": 0,
        "steps": []
    }

    
    for i, ep in enumerate(episodes):

        fr_dict = dict(type='bar', orientation='h')
        new_bool = ep_counts['episode_id'] == ep
        ep_name = ep_counts.loc[new_bool, 'episode_name'].iloc[0]
        traces[i]['name'] = ep_name

        fr_dict.update(dict(y = ep_counts.loc[new_bool, 'first_name'].reset_index(drop=True),
                            x = ep_counts.loc[new_bool, 'count'].reset_index(drop=True)))

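        if i > 0:
            last_frame = deepcopy(frames[-1])
            last_frame['data'].append(fr_dict)
            last_frame['traces'] += [i]
        else:
            last_frame = dict(data=[fr_dict], traces=[0])

        # Name the frame so the slider step below (which targets [ep_name]) can find it
        last_frame['name'] = ep_name
        frames.append(last_frame)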
        
        slider_step = {"args": [
            [ep_name],
        {"frame": {"duration": 300, "redraw": False},
         "mode": "immediate",
         "transition": {"duration": 300}}
            ],
            "label": ep_name,
            "method": "animate"}
        sliders_dict["steps"].append(slider_step)

    layout = go.Layout(width=1000,
                       height=1000,
                       showlegend=True,
                       hovermode='closest')
    
    

    layout["sliders"] = [sliders_dict]
    layout["updatemenus"] = [
        {
        "buttons": [
            {
                "args": [None, {"frame": {"duration": 500, "redraw": False},
                                "fromcurrent": True, "transition": {"duration": 300,
                                                                    "easing": "quadratic-in-out"}}],
                "label": "Play",
                "method": "animate"
            },
            {
                "args": [[None], {"frame": {"duration": 0, "redraw": False},
                                  "mode": "immediate",
                                  "transition": {"duration": 0}}],
                "label": "Pause",
                "method": "animate"
            }
        ],
        "direction": "left",
        "pad": {"r": 10, "t": 87},
        "showactive": False,
        "type": "buttons",
        "x": 0.1,
        "xanchor": "right",
        "y": 0,
        "yanchor": "top"
        }
    ]


    try:
        kwargs['xaxis'].update(range=[0, ep_counts['total_counts'].max() * 1.05])
    except KeyError:
        kwargs['xaxis'] = dict(range=[0, ep_counts['total_counts'].max() * 1.05])
    layout.update(barmode='stack',
                  *args, **kwargs)
    fig = go.Figure(data=traces, frames=frames, layout=layout)
    return fig
In [30]:
for season in reddit_df['season_id'].unique():
    plot_episode_by_episode_breakdown_by_season(reddit_df, season).show()
[Figure: animated stacked horizontal bar chart, "Cumulative Comment Counts by Episode for Season Winners at War"; x-axis: Number of Comments (0 to 25k); y-axis: First Name (Ethan through Tony); one stacked segment per episode, with Play/Pause buttons and an episode slider.]
[Figure: same chart for Season Island of the Idols; x-axis: Number of Comments (0 to 16k); y-axis: First Name (Ronnie through Dan).]
[Figure: same chart for Season Redemption Island; x-axis: Number of Comments (0 to 140); y-axis: First Name (Sarita through Rob).]
[Figure: same chart for Season South Pacific; x-axis: Number of Comments (0 to 300); y-axis: First Name (Elyse through John).]
[Figure: same chart for Season One World; x-axis: Number of Comments (0 to 600); y-axis: First Name (Nina through Kim).]
[Figure: same chart for Season Philippines; x-axis: Number of Comments (0 to 700); y-axis: First Name (Dana through Malcolm).]
[Figure: same chart for Season Caramoan; x-axis: Number of Comments (0 to 2000); y-axis: First Name (Allie through John).]
[Figure: same chart for Season Blood vs. Water; x-axis: Number of Comments (0 to 2000); y-axis: First Name (Rupert through Tyson).]
010002000300040005000DavidBriceLindseyAlexisGarrettCliffJ'TiaJeremiahMorganSarahLJJefraTashaTrishWooSpencerKassTony
Straw That Broke the Camel%27s BackHavoc to WreakChaos Is My FriendSitting in My Spy ShackBag of TricksMad Treasure HuntHead of the SnakeWe Found Our ZombiesOdd One OutOur Time to ShineCops-R-Us (episode)Hot Girl with a GrudgeEpisode: Hot Girl with a GrudgeHot Girl with a GrudgeMad Treasure HuntCumulative Comment Counts by Episode for Season CagayanNumber of CommentsFirst NamePlayPause
plotly-logomark
050010001500200025003000350040004500NadiyaValKelleyDrewJulieDaleWesJohnAlecReedJoshJeremyJaclynMissyBaylorKeithNatalieJon
Let%27s Make a MoveStill Holdin%27 OnThis Is Where We Build TrustGettin%27 to Crunch TimeWrinkle in the PlanMillion Dollar DecisionMake Some Magic HappenBlood Is BloodWe%27re a Hot MessActions vs. AccusationsMethod to This MadnessSuck It Up and SurviveEpisode: Suck It Up and SurviveSuck It Up and SurviveBlood Is BloodGettin%27 to Crunch TimeCumulative Comment Counts by Episode for Season San Juan del SurNumber of CommentsFirst NamePlayPause
plotly-logomark
02000400060008000LindseyNinaVinceJoaquinKellyMaxSoHaliJoeSierraCarolynTylerJennWillShirinRodneyMikeDan
My Word Is My BondHolding On for Dear LifeSurvivor Russian RouletteBring the PopcornLivin%27 on the EdgeKeep It RealThe Line Will Be Drawn TonightWinner Winner, Chicken DinnerCrazy Is as Crazy DoesIt Will Be My RevengeIt%27s Survivor WarfareEpisode: It%27s Survivor WarfareIt%27s Survivor WarfareThe Line Will Be Drawn TonightSurvivor Russian RouletteCumulative Comment Counts by Episode for Season Worlds ApartNumber of CommentsFirst NamePlayPause
plotly-logomark
02000400060008000Peih-GeeAbi-MariaTerryShirinMonicaKellyWooJeffCieraKassKimmiAndrewKelleyTashaKeithStephenJeremyJoeSpencer
Villains Have More FunTiny Little Shanks to the HeartMy Wheels Are SpinningWitches Coven (episode)You Call, We%27ll HaulPlay to WinBunking with the DevilA Snake in the GrassWhat%27s the Beef%3FWe Got a RatSurvivor MacGyverEpisode: Survivor MacGyverSurvivor MacGyverBunking with the DevilMy Wheels Are SpinningCumulative Comment Counts by Episode for Season CambodiaNumber of CommentsFirst NamePlayPause
plotly-logomark
02k4k6k8k10kJenniferDarnellLizAnnaCalebNealPeterNickAleciaJuliaDebbieMicheleCydneyScotJoeAubryKyleTai
With Me or Not with MeNow%27s the Time to Start SchemingIt%27s a %27Me%27 Game, Not a %27We%27 GameI%27m Not Here to Make Good FriendsIt%27s Psychological WarfareThe Jocks vs. the Pretty PeopleIt%27s Merge TimePlay or Go HomeThe Devils We KnowSigned, Sealed and DeliveredThe Circle of LifeKindergarten CampI%27m a Mental GiantEpisode: I%27m a Mental GiantI%27m a Mental GiantI%27m Not Here to Make Good FriendsCumulative Comment Counts by Episode for Season Kaôh RōngNumber of CommentsFirst NamePlayPause
plotly-logomark
010002000300040005000600070008000RachelPaulMariLucyCeCeFiggyJessicaMichelleChrisMichaelaSundayTaylorBretWillZekeHannahAdamKenJayDavid
Slayed the Survivor DragonAbout to Have a RumbleMillion Dollar GambleStill Throwin%27 PunchesI%27m the KingpinI Will Destroy YouThe Truth Works WellIdol Search PartyWho%27s the Sucker at the Table%3FYour Job Is ReconLove GogglesMay the Best Generation WinEpisode: May the Best Generation WinMay the Best Generation WinThe Truth Works WellAbout to Have a RumbleCumulative Comment Counts by Episode for Season Millennials vs. Gen XNumber of CommentsFirst NamePlayPause
plotly-logomark
02k4k6k8k10kJ.T.CieraCalebTonyMalcolmTroyzanHaliAndreaOzzyDebbieSierraAubryMichaelaSarahBradTaiJeffZekeCirieSandra
Parting Is Such Sweet SorrowIt Is Not a High Without a LowReinventing How This Game Is PlayedA Line Drawn in ConcreteThere%27s a New Sheriff in TownWhat Happened on Exile, Stays on ExileVote Early, Vote OftenDirty DeedThe Tables Have TurnedSurvivor JackpotThe Stakes Have Been RaisedEpisode: The Stakes Have Been RaisedThe Stakes Have Been RaisedWhat Happened on Exile, Stays on ExileParting Is Such Sweet SorrowCumulative Comment Counts by Episode for Season Game ChangersNumber of CommentsFirst NamePlayPause
plotly-logomark
02k4k6k8k10kKatrinaSimonePatrickRoarkAlanDesiJessicaAliAshleyJPColeMikeDevonJoeLaurenRyanBenChrissy
The Survivor DevilNot Going to Roll Over and DieBuy One, Get One FreeFear of the UnknownPlaying with the DevilGet to Gettin%27This Is Why You Play SurvivorThe Past Will Eat You AliveI Don%27t Like Having Snakes AroundMy Kisses Are Very PrivateI%27m a Wild BansheeI%27m Not Crazy, I%27m ConfidentEpisode: I%27m Not Crazy, I%27m ConfidentI%27m Not Crazy, I%27m ConfidentThis Is Why You Play SurvivorNot Going to Roll Over and DieCumulative Comment Counts by Episode for Season Heroes vs. Healers vs. HustlersNumber of CommentsFirst NamePlayPause
plotly-logomark
010002000300040005000600070008000StephanieMorganBrendanDesireeJacobJamesSebastianLibbyDomenickJennaBradleyAngelaDonathanLaurelChelseaChrisKellynMichaelWendell
Always Be MovingA Giant Game of Bumper CarsThe Finish Line Is in SightIt%27s Like the Perfect CrimeThe Sea Slug SluggerFear Keeps You SharpGotta Risk It for the BiscuitFate Is the HomieA Diamond in the RoughTrust Your GutOnly Time Will TellCan You Reverse the Curse%3FEpisode: Can You Reverse the Curse%3FCan You Reverse the Curse%3FFate Is the HomieIt%27s Like the Perfect CrimeCumulative Comment Counts by Episode for Season Ghost IslandNumber of CommentsFirst NamePlayPause
plotly-logomark
02k4k6k8k10kJessicaPatBiNataliaJeremyLyrsaElizabethJohnCarlNatalieAlisonDanKaraAlecDavieMikeGabbyNickAngelinaChristian
Are You Feeling Lucky%3FSo Smart They%27re DumbTribal Lines Are BlurredBreadth-First SearchYou Get What You GiveThere%27s Gonna Be Tears ShedAren%27t Brochachos Just Adorable%3FJackets and EggsTime to Bring About the CharmpocalypseI Am Goliath StrongThe Chicken Has Flown the CoopAppearances Are DeceivingEpisode: Appearances Are DeceivingAppearances Are DeceivingThere%27s Gonna Be Tears ShedCumulative Comment Counts by Episode for Season David vs. GoliathNumber of CommentsFirst NamePlayPause
plotly-logomark
02k4k6k8k10k12kReemKeithEricChrisAubryJuliaGavinJulieJoeWendyAuroraVictoriaRonWardogDavidLaurenKelleyRick
Idol or BustAwkwardFasten Your SeatbeltsBlood of a BlindsideY%27all Making Me CrazyI%27m the Puppet MasterThere%27s Always a TwistIt%27s Like the Worst Cocktail Party EverI Need a Dance PartnerBetrayals Are Going to Get ExposedOne of Us Is Going to Win the WarIt Smells Like SuccessEpisode: It Smells Like SuccessIt Smells Like SuccessI%27m the Puppet MasterCumulative Comment Counts by Episode for Season Edge of ExtinctionNumber of CommentsFirst NamePlayPause
plotly-logomark
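
The per-season charts above track a running total of comments per contestant as the episodes progress. As a rough illustration of that calculation (not necessarily how the figures here were built), the sketch below computes cumulative counts with pandas. The column names `first_name`, `episode_order` and `n_comments`, the helper `cumulative_counts`, and the toy numbers are all assumptions made up for this example.

```python
import pandas as pd


def cumulative_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Add a running comment total per contestant, ordered by episode.

    Assumes hypothetical columns: first_name, episode_order, n_comments.
    """
    # Sort so that the cumulative sum follows episode order within each contestant
    out = df.sort_values(["first_name", "episode_order"]).copy()
    out["cumulative_comments"] = out.groupby("first_name")["n_comments"].cumsum()
    return out.reset_index(drop=True)


# Made-up counts, purely for illustration
toy = pd.DataFrame({
    "first_name": ["Rob", "Rob", "Rob", "Phillip", "Phillip", "Phillip"],
    "episode_order": [1, 2, 3, 1, 2, 3],
    "n_comments": [10, 5, 20, 3, 0, 7],
})

result = cumulative_counts(toy)
print(result)
```

A frame in this shape could then be passed to `plotly.express.bar` with `animation_frame` set to the episode column, which is one way to get a play/pause animation over episodes.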

Conclusions¶

This first jump into the analysis definitely gave us some interesting results! Some high-level takeaways:

  • r/survivor has been around since 2011. Over the years, engagement (in terms of comments) has increased substantially.

  • Absolute comment counts don't seem to be the way to go when comparing across seasons. Comment counts relative to the total seemed to work best.

  • The players who are the most popular according to Reddit comments pass the common-sense check: Tony and Rob Mariano top the list.

  • Comments (or lack thereof) can be used as a proxy for in-game events. We've only just started to see this (without looking at the body of the text), but we can already see that, in most cases, Reddit comments mentioning a contestant drop off to zero or near-zero after that contestant is eliminated from the game. This could be interesting to dive a bit deeper into (especially if we want to eventually build a model!).

  • Generally, as we can see above, players who last longer tend to dominate the number of comments. Only in a few rare instances do contestants who are voted out before the final 4 or 5 end up surpassing those finalists in Reddit comments. This passes the sniff test: we will remember contestants who last longer, even those we wouldn't normally like, simply because they are more "in the running" with fewer contestants remaining.
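
To make the point about relative counts concrete, here is a minimal sketch of one way to normalize: dividing each contestant's comment count by the total for their season. It assumes "the total" means all contestant mentions within the same season, which may differ from the exact normalization used in this analysis. The column names `season`, `first_name` and `n_comments`, the helper `add_comment_share`, and the toy numbers are hypothetical.

```python
import pandas as pd


def add_comment_share(df: pd.DataFrame) -> pd.DataFrame:
    """Add each contestant's share of their season's total comment mentions.

    Assumes hypothetical columns: season, first_name, n_comments.
    """
    out = df.copy()
    # transform("sum") broadcasts the season total back onto every row
    season_total = out.groupby("season")["n_comments"].transform("sum")
    out["comment_share"] = out["n_comments"] / season_total
    return out


# Made-up counts: season "B" has far more comments overall than season "A"
toy = pd.DataFrame({
    "season": ["A", "A", "B", "B"],
    "first_name": ["P1", "P2", "P3", "P4"],
    "n_comments": [30, 70, 3000, 1000],
})

shares = add_comment_share(toy)
print(shares)
```

On absolute counts, P4 (1000 comments) looks far more discussed than P2 (70). On shares, P2 accounts for a larger fraction of their season's mentions than P4 does, which is the kind of cross-season comparison that raw counts hide.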

This has been an interesting look at the rich dataset of Reddit comments on r/survivor. But we aren't done yet!

The next installment of the Survivor analysis series will use sentiment analysis and other attributes of the comment text to try to glean information about the contestants and Reddit users' behavior. Stay tuned, you won't want to miss it!

Sean Ammirati - creator of Stats Works. He can be reached on Github, LinkedIn and email.

Articles in Survivor

  • « Survivor: Outwit, Outplay, Out...analyze? (Part 1) Collecting The Data
  • Survivor: Outwit, Outplay, Out...analyze? (Part 3) Sentiment Analysis Exploratory Analysis »

Published

Nov 5, 2021

Category

Survivor

Tags

  • data analysis 3
  • project 4
  • reddit 3
  • survivor 4
