Analyzing the Game of Survivor -- Collecting The Data (1)¶
The Data of Survivor¶
Recently, I have been interested in the CBS reality/game show Survivor. Part of this is because of the extra time I have had due to the circumstances surrounding COVID-19, and part of it is because the show greatly resembles other games that I enjoy playing (social deduction games, etc.). If you have never watched it, I would highly recommend it! I'm not normally a fan of reality TV, but this show combines the elements of reality TV and a game show in a way that makes it enthralling.
After watching a few seasons, and being a data scientist, I became interested in some of the results of the show. It seems that certain personalities are more or less likely to win, or to be voted out at any given point in the game. Additionally, there seems to be a balancing act between the importance of challenges, strategy, and the social game.
This is the beginning of a series of posts relating to Survivor: the data that can be gathered from it, the analyses we can perform with some of the sources, and perhaps ending in a few machine learning or statistical learning applications (to try to predict the season winner, or who will be voted out on which day, etc.)!
A quick primer on Survivor¶
To introduce some of the terms and information I will be using throughout this post, here is a brief summary. From Wikipedia:
> [Survivor] features a group of contestants deliberately marooned in an isolated location, where they must provide food, water, fire, and shelter for themselves. The contestants compete in challenges for rewards and immunity from elimination. The contestants are progressively eliminated from the game as they are voted out by their fellow contestants until only one remains to be awarded the grand prize and named the "Sole Survivor".
In Survivor, there are a few things of note:

- Survivors must survive in the elements with one another as a tribe. Much of the game focuses on securing basic necessities. However, despite the name, it is hardly a game about wilderness survival; the majority of the game is social and challenge-based.
- Every episode, there is one elimination from the game. Contestants vote players off by majority vote.
- At the beginning of the season, there are two or more tribes. These groups live separately from one another and compete in challenges collectively.
- Reward challenges give contestants (called castaways by the show) a chance to win luxury prizes, like food, survival gear, or other items of interest. As the game progresses, the rewards usually get more and more enticing.
- Immunity challenges provide a strategic advantage in the game. When there are two (or more) tribes, winning an immunity challenge protects the whole tribe from elimination -- someone from the losing tribe must leave. When there is one tribe, immunity challenges are individual, and winning prevents that person from being voted out.
- The game changes substantially when the two (or more) tribes are merged into one, usually around halfway through the game. This is known as the merge, and from that point on it is an individual game.
- Players tend to work in groups (or alliances) to achieve goals in the game, acting as voting blocs.
- The game of Survivor is edited after the whole season is filmed -- meaning the people editing often know who the winner will be. This is important to keep in mind.
- Additionally, each episode contains some number of confessionals -- segments where contestants speak to the camera away from the other contestants. They may complain about a particular contestant, for instance, or discuss other parts of the game.
While there is more to the show than just this (and it has evolved quite a bit over time), for our purposes it is enough to understand these basic ideas of the game before jumping in. If you're curious, Season 7 (Pearl Islands) is one of my favorite seasons and is a great place to start!
The ETL Process¶
The first step in this process, of course, is getting the actual data to use. To this end, I have gone quite far in collecting data and creating a workflow to update it as new seasons air. This behind-the-scenes work is more data engineering than data science, but it is probably the most important part of any of the analysis I will be doing!
ETL stands for Extract, Transform, Load, which are the three steps to building a data pipeline. This general structure can describe nearly any process using data. As data scientists, we tend to spend most of our time on the Transform part of this equation.
You may not realize it, but almost any data analysis work can be abstracted to fit into these three broad categories. In many toy examples, both on this site and elsewhere, the extract portion is limited to loading an Excel or CSV file before fitting the model. However, in real life (and in any job as a data scientist!) the method by which we gain access to the data, and the form it is in, is of crucial importance.
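To make this concrete, here is a minimal, entirely hypothetical sketch of such a pipeline in Python. The source file, table name, and cleaning steps are stand-ins for illustration, not my actual code:

```python
import pandas as pd


def extract() -> pd.DataFrame:
    # Extract: pull raw data from a source -- an API, a spreadsheet, a webpage
    return pd.read_csv('some_source.csv')  # hypothetical source file


def transform(raw: pd.DataFrame) -> pd.DataFrame:
    # Transform: standardize column names, fix types, derive metrics
    raw.columns = [c.strip().lower().replace(' ', '_') for c in raw.columns]
    return raw.dropna(how='all')


def load(clean: pd.DataFrame, con) -> None:
    # Load: persist the cleaned frame into the database
    clean.to_sql('some_table', con=con, schema='survivor',
                 if_exists='append', index=False)


# The whole pipeline is then just a composition of the three steps:
# load(transform(extract()), con=eng)
```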
If you don't think this is part of data science -- think again! You may have heard the line that more than 80% of a data scientist's time is spent cleaning data. While the reliability of that figure is somewhat questionable (check out this survey for more recent results from 2018), cleaning is without a doubt one of the most time consuming and important parts of a data scientist's job.
Without good, clean, and easily accessible data, all of the analytical muscle and machine learning techniques you may use will be to no avail. As the old adage goes -- garbage in, garbage out.
Without further ado, I want to detail some of the goals of this project:
- Find reliable data sources covering as many elements of the game of Survivor as I can
- Create reproducible code to pull this information moving forward (...Extract...)
- Connect different data sources together, generate metrics based on them, and clean the data from these sources (...Transform...)
- Load this information into a relational database (...Load...)
- Update this information on a daily basis (as new seasons come out, etc.)
To this end, I have written a series of scripts to generalize this process. I then use Apache Airflow and a Postgres database to periodically run the code and store the results. This all lives on my homeserver, and the Airflow tasks run asynchronously across 5 Raspberry Pis. In the future, I may detail this infrastructure work (which is outside the realm of data science, but may be of interest nonetheless). A rough sketch of what one of these scheduled jobs looks like is below.
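My actual DAGs aren't shown here, but a daily Airflow job looks roughly like this. The DAG and task names are illustrative only (and note that on older Airflow 1.10 installs the operator import path is `airflow.operators.python_operator` instead):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def pull_and_store():
    """Placeholder: run extract/transform/load for one data source."""


with DAG(
    dag_id='survivor_etl',            # hypothetical DAG name
    start_date=datetime(2020, 7, 1),
    schedule_interval='@daily',       # re-pull every data source once a day
    catchup=False,
) as dag:
    PythonOperator(task_id='pull_and_store', python_callable=pull_and_store)
```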
This whole process took me roughly a month. It was well worth it, though -- the data I have collected is rich and comes from many different sources, including:
- True Dorks data with statistics on challenges (Excel spreadsheets)
- This collection of confessional data from various seasons, with every confessional listed for each episode (Word documents)
- The Survivor Wiki for information about episodes, seasons, contestants, tribes and alliances. (HTML, webpages)
- Pushshift.io data for the reddit r/survivor subreddit, where people discuss all things related to Survivor (API)
- Caunce Character Types for each of the different contestants on the show (Excel)
As you can imagine, combining all of these data sources to tell a cohesive story was quite a challenge, but a worthwhile one all the same!
The remainder of this post will discuss the data I have collected, how it has been saved, and what I plan to do with it!
Querying the Database¶
I have stored all of the results in a Postgres database, hosted on my homeserver. Some of this information should be kept secret, like my password, of course, so I will access it using environment variables. These were actually saved in the conda environment I have specified for this project, `survivor_scraping` (conda can store per-environment variables with `conda env config vars set`).
```python
import os

from sqlalchemy import create_engine
import pandas as pd

# Pull the connection details from environment variables
pg_un, pg_pw, pg_ip, pg_port = [os.getenv(x) for x in ['PG_UN', 'PG_PW', 'PG_IP', 'PG_PORT']]


def pg_uri(un, pw, ip, port):
    return f'postgresql://{un}:{pw}@{ip}:{port}'


eng = create_engine(pg_uri(pg_un, pg_pw, pg_ip, pg_port))
```
I will now use this engine, as well as the handy `read_sql` function of pandas, to read in some of the tables and show some of the interesting data we have here!
True Dorks Data¶
For the True Dorks data, we have a few different tables that were extracted from here. The first is the `episode_performance_stats` table, which details the results from the challenges and other events over the course of the entire episode.
```python
pd.options.display.max_columns = 100
pd.read_sql('SELECT * FROM survivor.episode_performance_stats LIMIT 100', con=eng)
```
Here, you can see some interesting information. Notice that the rows are keyed by `contestant_id`s, `episode_id`s and `season_id`s. These are foreign keys into three other tables, `contestant`, `episode` and `season`, whose information was pulled from the wiki.
In order to map these to the correct values, there was a semi-automatic process to find the correct contestant, episode and season based on their names. This is why the Transform portion of the ETL process was necessary -- to make sure all of these tables were consistent. While it was a lot of work, it has made the data much richer than any of these individual sources alone. From here on out, you can assume that `id` columns required matching with other data sources to ensure correctness and to conform with relational database paradigms.
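As an illustration of what "semi-automatic" means here, a first pass can be done with simple fuzzy string matching, flagging low-confidence cases for manual review. This is only a sketch of the idea (with made-up inputs), not my actual matching code:

```python
import difflib


def match_name(name, known_names, cutoff=0.8):
    """Return the closest known name, or None so a human can review it."""
    matches = difflib.get_close_matches(name, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# A misspelling in one source still resolves to the canonical wiki name
match_name('Rob Marianno', ['Rob Mariano', 'Rob Cesternino'])  # -> 'Rob Mariano'
```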
Additionally, each table has a `created` and an `updated` column. These are present in all of the tables, and were most recently updated on July 19th. If the automated process (using Airflow and the Raspberry Pis) updates these tables, we will see the timestamp at which this happened. The load step uses an upsert method, so re-running it updates existing rows in place rather than duplicating them or clobbering values that are already current.
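For those unfamiliar, in Postgres an upsert is an `INSERT ... ON CONFLICT` statement. A simplified version of what a load step might execute looks like this; the table and column names are illustrative, not my exact schema:

```python
from sqlalchemy import text

upsert = text("""
    INSERT INTO survivor.episode (episode_id, title, updated)
    VALUES (:episode_id, :title, now())
    ON CONFLICT (episode_id) DO UPDATE
        SET title   = EXCLUDED.title,
            updated = now()
    -- the created column is deliberately left untouched on conflict
""")

with eng.begin() as con:
    con.execute(upsert, {'episode_id': 1, 'title': 'The Marooning'})
```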
You can see this table details information at the episode level. For instance, if an episode had both a reward challenge and an immunity challenge, this table will have the results from both. Some episodes have multiple of each (especially toward the end), so this is important to take note of. Challenge wins are fractional when they are won as part of a tribe -- for instance, if I am in a tribe of 4 and my tribe wins a challenge, I get 0.25 challenge wins. The same goes for challenge appearances.
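In other words, the credit for a tribe win is split evenly among its members. As a trivial illustration (with hypothetical names):

```python
tribe = ['Alice', 'Bob', 'Carol', 'Dave']         # hypothetical 4-person tribe
credit = {name: 1 / len(tribe) for name in tribe}
# {'Alice': 0.25, 'Bob': 0.25, 'Carol': 0.25, 'Dave': 0.25}
```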
The remainder of the results here should be relatively self explanatory. More information is listed on the True Dorks site. Thanks, True Dorks!
Additional tables from this source are the `immunity_challenge`, `reward_challenge` and `vote` tables.
```python
pd.read_sql('SELECT * FROM survivor.immunity_challenge LIMIT 100', con=eng)
pd.read_sql('SELECT * FROM survivor.reward_challenge LIMIT 100', con=eng)
```
The two tables above are pretty self explanatory -- they record each contestant's result for each particular challenge in an episode.
```python
pd.read_sql('SELECT * FROM survivor.vote LIMIT 100', con=eng)
```
The above table, also from True Dorks, lists each vote in each episode. Note that the `contestant` and `voted_for_id` columns both refer to values in the `contestant_season` table.
Confessionals Data¶
One of the more interesting data sources here: the confessionals data contains the actual words said in confessionals by contestants in a select number of seasons. This text could make for some very interesting NLP work, which I plan to explore in future articles. It was sourced from the Google Drive here, and was scraped based on the names in the Word documents.
In some cases, as you'll notice below, there is nothing in the `content` field. In these cases, it was recorded that a particular `contestant_id` had spoken, but not what they had said. While this is a bit disappointing, the counts per episode or per contestant can still make for an interesting analysis.
```python
pd.read_sql('SELECT * FROM survivor.confessional LIMIT 1', con=eng)
```
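For example, even without the text itself, counting confessionals per contestant per episode is a quick groupby away. A sketch, assuming only the id columns shown above:

```python
confessionals = pd.read_sql(
    'SELECT episode_id, contestant_id FROM survivor.confessional', con=eng)

counts = (confessionals
          .groupby(['episode_id', 'contestant_id'])
          .size()                        # one row per confessional
          .rename('n_confessionals')
          .reset_index())
```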
Survivor Wikia¶
The information scraped from the Survivor wiki is the following:
- Episodes
- Seasons
- Contestants
- Tribes
- Alliances
- Final Words
- Story Quotes
- Voting Confessionals
Hopefully, this rich data collected by the people at that wiki can be put to good use! In the process, I was also able to identify small errors and push edits up to the wiki. Hopefully, I'll be able to find even more!
Seasons, episodes and contestants are the building blocks of what makes Survivor Survivor. When pulling from the wiki I tended toward a "more is better" mentality, so data from summary and story sections was included as well.
```python
pd.read_sql('SELECT * FROM survivor.episode LIMIT 10', con=eng)
pd.read_sql('SELECT * FROM survivor.season LIMIT 10', con=eng)
```
Contestants were broken into two tables, to separate the idea of a contestant in a particular season from the contestant overall. Some recurring players act very differently from season to season, so it may be useful to keep this information separate. Joining the two back together is straightforward, as sketched below.
```python
pd.read_sql('SELECT * FROM survivor.contestant LIMIT 10', con=eng)
pd.read_sql('SELECT * FROM survivor.contestant_season LIMIT 10', con=eng)
```
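Here is the kind of join I mean. Note that I am assuming `contestant_id` is the shared key, based on the id conventions described earlier:

```python
pd.read_sql("""
    SELECT c.*, cs.season_id
    FROM survivor.contestant c
    JOIN survivor.contestant_season cs USING (contestant_id)
    LIMIT 10
""", con=eng)
```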
Something else that may be interesting to look at is tribes and alliances. Who is allied with whom? How strong are those ties? This could be interesting to examine both historically and for future seasons.
```python
pd.read_sql('SELECT * FROM survivor.tribe LIMIT 10', con=eng)
pd.read_sql('SELECT * FROM survivor.alliance LIMIT 10', con=eng)
```
Finally, to complement the confessionals from above, some episodes have a wealth of quotes from contestants in particular situations.
The `final_words` table contains the final words spoken by eliminated contestants at the end of each episode:
```python
pd.read_sql('SELECT * FROM survivor.final_words LIMIT 10', con=eng)
```
Voting Confessionals contain words spoken in the voting area each episode (often censored or redacted for dramatic suspense):
```python
pd.read_sql('SELECT * FROM survivor.voting_confessional LIMIT 10', con=eng)
```
Finally, story quotes contain quotes from the entire episode. There may be overlap with the confessionals section, or the quotes may come from within the tribes (not considered confessionals). Additionally, when no contestant is attached, a quote may represent a voiceover or something spoken by someone outside of the main cast.
```python
pd.read_sql('SELECT * FROM survivor.story_quotes LIMIT 10', con=eng)
```
NOTE: You may notice that there is a wealth of information here from the wiki itself. While some of it hasn't been cleaned entirely, the text fields have so far been left mostly untouched. What has been cleaned are the numeric and datetime columns, and the relational foreign keys. Until a good use is found for some of the text data, it doesn't make sense to clean it yet; I can see there being a lot of information in there that I don't want to slice away just yet.
Pushshift.io¶
Pushshift.io is a public API for requesting archived Reddit data. The nice thing is that, unlike Reddit's own API, it does not sharply limit how much of the history you can access, nor does it cap total requests. This data is updated every day, and contains all of the posts since the inception of the r/survivor subreddit in 2011. r/survivor is actually a thriving community, and a rich data source for engagement with, as well as predictions about, the show. We have two tables here:
- Submissions contains data on the submissions (topics) posted to the subreddit.
- Comments contains data on the comments in the subreddit.
```python
pd.read_sql('SELECT * FROM survivor.reddit_submissions LIMIT 10', con=eng)
pd.read_sql('SELECT * FROM survivor.reddit_comments LIMIT 10', con=eng)
```
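For reference, a raw request against Pushshift's public endpoint looks roughly like this; a production scraper like mine would also have to paginate (e.g. with Pushshift's `before` parameter):

```python
import requests

resp = requests.get(
    'https://api.pushshift.io/reddit/search/submission/',
    params={'subreddit': 'survivor', 'size': 100, 'sort': 'desc'},
)
submissions = resp.json()['data']   # a list of dicts, one per submission
```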
There is a wealth of information that Pushshift.io provides here. In terms of connecting to the other tables, we have `most_recent_season`, `within_season` and `most_recent_episode` columns, which relate the date of each post (or comment) to the most recent season/episode in our tables. This will be the subject of our first analysis!
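If you wanted to derive a column like `most_recent_episode` yourself, `pandas.merge_asof` is one way to do it. This sketch assumes the post timestamps and episode air dates are available as comparable datetime columns; the column names here are guesses, not my actual schema:

```python
posts = pd.read_sql('SELECT * FROM survivor.reddit_submissions', con=eng)
episodes = pd.read_sql('SELECT episode_id, airdate FROM survivor.episode', con=eng)

# merge_asof requires both frames to be sorted on their join keys
posts = posts.sort_values('post_date')
episodes = episodes.sort_values('airdate')

# attach the most recent episode that aired on or before each post
joined = pd.merge_asof(posts, episodes,
                       left_on='post_date', right_on='airdate',
                       direction='backward')
```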
Caunce Character Types¶
Last, and certainly not least, are the Caunce Character Types. These are character types assigned by Angie Caunce of the popular podcast Rob Has a Podcast. I actually don't know too much about them, but they are talked about quite a bit on the subreddit and other forums. Each contestant is linked to the character type to which Angie has assigned them. These have ids, like the other tables, and look like this:
```python
pd.read_sql('SELECT * FROM survivor.role LIMIT 10', con=eng)
```
Okay, cool. So What?¶
I totally understand you asking yourself this question, if you've made it this far. So what? Who cares about all of this data?
Well, I wanted to make a post about this because, for the next few months (and possibly beyond...), I will be posting a weekly analysis of Survivor based on these tables. And since they can certainly be confusing, I figured the best place to start is where all analysis starts -- data collection and cleaning. And while the next few analyses may be more fun to dig into, they will be far less time consuming, and ultimately far less important, than this step.
One of the coolest parts about all of this is that the generation of these tables is now in production in my little homeserver environment. My Raspberry Pis will be put to great use scraping and otherwise collecting data from all of these sources every day. Even better, I don't even have to think about it anymore!
As a data scientist, this is a beautiful situation -- I am able to be entirely self-sufficient in doing all of the cool data science stuff that I want! :)
Also, I'm a huge nerd, and if you are too, I hope seeing all of this data makes you drool! If you would like me to share this data with you, please feel free to reach out!