Analysis of a Podcast: The Joe Rogan Experience

Michael Dollar

Posted on Jul 14, 2019

The Joe Rogan Experience has appeared frequently in podcast rankings over the last few years. It has earned numerous awards and mentions, both for its overall popularity and as a comedy podcast. While The Joe Rogan Experience invariably has some element of comedy in every episode, the topics of conversation vary considerably. It is this versatility, in combination with the conversational format of the show, that piques my interest in collecting data about this particular podcast. What are the primary topics discussed on JRE? Is it possible to label JRE with any one of its topics of conversation? How does the popularity of the show vary with the topics discussed during each year of its run so far?

Data Acquisition and Cleaning

The data were scraped from a third-party website using Scrapy, and then were cleaned and organized using a combination of Python and R. The variables collected were episode title, air date, runtime, likes, dislikes, and ratio. The episode title included the name of the guest(s), while the likes and dislikes were obtained from YouTube data by the third party. The ratio is simply the ratio of likes to dislikes. The runtime is the length of the episode in hh:mm:ss format.
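As a minimal sketch of how the runtime and ratio columns could be derived during cleaning, the snippet below converts an hh:mm:ss string to minutes and computes likes divided by dislikes. The function names, the dictionary layout, and the sample values are assumptions made for illustration; they are not taken from the original scraping code.

```python
def runtime_to_minutes(runtime: str) -> float:
    """Convert an 'hh:mm:ss' runtime string to minutes."""
    hours, minutes, seconds = (int(part) for part in runtime.split(":"))
    return hours * 60 + minutes + seconds / 60


def like_ratio(likes: int, dislikes: int) -> float:
    """Ratio of likes to dislikes; infinite when there are no dislikes."""
    return likes / dislikes if dislikes else float("inf")


# Hypothetical record; every value here is invented for illustration.
episode = {
    "title": "#0000 - Example Guest",
    "runtime": "02:45:30",
    "likes": 1200,
    "dislikes": 40,
}
episode["runtime_min"] = runtime_to_minutes(episode["runtime"])
episode["ratio"] = like_ratio(episode["likes"], episode["dislikes"])
```

Storing the runtime as a number of minutes makes later steps, such as a minimum-length filter, a simple numeric comparison.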

Figure: A sample of the data set without tags.

A second scrape yielded a data set with a new column, 'tags.' This column gives the category that the guest falls into.

Figure: A sample of the data set with the tag column.

There were many "Best of..." episodes that were snippets of full-length episodes, and because of their redundancy, they were dropped from the data set. Also dropped were episodes shorter than 55 minutes, since they were mainly snippets whose titles did not start with the easily sorted "Best of." Lastly, any episode that didn't explicitly name the guest in the title was dropped for simplicity, because such episodes are a small minority of the data set.
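The first two cleaning rules can be sketched as a small filter. This is an illustration under assumptions, not the original cleaning script: the function name, the `runtime_min` field, and the sample rows are hypothetical. The third rule (dropping titles that do not name a guest) is left out because the post does not say how such titles were detected.

```python
MIN_RUNTIME_MIN = 55  # threshold stated in the text


def is_full_episode(title: str, runtime_min: float) -> bool:
    """Return True for rows that survive the first two cleaning rules."""
    # Rule 1: "Best of" compilations duplicate full-length episodes.
    if title.strip().lower().startswith("best of"):
        return False
    # Rule 2: anything shorter than 55 minutes is treated as a snippet.
    return runtime_min >= MIN_RUNTIME_MIN


# Hypothetical rows; titles and runtimes are invented for illustration.
episodes = [
    {"title": "Best of the Week - Example", "runtime_min": 62.0},
    {"title": "#0001 - Example Guest", "runtime_min": 171.5},
    {"title": "#0002 - Another Guest", "runtime_min": 12.0},
]
kept = [e for e in episodes if is_full_episode(e["title"], e["runtime_min"])]
```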

Results

An initial look at the time series for number of views, likes, and dislikes shows very tall spikes. I believe these spikes reflect viewers who tuned in specifically to see a certain guest. The guests that correspond to those spikes will be addressed later, but before I begin further analysis, I will remove outliers.

After removal of data points more than three standard deviations from the mean, the time series looks a little more stable.
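A three-standard-deviation cut of this kind can be sketched as follows. The post does not give the exact rule, so this version assumes the distance is measured from the mean of each variable, is applied symmetrically, and uses the sample standard deviation; `drop_outliers` is a hypothetical helper name.

```python
from statistics import mean, stdev


def drop_outliers(values, n_sigma=3.0):
    """Keep values within n_sigma sample standard deviations of the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return list(values)
    return [v for v in values if abs(v - mu) <= n_sigma * sigma]


# Invented numbers: twenty ordinary episodes and one very tall spike.
views = [10] * 20 + [1000]
stable_views = drop_outliers(views)
```

Because a single extreme value also inflates the mean and standard deviation it is compared against, a cut like this only catches spikes that are large relative to the size of the sample.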

Views are sparse between 2013 and the middle of 2015 and rise steadily afterward. I think it can be inferred that something significant happened around mid-2015 to start that increase, but I am not sure what it was.

The next few plots compare several variables across the tags that have at least 30 episodes associated with them. The variables are number of views, likes, dislikes, runtime, and ratio.

Figure: All box plots are heavily skewed toward the top.

All categories in this plot start at zero, probably because of the low numbers in the podcast's early years. Because of this, I would like to compare across categories for the last two years only. For this comparison, I eliminated all tags that had fewer than thirty observations.
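The restriction to recent episodes and well-populated tags could look like the sketch below. It assumes each row carries an `air_date` and a `tag`, and it counts observations inside the two-year window; the post does not say whether the thirty-observation threshold was applied before or after the date cut, so that ordering is an assumption, as are the function name and the cutoff date passed in.

```python
from collections import Counter
from datetime import date


def filter_recent_common_tags(episodes, cutoff, min_count=30):
    """Keep episodes aired on or after cutoff whose tag has enough rows."""
    recent = [e for e in episodes if e["air_date"] >= cutoff]
    # Count observations per tag within the recent window only.
    counts = Counter(e["tag"] for e in recent)
    return [e for e in recent if counts[e["tag"]] >= min_count]


# Invented rows: one tag clears the threshold, one does not,
# and older episodes fall outside the window.
episodes = (
    [{"tag": "comedians", "air_date": date(2018, 6, 1)} for _ in range(30)]
    + [{"tag": "authors", "air_date": date(2018, 6, 1)} for _ in range(5)]
    + [{"tag": "comedians", "air_date": date(2014, 6, 1)} for _ in range(40)]
)
recent_episodes = filter_recent_common_tags(episodes, cutoff=date(2017, 7, 14))
```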

Figure: Here, authors have the highest minimum number of views.

Figure: The number of likes varies the most within comedians and miscellaneous, but at least a quarter of the episodes in each category have more than 23,000 likes.

Figure: Athletes-fighters is the category with the tightest spread of dislikes and the lowest median.

Figure: The ratio for authors has the highest variance, while writers seem to have the lowest.

Figure: Episode duration seems to be consistent among categories, but episodes with writers appear to cluster more tightly around 160 to 175 minutes.

The Outliers

From the first graph that included the outliers in the data set, I was curious to know who those guests were. Below are lists of episodes that were outliers for their respective variables. The length of each list represents the number of outliers in that particular variable.

Figure: That large spike in the time series data is Elon Musk.

Figure: Elon Musk also tops the chart for number of likes.

Figure: Jack Dorsey has an overwhelming lead in dislikes for this list.

Figure: The outlier episodes for ratio.

There was only one outlier for the variable of runtime: an episode with Alex Jones that ran approximately 280 minutes.

Comparison by Category: Episode Count and Average Number of Views

The side-by-side plots show that the categories of comedians and athletes-fighters have the highest episode counts, while the categories of politics and business command the most attention. In fact, the category of comedians doesn't even show up in the plot on the right. This led me to create a new table, grouped and indexed by category (the column labeled 'tag'), with a new column for the average number of views per episode. I removed the categories that had fewer than 30 episodes.

Figure: Average number of views per episode, by category.
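A table grouped by tag with an episode count and an average-views column can be sketched in plain Python as shown here. The `views` field, the function name, and the sample rows are assumptions for illustration; the original table was presumably built with a data-frame library, and the real threshold in the post is 30 episodes rather than the small value used in this toy example.

```python
from collections import defaultdict


def summarize_by_tag(episodes, min_count=30):
    """Episode count and average views per tag, highest average first."""
    views_by_tag = defaultdict(list)
    for episode in episodes:
        views_by_tag[episode["tag"]].append(episode["views"])
    summary = {
        tag: {"episodes": len(views), "avg_views": sum(views) / len(views)}
        for tag, views in views_by_tag.items()
        if len(views) >= min_count  # drop sparsely populated tags
    }
    # Sort so the most-viewed categories come first.
    return dict(
        sorted(summary.items(), key=lambda kv: kv[1]["avg_views"], reverse=True)
    )


# Invented rows; the view counts are not real data.
episodes = [
    {"tag": "comedians", "views": 100},
    {"tag": "comedians", "views": 300},
    {"tag": "politics", "views": 900},
    {"tag": "politics", "views": 1100},
    {"tag": "authors", "views": 500},
]
summary = summarize_by_tag(episodes, min_count=2)
```

Dividing total views by episode count puts categories with very different numbers of episodes on the same scale, which is what makes the comparison between episode count and attention meaningful.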

Conclusions

While the variables do not vary much among categories, there is certainly a disconnect between the number of episodes per category and the number of views per category. While the majority of episodes are tagged with the category of comedians, the category of politics gets a substantially higher number of views. This suggests that JRE gets more attention when the topic of conversation or category of guest deviates from comedians and fighters. This is especially true when the deviation is toward politics or business.
