Analysis of a Podcast: The Joe Rogan Experience
The Joe Rogan Experience (JRE) has appeared frequently in podcast rankings over the last few years. It has received numerous awards and mentions, both for its overall popularity and as a comedy podcast. Although every episode of JRE has some element of comedy, the topics of conversation vary considerably. It is this versatility, combined with the show's conversational format, that piques my interest in collecting data about this particular podcast. What are the primary topics discussed on JRE? Is it possible to label JRE with any single one of its topics of conversation? How does the popularity of the show vary with the topics discussed in each year of its run so far?
Data Acquisition and Cleaning
The data were scraped from a third-party website using Scrapy and then cleaned and organized using a combination of Python and R. The variables collected were episode title, air date, runtime, number of views, likes, dislikes, and ratio. The episode title includes the name of the guest(s), while the likes and dislikes were obtained by the third party from YouTube data. The ratio is simply the number of likes divided by the number of dislikes. The runtime is the length of the episode, in hh:mm:ss format.
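A minimal Python sketch of the two derived quantities, assuming each episode is held as a record whose runtime is an 'hh:mm:ss' string and whose likes and dislikes are integers. The function names are hypothetical, and returning infinity for an episode with zero dislikes is my own choice, since it is not stated how that case was handled.

```python
def runtime_to_minutes(runtime: str) -> float:
    """Convert an 'hh:mm:ss' runtime string to minutes."""
    hours, minutes, seconds = (int(part) for part in runtime.split(":"))
    return hours * 60 + minutes + seconds / 60


def like_dislike_ratio(likes: int, dislikes: int) -> float:
    """Likes divided by dislikes; infinite when an episode has no dislikes (assumed)."""
    return likes / dislikes if dislikes else float("inf")


# Illustrative values only, not taken from the scraped data.
print(runtime_to_minutes("02:45:30"))  # 165.5
print(like_dislike_ratio(900, 100))    # 9.0
```

Storing runtime in minutes turns thresholds such as a 55-minute cutoff into a simple numeric comparison.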
A second scrape yielded a data set with an additional column, 'tags', which gives the category the guest falls into.
There were many "Best of..." episodes consisting of snippets from full-length episodes; because they were redundant, they were dropped from the data set. Episodes shorter than 55 minutes were also dropped, since they were mainly snippets whose titles did not start with the easily filtered "Best of." Lastly, any episode whose title did not explicitly name the guest was dropped for simplicity, since such episodes are a minority of the data set.
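The three exclusion rules can be sketched as a single filter. This assumes episodes are dicts with a 'title' and a numeric 'runtime_min' field (runtime already converted to minutes). The guest check is a hypothetical heuristic: it treats the text after the last ' - ' in a title as the guest's name, which may not match how the scraped titles are actually formatted.

```python
from typing import Optional

MIN_RUNTIME_MIN = 55  # shorter episodes were mostly clips not titled "Best of"


def guest_from_title(title: str) -> Optional[str]:
    """Hypothetical heuristic: assume titles look like '#1234 - Guest Name'
    and treat the text after the last ' - ' as the guest."""
    if " - " not in title:
        return None
    guest = title.rsplit(" - ", 1)[1].strip()
    return guest or None


def keep_episode(ep: dict) -> bool:
    """Apply the three exclusion rules in order."""
    if ep["title"].strip().lower().startswith("best of"):
        return False  # redundant compilation of full-length episodes
    if ep["runtime_min"] < MIN_RUNTIME_MIN:
        return False  # short snippet
    return guest_from_title(ep["title"]) is not None  # guest must be named


def clean(episodes: list) -> list:
    """Keep only episodes that pass every exclusion rule."""
    return [ep for ep in episodes if keep_episode(ep)]
```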
An initial look at the time series of views, likes, and dislikes shows very tall spikes. I believe these spikes come from viewers who tuned in specifically to see a particular guest. I will identify the guests behind those spikes later, but before any further analysis I will remove the outliers.
After removing data points more than three standard deviations from the mean, the time series looks somewhat more stable.
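A minimal sketch of the three-standard-deviation rule using only the standard library. It assumes each episode is a dict with numeric 'views', 'likes', and 'dislikes' fields and uses the sample standard deviation. Dropping an episode if it is an outlier in any one of the listed variables is an assumption; the text does not say whether outliers were removed per variable or jointly.

```python
import statistics


def outlier_indices(values, k=3.0):
    """Indices of values lying more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {i for i, value in enumerate(values) if abs(value - mean) > k * sd}


def remove_outliers(episodes, columns=("views", "likes", "dislikes"), k=3.0):
    """Drop every episode that is an outlier in at least one of `columns`.

    Thresholds are computed once on the full data, so removing an outlier
    in one column does not shift the cutoffs for the others.
    """
    flagged = set()
    for column in columns:
        flagged |= outlier_indices([ep[column] for ep in episodes], k)
    return [ep for i, ep in enumerate(episodes) if i not in flagged]
```

Computing the thresholds once keeps this a single pass; re-estimating the mean and standard deviation after each removal and repeating until nothing exceeds the cutoff would remove more episodes.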
Views are scarce between 2013 and the middle of 2015 and increase steadily from then on. I think this suggests that something significant happened around mid-2015 to start that increase, but I am not sure what it was.
The next few plots compare several variables across the tags that have at least 30 episodes associated with them. The variables are number of views, likes, dislikes, runtime, and ratio.
All categories in this plot start at zero, probably because of the low numbers in the podcast's early years. Because of this, I would like to compare the categories over only the last two years. To do so, I restricted the comparison to that period and eliminated all tags with fewer than thirty observations.
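The filtering behind this comparison might look like the sketch below. It assumes each episode has a single 'tag' string and an 'air_date' stored as a datetime.date, measures "the last two years" back from the most recent air date in the data, and counts episodes per tag over the whole data set; counting only within the two-year window would be an equally reasonable reading.

```python
from collections import Counter
from datetime import date, timedelta

MIN_EPISODES_PER_TAG = 30


def frequent_tags(episodes, min_count=MIN_EPISODES_PER_TAG):
    """Tags attached to at least `min_count` episodes."""
    counts = Counter(ep["tag"] for ep in episodes)
    return {tag for tag, n in counts.items() if n >= min_count}


def recent_comparison_set(episodes, years=2, min_count=MIN_EPISODES_PER_TAG):
    """Episodes from the last `years` years whose tag is frequent overall."""
    keep = frequent_tags(episodes, min_count)
    latest = max(ep["air_date"] for ep in episodes)
    cutoff = latest - timedelta(days=365 * years)  # ignores leap days
    return [ep for ep in episodes if ep["air_date"] > cutoff and ep["tag"] in keep]


# Illustrative data only: 30 recent comedian episodes, one old one, one rare tag.
sample = [{"tag": "comedians", "air_date": date(2019, 1, 1) + timedelta(days=i)} for i in range(30)]
sample += [{"tag": "comedians", "air_date": date(2015, 1, 1)}, {"tag": "politics", "air_date": date(2019, 6, 1)}]
print(len(recent_comparison_set(sample)))  # 30
```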
The first graph, which included the outliers, made me curious about who those guests were. Below are lists of the episodes that were outliers for each variable; the length of each list is the number of outliers for that variable.
There was only one outlier for the variable of runtime: an episode with Alex Jones that ran approximately 280 minutes.
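Lists like these could be produced with a sketch such as the following, which applies the same three-standard-deviation rule to each variable separately and collects the titles of the flagged episodes. It assumes episodes are dicts with a 'title' field plus numeric fields, including runtime converted to minutes under the hypothetical name 'runtime_min'.

```python
import statistics


def outlier_titles(episodes, columns, k=3.0):
    """Map each column to the titles of its outliers (more than k SDs from the mean)."""
    result = {}
    for column in columns:
        values = [ep[column] for ep in episodes]
        mean, sd = statistics.mean(values), statistics.stdev(values)
        result[column] = [
            ep["title"] for ep in episodes if abs(ep[column] - mean) > k * sd
        ]
    return result


# Illustrative data only: twenty typical episodes and one very long one.
sample = [{"title": f"Episode {i}", "runtime_min": 150} for i in range(20)]
sample.append({"title": "Long episode", "runtime_min": 280})
lists = outlier_titles(sample, ["runtime_min"])
print({column: len(titles) for column, titles in lists.items()})  # {'runtime_min': 1}
```

Taking len() of each list gives the outlier count per variable, which is how the lists above are meant to be read.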
Comparison by Category: Episode Count and Average Number of Views
From the side-by-side plots above, you can see that the comedians and athletes-fighters categories have the highest episode counts, while the politics and business categories command the most attention. In fact, the comedians category doesn't even show up in the plot on the right. This led me to create a new table, grouped and indexed by category (the column labeled 'tag'), with a new column for the average number of views per episode. I removed the categories that had fewer than 30 episodes.
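A minimal sketch of that table using only the standard library, assuming each episode is a dict with a single 'tag' and a numeric 'views' field. It illustrates the computation; it is not necessarily how the table was actually built.

```python
from collections import defaultdict


def views_by_tag(episodes, min_count=30):
    """Return (tag, episode_count, mean_views) rows, highest mean views first."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for ep in episodes:
        totals[ep["tag"]] += ep["views"]
        counts[ep["tag"]] += 1
    rows = [
        (tag, counts[tag], totals[tag] / counts[tag])
        for tag in counts
        if counts[tag] >= min_count  # drop categories with fewer than 30 episodes
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Illustrative data only, not the scraped view counts.
sample = (
    [{"tag": "comedians", "views": 100}] * 40
    + [{"tag": "politics", "views": 500}] * 30
    + [{"tag": "other", "views": 900}] * 5
)
for tag, count, mean_views in views_by_tag(sample):
    print(tag, count, mean_views)
```

Sorting by mean views rather than by episode count puts the most-watched categories first, regardless of how many episodes they contain.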
Although the variables do not vary much among categories, there is a clear disconnect between the number of episodes in a category and the number of views it receives. The majority of episodes are tagged with the comedians category, yet the politics category gets substantially more views. This suggests that JRE gets more attention when the topic of conversation or the category of guest moves away from comedians and fighters, especially toward politics or business.