Analysis of a Podcast: The Joe Rogan Experience

Posted on Jul 14, 2019

The Joe Rogan Experience has appeared frequently in podcast ranking lists over the last few years.  It has earned numerous awards and mentions, both for its overall popularity and as a comedy podcast.  While The Joe Rogan Experience invariably has some element of comedy in every episode, the topics of conversation vary considerably.  It is this versatility, in combination with the conversational format of the show, that piques my interest in collecting data about this particular podcast.  What are the primary topics discussed on JRE?  Is it possible to label JRE with any one of its topics of conversation?  How does the popularity of the show vary with the topics discussed during each year of its run so far?

 

Data Acquisition and Cleaning

The data were scraped from a third-party website using Scrapy, and then cleaned and organized using a combination of Python and R.  The variables collected were episode title, air date, runtime, likes, dislikes, and ratio.  The episode title included the name of the guest(s), while the likes and dislikes were obtained from YouTube data by the third party.  The ratio is simply the ratio of likes to dislikes.  The runtime is the length of the episode in the format hh:mm:ss.
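As a minimal sketch of the cleaning step, the two helpers below convert an hh:mm:ss runtime to minutes and compute the like-to-dislike ratio.  The function names are hypothetical, and returning None when an episode has no dislikes is my own assumption; this is not the code used for the post.

```python
from typing import Optional


def runtime_to_minutes(runtime: str) -> float:
    """Convert a runtime string in hh:mm:ss format to minutes."""
    hours, minutes, seconds = (int(part) for part in runtime.split(":"))
    return hours * 60 + minutes + seconds / 60


def like_ratio(likes: int, dislikes: int) -> Optional[float]:
    """Return likes divided by dislikes.

    Assumption: an episode with zero dislikes has no defined ratio,
    so None is returned instead of raising ZeroDivisionError.
    """
    if dislikes == 0:
        return None
    return likes / dislikes
```

Converting the runtime to a single number makes it easy to filter and plot episodes by length.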

A sample of the data set without tags.

A second scrape yielded a data set with a new column, 'tags.'  This column gives the category that the guest falls into.

Sample of the data set with 'tag' column.

There were many "Best of..." episodes that were snippets of full-length episodes, and because of their redundancy, they were dropped from the data set.  Also dropped were episodes shorter than 55 minutes, since they were mainly snippets whose titles did not start with the easily sorted "Best of."  Lastly, any episode that didn't explicitly name the guest in the title was dropped for simplicity, because such episodes are a minority in the data set.
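The three drop rules above can be sketched as a single predicate.  This is an illustration only: the function name and the `guest_named` flag are hypothetical, and how the original cleaning decided whether a title names a guest is not shown here.

```python
MIN_RUNTIME_MINUTES = 55  # threshold stated in the text


def keep_episode(title: str, runtime_minutes: float, guest_named: bool) -> bool:
    """Return True if an episode survives the three drop rules.

    guest_named is a hypothetical flag, assumed to be computed elsewhere,
    that says whether the title explicitly names the guest.
    """
    if title.strip().lower().startswith("best of"):
        return False  # redundant snippets of full-length episodes
    if runtime_minutes < MIN_RUNTIME_MINUTES:
        return False  # short clips that lack the "Best of" prefix
    return guest_named  # drop episodes whose title does not name a guest
```

Applying the predicate row by row leaves only full-length episodes with an identifiable guest.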

Results

The initial look at the time series data for number of views, likes, and dislikes shows very tall spikes.  I believe these spikes reflect viewers who tuned in specifically to see a certain guest.  The guests that correspond to those spikes will be addressed later, but before I begin further analysis, I will remove outliers.

After removal of data points more than three standard deviations from the mean, the time series looks a little more stable.
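A minimal sketch of a three-standard-deviation filter follows.  I am assuming here that distance is measured from the mean in either direction and that the sample standard deviation is used; the function name is hypothetical.

```python
from statistics import mean, stdev


def split_outliers(values, n_sigma=3.0):
    """Split values into (kept, outliers) using an n-sigma rule.

    Assumptions: distance is measured from the mean in both directions,
    and stdev is the sample standard deviation.
    """
    mu = mean(values)
    sigma = stdev(values)
    kept, outliers = [], []
    for value in values:
        if abs(value - mu) > n_sigma * sigma:
            outliers.append(value)
        else:
            kept.append(value)
    return kept, outliers
```

Returning the outliers as well as the kept values makes it possible to look up later which episodes they belong to.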

Views are scarce between 2013 and the middle of 2015 and increase steadily from then on.  I think it can be inferred that something significant happened around that time to start the increase, but I am not sure what it was.

The next few plots compare several variables across the tags that have at least 30 episodes associated with them.  The variables are number of views, likes, dislikes, runtime, and ratio.

All box plots are heavily skewed to the top.

All categories in this plot start at zero, probably because of the low numbers found in the early days of the podcast.  Because of this, I would like to compare across categories for the last two years only.  To accomplish this, I eliminated from the comparison all tags that had fewer than thirty observations.
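The restriction to recent episodes and well-populated tags might look like the sketch below.  The function name and the (air_date, tag) tuple layout are hypothetical, and I am assuming the thirty-episode threshold is applied after the date cutoff rather than to the full data set.

```python
from collections import Counter
from datetime import date


def recent_common_tags(episodes, cutoff, min_episodes=30):
    """Keep episodes aired on or after cutoff whose tag is common enough.

    episodes: iterable of (air_date, tag) pairs (hypothetical layout).
    Assumption: tag counts are taken within the recent window only.
    """
    recent = [(aired, tag) for aired, tag in episodes if aired >= cutoff]
    counts = Counter(tag for _, tag in recent)
    return [(aired, tag) for aired, tag in recent if counts[tag] >= min_episodes]
```

For a two-year window ending at the posting date, the cutoff would be something like `date(2017, 7, 14)`.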

Here, the authors category has the highest minimum number of views.
The number of likes varies the most within comedians and miscellaneous, but at least a quarter of the episodes in each category have more than 23,000 likes.
Athletes-fighters is the category with the tightest spread of dislikes and the lowest median.
The ratio for authors has the highest variance, while writers seems to have the lowest.
The duration of episodes seems to be consistent among categories, but for writers the runtimes are more sharply concentrated around 160-175 minutes.

The Outliers

From the first graph that included the outliers in the data set, I was curious to know who those guests were.  Below are lists of episodes that were outliers for their respective variables.  The length of each list represents the number of outliers in that particular variable.

That large spike in the time series data is Elon Musk.
Elon Musk also tops the chart with number of likes.
Jack Dorsey has an overwhelming lead in dislikes for this list.

There was only one outlier for the variable of runtime:  an episode with Alex Jones that ran approximately 280 minutes. 

Comparison by Category:  Episode Count and Average Number of Views

From the side-by-side plots above, you can see that the categories of comedians and athletes-fighters have the highest episode counts, while the categories of politics and business command the most attention.  In fact, the category of comedians doesn't even show up in the plot on the right.  This led me to create a new table, grouped and indexed by category (the column labeled 'tag'), and to add a new column for the average number of views per episode.  I removed the categories that had fewer than 30 episodes.
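The grouped table described above can be sketched in plain Python as follows.  The function name, the (tag, views) row layout, and the output keys are all hypothetical; the sketch only illustrates the grouping, not the exact table built for this post.

```python
from collections import defaultdict


def summarize_by_tag(rows, min_episodes=30):
    """Group (tag, views) rows by tag and report count and mean views.

    Tags with fewer than min_episodes episodes are left out.
    """
    views_by_tag = defaultdict(list)
    for tag, views in rows:
        views_by_tag[tag].append(views)
    return {
        tag: {"episodes": len(views), "avg_views": sum(views) / len(views)}
        for tag, views in views_by_tag.items()
        if len(views) >= min_episodes
    }
```

Dividing total views by episode count puts a category with many episodes on the same footing as a category with few.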

 

Conclusions

While the variables do not vary much across categories, there is certainly a disconnect between the number of episodes per category and the number of views per category.  While the majority of episodes are tagged with the category of comedians, the category of politics gets substantially higher numbers of views.  This suggests that JRE gets more attention when the topic of conversation or category of guest deviates from comedians and fighters.  This is especially true when the deviation is toward politics or business.
