Data Analysis on Streaming Platforms
The skills I demonstrate here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Photo by Ivan Marc on Shutterstock
GitHub Repository | LinkedIn
Motivation
As a dedicated Netflix user for the past several years, I often find myself scrolling endlessly to find a TV series to watch. At some point, I realized that it might be time to move to a different streaming platform because nothing seemed to catch my attention.
However, canceling my Netflix subscription and subscribing to another service is never an easy decision. This is largely because there are over 200 streaming services out there, including Netflix, HBO Max, and Hulu, and it is hard to browse their content catalogs without a subscription or without turning to secondary websites such as whats-on-netflix.com.
With that in mind, I wanted to compare the content of five major streaming services (Amazon Prime, Disney Plus, HBO Max, Hulu, and Netflix) to help readers and myself understand which streaming service is the best fit. This project should be especially informative for those on a tight budget who are looking for a single streaming service to subscribe to.
Data Collection
The dataset for this project was collected by scraping the entire catalog (over 27,000 movies and TV shows in total) from reelgood.com, a San Francisco-based streaming hub where you can connect your subscriptions to track, search, or stream TV shows and movies. As seen above, I took advantage of its vast content catalog spanning various streaming services. The Scrapy spider I created crawled through the content listings and extracted the fields I specified, such as the title, content type, year, IMDb score, and genre.
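To make the collected fields concrete, here is a minimal sketch of how one scraped row might be turned into a typed record before analysis. The `Title` class, the `parse_row` helper, the dictionary keys, and the sample row are all hypothetical; the actual spider output may be shaped differently.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Title:
    """One movie or TV show (hypothetical schema, not the author's)."""
    service: str
    name: str
    content_type: str            # e.g. "movie" or "tv"
    year: Optional[int]
    imdb_score: Optional[float]
    genres: Tuple[str, ...]


def _to_number(text, cast):
    """Return cast(text), or None when the field is blank or not numeric."""
    try:
        return cast(text.strip())
    except ValueError:
        return None


def parse_row(raw: dict) -> Title:
    """Convert one scraped row of strings into a typed record."""
    genres = tuple(g.strip() for g in raw.get("genres", "").split(",") if g.strip())
    return Title(
        service=raw["service"].strip(),
        name=raw["title"].strip(),
        content_type=raw["content_type"].strip().lower(),
        year=_to_number(raw.get("year", ""), int),
        imdb_score=_to_number(raw.get("imdb_score", ""), float),
        genres=genres,
    )


# Placeholder row, not real catalog data.
sample = {
    "service": "Netflix",
    "title": " Example Show ",
    "content_type": "TV",
    "year": "2019",
    "imdb_score": "8.1",
    "genres": "Crime, Drama",
}
record = parse_row(sample)
```

Keeping missing years and scores as `None` rather than zero means they can be excluded explicitly when computing score distributions later.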
Questions of Interest
With the data I collected from reelgood.com, I set out to answer the following questions, which may help readers understand which streaming service is the best fit for them.
- Which service has the most content?
- How does the distribution of the IMDb scores look for each service?
- How many high-quality movies and TV shows are there for each service?
- Is there a trend in the count of high-quality content?
- How does the genre breakdown look for high-quality content?
- How much content do you get for one dollar spent on a subscription?
Baseline Data Information: Membership Cost
Let's first look at how much these five streaming services cost. It seems that HBO Max is the most expensive service and Disney Plus is the cheapest. While membership costs vary widely, price alone rarely decides which service someone subscribes to, so more factors need to be taken into account. I will return to membership costs later when I look at cost efficiency. So let us begin with the first question!
Q. Which Service Has The Most Content?
According to the plot above, no other service comes close to Amazon Prime when it comes to movies. Amazon Prime has the most content overall, followed by Netflix and HBO Max. Yet, when it comes to TV shows alone, Amazon Prime, Netflix, and Hulu have similar counts. It is also noteworthy that Hulu, unlike the other platforms, has more TV shows than movies, which hints that Hulu focuses more on TV shows than on movies.
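The counts behind a comparison like this can be sketched in a few lines. The rows below are placeholders rather than the scraped catalog, and the variable names are illustrative only.

```python
from collections import Counter

# Placeholder (service, content_type) pairs. Not real catalog data.
rows = [
    ("Amazon Prime", "movie"),
    ("Amazon Prime", "movie"),
    ("Amazon Prime", "tv"),
    ("Netflix", "movie"),
    ("Netflix", "tv"),
    ("Hulu", "tv"),
    ("Hulu", "tv"),
    ("Hulu", "movie"),
]

# Number of titles for each (service, content_type) combination.
counts = Counter(rows)

# Services that carry more TV shows than movies.
services = sorted({service for service, _ in rows})
tv_heavy = [s for s in services if counts[(s, "tv")] > counts[(s, "movie")]]
```

With this toy data, `tv_heavy` contains only Hulu, mirroring the kind of observation made above.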
Overall, Amazon Prime seems to be stealing the spotlight. Does this mean that you should subscribe to Amazon Prime simply because it has the most content? One might argue that the quality of the content matters more than its volume. So, let's now look at the distribution of IMDb scores for each service.
Q. How Does The IMDb Score Distribution Look?
Before discussing the ratings for each service, I want to point out the difference in average ratings between TV shows and movies. Interestingly, the average rating of TV shows is higher than that of movies. While I have no way to confirm why, my hypothesis is that TV shows tend to have dedicated fans who are committed to watching the series across seasons. Movies, on the other hand, tend to be a one-time experience and may receive lower scores as a result.
Beyond the difference between the average ratings of TV shows and movies, we can see that HBO Max tends to have more highly rated content than the other services for both TV shows and movies. This already shows that the volume of content is not all that matters. Moreover, there is a great deal of poorly rated content, as we can see from all the outliers at the bottom. Therefore, from this point on I want to focus on and compare high-quality content.
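For readers who want to reproduce the idea of "outliers at the bottom", here is a minimal sketch. It assumes the plot is a box plot using the common convention that points more than 1.5 times the interquartile range below the first quartile count as outliers; the post does not state which rule was used, and the scores are placeholders.

```python
from statistics import quantiles


def low_outliers(scores):
    """Return scores below Q1 - 1.5 * IQR (assumed box-plot convention)."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    lower_fence = q1 - 1.5 * (q3 - q1)
    return [s for s in scores if s < lower_fence]


# Placeholder IMDb scores for one service. Not real data.
scores = [2.0, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5]
flagged = low_outliers(scores)
```

Here the quartiles are 6.25 and 7.75, so the lower fence sits at 4.0 and only the score of 2.0 is flagged.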
Q. How Much High-Quality Content Is There?
To count quality and high-quality content, I used the 50th and 75th percentiles of the IMDb score distribution as the respective thresholds. Please note that the thresholds used for TV shows and movies are different.
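The thresholding step can be sketched as follows. This is an illustration under assumptions: the scores are placeholders, the helper names are hypothetical, and treating a score at or above a cut-off as belonging to that tier is my reading rather than something the post specifies.

```python
from statistics import quantiles


def thresholds(scores):
    """Return (quality, high_quality) cut-offs: the 50th and 75th percentiles."""
    _, median, q3 = quantiles(scores, n=4, method="inclusive")
    return median, q3


def label(score, cutoffs):
    """Assign a tier, assuming 'at or above the cut-off' qualifies."""
    quality, high_quality = cutoffs
    if score >= high_quality:
        return "high-quality"
    if score >= quality:
        return "quality"
    return "other"


# Placeholder IMDb scores, kept separate per content type.
scores_by_type = {
    "movie": [4.0, 5.0, 6.0, 7.0, 8.0],
    "tv": [6.0, 7.0, 7.5, 8.0, 9.0],
}
cutoffs = {kind: thresholds(s) for kind, s in scores_by_type.items()}

movie_tier = label(7.0, cutoffs["movie"])
tv_tier = label(7.0, cutoffs["tv"])
```

With these toy numbers, a score of 7.0 is high-quality for a movie but falls below the quality cut-off for a TV show, which is why the two content types need separate thresholds.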
What we can take away from the two plots above is that, for quality and high-quality movies, Amazon Prime is still the leading platform. However, Netflix takes first place for quality and high-quality TV shows. So, depending on whether you prefer movies or TV shows, you might want to look into different platforms. Another interesting fact is that, percentage-wise, HBO Max is a safer choice for those who do not check ratings, because the majority of its content seems to be quality or high-quality.
However, what if these good movies and TV shows are all old, and you don't want to watch classics? Let's examine whether there is any trend in high-quality content. From now on, I will only look at high-quality content, as those movies and TV shows are likely the ones the audience is interested in.
Q. Is There A Trend In The Count Of HQ Content?
It seems that the general public mostly enjoys content made in the 2000s. We can also easily notice an increasing trend in the count of high-quality Netflix content, which may be the result of Netflix's focus on creating its own original content.
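A trend like this comes from counting high-quality titles per release year for each service. The sketch below uses placeholder pairs and hypothetical names; it only illustrates the grouping, not the actual figures.

```python
from collections import Counter, defaultdict

# Placeholder (service, release_year) pairs for titles already
# labeled high-quality. Not real catalog data.
hq_titles = [
    ("Netflix", 2017),
    ("Netflix", 2018), ("Netflix", 2018),
    ("Netflix", 2019), ("Netflix", 2019), ("Netflix", 2019),
    ("Hulu", 2018),
]

# yearly[service][year] -> number of high-quality titles.
yearly = defaultdict(Counter)
for service, year in hq_titles:
    yearly[service][year] += 1


def series(service, start, end):
    """Counts per year from start to end inclusive; missing years count as 0."""
    return [yearly[service][y] for y in range(start, end + 1)]


netflix_trend = series("Netflix", 2017, 2019)
```

Plotting such a series per service against the year gives the kind of line chart on which a rising trend becomes visible.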
Now that we have an understanding of the general trend of high-quality content for each service, I want to break it down by different genres. Since the target audience for this project is those using the streaming services on a daily/weekly basis, I want to focus on the high-quality content made in the 2000s as the audience may be more interested in relatively newer content.
Q. How Does The Genre Breakdown Look For HQ Content Made In The 2000s?
While 27 genres are listed on reelgood.com, the results presented below cover only some of the highlights of the genre breakdown.
- Movie Genres
In general, the genre breakdown of movies shows that Amazon Prime is the leading platform across different genres. In particular, Amazon Prime's presence in the documentary and independent movie genres is unmatched by the other services.
Documentary fans should therefore definitely check out Amazon Prime. Netflix, by comparison, does well with stand-up and talk, animation, and anime movies. We can also notice that, unsurprisingly, Disney Plus has many highly rated animations. Moreover, one of the most interesting observations from the above plot is that HBO Max competes closely with Amazon Prime in science-fiction movies.
- TV Show Genres
In contrast to the genre breakdown of movies, Netflix tends to be the leading platform for high-quality TV shows made in the 2000s in most genres. Netflix is especially strong in crime and family TV shows. However, Amazon Prime is still the top platform for documentary TV shows, just as it is for documentary movies.
Also worth mentioning from the above figure is how competitive Hulu is in the TV show market. For animation and anime TV shows in particular, Hulu is nearly level with Netflix. This is interesting because Hulu was the weakest platform for animation and anime movies. Moreover, Hulu is the best platform for reality TV shows. The genre breakdown again confirms that Hulu focuses on TV shows rather than movies.
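A genre breakdown of this kind can be sketched as below. The titles are placeholders, the names are hypothetical, and reading "made in the 2000s" as "released in 2000 or later" is an assumption. Because a title can carry several genres, it is counted once under each of them.

```python
from collections import Counter

MIN_YEAR = 2000  # assumption: "made in the 2000s" means 2000 or later

# Placeholder high-quality titles. Not real catalog data.
titles = [
    {"service": "Amazon Prime", "year": 2012, "genres": ["Documentary"]},
    {"service": "Amazon Prime", "year": 2015, "genres": ["Documentary", "Crime"]},
    {"service": "Amazon Prime", "year": 1995, "genres": ["Documentary"]},
    {"service": "Netflix", "year": 2018, "genres": ["Crime", "Anime"]},
    {"service": "Netflix", "year": 2020, "genres": ["Crime"]},
    {"service": "Hulu", "year": 2019, "genres": ["Reality"]},
]

# Count each (service, genre) pair among sufficiently recent titles.
genre_counts = Counter(
    (t["service"], genre)
    for t in titles
    if t["year"] >= MIN_YEAR
    for genre in t["genres"]
)


def leader(genre):
    """Service with the most titles in a genre, or None if no title has it."""
    per_service = {s: n for (s, g), n in genre_counts.items() if g == genre}
    return max(per_service, key=per_service.get) if per_service else None
```

Calling `leader("Documentary")` on this toy data returns Amazon Prime, and `leader("Crime")` returns Netflix; ties would need an explicit rule, which this sketch does not define.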
Q. How Much Content Do You Get For One Dollar?
Now that we have thoroughly examined the nature of each streaming service, let's look at their cost efficiency. How many movies or TV shows do you get for each dollar you pay for a subscription? It seems that, if you are into well-rated movies, you should subscribe to Amazon Prime.
You get over 200 movies per dollar on Amazon Prime, while Netflix offers only 70-plus movies per dollar. TV series lovers, on the other hand, should subscribe to Netflix. But they should also pay attention to Hulu and Amazon Prime, because both are close to Netflix for TV shows. Checking the count of shows by genre is therefore recommended to better understand which service is a good fit.
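The per-dollar figure is simply a title count divided by the subscription price. The sketch below uses made-up service names, prices, and counts purely to show the arithmetic; which count goes in the numerator (all titles or only well-rated ones) and which billing period the price covers are choices the reader has to make.

```python
# Placeholder monthly prices in USD and title counts. Not real figures.
monthly_price = {"Service A": 8.00, "Service B": 15.00}
title_count = {"Service A": 1600, "Service B": 1200}

# Titles obtained per dollar of subscription cost.
per_dollar = {
    service: title_count[service] / monthly_price[service]
    for service in monthly_price
}

# Service offering the most titles per dollar.
best_value = max(per_dollar, key=per_dollar.get)
```

In this toy case Service A yields 200 titles per dollar against 80 for Service B, so a cheaper service with a larger catalog wins on this metric regardless of quality or genre fit.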
Summary
In this article, I compared and visualized the content of five major streaming services after scraping over 27,000 movies and TV shows from reelgood.com. The project concluded that Amazon Prime is the leading platform for movies, while Netflix leads for TV shows. However, one might want to explore the genre breakdown further when it comes to TV shows, as several platforms are closely matched in that market.
As next steps, I may compare classic and contemporary content or analyze the content that appears on multiple services. I could also build a dashboard to help users search and filter the content by different criteria. Thank you for reading my blog post, and please stay tuned!