Data Analysis on Streaming Platforms

Posted on Aug 6, 2020
The skills I demonstrate here can be learned in the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Photo by Ivan Marc on Shutterstock



As a dedicated Netflix user for the past several years, I often find myself scrolling endlessly, trying to find a TV series to watch. At some point, I realized that it might be time to move to a different streaming platform because nothing seemed to catch my attention.

However, canceling my Netflix subscription and subscribing to another service is never an easy decision. This is largely because there are over 200 streaming services out there, including Netflix, HBO Max, and Hulu, and it is hard to browse their content catalogs without a subscription or without consulting secondary websites.

That being said, I wanted to compare the content of five major streaming services (Amazon Prime, Disney Plus, HBO Max, Hulu, and Netflix) to help readers and myself understand which streaming service is the best fit. This project should be especially informative for people on a tight budget who are looking for a single streaming service to subscribe to.

Data Collection

The dataset for this project was collected by scraping the full catalog (over 27,000 movies and TV shows in total) of a San Francisco-based streaming hub where you can connect your subscriptions to track, search, or stream TV shows and movies. I took advantage of its vast content catalog, which spans various streaming services. The Scrapy spider I created crawled through the content listings and extracted the fields I specified: title, content type, year, IMDb score, and genre.
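To make the shape of the scraped data concrete, here is a minimal sketch of one record holding the fields listed above. The class name `Title`, the field names, the extra `service` field, and the `parse_row` helper are all assumptions made for illustration; this is not the project's actual Scrapy item or spider.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Title:
    """One scraped movie or TV show (hypothetical schema)."""
    service: str                 # assumed field: which platform lists the title
    title: str
    content_type: str            # e.g. "movie" or "tv"
    year: Optional[int]
    imdb_score: Optional[float]
    genres: List[str]


def parse_row(raw: dict) -> Title:
    """Convert a raw dict of scraped strings into a typed record.

    Missing or non-numeric year/score values become None instead of
    raising, since scraped pages often have gaps.
    """
    def to_number(value, cast):
        try:
            return cast(value)
        except (TypeError, ValueError):
            return None

    # Assumes genres arrive as one comma-separated string.
    genres = [g.strip() for g in (raw.get("genre") or "").split(",") if g.strip()]
    return Title(
        service=raw.get("service", ""),
        title=(raw.get("title") or "").strip(),
        content_type=(raw.get("content_type") or "").strip().lower(),
        year=to_number(raw.get("year"), int),
        imdb_score=to_number(raw.get("imdb_score"), float),
        genres=genres,
    )
```

Keeping missing scores as `None` rather than 0 matters later on, because a placeholder zero would drag down any rating distribution computed from these records.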

Questions of Interest

With the data I collected, I tried to tackle the following questions, which may help readers understand which streaming service is the best fit for them.

  1. Which service has the most content?
  2. How does the distribution of the IMDb scores look for each service?
  3. How many high-quality movies and TV shows are there for each service?
  4. Is there a trend in the count of high-quality content?
  5. How does the genre breakdown look for high-quality content?
  6. How much content do you get for one dollar spent on a subscription?

Baseline Data Information: Membership Cost

Let's first look at how much these five streaming services cost. It seems that HBO Max is the most expensive service and Disney Plus is the cheapest. While membership costs vary widely, more factors need to be taken into account when deciding which platform to subscribe to; it would be rare for someone to choose a service based solely on price. I will return to membership costs later when I look at cost efficiency. So let us begin with the first question!

Q. Which Service Has The Most Content?

According to the plot above, no other streaming service competes with Amazon Prime when it comes to movies. Amazon Prime has the most content overall, followed by Netflix and HBO Max. When it comes to TV shows, however, Amazon Prime, Netflix, and Hulu have similar counts. It is also noteworthy that Hulu, unlike the other platforms, has more TV shows than movies, which hints that Hulu focuses more on TV shows than on movies.
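A comparison like this comes down to counting titles per service and content type. The sketch below shows one way to do that with the standard library. The sample rows are made up for illustration and are not the scraped data, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows of (service, content_type); not real data.
rows = [
    ("Amazon Prime", "movie"),
    ("Amazon Prime", "movie"),
    ("Amazon Prime", "tv"),
    ("Netflix", "movie"),
    ("Netflix", "tv"),
    ("Hulu", "tv"),
    ("Hulu", "tv"),
    ("Hulu", "movie"),
]


def count_by_service_and_type(rows):
    """Return a Counter keyed by (service, content_type)."""
    return Counter(rows)


def tv_focused_services(counts):
    """List the services that have more TV shows than movies."""
    services = {service for service, _ in counts}
    return sorted(
        s for s in services if counts[(s, "tv")] > counts[(s, "movie")]
    )


counts = count_by_service_and_type(rows)
print(tv_focused_services(counts))
```

With these made-up rows, only Hulu has more TV shows than movies, which mirrors the pattern described above.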

Overall, Amazon Prime seems to be stealing the spotlight. Does this mean that you should subscribe to Amazon Prime simply because it has the most content? One might argue that the quality of the content matters more than its volume. So, now let's look into the distribution of the IMDb scores for each service.

Q. How Does The IMDb Score Distribution Look?

Before discussing the ratings for each service, I want to point out the difference in average ratings between TV shows and movies. Interestingly, the average rating of TV shows is higher than that of movies. While I have no way to confirm why this is the case, my hypothesis is that TV shows tend to have dedicated fans who are committed to watching the series across its seasons. Movies, on the other hand, tend to be a one-time experience and therefore receive lower scores than TV shows.

In addition to the difference between the average ratings of TV shows and movies, we can notice that HBO Max tends to have highly rated content for both TV shows and movies compared to the other services. This already tells us that the volume of content is not all that matters. Moreover, there is a lot of poorly rated content, as we can see from all the outliers at the bottom. Therefore, from this point on, I want to focus on comparing high-quality content.

Q. How Much High-Quality Content Is There?

To count quality and high-quality content, I used the 50th and 75th percentiles of the IMDb score distribution as the respective thresholds. Please note that the thresholds used for TV shows and movies are different.
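As a rough sketch of this thresholding step, the snippet below computes the median and the 75th percentile separately for movies and TV shows, then labels a score against them. The sample scores are invented, and both the "inclusive" quantile method and the `>=` comparisons are assumptions; the exact rules used in the project may differ.

```python
from statistics import quantiles


def quality_thresholds(scores):
    """Return (50th percentile, 75th percentile) of a list of scores."""
    _q25, q50, q75 = quantiles(scores, n=4, method="inclusive")
    return q50, q75


def label(score, thresholds):
    """Label a score as 'high-quality', 'quality', or 'other'."""
    q50, q75 = thresholds
    if score >= q75:
        return "high-quality"
    if score >= q50:
        return "quality"
    return "other"


# Hypothetical IMDb scores, kept separate per content type.
movie_scores = [4.0, 5.0, 6.0, 7.0, 8.0]
tv_scores = [5.0, 6.0, 7.0, 8.0, 9.0]

movie_thresholds = quality_thresholds(movie_scores)  # (6.0, 7.0)
tv_thresholds = quality_thresholds(tv_scores)        # (7.0, 8.0)

print(label(7.0, movie_thresholds))  # high-quality
print(label(7.0, tv_thresholds))     # quality
```

The same score of 7.0 lands in different buckets depending on the content type, which is why separate thresholds are needed when TV shows are rated higher on average.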

The takeaway from the two plots above is that, when it comes to quality and high-quality movies, Amazon Prime is still the leading platform. However, Netflix takes first place for having the most quality and high-quality TV shows. Therefore, depending on whether you prefer movies or TV shows, you might want to look into different platforms. Another interesting fact is that, percentage-wise, HBO Max is a safer choice for those who do not check ratings because the majority of its content seems to be quality or high-quality content.

However, what if these good movies and TV shows are all old, and you don't want to watch classic movies? Let's examine whether there is any trend in high-quality content. From now on, I will only look into high-quality content, as those movies and TV shows are likely the ones the audience is interested in.

Q. Is There A Trend In The Count Of HQ Content?

It seems that the general public mostly enjoys content made in the 2000s. We can also easily notice the increasing trend in the count of high-quality Netflix content, which may be the result of Netflix's focus on creating its own original content.

Now that we have an understanding of the general trend of high-quality content for each service, I want to break it down by different genres. Since the target audience for this project is those using the streaming services on a daily/weekly basis, I want to focus on the high-quality content made in the 2000s as the audience may be more interested in relatively newer content.

Q. How Does The Genre Breakdown Look For HQ Content Made In The 2000s?

While 27 genres are listed, the results presented below cover only some of the highlights of the genre breakdown.

  • Movie Genres

In general, the genre breakdown of movies shows that Amazon Prime is the leading platform for movies across genres. In particular, Amazon Prime's presence in the documentary and independent movie genres is unmatched by the other services.

Therefore, documentary fans should definitely check out Amazon Prime. Netflix, for its part, does well with stand-up talk, animation, and anime movies. We can also notice that, unsurprisingly, Disney Plus has many highly rated animations. Moreover, one of the most interesting observations from the above plot is that HBO Max competes closely with Amazon Prime for science-fiction movies.

  • TV Show Genres

In contrast to the genre breakdown of movies, Netflix tends to be the leading platform for high-quality TV shows made in the 2000s in most genres. Netflix is especially strong in crime and family TV shows. However, Amazon Prime is still the top platform for documentary TV shows, just as it is for documentary movies.

Also worth mentioning from the above figure is how strongly Hulu competes in the TV show market. For animation and anime TV shows in particular, Hulu is nearly level with Netflix. This is interesting because Hulu was the weakest platform for animation and anime movies. Moreover, Hulu is the best platform for reality TV shows. The genre breakdown confirms once again that Hulu focuses on TV shows rather than movies.
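A genre breakdown of this kind can be sketched as a count per (service, genre) pair over titles released from a cutoff year onward. The sample titles below are invented, the function name is hypothetical, and treating "made in the 2000s" as "released in 2000 or later" is an assumption about the cutoff.

```python
from collections import defaultdict


def genre_breakdown(titles, min_year=2000):
    """Count titles per (service, genre) released in or after min_year.

    A title with several genres is counted once under each genre.
    """
    counts = defaultdict(int)
    for item in titles:
        year = item.get("year")
        if year is None or year < min_year:
            continue  # skip undated and older titles
        for genre in item.get("genres", []):
            counts[(item["service"], genre)] += 1
    return dict(counts)


# Hypothetical high-quality titles, for illustration only.
sample = [
    {"service": "Netflix", "year": 2015, "genres": ["Crime", "Drama"]},
    {"service": "Netflix", "year": 1995, "genres": ["Crime"]},
    {"service": "Hulu", "year": 2018, "genres": ["Anime", "Animation"]},
    {"service": "Hulu", "year": None, "genres": ["Reality"]},
]

breakdown = genre_breakdown(sample)
```

Because multi-genre titles are counted under every genre they carry, the per-genre counts can add up to more than the number of titles.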

Q. How Much Content Do You Get For One Dollar?

Now that we have thoroughly examined the nature of each streaming site, let's look into the cost efficiency of the services. How many movies or TV shows do you get for each dollar you pay for a subscription? It seems that, if you are into well-rated movies, you should subscribe to Amazon Prime.

You get over 200 movies per dollar on Amazon Prime, compared with over 70 movies per dollar on Netflix. TV series lovers, on the other hand, should subscribe to Netflix. They should also keep an eye on Hulu and Amazon Prime, though, because both run close to Netflix for TV shows. For that reason, I recommend checking the count of shows by genre to better understand which service is a good fit.
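The cost-efficiency figure is simply a title count divided by the subscription price. Here is a minimal sketch of that arithmetic. The service names, counts, and prices are placeholders chosen for illustration, not the numbers behind the figures quoted above, and the function name is hypothetical.

```python
def titles_per_dollar(title_counts, monthly_price):
    """Divide each service's title count by its monthly price in dollars."""
    result = {}
    for service, count in title_counts.items():
        price = monthly_price[service]
        if price <= 0:
            raise ValueError(f"price for {service} must be positive")
        result[service] = count / price
    return result


# Placeholder numbers for illustration only; not the article's data.
example_counts = {"Service A": 1200, "Service B": 450}
example_prices = {"Service A": 8.0, "Service B": 15.0}

per_dollar = titles_per_dollar(example_counts, example_prices)
print(per_dollar)  # {'Service A': 150.0, 'Service B': 30.0}
```

The same function works for any subset of the catalog, for example only high-quality TV shows, as long as the counts passed in are restricted accordingly.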


In this article, I compared and visualized the content of five major streaming services after scraping over 27,000 movies and TV shows. My project concluded that Amazon Prime is the leading platform for movies, while Netflix leads for TV shows. However, one might want to further explore the genre breakdown for TV shows, as several platforms are closely matched in the TV show market.

As next steps, I may compare classic and contemporary content or analyze the content that appears on multiple services. I could also build a dashboard to help users search and filter the content by different criteria. Thank you for reading my blog post and please stay tuned!

About Author

Ryan Park

Ryan is a military-trained and detail-oriented data science professional who combines well-honed leadership and research skills with data science techniques to create informative translations of real-world data that generate business value. He currently works at YipitData as a...