Netflix Titles Analysis using R Shiny

Posted on Aug 15, 2022

Home entertainment is one of the fastest-growing and most profitable businesses, reaching up to $80 billion annually, and it has evolved from the early days of movie rentals to today's streaming media. Netflix brought streaming into the mainstream and flipped the market, wiping out old-fashioned competition along the way and becoming the new Blockbuster!

The home entertainment market has been growing rapidly, and many companies have jumped into the competition. Even so, Netflix remains one of the dominant players. That is why I decided to analyse the dataset of Movies and TV Shows that Netflix offers to customers all over the world.

The dataset was downloaded from Kaggle through the link: Netflix Dataset. It contains a total of 8,807 rows and 12 columns.

I built a small R Shiny app to preview and analyze the dataset. Shiny is an R package that makes it easy to build interactive web apps straight from the R programming language.
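For readers new to Shiny, here is a minimal sketch of what such an app can look like. It is not the code of the app described in this post: the toy data frame, the input name `type`, and the output name `preview` are all made up for illustration, and it assumes the shiny package is installed.

```r
library(shiny)

# Toy stand-in for the Netflix data (hypothetical rows, not the Kaggle file).
titles <- data.frame(
  type  = c("Movie", "Movie", "TV Show"),
  title = c("Title A", "Title B", "Title C"),
  stringsAsFactors = FALSE
)

# UI: a dropdown to pick the media type and a table to preview matching rows.
ui <- fluidPage(
  titlePanel("Netflix Titles"),
  selectInput("type", "Type", choices = c("Movie", "TV Show")),
  tableOutput("preview")
)

# Server: re-filter the table whenever the dropdown changes.
server <- function(input, output, session) {
  output$preview <- renderTable({
    titles[titles$type == input$type, , drop = FALSE]
  })
}

app <- shinyApp(ui, server)
# In an interactive session, runApp(app) would launch the app in a browser.
```

The UI and server are kept as separate objects, which is the usual Shiny layout and makes it easy to add more inputs and charts later.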

The app can be accessed through the link: Netflix R Shiny App. The app itself explains much of the process and the findings drawn from the dataset; here are some of them.

The Analysis

In this section, I concentrate on the findings about the Netflix dataset that I found most interesting, grouped by different criteria.

By Type :

The dataset is split into two types of media: Movies (6,128 titles) and TV Shows (2,676 titles). Movies outnumber TV Shows by a factor of about 2.3, as we can see in the following chart.
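As a quick check on the split, the ratio and the shares can be computed directly from the two counts quoted above. This is only a sketch in base R; on the real data the counts would presumably come from something like `table()` on the type column.

```r
# Counts quoted in the post.
counts <- c("Movie" = 6128, "TV Show" = 2676)

# How many Movies there are per TV Show.
ratio <- counts[["Movie"]] / counts[["TV Show"]]

# Share of each type, in percent of all titles counted.
shares <- round(100 * counts / sum(counts), 1)

round(ratio, 2)
shares
```

With these counts, Movies make up roughly 70% of the titles and TV Shows roughly 30%.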

By Genre :

There are many genres these days, so I focused on the 20 most popular and the 20 least popular genres in the Netflix media set.


As shown in the charts above, the international categories (International Movies and International TV Shows) are the dominant genres for Movies and TV Shows respectively, followed by Dramas and Comedies. The least represented genres in Movies are LGBTQ, Sports, and Sci-Fi & Fantasy. The least represented genres in TV Shows are Classic and Culture, Stand-Up Comedy, and Sci-Fi & Fantasy.
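A genre count like this needs one extra step, because a title can carry several genres in a single field. The sketch below assumes the genres sit in a comma-separated column named `listed_in` (an assumption about the Kaggle file) and uses made-up rows; it is not the app's actual code.

```r
# Toy rows; `listed_in` and its contents are assumptions for illustration.
titles <- data.frame(
  type = c("Movie", "Movie", "TV Show", "TV Show"),
  listed_in = c("Dramas, International Movies",
                "Comedies, International Movies",
                "International TV Shows, TV Dramas",
                "International TV Shows"),
  stringsAsFactors = FALSE
)

# Split the comma-separated labels, trim spaces, and count each genre.
count_genres <- function(df, media_type) {
  labels <- df$listed_in[df$type == media_type]
  genres <- trimws(unlist(strsplit(labels, ",", fixed = TRUE)))
  sort(table(genres), decreasing = TRUE)
}

movie_genres <- count_genres(titles, "Movie")
head(movie_genres, 20)  # most common genres first
```

Sorting the same table in increasing order would give the least common genres.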

By Time :

It seems that 2016 was the year production took off. The number of titles grew steeply for both types: from 113 to 6,128 for Movies and from 50 to 2,676 for TV Shows in just five years (2016-2021), which is roughly a 54-fold increase in each case (about 5,323% and 5,252%, respectively).
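The growth figures are easy to verify. A percentage increase is usually defined as (new - old) / old * 100; under that definition the counts quoted above give about 5,323% for Movies and 5,252% for TV Shows, while new / old * 100 gives 5,423% and 5,352%. The helper name `pct_increase` below is made up for this sketch.

```r
# Percentage increase from an old value to a new one.
pct_increase <- function(from, to) 100 * (to - from) / from

movies_growth <- pct_increase(113, 6128)
shows_growth  <- pct_increase(50, 2676)

round(movies_growth)  # about 5323
round(shows_growth)   # 5252
```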

By Media Rating :

Most of the content is rated TV-MA (Mature Audience), followed by TV-14, which is intended for ages 14 and up.

By Country :

The United States is the leading country in content production, followed by India and the United Kingdom. On the other hand, Japan, South Korea, Taiwan and Singapore produce more TV Shows than Movies, as demonstrated below:
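One way to find the countries where TV Shows outnumber Movies is a country-by-type cross-tabulation. The sketch below uses made-up rows and assumes a `country` column that may list several countries separated by commas; counting a co-production once per country is my assumption here, not necessarily how the app treats it.

```r
# Toy rows; column names and values are assumptions for illustration.
titles <- data.frame(
  type = c("Movie", "TV Show", "TV Show", "Movie", "Movie", "TV Show"),
  country = c("United States", "South Korea", "South Korea",
              "India", "United States, India", "Japan"),
  stringsAsFactors = FALSE
)

# Expand to one row per (title, country) pair.
countries <- strsplit(titles$country, ",", fixed = TRUE)
long <- data.frame(
  type = rep(titles$type, lengths(countries)),
  country = trimws(unlist(countries)),
  stringsAsFactors = FALSE
)

# Cross-tabulate and keep countries with more TV Shows than Movies.
by_country <- table(long$country, long$type)
tv_heavy <- rownames(by_country)[by_country[, "TV Show"] > by_country[, "Movie"]]
tv_heavy
```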

By Duration :

India has the longest average movie duration (127 min), which makes sense, since Indian movies tend to be long in general. What was interesting is that the next countries on the list were South Korea (111 min) and Taiwan (106 min)!
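An average like this requires turning the duration text into numbers first. The sketch below assumes movie durations are stored as strings such as "120 min" in a `duration` column; the rows are made up and only illustrate the parsing and grouping steps.

```r
# Toy rows (hypothetical); durations are strings like "120 min".
movies <- data.frame(
  country = c("India", "India", "South Korea", "Taiwan"),
  duration = c("120 min", "134 min", "111 min", "106 min"),
  stringsAsFactors = FALSE
)

# Strip the " min" suffix and convert to numbers.
movies$minutes <- as.numeric(sub(" min$", "", movies$duration))

# Mean duration per country, longest first.
avg <- aggregate(minutes ~ country, data = movies, FUN = mean)
avg <- avg[order(-avg$minutes), ]
avg
```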

As a data scientist, I took the initiative to investigate further what is happening with movie duration in South Korea and Taiwan. Since these countries produce more TV Shows than Movies, I decided to check the duration of the TV Shows they produce. As I expected, 90% of the TV Shows produced in these countries end after only one season, which suggests that audiences there may prefer stories that wrap up in a short period.
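The one-season share can be computed in a similar way. The sketch below assumes TV Show durations are stored as strings such as "1 Season" or "2 Seasons"; the rows are made up, so the resulting shares are illustrative and do not reproduce the 90% figure.

```r
# Toy rows (hypothetical); durations are strings like "1 Season".
shows <- data.frame(
  country = c(rep("South Korea", 5), rep("Taiwan", 5)),
  duration = c("1 Season", "1 Season", "1 Season", "2 Seasons", "1 Season",
               "1 Season", "1 Season", "1 Season", "1 Season", "3 Seasons"),
  stringsAsFactors = FALSE
)

# Keep only the number of seasons.
shows$seasons <- as.integer(sub(" Seasons?$", "", shows$duration))

# Share of shows per country that ended after a single season.
one_season_share <- tapply(shows$seasons == 1, shows$country, mean)
one_season_share
```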

Recommendation and future work

The Recommendation :

Regarding East Asian countries, especially South Korea and Taiwan, it might be a good idea to try producing shorter Movies, since production there concentrates mostly on short TV Shows (only one season).

Future Work :

I plan to integrate the current dataset with IMDb or Rotten Tomatoes to add more attributes to the analysis, such as user and critic ratings. I also plan to build a prediction system using machine learning, possibly regression.

About Author

Al Mutasim Bakathir Al Kindi

A data scientist from Oman