Netflix Titles Analysis using R Shiny
The skills the author demonstrates here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Home entertainment is one of the fastest-growing and most profitable businesses of all time, reaching up to $80 billion annually, from the early days of movie rentals to today's streaming media. Netflix brought streaming into the mainstream and flipped the market, sweeping aside old-fashioned competition along the way and becoming the new "buster"!
The home entertainment market has been growing rapidly, and many companies have jumped into the competition. Even so, Netflix is still one of the dominant players in the market. That's why I decided to analyse the dataset of movies and TV shows that Netflix offers to customers all over the world.
The dataset was downloaded from Kaggle through the link: Netflix Dataset. It includes a total of 8,807 rows and 12 columns.
I built a small app to preview and analyze the dataset using R Shiny, an R package that makes it easy to build interactive web apps straight from the R programming language.
The app can be accessed through the link: Netflix R Shiny App. The app itself explains much of the process and the findings produced from the dataset; here are some of them.
In this section, I concentrate on the findings about the Netflix dataset that I found most interesting, grouped by different criteria.
By Type :
The dataset is split into two types of media: Movies (6,128 titles) and TV shows (2,676 titles). Movies outnumber TV shows by a factor of about 2.3, as the accompanying chart shows.
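As a rough illustration of how the split by type can be counted, here is a minimal Python sketch (the app itself is written in R). It assumes each row has a `type` column holding either "Movie" or "TV Show"; the sample rows are made up for the example, and only the two totals come from the text above.

```python
from collections import Counter

# Made-up sample rows; the real dataset is assumed to have one row per title
# with a `type` column.
sample = [
    {"title": "Title A", "type": "Movie"},
    {"title": "Title B", "type": "TV Show"},
    {"title": "Title C", "type": "Movie"},
]

# Count how many titles fall under each type.
type_counts = Counter(row["type"] for row in sample)

# Ratio of Movies to TV shows, using the totals quoted in the text.
movies_total, tv_total = 6128, 2676
ratio = movies_total / tv_total
print(f"Movies per TV show: {ratio:.2f}")
```

With the quoted totals the ratio comes out a little under 2.3, so Movies outnumber TV shows by somewhat more than two to one.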
By Genre :
There are a great many genres these days, so I focused on the 20 most common and the 20 least common genres in the Netflix catalogue.
As shown in the charts above, International Movies and International TV Shows are the dominant genres in their respective categories, followed by Dramas and Comedies. The least represented genres among Movies are LGBTQ, Sports, and Sci-Fi & Fantasy, while the least represented among TV shows are Classic & Cult TV, Stand-Up Comedy, and Sci-Fi & Fantasy.
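A title can belong to several genres at once, so counting genres means splitting the genre field first. The Python sketch below shows one way to do that. It assumes the genres sit in a comma-separated column called `listed_in`; the sample rows and the helper name `genre_counts` are made up for illustration and are not taken from the app.

```python
from collections import Counter

# Made-up sample rows; `listed_in` is assumed to hold comma-separated genres.
sample = [
    {"type": "Movie", "listed_in": "Dramas, International Movies"},
    {"type": "Movie", "listed_in": "Comedies, International Movies"},
    {"type": "TV Show", "listed_in": "International TV Shows, TV Dramas"},
    {"type": "Movie", "listed_in": "Dramas"},
]

def genre_counts(rows, media_type):
    """Count each genre once per title for the given media type."""
    counts = Counter()
    for row in rows:
        if row["type"] == media_type:
            counts.update(g.strip() for g in row["listed_in"].split(","))
    return counts

movie_genres = genre_counts(sample, "Movie")
most_common_20 = movie_genres.most_common(20)          # most frequent first
least_common_20 = movie_genres.most_common()[:-21:-1]  # least frequent first
```

Because a title with two genres is counted under both, the genre totals add up to more than the number of titles.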
By Time :
2016 appears to be the year the catalogue took off. In just five years (2016-2021), the cumulative number of titles grew from 113 to 6,128 for Movies and from 50 to 2,676 for TV shows, increases of roughly 5,323% and 5,252% respectively (about 54-fold in both cases).
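The percentage increase is (new - old) / old × 100, which is easy to confuse with the plain ratio new / old × 100. Here is a small Python check using the counts quoted above; the function name `percent_increase` is just an illustrative helper.

```python
def percent_increase(old, new):
    """Relative growth from old to new, expressed in percent."""
    return (new - old) / old * 100

movies_growth = percent_increase(113, 6128)
tv_growth = percent_increase(50, 2676)

print(f"Movies:   {movies_growth:.0f}%")
print(f"TV shows: {tv_growth:.0f}%")
```

Computed this way, the growth is about 5,323% for Movies and 5,252% for TV shows. The plain ratios (6128 / 113 and 2676 / 50) are about 54.2 and 53.5, which is exactly 100 percentage points higher when expressed in percent.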
By Media Rating :
Most of the content is rated TV-MA (Mature Audience), followed by TV-14, which is intended for viewers aged 14 and over.
By Country :
The United States leads in content production, followed by India and the United Kingdom. Japan, South Korea, Taiwan, and Singapore, on the other hand, produce more TV shows than Movies, as the accompanying chart shows.
By Duration :
India has the longest average movie duration (127 min), which makes sense, since Indian movies tend to be long in general. What was interesting is that the next countries on the list are South Korea (111 min) and then Taiwan (106 min).
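For readers who want to reproduce averages like these, here is a minimal Python sketch. It rests on several assumptions: movie durations are stored as text such as "127 min" in a `duration` column, `country` may list several countries separated by commas, and a co-production counts toward every listed country. The app may handle these cases differently, and the sample rows are made up.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample rows; the column names and formats are assumptions.
sample = [
    {"type": "Movie", "country": "India", "duration": "130 min"},
    {"type": "Movie", "country": "India", "duration": "124 min"},
    {"type": "Movie", "country": "South Korea", "duration": "111 min"},
    {"type": "TV Show", "country": "South Korea", "duration": "1 Season"},
    {"type": "Movie", "country": "United States, India", "duration": "100 min"},
]

def average_movie_minutes(rows):
    """Average movie length in minutes for each country."""
    minutes_by_country = defaultdict(list)
    for row in rows:
        # Skip TV shows and rows with a missing country or duration.
        if row["type"] != "Movie" or not row["country"] or not row["duration"]:
            continue
        minutes = int(row["duration"].split()[0])  # "130 min" -> 130
        for country in row["country"].split(","):
            minutes_by_country[country.strip()].append(minutes)
    return {country: mean(values) for country, values in minutes_by_country.items()}

averages = average_movie_minutes(sample)
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

How co-productions are attributed can shift a country's average, so it is worth stating that choice next to any ranking.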
As a data scientist, I took the initiative to investigate what is happening with movie duration in South Korea and Taiwan. Since these countries produce more TV shows than Movies, I decided to check the duration of their TV shows. As I expected, 90% of the TV shows produced in these countries run for only one season, which suggests that audiences there may prefer stories that wrap up within a short period.
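The one-season share can be checked with a short calculation. The Python sketch below assumes that TV show durations are stored as text such as "1 Season" or "2 Seasons" in the `duration` column; the sample rows and the helper name `one_season_share` are made up for illustration.

```python
# Made-up sample rows; the column names and formats are assumptions.
sample = [
    {"type": "TV Show", "country": "South Korea", "duration": "1 Season"},
    {"type": "TV Show", "country": "South Korea", "duration": "1 Season"},
    {"type": "TV Show", "country": "South Korea", "duration": "2 Seasons"},
    {"type": "TV Show", "country": "South Korea, Taiwan", "duration": "1 Season"},
    {"type": "Movie", "country": "Taiwan", "duration": "106 min"},
]

def one_season_share(rows, country):
    """Fraction of a country's TV shows that have exactly one season."""
    seasons = [
        int(row["duration"].split()[0])  # "2 Seasons" -> 2
        for row in rows
        if row["type"] == "TV Show"
        and country in [c.strip() for c in row["country"].split(",")]
    ]
    if not seasons:
        return None  # no TV shows listed for this country
    return sum(1 for s in seasons if s == 1) / len(seasons)

korea_share = one_season_share(sample, "South Korea")
taiwan_share = one_season_share(sample, "Taiwan")
```

A single listed season can also mean that a show is still running, so this share is only a rough signal of how many stories end after one season.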
Recommendation and future work
The Recommendation :
For Far East countries, especially South Korea and Taiwan, it might be a good idea to try producing shorter Movies, since production there concentrates mostly on TV shows with a short duration (only one season).
Future Work :
I plan to integrate the current dataset with IMDb or Rotten Tomatoes to add more attributes, such as user and critic ratings, to the research and analysis. I also want to try building a prediction system on top of the data using machine learning, possibly regression.