Data Analysis on Netflix Content

Posted on Feb 2, 2020

What's on Netflix?

Link to the online interactive data application.

Netflix has grown remarkably over the last decade, both in the number of subscribers and in revenue. The company has repeatedly exceeded even the most ambitious revenue and subscriber projections.

As of the end of 2019, Netflix is valued at more than $150 billion and has more than 167 million paid subscribers worldwide.

Netflix is available in more than 190 countries, with international subscribers fueling most of its impressive rise over the last several years.

But given all of these impressive 'bottom line' results, what is actually on Netflix? What kind of content do people enjoy so much that it allowed Netflix to almost double its revenue in just 3 years, reaching $20.15 billion in 2019, while the number of paid subscribers kept growing despite consistent increases in the subscription price?

Does Netflix have more TV shows or movies? Is there more mature or kid-friendly content? How 'old' is the content on Netflix? Are there more documentaries or comedies, thrillers or action films, stand-up specials or dramas? Which actors do we see most often on Netflix? Is the average length of a movie gradually increasing as consumers have more and more opportunities to watch movies in the comfort of their homes?

The Data

The research is based mainly on a dataset obtained from kaggle.com, which in turn draws primarily on data collected from flixable.com, a website that lets users browse all the TV shows and movies currently available to stream on Netflix in the United States. The dataset contains information gathered in December 2019 and continues to be updated. For each title it includes the name, type (show or movie), director, cast, countries involved in the production, date of availability on Netflix, original release year, rating (MPAA or TV Parental Guidelines), duration, genre, and a brief description.

After being cleaned, restructured and reformatted, the original dataset was merged with data on the number of subscribers Netflix had at the time each title was added to its streaming library.
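
One way to picture this merge is as an "as-of" lookup: for each title, take the most recent subscriber figure reported on or before the date the title was added. The sketch below is in Python rather than the R used for the app, and it is only an illustration of the idea, not the code behind the project. The subscriber values, the field names and the helper subscribers_at are all made up for the example.

```python
from bisect import bisect_right
from datetime import date

# Illustrative subscriber counts in millions, sorted by date.
# These are made-up values for the example, not real Netflix figures.
SUBSCRIBERS = [
    (date(2019, 3, 31), 100.0),
    (date(2019, 6, 30), 110.0),
    (date(2019, 9, 30), 120.0),
]


def subscribers_at(day, history=SUBSCRIBERS):
    """Return the latest count reported on or before `day`, or None if there is none."""
    dates = [d for d, _ in history]
    idx = bisect_right(dates, day)  # number of reports dated <= day
    return history[idx - 1][1] if idx else None


# Hypothetical title records; the field names are assumptions.
titles = [
    {"title": "Example A", "date_added": date(2019, 7, 15)},
    {"title": "Example B", "date_added": date(2019, 1, 1)},
]

# Attach the subscriber count in effect when each title was added.
merged = [{**t, "subscribers_m": subscribers_at(t["date_added"])} for t in titles]
```

Titles added before the first available subscriber report get None here; a real pipeline would have to decide whether to drop or backfill them.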

Because most non-original titles are removed from Netflix on a regular basis, the share of currently available content that was added five or more years ago is quite small. Therefore, all content added in 2015 or earlier was combined into a single group, to give it more weight and visibility in the analysis that follows.
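
The grouping step can be sketched in a few lines. This is a Python illustration under assumed names (year_bucket, the label text and the sample years are all hypothetical), not the project's own R code.

```python
from collections import Counter


def year_bucket(year_added, cutoff=2015):
    """Collapse every year up to and including `cutoff` into one label."""
    return f"{cutoff} and earlier" if year_added <= cutoff else str(year_added)


# Hypothetical 'year added' values for a handful of titles.
years = [2012, 2015, 2016, 2018, 2019, 2019]
counts = Counter(year_bucket(y) for y in years)
```

With the sample list, the two titles from 2012 and 2015 land in the same "2015 and earlier" bucket, while later years keep their own labels.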

Data in the App

The online interactive app was created using the Shiny package in R. The app consists of six main tabs that visualize various aspects of the analyzed data, plus two additional tabs where the cleaned data can be accessed and a brief description of the research can be viewed.

General

The introduction page gives a brief overview of some general facts about Netflix, along with graphs illustrating the 'title addition dynamics', that is, the periods during which the current titles arrived on Netflix. These graphs show the structure of content by year added and by month and year added, as well as the average number of titles added per calendar month across all years combined.

The most interesting finding about when titles were added is that 70% of all the content on Netflix was added in the last two years, and 90% in the last three.
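
A share like this is simply the fraction of titles whose 'year added' falls on or after a given year. Here is a minimal Python sketch of that calculation; the function name and the sample years are assumptions for illustration and do not reproduce the real dataset.

```python
def share_added_since(years_added, first_year):
    """Fraction of titles whose year added is first_year or later."""
    if not years_added:
        return 0.0
    recent = sum(1 for y in years_added if y >= first_year)
    return recent / len(years_added)


# Hypothetical 'year added' values, one per title.
years_added = [2015, 2016, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2019]

last_two_years = share_added_since(years_added, 2018)    # 6 of 10 titles
last_three_years = share_added_since(years_added, 2017)  # 8 of 10 titles
```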

Subscribers

The next tab gives an overview of the number of Netflix subscribers, the growth of the subscriber base, and its correlation with the number of titles added. The graph at the bottom of the page shows how the average age of the content added to Netflix varied for each month of the last 5 years.
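
The post does not spell out how the 'age' of a title is measured. A natural reading, and the assumption made in this Python sketch, is the year a title was added minus its original release year, averaged per month of addition. The field names and sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_age_by_month(titles):
    """Average (year added - release year) for each (year, month) of addition."""
    ages = defaultdict(list)
    for t in titles:
        key = (t["year_added"], t["month_added"])
        ages[key].append(t["year_added"] - t["release_year"])
    # Sort the keys so the result reads chronologically.
    return {key: mean(values) for key, values in sorted(ages.items())}


# Hypothetical records for three titles.
titles = [
    {"year_added": 2019, "month_added": 11, "release_year": 2009},
    {"year_added": 2019, "month_added": 11, "release_year": 2019},
    {"year_added": 2019, "month_added": 12, "release_year": 2017},
]
age_by_month = average_age_by_month(titles)
```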

Unsurprisingly, the number of subscribers has been growing steadily, passing 167 million paid subscribers in 2019.

Moreover, the average age of the added content has not shown any significant pattern over the last few years, apart from occasional increases in the age of movies and TV shows added during the fall and winter holiday seasons.

Map

The next tab hosts an interactive map, which shows the number of titles each country has been involved in. Users can select specific continents for a closer look and hover over a country to see its exact number of titles.

As most users would expect, the country with the most titles is the United States, which has been involved in 36.4% of all the titles available on Netflix. The second most-involved country is India, which surpassed both the UK and Canada in the number of titles available to stream on Netflix. This result is less surprising given India's massive population and booming film industry.
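
Because a title can involve several countries, counting 'involvement' means splitting each title's country field and crediting every country listed. The Python sketch below assumes the field is a comma-separated string, which is an assumption about the data layout; the function name and sample rows are made up.

```python
from collections import Counter


def country_counts(country_fields):
    """Count how many titles each country is involved in."""
    counts = Counter()
    for field in country_fields:
        # A set ensures a country listed twice on one title is counted once.
        countries = {c.strip() for c in field.split(",") if c.strip()}
        counts.update(countries)
    return counts


# Hypothetical country fields, one per title; the last title has no country listed.
rows = [
    "United States",
    "India",
    "United States, Canada",
    "United Kingdom, United States",
    "",
]
counts = country_counts(rows)

# Share of all titles that involve the United States (3 of 5 in this sample).
share_us = counts["United States"] / len(rows)
```

Note that the denominator here includes titles with no country listed. Whether to include them is a design choice that shifts every country's share.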

Content Type

The following section shows the structure of the content by type (movie vs. TV show), as well as how the proportion of movies and TV shows added has changed over the last five years. The final graph on the page shows the increase in the average length of movies added to Netflix.

Interestingly, 68.4% of the content on Netflix consists of movies.

Moreover, we can observe an intriguing uptick in 2017 in movies as a share of the total content added. While 57.9% of all the content added in 2016 was movies, in 2017 this number rose to 70.2%.

Finally, the average length of movies added to Netflix shows a similar uptick in 2017. The average length of a film added in 2016 was 84 minutes, and in 2017 it rose to 97 minutes, an increase of about 15% in just one year. In the same year, movies as a share of the content added rose by about 12 percentage points.
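
The two 2017 changes are measured differently: the change in movie length is a relative increase, while the change in the movie share is a difference between two percentages. A short Python check of the arithmetic, using only the figures quoted in this post (the function names are mine):

```python
def relative_change_pct(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


def percentage_point_change(old_share, new_share):
    """Difference between two shares that are already expressed in percent."""
    return new_share - old_share


# Average movie length: 84 minutes (2016) to 97 minutes (2017).
length_change = relative_change_pct(84, 97)

# Movies as a share of content added: 57.9% (2016) to 70.2% (2017).
share_points = percentage_point_change(57.9, 70.2)
share_relative = relative_change_pct(57.9, 70.2)
```

Rounded to one decimal, the length change comes to 15.5%, while the movie share rose by 12.3 percentage points, which is a relative increase of about 21.2%.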

Both upticks taking place in the same year suggest a shift in strategy towards adding more and longer movies, a relationship that could be studied in more depth with more information available.

Actors

This section contains a list of the actors who appear most frequently in Netflix titles.

The app lets users switch between the most popular actors worldwide and the most popular US actors, and choose how many to display, up to 25.

By far, the most popular actors on Netflix globally are Asian and, in particular, Indian.

Of the 10 most popular actors on Netflix, 8 are from India and 2 are from Japan. Seeing so many Indian actors so often is understandable, since we observed earlier in the analysis that India is the world's second-largest Netflix content producer.
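
'Popular' here means appearing in the largest number of titles, so a ranking like this comes down to a frequency count over the cast lists. The Python sketch below assumes each cast field is a comma-separated string; the function name and the placeholder actor names are hypothetical.

```python
from collections import Counter


def top_actors(cast_fields, n=10):
    """Return the n actors appearing in the most titles, as (name, count) pairs."""
    counts = Counter()
    for field in cast_fields:
        # Count each actor at most once per title.
        counts.update({name.strip() for name in field.split(",") if name.strip()})
    return counts.most_common(n)


# Hypothetical cast fields, one per title; the last title has no cast listed.
casts = [
    "Actor A, Actor B",
    "Actor A, Actor C",
    "Actor A, Actor B",
    "",
]
ranking = top_actors(casts, n=2)
```

Passing n=25 would correspond to the longest list the app offers.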

Content Category

The final section uncovers the structure of content by genre and age rating.

In terms of age rating, Netflix offers plenty of content for its main age groups. The largest share, 46% of all titles, is suitable for teenagers (13-14 and above). Adults will also find plenty of content that is suitable only for them and is not advised for children or teenagers; it makes up 40.7% of all titles. Content 'suitable for children' accounts for only 9.5% of what is available.
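
Getting from individual MPAA and TV Parental Guidelines ratings to these three age groups requires a mapping, which the post does not list. The Python sketch below uses one plausible mapping, and it is an assumption throughout: in particular, placing PG and TV-PG with teenagers is a judgment call and may differ from the grouping used in the app.

```python
from collections import Counter

# Assumed mapping from rating to age group; not taken from the original analysis.
AGE_GROUP = {
    "TV-Y": "children", "TV-Y7": "children", "TV-G": "children", "G": "children",
    "PG": "teens", "TV-PG": "teens", "PG-13": "teens", "TV-14": "teens",
    "R": "adults", "NC-17": "adults", "TV-MA": "adults",
}


def age_group_shares(ratings):
    """Percentage of titles per age group; unmapped ratings fall into 'other'."""
    if not ratings:
        return {}
    groups = Counter(AGE_GROUP.get(r, "other") for r in ratings)
    return {group: 100 * count / len(ratings) for group, count in groups.items()}


# Hypothetical ratings, one per title ("NR" stands in for an unrated title).
ratings = ["TV-MA", "TV-14", "TV-14", "PG-13", "R", "TV-Y", "NR", "TV-PG"]
shares = age_group_shares(ratings)
```

Keeping an explicit 'other' group makes it visible when unrated titles keep the three main shares from adding up to 100%.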

This insight into the structure of the content hints at which age groups are most valuable to Netflix in terms of revenue generation.

Finally, the most popular genre on Netflix is 'International', with 47% of the content falling into that category. This should not come as a surprise, since we have already seen that about 63% of all the content is produced without the involvement of the United States. Other popular genres include Dramas (35%), Comedies (25%), Action (11.6%) and Documentaries (10.7%). These shares add up to more than 100%, so a title can evidently fall into more than one genre.

Further Opportunities

Due to the nature of the dataset and the limited scope of the research, a number of meaningful questions could not be answered. With broader access to data, additional valuable insights could be generated. Some of the issues worth exploring include:

    • Title 'deletion' dynamics
    • Information on the average life-cycle of the title on Netflix, meaning the amount of time that content is available on Netflix before it is deleted
    • Detailed information on the total number of titles available to stream on Netflix at any given time (each month) since the inception of the company's streaming product
    • Correlation between the membership price increase and membership dynamics
    • Detailed user growth statistics (e.g., weekly, daily) and their correlation with the release of highly anticipated titles (e.g., Stranger Things, The Crown, The Witcher), as well as with holidays and holiday seasons
    • Details on differences in title availability by country

Thank you!

Thank you for your interest and the time you've devoted to reading my blog post. You can access the app by following this link and explore the data on your own. Feel free to 'toggle' the data input options to get more focused insights and visualizations on whatever catches your interest. Please feel free to reach out to me via my LinkedIn page to provide feedback or ask any questions you may have.


About Author

Oleksii Khomov

Data Scientist with a strong analytical background in the fields of marketing, research and management consulting. Experienced in providing actionable insights derived from data to client organizations, including senior management. Oleksii holds a master's degree in marketing from...
