Netflix: Scraping & Uncovering Predictors of Netflix Members

Fred (Lefan) Cheng - 程乐帆

Posted on Feb 1, 2020

The skills I demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Project Code | Linkedin | Github | Presentation | Slides | Email: fredchengnyc@gmail.com

Introduction and Motivation

Netflix is a fantastic company. Its shares have risen steadily over the past ten years, growing more than 40-fold since 2012. A $1,000 investment made in January 2007 would have been worth more than $110,000 in April 2019.

As an industry leader in video streaming services, Netflix invested a whopping $13 billion in streaming content in 2018, around 85% of its total spending. Producing high-quality original content is one of the core capabilities that keeps Netflix ahead of its competition, and that investment has paid off. Looking further into what contributes to Netflix’s phenomenal success was the motivation for this project, which collects and explores relevant data on Netflix Originals.

According to Netflix's quarterly report for Q1 2019 on its official website, revenue from paid memberships makes up 98.22% of total revenue, and the share is much the same in every quarter. That indicates that Netflix's profit model centers on a single source. Being able to forecast paid memberships would therefore be highly valuable, since it can be a reliable indicator of Netflix's revenue and profit, which can directly influence the stock price.

Netflix Data Resources

The data were collected using Scrapy (Python) from three sources: IMDb (shows listing Netflix as distributor), Wikipedia (the lists of Netflix original programming and films), and the Netflix Media Center (upcoming shows). The variables include Title, Genre, Premiere of each season, Length, Language, Distribution, Number of reviews, Count of rating, Average rating, etc.
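The spiders themselves are not reproduced in this post. As a rough illustration of the extraction step only, here is a dependency-free sketch that uses Python's built-in html.parser in place of Scrapy to turn an HTML table into records. The HTML fragment, the show titles, and the helper names (TableParser, parse_table) are all hypothetical placeholders, not the project's code or data.

```python
from html.parser import HTMLParser

# Hypothetical fragment standing in for one scraped table; not real data.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Genre</th><th>Premiere</th></tr>
  <tr><td>Show A</td><td>Thriller</td><td>March 1, 2019</td></tr>
  <tr><td>Show B</td><td>Animation</td><td>April 12, 2019</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = parse_table(SAMPLE_HTML)
print(records)
```

In a Scrapy project the same idea would live inside a spider's parse callback, with selectors doing the work that the parser class does here.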

After cleaning and merging, I had 565 rows, on which I ran a series of analyses, including the relationship between the independent and dependent variables.
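The cleaning and merging code is not shown here, so the following is only a sketch of one way records from several sources could be combined: match shows on a normalized title and keep the first non-empty value for each field. The function names, field names, and rows are assumptions made for illustration.

```python
def normalize_title(title):
    """Lowercase and collapse whitespace so the same show matches across sources."""
    return " ".join(title.lower().split())


def merge_sources(*sources):
    """Merge lists of record dicts on normalized title.

    For each field, the first non-empty value encountered wins, so sources
    passed earlier take priority when two sources disagree.
    """
    merged = {}
    for records in sources:
        for record in records:
            key = normalize_title(record["Title"])
            combined = merged.setdefault(key, {})
            for field, value in record.items():
                if value not in (None, "") and field not in combined:
                    combined[field] = value
    return list(merged.values())


# Placeholder rows, not the scraped data.
wikipedia_rows = [{"Title": "Show A", "Genre": "Thriller", "Premiere": "2019-03-01"}]
imdb_rows = [
    {"Title": "show  a ", "Average rating": 7.9, "Genre": ""},
    {"Title": "Show B", "Average rating": 6.4},
]

shows = merge_sources(wikipedia_rows, imdb_rows)
print(shows)
```

Matching on exact normalized titles is the simplest choice; real listings often need extra handling for subtitles, years, or translated names.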

Netflix Key Findings

Quarterly paid membership shows a stable growth pattern that closely matches quarterly revenue, indicating that paid membership does act as a primary driver of revenue, which supports my hypothesis.

Looking at the pairwise correlations between the independent and dependent variables, I found a positive, nearly linear correlation between the number of shows released and paid membership per quarter, which makes the release count a potentially strong predictor.
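For readers who want to see what such a correlation check looks like, here is a minimal sketch of the Pearson correlation coefficient written in plain Python. The two series are invented placeholder values chosen to be roughly linear; they are not the project's figures, and the variable names are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented placeholder series, one value per quarter; not real Netflix data.
shows_released = [10, 14, 19, 25, 31, 38]
paid_members = [100.0, 108.0, 118.0, 130.0, 141.0, 155.0]

r = pearson_r(shows_released, paid_members)
print(f"Pearson r = {r:.4f}")
```

A value close to 1 indicates a strong positive linear relationship. With only a handful of quarters, and with both series trending upward over time, a high r on its own does not establish that releases drive membership.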


Since Netflix announces the shows it will release in the following months and quarters at its media center, we can use this predictor together with other relevant predictors to build models that forecast the company’s profit. This is the approach employed by hedge funds to make predictions, as attested to by a friend of mine who works at one.
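No forecasting model was built for this post, so the following is only a sketch of the simplest version of the idea: fit a one-predictor least-squares line of paid membership against the number of released shows, then plug in the announced count for the coming quarter. The history, the upcoming count, and the function name fit_line are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented placeholder history, one value per quarter; not real Netflix data.
history_shows = [10, 20, 30, 40]
history_members = [100.0, 120.0, 140.0, 160.0]

slope, intercept = fit_line(history_shows, history_members)

# Hypothetical count of shows announced for the next quarter.
upcoming_shows = 50
forecast = intercept + slope * upcoming_shows
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={forecast:.2f}")
```

A usable model would add the other predictors mentioned above and be checked against held-out quarters before anyone relied on it.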

Other Netflix explorations:

Urban fantasy, political, thriller, and science fiction/thriller are the most popular genres of Netflix originals. If the upcoming shows include these popular genres, we can consider assigning them greater weight in the prediction.


English dominates, followed by Spanish and Hindi. Judging by language, the Spanish- and Hindi-speaking audiences may be the submarkets Netflix is investing in.

March, April, and May are the most productive months.
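A tally like this can be produced by counting premieres per calendar month. The sketch below assumes premiere dates have already been cleaned into ISO format (YYYY-MM-DD); the sample dates and the function name are placeholders, not the scraped data.

```python
from collections import Counter
from datetime import date

MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def premieres_per_month(premiere_dates):
    """Count premieres by calendar month, pooling all years together."""
    months = (date.fromisoformat(d).month for d in premiere_dates)
    return Counter(MONTH_NAMES[m - 1] for m in months)


# Placeholder premiere dates, not the scraped data.
sample_dates = ["2019-03-01", "2019-03-15", "2019-04-12", "2018-05-03", "2018-03-30"]

counts = premieres_per_month(sample_dates)
busiest = counts.most_common(1)
print(counts)
print(busiest)
```

Grouping the same count by genre as well as month would be a natural next step for comparing release patterns across genres.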


In future work, I may look for the popular and productive months of different genres. For instance, animation and cartoons may be more prevalent during school vacations, and more of their reviews may come from parents.


Future Work

I would look for more meaningful variables, test their correlation with paid membership, and build a prediction model based on them. I would also improve the web scraping code to capture more complete datasets and do more data analysis.

About Author

Fred (Lefan) Cheng - 程乐帆

Fred Cheng is a certified data scientist working as a data science consultant at Zenon. He holds a Master’s Degree in Management and Systems from New York University and a bachelor’s in business management from The...

