The Data Science of Binge-watching: Netflix
The skills the author demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Contributed by Wanda Wang. She is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, which runs from April 11th to July 1st, 2016. This post is based on her third class project, Python Web Scraping (due in the 6th week of the program).
Introduction
When were you "hooked" on House of Cards? The 3rd episode? When did you stop? (Was it when the pet dog passed?) In the current era of peak TV, Netflix shows have permeated our everyday lives, taking up more and more of our attention. When evaluating new content deals or show renewals, Netflix looks for high engagement, namely the expected hours of viewing for each piece of content. Netflix's recommendation algorithm also uses collaborative filtering, which suggests content suited to your tastes based on the tastes of a group of similar viewers. Since I believe my tastes are quite eclectic, I hope to gather data insights from my own streaming history.
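To make the idea of collaborative filtering concrete, here is a minimal user-based sketch. It is not Netflix's actual algorithm: the viewers, shows, and hours in `WATCH_HOURS` are invented, viewing hours stand in for ratings, and cosine similarity is just one common choice of similarity measure. Each show the target user has not watched is scored by the hours other viewers spent on it, weighted by how similar those viewers are to the target user.

```python
from math import sqrt

# Invented viewing hours per viewer and show (stand-ins for ratings).
WATCH_HOURS = {
    "me": {"Friends": 40, "The Walking Dead": 30},
    "alice": {"Friends": 35, "The Walking Dead": 25, "Narcos": 20},
    "bob": {"House of Cards": 30, "The Crown": 25, "The Walking Dead": 2},
}


def cosine(a, b):
    """Cosine similarity of two sparse {show: hours} vectors (missing shows count as 0)."""
    dot = sum(hours * b.get(show, 0) for show, hours in a.items())
    norm_a = sqrt(sum(h * h for h in a.values()))
    norm_b = sqrt(sum(h * h for h in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user, data, k=2):
    """Rank shows `user` has not watched by similarity-weighted hours of other viewers."""
    target = data[user]
    scores = {}
    for other, hours in data.items():
        if other == user:
            continue
        sim = cosine(target, hours)
        for show, h in hours.items():
            if show not in target:
                scores[show] = scores.get(show, 0.0) + sim * h
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend("me", WATCH_HOURS))  # ['Narcos', 'House of Cards']
```

Because "alice" watched the same shows as "me" in similar proportions, her similarity is high and her unseen show, Narcos, ranks first; "bob" overlaps only slightly, so his shows score far lower.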
Overview
Working in Python, I used Selenium to complete the web-scraping task. Selenium simulates user clicks, which allowed me to move quickly and seamlessly through several web pages. The steps I followed are outlined below:
1) Login Screen - User Profile
Scraping code (GitHub Gist): https://gist.github.com/zelosa/527a5b719a1e8b735ea24ba7eac99d4a
2) Site Navigation - Viewing History
3) URL generation - per unique view
4) Retrieving unique data points
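Steps 1 to 3 can be sketched with Selenium roughly as below. This is a hypothetical reconstruction, not the code from the gist: the login and viewing-activity URLs, the form-field names (`userLoginId`, `password`), the profile lookup by link text, and `ROW_LINK_SELECTOR` are all assumptions to check against the live page, since Netflix's markup changes often. Selenium is imported inside `scrape_history` so that the pure helper `unique_title_urls` can be used and tested without a browser.

```python
from urllib.parse import urljoin

# The URLs and selector below are assumptions; inspect the live pages before use.
LOGIN_URL = "https://www.netflix.com/login"
HISTORY_URL = "https://www.netflix.com/viewingactivity"
BASE_URL = "https://www.netflix.com"
ROW_LINK_SELECTOR = "li.retableRow a"  # hypothetical CSS selector for history rows


def unique_title_urls(hrefs, base=BASE_URL):
    """Step 3: make each link absolute and keep one URL per title, in first-seen order."""
    return list(dict.fromkeys(urljoin(base, h) for h in hrefs if h))


def scrape_history(email, password, profile_name, timeout=10):
    """Steps 1-3: log in, open the viewing history, and return unique title URLs."""
    # Imported here so unique_title_urls works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, timeout)
    try:
        # Step 1: fill in the login form (field names are assumptions) and pick a profile.
        driver.get(LOGIN_URL)
        driver.find_element(By.NAME, "userLoginId").send_keys(email)
        driver.find_element(By.NAME, "password").send_keys(password)
        driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()
        wait.until(EC.element_to_be_clickable((By.LINK_TEXT, profile_name))).click()

        # Step 2: go to the viewing history and wait until rows have rendered.
        driver.get(HISTORY_URL)
        links = wait.until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, ROW_LINK_SELECTOR))
        )
        hrefs = [link.get_attribute("href") for link in links]
    finally:
        driver.quit()  # always close the browser, even if a step fails

    # Step 3: one URL per unique title.
    return unique_title_urls(hrefs)
```

Explicit waits (`WebDriverWait`) are used instead of fixed sleeps because the history rows are rendered after the page loads; step 4 would then visit each returned URL to read the title's details.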
Questions:
How many shows did I binge-watch, and how often?
Which shows did I binge-watch the most?
Is there a particular genre or actor I stuck to?
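The first two questions can be answered once the scraped history is reduced to (date, title) rows. The sketch below assumes a binge means at least three episodes of the same show on the same day (`BINGE_THRESHOLD`, a hypothetical cutoff) and that titles follow a `Show: Season N: Episode` pattern, so shows whose names contain a colon would need different parsing. The `HISTORY` rows and the `_episodes` helper are illustrative only, not the author's data.

```python
from collections import Counter
from datetime import date

BINGE_THRESHOLD = 3  # hypothetical: 3+ episodes of one show on one day = a binge


def show_name(title):
    """Assumes titles look like 'Show: Season N: Episode'; keeps the part before the first colon."""
    return title.split(":", 1)[0].strip()


def binge_sessions(history, threshold=BINGE_THRESHOLD):
    """Map (show, day) -> episode count for every day that meets the threshold."""
    per_day = Counter((show_name(title), day) for day, title in history)
    return {key: n for key, n in per_day.items() if n >= threshold}


def binges_per_show(history, threshold=BINGE_THRESHOLD):
    """Count binge days per show, most-binged show first."""
    return Counter(show for show, _day in binge_sessions(history, threshold)).most_common()


def _episodes(day, show, n):
    """Hypothetical helper: build n illustrative rows for one show on one day."""
    return [(day, f"{show}: Season 1: Episode {i}") for i in range(1, n + 1)]


# Illustrative (date, title) rows, not real viewing data.
HISTORY = (
    _episodes(date(2016, 1, 2), "Friends", 3)
    + _episodes(date(2016, 1, 3), "Friends", 4)
    + _episodes(date(2016, 3, 5), "The Walking Dead", 3)
    + _episodes(date(2016, 3, 6), "The Walking Dead", 1)
    + _episodes(date(2016, 3, 7), "House of Cards", 2)
)

print(len(binge_sessions(HISTORY)))  # 3
print(binges_per_show(HISTORY))      # [('Friends', 2), ('The Walking Dead', 1)]
```

Passing a different `threshold` shows how sensitive the answers are to the definition of a binge; the genre and actor question would need the per-title data retrieved in step 4.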
Challenges included deciphering the HTML tags in the site's modern markup in order to locate the elements I needed.
Data
Data Analysis
Over the course of the year, I initially indulged in many episodes of Friends; The Walking Dead came next. The two shows differ greatly in pace and genre, which suggests that, at least for me, binge-watching is largely independent of both.
Conclusion
As cord-cutters rely more and more on streaming for their TV fix, we'll likely see more research studies examining the exact moment we become "hooked". Was it a particular genre, the number of episodes, or something unique to the content itself, such as an action scene or a storyline that held our attention?