Data Study on Television Trends as a Social Indicator

Posted on Feb 19, 2017

Contributed by Emil Parikh. He is currently in the NYC Data Science Academy 12-week, full-time Data Science Bootcamp program taking place from January 9th to March 31st, 2017. This post is based on his second class project - Web Scraping.

Links:   GitHub   |   App

Introduction

Disciplines such as economics and politics have indicators that measure the state of different aspects of their fields. Events around the country in the past few years have led people to question the state of the US and to express surprise about "who this country is." Given that, I am surprised there is no indicator that can tell us who we are and where we are going socially as a country. There is a collection of indicators that describe the social environment in terms of such things as poverty, obesity and suicide rates, but these largely describe outcomes and consequences rather than preferences and personality.

Spoiler Alert! A full solution to such a complicated task is beyond the scope of this project; a full solution would require multiple scraping projects and continued feedback from professionals in social psychology. I will address this again in the next steps section. Instead, I used this time to take a first step in building a social indicator by scraping and visualizing information about television shows.

 

Data Collection

I used Scrapy and IMDbPY to gather television data from Wikipedia and IMDb, respectively. Some information was available only from Wikipedia and some only from IMDb.


Although show titles could be found in both, I needed to scrape them from Wikipedia in order to

  • retrieve the Wikipedia URLs for the shows in order to get the network information and
  • specify in IMDbPY which shows I wanted information for

 

Screenshots of two Wikipedia pages I scraped TV show titles and URLs from:

[Image: showlist1]

 

Screenshots of a Wikipedia show page from which I retrieved information:

[Image: showinfo2]

 

For fields common to both Wikipedia and IMDb, such as genre and start/end date, I still retrieved the information from Wikipedia. Once the scraping was finished, I filled in any missing data by collecting the same information from IMDb, along with the IMDb rating and number of votes.
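A minimal sketch of that gap-filling step, assuming each show is held as a plain dictionary. The function name `fill_missing`, the field names, and the sample values are hypothetical and chosen only for illustration; the original project may have stored and merged the data differently.

```python
def fill_missing(wiki_record, imdb_record, imdb_only=("rating", "votes")):
    """Merge two records for one show, preferring the Wikipedia values.

    A field is taken from IMDb only when Wikipedia has no value for it,
    or when it is an IMDb-only field (rating and vote count here).
    """
    merged = dict(wiki_record)  # copy so the input is not modified
    for field, value in imdb_record.items():
        is_blank = merged.get(field) in (None, "", [])
        if field in imdb_only or is_blank:
            merged[field] = value
    return merged


# Hypothetical records, not real scraped data.
wiki = {"title": "Example Show", "genre": ["Comedy"],
        "start_year": 1990, "end_year": None}
imdb = {"genre": ["Comedy", "Family"], "end_year": 1995,
        "rating": 7.5, "votes": 1200}

merged = fill_missing(wiki, imdb)
```

Here `merged` keeps the Wikipedia genre, takes the end year from IMDb because Wikipedia had none, and gains the rating and vote count.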

 

A sample of my Wikipedia TV show scraper
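As a rough illustration of the list-page step (collecting show titles and their Wikipedia URLs), the sketch below uses only Python's standard-library `html.parser` in place of a Scrapy spider, so it is not the original scraper. The class name, the assumption that shows appear as links inside list items, and the namespace filter are all illustrative guesses about the page layout.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

WIKI_BASE = "https://en.wikipedia.org"  # assumed base for relative links
# Non-article namespaces to skip (assumed list, not exhaustive).
NAMESPACES = ("File:", "Category:", "Help:", "Special:",
              "Wikipedia:", "Template:", "Portal:", "Talk:")


class ShowLinkParser(HTMLParser):
    """Collect (title, absolute URL) pairs from links inside <li> items."""

    def __init__(self, base_url=WIKI_BASE):
        super().__init__()
        self.base_url = base_url
        self.shows = []        # collected (title, url) tuples
        self._li_depth = 0     # greater than 0 while inside a list item
        self._href = None      # href of the link currently being read
        self._text = []        # text fragments of the current link

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._li_depth += 1
        elif tag == "a" and self._li_depth and self._href is None:
            href = dict(attrs).get("href") or ""
            page = href[len("/wiki/"):]
            if href.startswith("/wiki/") and not page.startswith(NAMESPACES):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                url = urljoin(self.base_url, self._href)
                self.shows.append((title, url))
            self._href = None
        elif tag == "li" and self._li_depth:
            self._li_depth -= 1


def extract_shows(html, base_url=WIKI_BASE):
    """Return a list of (title, url) pairs found in the given HTML."""
    parser = ShowLinkParser(base_url)
    parser.feed(html)
    return parser.shows
```

In a Scrapy spider the same idea would usually be expressed with CSS or XPath selectors on the response, with each extracted URL followed to the show's own page.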

 

Using IMDbPY to get information about TV shows, with the show titles gathered from Wikipedia as the search terms:
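A hedged sketch of such a lookup, not the original code. It assumes the IMDbPY package is installed (imported as `imdb`) and that the first TV-series hit for a title is the right show, which will not always hold. The helper names and the output field names are hypothetical, and the info keys (`kind`, `genres`, `rating`, `votes`, `series years`) should be checked against the installed IMDbPY version.

```python
# Values of the 'kind' field treated as TV shows (assumed set).
TV_KINDS = {"tv series", "tv mini series"}


def first_tv_series(results):
    """Return the first search hit whose 'kind' marks it as a TV series."""
    for hit in results:
        if hit.get("kind") in TV_KINDS:
            return hit
    return None


def fetch_show_info(title):
    """Search IMDb for a title and return a few fields, or None if no hit.

    Requires network access and the IMDbPY package.
    """
    from imdb import IMDb  # imported here so the helper above works without it

    ia = IMDb()
    show = first_tv_series(ia.search_movie(title))
    if show is None:
        return None
    ia.update(show)  # search hits are sparse; this fetches the main details
    return {
        "title": show.get("title"),
        "genres": show.get("genres"),
        "rating": show.get("rating"),
        "votes": show.get("votes"),
        "series_years": show.get("series years"),
    }
```

Calling `fetch_show_info` once per scraped title would give one record per show. Matching on title alone can pick the wrong show when several share a name, so comparing the start year from Wikipedia against the IMDb result would be a sensible extra check.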

 

Data Visualization and Analysis

In the app, I have visualizations of

  • count of new shows created
  • median IMDb rating of new shows
  • median number of years shows ran for
  • total number of votes on IMDb

This information is displayed for each year from the 1940s until 2016 by genre and by network.
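The four measures above can be sketched as a small aggregation over show records. This is an illustration only: the record layout, the function name `summarize_by_year`, and the sample values are assumptions, and a show with several genres is assumed to appear once per genre.

```python
from collections import defaultdict
from statistics import median


def summarize_by_year(shows, group_key="genre"):
    """Aggregate shows by (start year, genre or network)."""
    groups = defaultdict(list)
    for show in shows:
        groups[(show["start_year"], show[group_key])].append(show)

    summary = {}
    for key, members in groups.items():
        # Skip missing values rather than treating them as zero.
        ratings = [s["rating"] for s in members if s.get("rating") is not None]
        runs = [s["end_year"] - s["start_year"]
                for s in members if s.get("end_year") is not None]
        summary[key] = {
            "new_shows": len(members),
            "median_rating": median(ratings) if ratings else None,
            "median_years_run": median(runs) if runs else None,
            "total_votes": sum(s.get("votes") or 0 for s in members),
        }
    return summary


# Hypothetical records, not real scraped data.
shows = [
    {"start_year": 1990, "end_year": 1995, "genre": "comedy",
     "rating": 8.0, "votes": 100},
    {"start_year": 1990, "end_year": 1991, "genre": "comedy",
     "rating": 6.0, "votes": 50},
    {"start_year": 1990, "end_year": None, "genre": "drama",
     "rating": None, "votes": None},
]
by_genre = summarize_by_year(shows)
```

Passing `group_key="network"` would produce the same measures per network, provided each record carries a `network` field.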

Screenshots of some of the visualizations:

Count of new shows by genre from 1940s to 2016:

[Plots: comedy, drama, reality]

Count of new shows by network from 1940s to 2016:

[Plots: ABC, CBS, NBC]

 

The genre plots suggest that networks and show creators believe audiences want more comedies and reality shows (shows that tend to require less thinking). The number of new dramas has not risen as sharply. While the number of shows created in these genres has risen consistently, the number of shows created by the major networks has been declining since the mid-1980s. I will need to look into this further.

Next Steps

TV show data alone is not enough to answer "who are we as a society?", especially without viewership data. Some future steps I would take to build upon this project are:

  • Scrape more lists of TV shows; it seems that the lists of TV shows I scraped may not have been thorough for 2015 and 2016.
  • Obtain numbers on the audience side, such as viewership of shows/genres/networks, in order to get a better sense of audience preferences rather than just the creators' and networks' predictions of audience preferences.
  • Compare data such as genre and viewership between traditional networks and streaming services such as Netflix, as this may give a sense of demographic contributors.
  • Include movies, music, books, magazines, news, etc. in the analysis, since one medium alone will not capture society.
  • Expand beyond entertainment. Include trends in degrees and jobs.

About Author

Emil Parikh

Data Scientist with professional experience in web scraping, predictive modeling, data visualization, and big data with intensive software development experience. Strength in interpreting and converting business needs into solutions. Quick learner and thorough planner with a passion for...