Data Study on Television Trends as a Social Indicator

Emil Parikh

Posted on Feb 19, 2017

The skills the author demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.

Contributed by Emil Parikh. He is currently in the NYC Data Science Academy 12-week, full-time Data Science Bootcamp program, which runs from January 9th to March 31st, 2017. This post is based on his second class project, Web Scraping.

Links: GitHub | App

Introduction

Disciplines such as economics and politics have various indicators that measure the state of different aspects of their fields. Events around the country in the past few years have led people to question the state of the US and to express surprise about "who this country is." That is why I am surprised there is no indicator that can tell us who we are and where we are going socially as a country. There is a collection of indicators that describe the social environment in terms of such things as poverty, obesity, and suicide rates, but these largely describe outcomes and consequences rather than preferences and personality.

Spoiler alert! A full solution to such a complicated task is beyond the scope of this project; it would require multiple scraping projects and continued feedback from professionals in social psychology. I will address this again in the Next Steps section. Instead, I used this time to take a first step in building a social indicator by scraping and visualizing information about television shows.

Data Collection

I used Scrapy and IMDbPY to gather television data from Wikipedia and IMDb, respectively. Some information was available only from Wikipedia and some only from IMDb.


While show titles could be found in both, I needed to scrape them from Wikipedia in order to:

retrieve the Wikipedia URLs for the shows, which I needed to get the network information
specify in IMDbPY which shows I wanted information for
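
The title-and-URL step described above can be sketched roughly as follows. This is not the project's Scrapy spider: it is a minimal stand-in that uses only Python's standard-library `html.parser`, so it runs without extra packages. The class name `ShowLinkParser`, the sample HTML, and the assumption that each show is the first article link inside a list item are illustrative guesses about the page layout, not details taken from the project.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://en.wikipedia.org"


class ShowLinkParser(HTMLParser):
    """Collect (title, absolute URL) for the first article link in each <li>."""

    def __init__(self):
        super().__init__()
        self.shows = []
        self._in_li = False
        self._li_done = False   # True once this <li> has yielded a show
        self._href = None       # href of the link currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li, self._li_done = True, False
        elif tag == "a" and self._in_li and not self._li_done:
            href = dict(attrs).get("href") or ""
            # Keep article links only; skip namespaced pages like /wiki/File:...
            if href.startswith("/wiki/") and ":" not in href:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                self.shows.append((title, urljoin(BASE_URL, self._href)))
                self._li_done = True
            self._href = None
        elif tag == "li":
            self._in_li = False


# Hypothetical list-page fragment (assumed layout, for illustration only).
SAMPLE_HTML = """
<ul>
  <li><i><a href="/wiki/Breaking_Bad">Breaking Bad</a></i>
      (<a href="/wiki/AMC_(TV_channel)">AMC</a>)</li>
  <li><a href="/wiki/File:Logo.png">logo</a>
      <i><a href="/wiki/Seinfeld">Seinfeld</a></i></li>
</ul>
"""

parser = ShowLinkParser()
parser.feed(SAMPLE_HTML)
for title, url in parser.shows:
    print(title, url)
```

Each collected URL can then be requested in turn to read the network from the show page, and each title can be passed to IMDbPY as a search term.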

Screenshots of two Wikipedia pages I scraped TV show titles and URLs from:

Screenshots of a Wikipedia show page from which I retrieved information:


For fields common to both Wikipedia and IMDb, such as genre and start/end date, I still retrieved the information from Wikipedia. Once the scraping was finished, I filled in any missing data by collecting the same fields from IMDb, along with the IMDb rating and number of votes.
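
One way to express that fill-in rule is sketched below, assuming each show is held as a plain dictionary. The function name `fill_missing`, the field names, and the sample values are all hypothetical; the project's actual data structures are not shown in this post.

```python
def fill_missing(wiki_record, imdb_record, imdb_only=("rating", "votes")):
    """Prefer Wikipedia values; use IMDb for gaps and for IMDb-only fields."""
    merged = dict(wiki_record)
    for field, value in imdb_record.items():
        is_missing = merged.get(field) in (None, "", [])
        if field in imdb_only or is_missing:
            merged[field] = value
    return merged


# Made-up records for illustration (not real scraped data).
wiki = {"title": "Example Show", "genre": "Sitcom",
        "start_year": 1990, "end_year": None, "network": "NBC"}
imdb = {"genre": "Comedy", "start_year": 1990, "end_year": 1995,
        "rating": 7.5, "votes": 1200}

merged = fill_missing(wiki, imdb)
print(merged)
```

Here the Wikipedia genre is kept because it was present, the missing end year is taken from IMDb, and the rating and vote count are always taken from IMDb.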

A sample of my Wikipedia TV show scraper

Using IMDbPY to look up TV shows, with the show titles gathered from Wikipedia as the search terms:
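
The original code screenshot is not available here, so the following is only a rough sketch of what such a lookup might look like, not the author's code. It assumes the IMDbPY package (imported as `imdb`) and network access for `fetch_show_info`; the helper names `pick_tv_series` and `fetch_show_info`, and the choice of which fields to return, are assumptions made for illustration.

```python
TV_KINDS = {"tv series", "tv mini series"}


def pick_tv_series(results):
    """Return the first search result whose 'kind' marks it as a TV series."""
    for result in results:
        if result.get("kind") in TV_KINDS:
            return result
    return None


def fetch_show_info(title):
    """Look up one show on IMDb by title (needs IMDbPY and network access)."""
    # Imported lazily so pick_tv_series can be used without IMDbPY installed.
    from imdb import IMDb

    ia = IMDb()
    show = pick_tv_series(ia.search_movie(title))
    if show is None:
        return None
    ia.update(show)  # search results are sparse; fetch the main details
    return {
        "title": show.get("title"),
        "genres": show.get("genres"),
        "year": show.get("year"),
        "rating": show.get("rating"),
        "votes": show.get("votes"),
    }


# Offline illustration with made-up search results shaped like IMDbPY's.
fake_results = [
    {"title": "Example Show: The Movie", "kind": "movie"},
    {"title": "Example Show", "kind": "tv series"},
]
print(pick_tv_series(fake_results))
```

Filtering on `kind` matters because a title search can return films or video games that share a name with the show.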

Data Visualization and Analysis

In the app, I have visualizations on

count of new shows created
median IMDb rating of new shows
median number of years shows ran for
total number of votes on IMDb

This information is displayed for each year from the 1940s through 2016, by genre and by network.
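
The four measures listed above could be computed along these lines. This is a sketch using only the standard library, not the app's own code; the `summarize` function, the field names, and the sample records are hypothetical.

```python
from collections import defaultdict
from statistics import median


def summarize(shows, group_key):
    """Aggregate shows by (start year, group_key), e.g. genre or network."""
    groups = defaultdict(list)
    for show in shows:
        groups[(show["start_year"], show[group_key])].append(show)

    summary = {}
    for key, members in groups.items():
        ratings = [s["rating"] for s in members if s.get("rating") is not None]
        runs = [s["end_year"] - s["start_year"]
                for s in members if s.get("end_year") is not None]
        summary[key] = {
            "new_shows": len(members),
            "median_rating": median(ratings) if ratings else None,
            "median_years_run": median(runs) if runs else None,
            "total_votes": sum(s.get("votes") or 0 for s in members),
        }
    return summary


# Made-up records for illustration (not real scraped data).
shows = [
    {"title": "A", "start_year": 1990, "end_year": 1995, "genre": "Comedy",
     "network": "NBC", "rating": 7.0, "votes": 100},
    {"title": "B", "start_year": 1990, "end_year": 1991, "genre": "Comedy",
     "network": "ABC", "rating": 8.0, "votes": 50},
    {"title": "C", "start_year": 1990, "end_year": None, "genre": "Drama",
     "network": "NBC", "rating": None, "votes": None},
]

by_genre = summarize(shows, "genre")
by_network = summarize(shows, "network")
print(by_genre[(1990, "Comedy")])
```

Shows with no end year or no rating are still counted as new shows but are left out of the corresponding median, which is one possible way to handle missing values.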

Screenshots of some of the visualizations:

Count of new shows by genre from 1940s to 2016:

Panels shown: comedy, drama, reality.

Count of new shows by network from 1940s to 2016:

Panels shown: ABC, CBS, NBC.

The genre plots suggest that networks and show creators believe audiences want more comedies and reality shows (shows that tend to require less thinking). Dramas have not risen as sharply. While the number of shows created in these genres has risen consistently, the number of shows created by the major networks has been declining since the mid-1980s. I will need to look into this further.

Next Steps

TV show data alone is not enough to answer "who are we as a society?", especially without viewership data. Some future steps I would take to build upon this project are:

Scrape more lists of TV shows; the lists I scraped may not have been thorough for 2015 and 2016.
Obtain numbers on the audience side, such as viewership of shows, genres, and networks, to get a better sense of audience preferences rather than just the creators' and networks' predictions of those preferences.
Compare data such as genre and viewership between traditional networks and streaming services such as Netflix, as this may give a sense of demographic contributors.
Include movies, music, books, magazines, news, etc. in the analysis, since one medium alone will not capture society.
Expand beyond entertainment; include trends in degrees and jobs.

About Author

Emil Parikh

Data Scientist with professional experience in web scraping, predictive modeling, data visualization, and big data with intensive software development experience. Strength in interpreting and converting business needs into solutions. Quick learner and thorough planner with a passion for...

