Data Visualizing Hollywood BoxOffice Revenue

Sricharan Maddineni

Posted on Feb 1, 2016

The skills the author demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.

Contributed by Sricharan Maddineni. He is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, taking place from January 11th to April 1st, 2016. This post is based on his first class project, R visualization (due in the 2nd week of the program).

My goal was to analyze the accuracy of news headlines about Hollywood, including but not limited to the shift between domestic and overseas BoxOffice revenue and the marketability of different genres overseas. Specifically, I focused on articles that had few or no visualizations but drew clear conclusions from general trends in the data.

Outline

Data
Headlines
Visualizations
Conclusion

Data

I used two websites for my analysis: IMDB and BoxOfficeMojo. The IMDB datasets were used to aggregate movie ratings, and the BoxOfficeMojo dataset was used for the movie finance analysis. These datasets were cleaned and joined to create a single dataset containing movie ratings and revenues.

There was an alternative IMDB dataset that contained aggregate movie ratings and finances, but the IMDB datasets I chose covered 668 IMDB users who all reviewed the same subset of movies. I considered this the more robust choice because ratings from the same set of users are more comparable, whereas in the alternative dataset each movie was rated by a varying number of users.

First IMDB dataset containing movie ratings by movieID.

[Screenshots: first IMDB dataset before and after cleaning]

Cleaning the second IMDB dataset and then joining.


[Screenshots: second IMDB dataset before cleaning; final cleaned IMDB dataset]

BoxOfficeMojo Data Set

[Screenshots: BoxOfficeMojo dataset before and after cleaning]

It was important to clean the years in both datasets because several movies share the same title. Joining on title alone would match those movies incorrectly, so the datasets were joined on both year and title.
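To make the point about joining concrete, here is a minimal sketch in Python (the original project was done in R, and this is not the author's code). The rows, column names, and the `join_on` helper are all hypothetical; they only illustrate how a title-only join produces spurious matches when two movies share a name.

```python
# Toy rows standing in for the cleaned datasets; titles, years and values are invented.
ratings = [
    {"title": "Example Movie", "year": 1995, "rating": 3.6},
    {"title": "Example Movie", "year": 2010, "rating": 2.9},
]
revenues = [
    {"title": "Example Movie", "year": 1995, "domestic": 120.0},
    {"title": "Example Movie", "year": 2010, "domestic": 45.0},
]


def join_on(left, right, keys):
    """Inner-join two lists of dicts on the given key columns (hypothetical helper)."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**row, **match})
    return joined


# Title alone pairs every "Example Movie" rating with every "Example Movie" revenue.
by_title = join_on(ratings, revenues, ["title"])
# Title plus year keeps one row per actual movie.
by_title_year = join_on(ratings, revenues, ["title", "year"])
print(len(by_title), len(by_title_year))
```

With the two same-named movies above, the title-only join yields four rows (two of them wrong pairings), while the title-and-year join yields the expected two.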

Final Movie Data Set

Headline #1

“Where Americans once were the only game in town for Hollywood, U.S. audiences are taking a back seat to moviegoers across the globe — particularly in Asia.”

“And foreign markets are getting the industry's highest-profile films first. Battleship opened in Asia and Europe more than a month before it reached the USA last May.”

http://usat.ly/ZuKGrn

Data Visualization

Conclusion: Foreign revenue has accounted for an increasing percentage of total revenue every year since 1992 as shown by the increasing slope values for the regression lines.
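One way to check a trend like this is to compute the foreign share of total revenue per year and fit a least-squares line through it. The sketch below is in Python rather than the R used for the project, uses invented yearly totals (not the BoxOfficeMojo figures), and the helper names are assumptions. It may differ from how the regression lines in the plot were actually fitted.

```python
# Invented yearly totals in millions: (year, domestic, foreign). Illustration only.
yearly = [
    (2000, 100.0, 50.0),
    (2001, 100.0, 100.0),
    (2002, 100.0, 150.0),
    (2003, 100.0, 200.0),
]


def foreign_share(domestic, foreign):
    """Fraction of total revenue that came from outside the domestic market."""
    return foreign / (domestic + foreign)


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


years = [year for year, _, _ in yearly]
shares = [foreign_share(dom, frn) for _, dom, frn in yearly]
slope = ols_slope(years, shares)
print(slope > 0)  # a positive slope means the foreign share is rising
```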

Headline #2

“Big noisy spectacle travels best. Jason Statham, the close-cropped star of many a mindlessly violent film, is a particular Russian favourite. Films based on well-known literature (including cartoon books) and myths may also fare well.”

“Comedy travels badly: Will Ferrell and Adam Sandler provoke guffaws at home but incomprehension abroad”

http://econ.st/KAk82t

Data Visualization

Conclusion: The overseas density plot confirms that the Drama and Comedy genres perform worse overseas than they do domestically. A majority of Drama/Comedy movies generate less than 50 million overseas, while the Action and Animation genres show a much wider distribution of overseas revenues.
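A density plot can be complemented by a simple number: the fraction of movies in each genre that fall below a revenue threshold. The following Python sketch (the project itself used R) computes that fraction on invented rows; the `share_below` helper and the data are hypothetical.

```python
from collections import defaultdict

# Invented (genre, overseas revenue in millions) rows, for illustration only.
movies = [
    ("Comedy", 20.0), ("Comedy", 35.0), ("Comedy", 80.0),
    ("Action", 40.0), ("Action", 150.0), ("Action", 300.0),
]


def share_below(rows, threshold):
    """Per genre, the fraction of movies whose overseas revenue is under threshold."""
    totals = defaultdict(int)
    below = defaultdict(int)
    for genre, overseas in rows:
        totals[genre] += 1
        if overseas < threshold:
            below[genre] += 1
    return {genre: below[genre] / totals[genre] for genre in totals}


shares = share_below(movies, 50.0)
print(shares)
```

A genre whose share is above 0.5 has a majority of its movies under the threshold, which is the claim made for Drama and Comedy above.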

Headline #3

“...little effort is being made to deliver sophisticated storytelling ... movies are crafted mainly to provoke visceral - as opposed to intellectual response”

bbc.com/culture/story/20130620-is-china-hollywoods-future

Visualization

Conclusion: The rating sweet spot that generates the most revenue is between 3.5 and 3.8. Movies that score above 4 show a sharp decline in revenue. This could be because the average moviegoer more easily appreciates an average movie (cough, cough, Michael Bay movies).
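A sweet spot like this can be located by grouping movies into rating bins and comparing the mean revenue per bin. The sketch below is a Python illustration with invented ratings and revenues; the bin width of 0.5 and the helper name are assumptions, not the method used for the plot.

```python
from collections import defaultdict

# Invented (average rating, revenue in millions) pairs, for illustration only.
rows = [(3.6, 200.0), (3.9, 100.0), (4.2, 60.0), (2.8, 30.0)]


def mean_revenue_by_bin(pairs, width=0.5):
    """Mean revenue per rating bin; each bin is keyed by its lower edge."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rating, revenue in pairs:
        bin_start = (rating // width) * width
        sums[bin_start] += revenue
        counts[bin_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


by_bin = mean_revenue_by_bin(rows)
best_bin = max(by_bin, key=by_bin.get)
print(by_bin, best_bin)
```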

Visualization

Conclusion: There seems to be a linear trend between the number of movies a studio produces and its total domestic revenue. This doesn’t have to be the case; for example, Lionsgate produced 5 of the top 100 movies in 2015 and could have generated 100 mil in revenue (~50 mil actual). This leads me to believe that movie studios do equally well at selecting which movies to produce.

Data Visualization

Conclusion: There seems to be a linear trend between how much revenue a movie made on its opening weekend and its lifetime domestic revenue. Hollywood considers opening weekend numbers a good predictor of how well a movie will perform, and this plot supports that theory.
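The strength of a linear trend like this can be summarized with the Pearson correlation coefficient. Here is a small Python sketch (the project itself used R) on invented opening-weekend and lifetime figures; the numbers and the `pearson_r` helper are illustrative assumptions, not values from the dataset.

```python
import math

# Invented revenues in millions, for illustration only.
opening = [10.0, 25.0, 40.0, 60.0]
lifetime = [35.0, 70.0, 130.0, 170.0]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(opening, lifetime)
print(round(r, 2))  # values close to 1 indicate a strong positive linear trend
```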

Data Visualization

Conclusion: The highest-grossing movie of each year has accounted for a decreasing percentage of total BoxOffice revenue over time. This could suggest that studios are making more money per movie, producing more movies, or a combination of the two. Further analysis is required from different datasets.
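The quantity behind this plot is, for each year, the top movie's gross divided by the total gross of all movies that year. A minimal Python sketch of that calculation follows; the rows are invented and the `top_movie_share` helper is a hypothetical name, so treat it as an illustration rather than the author's R code.

```python
from collections import defaultdict

# Invented (year, gross in millions) rows, for illustration only.
rows = [
    (2000, 100.0), (2000, 50.0), (2000, 50.0),
    (2010, 100.0), (2010, 100.0), (2010, 100.0), (2010, 100.0),
]


def top_movie_share(pairs):
    """Per year, the highest single gross as a fraction of that year's total."""
    by_year = defaultdict(list)
    for year, gross in pairs:
        by_year[year].append(gross)
    return {year: max(g) / sum(g) for year, g in sorted(by_year.items())}


share_by_year = top_movie_share(rows)
print(share_by_year)
```

In this toy data the top movie's share falls from 0.5 to 0.25 purely because more movies were released, which is one of the explanations suggested above.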

Final Thoughts

My data visualizations confirmed many of the conclusions drawn in the news articles. What I found most interesting was how good a predictor opening weekend turns out to be for overall performance and that movie studios are evenly matched in terms of how well they select movies. Since audience reception is such a complex factor to predict, it's surprising that the studios are consistently able to make good decisions.

About Author

Sricharan Maddineni

Sricharan Maddineni was a Neuroscience undergrad at Rutgers University. He is a professional music producer turned Data Scientist who has worked with major artists like Kid Ink, Dj Mustard, BMG and garnered over 18 million plays. He has...
