On Visualizing Hollywood BoxOffice Revenue

Sricharan Maddineni
Posted on Feb 1, 2016

Contributed by Sricharan Maddineni. He is currently enrolled in the NYC Data Science Academy 12-week, full-time Data Science Bootcamp, which runs from January 11th to April 1st, 2016. This post is based on his first class project, R visualization (due in the 2nd week of the program).

My goal was to analyze the accuracy of news headlines about Hollywood, including but not limited to the changes in domestic versus overseas BoxOffice revenue and the marketability of different genres overseas. Specifically, I focused on articles that had few or no visualizations but drew clear conclusions based on general trends.


Outline

  1. Data
  2. Headlines
  3. Visualizations
  4. Conclusion

Data

I used two websites for my analysis: IMDB and BoxOfficeMojo. The IMDB datasets were used to aggregate movie ratings, and the BoxOfficeMojo dataset was used for movie finance analysis. These datasets were cleaned and joined to create a single dataset containing movie ratings and revenues.

There was an alternative IMDB dataset that contained aggregate movie ratings and finances, but the IMDB datasets I chose covered a subset of 668 IMDB users who all reviewed the same subset of movies. I considered this the more robust choice because ratings from the same 668 users are more comparable across movies, whereas in the alternative dataset each movie was rated by a varying number of users.
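As a rough illustration of what aggregating per-user ratings into one score per movie involves, here is a minimal Python sketch. It is not the code from the gists (the project itself was done in R); the row layout `(userID, movieID, rating)` and the sample values are assumptions made up for the example.

```python
from collections import defaultdict


def mean_rating_by_movie(ratings):
    """Average the individual user ratings for each movieID."""
    totals = defaultdict(lambda: [0.0, 0])  # movieID -> [sum of ratings, count]
    for _user_id, movie_id, rating in ratings:
        totals[movie_id][0] += rating
        totals[movie_id][1] += 1
    return {movie_id: s / n for movie_id, (s, n) in totals.items()}


# Made-up rows (userID, movieID, rating); not taken from the IMDB data.
sample_ratings = [(1, 10, 4.0), (2, 10, 3.0), (1, 20, 5.0), (2, 20, 4.0)]
averages = mean_rating_by_movie(sample_ratings)
print(averages)
```

Because every user in the chosen subset rated the same movies, each average here is taken over the same group of raters, which is what makes the scores comparable.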


First IMDB dataset containing movie ratings by movieID.

https://gist.github.com/sriyoda/f85e24ce29e2155bb904

[Images: first IMDB dataset, before and after cleaning]

Cleaning the second IMDB dataset and then joining.

https://gist.github.com/sriyoda/ce9de8a499d00027f720

[Images: second IMDB dataset before cleaning, and the final cleaned IMDB dataset]

BoxOfficeMojo Dataset

https://gist.github.com/sriyoda/47dd6d8e7108792c3e40

[Images: BoxOfficeMojo dataset, before and after cleaning]

It was important to clean the years in both datasets because multiple movies share the same title. Joining on title alone would produce incorrect matches, so the join uses both year and title.
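The idea of joining on year and title together can be sketched in Python as follows. This is only an illustration under assumptions: it supposes that one dataset stores the year inside the title string as `Title (YYYY)` and the other already has separate title and year fields. The movie names and numbers are hypothetical.

```python
import re

# Matches "Some Title (1999)" and captures the title and the four-digit year.
TITLE_YEAR = re.compile(r"^(?P<title>.*?)\s*\((?P<year>\d{4})\)\s*$")


def split_title_year(raw):
    """Split 'Title (YYYY)' into (title, year); year is None if absent."""
    match = TITLE_YEAR.match(raw)
    if match is None:
        return raw.strip(), None
    return match.group("title"), int(match.group("year"))


def join_on_title_year(ratings, revenues):
    """Inner join of ratings (keyed by raw title) with revenues (keyed by (title, year))."""
    joined = {}
    for raw_title, rating in ratings.items():
        key = split_title_year(raw_title)
        if key in revenues:
            joined[key] = (rating, revenues[key])
    return joined


# Hypothetical remake pair: same title, different years.
ratings = {"Example Movie (1950)": 3.6, "Example Movie (2015)": 3.4}
revenues = {("Example Movie", 1950): 85.0, ("Example Movie", 2015): 201.0}
merged = join_on_title_year(ratings, revenues)
print(merged)
```

With the year in the key, the two films named "Example Movie" stay separate instead of being matched to each other's revenue.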

Final Movie Dataset

https://gist.github.com/sriyoda/e492640a7c1f3de4da70


Headline #1

“Where Americans once were the only game in town for Hollywood, U.S. audiences are taking a back seat to moviegoers across the globe — particularly in Asia.”

“And foreign markets are getting the industry's highest-profile films first. Battleship opened in Asia and Europe more than a month before it reached the USA last May.”

http://usat.ly/ZuKGrn

Visualization

Conclusion: Foreign revenue has accounted for an increasing percentage of total revenue every year since 1992, as shown by the increasing slopes of the regression lines.
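To make the quantity behind this conclusion concrete, the sketch below computes the foreign share of total revenue per year and the least-squares slope of that share over time. It is written in Python for illustration and is not the plotting code from the gist; the yearly totals are invented, and the helper names are assumptions.

```python
def foreign_share(domestic, foreign):
    """Fraction of total revenue that came from outside the domestic market."""
    return foreign / (domestic + foreign)


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up yearly totals (year, domestic, foreign); not from BoxOfficeMojo.
yearly = [(2000, 60.0, 40.0), (2001, 55.0, 45.0), (2002, 50.0, 50.0)]
years = [year for year, _, _ in yearly]
shares = [foreign_share(dom, frn) for _, dom, frn in yearly]
slope = ols_slope(years, shares)
print(shares, slope)
```

A positive slope means the foreign share grows by roughly that fraction per year.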

https://gist.github.com/sriyoda/0215caff3f02ef234a4b


Headline #2

“Big noisy spectacle travels best. Jason Statham, the close-cropped star of many a mindlessly violent film, is a particular Russian favourite. Films based on well-known literature (including cartoon books) and myths may also fare well.”

“Comedy travels badly: Will Ferrell and Adam Sandler provoke guffaws at home but incomprehension abroad”

http://econ.st/KAk82t

Visualization

Conclusion: The overseas density plot confirms that the Drama and Comedy genres perform worse overseas than domestically. A majority of Drama/Comedy movies generate less than $50 million overseas, while the Action and Animation genres show a much wider distribution of overseas revenues.
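The "majority under 50 million" reading of a density plot can also be checked numerically. The Python sketch below computes, per genre, the share of movies whose overseas revenue falls below a threshold. It is an assumption-laden illustration, not the author's method: the `(genre, overseas revenue in millions)` layout and all values are made up.

```python
from collections import defaultdict


def share_below(movies, threshold):
    """Per genre, the fraction of movies with overseas revenue below threshold."""
    counts = defaultdict(lambda: [0, 0])  # genre -> [count below threshold, total]
    for genre, overseas in movies:
        counts[genre][1] += 1
        if overseas < threshold:
            counts[genre][0] += 1
    return {genre: below / total for genre, (below, total) in counts.items()}


# Made-up (genre, overseas revenue in millions); not from the joined dataset.
sample_movies = [
    ("Comedy", 20.0), ("Comedy", 35.0), ("Comedy", 80.0),
    ("Action", 40.0), ("Action", 150.0), ("Action", 300.0), ("Action", 10.0),
]
below_50 = share_below(sample_movies, threshold=50.0)
print(below_50)
```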

https://gist.github.com/sriyoda/a072ddd0e30166d3deb5


Headline #3

“...little effort is being made to deliver sophisticated storytelling ... movies are crafted mainly to provoke visceral - as opposed to intellectual response”

bbc.com/culture/story/20130620-is-china-hollywoods-future

Visualization

Conclusion: The rating sweet spot that generates the most revenue is between 3.5 and 3.8. Movies that score higher than 4 show a sharp decline in revenues. This could be because the average moviegoer more easily appreciates an average movie (cough, cough: Michael Bay movies).
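One way to look for a rating "sweet spot" without a plot is to group movies into rating bins and compare the mean revenue per bin. The Python sketch below does this with a bin width of 0.5, which is an arbitrary choice for the example; the data and the function name are made up and are not taken from the gist.

```python
import math
from collections import defaultdict


def mean_revenue_by_rating_bin(movies, width=0.5):
    """Group movies into rating bins of the given width; return mean revenue per bin."""
    bins = defaultdict(list)  # lower edge of the bin -> list of revenues
    for rating, revenue in movies:
        lower_edge = math.floor(rating / width) * width
        bins[lower_edge].append(revenue)
    return {edge: sum(revs) / len(revs) for edge, revs in sorted(bins.items())}


# Made-up (average rating, revenue in millions) pairs.
sample_movies = [(3.6, 100.0), (3.7, 200.0), (4.2, 50.0), (2.9, 30.0)]
by_bin = mean_revenue_by_rating_bin(sample_movies)
print(by_bin)
```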

https://gist.github.com/sriyoda/b227078f6e66aeed09fe


Visualization

Conclusion: There seems to be a linear trend between the number of movies a studio produces and its total domestic revenue. This doesn’t have to be the case; for example, Lionsgate produced 5 of the top 100 movies in 2015 and could have generated 100 mil in revenue (~50 mil actual). This leads me to believe that movie studios do about equally well at selecting which movies to produce.

https://gist.github.com/sriyoda/e5247fbb40fada999820


 

Visualization

Conclusion: There seems to be a linear trend between how much revenue a movie made on its opening weekend and its lifetime domestic revenue. Hollywood treats opening-weekend numbers as a good predictor of how well a movie will perform, and this plot supports that view.
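The strength of a linear trend like this one is commonly summarized with the Pearson correlation coefficient. The Python sketch below computes it by hand for a few invented opening-weekend and lifetime figures; it is an illustration only, and the post itself does not report a correlation value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up revenues in millions: opening weekend vs. lifetime domestic.
opening = [10.0, 20.0, 30.0, 40.0]
lifetime = [35.0, 55.0, 95.0, 115.0]
r = pearson_r(opening, lifetime)
print(round(r, 3))
```

Values of `r` close to 1 indicate that lifetime revenue rises almost in lockstep with opening-weekend revenue.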

https://gist.github.com/sriyoda/f4e777a83142732efeef


Visualization

Conclusion: The highest-grossing movie of each year has accounted for a decreasing percentage of that year's total BoxOffice revenue. This could suggest that studios are making more money per movie, producing more movies, or a combination of the two. Further analysis with different datasets is required.
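The percentage discussed here is the top movie's gross divided by the summed gross of all movies released that year. A minimal Python sketch of that calculation, using invented figures and a hypothetical `(year, gross)` row layout rather than the actual BoxOfficeMojo data:

```python
from collections import defaultdict


def top_movie_share(movies):
    """For each year, the top-grossing movie's share of that year's total gross."""
    by_year = defaultdict(list)
    for year, gross in movies:
        by_year[year].append(gross)
    return {year: max(g) / sum(g) for year, g in sorted(by_year.items())}


# Made-up (year, gross in millions) rows.
sample_movies = [
    (2000, 50.0), (2000, 30.0), (2000, 20.0),
    (2001, 40.0), (2001, 40.0), (2001, 20.0),
]
top_share = top_movie_share(sample_movies)
print(top_share)
```

Note that this share can fall either because the top movie earns less or because the yearly total grows, which is why the conclusion above leaves both explanations open.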

https://gist.github.com/sriyoda/2a874d4c822938c85b5e


Final Thoughts

My data visualizations confirmed many of the conclusions drawn in the news articles. What I found most interesting was how good a predictor opening weekend turns out to be for overall performance and that movie studios are evenly matched in terms of how well they select movies. Since audience reception is such a complex factor to predict, it's surprising that the studios are consistently able to make good decisions.

 

About Author

Sricharan Maddineni

Sricharan Maddineni was a Neuroscience undergrad at Rutgers University. He is a professional music producer turned Data Scientist who has worked with major artists like Kid Ink, Dj Mustard, BMG and garnered over 18 million plays. He has...
