One Disney to rule them all?

Guilherme Strachan
Posted on May 21, 2018

Motivation

Disney's acquisitions over the years have reinvigorated the company’s position in the film industry. As you can see in the highlighted table below, nine of the 15 highest-grossing movies are from Disney. The question is: are Disney's films really above average compared to those of other production companies, or are those films just outliers? Do Pixar, Marvel and Lucasfilm have a considerable impact on the outcome?

To answer those questions, I decided to scrape the IMDb website to gather information on movies released from 2010 to 2017. For each movie, I saved the title, year, budget, worldwide gross, USA gross, opening weekend gross and genres. I used Scrapy (a Python web crawling framework) for that task.
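As a rough illustration of the record saved for each movie, here is a minimal sketch in plain Python. The `MovieRecord` class, its field names and the `parse_money` helper are hypothetical, and the sketch assumes the scraped money values arrive as strings such as "$1,234,567"; the actual spider may store things differently.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


def parse_money(text: Optional[str]) -> Optional[int]:
    """Keep only the digits of a string such as '$1,234,567'.

    Assumption: values are whole amounts in a single currency.
    Returns None when the text is empty or contains no digits.
    """
    if not text:
        return None
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


@dataclass
class MovieRecord:
    # Hypothetical field names mirroring the features listed in the post.
    title: str
    year: int
    budget: Optional[int] = None
    worldwide_gross: Optional[int] = None
    usa_gross: Optional[int] = None
    opening_weekend_gross: Optional[int] = None
    genres: List[str] = field(default_factory=list)


# Made-up values, for illustration only.
record = MovieRecord(
    title="Example Movie",
    year=2016,
    budget=parse_money("$1,000,000"),
    usa_gross=parse_money("$2,500,000"),
    worldwide_gross=parse_money(""),  # missing on the page, stored as None
    genres=["Animation", "Adventure"],
)
```

Storing missing values as `None` rather than zero keeps them distinguishable from real amounts when the data is cleaned later.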

 

Data

Even though many production companies were scraped, I selected the top six companies, which produced 505 movies, for analysis:

  • Walt Disney Pictures
  • Warner Bros.
  • Twentieth Century Fox
  • Universal Pictures
  • Columbia Pictures
  • Paramount Pictures

Some basic analysis revealed missing values in some of the features. Movies without any box office information were removed from the dataset, leaving 452 movies. For movies missing only the worldwide gross, the value could be imputed from the USA gross, since the correlation between the two variables is around 0.93. I built a simple linear regression model of worldwide gross on USA gross and used it to predict the 63 missing values.
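The imputation step can be sketched as follows. This is a minimal example under stated assumptions, not the code used for the post: the column names `usa_gross` and `worldwide_gross` are hypothetical, the line is fitted with NumPy's least-squares `polyfit`, and the toy numbers are made up.

```python
import numpy as np
import pandas as pd


def impute_worldwide_gross(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing worldwide gross using a line fitted on complete rows."""
    out = df.copy()
    known = out.dropna(subset=["usa_gross", "worldwide_gross"])
    # Fit worldwide_gross = slope * usa_gross + intercept by least squares.
    slope, intercept = np.polyfit(
        known["usa_gross"], known["worldwide_gross"], deg=1
    )
    # Only rows that lack worldwide gross but have USA gross can be predicted.
    missing = out["worldwide_gross"].isna() & out["usa_gross"].notna()
    out.loc[missing, "worldwide_gross"] = (
        slope * out.loc[missing, "usa_gross"] + intercept
    )
    return out


# Toy data (made up, in arbitrary units), with one missing worldwide gross.
movies = pd.DataFrame({
    "usa_gross": [100.0, 200.0, 300.0, 400.0],
    "worldwide_gross": [250.0, 450.0, np.nan, 850.0],
})
filled = impute_worldwide_gross(movies)
```

Fitting only on rows where both values are present avoids letting the missing entries influence the line that is later used to fill them.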

I also created a new variable, net worldwide income, defined as the difference between the worldwide gross and the budget.

 

Analysis

The boxplot below shows the distribution of movie gross for each production company. We can clearly see that Disney's typical gross is higher than that of the others. Its large interquartile range also indicates more variability than the other production companies. The scatter points beside each boxplot indicate that the distributions are right-skewed. For that reason, I applied the Box-Cox transformation before performing a hypothesis test. Based on the result of the test, I conclude that the difference between Disney's average gross and that of the other big companies is statistically significant.
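A minimal sketch of this kind of test is shown below. The post does not say which hypothesis test was used, so Welch's two-sample t-test is an assumption here, as is estimating a single Box-Cox lambda on the pooled data; the samples are synthetic lognormal draws, not the scraped grosses.

```python
import numpy as np
from scipy import stats


def compare_gross(disney, others):
    """Box-Cox transform the pooled data, then run Welch's t-test."""
    disney = np.asarray(disney, dtype=float)
    others = np.asarray(others, dtype=float)
    pooled = np.concatenate([disney, others])
    # Box-Cox needs strictly positive input; lambda is estimated by
    # maximum likelihood on the pooled sample so both groups share it.
    transformed, lam = stats.boxcox(pooled)
    disney_t = transformed[: len(disney)]
    others_t = transformed[len(disney):]
    # equal_var=False selects Welch's t-test (no equal-variance assumption).
    t_stat, p_value = stats.ttest_ind(disney_t, others_t, equal_var=False)
    return lam, t_stat, p_value


# Synthetic right-skewed samples standing in for the two groups.
rng = np.random.default_rng(0)
disney_sample = rng.lognormal(mean=6.0, sigma=0.8, size=60)
others_sample = rng.lognormal(mean=5.0, sigma=0.8, size=300)

lam, t_stat, p_value = compare_gross(disney_sample, others_sample)
print(f"lambda={lam:.3f}, t={t_stat:.2f}, p={p_value:.3g}")
```

Note that the test compares means on the transformed scale, so a significant result says the transformed grosses differ on average, which is a slightly different statement than a difference in raw dollar averages.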

Examining the companies per year, we can see that Columbia Pictures made the most movies in the early years analyzed; later on, Warner Bros. and Universal Pictures alternated in the lead. Even though Disney doesn't release the highest number of films, it has the highest total worldwide gross.
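The per-year comparison can be sketched with a pandas group-by. The column names and the toy rows below are made up for illustration and do not reproduce the scraped data.

```python
import pandas as pd

# Toy data (made up): one row per movie.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "year": [2010, 2010, 2010, 2011, 2011, 2011],
    "company": [
        "Columbia Pictures", "Columbia Pictures", "Walt Disney Pictures",
        "Warner Bros.", "Warner Bros.", "Walt Disney Pictures",
    ],
    "worldwide_gross": [100, 150, 900, 200, 300, 1000],
})

# Number of movies and total worldwide gross per company and year.
per_year = (
    movies.groupby(["year", "company"])
    .agg(n_movies=("title", "count"), total_gross=("worldwide_gross", "sum"))
    .reset_index()
)

# For each year, pick the company with the most movies and the one
# with the highest total gross (ties resolve to the first row found).
most_films = per_year.loc[per_year.groupby("year")["n_movies"].idxmax()]
top_gross = per_year.loc[per_year.groupby("year")["total_gross"].idxmax()]

print(most_films[["year", "company", "n_movies"]])
print(top_gross[["year", "company", "total_gross"]])
```

Keeping the count and the total in the same table makes it easy to see a company that releases fewer films yet leads on gross.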

 

To understand how Disney managed to break the record for total worldwide gross in 2016, I analyzed its movies over the years by subdivision. The bubble graphs below show that many of the Pixar, Marvel and Star Wars movies contributed strongly to Disney's revenue. The size of each bubble represents the difference between gross and budget, showing which movies had the biggest net return. The dashboard linked at the end of the post shows interactively what each bubble represents, along with additional information.

In 2016 alone, every Disney subdivision released a major film. Marvel released Captain America: Civil War and Doctor Strange. Pixar released Finding Dory. Lucasfilm released Rogue One: A Star Wars Story. Disney Animation released two additional movies: Moana and Zootopia.

Future Work

Most of the production companies have divisions and subsidiaries, which can be a problem for how they are represented. For some movies, IMDb didn't include the parent company in the list of producers. To make up for that, Wikipedia could be scraped to gather the parent company of each subdivision for more accurate results.

It’s also possible to analyze the distribution of film releases over the year and try to extract some insights from it, for example how each production company plans its yearly release schedule.

 

Conclusions

Disney has been leading the box office war against other major production companies, and it will probably continue to do so. The indications for 2018 are good for Disney. Two Marvel movies are already in the top 10 box office list (Black Panther and Avengers: Infinity War). The Avengers movie reached 1 billion dollars in record time (on its 10th day). Solo: A Star Wars Story (Lucasfilm) is anticipated to open to record-breaking numbers over Memorial Day. Pixar is releasing the second Incredibles movie in the summer. And the Wreck-It Ralph sequel (Disney Animation) is due at the end of the year. With such a lineup, it’s possible that Disney will beat its own record this year.

 

Plotly Dashboard

Code in GitHub

