One Disney to rule them all?

Posted on May 21, 2018


Disney's acquisitions over the years have reinvigorated the company’s standing in the film industry. As the highlighted table below shows, nine of the 15 highest-grossing movies are from Disney. The question is: are Disney's movies really above average compared with those of other production companies, or are those films just outliers? And do Pixar, Marvel and Lucasfilm have a considerable impact on Disney's results?

To answer those questions, I scraped the IMDb website to gather information on movies released from 2010 to 2017. For each movie, I saved the title, year, budget, worldwide gross, USA gross, opening weekend gross and genres. I used Scrapy (a Python web crawling framework) for this task.
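The author's Scrapy spider lives in the GitHub repository linked at the end of the post. As a minimal, dependency-free sketch, each scraped movie could be stored as a record like the one below, together with a helper that turns a box office string into an integer. The field names, the `parse_dollars` helper and the assumed `"$1,234,567"` text format are illustrative assumptions, not taken from the author's code or from Scrapy's API.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MovieRecord:
    """One scraped movie; the fields mirror the attributes listed above."""
    title: str
    year: int
    budget: Optional[int] = None
    worldwide_gross: Optional[int] = None
    usa_gross: Optional[int] = None
    opening_weekend_gross: Optional[int] = None
    genres: List[str] = field(default_factory=list)


def parse_dollars(text: Optional[str]) -> Optional[int]:
    """Convert a string such as '$1,500,000 (estimated)' to an int.

    Returns None when the field is missing or is not in US dollars, so
    missing values stay explicit instead of silently becoming 0.
    """
    if not text:
        return None
    # Hypothetical format: a dollar sign followed by digits and commas.
    match = re.search(r"\$\s*(\d[\d,]*)", text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))
```

For example, `MovieRecord(title="Example", year=2016, budget=parse_dollars("$1,500,000 (estimated)"))` stores a budget of 1500000. Returning None for non-dollar amounts avoids mixing currencies in the same column.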



Although many production companies were scraped, I selected the top six companies, which together produced 505 movies, for analysis:

  • Walt Disney Pictures
  • Warner Bros.
  • Twentieth Century Fox
  • Universal Pictures
  • Columbia Pictures
  • Paramount Pictures

During some basic exploratory analysis, I found missing values in some of the features. Movies without any box office information were removed from the dataset, leaving 452 movies. For movies missing only the worldwide gross, the value could be imputed from the USA gross, since the correlation between those two variables is around 0.93. I fit a simple linear regression model of worldwide gross on USA gross and used it to predict the 63 missing values.
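A minimal sketch of the cleaning and imputation steps, in plain Python so it has no dependencies. It assumes each movie is a dict with hypothetical `usa_gross` and `worldwide_gross` keys (None when missing) and treats "no box office information" as both grosses missing; the author's actual notebook may differ, and scikit-learn's `LinearRegression` would perform the same ordinary least squares fit.

```python
import math


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson_r(xs, ys):
    """Pearson correlation, used to check that USA gross predicts worldwide gross."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def clean_and_impute(movies):
    """Drop movies with no grosses, then impute missing worldwide gross from USA gross."""
    # 1. Keep movies with at least one gross figure (copies leave the input untouched).
    kept = [dict(m) for m in movies
            if m["usa_gross"] is not None or m["worldwide_gross"] is not None]
    # 2. Fit only on movies where both grosses are known.
    complete = [m for m in kept
                if m["usa_gross"] is not None and m["worldwide_gross"] is not None]
    xs = [m["usa_gross"] for m in complete]
    ys = [m["worldwide_gross"] for m in complete]
    r = pearson_r(xs, ys)
    intercept, slope = fit_simple_linear_regression(xs, ys)
    # 3. Predict the worldwide gross where only the USA gross is available.
    imputed = 0
    for m in kept:
        if m["worldwide_gross"] is None:
            m["worldwide_gross"] = intercept + slope * m["usa_gross"]
            imputed += 1
    return kept, imputed, r
```

Fitting only on rows where both values are known keeps the model from being trained on values it later fills in.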

I also created a new variable, net worldwide income, defined as the difference between the worldwide gross and the budget.
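As a formula, net worldwide income is worldwide gross minus budget. A tiny helper, with a hypothetical name, that also makes the missing-budget case explicit:

```python
from typing import Optional


def net_worldwide_income(worldwide_gross: Optional[float],
                         budget: Optional[float]) -> Optional[float]:
    """Worldwide gross minus budget; None if either input is missing.

    Propagating None (instead of treating a missing budget as 0) keeps films
    with unknown budgets from looking like pure profit.
    """
    if worldwide_gross is None or budget is None:
        return None
    return worldwide_gross - budget
```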



The boxplot below shows the gross distribution of all movies, grouped by production company. Disney clearly has a higher average than the others, and its larger interquartile range shows that it also has more variability than the other production companies. The individual points plotted beside each box indicate that the distributions are right-skewed. For that reason, I applied a Box-Cox transformation to bring the data closer to a normal distribution before performing a hypothesis test. Based on the test result, I can conclude that Disney has a statistically significant difference in average gross compared with the other big companies.
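The post does not name the hypothesis test, so the sketch below assumes a two-sample comparison of Disney against all other companies combined: a Box-Cox transform with one lambda chosen by maximum likelihood on the pooled data, then Welch's t statistic with a permutation p-value so it runs on the standard library alone. With SciPy installed, `scipy.stats.boxcox` (which estimates lambda when `lmbda` is omitted) and `scipy.stats.ttest_ind(..., equal_var=False)` are the usual shortcuts. Grosses are assumed to be positive and expressed in millions of dollars to keep the transformed values well scaled; all function names are hypothetical.

```python
import math
import random


def boxcox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def boxcox_loglik(values, lam):
    """Profile log-likelihood of lambda under a normal model, up to a constant."""
    n = len(values)
    logs = [math.log(v) for v in values]
    # The variance ignores the "- 1" shift, so compute it on x**lam (or log x)
    # and divide by lam**2; this avoids cancellation for very large or small values.
    z = logs if lam == 0 else [v ** lam for v in values]
    mean = sum(z) / n
    var = sum((t - mean) ** 2 for t in z) / n
    if lam != 0:
        var /= lam ** 2
    if var == 0:
        return -math.inf
    return -n / 2 * math.log(var) + (lam - 1) * sum(logs)


def best_lambda(values):
    """Maximum-likelihood lambda found by a simple grid search over [-2, 2]."""
    grid = [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: boxcox_loglik(values, lam))


def welch_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided p-value for Welch's t by randomly relabelling the pooled data."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


def compare_disney(disney, others):
    """Box-Cox both groups with one pooled lambda, then test the mean difference."""
    lam = best_lambda(disney + others)
    d, o = boxcox(disney, lam), boxcox(others, lam)
    return lam, welch_t(d, o), permutation_p_value(d, o)
```

Using a single pooled lambda keeps both groups on the same transformed scale, so the t statistic compares like with like.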

Examining the companies by year, we can see that Columbia Pictures made the most movies in the early years of the period analyzed; later on, Warner Bros. and Universal Pictures alternated in the lead. Even though Disney doesn't release the most films, it has the highest total worldwide gross.
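Per-year film counts and total grosses are simple group-by aggregations. A plain-Python sketch, assuming each movie dict has hypothetical `year`, `company` and `worldwide_gross` keys (with missing grosses already imputed); pandas `groupby` would be the usual alternative.

```python
from collections import defaultdict


def yearly_summary(movies):
    """Count films and sum worldwide gross per (year, company)."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for m in movies:
        key = (m["year"], m["company"])
        counts[key] += 1
        totals[key] += m["worldwide_gross"]
    return counts, totals


def leader_by_year(table):
    """For each year, the company with the largest value in `table` (first one wins ties)."""
    best = {}
    for (year, company), value in table.items():
        if year not in best or value > best[year][1]:
            best[year] = (company, value)
    return {year: company for year, (company, _) in sorted(best.items())}
```

Calling `leader_by_year` once on the counts and once on the totals shows the pattern described above: the company releasing the most films in a year is not necessarily the one with the highest gross.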


To understand how Disney managed to break the record for total worldwide gross in 2016, I analyzed its movies over the years by subdivision. The bubble graphs below show that many Pixar, Marvel and Star Wars movies had a strong positive influence on Disney's revenue. The size of each bubble represents the difference between gross and budget, showing which films had the biggest net return. The interactive dashboard linked at the end of the post shows what each bubble represents, along with additional information.
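The post does not say how the bubble sizes were scaled. One common choice, sketched here under that assumption, is to make the bubble area (not its diameter) proportional to net return, so a film that earned four times as much looks four times as big rather than sixteen. `bubble_sizes`, `max_size` and the handling of money-losing films are hypothetical; Plotly scatter markers also support `sizemode="area"` for the same effect.

```python
import math


def bubble_sizes(movies, max_size=60.0):
    """Marker diameters whose areas are proportional to net return (gross - budget).

    Films with a non-positive net return get size 0 in this sketch.
    """
    nets = [max(m["worldwide_gross"] - m["budget"], 0.0) for m in movies]
    largest = max(nets, default=0.0)
    if largest == 0:
        return [0.0] * len(movies)
    # Diameter proportional to sqrt(net) means area proportional to net.
    return [max_size * math.sqrt(net / largest) for net in nets]
```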

In 2016 alone, every Disney subdivision released a major film. Marvel released Captain America: Civil War and Doctor Strange, Pixar released Finding Dory, and Lucasfilm released Rogue One: A Star Wars Story. Disney Animation released two more movies: Moana and Zootopia.

Future Work

Most production companies have divisions and subsidiaries, which complicates how their movies are represented. For some movies, IMDb didn't include the parent company in the list of producers. To make up for that, Wikipedia could be scraped to find the parent company of each subdivision, giving more accurate results.
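Once subsidiary-to-parent pairs have been scraped, attributing a movie is a lookup that follows the chain up to the top-level company. A minimal sketch; the mapping entries are illustrative placeholders labelled with the post's "Walt Disney Pictures" name, not a verified corporate hierarchy, and the helper names are hypothetical.

```python
# Illustrative subsidiary -> parent pairs; real ones would come from the Wikipedia scrape.
PARENT_COMPANY = {
    "Pixar Animation Studios": "Walt Disney Pictures",
    "Marvel Studios": "Walt Disney Pictures",
    "Lucasfilm": "Walt Disney Pictures",
    "Walt Disney Animation Studios": "Walt Disney Pictures",
}


def top_parent(company, parent_map):
    """Follow subsidiary -> parent links until reaching a company with no parent."""
    seen = set()
    while company in parent_map and company not in seen:
        seen.add(company)  # guards against accidental cycles in scraped data
        company = parent_map[company]
    return company


def resolve_parents(producers, parent_map=PARENT_COMPANY):
    """Top-level parent companies for one movie's list of producers."""
    return {top_parent(p, parent_map) for p in producers}
```

Companies that are not in the map are treated as their own parent, so independent producers pass through unchanged.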

It’s also possible to analyze how each company's films are distributed over the calendar year and extract insights from that; for example, to see how each production company plans its yearly release schedule.
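The dataset described above stores only each movie's year, so this analysis would also need release dates from the scrape. Assuming a hypothetical `release_date` field (a `datetime.date`) and a `company` field, counting releases per calendar month is a small aggregation:

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, List


def releases_by_month(movies: List[dict]) -> Dict[str, Counter]:
    """Map each company to a Counter of release months (1 = January, 12 = December)."""
    by_company: Dict[str, Counter] = defaultdict(Counter)
    for m in movies:
        release: date = m["release_date"]
        by_company[m["company"]][release.month] += 1
    return by_company
```

For example, `releases_by_month(movies)["Walt Disney Pictures"].most_common(3)` would list that company's three busiest release months.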



Disney has been leading the box office war against the other major production companies, and it will probably continue to do so. The signs for 2018 are good for Disney. Two Marvel movies are already in the top 10 box office list (Black Panther and Avengers: Infinity War). Avengers: Infinity War reached 1 billion dollars in record time (on its 10th day). Solo: A Star Wars Story (Lucasfilm) is anticipated to open to record-breaking numbers over Memorial Day. Pixar is releasing the second Incredibles movie this summer, and the Wreck-It Ralph sequel (Disney Animation) is due at the end of the year. With such a lineup, Disney may well beat its own record this year.


Plotly Dashboard

Code in GitHub

About Author

Guilherme Strachan

Guilherme Strachan is a software developer making his way into the data science field. He has a Master's Degree in Electrical Engineering with an emphasis in Computational Intelligence. He is skilled in problem solving, machine learning models and...