Moneyball for Movies
Link to the project: https://christopherwilson.shinyapps.io/hello_shiny/
Link to GitHub: https://github.com/jackparsons93/r_shiny_imdb_movies
Say you want to produce a movie. What can you do to put the odds of success in your favor? You can look at the features of other successful movies and plan to apply them to the one you’re producing. That’s the goal of the R Shiny app I built: to help movie makers decide which movies are likely to make money and become audience favorites, by identifying what successful movies have in common in the Kaggle dataset of the top 1000 movies on IMDb.
The dataset includes the IMDb score, which is based on votes from IMDb users; the metascore, which is based on film reviews from professional critics; and the number of votes, which is the total number of votes from the IMDb community. It also records gross revenue and net gross, which is the sum of the revenue from all movies in a category. The app doesn’t just provide a viewer for actors and directors; it also identifies the top movies, top directors, top genres, and top actors from the most successful films. You can consider it a “Moneyball” approach to movies.
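To make the net gross definition above concrete, here is a minimal sketch in base R. It is not the app's actual code: the data frame, its column names (title, director, gross), and every value in it are made-up placeholders chosen for illustration.

```r
# Toy data: titles, directors, and figures are made-up placeholders,
# and the column names are assumptions, not the dataset's real headers.
movies <- data.frame(
  title    = c("Movie A", "Movie B", "Movie C", "Movie D"),
  director = c("Director X", "Director Y", "Director X", "Director Y"),
  gross    = c(100, 250, 50, 25),
  stringsAsFactors = FALSE
)

# Net gross: sum the gross revenue of every movie in each category
# (here, the category is the director).
net_gross <- aggregate(gross ~ director, data = movies, FUN = sum)

# Rank the categories from highest to lowest net gross.
net_gross <- net_gross[order(-net_gross$gross), ]
print(net_gross)
```

The same pattern would apply to any other category, such as actor or genre, by changing the grouping column in the formula.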
Let’s start off by looking at a couple of word clouds. The top word cloud shows the top movies by revenue in the dataset. The top-grossing movie in the dataset is “Star Wars: Episode VII.” As you can see, the font size is proportional to the amount the movie grossed, so the Star Wars movie is in the largest font. No director outstripped the others to the same degree, which is why no one director’s name stands out as much in the second word cloud.
The word cloud immediately above shows directors ranked by IMDb score.
Now let’s take a look at some of the navigation features of the app.
The illustration above is the top navigation pane of my application. The landing page shows the word clouds we have already seen, followed by the director viewer, the actor viewer, and so on.
Here we have the left-hand panel of the directors tab, where we can select a director and a particular criterion.
The above image shows the 4 different criteria we can choose from. After we select a director and a criterion, the right side of the panel shows a bar chart of that director’s movies.
This is the bar chart for one of the directors, Darren Aronofsky. It shows the metascore (an aggregate score from professional critics) of his 3 movies in the dataset.
The next tab in my R Shiny app is the actors tab, where we can select any actor from the dataset and apply the same selection criteria as for directors. The illustration below shows the results for Tom Cruise, which include the gross earnings for all his movies in the dataset.
In the next tab of my IMDb movie explorer, we have genre analysis. Below is a bubble chart where the genre is color-coded and the size of each bubble is proportional to the movie’s gross revenue.
Consistent with the word cloud shared above, “Star Wars: Episode VII” is the biggest bubble. This view shows the details revealed when you hover over it.
In the next tab of the IMDb movie explorer, we see 3 combined plots: a violin plot of IMDb ratings across different genres, a scatter plot of IMDb score and revenue, and (shown below) a correlation plot between IMDb scores and metascores.
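The relationship behind a correlation plot like this can also be summarized as a single number. The following is a minimal sketch in base R using Pearson correlation, the default for cor(); the two vectors hold made-up placeholder scores rather than values from the dataset, and the variable names are assumptions for illustration.

```r
# Made-up placeholder scores, not values from the dataset.
# One metascore is deliberately missing to show how gaps can be handled.
imdb_rating <- c(7.6, 8.0, 8.4, 8.8, 9.2, 8.1)
meta_score  <- c(70, 65, 82, 90, 85, NA)

# Pearson correlation, using only the pairs where both scores are present.
r <- cor(imdb_rating, meta_score, use = "complete.obs")
print(r)
```

A value near 1 would mean users and critics tend to agree, a value near 0 would mean the two scores are largely unrelated.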
On the next tab of the IMDb movie explorer we see top movies, and we can select by the same criteria as above: gross, metascore, IMDb rating, and number of votes. There is also a slider that lets you select how many movies the bar chart shows.
Shown here are the top 25 movies by gross.
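The logic behind a top-N view like this one can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the app's actual code: the helper top_movies() is hypothetical, and the data frame, its column names, and its values are made-up placeholders. In a Shiny app, the criterion and the count would typically come from inputs such as selectInput() and sliderInput().

```r
# Toy data: titles and figures are made-up placeholders,
# and the column names are assumptions for illustration.
movies <- data.frame(
  title     = c("Movie A", "Movie B", "Movie C", "Movie D"),
  gross     = c(120, 310, 45, 200),
  metascore = c(81, 64, 90, 72),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rank movies by the chosen criterion (a column name)
# and keep the first n rows, as a criterion selector and a slider would.
top_movies <- function(movies, criterion, n) {
  ranked <- movies[order(movies[[criterion]], decreasing = TRUE), ]
  head(ranked, n)
}

# Top 2 movies by gross; swap in "metascore" to rank by critics instead.
top <- top_movies(movies, "gross", 2)
print(top)
```

Passing the criterion as a column name keeps one function usable for every ranking the tab offers.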
In the next tab we have top directors. The 5 criteria available for selection are shown below.
There is also a slider that allows you to choose how many directors you would like to show in the bar chart. Below is an image that shows the top 25 directors by net gross.
In the next tab we have top actors, with the same 5 selection criteria as directors and the same slider for selecting how many actors the bar chart displays.
Here we can see the top actor by gross receipts is Robert Downey Jr., followed by Tom Hanks.
In the final tab of my IMDb movie explorer, we have franchise analysis. We can select among Harry Potter, Lord of the Rings, and Star Wars, and compare each franchise by 3 criteria: gross revenue, metascore, and IMDb score. Here is a screenshot of the Lord of the Rings franchise with metascore selected.
Conclusion
Running this app gives insights into which genres, actors, and directors correlate with higher revenue. Among genres, sci-fi and adventure have the highest revenue and vote counts. IMDb score has only a small correlation with revenue, so better-rated movies do not necessarily make more money. The directors most associated with success are Steven Spielberg and Anthony Russo. The top-grossing actors are Robert Downey Jr. and Tom Hanks. Of course, you’d also need the budget to hire these top names and a role that would interest them.
Where to go from here
I hope that this data will help future filmmakers decide which movies to make, both for profit and to win the hearts and minds of their viewers. I could also send the aggregated descriptions of the top movies to an LLM to generate new movie ideas. In the future, generative AI may be able to offer suggestions for the storyline, script, and even casting based on data analytics. Down the road, these movies may even be produced by individuals who use advances in technology like OpenAI’s Sora to make movies without actors and sets, opening up new possibilities for the film industry’s evolution.