IMDB Top 1000 Films Diver
The illustrious IMDB Top 250 is a catalog of the highest rated movies ever produced. It includes cinematic masterpieces such as The Shawshank Redemption, Schindler’s List, and The Wizard of Oz, as well as masterful series like the Godfather trilogy, Christopher Nolan’s Batman films, and the Star Wars series. Action movies, dramas, comedies, romances, and horror films are all lumped together into one phenomenal list that can help any movie buff find their next obsession. To help its users navigate this list, IMDB offers several filters to narrow searches by genre, actor, release date, and more. What it lacks, however, is a breakdown of performance by these filters. "Who is the greatest actor/director of all time?" That question has been endlessly debated, yet there are still no good tools that give film enthusiasts a clear answer. To help fill this gap, I created an app using R Shiny that lets users investigate the top films of all time using their own criteria.
The data used to build this app was found on Kaggle; it comprises the top 1000 films by rating on IMDB, spanning the years 1920-2020. Each record holds the release year, director, top four actors/actresses, and up to three genre tags. It also lists the official IMDB score and the gross earnings of the film, which is the figure the app refers to as gross profit.
Navigating the App
Upon opening the app, the user is presented with an overview page that provides a simple guide to navigating the app and an explanation of its purpose. Those instructions are summarized here. The app consists of three main sections that allow the user to investigate specific aspects of the dataset: Genre Performance, Actor/Actress Investigation, and Director Investigation.
The Genre Performance section provides information about the popularity and profitability of genres over time. It is further segmented into four tabs: Film Count, Gross Profit, Ratings, and Top Film Search. The first three tabs share the same layout, with checkboxes for genre selection and a slider for release year. The resulting graph on each page shows one key aspect of a genre’s performance over time, which is particularly useful for anyone looking to identify long-term trends in genre production. For example, the changeover in popularity between western and sci-fi movies can be observed by selecting those two genres on each page.
The graph makes it clear that the heyday of western films was between 1950 and 1975, whereas sci-fi really took off later, around 1980. The Film Count and Ratings tabs in particular showcase each genre's most popular time frame. It’s reasonable to assume that the shift in popularity between these two genres is linked to the improvement of technology over time, as westerns are relatively easy to film compared to sci-fi, which usually requires more special effects. The Gross Profit tab also provides insight into how filmmaking has changed over time and how lower budget films of the mid 20th century brought in less revenue than the box office hits of today. The final tab in the Genre Performance section is a tiered list of movies separated by genre. This tab allows users to investigate the top-performing movies in each genre.
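To make the Film Count idea concrete, here is a minimal sketch of the kind of aggregation that could sit behind such a graph. It is written in Python purely for illustration and is not the app's R Shiny code. The field names (title, year, genres), the function name film_count_by_decade, the grouping by decade, and the sample records are all assumptions; the films are placeholders, not rows from the Kaggle dataset.

```python
from collections import Counter

# Hypothetical placeholder records (not real dataset rows).
# Each film carries up to three genre tags, as in the dataset description.
films = [
    {"title": "Film A", "year": 1956, "genres": ["Western", "Drama"]},
    {"title": "Film B", "year": 1968, "genres": ["Western"]},
    {"title": "Film C", "year": 1982, "genres": ["Sci-Fi", "Action"]},
    {"title": "Film D", "year": 1999, "genres": ["Sci-Fi", "Action", "Thriller"]},
    {"title": "Film E", "year": 2014, "genres": ["Sci-Fi", "Drama"]},
]


def film_count_by_decade(films, selected_genres, year_range):
    """Count films per (genre, decade) for the chosen genres and year range."""
    first, last = year_range
    counts = Counter()
    for film in films:
        # Mirrors the release-year slider: skip films outside the range.
        if not first <= film["year"] <= last:
            continue
        decade = film["year"] // 10 * 10
        # Mirrors the genre checkboxes: a film with several tags is
        # counted once under each selected tag.
        for genre in film["genres"]:
            if genre in selected_genres:
                counts[(genre, decade)] += 1
    return dict(counts)


counts = film_count_by_decade(films, {"Western", "Sci-Fi"}, (1950, 2020))
for (genre, decade), n in sorted(counts.items()):
    print(f"{genre:8} {decade}s: {n}")
```

Because a film can carry several genre tags, it contributes to every selected genre it belongs to, so the per-genre counts can add up to more than the number of films shown.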
The Actor and Actresses section allows users to investigate the popularity and success of individual actors, so long as the actor appeared in one of the top four roles in the movie (a limitation of the dataset). The section is divided into three tabs: Credits, Statistics, and Top Performance Search. The Credits tab is a list of all films in the IMDB top 1000 associated with each actor or actress. To start, the user selects an actor, such as Tom Hanks, and filters the results using the search bar at the top of the page.
By expanding the table to show a maximum of 25 results, we can see that Hanks has appeared in 14 top 1000 movies! This shouldn’t come as too much of a surprise, considering the stellar career the actor has had.
The Statistics tab gives broader information about each actor’s or actress’s credits in the IMDB top 1000. This tab allows the user to quickly view the number of credits, top genre, average rating, and average gross profit per film, as well as the highest-rated and highest-grossing films. At the bottom of the page are two plots showing the distribution of gross profit and rating for the selected actor’s or actress’s films over the length of their career.
Again using Tom Hanks as an example, we can observe the statistics of the actor’s illustrious career: 14 credits, a top genre of drama, an average rating of 8.02, an average profit of over $210 million, a top-rated film in Forrest Gump at 8.8, and a top-grossing film in Toy Story 4 at over $434 million. The plots showcase a curious trend for Hanks. His first 10 top film credits span from the early 1990s to the early 2000s. There is then a marked gap before a resurgence in the early 2010s. Further investigation suggests that Hanks began to direct a larger share of his films during the gap between his top films, limiting his availability to contribute as an actor.
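The summary values on the Statistics tab can be thought of as a small aggregation over one person's credits. The sketch below shows one way to compute them, again in Python for illustration only and not taken from the app's source. The record layout, the function name actor_statistics, and the assumption that gross is stored in millions of dollars with no missing values are all mine; the films and actors are placeholders.

```python
from collections import Counter
from statistics import mean

# Hypothetical placeholder records; "gross" is assumed to be in millions of dollars.
films = [
    {"title": "Film A", "rating": 8.0, "gross": 100.0,
     "genres": ["Drama", "Comedy"], "stars": ["Actor X", "Actor Y"]},
    {"title": "Film B", "rating": 8.6, "gross": 300.0,
     "genres": ["Drama"], "stars": ["Actor X"]},
    {"title": "Film C", "rating": 7.8, "gross": 50.0,
     "genres": ["Action"], "stars": ["Actor Y", "Actor Z"]},
]


def actor_statistics(films, actor):
    """Summarize one actor's credits; returns None if the actor has none."""
    credits = [f for f in films if actor in f["stars"]]
    if not credits:
        return None
    # Every genre tag on every credited film counts toward the top genre.
    genre_counts = Counter(g for f in credits for g in f["genres"])
    return {
        "credits": len(credits),
        "top_genre": genre_counts.most_common(1)[0][0],
        "average_rating": round(mean(f["rating"] for f in credits), 2),
        "average_gross": mean(f["gross"] for f in credits),
        "top_rated": max(credits, key=lambda f: f["rating"])["title"],
        "top_grossing": max(credits, key=lambda f: f["gross"])["title"],
    }


stats = actor_statistics(films, "Actor X")
print(stats)
```

A real dataset is likely to have films with no recorded gross, so a fuller version would need to decide whether to skip those films or treat them separately before averaging.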
The final tab in the Actor and Actresses section is a comparison search that allows the user to find the top-performing actors and actresses over a given time period, filtered again by genre. For any movie buff looking to answer the question, "Who is the top action star of all time?" this tool could prove quite useful. Using the genre dropdown and the year slider, the user can filter any of four tables to find the top actor or actress by number of credits, average film rating, average gross profit, or total gross profit. For action, we see the credit king is Harrison Ford (possibly due to the actor continuing to work into his eighties and reappearing in popular series like Star Wars and Indiana Jones several decades after they first launched). The top average rating belongs to Aaron Eckhart, the top average gross profit to Daisy Ridley, and the top total gross profit goes to Iron Man himself, Robert Downey Jr.
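The four tables in the Top Performance Search tab amount to the same filtered data ranked by four different metrics. A rough Python sketch of that logic follows. It is an illustration under assumptions rather than the app's implementation: the field names, the function name top_performers, the alphabetical tie-break, and the placeholder films are all hypothetical.

```python
from collections import defaultdict

# Hypothetical placeholder records; "gross" is assumed to be in millions of dollars.
films = [
    {"title": "Film A", "year": 1981, "rating": 8.4, "gross": 200.0,
     "genres": ["Action"], "stars": ["Actor X", "Actor Y"]},
    {"title": "Film B", "year": 1989, "rating": 8.2, "gross": 150.0,
     "genres": ["Action", "Adventure"], "stars": ["Actor X"]},
    {"title": "Film C", "year": 2008, "rating": 9.0, "gross": 500.0,
     "genres": ["Action"], "stars": ["Actor Z"]},
    {"title": "Film D", "year": 2015, "rating": 8.1, "gross": 900.0,
     "genres": ["Drama"], "stars": ["Actor Y"]},
]


def top_performers(films, genre, year_range, metric, limit=10):
    """Rank performers in one genre and year range by the chosen metric.

    metric is one of: "credits", "average_rating",
    "average_gross", "total_gross".
    """
    first, last = year_range
    ratings = defaultdict(list)
    grosses = defaultdict(list)
    for film in films:
        # Apply the genre dropdown and year slider filters.
        if genre in film["genres"] and first <= film["year"] <= last:
            for star in film["stars"]:
                ratings[star].append(film["rating"])
                grosses[star].append(film["gross"])

    table = []
    for star, star_ratings in ratings.items():
        n = len(star_ratings)
        table.append({
            "name": star,
            "credits": n,
            "average_rating": sum(star_ratings) / n,
            "average_gross": sum(grosses[star]) / n,
            "total_gross": sum(grosses[star]),
        })
    # Highest value first; ties are broken alphabetically by name.
    table.sort(key=lambda row: (-row[metric], row["name"]))
    return table[:limit]


for metric in ("credits", "average_rating", "average_gross", "total_gross"):
    leader = top_performers(films, "Action", (1920, 2020), metric)[0]
    print(f"{metric:15} {leader['name']}")
```

Note that the averages here reward performers with a single standout film, which may be why a leaderboard of this kind can surface different names for average and total metrics.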
The final section is the Directors section. It functions like the Actor and Actresses section and features the same three tabs: Credits, Statistics, and Top Performance Search. Each of these tabs operates in the same manner as its counterpart in the previous section. Unfortunately, none of the films Tom Hanks directed appear in the top 1000 films, so we will switch our focus to Christopher Nolan.
Nolan has 8 film credits, as seen in the Credits and Statistics tabs. His average rating is a lofty 8.46, with a top genre of drama and an average gross profit north of $242 million. Coming as no shock to fans of his Batman trilogy, the second film, The Dark Knight, is both his top-rated and highest-grossing movie, with a score of 9 and a profit of nearly $535 million. In the Top Performance Search tab, we can investigate the drama category to see how Nolan compares to his fellow directors. The total credits crown is taken by Martin Scorsese with an outstanding 36 top 1000 films, while Nolan does not even crack the top 10. He does, however, land in each of the three other top tens, coming in 5th in average rating, 10th in average gross, and second only to Peter Jackson in all-time total gross for drama films.
Conclusion and Consideration for Future Work
The IMDB Top 1000 Films Diver provides a tool for movie enthusiasts to investigate trends in popularity for their favorite genres, actors/actresses, and directors. Using both visual features and filterable lists, the app gives the user the ability to build individual searches that answer specific questions they may have about this esteemed list of films.
To improve upon this project, more data could be added. In a future version of this app, a web scraping program could be used to pull more data directly from the IMDB website. Such a feature could also allow continuous updating from the website so that the app always includes the most up-to-date blockbuster movies. Additionally, machine learning techniques could be applied to add new features. One such feature could be a movie recommendation page where the user provides a list of movies and the app returns a list of suggested films based on the user's input.