Filming Locations around New York City - Visualization using Shiny Dashboard
Introduction
Many movies have been filmed in New York City, but it is difficult to get a sense of where in the city their scenes were shot. I used a dataset based on the book Scenes from the City by James Sanders, available on NYC Open Data, to create a navigable map that displays this information. The dataset includes coordinates, which allowed me to pinpoint exact locations. My goal was to visualize where in the city movies were filmed and to provide additional information from IMDB, such as ratings, poster images, and direct IMDB links. For this project I used R to clean and aggregate the data and Shiny to visualize it.
Datasets
Film Locations
Film scene locations for movies filmed in New York were obtained from the NYC Open Data website: https://data.cityofnewyork.us/Business/Filming-Locations-Scenes-from-the-City-/qb3k-n8mm
This dataset is based on the book Scenes from the City by James Sanders: https://www.amazon.com/Scenes-City-Filmmaking-New-York/dp/0847828905
This dataset does not include every movie filmed in New York (a complete list would be difficult, yet interesting, to collect!) and does not include movies filmed after 2006. I hope to address this limitation in the future, but it is important to keep in mind when using the current application.
Kaggle IMDB-5000
This dataset was created by Chuan Sun, who scraped data from the IMDB website. It provided me with ratings for some of the movies in the film locations dataset: https://www.kaggle.com/deepmatrix/imdb-5000-movie-dataset
ggplot2movies
I also installed the ggplot2movies package, which gave me access to the movies dataset. It provided additional IMDB ratings that were not present in the Kaggle dataset.
Data Aggregation and Manipulation
Data Cleaning
To join the three datasets, the Kaggle IMDB dataset required minor cleaning. First, I converted the titles from factors to strings. The titles still had trailing white space, which would have prevented matches when joining the datasets, so I removed it. In the film locations dataset, the "Year" column also had to be converted to an integer.
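The cleaning steps above can be sketched in base R as follows. This is a minimal illustration rather than the project's actual code: the toy data frames and the column names `movie_title`, `imdb_score`, and `Film` are assumptions made for the example.

```r
# Toy stand-ins for the real tables; column names are assumed for illustration.
imdb <- data.frame(
  movie_title = factor(c("Ghostbusters ", "Taxi Driver ")),  # note trailing space
  imdb_score  = c(7.8, 8.3)
)
locations <- data.frame(
  Film = c("Ghostbusters", "Taxi Driver"),
  Year = c("1984", "1976"),            # year stored as text
  stringsAsFactors = FALSE
)

# Factor -> character, then strip leading/trailing white space
imdb$movie_title <- trimws(as.character(imdb$movie_title))

# Year as text -> integer
locations$Year <- as.integer(locations$Year)
```

By default `trimws` strips only spaces, tabs, and newlines. If the trailing character turns out to be something else (for example a non-breaking space), its `whitespace` argument, available in R 3.6 and later, can be widened to cover it.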
Joining Databases
After cleaning, I performed two left joins with the film locations dataset as the left table, so that no films in the locations dataset would be dropped.
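As a rough sketch of the two left joins, base R's `merge` with `all.x = TRUE` keeps every row of the left table. The join keys, the column names, and the final step that combines the two score columns are assumptions for this example. The original project may have used different keys or a different join function such as dplyr's `left_join`.

```r
# Toy tables; names and keys are assumed for illustration.
locations <- data.frame(
  Film = c("Ghostbusters", "Taxi Driver", "Manhattan"),
  Year = c(1984L, 1976L, 1979L),
  stringsAsFactors = FALSE
)
kaggle <- data.frame(Film = "Ghostbusters", score_kaggle = 7.8,
                     stringsAsFactors = FALSE)
ggmovies <- data.frame(Film = "Taxi Driver", Year = 1976L, score_gg = 8.3,
                       stringsAsFactors = FALSE)

# Left join 1: keep all location rows, add Kaggle scores where titles match
joined <- merge(locations, kaggle, by = "Film", all.x = TRUE)

# Left join 2: keep all rows again, add ggplot2movies scores
joined <- merge(joined, ggmovies, by = c("Film", "Year"), all.x = TRUE)

# Assumed combination rule: prefer the Kaggle score, fall back to the other
joined$imdb_score <- ifelse(is.na(joined$score_kaggle),
                            joined$score_gg, joined$score_kaggle)
```

Films with no match in either ratings table (here "Manhattan") stay in the result with `NA` as their score, which is the point of using left joins.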
Shiny Dashboard
Interactive Map with active filtering
The core of the application uses leaflet to allow the user to zoom in and out of a map of New York with markers indicating the locations of scenes from movies. On the side panel, sliders for both the year of release and IMDB score allow the user to filter markers with a great degree of specificity.
Filtering by IMDB Score:
Filtering by Year:
Clicking on a marker produces a pop-up with information about the movie and a direct link to its IMDB page.
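The slider filtering and the pop-up content described above could be implemented along these lines. The helper names `filter_films` and `popup_html`, the column names, and the URL are hypothetical. Dropping films without a score is also an assumption about how unrated films are handled.

```r
# Hypothetical helper: keep rows whose year and score fall inside both ranges.
# Rows with a missing year or score are dropped (an assumption).
filter_films <- function(films, year_range, score_range) {
  keep <- !is.na(films$Year) & !is.na(films$imdb_score) &
    films$Year >= year_range[1] & films$Year <= year_range[2] &
    films$imdb_score >= score_range[1] & films$imdb_score <= score_range[2]
  films[keep, , drop = FALSE]
}

# Hypothetical helper: build the HTML shown in a marker pop-up.
popup_html <- function(title, year, score, imdb_url) {
  sprintf('<b>%s</b> (%d)<br/>IMDB score: %.1f<br/><a href="%s" target="_blank">View on IMDB</a>',
          title, year, score, imdb_url)
}

films <- data.frame(
  Film = c("Ghostbusters", "Taxi Driver", "Manhattan"),
  Year = c(1984L, 1976L, 1979L),
  imdb_score = c(7.8, 8.3, NA),
  stringsAsFactors = FALSE
)

shown <- filter_films(films, year_range = c(1980, 1990), score_range = c(7, 10))
popup <- popup_html(shown$Film, shown$Year, shown$imdb_score,
                    "https://www.imdb.com/title/tt0000000/")  # placeholder URL
```

In a Shiny app, a two-ended `sliderInput` returns a length-2 vector, so the two ranges could come straight from the slider inputs inside a reactive expression. The resulting strings could then be passed to the `popup` argument of leaflet's `addMarkers`.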
Database with active filtering
The Data panel allows users to view the database, which can also be filtered by year and IMDB score. From this panel, users can sort by column or perform a search.
The By Group tab allows the user to group the data by director, borough, or neighborhood.
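A grouped summary like the one in the By Group tab might be computed as below. The choice of summary columns (scene count and mean score) and the column names are assumptions, since the exact contents of the tab are not described here.

```r
# Toy data; column names are assumed for illustration.
films <- data.frame(
  Film = c("Ghostbusters", "Taxi Driver", "Do the Right Thing"),
  Borough = c("Manhattan", "Manhattan", "Brooklyn"),
  imdb_score = c(7.8, 8.3, 7.9),
  stringsAsFactors = FALSE
)

# Mean score per borough
mean_score <- aggregate(imdb_score ~ Borough, data = films, FUN = mean)

# Number of scenes per borough
counts <- as.data.frame(table(Borough = films$Borough),
                        responseName = "n_scenes")

# One row per group with both summaries
by_borough <- merge(counts, mean_score, by = "Borough")
```

Swapping `Borough` for a director or neighborhood column would give the other groupings.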
Graphs with active filtering
The user can view histograms (shown below), box plots, or scatter plots of continuous variables such as IMDB score, budget, gross income, and movie length.
Conclusions
Although the dataset is incomplete, covering only films from the book Scenes from the City, the application provides a useful overview of locations around New York where movies have been filmed. Users can filter by year and IMDB score on an interactive map, group the data by director, borough, or neighborhood, and view graphical displays of the variables. Unsurprisingly, the vast majority of movies in the dataset were filmed in Manhattan, and most have an IMDB rating of around 7. Interestingly, there was a small dip in the number of movies filmed in the city between 1975 and 1990, possibly related to increases in criminal activity during this period.
(crime rate image from Reddit)