Filming Locations around New York City - Visualization using Shiny Dashboard

Posted on Feb 2, 2017

Introduction

Many movies have been filmed in New York City, but it is difficult to get a sense of where in the city their scenes were shot. I used a dataset based on the book Scenes from the City by James Sanders, available on NYC Open Data, to create a navigable map that displays this information. The dataset includes coordinates, which allowed me to pinpoint exact locations. My goal was to visualize where in the city movies were filmed and to provide additional information from IMDB, such as ratings, poster images, and direct IMDB links. For this project I used R to clean and aggregate the data and Shiny to visualize it.

 

Datasets

Film Locations

Film scene locations for movies filmed in New York were obtained from the NYC Open Data website: https://data.cityofnewyork.us/Business/Filming-Locations-Scenes-from-the-City-/qb3k-n8mm

This dataset is based on the book Scenes from the City by James Sanders: https://www.amazon.com/Scenes-City-Filmmaking-New-York/dp/0847828905

This dataset does not include every movie filmed in New York (a complete list would be difficult, yet interesting, to collect!) and does not include movies filmed after 2006. I hope to address this limitation in the future, but it is important to keep in mind for the current application.

 

Kaggle IMDB-5000

This dataset was created by Chuan Sun, who scraped data from the IMDB website. It provided ratings for some of the movies in the film locations dataset: https://www.kaggle.com/deepmatrix/imdb-5000-movie-dataset

 

ggplot2movies

I also installed the ggplot2movies package, whose movies dataset provided additional IMDB ratings that were not present in the Kaggle dataset.

 

Data Aggregation and Manipulation

Data Cleaning

In order to join the three datasets, the Kaggle IMDB dataset required minor cleaning. First, I converted the titles from factors to strings. The titles still had trailing white space, which would have prevented them from matching when joining the datasets, so I removed it. The film locations dataset also required the column "Year" to be converted to an integer.
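The cleaning steps described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the two small data frames are made-up stand-ins for the real data, and the column names movie_title, imdb_score, and Film are hypothetical. The original code may have used different functions.

```r
# Made-up stand-in for the Kaggle IMDB data (column names are assumptions).
# Titles are factors and carry trailing white space, as described in the text.
imdb <- data.frame(
  movie_title = factor(c("Film A ", "Film B  ")),
  imdb_score  = c(8.0, 7.5)
)

# Made-up stand-in for the film locations data, with Year read in as text.
locations <- data.frame(
  Film = c("Film A", "Film B"),
  Year = c("1976", "1977"),
  stringsAsFactors = FALSE
)

# Factor -> character, then strip leading and trailing white space.
imdb$movie_title <- trimws(as.character(imdb$movie_title))

# Text -> integer so that Year can be compared and filtered numerically.
locations$Year <- as.integer(locations$Year)
```

Converting to character before trimming matters: trimming only changes the strings, so the titles must no longer be factor levels for an exact match on title to succeed.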

Joining Datasets

After cleaning, I performed two left joins with the film locations dataset as the left table, so that no films in the locations dataset would be dropped, even when no rating was available for them.
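A minimal sketch of the two left joins, using base R's merge() with all.x = TRUE. The data frames, column names, and values are made up for illustration, and the final step of falling back from one rating source to the other is an assumption about how the two sets of ratings might be combined, not a description of the original code.

```r
# Made-up stand-ins for the three datasets (names and values are illustrative).
locations <- data.frame(
  Film = c("Film A", "Film B", "Film C"),
  Year = c(1976L, 1977L, 1984L),
  stringsAsFactors = FALSE
)
kaggle    <- data.frame(Film = "Film A", imdb_score = 8.0, stringsAsFactors = FALSE)
gg_movies <- data.frame(Film = "Film B", rating = 7.5, stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of the left table, filling NA where no match exists.
joined <- merge(locations, kaggle, by = "Film", all.x = TRUE)
joined <- merge(joined, gg_movies, by = "Film", all.x = TRUE)

# Assumption: prefer the Kaggle score and fall back to the ggplot2movies rating.
joined$score <- ifelse(is.na(joined$imdb_score), joined$rating, joined$imdb_score)
```

In this toy example all three films survive the joins, and the film with no match in either ratings source simply ends up with a missing score.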

 

Shiny Dashboard

Interactive Map with active filtering

The core of the application uses leaflet to display a zoomable map of New York with markers indicating the locations of movie scenes. On the side panel, sliders for the year of release and the IMDB score let the user filter the markers to a specific range of each.

Not filtered

 

Filtering by IMDB Score:

Filtered by IMDB Score

 

Filtering by Year:

Filtered by Year
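The slider-driven filtering shown in these screenshots can be sketched roughly as below. This is a minimal example under stated assumptions, not the application's actual code: it uses plain shiny layout functions rather than a dashboard package, the sample data and coordinates are made up, and the names films, filter_films, years, and scores are hypothetical.

```r
library(shiny)
library(leaflet)

# Made-up sample of the joined data; all column names are assumptions.
films <- data.frame(
  Film  = c("Film A", "Film B", "Film C"),
  Year  = c(1955L, 1979L, 2001L),
  score = c(6.5, 8.0, 7.2),
  lat   = c(40.7580, 40.7061, 40.7829),
  lng   = c(-73.9855, -73.9969, -73.9654),
  stringsAsFactors = FALSE
)

# Keep rows whose year and score fall inside both slider ranges.
# which() drops rows with a missing score instead of returning NA rows.
filter_films <- function(df, years, scores) {
  keep <- df$Year >= years[1] & df$Year <= years[2] &
    df$score >= scores[1] & df$score <= scores[2]
  df[which(keep), , drop = FALSE]
}

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      sliderInput("years", "Year of release", min = 1900, max = 2006,
                  value = c(1900, 2006), sep = ""),
      sliderInput("scores", "IMDB score", min = 0, max = 10,
                  value = c(0, 10), step = 0.1)
    ),
    mainPanel(leafletOutput("map"))
  )
)

server <- function(input, output, session) {
  # The map is redrawn whenever either slider changes.
  output$map <- renderLeaflet({
    shown <- filter_films(films, input$years, input$scores)
    leaflet(shown) %>%
      addTiles() %>%
      addMarkers(lng = ~lng, lat = ~lat, popup = ~Film)
  })
}

# Launch the app only in an interactive session.
if (interactive()) shinyApp(ui, server)
```

Keeping the filter in an ordinary function makes it possible to reuse the same logic for the map, the data table, and the graphs.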

 

Clicking on a marker produces a pop-up with information about the movie and a direct link to its IMDB page.

Clicking on a popup provides information related to the movie
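Leaflet pop-ups accept HTML, so one way to build a pop-up like this is to paste the fields together into a small HTML string. The sketch below is an assumption about how that could be done; the function name make_popup, its arguments, and the example URL are all hypothetical.

```r
# Build the HTML for one marker's pop-up from a film's details.
# All argument names are illustrative, not taken from the original app.
make_popup <- function(title, year, score, imdb_url) {
  paste0(
    "<b>", title, "</b> (", year, ")<br/>",
    "IMDB score: ", score, "<br/>",
    "<a href='", imdb_url, "' target='_blank'>View on IMDB</a>"
  )
}

# Placeholder values; the URL is a dummy, not a real film page.
popup <- make_popup("Film A", 1976L, 8.0, "https://www.imdb.com/title/tt0000000/")
```

Because paste0() is vectorized, the same function can be applied to whole columns to produce one pop-up string per marker.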

 

Database with active filtering

The Data panel allows users to view the database, which can also be filtered by year and IMDB score. From this panel, users can sort by column or perform a search.

Interactive Data Table

 

The By Group tab allows the user to group by director, borough, or neighborhood.

Group by Director
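A grouped summary of this kind could be computed as in the base R sketch below. Which statistics the By Group tab actually reports is not stated here, so the film count and mean score are assumptions, as are the function name summarise_by, the column names, and the sample data.

```r
# Made-up sample data; column names are assumptions.
films <- data.frame(
  Film     = c("Film A", "Film B", "Film C", "Film D"),
  Director = c("Director X", "Director X", "Director Y", "Director Y"),
  Borough  = c("Manhattan", "Brooklyn", "Manhattan", "Manhattan"),
  score    = c(7.0, 8.0, 6.0, NA),
  stringsAsFactors = FALSE
)

# Count films and average the score within each level of the chosen column.
summarise_by <- function(df, group) {
  counts <- as.data.frame(table(df[[group]]), stringsAsFactors = FALSE)
  names(counts) <- c(group, "n_films")
  # na.rm = TRUE ignores films that have no rating.
  means <- aggregate(df["score"], by = df[group], FUN = mean, na.rm = TRUE)
  names(means)[2] <- "mean_score"
  merge(counts, means, by = group)
}

by_borough  <- summarise_by(films, "Borough")
by_director <- summarise_by(films, "Director")
```

Passing the grouping column as a string means a single function can serve a selector that switches between director, borough, and neighborhood.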

 

Graphs with active filtering

The user can view histograms (shown below), box plots, or scatter plots of continuous variables such as IMDB score, budget, gross income, and movie length.

Histogram of IMDB Scores
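For reference, a histogram like this one can be drawn with base R's hist(). The scores below are made-up values and the one-point bin width is an arbitrary choice; the original app may have used a different plotting library.

```r
# Made-up scores standing in for the IMDB score column.
scores <- c(5.9, 6.5, 7.2, 7.8, 8.0)

# One-point-wide bins across the 0 to 10 rating scale.
h <- hist(scores,
          breaks = seq(0, 10, by = 1),
          main   = "Histogram of IMDB Scores",
          xlab   = "IMDB score")

# hist() also returns the bin counts, which can be inspected directly.
h$counts
```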


Conclusions

Although the dataset is incomplete, covering only films from the book Scenes from the City, the application provides an overview of locations around New York where movies have been filmed. It allows users to filter by year and IMDB score on an interactive map, and it offers grouped summaries and graphical displays of variables. Unsurprisingly, the vast majority of movies in the dataset were filmed in Manhattan, and ratings cluster around 7 on IMDB. Interestingly, there was a small dip in the number of movies filmed in the city between 1975 and 1990, possibly related to increases in criminal activity during this period.

 

Figure: murder rate (crime rate image from Reddit)

About Author


Daniel Epstein

Daniel Epstein is a neuroscience PhD candidate at the University of Utah, expecting to graduate in summer 2017. While performing analyses on behavioral and neuroimaging data, he became interested in utilizing data science to understand human behavior and...