Filming Locations around New York City - Visualization using Shiny Dashboard

Posted on Feb 2, 2017


Many movies have been filmed in New York City, but it is difficult to get a sense of where in the city their scenes were shot. I used a dataset based on the book Scenes from the City by James Sanders, available on NYC Open Data, to create a navigable map that displays this information. The dataset includes coordinates, allowing me to pinpoint exact locations. My goal was to visualize where in the city movies were filmed and to provide additional information from IMDB, such as ratings, poster images, and direct IMDB links. For this project I used R to clean and aggregate the data and Shiny to visualize it.



Film Locations

Film scene locations for movies filmed in New York were obtained from the NYC Open Data website. The dataset is based on the book Scenes from the City by James Sanders.

This dataset does not include every movie filmed in New York (which would be difficult, yet interesting, to collect!) and does not include movies filmed after 2006. This is a limitation that I hope to address in the future, and it is important to keep in mind for the current application.


Kaggle IMDB-5000

This dataset was created by Chuan Sun, who scraped data from the IMDB website. It provided ratings for some of the movies in the film locations dataset.



I also installed the ggplot2movies package, which gave me access to the movies dataset. It provided additional IMDB ratings that were not present in the Kaggle dataset.


Data Aggregation and Manipulation

Data Cleaning

Before the three datasets could be joined, the Kaggle IMDB dataset required minor cleaning. First, I converted the titles from factors to strings. The titles still had trailing white space, which would have caused mismatches when joining the datasets, so I removed it. In the film locations dataset, the column "Year" also had to be converted to an integer.
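The cleaning steps above can be sketched in base R as follows. This is a minimal illustration, not the original code: the toy data frames, the column names `movie_title`, `imdb_score`, and `Film`, and all values are assumptions made for the example.

```r
# Toy stand-ins for the two tables. Column names and values are
# hypothetical; the real datasets have many more columns.
imdb <- data.frame(
  movie_title = factor(c("Film A ", "Film B ")),  # factors with trailing space
  imdb_score  = c(7.1, 6.4)                       # made-up scores
)
locations <- data.frame(
  Film = c("Film A", "Film B"),
  Year = c("1977", "1984"),                       # year stored as text
  stringsAsFactors = FALSE
)

# Factor -> character, then strip leading and trailing white space.
imdb$movie_title <- trimws(as.character(imdb$movie_title))

# Year stored as text -> integer.
locations$Year <- as.integer(locations$Year)
```

Note that `trimws()` with its default settings removes spaces, tabs, and line breaks only; other invisible characters would need a custom pattern.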

Joining Databases

After cleaning, two left joins were performed, with the left table being the Movie Locations dataset, so that no films in the locations data set would be removed.
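As a rough sketch of the two left joins, the example below uses base R's `merge()` with `all.x = TRUE`, which keeps every row of the left table. The post does not say which join function was actually used, and the table names, column names, and values here are hypothetical.

```r
# Left table: film locations (toy data, hypothetical columns).
locations <- data.frame(
  Film = c("Film A", "Film B", "Film C"),
  Year = c(1977L, 1984L, 1999L),
  stringsAsFactors = FALSE
)
# Right tables: ratings from the two IMDB sources (made-up values).
kaggle <- data.frame(
  Film = "Film A",
  kaggle_score = 8.0,
  stringsAsFactors = FALSE
)
gg_movies <- data.frame(
  Film = "Film B",
  Year = 1984L,
  gg_rating = 7.5,
  stringsAsFactors = FALSE
)

# all.x = TRUE makes each merge a left join: unmatched films stay,
# with NA in the rating columns.
joined <- merge(locations, kaggle, by = "Film", all.x = TRUE)
joined <- merge(joined, gg_movies, by = c("Film", "Year"), all.x = TRUE)
```

Here "Film C" has no match in either ratings table, so it remains in `joined` with `NA` in both rating columns. If a right-hand table contained duplicate keys, a left join would duplicate the matching location rows, so checking `nrow()` before and after is a cheap safeguard.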


Shiny Dashboard

Interactive Map with active filtering

The core of the application uses leaflet to allow the user to zoom in and out of a map of New York with markers indicating the locations of scenes from movies. On the side panel, sliders for both the year of release and IMDB score allow the user to filter markers with a great degree of specificity.
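A stripped-down version of this map panel might look like the sketch below. It is not the application's actual code: the `films` data, its column names, the slider ranges, and the helper `filter_films()` are all assumptions made for illustration, and it assumes the shiny and leaflet packages are installed.

```r
library(shiny)
library(leaflet)

# Hypothetical toy data; coordinates are approximate and scores are made up.
films <- data.frame(
  title = c("Film A", "Film B", "Film C"),
  year  = c(1955L, 1984L, 2003L),
  score = c(6.1, 7.8, 8.4),
  lat   = c(40.758, 40.706, 40.678),
  lng   = c(-73.985, -74.009, -73.944),
  stringsAsFactors = FALSE
)

# Keep rows whose year and score fall inside both slider ranges (inclusive).
filter_films <- function(df, year_range, score_range) {
  keep <- df$year >= year_range[1] & df$year <= year_range[2] &
    df$score >= score_range[1] & df$score <= score_range[2]
  df[keep, , drop = FALSE]
}

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      # Range sliders: a two-element `value` gives a lower and upper handle.
      sliderInput("year", "Year of release", min = 1950, max = 2006,
                  value = c(1950, 2006), sep = ""),
      sliderInput("score", "IMDB score", min = 0, max = 10,
                  value = c(0, 10), step = 0.1)
    ),
    mainPanel(leafletOutput("map"))
  )
)

server <- function(input, output, session) {
  # Draw the base map once, centered on New York City.
  output$map <- renderLeaflet({
    leaflet() %>% addTiles() %>% setView(lng = -73.98, lat = 40.73, zoom = 11)
  })
  # Redraw the markers whenever either slider changes.
  observe({
    shown <- filter_films(films, input$year, input$score)
    leafletProxy("map", data = shown) %>%
      clearMarkers() %>%
      addMarkers(lng = ~lng, lat = ~lat, popup = ~title)
  })
}

# shinyApp(ui, server)  # uncomment to launch the app
```

Updating markers through `leafletProxy()` rather than re-rendering the whole map keeps the user's current zoom and position when a slider moves.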

Not filtered


Filtering by IMDB Score:

Filtered by IMDB Score


Filtering by Year:

Filtered by Year


Clicking on a marker produces a pop-up with information about the movie and a direct link to its IMDB page.

Clicking on a popup provides information related to the movie


Database with active filtering

The Data panel allows users to view the database, which can also be filtered by year and IMDB score. From this panel, users can sort by column or perform a search.

Interactive Data Table


The By Group tab allows the user to group by director, borough, or neighborhood
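One way such a grouped summary could be computed is sketched below in base R. The helper `summarise_by()`, the column names, and the summary columns (scene count and mean score) are assumptions for the example; the post does not state which statistics the tab reports.

```r
# Hypothetical toy data with made-up values.
films <- data.frame(
  director = c("Director X", "Director X", "Director Y"),
  borough  = c("Manhattan", "Brooklyn", "Manhattan"),
  score    = c(7.0, 8.0, 6.5),
  stringsAsFactors = FALSE
)

# Count rows and average the score within each level of the chosen column.
summarise_by <- function(df, group_col) {
  groups <- df[[group_col]]
  data.frame(
    group      = sort(unique(groups)),
    n_scenes   = as.vector(table(groups)),
    mean_score = as.vector(tapply(df$score, groups, mean)),
    stringsAsFactors = FALSE
  )
}

by_director <- summarise_by(films, "director")
by_borough  <- summarise_by(films, "borough")
```

Passing the grouping column as a string means a single drop-down input in the app could switch between director, borough, and neighborhood without separate code paths.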

Group by Director


Graphs with active filtering

The user can view histograms (shown below), box plots, or scatter plots of continuous variables such as IMDB score, budget, gross income, and movie length.

Histogram of IMDB Scores



Although the dataset is incomplete, covering only films from the book Scenes from the City, the application provides a useful overview of locations around New York where movies have been filmed. It lets users filter by year and IMDB score on an interactive map, group the data by director, borough, or neighborhood, and plot the continuous variables. Unsurprisingly, the vast majority of the movies were filmed in Manhattan, and most have an IMDB rating of around 7. Interestingly, there was a small dip in the number of movies filmed in the city between 1975 and 1990, possibly related to increases in criminal activity during this period.



(crime rate image from Reddit)

About Author

Daniel Epstein

Daniel Epstein is a neuroscience PhD candidate at the University of Utah, expecting to graduate in summer 2017. While performing analyses on behavioral and neuroimaging data, he became interested in utilizing data science to understand human behavior and...