NYPD Vehicle Collision Report

Posted on Feb 5, 2018


Every year there are more than 200,000 vehicle collisions in the five boroughs of New York City. In July of 2011 the NYPD began publishing a data set of every recorded vehicle collision in the city. To gain some insight into the data, I built an app using the Shiny package for R.

Data Set

The data set, at the time of download, contained around 1.2 million observations. Each observation records the date, the time, usually some form of location data (e.g. longitude and latitude, street intersection, or address), the number of injuries, the vehicles involved and the contributing factors. I used the data from July 2012 through July 2018. The main problem is that there is no uniform method for officers or data entrants to record the location, the types of vehicles or the contributing factors. This limits the accuracy of location-based statistics and makes analysis of vehicle types and contributing factors too difficult for the scope of this project, but the data set is still large enough that the location data remains usable.

Analysis App

The app lets the user view a line chart of collisions per day for the collision type and time range they select. Info boxes below the chart give the maximum, minimum, mean and variance of daily collisions in that time range for the selected category. All years are shown together to highlight the contrast between them and to make the statistics more robust and meaningful.
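To make the info-box statistics concrete, here is a minimal sketch in Python (not the app's R code) of how the maximum, minimum, mean and variance of collisions per day could be computed from a list of collision dates. The function name `daily_summary` and the sample dates are made up for illustration, and the use of sample variance is an assumption.

```python
from collections import Counter
from statistics import mean, variance


def daily_summary(collision_dates):
    """Summarise collisions per day from an iterable of date strings.

    Hypothetical helper: one input entry per recorded collision.
    Days with no recorded collisions do not appear in the input,
    so they are not counted towards the minimum or the mean.
    """
    counts = Counter(collision_dates)  # date -> number of collisions
    per_day = list(counts.values())
    return {
        "max": max(per_day),
        "min": min(per_day),
        "mean": mean(per_day),
        # Sample variance (an assumption); it needs at least two days.
        "variance": variance(per_day) if len(per_day) > 1 else 0.0,
    }


# Made-up sample: 3, 5 and 4 collisions on three consecutive days.
sample = ["2017-01-01"] * 3 + ["2017-01-02"] * 5 + ["2017-01-03"] * 4
summary = daily_summary(sample)
print(summary)
```

Because the statistics are computed on daily counts, changing the category, borough or time range only changes which collisions are counted, not how the summary is calculated.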

On the left is a sidebar where the user can choose how to filter the data.

Feature Selection

Several variables control which subset of the data is analyzed. The first is the category: the user can view all of the data or only collisions involving injuries, deaths, cyclists or pedestrians. Next is the borough selection; only one borough can be selected at a time. Next, the user can choose to view data from the entire year or focus on a particular month. When the monthly radio button is chosen, an additional drop-down for month selection appears. Finally, the user can focus on a particular time range within the day. The hours selected are shown below the drop-down.


I built the app in Shiny for R. I started with a navbar page so I could add tabs with their own pages if necessary. I originally had a separate page for each category, but condensed them into one page, using the drop-down box and a switch statement, to simplify the app and cut down on memory usage. I did have to import the CSS dependencies from shinydashboard to be able to use its infoBox function to showcase the statistics, as that function is unique to that library.
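The idea of collapsing the per-category pages into a single switch can be sketched as a lookup from the selected category to a filter condition. This is a Python illustration, not the app's R code; the category labels and column names (`persons_injured`, `persons_killed` and so on) are hypothetical, and the real schema may differ.

```python
# Hypothetical mapping from the drop-down choice to a SQL condition.
# Column names are assumptions made for this sketch only.
CATEGORY_FILTERS = {
    "All": "1 = 1",
    "Injuries": "persons_injured > 0",
    "Deaths": "persons_killed > 0",
    "Cyclists": "(cyclists_injured + cyclists_killed) > 0",
    "Pedestrians": "(pedestrians_injured + pedestrians_killed) > 0",
}


def category_filter(category):
    """Return the WHERE-clause fragment for the selected category."""
    try:
        return CATEGORY_FILTERS[category]
    except KeyError:
        # Fail loudly on an unexpected drop-down value.
        raise ValueError(f"unknown category: {category!r}") from None


print(category_filter("Cyclists"))
```

With one lookup like this, a single page can serve every category, since only the filter condition changes between them.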

I converted the CSV from NYC Open Data to a SQLite database and built functions to construct the queries that retrieve the data. Compared with using the CSV directly, this reduces memory usage by limiting how much data is loaded into memory. The initial load of the app takes a second to get everything into memory, but the app loads quickly thereafter. The functions that connect to the database and submit the query can be found in the queryfunc.R file.
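As a rough illustration of the query-building approach, the sketch below uses Python's built-in sqlite3 module instead of the R functions in queryfunc.R. The table name `collisions`, its columns (`date`, `hour`, `borough`), the function `build_query` and the sample rows are all assumptions made for this example.

```python
import sqlite3


def build_query(borough, month=None, hours=(0, 23)):
    """Return (sql, params) counting collisions per day for the filters.

    Hypothetical helper: values are passed as parameters instead of
    being pasted into the SQL string.
    """
    clauses = ["borough = ?", "hour BETWEEN ? AND ?"]
    params = [borough, hours[0], hours[1]]
    if month is not None:
        # Dates are assumed to be stored as ISO text (YYYY-MM-DD).
        clauses.append("CAST(strftime('%m', date) AS INTEGER) = ?")
        params.append(month)
    sql = (
        "SELECT date, COUNT(*) AS n FROM collisions "
        "WHERE " + " AND ".join(clauses) + " GROUP BY date ORDER BY date"
    )
    return sql, params


# In-memory database with a few made-up rows for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collisions (date TEXT, hour INTEGER, borough TEXT)")
conn.executemany(
    "INSERT INTO collisions VALUES (?, ?, ?)",
    [
        ("2017-01-01", 8, "BROOKLYN"),
        ("2017-01-01", 17, "BROOKLYN"),
        ("2017-02-03", 9, "BROOKLYN"),
        ("2017-01-01", 8, "QUEENS"),
    ],
)

sql, params = build_query("BROOKLYN", month=1, hours=(6, 10))
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Only the aggregated rows that match the filters come back from the database, which is the kind of memory saving a database offers over reading a whole CSV.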

Future Analysis and Updates to App

In the app.R code you can see that I have built a map page that shows either the location of each collision or a heatmap of collisions for the time range, category and borough selected by the user. I have limited the selections to individual month/year combinations to keep the map as readable as possible. It helps to zoom in to get a clearer picture. The map works exactly as intended on its own, but the app crashes when the map and the chart page are brought together. Due to time constraints I chose to push the app as is and try to fix this issue in a future update.

On the data side, I would like to fully clean the location data so that all observations can be used to give a more thorough view of the collisions. Time series analysis and predictions are also worth looking into.

Shiny app: https://gregmb.shinyapps.io/GregoryBrucchieriProj1/

Github repository: https://github.com/gregmb/TrafficCollisionApp

About Author

Gregory Brucchieri

Gregory has a Master of Arts in Economics from NYU. He is a former business analyst with Humana, Inc, where he maintained provider relations and contract databases for smaller, local networks Humana had paired with. He is driven...