NYPD Vehicle Collision Report

Gregory Brucchieri
Posted on Feb 5, 2018

Introduction

Every year there are more than 200,000 vehicle collisions in the five boroughs of New York City. The NYPD began publishing a data set of every recorded vehicle collision in the city in July of 2011. To gain some insight into the data, I built an app using the Shiny package for R.

Data Set

The data set, at the time of download, contained around 1.2 million observations. Each observation records the date, time, number of injuries, vehicles involved and contributing factors, and generally some form of location data (e.g. longitude and latitude, street intersection, or address). I used the records from July 2012 through July 2018. The main problem is that there is no uniform method for officers or data entrants to record the location, vehicle types or contributing factors. This limits the accuracy of statistics based on location, and makes analysis of vehicle types and contributing factors too difficult for the scope of this project. The data set is still large enough, however, that the location-based statistics remain usable.

Analysis App

The app lets the user view a line chart of collisions per day for the collision type and time range they select. Info boxes below the chart give the maximum, minimum, mean and variance of daily collisions in that time range for the selected category. All years are shown together to highlight the contrast between them and to make the statistics more robust and meaningful.
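
As a rough illustration of the statistics shown in the info boxes, the sketch below counts collisions per day and summarises the daily counts in base R. The sample data frame and its `date` column are made up for the example and are not taken from the app's code.

```r
# Hypothetical sample: one row per recorded collision, with its date.
collisions <- data.frame(
  date = as.Date(c("2017-01-01", "2017-01-01", "2017-01-02",
                   "2017-01-03", "2017-01-03", "2017-01-03"))
)

# Count collisions per day by summing a column of ones within each date.
daily <- aggregate(n ~ date, data = transform(collisions, n = 1L), FUN = sum)

# Summary statistics of the daily counts, matching the four info boxes.
daily_stats <- c(
  max      = max(daily$n),
  min      = min(daily$n),
  mean     = mean(daily$n),
  variance = var(daily$n)
)
print(daily_stats)
```

One caveat with this approach: a day with no matching records does not appear in `daily` at all, so for rare categories the minimum and mean would need zero-filled days to be accurate.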

On the left is a sidebar where the user can choose how to filter the data.

Feature Selection

A number of variables can be changed to select which subset of the data to analyze. The first is the category: the user can view all of the data or only collisions involving injuries, deaths, cyclists or pedestrians. Next is the borough; only one borough can be selected at a time. Next, the user can view data for the entire year or focus on a particular month. When the monthly radio button is chosen, an additional drop-down for month selection appears. Finally, the user can focus on a particular time range within the day. The hours selected are shown below the drop-down.
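
A sidebar along these lines could be sketched in Shiny as follows. This is only an approximation of the layout described above: the input IDs, labels and choice values are illustrative rather than taken from app.R, and a range slider stands in for the app's time-of-day control.

```r
library(shiny)

# Sidebar inputs mirroring the filters described above.
# Input IDs and choice labels are illustrative, not taken from app.R.
sidebar <- sidebarPanel(
  selectInput("category", "Category",
              choices = c("All", "Injuries", "Deaths", "Cyclists", "Pedestrians")),
  selectInput("borough", "Borough",
              choices = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island")),
  radioButtons("period", "Time period", choices = c("Yearly", "Monthly")),
  # The month drop-down is only shown when "Monthly" is selected.
  conditionalPanel(
    condition = "input.period == 'Monthly'",
    selectInput("month", "Month", choices = month.name)
  ),
  # A range slider stands in for the app's time-of-day control (an assumption).
  sliderInput("hours", "Hours of day", min = 0, max = 23, value = c(0, 23), step = 1),
  # Placeholder for text showing the selected hours, filled in by the server.
  textOutput("hours_selected")
)
```

The `conditionalPanel` condition is a JavaScript expression evaluated in the browser, so the month drop-down appears and disappears without a round trip to the server.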

Code

I built the app in Shiny for R. I started with a navbar page so I could add tabs with their own pages if necessary. I originally had separate pages for each category, but condensed them into one page using the drop-down box and a switch statement to simplify the app and cut down on memory usage. I did have to import the CSS dependencies from shinydashboard to use its infoBox function for showcasing the statistics, as that function is unique to that package.
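
To illustrate the idea of replacing per-category pages with a single switch, here is a minimal sketch that maps the selected category to a filter condition. The function name, category labels and column names are all hypothetical and may not match the app's actual schema.

```r
# Map the category chosen in the drop-down to a SQL filter condition.
# Column names here are hypothetical, not the app's actual schema.
category_condition <- function(category) {
  switch(category,
    "All"         = "1 = 1",
    "Injuries"    = "persons_injured > 0",
    "Deaths"      = "persons_killed > 0",
    "Cyclists"    = "cyclists_injured + cyclists_killed > 0",
    "Pedestrians" = "pedestrians_injured + pedestrians_killed > 0",
    # Unnamed final argument: the fallback for unrecognised categories.
    stop("Unknown category: ", category)
  )
}

print(category_condition("Cyclists"))
```

With this pattern, one chart and one set of info boxes can serve every category, since only the filter condition changes.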

I converted the CSV from NYC Open Data to a SQLite database and built functions to construct the queries that retrieve the data. Compared to loading the full CSV, this reduces memory usage by limiting how much data is loaded into memory. The initial load of the app takes a second to get everything into memory, but it loads quickly thereafter. The functions that connect to the database and submit the queries can be found in the queryfunc.R file.
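
A query-building function of this kind might look like the sketch below. It is not the code from queryfunc.R: the function name, the `collisions` table, its columns and the database file name are assumptions made for illustration. It builds a parameterised statement so that user selections are passed as values rather than pasted into the SQL text.

```r
# Build a parameterised query for daily collision counts.
# Table and column names are hypothetical.
build_daily_query <- function(borough, hours, month = NULL) {
  stopifnot(length(hours) == 2, hours[1] <= hours[2])
  sql <- paste(
    "SELECT date, COUNT(*) AS n FROM collisions",
    "WHERE borough = ? AND hour BETWEEN ? AND ?"
  )
  params <- list(borough, hours[1], hours[2])
  # Only add the month filter when a specific month was selected.
  if (!is.null(month)) {
    sql <- paste(sql, "AND month = ?")
    params <- c(params, list(month))
  }
  list(sql = paste(sql, "GROUP BY date ORDER BY date"), params = params)
}

q <- build_daily_query("BROOKLYN", hours = c(7, 10), month = 3)
cat(q$sql, "\n")

# With DBI and RSQLite installed, the query could then be run as
# (the file name is a placeholder):
# con <- DBI::dbConnect(RSQLite::SQLite(), "collisions.sqlite")
# daily <- DBI::dbGetQuery(con, q$sql, params = q$params)
# DBI::dbDisconnect(con)
```

Because the aggregation happens inside SQLite, only one row per day reaches R, which is where the memory saving over reading the whole CSV comes from.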

Future Analysis and Updates to App

In the app.R code you can see that I have built a map page that shows either the location of each collision or a heatmap of collisions for the time range, category and borough selected by the user. I have limited the selections to individual month/year combinations to keep the map as readable as possible. It helps to zoom in to get a clearer picture. The map works as intended on its own, but the app crashes when the map and the chart page are brought together. Due to time constraints I chose to publish the app as is and fix this issue in a future update.

On the data side, I would like to fully clean the location data so that all observations can be used, giving a more thorough view of the collisions. Time series analysis and predictions could also be explored and added.

Shiny app: https://gregmb.shinyapps.io/GregoryBrucchieriProj1/

GitHub repository: https://github.com/gregmb/TrafficCollisionApp

About Author

Gregory Brucchieri

Gregory has a Master of Arts in Economics from NYU. He is a former business analyst with Humana, Inc, where he maintained provider relations and contract databases for smaller, local networks Humana had paired with. He is driven...