Data Visualization on NYC Motor Vehicle Collision

Posted on Aug 5, 2017

As a New Yorker, I walk to school and work, so I have often wondered about the safety of the streets I pass through. We hear about drivers plowing into pedestrians. In May 2017 this happened at Times Square, and it resulted in the death of a young girl. Since then I have had a constant fear: what if someone rams a car onto the walkway? With this fear in mind, I decided to visualize the NYC Motor Vehicle Collision dataset and look for significant insights.

Link to My Shiny Interactive Application.

Link to my Github Repo. 


In New York, approximately 4,000 New Yorkers are seriously injured and more than 250 are killed each year in traffic crashes. Being struck by a vehicle is the leading cause of injury-related death for children under 14, and the second leading cause for seniors. On average, vehicles seriously injure or kill a New Yorker every two hours.

This status quo is unacceptable. The City of New York must no longer regard traffic crashes as mere "accidents," but rather as preventable incidents that can be systematically addressed. No level of fatality on city streets is inevitable or acceptable. This Vision Zero Action Plan is the City's foundation for ending traffic deaths and injuries on our streets.

Vision Zero

Vision Zero is a program created by New York City Mayor Bill de Blasio in 2014. Its purpose is to cut the number of traffic fatalities in half by 2025. On January 15, 2014, Mayor de Blasio announced the launch of Vision Zero in New York City, based on a program of the same name that was implemented in Sweden. The original Swedish theory holds that pedestrian deaths are not so much "accidents" as a failure of street design.

Data Source

It’s been five years since New York City signed the strongest open data law in the country and then launched an open data website called NYC Open Data. The site includes NYC Motor Vehicle Collision data from 2012 to 2017, updated every month. Currently, the dataset has 1.08 million rows and 29 columns. More information about the dataset can be found here.

Data Wrangling

The dataset is generally clean, but it has missing values, so we have to remove the NAs before most of the visualizations. Each observation (row) in the dataset represents one collision. The date is stored in character format, so we have to convert it to a date format in order to extract the day, month, and year. Because the date column has no missing values, we can count how many collisions occur in each year before removing the NAs.
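The wrangling steps above can be sketched in a few lines. This is only an illustrative Python sketch using the standard library, not the code behind the Shiny application; the sample rows are made up, and the column names `DATE` and `BOROUGH` as well as the MM/DD/YYYY date format are assumptions.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up sample rows; column names and the MM/DD/YYYY format are assumptions.
rows = [
    {"DATE": "05/18/2017", "BOROUGH": "MANHATTAN"},
    {"DATE": "12/31/2016", "BOROUGH": ""},  # missing borough
    {"DATE": "07/04/2016", "BOROUGH": "BROOKLYN"},
]


def add_date_parts(row):
    """Parse the character date and attach year, month, day, and weekday."""
    parsed = datetime.strptime(row["DATE"], "%m/%d/%Y").date()
    return {
        **row,
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
        "weekday": WEEKDAYS[parsed.weekday()],  # Monday is index 0
    }


enriched = [add_date_parts(row) for row in rows]

# The date column is complete, so yearly counts can use every row.
collisions_per_year = Counter(row["year"] for row in enriched)

# Drop rows with any empty field before the borough-level charts.
complete_rows = [
    row for row in enriched
    if all(value != "" for value in row.values())
]
```

Counting per year before filtering keeps rows that are missing only a borough or location, which would otherwise be lost from the yearly totals.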

Visualizing Data

The first visual, a line chart of number of accidents versus year, shows the change in accidents from 2012 to 2017. We can clearly see that after the Vision Zero initiative began in 2014, the number of accidents kept increasing and reached its peak in 2016. There is a huge drop in the number of accidents for 2017, although that count covers only the period from the beginning of 2017 until the present.

The next chart shows the number of collisions in each year by borough. It reveals that Brooklyn has the highest number of collisions in each year, and Manhattan ranks second, closely followed by Queens.


Which day of the week has the highest number of collisions?

Friday! For each year, Friday has the highest number of collisions. One possible explanation is that people are keen to get home after finally finishing the work week.
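A day-of-week comparison like this one comes down to grouping collision dates by year and weekday and then picking the largest group per year. The Python sketch below illustrates the idea with the standard library; the dates are invented purely for illustration and are not taken from the dataset.

```python
from collections import Counter, defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Invented collision dates, one entry per collision (illustration only).
collision_dates = [
    date(2016, 1, 1), date(2016, 1, 8), date(2016, 1, 4),
    date(2017, 3, 3), date(2017, 3, 10), date(2017, 3, 6),
]

# Count collisions per weekday, separately for each year.
weekday_counts = defaultdict(Counter)
for day in collision_dates:
    weekday_counts[day.year][WEEKDAYS[day.weekday()]] += 1

# For each year, keep the weekday with the most collisions.
busiest_weekday = {
    year: counts.most_common(1)[0][0]
    for year, counts in weekday_counts.items()
}
```

With real data, the same grouping applied to every row gives the per-year bars shown in the chart.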

According to the heat map above of hour of day by borough, Brooklyn has its maximum number of collisions from around 4 pm to 5 pm. Manhattan’s highest number of collisions occurs from 2 pm to 4 pm. Queens has the most collisions at 8 am, as well as from 4 pm to 5 pm.
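The values behind an hour-by-borough heat map are simply counts per (borough, hour) pair. Here is a minimal Python sketch of that tabulation; the records are made up, and the `H:MM` time format is an assumption about how the collision time is stored.

```python
from collections import Counter

# Made-up (borough, time) records; the "H:MM" time format is an assumption.
records = [
    ("BROOKLYN", "16:30"),
    ("BROOKLYN", "16:05"),
    ("BROOKLYN", "9:00"),
    ("MANHATTAN", "14:50"),
    ("QUEENS", "8:15"),
]


def hour_of(time_text):
    """Return the hour (0-23) from a time string such as '16:30'."""
    return int(time_text.split(":")[0])


# One heat-map cell per (borough, hour) pair.
cell_counts = Counter((borough, hour_of(t)) for borough, t in records)


def peak_hour(borough):
    """Return the hour with the most collisions for one borough."""
    by_hour = {h: n for (b, h), n in cell_counts.items() if b == borough}
    return max(by_hour, key=by_hour.get)
```

Calling `peak_hour("BROOKLYN")` on these sample records returns the 4 pm bucket, because two of the three Brooklyn rows fall in that hour.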

In the maps above, the heat map (left) shows a high number of collisions along almost all the avenues. It also shows the maximum number near Chinatown and the approach to the Williamsburg Bridge. The cluster map (right) shows that the Lower East Side has the maximum number of collisions (1,225), followed by Midtown (1,060), and then Chelsea with 971 collisions.

Let's visualize how many pedestrians were injured.

In the bar plot above (left) of pedestrians injured per year, Brooklyn ranks the highest, followed by Manhattan. For Manhattan, the number of pedestrians injured has been gradually decreasing from year to year. In Brooklyn, however, we see about the same number of injured pedestrians in both 2015 and 2016. In the heat map (right), 42nd Street and 8th Avenue shows the maximum number of pedestrians injured. In second place is 14th Street, which is a major cross street. Canal Street also shows more injured pedestrians than other places.

The number of injured pedestrians and the number of collisions in Manhattan overall are gradually decreasing, but the next line chart is quite shocking!

The line plot above shows the ratio of injured pedestrians to the total number of accidents per year. Surprisingly, Manhattan has a huge spike from 2016 to 2017. Even though the number of accidents is decreasing in 2017, the number of injured pedestrians is not decreasing relative to it. In 2016 Manhattan had 227,763 collisions and 2,085 injured pedestrians. In 2017 Manhattan had 115,499 collisions, which is roughly half the 2016 total, and the number of injured pedestrians was 1,046. That's shocking!
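The ratio plotted here is injured pedestrians divided by total collisions for a given year. As a quick arithmetic check, the Python sketch below plugs in only the four Manhattan figures quoted above; the chart itself may be built from a different breakdown of the data, so these totals alone need not reproduce its shape.

```python
# Manhattan figures as quoted in the text above.
manhattan = {
    2016: {"collisions": 227_763, "injured_pedestrians": 2_085},
    2017: {"collisions": 115_499, "injured_pedestrians": 1_046},
}


def injury_ratio(year):
    """Injured pedestrians per collision for the given year."""
    figures = manhattan[year]
    return figures["injured_pedestrians"] / figures["collisions"]


# Share of the 2016 collision total reached so far in 2017.
collision_share = manhattan[2017]["collisions"] / manhattan[2016]["collisions"]

for year in sorted(manhattan):
    print(f"{year}: {injury_ratio(year):.4f} injured pedestrians per collision")
print(f"2017 collisions as a share of 2016: {collision_share:.1%}")
```

With these totals, both years come out at roughly 0.009 injured pedestrians per collision, and the 2017 collision count is about 51% of the 2016 count.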


What are the contributing factors to pedestrian injury?


Some major contributors to injury are identified in the heat map above. The major factors are driver inattention and failure to yield right of way. Other major factors, indicated by a yellow hue, include backing unsafely and even pedestrian/bicyclist/other pedestrian error/confusion.


The number of collisions is decreasing through the years, but the number of pedestrians getting injured is not. Millions of tourists visit New York City every year, and every day millions of New Yorkers use the walkways on their way home or to work. The government has to take extra steps for the safety of pedestrians: it should reduce the speed limit, introduce slow zones, and increase enforcement. To address pedestrian confusion, it should provide clearer signs, more pedestrian signals, and longer crossing times. It should also protect walkways with metal safety poles, because nowadays anybody could ram a car onto a walkway or plow into pedestrians.


About Author

Nachiket Patel

Nachiket graduated from the New York Institute of Technology with a Master's degree in Computer Science. After completing a Bachelor's in Computer Science, he worked as a Software Engineer Data for two years. Nachiket enrolled in the NYC Data...