Data Visualizing NYC Traffic Before and After Vision Zero

Posted on Feb 6, 2017

Introduction

While New York City is a popular attraction for many things, data shows driving is not one of them. Joining the ranks of the Top 10 Worst U.S. Cities for Drivers, NYC is notorious for its road-raging drivers, jaywalking pedestrians, and daring cyclists. With the Vision Zero initiative deployed in 2014, NYC saw its best year in 2016 in terms of accident reduction. This app uses Leaflet and Shiny to help us visualize these improvements.

Click here to visit the Shiny app!
Check out my Github!

Playing with Data

Sourcing the Data and Vision Zero

As a part of the Vision Zero initiative started in 2014, the NYPD has made its data for motor vehicle collisions available to the public, with observations dating back to mid-2012. More information about this dataset can be found here. Vision Zero NYC was adopted in 2014 after its successful deployment in Sweden. The original philosophy of Vision Zero treats flaws in street design as the root cause of all traffic-related accidents and fatalities. The program aims to reduce traffic-related fatalities by 50% by 2025 through speed limit reductions, slow zones, increased enforcement, reduced downtime of traffic equipment, and more.

Manipulating the Data

Fortunately, the dataset is already provided in a very clean and easy-to-consume format. To make our lives even easier, we created some additional columns to use in our analysis. First, we removed observations with missing data for this visualization. Next, we converted the date and time columns from strings to their respective classes so that we can compare them as actual dates and times instead of strings. Using the date column, we then created a year column to allow us to group accidents by year. Since the first half of 2012 is missing from the data, and 2017's data is still being compiled, we have removed both years for now.
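The cleaning steps described above can be sketched roughly as follows. The app itself was built with Shiny, so this is only an illustrative Python version using the standard library. The column names (DATE, TIME, BOROUGH, LATITUDE, LONGITUDE), the date and time formats, and the sample rows are all assumptions made for the example, not values taken from the actual dataset.

```python
from datetime import datetime

# Assumed column names; the real export may label them differently.
REQUIRED = ("DATE", "TIME", "BOROUGH", "LATITUDE", "LONGITUDE")


def clean_rows(rows, first_year=2013, last_year=2016):
    """Drop incomplete rows, parse date/time strings, and keep full years only."""
    cleaned = []
    for row in rows:
        # Skip observations where any required field is missing or blank.
        if any(not (row.get(col) or "").strip() for col in REQUIRED):
            continue
        # Assumes MM/DD/YYYY dates and 24-hour H:MM times.
        when = datetime.strptime(f"{row['DATE']} {row['TIME']}", "%m/%d/%Y %H:%M")
        # 2012 is only partially covered and 2017 is incomplete, so both are dropped.
        if first_year <= when.year <= last_year:
            cleaned.append({**row, "datetime": when, "year": when.year})
    return cleaned


# Made-up sample rows, for illustration only.
sample = [
    {"DATE": "07/04/2012", "TIME": "9:15", "BOROUGH": "QUEENS",
     "LATITUDE": "40.75", "LONGITUDE": "-73.87"},
    {"DATE": "03/01/2013", "TIME": "14:30", "BOROUGH": "QUEENS",
     "LATITUDE": "40.75", "LONGITUDE": "-73.87"},
    {"DATE": "05/20/2016", "TIME": "8:05", "BOROUGH": "",
     "LATITUDE": "40.70", "LONGITUDE": "-73.95"},
]
cleaned = clean_rows(sample)
print([row["year"] for row in cleaned])
```

Parsing the strings into real datetime values is what makes the later comparisons by year and by time of day possible.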

Now that we have a proper date value, we can create another column for days of the week. The data also provides the vehicle types involved in each accident. This is useful, but the types are split across separate columns, one per vehicle, so we created a column that counts the number of vehicles in each accident. Finally, in a city like New York, many neighborhoods contain multiple zip codes. We imported another dataset into our project and merged it with our current data to get the name of each neighborhood.
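As a rough sketch of these derived columns, the snippet below adds a weekday, a vehicle count, and a neighborhood name to a single record. It is written in Python for illustration rather than taken from the app. The vehicle column names, the "ZIP CODE" key, and the tiny zip-to-neighborhood lookup table are assumptions standing in for the real columns and the second dataset.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Assumed names for the per-vehicle columns.
VEHICLE_COLUMNS = [f"VEHICLE TYPE CODE {i}" for i in range(1, 6)]


def add_derived_columns(row, zip_to_neighborhood):
    """Return a copy of the row with weekday, vehicle_count, and neighborhood."""
    # Count how many of the per-vehicle columns are actually filled in.
    vehicle_count = sum(
        1 for col in VEHICLE_COLUMNS if (row.get(col) or "").strip()
    )
    return {
        **row,
        "weekday": WEEKDAYS[row["date"].weekday()],
        "vehicle_count": vehicle_count,
        # Several zip codes can map to the same neighborhood name.
        "neighborhood": zip_to_neighborhood.get(row.get("ZIP CODE"), "Unknown"),
    }


# Illustrative lookup table standing in for the merged neighborhood dataset.
lookup = {"11354": "Flushing", "11355": "Flushing"}

record = {
    "date": date(2016, 1, 1),
    "ZIP CODE": "11355",
    "VEHICLE TYPE CODE 1": "PASSENGER VEHICLE",
    "VEHICLE TYPE CODE 2": "TAXI",
    "VEHICLE TYPE CODE 3": "",
}
enriched = add_derived_columns(record, lookup)
print(enriched["weekday"], enriched["vehicle_count"], enriched["neighborhood"])
```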

Visualizing the Data

Charts and Maps

Our first visual is a very basic line graph that shows the change in accidents from 2013 to 2016. It is quite interesting to see that the number of accidents actually saw a slight increase immediately after the implementation of Vision Zero.

However, if we filter the data to show fatal accidents only, there is definitely a strong downward trend with no indication of the increase in the prior graph.

In both graphs we can see that 2016 was the best year in terms of reduction in number of accidents. We will explore more of this later.
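The aggregation behind both line graphs is a count of accidents per year, optionally restricted to fatal accidents. A minimal Python sketch is shown here, assuming each record carries a year and a killed count (hypothetical field names). The rows are made up and do not reflect the real totals.

```python
from collections import Counter


def accidents_per_year(rows, fatal_only=False):
    """Count accidents per year, optionally keeping only fatal ones."""
    counts = Counter()
    for row in rows:
        if fatal_only and row["killed"] == 0:
            continue
        counts[row["year"]] += 1
    # Sort by year so the result is ready to plot as a line graph.
    return dict(sorted(counts.items()))


# Made-up records, for illustration only.
rows = [
    {"year": 2013, "killed": 0},
    {"year": 2013, "killed": 1},
    {"year": 2014, "killed": 0},
    {"year": 2016, "killed": 0},
]
print(accidents_per_year(rows))
print(accidents_per_year(rows, fatal_only=True))
```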

Before playing with this dataset, I had always assumed that Manhattan had far more accidents than any of the other boroughs in New York, but the next visualization is a bit surprising.


As it turns out, Brooklyn has far more accidents than any of the other four boroughs, and Queens is actually very similar to Manhattan in terms of traffic accidents. This is slightly concerning to me since I currently live in Queens, so we will explore this in more detail later.

What about accidents by days of the week?

Not too surprisingly, the number of accidents seems to gradually increase throughout the week, with a spike on Friday before resetting during the weekend. You may want to keep that in mind the next time you're thinking about driving out on a Friday night.

Let's visualize these accidents on a map using the longitude and latitude coordinates from the dataset.

Wow... this is a very busy visualization. This is expected since we are mapping every single accident from 2013 to 2016, which makes it far too difficult to draw any useful observations. We can do better. Maybe we can filter the dataset to show only accidents in Queens for now. (See the 'Making it Interactive with Shiny' section of this blog post, or visit the Shiny app here to explore other regions!)

Better, but still too busy in my opinion. We are still showing all the data from 2013 to 2016. What if we were interested in what this looks like year by year? We can start by filtering the dataset to show only observations for 2013. This will give us a good understanding of what Queens looked like before Vision Zero was implemented.
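The filtering step here amounts to keeping one borough and one year, then handing the remaining coordinates to the map layer. A small Python sketch of that filter follows. The field names are assumed and the coordinates are made up.

```python
def points_for_map(rows, borough, year):
    """Return (latitude, longitude) pairs for one borough and one year."""
    return [
        (float(row["LATITUDE"]), float(row["LONGITUDE"]))
        for row in rows
        if row["BOROUGH"] == borough and row["year"] == year
    ]


# Made-up records, for illustration only.
rows = [
    {"BOROUGH": "QUEENS", "year": 2013, "LATITUDE": "40.7557", "LONGITUDE": "-73.8831"},
    {"BOROUGH": "QUEENS", "year": 2016, "LATITUDE": "40.7420", "LONGITUDE": "-73.9190"},
    {"BOROUGH": "BROOKLYN", "year": 2013, "LATITUDE": "40.6782", "LONGITUDE": "-73.9442"},
]
queens_2013 = points_for_map(rows, "QUEENS", 2013)
print(queens_2013)
```

In the app, the borough and year would come from the user's filter selections instead of being hard-coded.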

Looking at this map, we can see two major accident-ridden streets that run the entire span of Queens. If you are a Queens native like I am, you wouldn't be surprised to learn that these streets are Northern Boulevard and Queens Boulevard. Next, let's try to identify the time range in which most of these accidents occur.

Maps of Queens accidents in 2013 by time of day:

(A) 12:00 am - 6:00 am
(B) 6:00 am - 12:00 pm
(C) 12:00 pm - 6:00 pm
(D) 6:00 pm - 12:00 am

Findings

There seems to be quite a low volume of accidents from 12:00 am to 6:00 am, likely because the majority of people are sleeping. From 6:00 am to 12:00 pm, we can see an increase in the volume of accidents on Northern Blvd and Queens Blvd. Since these two streets look like major entry points into Midtown Manhattan, these accidents likely involve morning commuters. From 12:00 pm to 6:00 pm, the number of accidents reaches its peak compared to the three other maps. This could potentially be a result of commuters returning home after work. While there is still accident activity from 6:00 pm to 12:00 am, map C is definitely the worst of the four.
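The four maps split the day into six-hour windows, which can be expressed as a simple bucketing of the accident time. The Python sketch below assigns each time to window A, B, C, or D and counts accidents per window. The sample times are made up and the helper name is hypothetical.

```python
from collections import Counter
from datetime import time

# Windows A-D as used for the four maps, each covering six hours.
WINDOW_LABELS = "ABCD"


def window_label(t):
    """Map a time of day to its six-hour window: A, B, C, or D."""
    return WINDOW_LABELS[t.hour // 6]


# Made-up accident times, for illustration only.
accident_times = [time(2, 10), time(8, 45), time(13, 5), time(17, 59), time(21, 30)]
per_window = Counter(window_label(t) for t in accident_times)
print(dict(per_window))
```

Because each window starts on the hour, 6:00 am falls in window B and 5:59 pm still falls in window C.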

These maps were generated from 2013 data, before the implementation of Vision Zero. Referring back to the very first chart, it seems clear that Vision Zero had a positive impact in reducing the number of accidents, as seen by the significant decrease in 2016. What does this look like on a map?

Maps compared side by side: Before Vision Zero (2013 | 12:00 pm - 6:00 pm) and After Vision Zero (2016 | 12:00 pm - 6:00 pm)

Taking a look at the two maps side by side, it is quite obvious that Northern Blvd and Queens Blvd have both improved dramatically since 2013. What is the approach taken by Vision Zero? If we look closely, the areas leading into Manhattan are clearing up quite well, but as we move east, there doesn't seem to be any significant improvement. The eastern half of Northern Blvd, in Flushing, remains a problematic area for Queens. Is Vision Zero being executed with an inside-out approach, where the areas closer to Manhattan receive higher priority? Or is Flushing just that bad? This is something we could revisit with future data.

Making it Interactive with Shiny

In order to make this dataset malleable for other users, I have created a Shiny app to allow users to set their own parameters for their own exploration. Users will be free to explore other regions of NYC outside of Queens. The link to the app can be found here!

In addition to the heat map of accidents previewed in this blog post, there are a few more features available for visualizing the data. These include a map of fatalities for pedestrians, cyclists, and motorists involved in accidents, a filtered dataset showing the accidents by neighborhood (grouped by zip code), and a dataset filtered by the ratio of injuries and deaths to accidents. In the app, we can also take a deeper look into the factors behind motorcycle accidents; motorcycles happen to be the vehicle type with the highest ratio of injuries to accidents.
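One way to read the injury ratio is injuries per accident for each vehicle type. The Python sketch below computes it under that assumption, which may differ from the exact definition used in the app. The field names and records are made up for illustration.

```python
from collections import defaultdict


def injury_ratio_by_vehicle(rows):
    """Return injuries per accident for each vehicle type (assumed definition)."""
    accidents = defaultdict(int)
    injured = defaultdict(int)
    for row in rows:
        # Use a set so an accident with two vehicles of one type counts once.
        for vehicle in set(row["vehicles"]):
            accidents[vehicle] += 1
            injured[vehicle] += row["injured"]
    return {vehicle: injured[vehicle] / accidents[vehicle] for vehicle in accidents}


# Made-up records, for illustration only.
rows = [
    {"vehicles": ["MOTORCYCLE", "SEDAN"], "injured": 1},
    {"vehicles": ["SEDAN"], "injured": 0},
    {"vehicles": ["MOTORCYCLE"], "injured": 1},
    {"vehicles": ["SEDAN", "TAXI"], "injured": 1},
]
ratios = injury_ratio_by_vehicle(rows)
print(ratios)
```

Note that an injury in a multi-vehicle accident is attributed to every vehicle type involved, so the ratios are not exclusive across types.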

Upon accessing the app, users are greeted with a dashboard interface.

Users can navigate through the different features of the app using the menu on the left.

Each tab on the menu generates a visual in the form of a graph, map, or table. Each visual has its own set of filters that users can apply while exploring the dataset.

Conclusion

It has been 3 years since the initial deployment of Vision Zero NYC. Is it working? According to the current data, the overall answer appears to be yes. With its best year in 2016, Vision Zero seems to have finally gained some traction. Moving forward, we should expect the program to continue its positive momentum, but only time (and data) will tell.
