Visualizing NYC Traffic Before and After Vision Zero
While New York City is a popular attraction for many things, data shows driving is not one of them. Joining the ranks of the Top 10 Worst U.S. Cities for Drivers, NYC is notorious for its road-raging drivers, jaywalking pedestrians, and daring cyclists. With the Vision Zero initiative deployed in 2014, NYC saw its best year in 2016 in terms of accident reduction. This app uses Leaflet and Shiny to help us visualize these improvements.
Playing with Data
Sourcing the Data and Vision Zero
As part of the Vision Zero initiative started in 2014, the NYPD has made its motor vehicle collision data available to the public, with observations dating back to mid-2012. More information about this dataset can be found here. Vision Zero NYC was adopted in 2014 after its successful deployment in Sweden. The original philosophy of Vision Zero identifies flaws in street design as the root cause of all traffic-related accidents and fatalities. The program aims to reduce traffic-related fatalities by 50% by 2025 by implementing speed limit reductions, slow zones, increased enforcement, reduced downtime of traffic equipment, and more.
Manipulating the Data
Fortunately, the dataset is already in a very clean and easy-to-consume format. To make our lives even easier, we created some additional columns to use in our analysis. First, we removed observations with missing data for this visualization. Next, using the provided date column, we created a year column so that we can group accidents by year. Since the first half of 2012 is missing and 2017's data is still being compiled, we removed both of those years for now. We also converted the date and time columns from strings to their respective classes so that we can compare them as actual dates and times instead of strings.
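To make these cleaning steps concrete, here is a minimal sketch in plain Python. It is not the code behind the app (which was built with Shiny); the field names, the date and time formats, and the sample rows are all assumptions made up for illustration.

```python
from datetime import datetime

# Made-up sample rows. The field names and string formats are assumptions,
# not the dataset's actual schema.
raw_rows = [
    {"date": "07/15/2012", "time": "8:30", "borough": "QUEENS"},
    {"date": "03/02/2013", "time": "14:05", "borough": "BROOKLYN"},
    {"date": "11/20/2016", "time": "23:45", "borough": ""},
    {"date": "01/09/2017", "time": "6:10", "borough": "MANHATTAN"},
    {"date": "05/06/2016", "time": "17:20", "borough": "QUEENS"},
]


def clean(rows, first_year=2013, last_year=2016):
    """Drop incomplete rows, parse dates and times, and keep only full years."""
    cleaned = []
    for row in rows:
        # Remove observations with any missing (empty) field.
        if any(value == "" for value in row.values()):
            continue
        # Convert the string columns into real date and time objects.
        parsed_date = datetime.strptime(row["date"], "%m/%d/%Y").date()
        parsed_time = datetime.strptime(row["time"], "%H:%M").time()
        # Skip the partial years (2012 and 2017 in the post).
        if not first_year <= parsed_date.year <= last_year:
            continue
        cleaned.append(
            {**row, "date": parsed_date, "time": parsed_time, "year": parsed_date.year}
        )
    return cleaned


cleaned_rows = clean(raw_rows)
print([row["year"] for row in cleaned_rows])
```

Once the dates are real date objects rather than strings, comparisons such as `row["date"] < some_other_date` behave chronologically, which is the point of the conversion.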
Now that we have a proper date value, we can create another column for the day of the week. The data also provides the vehicle types involved in each accident. This is good, but they are split across separate columns, one per vehicle. To make this easier to work with, we created a column that counts the number of vehicles in each accident. Finally, in a city like New York, many neighborhoods contain multiple zip codes. We imported another dataset and merged it with our current data to get the name of each neighborhood.
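The three derived columns described above could be sketched as follows. This is only an illustration in plain Python: the field names (`vehicle_1`, `zip`, and so on), the three-column vehicle layout, and the small zip code lookup table are assumptions standing in for the real datasets.

```python
from datetime import date

WEEKDAY_NAMES = (
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"
)

# Hypothetical vehicle-type columns; the real dataset may have a different number.
VEHICLE_COLUMNS = ("vehicle_1", "vehicle_2", "vehicle_3")

# Illustrative stand-in for the second dataset that maps zip codes to neighborhoods.
zip_to_neighborhood = {
    "11354": "Flushing",
    "11355": "Flushing",
    "11101": "Long Island City",
}

# Made-up sample accidents.
accidents = [
    {"date": date(2016, 5, 6), "zip": "11354",
     "vehicle_1": "PASSENGER VEHICLE", "vehicle_2": "TAXI", "vehicle_3": ""},
    {"date": date(2013, 3, 2), "zip": "11355",
     "vehicle_1": "MOTORCYCLE", "vehicle_2": "", "vehicle_3": ""},
]


def enrich(row):
    """Return a copy of the row with weekday, vehicle count, and neighborhood added."""
    enriched = dict(row)
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    enriched["weekday"] = WEEKDAY_NAMES[row["date"].weekday()]
    # Count only the vehicle columns that are actually filled in.
    enriched["vehicle_count"] = sum(1 for column in VEHICLE_COLUMNS if row[column])
    # Equivalent of a left join: unknown zip codes get None instead of being dropped.
    enriched["neighborhood"] = zip_to_neighborhood.get(row["zip"])
    return enriched


enriched_accidents = [enrich(row) for row in accidents]
for row in enriched_accidents:
    print(row["weekday"], row["vehicle_count"], row["neighborhood"])
```

Because several zip codes map to the same neighborhood name, grouping on `neighborhood` afterwards rolls multiple zip codes into one area.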
Visualizing the Data
Charts and Maps
Our first visual is a very basic line graph that shows the change in accidents from 2013 to 2016. It is quite interesting to see that the number of accidents actually saw a slight increase immediately after the implementation of Vision Zero.
In both graphs we can see that 2016 was the best year in terms of reduction in number of accidents. We will explore more of this later.
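The numbers behind a chart like this come from a simple group-and-count. Below is a small sketch of that aggregation in plain Python; the records are made-up placeholders and do not reflect the real accident counts.

```python
from collections import Counter

# Made-up placeholder records, one per accident. Only the "year" field matters here.
accidents = [
    {"year": 2013}, {"year": 2013},
    {"year": 2014}, {"year": 2014}, {"year": 2014},
    {"year": 2015}, {"year": 2015},
    {"year": 2016},
]


def count_by(rows, key):
    """Count rows per distinct value of `key`, returned in sorted key order."""
    counts = Counter(row[key] for row in rows)
    return dict(sorted(counts.items()))


def year_over_year_change(counts):
    """For each year after the first, return the count minus the previous year's count."""
    years = sorted(counts)
    return {
        year: counts[year] - counts[previous]
        for previous, year in zip(years, years[1:])
    }


yearly_counts = count_by(accidents, "year")
yearly_change = year_over_year_change(yearly_counts)
print(yearly_counts)
print(yearly_change)
```

The same `count_by` helper works for any other grouping column, such as borough or day of the week.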
Before playing with this dataset, I had always assumed that Manhattan had far more accidents than any of the other boroughs in New York, but the next visualization is a bit surprising.
As it turns out, Brooklyn has far more accidents than any of the other boroughs, and Queens is actually very similar to Manhattan in terms of traffic accidents. This is slightly concerning to me since I currently live in Queens, so we will explore this in more detail later.
What about accidents by days of the week?
Not too surprisingly, the number of accidents seems to gradually increase throughout the week, with a spike on Friday before resetting over the weekend. You may want to keep that in mind the next time you're thinking about driving out on a Friday night.
Let's visualize these accidents on a map using the longitude and latitude coordinates from the dataset.
Wow... this is a very busy visualization. This is expected since we are mapping every single accident from 2013 - 2016, but it makes it way too difficult to gain any insight. We can do better. Maybe we can filter the dataset to show only accidents in Queens for now. (See the 'Making it Interactive with Shiny' section of this blog post, or visit the Shiny app here to explore other regions!)
Better, but still too busy in my opinion. We are still showing all the data from 2013 - 2016. What if we were interested in what this looks like year by year? We can start by filtering the dataset to show only observations for 2013. This will give us a good understanding of what Queens looked like before Vision Zero was implemented.
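Narrowing the map down in this way is just a filter on borough and year. Here is a minimal sketch in plain Python, assuming hypothetical field names (`borough`, `year`, `latitude`, `longitude`) and made-up coordinates; the actual maps in this post were drawn with Leaflet.

```python
# Made-up sample rows with approximate, illustrative coordinates.
accidents = [
    {"borough": "QUEENS", "year": 2013, "latitude": 40.75, "longitude": -73.87},
    {"borough": "QUEENS", "year": 2016, "latitude": 40.76, "longitude": -73.83},
    {"borough": "BROOKLYN", "year": 2013, "latitude": 40.65, "longitude": -73.95},
    {"borough": "QUEENS", "year": 2013, "latitude": None, "longitude": None},
]


def filter_accidents(rows, borough=None, year=None):
    """Keep rows that match the borough and year (when given) and can be mapped."""
    selected = []
    for row in rows:
        if borough is not None and row["borough"] != borough:
            continue
        if year is not None and row["year"] != year:
            continue
        # A point without coordinates cannot be placed on the map.
        if row["latitude"] is None or row["longitude"] is None:
            continue
        selected.append(row)
    return selected


queens_all_years = filter_accidents(accidents, borough="QUEENS")
queens_2013 = filter_accidents(accidents, borough="QUEENS", year=2013)
print(len(queens_all_years), len(queens_2013))
```

Leaving a parameter as `None` skips that filter, so the same function covers both the "all of Queens" view and the "Queens in 2013" view.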
Looking at this map, we can see two major accident-ridden streets that run the entire span of Queens. If you are a Queens native like I am, you wouldn't be surprised to learn that these streets are Northern Boulevard and Queens Boulevard. Next, let's try to identify the time range in which most of these accidents occur.
Maps by time of day: (A) 12:00 am - 6:00 am, (B) 6:00 am - 12:00 pm, (C) 12:00 pm - 6:00 pm, (D) 6:00 pm - 12:00 am
There seems to be quite a low volume of accidents from 12:00 am to 6:00 am, likely because most people are asleep. From 6:00 am to 12:00 pm, we can see an increase in the volume of accidents on Northern Blvd and Queens Blvd. Since these two streets look like major entry points into Midtown Manhattan, these accidents likely happen during morning commutes. From 12:00 pm to 6:00 pm, the number of accidents reaches its peak compared to the 3 other maps. This could potentially be a result of commuters returning home after work. While there is still accident activity from 6:00 pm to 12:00 am, map C is definitely the worst of the 4.
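Splitting the day into the four six-hour windows labeled A through D comes down to integer division on the hour. The sketch below shows one way to do it in plain Python; the crash times are made up, and the window boundaries are assumed to include the start hour and exclude the end hour.

```python
from collections import Counter
from datetime import time

# A: 12:00 am - 6:00 am, B: 6:00 am - 12:00 pm,
# C: 12:00 pm - 6:00 pm, D: 6:00 pm - 12:00 am
WINDOW_LABELS = ("A", "B", "C", "D")


def time_window(crash_time):
    """Map a time of day to its six-hour window label."""
    # Hours 0-5 give index 0, 6-11 give 1, 12-17 give 2, 18-23 give 3.
    return WINDOW_LABELS[crash_time.hour // 6]


# Made-up sample times.
crash_times = [
    time(2, 15), time(8, 40), time(11, 59),
    time(12, 0), time(17, 30), time(23, 5),
]

window_counts = Counter(time_window(t) for t in crash_times)
print(dict(window_counts))
```

Filtering the map to a single window is then a matter of keeping only the rows whose label matches.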
These maps were generated from 2013 data, before the implementation of Vision Zero. Referring back to the very first chart, it seems clear that Vision Zero had a positive impact in reducing the number of accidents, as seen by the significant decrease in 2016. What does this look like on a map?
Before Vision Zero (2013 | 12:00 pm - 6:00 pm) After Vision Zero (2016 | 12:00 pm - 6:00 pm)
Taking a look at the two maps side by side, it is quite obvious that Northern Blvd and Queens Blvd have both improved dramatically since 2013. What is the approach taken by Vision Zero? If we look closely, the areas leading into Manhattan are clearing up quite well, but as we move east, there doesn't seem to be any significant improvement. The eastern half of Northern Blvd, in Flushing, remains a problematic area for Queens. Is Vision Zero being executed with an inside-out approach, where the areas closer to Manhattan receive higher priority? Or is Flushing just that bad? This is something we could revisit with future data.
Making it Interactive with Shiny
In order to make this dataset malleable for other users, I have created a Shiny app to allow users to set their own parameters for their own exploration. Users will be free to explore other regions of NYC outside of Queens. The link to the app can be found here!
In addition to the heat map of accidents previewed in this blog post, there are a few more features available for visualizing the data. These include a map of fatalities for pedestrians, cyclists, and motorists involved in accidents, a filtered dataset showing accidents by neighborhood (grouped by zip codes), and a dataset filtered by the ratio of injuries and deaths to accidents. In the app, we can also take a deeper look into the factors behind motorcycle accidents, since motorcycles happen to be the vehicle type with the highest ratio of injuries to accidents.
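As a rough illustration of the injury ratio, the sketch below assumes it is defined as the total number of people injured divided by the number of accidents for each vehicle type. That definition, the field names, and the sample rows are all assumptions; the values are placeholders and say nothing about the real dataset.

```python
from collections import defaultdict

# Made-up placeholder rows, one per accident.
accidents = [
    {"vehicle_type": "MOTORCYCLE", "persons_injured": 1},
    {"vehicle_type": "MOTORCYCLE", "persons_injured": 1},
    {"vehicle_type": "PASSENGER VEHICLE", "persons_injured": 0},
    {"vehicle_type": "PASSENGER VEHICLE", "persons_injured": 1},
    {"vehicle_type": "PASSENGER VEHICLE", "persons_injured": 0},
    {"vehicle_type": "TAXI", "persons_injured": 0},
]


def injury_ratio_by_type(rows):
    """Return injuries per accident for each vehicle type."""
    accident_counts = defaultdict(int)
    injury_counts = defaultdict(int)
    for row in rows:
        accident_counts[row["vehicle_type"]] += 1
        injury_counts[row["vehicle_type"]] += row["persons_injured"]
    # Every type in accident_counts has at least one accident, so no division by zero.
    return {
        vehicle_type: injury_counts[vehicle_type] / accident_counts[vehicle_type]
        for vehicle_type in accident_counts
    }


ratios = injury_ratio_by_type(accidents)
highest = max(ratios, key=ratios.get)
print(ratios)
print(highest)
```

Using a ratio rather than a raw count keeps common vehicle types from dominating the ranking simply because they are involved in more accidents.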
Upon accessing the app, users will be greeted with a dashboard interface shown below:
Users can navigate through the different features of the app using the menu on the left:
Each tab on the menu will generate a visual in the form of a graph, map, or table. Each visual will have its own set of filters for the user to leverage for their exploration of the dataset. Some of the filters are shown below:
It has been 3 years since the initial deployment of Vision Zero NYC. Is it working? According to the current data, the overall answer appears to be yes. With its best year in 2016, Vision Zero seems to have finally gained some traction. Moving forward, we should expect the program to continue its positive momentum, but only time (and data) will tell.