Data Study on Traffic tickets
The skills the author demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.
Introduction & Data
Anyone who has ever experienced driving in New York City can attest to the fact that it may at times be a less-than-satisfactory experience. With pedestrians who rightfully assume ownership of all parts of this city, data shows cyclists who do the same, and other drivers who, let's face it, are no different. This mess of movement is regulated by a few city agencies who have the power to issue traffic tickets to those of us who violate the rules of the road (luckily for some it does not include pedestrians).
After coming across a dataset of all traffic tickets issued in New York State over a four year period from 2014 through end of 2017, I was intrigued as to what sort of dark secrets may be hiding in the over 14 million traffic citations handed and received over that time period. Boiling down the data from the state level to include only the 5 boroughs I was left with a little over 4 million observations to sift through - challenge accepted. Unfortunately, due to size limitations on the host server I ended up reducing the dataset to 3 years of observations, from 2015 to 2017, however, I retained 2014 for a time series analysis that we'll touch on later.
R Dashboard & Findings
To visualize the data I decided to utilize Semantic Dashboard which is built on RStudio's Shiny architecture. The first thing that struck me, as you can see in the following map, is that Manhattan is the borough with the highest total number of tickets issued year over year - us Brooklynites falling to second place by a slim margin. This is perhaps unsurprising given the notoriety of Manhattan roads.
Number of tickets per borough. Darker shades indicate higher number of tickets.
Interestingly enough, on a per capita basis Manhattan is still number one for traffic citations, however, this time around Staten Island comes second even while being the least populated borough by a wide margin. This also makes sense once we consider that Staten Island is a much more car-oriented borough with few other transportation options.
Taking a deeper dive, the dashboard allows us to sort the data by offense type; borough in which citation was issued; age, sex, and license-issuing state of citation recipient; as well as by citation issuing agency. Interestingly enough, sorting the data by sex shows that men far outweigh women in the number of traffic tickets received citywide over the three year period considered. This is all the more startling given that more than 50% of licensed motorists in the United States are in fact female.
Male v Female Citywide Traffic Ticket Count 2014-2017
It is also interesting to note that most traffic tickets over this three year period are issued on Wednesday, regardless of borough. The reason for this is not immediately apparent but may be attributed in equal parts to mid-week attention span drops and NYPD conspiracy theories. There is also unfortunately no way to tell if the data accounts for duplicate or dismissed tickets - this is an area for future dashboard improvement.
Wednesday is the day for most ticket issuance consistent through the 5 boroughs.
Finally, the question we've all been asking ourselves, is which violations are drivers in the city most keen on committing. The data is illuminating on this point. It seems that New York drivers are most likely to get citations for running red lights/stop signs, using their phones while driving, and speeding.
New York's Top 3 Favorite Traffic Offenses
The last part of the dashboard is a time series of total monthly ticket counts over a four year period from 2014 to 2017. We have the option to select a moving average smoother to get a clearer picture of any trends as well as the ability to fit an AR (Auto-Regressive) model for forecasting future ticket issuance. Along with the point forecast we also plot an 80% confidence bound. It looks like the confidence bound is consistent with the overall variability of ticket counts over the 4 years. If the forecast is accurate, however, remains to be seen as data for 2018 is not yet available.
Total monthly tickets with a Simple Moving Average smoother and AR model of order 10.
In conclusion, we have unearthed a few of New York City drivers' secrets, however, many more remain and are up to the user to reveal via a full exploration of the dashboard. May the force be with you.