Data Study on Traffic Tickets

Posted on Jun 30, 2019
The skills the author demonstrates here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Introduction & Data

Anyone who has ever driven in New York City can attest that it is, at times, a less-than-satisfactory experience. Pedestrians rightfully assume ownership of every part of this city, cyclists do the same, and other drivers, let's face it, are no different. This mess of movement is regulated by a few city agencies that have the power to issue traffic tickets to those of us who violate the rules of the road (luckily for some, this does not include pedestrians).

After coming across a dataset of all traffic tickets issued in New York State over the four-year period from 2014 through the end of 2017, I was curious what dark secrets might be hiding in the more than 14 million traffic citations handed out and received over that time. Narrowing the data from the state level to only the five boroughs left me with a little over 4 million observations to sift through: challenge accepted. Unfortunately, size limits on the host server forced me to reduce the dataset to three years of observations, 2015 through 2017; I retained 2014 for the time series analysis that we'll touch on later.
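The borough filter described above can be sketched as follows. This is a minimal sketch, not the author's code: it assumes each ticket is a dict with hypothetical fields `county` and `year` (the real dataset's column names may differ), and it relies on the fact that each borough is coextensive with a county (Manhattan is New York County, Brooklyn is Kings County, Staten Island is Richmond County, and the Bronx and Queens share their county names).

```python
from typing import Iterable

# Each NYC borough is coextensive with a New York State county.
BOROUGH_BY_COUNTY = {
    "NEW YORK": "Manhattan",
    "KINGS": "Brooklyn",
    "QUEENS": "Queens",
    "BRONX": "Bronx",
    "RICHMOND": "Staten Island",
}


def nyc_tickets(tickets: Iterable[dict], years: range) -> list[dict]:
    """Keep tickets issued in one of the five boroughs during `years`.

    `county` and `year` are hypothetical field names; adapt them to the
    actual dataset's columns.
    """
    kept = []
    for ticket in tickets:
        # Normalize spacing and case before looking up the county.
        borough = BOROUGH_BY_COUNTY.get(ticket["county"].strip().upper())
        if borough is not None and ticket["year"] in years:
            # Attach the borough name so later steps can group by it.
            kept.append({**ticket, "borough": borough})
    return kept


# Made-up records; the window 2015 through 2017 mirrors the dashboard's range.
sample = [
    {"county": "Kings", "year": 2016},
    {"county": "Albany", "year": 2016},
    {"county": "Richmond", "year": 2014},
]
print(nyc_tickets(sample, range(2015, 2018)))
# [{'county': 'Kings', 'year': 2016, 'borough': 'Brooklyn'}]
```

Using a `range` for the years keeps the window half-open, so `range(2015, 2018)` covers 2015, 2016, and 2017.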

R Dashboard & Findings

To visualize the data, I used Semantic Dashboard, which is built on RStudio's Shiny framework. The first thing that struck me, as you can see in the following map, is that Manhattan is the borough with the highest total number of tickets issued year after year, with us Brooklynites falling to second place by a slim margin. This is perhaps unsurprising given the notoriety of Manhattan's roads.

Number of tickets per borough. Darker shades indicate a higher number of tickets.

On a per capita basis, Manhattan is still number one for traffic citations; this time, however, Staten Island, by far the least populated borough, comes second. This makes sense once we consider that Staten Island is a much more car-oriented borough with few other transportation options.
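A per capita rate is simply a ticket count divided by population. The sketch below uses made-up numbers for three unnamed boroughs, not figures from the dataset, to show how a small borough can climb the ranking once counts are normalized. The function names and the per-1,000-residents scale are illustrative choices.

```python
def per_capita_rates(counts: dict[str, int], population: dict[str, int],
                     per: int = 1000) -> dict[str, float]:
    """Tickets per `per` residents for each borough in `counts`."""
    return {b: counts[b] / population[b] * per for b in counts}


def rank(values: dict[str, float]) -> list[str]:
    """Keys ordered from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


# Made-up numbers for illustration only; not figures from the dataset.
counts = {"Borough A": 900, "Borough B": 800, "Borough C": 250}
population = {"Borough A": 1_600_000, "Borough B": 2_600_000, "Borough C": 480_000}

print(rank(counts))                                # ['Borough A', 'Borough B', 'Borough C']
print(rank(per_capita_rates(counts, population)))  # ['Borough A', 'Borough C', 'Borough B']
```

Borough C has the fewest tickets in absolute terms but moves to second place once its much smaller population is taken into account.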

For a deeper dive, the dashboard lets us break the data down by offense type; the borough in which the citation was issued; the recipient's age, sex, and license-issuing state; and the issuing agency. Breaking the data down by sex shows that men far outnumber women in traffic tickets received citywide over the three-year period considered. This is all the more startling given that more than 50% of licensed motorists in the United States are in fact female.

Male vs. Female Citywide Traffic Ticket Count, 2014-2017
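Each of these break-downs amounts to filtering on some fields and counting by another. Below is a minimal sketch of that logic, assuming tickets are dicts with hypothetical fields such as `borough`, `sex`, and `agency`; the dashboard itself is built in R Shiny, so this Python version only illustrates the idea.

```python
from collections import Counter
from typing import Iterable


def filter_tickets(tickets: Iterable[dict], **criteria) -> list[dict]:
    """Keep tickets whose fields exactly match every keyword criterion."""
    return [t for t in tickets
            if all(t.get(field) == value for field, value in criteria.items())]


def count_by(tickets: Iterable[dict], field: str) -> Counter:
    """Count tickets per distinct value of `field`."""
    return Counter(t.get(field) for t in tickets)


# Hypothetical records with illustrative field names.
tickets = [
    {"borough": "Brooklyn", "sex": "M", "agency": "NYPD"},
    {"borough": "Brooklyn", "sex": "F", "agency": "NYPD"},
    {"borough": "Queens", "sex": "M", "agency": "NYPD"},
]
# Tickets by sex, citywide and then for Brooklyn only.
print(count_by(tickets, "sex"))
print(count_by(filter_tickets(tickets, borough="Brooklyn"), "sex"))
```

Passing no criteria to `filter_tickets` keeps everything, which corresponds to the dashboard's citywide view.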

It is also interesting to note that over this three-year period, more traffic tickets are issued on Wednesday than on any other day of the week, regardless of borough. The reason is not immediately apparent, but it may be attributed in equal parts to mid-week attention-span drops and NYPD conspiracy theories. Unfortunately, there is also no way to tell whether the data accounts for duplicate or dismissed tickets; this is an area for future dashboard improvement.

Wednesday is the peak day for ticket issuance in all five boroughs.
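Finding the peak day for each borough is a grouped count followed by a maximum. This sketch assumes hypothetical fields `borough` and `issued` (a `datetime.date`). Weekday names come from a fixed tuple rather than `strftime("%A")`, so the output does not depend on the system locale. Ties go to the weekday encountered first.

```python
from collections import Counter, defaultdict
from datetime import date

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def peak_weekday_by_borough(tickets):
    """Map each borough to the weekday on which it issued the most tickets.

    Assumes hypothetical fields `borough` and `issued` (a datetime.date).
    """
    per_borough = defaultdict(Counter)
    for t in tickets:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        per_borough[t["borough"]][WEEKDAYS[t["issued"].weekday()]] += 1
    return {b: days.most_common(1)[0][0] for b, days in per_borough.items()}


# 2017-03-01 and 2017-03-08 fall on Wednesdays; 2017-03-02 is a Thursday.
sample = [
    {"borough": "Queens", "issued": date(2017, 3, 1)},
    {"borough": "Queens", "issued": date(2017, 3, 8)},
    {"borough": "Queens", "issued": date(2017, 3, 2)},
    {"borough": "Bronx", "issued": date(2017, 3, 2)},
]
print(peak_weekday_by_borough(sample))  # {'Queens': 'Wednesday', 'Bronx': 'Thursday'}
```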

Finally, the question we've all been asking ourselves: which violations are the city's drivers most keen on committing? The data is illuminating on this point. It seems that New York drivers are most likely to get citations for running red lights or stop signs, using their phones while driving, and speeding.

New York's Top 3 Favorite Traffic Offenses
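A top-three list like this one is a frequency count over violation descriptions. The sketch assumes the input is an iterable of description strings (hypothetical; the real dataset's violation coding may differ) and normalizes case and whitespace so that trivially different spellings are counted together.

```python
from collections import Counter
from typing import Iterable


def top_offenses(violations: Iterable[str], n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent violation descriptions with their counts."""
    # Collapse repeated whitespace and ignore case so "Red  light" and
    # "RED LIGHT" are counted as the same offense.
    normalized = (" ".join(v.split()).upper() for v in violations)
    return Counter(normalized).most_common(n)


# Hypothetical descriptions, not records from the dataset.
sample = (["Red light"] * 3 + ["red  light"] + ["Speeding"] * 2 + ["SPEEDING "]
          + ["Cell phone"] * 2 + ["Seat belt"])
print(top_offenses(sample))  # [('RED LIGHT', 4), ('SPEEDING', 3), ('CELL PHONE', 2)]
```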

The last part of the dashboard is a time series of total monthly ticket counts over the four-year period from 2014 to 2017. We can apply a moving average smoother to get a clearer picture of any trends, and we can fit an AR (autoregressive) model to forecast future ticket issuance. Along with the point forecast, we also plot an 80% prediction interval. The interval appears consistent with the overall variability of ticket counts over the four years. Whether the forecast is accurate, however, remains to be seen, as data for 2018 is not yet available.

Total monthly tickets with a simple moving average smoother and an AR model of order 10.
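The smoother and forecast can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the dashboard's R code. It computes a trailing simple moving average and fits an AR(p) model with an intercept by ordinary least squares; R's `ar()`, for instance, defaults to Yule-Walker estimation, so its coefficients may differ slightly. It then returns a one-step-ahead forecast with an 80% interval, assuming normally distributed residuals. Multi-step intervals widen with the horizon and would need the recursive forecast variance, which is omitted here. All function names are illustrative.

```python
from statistics import NormalDist


def simple_moving_average(series, window):
    """Trailing mean over `window` points; the first window-1 entries are None."""
    return [None if i + 1 < window else sum(series[i + 1 - window:i + 1]) / window
            for i in range(len(series))]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar(series, p):
    """OLS fit of y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p} + e_t.

    Returns ([c, phi_1, ..., phi_p], residual std. dev.).
    Needs len(series) > 2*p + 1 so the residual variance is defined.
    """
    rows = [[1.0] + [series[t - k] for k in range(1, p + 1)]
            for t in range(p, len(series))]
    ys = series[p:]
    k = p + 1
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    residuals = [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(rows, ys)]
    sigma = (sum(e * e for e in residuals) / (len(ys) - k)) ** 0.5
    return beta, sigma


def forecast_one_step(series, beta, sigma, level=0.80):
    """One-step-ahead point forecast with a Gaussian prediction interval."""
    p = len(beta) - 1
    lags = [1.0] + [series[-k] for k in range(1, p + 1)]
    point = sum(b * v for b, v in zip(beta, lags))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.28 for an 80% interval
    return point, point - z * sigma, point + z * sigma


# Noise-free AR(1) toy series y_t = 2 + 0.5 * y_{t-1}, for illustration only.
toy = [0.0, 2.0, 3.0, 3.5, 3.75, 3.875, 3.9375, 3.96875]
beta, sigma = fit_ar(toy, p=1)
print(simple_moving_average(toy, 3))
print(forecast_one_step(toy, beta, sigma))
```

The toy series is short, so it uses p=1. A series of 48 monthly totals leaves enough rows to fit an order-10 model this way, although the fewer rows per coefficient, the noisier the estimates.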

In conclusion, we have unearthed a few of New York City drivers' secrets, but many more remain for the user to reveal by fully exploring the dashboard. May the Force be with you.
