NYC Data Science Academy | Blog

How safe is driving around New York City?

Arda Kosar
Posted on Apr 29, 2016

Contributed by Arda Kosar. He graduated from the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, which took place from April 11th to July 1st, 2016. This post is based on his first class project, R visualization (due in the 2nd week of the program).

I moved to New York City (NYC) in October 2015. I wanted to explore the city, but beyond a certain distance it is difficult to get around by public transportation, so I decided to apply for my driver's license.

I have been working on getting my driver's license since March 2016. While looking for a dataset for this project on NYC Open Data, I came across this one and thought it would make a good analysis, especially since I will get my license in a month and start driving and exploring the city. It will also give me an idea of how many deaths motor vehicle collisions cause around NYC.

I downloaded the dataset from the NYC Open Data website (link to the dataset).

Before starting my analysis, I had three questions in mind:

  1. How does the number of deaths vary by borough?
  2. How does the number of deaths vary by the time of day?
  3. How does the number of deaths vary by driving location (by ZIP code)?

The dataset has 769054 observations and 29 variables.

Note 1: The data for 2012 and 2016 is incomplete, so for trends I compare only 2013, 2014, and 2015. For death counts, however, I still show how the boroughs compare in every year.

Note 2: This analysis is an exploratory visualization of the dataset only.

Note 3: The full code for this exploratory analysis can be found here.
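The post does not reproduce its import code. As a rough Python sketch (the original work was done in R), loading the export and keeping only the complete years from Note 1 could look like the following. The `DATE` column name and its MM/DD/YYYY format are assumptions about the NYC Open Data file, and `parse_rows` and `complete_years` are hypothetical helper names.

```python
import csv
from datetime import datetime

# Assumptions: the export has a DATE column formatted MM/DD/YYYY.
DATE_COL = "DATE"
DATE_FMT = "%m/%d/%Y"


def parse_rows(lines):
    """Parse CSV lines (e.g. an open file) into dicts with an added integer 'year' field."""
    rows = list(csv.DictReader(lines))
    for row in rows:
        row["year"] = datetime.strptime(row[DATE_COL], DATE_FMT).year
    return rows


def complete_years(rows, years=(2013, 2014, 2015)):
    """Keep only the fully covered years, for year-over-year trend comparisons."""
    return [row for row in rows if row["year"] in years]
```

Because `csv.DictReader` accepts any iterable of lines, `parse_rows` can be given an open file object (`open(path, newline="")`) for a local copy of the export.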

I imported the dataset into R. The required libraries were as follows:

After importing the data, I grouped it according to my exploration questions:

1. HOW DO TOTAL DEATHS CHANGE BY BOROUGH?

In the first section I analyzed the data by year and borough, since I was exploring how the total number of deaths changes across boroughs.
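Grouping by year and borough amounts to summing a death-count column for each (year, borough) pair. A minimal Python sketch, assuming the export's `DATE` (MM/DD/YYYY), `BOROUGH`, and `NUMBER OF PERSONS KILLED` column names and values read as text by `csv.DictReader`; `deaths_by_year_borough` is a hypothetical name:

```python
from collections import defaultdict


def deaths_by_year_borough(rows, killed_col="NUMBER OF PERSONS KILLED"):
    """Sum a death-count column for each (year, borough) pair.

    Assumes DATE is formatted MM/DD/YYYY and counts are stored as text, as
    csv.DictReader returns them; blank counts are treated as zero.
    """
    totals = defaultdict(int)
    for row in rows:
        borough = (row.get("BOROUGH") or "").strip()
        if not borough:
            continue  # skip collisions with no borough recorded
        year = int(row["DATE"][-4:])  # last four characters of MM/DD/YYYY
        totals[(year, borough)] += int(row.get(killed_col) or 0)
    return dict(totals)
```

Each borough's total for a given year then becomes one segment of that year's stacked bar.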

Figure: Total People Killed by Year

The graph shows a decreasing trend over the past three years (2013, 2014, and 2015). Judging by the stack sizes, Queens and Brooklyn have the highest total numbers of deaths and Staten Island the lowest; note, however, that these results are not normalized by population.

The dataset also records cyclist, pedestrian, and motorist deaths separately. To explore further, I dug down and plotted the total deaths for cyclists, motorists, and pedestrians.
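For the per-category plots, one option is to reshape the three category columns into long format, one record per (category, year, borough), which is the shape a stacked or faceted bar chart usually expects. The category column names below are assumptions about the export, and `category_deaths_long` is a hypothetical helper:

```python
from collections import defaultdict

# Assumed column names for the three victim categories.
CATEGORY_COLS = {
    "Cyclist": "NUMBER OF CYCLIST KILLED",
    "Motorist": "NUMBER OF MOTORIST KILLED",
    "Pedestrian": "NUMBER OF PEDESTRIANS KILLED",
}


def category_deaths_long(rows):
    """Return sorted (category, year, borough, deaths) tuples, one per group."""
    totals = defaultdict(int)
    for row in rows:
        borough = (row.get("BOROUGH") or "").strip()
        if not borough:
            continue  # skip collisions with no borough recorded
        year = int(row["DATE"][-4:])  # DATE assumed to be MM/DD/YYYY
        for category, col in CATEGORY_COLS.items():
            totals[(category, year, borough)] += int(row.get(col) or 0)
    return sorted((c, y, b, n) for (c, y, b), n in totals.items())
```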

1.1 Total Number of Cyclist Deaths by Borough

Figure: Total Cyclist Deaths

Cyclist deaths are low overall. The maximum for any borough in a single year is 6 deaths, and again Brooklyn and Queens have the highest numbers, except in 2014, when Manhattan had more than Brooklyn. Over the past three years Brooklyn's count has stayed roughly constant, while Queens' count in each of the last two years is triple its 2013 level.

1.2 Total Number of Motorist Deaths by Borough

Figure: Total Motorist Deaths by Year

This graph shows clear trends in motorist deaths across the boroughs. Both Staten Island and the Bronx show increasing motorist deaths over the past three years. Over the same period, Brooklyn and Manhattan show a decreasing trend, and Queens decreased between 2013 and 2014 but remained constant after that.

1.3 Total Number of Pedestrian Deaths by Borough

Figure: Total Pedestrian Deaths by Year

The Total Pedestrian Deaths by Year graph again shows some trends among the boroughs. Manhattan has a decreasing trend over the past three years, while Queens and the Bronx had a notable drop in pedestrian deaths in 2014.

I think the most important trend in this graph is Brooklyn's. As the previous graph (Total Motorist Deaths by Year) showed, motorist deaths in Brooklyn have been decreasing, yet pedestrian deaths there increased. This point can be analyzed further in future work, but a preliminary interpretation is possible. For 2013-2015, Brooklyn had 67 total deaths per year and 4 cyclist deaths per year, so the remaining 63 deaths per year were split between motorists and pedestrians. Because that total held steady, the decrease in motorist deaths was offset by an increase in pedestrian deaths. This suggests that over the past three years, collisions in Brooklyn may have involved more car-pedestrian collisions and fewer car-car collisions. I think an increase of 10 pedestrian deaths per year is not a small number, so this point should be analyzed further.
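The argument above rests on the direction of change in each category over 2013-2015. One way to make "increasing" or "decreasing" explicit is an ordinary least-squares slope of deaths against year. This is a sketch of that idea, not something the post computed; `trend_slope` is a hypothetical helper that takes a `{year: deaths}` mapping:

```python
def trend_slope(yearly):
    """Ordinary least-squares slope of deaths per year, given {year: deaths}.

    A negative slope means a decreasing trend, a positive one an increasing trend.
    """
    years = sorted(yearly)
    n = len(years)
    if n < 2:
        raise ValueError("need at least two years to estimate a trend")
    mean_x = sum(years) / n
    mean_y = sum(yearly[y] for y in years) / n
    num = sum((y - mean_x) * (yearly[y] - mean_y) for y in years)
    den = sum((y - mean_x) ** 2 for y in years)
    return num / den
```

With only three complete years, a single unusual year can flip the sign of the slope, so it is a descriptive summary rather than a significance test.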

 

2. HOW DOES THE NUMBER OF DEATHS VARY BY THE TIME OF DAY?

For my second exploration question, I wanted to analyze total deaths across several time ranges during the day. For this I used the chron package, because the TIME column in my data has class "character". Using the times() function from chron, I converted the character TIME data into actual time values. The times are in 24-hour format, since that is how chron handles time data.
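chron's `times()` stores a time of day as a fraction of a day, so 12:00 becomes 0.5. A rough Python analogue of the conversion is below, assuming `TIME` values look like `H:MM` or `HH:MM` in 24-hour form; `time_to_day_fraction` is a hypothetical name:

```python
from datetime import datetime


def time_to_day_fraction(text):
    """Convert an 'H:MM' or 'HH:MM' 24-hour string to a fraction of a day.

    Mirrors chron's fraction-of-a-day representation, but parses with
    Python's datetime instead of chron.
    """
    t = datetime.strptime(text.strip(), "%H:%M").time()
    return (t.hour * 60 + t.minute) / (24 * 60)
```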

After that, I filtered my original table into the following four ranges:

  1. Morning - (05:00 - 11:59)
  2. Midday - (12:00 - 16:59)
  3. Evening - (17:00 - 23:59)
  4. Night - (00:00 - 04:59)
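Assigning each collision to one of these four ranges and totaling deaths per borough could be sketched as follows. The `TIME`, `BOROUGH`, and `NUMBER OF PERSONS KILLED` column names and the `H:MM` time format are assumptions about the export, and both function names are hypothetical:

```python
from collections import Counter
from datetime import datetime


def time_of_day(text):
    """Map an 'H:MM' or 'HH:MM' TIME string to one of the four ranges."""
    hour = datetime.strptime(text.strip(), "%H:%M").hour
    if 5 <= hour <= 11:
        return "Morning"  # 05:00 - 11:59
    if 12 <= hour <= 16:
        return "Midday"   # 12:00 - 16:59
    if hour >= 17:
        return "Evening"  # 17:00 - 23:59
    return "Night"        # 00:00 - 04:59


def deaths_by_time_of_day(rows, killed_col="NUMBER OF PERSONS KILLED"):
    """Total deaths per (borough, time range); rows without a borough are skipped."""
    totals = Counter()
    for row in rows:
        borough = (row.get("BOROUGH") or "").strip()
        if borough:
            totals[(borough, time_of_day(row["TIME"]))] += int(row.get(killed_col) or 0)
    return totals
```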

Figure: Total Deaths by Time of Day

The Time of Day graph shows a peak time range for each borough. The evening range (17:00 - 23:59) is the highest in most years for each borough. This point can be analyzed further; as a starting point, I see two possible reasons. The first is the evening rush hour, which puts a high volume of cars, pedestrians, and cyclists on the roads. The second is irresponsible driving. Since the dataset includes "Contributing Factors" variables, analyzing them together with other variables could yield valuable insights into the factors behind accidents in the evening time range.
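A first pass at the contributing-factor idea could count the factors recorded for fatal collisions in the evening range. This sketch assumes up to five `CONTRIBUTING FACTOR VEHICLE n` columns and treats `Unspecified` as the placeholder for a missing factor; both the column names and that placeholder value are assumptions about the export, and `evening_fatal_factors` is a hypothetical helper:

```python
from collections import Counter

# Assumed column names: up to five contributing factors, one per vehicle.
FACTOR_COLS = [f"CONTRIBUTING FACTOR VEHICLE {i}" for i in range(1, 6)]
# Assumed placeholder value for a factor that was not recorded.
MISSING_FACTOR = "Unspecified"


def evening_fatal_factors(rows, top=5, killed_col="NUMBER OF PERSONS KILLED"):
    """Most common contributing factors in fatal collisions between 17:00 and 23:59."""
    counts = Counter()
    for row in rows:
        if int(row.get(killed_col) or 0) == 0:
            continue  # not a fatal collision
        hour = int(row["TIME"].split(":")[0])  # TIME assumed to be H:MM or HH:MM
        if hour < 17:
            continue  # outside the evening range
        for col in FACTOR_COLS:
            factor = (row.get(col) or "").strip()
            if factor and factor != MISSING_FACTOR:
                counts[factor] += 1
    return counts.most_common(top)
```

A factor is counted once per vehicle that lists it, so a collision in which two vehicles share the same factor contributes two counts.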

2.1 Monthly Seasonal Analysis

As an extension of my second exploration question, I wanted to see whether there is a seasonal trend. To do this, I grouped my data by month and then filtered it by the months of each season.
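The post does not list which months went into which season. Assuming meteorological seasons (December to February as Winter, March to May as Spring, and so on), the monthly grouping and the season mapping could be sketched as below; the `DATE` format and column names are likewise assumptions, and `deaths_by_month_and_season` is a hypothetical name:

```python
from collections import Counter

# Assumption: meteorological seasons; the post does not say which months it used.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def deaths_by_month_and_season(rows, killed_col="NUMBER OF PERSONS KILLED"):
    """Total deaths per calendar month (1-12) and per season, pooled across years."""
    by_month, by_season = Counter(), Counter()
    for row in rows:
        month = int(row["DATE"].split("/")[0])  # DATE assumed to be MM/DD/YYYY
        killed = int(row.get(killed_col) or 0)
        by_month[month] += killed
        by_season[SEASON_BY_MONTH[month]] += killed
    return by_month, by_season
```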

Figure: Total Deaths by Month

Before plotting this part, I expected deaths to increase towards the end of the year, since the weather gets worse. The graph confirms this: deaths rise towards the end of the year and peak at the end of Fall and the beginning of Winter.

3. HOW DOES THE NUMBER OF DEATHS VARY BY DRIVING LOCATION (BY ZIP CODE)?

For my last question, I wanted to see the numbers on a map, since the first part of my analysis turned up some interesting numbers for Brooklyn and Queens. I wanted to see how deaths are distributed across the boroughs.

I used choroplethrZip, since I have the ZIP codes of the collision locations.
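To my understanding, choroplethr-style functions such as `zip_choropleth()` take a data frame with a `region` column (the ZIP code as a character string) and a numeric `value` column. The aggregation that feeds such a map could be sketched in Python as below; the `ZIP CODE` column name is an assumption about the export, and `zip_choropleth_input` is a hypothetical helper:

```python
from collections import defaultdict


def zip_choropleth_input(rows, killed_col="NUMBER OF PERSONS KILLED"):
    """Aggregate deaths per ZIP code into region/value records for a choropleth."""
    totals = defaultdict(int)
    for row in rows:
        zip_code = (row.get("ZIP CODE") or "").strip()  # assumed column name
        if not zip_code:
            continue  # skip collisions with no ZIP code recorded
        # Keep ZIP codes as 5-character text rather than numbers.
        totals[zip_code.zfill(5)] += int(row.get(killed_col) or 0)
    return [{"region": z, "value": v} for z, v in sorted(totals.items())]
```

Passing the cyclist, pedestrian, or motorist death column as `killed_col` gives the input for the three category maps.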

Figure: Total Deaths by ZIP Code

Recalling the analysis for the first question, Brooklyn and Queens have the highest numbers of deaths among the boroughs, and the ZIP code analysis confirms this. South-East Brooklyn has the highest numbers within Brooklyn, but the rest of Brooklyn is also high compared with the other boroughs.

Queens also had a high death count in the first part; however, northern Queens has only 0-2 total deaths in its ZIP codes, which is low compared with other parts.

I was curious about how cyclist, pedestrian, and motorist deaths are distributed on the map, so I created three separate maps for these categories.

3.1 Total Cyclist Deaths by Zipcodes

Figure: Total Cyclist Deaths by ZIP Code

In part 1 we saw that the total number of cyclist deaths was very low, and the map confirms this. Light blue, which represents zero deaths, appears all around the city. Some parts of Brooklyn and Manhattan have the highest numbers of cyclist deaths.

3.2 Total Pedestrian Deaths by Zipcodes

There is a cluster in south-east Brooklyn, the same area where total deaths were higher than in the rest of Brooklyn and in the other boroughs. South-west Queens also has more pedestrian deaths than other parts. Manhattan, the Bronx, and Staten Island are at the low end compared with Brooklyn and Queens.

3.3 Total Motorist Deaths by Zipcodes

Figure: Total Motorist Deaths by ZIP Code

We expected to see more deaths in South-East Brooklyn, and the map shows that this is the case. Since Brooklyn also has the highest numbers in the other two categories, I think Brooklyn and Queens are worth inspecting more deeply.

4. CONCLUSIONS AND FURTHER ANALYSIS

In conclusion, even this initial exploratory analysis supports the following points:

  1. The total number of deaths caused by motor vehicle collisions in NYC shows a decreasing trend over the past three years.
  2. Brooklyn and Queens have the highest total numbers of deaths among the NYC boroughs.
  3. The initial exploration suggests a change in collision types in Brooklyn: pedestrian deaths have been increasing while motorist deaths have been decreasing, so Brooklyn collisions seem to involve more car-pedestrian collisions.
  4. A seasonal analysis showed that deaths increase towards the end of the year, with the maximum generally occurring at the end of Fall and the beginning of Winter.

Building on this initial exploratory analysis, a couple of points can be explored further:

  1. The dataset records where each collision occurred as longitude and latitude. It also records the street on which the collision occurred and, for intersections, the names of both streets. From this data, the most dangerous intersections in NYC could be plotted on a heat map, which would give a much clearer picture.
  2. Another part of the dataset covers the contributing factors to each accident, recorded for every party involved. A contributing-factor analysis could show what causes most of the accidents, and breaking it down by borough would give a clearer view of Brooklyn and Queens in particular.
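For the first item, a simple starting point before drawing a heat map would be to rank intersections by deaths. The sketch below assumes `ON STREET NAME` and `CROSS STREET NAME` columns and normalizes case, spacing, and street order so that "A & B" and "B & A" match; the column names and `deadliest_intersections` are assumptions, not part of the post:

```python
from collections import Counter


def deadliest_intersections(rows, top=10, killed_col="NUMBER OF PERSONS KILLED"):
    """Rank intersections by persons killed, ignoring intersections with zero deaths."""
    totals = Counter()
    for row in rows:
        # Upper-case and collapse repeated spaces so spellings of a street match.
        on = " ".join((row.get("ON STREET NAME") or "").upper().split())
        cross = " ".join((row.get("CROSS STREET NAME") or "").upper().split())
        if not (on and cross):
            continue  # not recorded as an intersection
        key = " & ".join(sorted((on, cross)))  # order-independent intersection name
        totals[key] += int(row.get(killed_col) or 0)
    return [(k, n) for k, n in totals.most_common(top) if n > 0]
```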

About Author

Arda Kosar

With a background in Mechatronics Engineering and an MBA, Arda started his career in data science at NYC Data Science Academy. Arda currently works as a Data Scientist at Publicis Worldwide, Search&Data Science Team. Arda works in...