Using Data to Explore the Increase in NYC Subway Crimes

Posted on May 30, 2022

If you follow the news, you’ve probably seen several headlines about the increase of crime in the city lately, particularly about incidents in subways. This exploratory data analysis leverages NYPD complaints and MTA Daily Ridership datasets to understand if crime has actually increased in subways.

To begin my analysis, I had to account for the fact that ridership also decreased during the pandemic. The charts below compare ridership and subway incident report volumes from 2015-2021. As you can see on the left, ridership dropped drastically in the past 2 years after COVID hit. The number of crimes reported was also lower for those same years, but crime incidents visibly did not fall as much as ridership. In fact, ridership in 2020 dropped about 60% from 2019, while crime decreased only about 39%. So, if we think about crime incidents per rider, it’s apparent that we've experienced an increase during the pandemic.
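As a quick sanity check on that reasoning, here is a minimal sketch that combines the two rounded percentages quoted above into an implied change in incidents per rider. The variable names are mine, and because the inputs are rounded, the result is only a ballpark figure; the exact value depends on the unrounded counts.

```python
# Rounded year-over-year changes quoted in the text (2019 -> 2020).
ridership_change = -0.60   # ridership fell about 60%
incident_change = -0.39    # reported incidents fell about 39%

# Incidents per rider = incidents / riders, so the relative change in the
# rate is the ratio of the two "remaining" fractions, minus one.
rate_change = (1 + incident_change) / (1 + ridership_change) - 1

print(f"Implied change in incidents per rider: {rate_change:+.1%}")
```

With these rounded inputs the rate rises by roughly half. A ridership drop even slightly larger than 60% pushes the figure noticeably higher, which is why rounding matters here.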

Have subway crimes increased? If so, by how much?

The Y axis on the chart below shows crime numbers per rider.  The X axis shows the years from 2015 to 2021.  As you can see, crime per rider has increased during the pandemic.  In fact, it has increased about 60%.  This is consistent with what we saw on the prior graph.  It's important to note that although crime rates have increased, rates are still relatively low when we think about just how many people ride the subways in New York.

Subway Incidents per ridership (2015-2021)
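For readers who want to reproduce this kind of normalization, here is a minimal, dependency-free sketch. The data below is invented for illustration, and the variable names are my own assumptions rather than the actual column names in the NYPD or MTA datasets.

```python
from collections import Counter

# Toy stand-ins (all numbers invented): one entry per reported incident,
# plus total annual ridership for each year.
incident_years = [2019, 2019, 2019, 2019, 2020, 2020, 2020]
annual_ridership = {2019: 1_000_000, 2020: 400_000}

# Count incidents per year, then divide by that year's ridership.
incidents_per_year = Counter(incident_years)
rate_per_million = {
    year: incidents_per_year[year] / riders * 1_000_000
    for year, riders in annual_ridership.items()
}

for year, rate in sorted(rate_per_million.items()):
    print(year, f"{rate:.1f} incidents per million rides")
```

Scaling to "per million rides" is just a readability choice; any fixed multiplier gives the same year-to-year comparison.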

And when breaking out the view per borough, we see that all four boroughs with subway service have experienced a similar uptick.

Subway Incidents per ridership per borough (2015-2020)

What types of crimes increased?

Given all of the tragic stories reported in the news, I was curious to see if there was a particular increase in serious crimes. The chart below represents the annual count of incidents reported per year along with the distribution of incident type within a given year.

Distribution of level of crimes per year per total volume

Fortunately, it looks like the share of felonies (which are classified as the most serious) has actually gone down slightly in the past 2 years, with misdemeanors making up a larger share in 2021 than in any other year.
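The "distribution" in that chart is each offense level's share of a year's total incidents. Here is a minimal sketch of that calculation on invented records. The category labels follow the felony/misdemeanor/violation split, but the exact labels and field layout in the NYPD data are assumptions on my part.

```python
from collections import Counter, defaultdict

# Toy (year, offense level) pairs, one per incident. All values invented.
records = [
    (2019, "FELONY"), (2019, "MISDEMEANOR"), (2019, "FELONY"), (2019, "VIOLATION"),
    (2021, "MISDEMEANOR"), (2021, "MISDEMEANOR"), (2021, "FELONY"), (2021, "VIOLATION"),
]

# Count incidents per offense level within each year.
counts = defaultdict(Counter)
for year, category in records:
    counts[year][category] += 1

# Convert counts to shares so years with different totals are comparable.
shares = {
    year: {cat: n / sum(by_cat.values()) for cat, n in by_cat.items()}
    for year, by_cat in counts.items()
}

for year in sorted(shares):
    print(year, {cat: f"{share:.0%}" for cat, share in shares[year].items()})
```

Working with shares rather than counts matters here because total incident volume differs a lot between pre-covid and post-covid years.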

And again, breaking this out for each borough, we see similar trends across the board, with felonies making up a smaller share in the post-covid years.

Distribution of level of crimes per year per total volume (for each borough)

Which stations have the most incidents? Was there a difference pre and post pandemic?

I was curious to see where the most incidents occurred and if there was a change at all in terms of location post covid.  For the most part, there did not seem to be drastic differences. Most of the activity appears to occur between the 125th St. stations up north and the larger & busier hub stations around 34th and Times Square; these were the locations that consistently ranked high for count of incidents each year.

Top 5 stations with most reported incidents
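A ranking like this can be sketched with a per-year counter. The station labels below are placeholders, not real stations or counts, and the record layout is an assumption for illustration.

```python
from collections import Counter, defaultdict

# Toy (year, station) pairs, one per reported incident. Station labels are
# placeholders, not real station names or counts.
records = [
    (2019, "Station A"), (2019, "Station A"), (2019, "Station B"),
    (2019, "Station C"), (2021, "Station A"), (2021, "Station B"),
    (2021, "Station B"), (2021, "Station D"),
]

TOP_N = 5

# Count incidents per station within each year.
counts_by_year = defaultdict(Counter)
for year, station in records:
    counts_by_year[year][station] += 1

# Keep the TOP_N stations with the most incidents for each year.
top_stations = {
    year: [station for station, _ in counts.most_common(TOP_N)]
    for year, counts in counts_by_year.items()
}

# Stations that appear in the top N both before and after the pandemic began.
consistent = set(top_stations[2019]) & set(top_stations[2021])
print(top_stations)
print(sorted(consistent))
```

The set intersection at the end is one simple way to check whether the same locations keep showing up across years.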

What about the differences in time of day when subway incidents occur?

When looking at the number of incidents over the course of a day, we notice spikes during the morning and afternoon rush hours, from 8am-9am and then from 3pm-5pm. Again, this makes sense when we’re talking about total count of incidents, since there are simply more people in the subway during these times. These peaks are much more pronounced in the pre-covid years (2015-2019). The peaks during the post-covid years (2020-2021) are much smaller and more scattered; we can attribute this to much of the population working from home.

Number of incidents over the course of a day (2015-2021)
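Here is a minimal sketch of how incidents can be bucketed by hour of day and split into pre-covid and post-covid periods. The timestamps are invented, and the date and time string formats are assumptions; real data may need different parsing.

```python
from collections import Counter
from datetime import datetime

# Toy (date, time) strings, one per incident. Values and formats are assumed.
records = [
    ("2018-03-05", "08:15:00"), ("2018-03-05", "08:40:00"),
    ("2019-06-11", "16:05:00"), ("2019-06-11", "23:30:00"),
    ("2020-09-02", "08:50:00"), ("2021-01-20", "23:10:00"),
]

def period(year):
    """Label a year using the same split as the text (2015-2019 vs 2020-2021)."""
    return "pre-covid" if year <= 2019 else "post-covid"

# Count incidents for each hour of the day (0-23) within each period.
hourly_counts = {"pre-covid": Counter(), "post-covid": Counter()}
for date_str, time_str in records:
    ts = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
    hourly_counts[period(ts.year)][ts.hour] += 1

for name, counts in hourly_counts.items():
    print(name, [counts[hour] for hour in range(24)])
```

Because the two periods span different numbers of years, dividing each hourly count by the number of years in its period gives a fairer side-by-side comparison.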

What was interesting to me was that there was relatively low activity during the late evenings (which is when one would assume more crimes happen). But when we’re thinking about total volume, this makes sense, since significantly fewer people take the subway during these late night hours.

When we look at the differences among the years, it's also worth pointing out that there do not seem to be big differences in the number of incidents between pre-covid and post-covid years in late night hours (even though ridership was significantly lower in 2020 and 2021, as we saw in the earlier charts). We do see a larger gap in volume during rush hour.

Do busier subways = higher likelihood of incidents?

The last two plots seem to suggest that more crimes may happen the busier the subway is. We've just seen that more crimes seem to happen during peak hours, when there are more people in the stations, and that the stations with the most reports (Times Square, Penn Station, Port Authority) are all major, busy hubs. So, is that true? Does a more crowded station mean a higher likelihood of crime incidents? Perhaps crowded, hectic environments induce more crime?

The chart below shows the raw counts of incidents for 10 of the busiest and 10 of the least busy stations, measured by annual ridership volume. As expected, there are more incidents at high-ridership stations. This positive linear relationship is expected because we’re looking at raw counts of incidents.

Number of incidents and Ridership Volume
Same graph as above with station names labelled

But when normalizing the counts and plotting the ratio of incidents per rider against ridership volume, we see that there’s less of a relationship. This is especially true for the low volume stations, where we see large variability in the rate of incidents per rider. For example, the 145th St station and Bowery have similarly low annual ridership volume, but the gap between their incident rates is the largest.

% Incident per Rider and Ridership Volume
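To illustrate why normalizing changes the picture, here is a minimal sketch on invented numbers. The station labels are placeholders, and the figures are not taken from the actual data.

```python
# Toy per-station totals (all invented): annual ridership and incident count.
stations = {
    "Hub A":   {"ridership": 40_000_000, "incidents": 80},
    "Hub B":   {"ridership": 20_000_000, "incidents": 45},
    "Small C": {"ridership": 1_000_000,  "incidents": 6},
    "Small D": {"ridership": 1_000_000,  "incidents": 1},
}

# Normalize: incidents per 100,000 riders at each station.
rate_per_100k = {
    name: s["incidents"] / s["ridership"] * 100_000
    for name, s in stations.items()
}

# Rank stations two ways: by raw incident count and by normalized rate.
by_count = sorted(stations, key=lambda n: stations[n]["incidents"], reverse=True)
by_rate = sorted(rate_per_100k, key=rate_per_100k.get, reverse=True)

print("ranked by raw count:", by_count)
print("ranked by rate:     ", by_rate)
```

In this toy example the busiest hub leads on raw counts, but a small station leads on rate, and the two small stations with identical ridership end up far apart. That is the same pattern of high variability among low volume stations described above.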

So it’s pretty clear that there are several other factors that contribute to incident rate. A few factors that come to mind could be overall crime rate for the given area the stations are within along with other socioeconomic factors for the neighborhood.

Recent subway incidents in the headlines

Lastly, I was curious to see how much of an impact big events that made news across media headlines had on New Yorkers and if these terrible incidents affected their commuting behaviors in any way.

One of those tragic events took place earlier this year, when a young woman was pushed onto the tracks in front of an oncoming train. Given how horrifying the news was, I wondered how much of an impact it had on other New Yorkers.

The plots here show the range of daily ridership volume. The one on the left covers the week prior to the incident, the middle one covers the week of the incident, and the last one covers the week after.

There is a potential indication of an effect, as we see the bottom of the interquartile range stretching further down to lower daily volumes during the week of the incident. However, the median for the week of the incident is actually higher. In fact, the distribution from the median to the 75th percentile is quite high and concentrated toward the top.
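The quantities read off those box plots (25th percentile, median, 75th percentile) can be computed directly. Here is a minimal sketch using invented daily ridership figures for the three weeks; the numbers are illustrative only and the dictionary labels are my own.

```python
from statistics import quantiles

# Toy daily ridership (7 days per week). All numbers are invented.
daily_ridership = {
    "week before": [3_000_000, 3_100_000, 3_150_000, 3_200_000, 3_050_000, 1_900_000, 1_500_000],
    "week of":     [3_100_000, 3_200_000, 3_250_000, 3_150_000, 2_900_000, 1_600_000, 1_300_000],
    "week after":  [3_050_000, 3_100_000, 3_200_000, 3_150_000, 3_000_000, 1_850_000, 1_550_000],
}

def summarize(values):
    """Return the quartiles and interquartile range for one week of data."""
    q1, median, q3 = quantiles(values, n=4)
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}

summary = {label: summarize(values) for label, values in daily_ridership.items()}

for label, stats in summary.items():
    print(label, {name: f"{value:,.0f}" for name, value in stats.items()})
```

With only seven observations per week, these quartiles are sensitive to a single unusual day, so differences between weeks should be read cautiously.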

While the news was horrifying and terribly tragic, it seems like, for the most part, this incident didn’t have a large effect on ridership. This makes sense, as the subway is often the only accessible and affordable mode of transportation for a lot of people, and perhaps a single random event would not be enough to make most people change their behavior in any drastic way.

This next event unfortunately took place just a few months later, when there was a senseless shooting at a subway station in Sunset Park, Brooklyn, resulting in multiple injuries. Thankfully, there were no fatalities, but it appears that this event had a bigger impact on New Yorkers, based on the dip in ridership the week of the event. This incident was at a larger scale and was considered a terror attack, so it makes sense that there would be a larger impact.

The median and 25th percentile during the week of the event are lower, and there is a longer tail toward the lower end of the volume, compared to the week before and the week after.

Summary & Next Steps

In summary, it’s clear that crime incidents per rider have gone up, while subway ridership has not yet recovered to pre-pandemic volumes.

  • Even though incident rates have increased, they are still relatively low when considering the sheer volume of people who take the subway every day.
  • The share of severe crimes (felonies) has actually decreased slightly in the past 2 years.
  • There were higher subway incidents during morning and afternoon rush hours in pre-covid years.
  • Busier stations do not necessarily have higher crime rates; there are likely several other factors that contribute to the crime levels of a given station.
  • With limited analysis on two publicized incidents, it appears that significant crimes and headlines could have a direct impact on ridership volume in the short term.

For next steps, I’d like to further explore the relationship between the decrease in ridership and the crime rate from a different view. Aside from comparing various stations and their ridership volume, I’d like to choose an example station and compare its weekly ridership volume, as it starts to drop, with the corresponding crime rate within that station.

I’d also like to look at other datasets covering some of the socioeconomic variables I mentioned, which could possibly have larger effects on subway crime rates.

And finally I’d also like to incorporate statistical analysis and measure the correlations for the relationships mentioned above.
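As a starting point for that, here is a minimal sketch of a Pearson correlation computed by hand, applied to invented per-station numbers. It is only one possible measure (a rank-based correlation may suit skewed ridership data better), and the data and helper name are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy per-station data (invented): annual ridership in millions and incidents.
ridership_millions = [1, 2, 5, 10, 20, 40]
incidents = [3, 4, 12, 18, 45, 80]

# Incidents per million riders at each station.
rates = [i / r for i, r in zip(incidents, ridership_millions)]

r_counts = pearson(ridership_millions, incidents)
r_rates = pearson(ridership_millions, rates)

print(f"ridership vs raw incident count: r = {r_counts:.2f}")
print(f"ridership vs incidents per rider: r = {r_rates:.2f}")
```

In this toy data, raw counts track ridership very closely while the per-rider rate does not, which mirrors the pattern in the two station scatter plots earlier in the post.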

The skills I demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
