New York City Reported Crime Incidents

Posted on Aug 2, 2022

Introduction

While I was preparing to travel to New York City to join the NYC Data Science Academy, my curiosity about the city grew quickly. I started searching for information on many topics, especially how safe New York City is.

New York City is one of the largest and most recognized cities in the world. Its neighborhoods form five boroughs: Manhattan, Queens, Brooklyn, the Bronx, and Staten Island, with a combined population of 8.4 million residents.

My first project was mainly about manipulating a chosen dataset with the appropriate Python tools, such as pandas, an open-source Python package widely used for data science, data analysis, and machine learning tasks. I decided to work on the NYPD crime complaints dataset.

The Dataset

This dataset includes all valid felony, misdemeanor, and violation crimes reported to the New York City Police Department (NYPD) from 2006 to the end of 2020. It can be found on the NYC Open Data website.

Importing the dataset

The file is imported with the pandas read_csv function, which reads the original file into a DataFrame so that it can be manipulated.
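As a minimal sketch of this import step, the snippet below reads a tiny in-memory stand-in for the CSV export. The sample rows and the BORO_NM column name are assumptions for illustration; only LAW_CAT_CD and CMPLNT_FR_DT are column names taken from this post.

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV export (illustrative rows only).
# BORO_NM is an assumed column name; the last row leaves it empty on purpose.
sample_csv = io.StringIO(
    "CMPLNT_FR_DT,LAW_CAT_CD,BORO_NM\n"
    "01/15/2019,FELONY,BROOKLYN\n"
    "07/04/2020,MISDEMEANOR,\n"
)

# With the downloaded file, a path string would be passed instead of the buffer.
df = pd.read_csv(sample_csv)

print(df.head())
```

Empty fields in the file are read as missing values (NaN), which is what the null checks in the next step rely on.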

The data is then described, and its shape and the number of null values in each column are displayed using the isna function, along with the total number of null values and their percentage of all the data.
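The null-value summary can be sketched roughly as follows. The toy DataFrame is made up for illustration, BORO_NM is an assumed column name, and the percentage here is computed over all cells, which is one possible reading of "out of all the data".

```python
import pandas as pd

# Toy data standing in for the imported complaints DataFrame.
df = pd.DataFrame({
    "LAW_CAT_CD": ["FELONY", "MISDEMEANOR", None, "VIOLATION"],
    "BORO_NM": ["BROOKLYN", None, None, "QUEENS"],
})

print(df.shape)                    # (rows, columns)
print(df.describe(include="all"))  # summary statistics per column

# Null values per column, then the overall total and percentage of all cells.
null_counts = df.isna().sum()
total_nulls = int(null_counts.sum())
null_pct = 100 * total_nulls / df.size

print(null_counts)
print(f"{total_nulls} null values ({null_pct:.1f}% of all cells)")
```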

Cleaning and Tuning the dataset

To start, most columns used in this project were cleaned of empty or unknown values using the dropna function, and some columns were validated against specific criteria as follows:

  1. Age groups: keeping only positive age values within a valid range of up to 130 years.
  2. Race: keeping the race groups BLACK, WHITE HISPANIC, WHITE, ASIAN / PACIFIC ISLANDER, BLACK HISPANIC, AMERICAN INDIAN/ALASKAN NATIVE, and OTHER.
  3. Gender: grouping values into MALE, FEMALE, and OTHER.
  4. Borough: keeping only the five main boroughs of New York City.
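The validation rules above could be sketched along these lines. This is one possible implementation, not necessarily the one used in the project: the column names VIC_AGE_GROUP, VIC_SEX, and BORO_NM, the sample values, the M/F codes, and the helper is_valid_age_group are all assumptions for illustration.

```python
import re

import pandas as pd

# Toy rows with assumed column names and a few deliberately invalid values.
df = pd.DataFrame({
    "VIC_AGE_GROUP": ["25-44", "-942", "<18", "65+", "1017", None],
    "VIC_SEX": ["M", "F", "F", "D", "M", "F"],
    "BORO_NM": ["BROOKLYN", "QUEENS", "BRONX", "MANHATTAN",
                "STATEN ISLAND", "BROOKLYN"],
})

BOROUGHS = ["MANHATTAN", "QUEENS", "BROOKLYN", "BRONX", "STATEN ISLAND"]


def is_valid_age_group(label: str) -> bool:
    """Hypothetical check: no leading minus sign, every number within 0..130."""
    if label.startswith("-"):
        return False
    numbers = [int(n) for n in re.findall(r"\d+", label)]
    return bool(numbers) and all(0 <= n <= 130 for n in numbers)


# Drop rows with missing values, then apply each rule in turn.
cleaned = df.dropna(subset=["VIC_AGE_GROUP", "VIC_SEX", "BORO_NM"])
cleaned = cleaned[cleaned["VIC_AGE_GROUP"].apply(is_valid_age_group)]
cleaned = cleaned[cleaned["BORO_NM"].isin(BOROUGHS)].copy()

# Group gender codes into MALE, FEMALE, and OTHER (M/F codes are assumed).
cleaned["VIC_SEX"] = (
    cleaned["VIC_SEX"].map({"M": "MALE", "F": "FEMALE"}).fillna("OTHER")
)

print(cleaned)
```

The race column could be filtered the same way as the borough column, with isin and the list of accepted race groups.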

Unused columns were deleted and the remaining columns were renamed to more readable names, for example 'LAW_CAT_CD' to 'Crime_Category' and 'CMPLNT_FR_DT' to 'Occurance_Date'.
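A short sketch of the drop-and-rename step, using the two renames mentioned above. UNUSED_COL is a hypothetical placeholder for any column that was not needed.

```python
import pandas as pd

# One toy row; UNUSED_COL is a hypothetical column to be dropped.
df = pd.DataFrame({
    "CMPLNT_FR_DT": ["01/15/2019"],
    "LAW_CAT_CD": ["FELONY"],
    "UNUSED_COL": [123],
})

# Drop the unused column, then map raw names to more readable ones.
df = df.drop(columns=["UNUSED_COL"]).rename(
    columns={
        "LAW_CAT_CD": "Crime_Category",
        "CMPLNT_FR_DT": "Occurance_Date",
    }
)

print(df.columns.tolist())
```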

The values in the crime type column were also renamed to general group names using the str.contains method.
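One way to do this kind of grouping with str.contains is sketched below. The Crime_Type name, the sample offense descriptions, and the keyword_to_group mapping are assumptions for illustration, not the exact groups used in the project.

```python
import pandas as pd

# Illustrative offense descriptions; the column name is hypothetical.
crimes = pd.Series(
    ["PETIT LARCENY", "GRAND LARCENY", "ASSAULT 3 & RELATED OFFENSES",
     "FELONY ASSAULT", "ROBBERY"],
    name="Crime_Type",
)

# Hypothetical mapping from a keyword to the general group name.
keyword_to_group = {"LARCENY": "LARCENY", "ASSAULT": "ASSAULT"}

grouped = crimes.copy()
for keyword, group in keyword_to_group.items():
    # Plain substring match; missing values count as no match.
    mask = crimes.str.contains(keyword, regex=False, na=False)
    grouped.loc[mask] = group

print(grouped.tolist())
```

Values that match no keyword, such as ROBBERY here, keep their original label.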

The cleaned data was assigned to a new DataFrame called new_df, with a total of 5,373,251 records and 11 columns.

The Analysis and Visualization

1. General crime analysis

Using the value_counts method and some basic calculations, the following results were found.
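The counting behind results like these can be sketched with value_counts. The rows below are toy data, so the printed shares are not the real figures, and Victim_Sex is a hypothetical column name.

```python
import pandas as pd

# Toy stand-in for new_df; values are made up for illustration.
new_df = pd.DataFrame({"Victim_Sex": ["FEMALE", "FEMALE", "MALE", "OTHER"]})

# Absolute counts per category, most frequent first.
counts = new_df["Victim_Sex"].value_counts()

# Same breakdown as a percentage of all records.
shares = (new_df["Victim_Sex"].value_counts(normalize=True) * 100).round(1)

print(counts.head(10))  # head(10) gives a "top 10" view for longer columns
print(shares)
```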

As shown below, women were more exposed to crime than any other gender group, by at least 10%.

Here are the top 10 crimes reported in NYC.

It is easy to see that LARCENY is the most frequent crime in NYC (25%), followed by OFFENCES (23%).

By borough, BROOKLYN has the highest number of crime occurrences, with almost 1.6 million reports, as demonstrated below.

2. Crime time analysis

To get a clear picture of crime occurrences across all years, the following analysis and graphs were created. The graph below shows the total number of crimes for each year.

According to the split view of crimes above, the crime rate generally declined by a small percentage from 2006 to 2018, after which the curve started to turn upward, probably due to the Covid-19 pandemic, which began in late 2019.

Finally, as shown below, crimes tend to happen more in the summer, in July and August, and much less in February. It may be worth checking whether the crime rate correlates with the tourism season in NYC.
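The yearly and monthly totals discussed in this section can be sketched by parsing the occurrence date and counting by year and by month. The dates below are toy values, and the MM/DD/YYYY format is an assumption about how the date column is stored.

```python
import pandas as pd

# Toy dates; the MM/DD/YYYY format is assumed.
new_df = pd.DataFrame({
    "Occurance_Date": ["01/15/2019", "07/04/2019", "07/20/2020",
                       "08/01/2020", "02/10/2020"],
})

# Unparseable dates become NaT instead of raising an error.
dates = pd.to_datetime(
    new_df["Occurance_Date"], format="%m/%d/%Y", errors="coerce"
)

# Number of records per calendar year and per month of the year (1-12).
crimes_per_year = dates.dt.year.value_counts().sort_index()
crimes_per_month = dates.dt.month.value_counts().sort_index()

print(crimes_per_year)
print(crimes_per_month)
```

Either series can then be plotted directly, for example with crimes_per_year.plot(kind="bar") when matplotlib is installed.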

3. A deeper look into a specific crime

As part of the analysis, I decided to dig deeper into one specific crime, assault, and learn more about it. The following results were found.

According to the graphs above, 60% of assaults happened to Black (45%) and White Hispanic (26%) males, mostly between the ages of 25 and 44, with most of these incidents happening in Brooklyn and the Bronx (60% of all incidents).
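A breakdown like this can be sketched by filtering the assault records and cross-tabulating victim race against victim sex. The rows are toy data, so the percentages are not the real figures, and the column names Crime_Type, Victim_Race, and Victim_Sex are hypothetical.

```python
import pandas as pd

# Toy stand-in for new_df with hypothetical column names.
new_df = pd.DataFrame({
    "Crime_Type": ["ASSAULT", "ASSAULT", "LARCENY", "ASSAULT", "ASSAULT"],
    "Victim_Race": ["BLACK", "WHITE HISPANIC", "WHITE", "BLACK", "WHITE"],
    "Victim_Sex": ["MALE", "MALE", "FEMALE", "MALE", "FEMALE"],
})

# Keep only the assault records.
assaults = new_df[new_df["Crime_Type"] == "ASSAULT"]

# Share of all assaults for each race and sex combination, in percent.
breakdown = pd.crosstab(
    assaults["Victim_Race"], assaults["Victim_Sex"], normalize="all"
) * 100

print(breakdown.round(1))
```

The same crosstab call works for other pairs of attributes, such as age group against borough.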

Recommendations and Future Work

From the analysis and overview above, we can see how crime is distributed across several attributes, such as age group, race, borough, and crime type, over the years.

The analysis concentrated on the victims' side due to the lack of solid data about the suspects. Still, I highly recommend that the NYPD reassess its patrol routines in certain parts of NYC, such as Brooklyn, and at certain times of the year, such as summer. It is also important to update the records with more specific information, such as the exact location, the type of injuries, or the value of stolen goods.

In future work, I intend to add more attributes to my analysis, such as longitude and latitude, in order to create a heat map of crime occurrences, and to add the population of every borough to the analysis. I also plan to add more functionality, new factors, and comparisons, such as investment rates versus crime rates in certain areas of NYC.


About Author

Al Mutasim Bakathir Al Kindi

A data scientist from Oman