New York City Reported Crime Incidents
While I was preparing to travel to New York City to join the NYC Data Science Academy, my curiosity about the city grew quickly, and I started searching for information in many areas, especially about how safe New York City is!
New York City is one of the largest and most recognized cities in the world. It is made up of neighborhoods grouped into five boroughs (Manhattan, Queens, Brooklyn, the Bronx, and Staten Island) and has a population of 8.4 million people.
My first project mainly involves manipulating a chosen dataset with the appropriate Python tools, such as pandas, an open-source Python package widely used for data science, data analysis, and machine learning tasks. For this project, I decided to work on the NYPD crime complaints dataset.
This dataset includes all valid felony, misdemeanor, and violation crimes reported to the New York City Police Department (NYPD) from 2006 through the end of 2020. It is available on the NYC Open Data website.
Importing the dataset
The file is imported with the pandas read_csv function, which reads the original file into a DataFrame so that it can be manipulated.
Next, the data is described, and its shape and the number of null values in each column (found with the isna function) are displayed, along with the total number of null values and their percentage of all the data.
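The import and profiling steps can be sketched roughly as follows. This is a minimal illustration, not the exact code used in the project: the helper name load_and_profile is hypothetical, and the in-memory sample uses made-up rows with column names modeled on the NYPD export (only LAW_CAT_CD and CMPLNT_FR_DT are mentioned in this post; BORO_NM and VIC_SEX are assumptions).

```python
import io

import pandas as pd


def load_and_profile(source):
    """Read a CSV (path or file-like object) and summarize its missing values."""
    df = pd.read_csv(source)

    # isna() flags every missing cell; summing per column counts them.
    nulls_per_column = df.isna().sum()

    # Total missing cells and their share of all cells (rows x columns).
    total_nulls = int(nulls_per_column.sum())
    null_pct = 100 * total_nulls / df.size

    return df, nulls_per_column, total_nulls, null_pct


# Made-up rows standing in for the real export; in practice you would pass
# the path of the downloaded CSV instead of this in-memory buffer.
sample_csv = io.StringIO(
    "CMPLNT_FR_DT,LAW_CAT_CD,BORO_NM,VIC_SEX\n"
    "01/15/2019,FELONY,BROOKLYN,F\n"
    "07/04/2020,MISDEMEANOR,,M\n"
    "02/01/2018,VIOLATION,QUEENS,\n"
)

df, nulls_per_column, total_nulls, null_pct = load_and_profile(sample_csv)
print(df.shape)         # (rows, columns)
print(df.describe())    # count / unique / top / freq for text columns
print(nulls_per_column)
print(f"{total_nulls} missing cells ({null_pct:.1f}% of all cells)")
```

Empty fields in the CSV are read as NaN by default, which is why isna picks them up without any extra configuration.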
Cleaning and Tuning the dataset
To start, most columns used in this project were cleaned of empty or unknown values using the dropna function, and several other columns were validated against specific criteria:
- Age Groups: keeping only positive age groups within a valid range of up to 130 years old.
- Race: keeping only the race groups BLACK, WHITE HISPANIC, WHITE, ASIAN / PACIFIC ISLANDER, BLACK HISPANIC, AMERICAN INDIAN/ALASKAN NATIVE, and OTHER.
- Gender: grouping values into MALES, FEMALES, and OTHER.
- Borough: keeping only the five boroughs of New York City.
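The validation rules above can be sketched with dropna plus isin filters. This is an illustrative sketch under several assumptions: the raw column names (VIC_RACE, VIC_SEX, BORO_NM, VIC_AGE_GROUP) and the exact borough and age-group spellings are assumed, and the age check is swapped for a whitelist of standard age-group labels rather than the numeric range check described in the post.

```python
import pandas as pd

# Allowed values, taken from the criteria listed above.
VALID_RACES = {
    "BLACK", "WHITE HISPANIC", "WHITE", "ASIAN / PACIFIC ISLANDER",
    "BLACK HISPANIC", "AMERICAN INDIAN/ALASKAN NATIVE", "OTHER",
}
# Assumed spellings of the borough and age-group labels in the raw export.
VALID_BOROUGHS = {"MANHATTAN", "QUEENS", "BROOKLYN", "BRONX", "STATEN ISLAND"}
VALID_AGE_GROUPS = {"<18", "18-24", "25-44", "45-64", "65+"}
# Sex codes given readable labels; every other code is grouped into OTHER.
SEX_GROUPS = {"M": "MALES", "F": "FEMALES"}


def clean(df):
    # Drop rows with empty values in the columns being validated.
    df = df.dropna(subset=["VIC_RACE", "VIC_SEX", "BORO_NM", "VIC_AGE_GROUP"])

    # Keep only rows whose values fall inside the allowed sets.
    valid = (
        df["VIC_RACE"].isin(VALID_RACES)
        & df["BORO_NM"].isin(VALID_BOROUGHS)
        & df["VIC_AGE_GROUP"].isin(VALID_AGE_GROUPS)
    )
    df = df[valid]

    # Map M/F to readable labels; unmapped codes (such as "E" below) become OTHER.
    sex = df["VIC_SEX"].map(SEX_GROUPS).fillna("OTHER")
    return df.assign(VIC_SEX=sex)


# Made-up rows: one bad age group, one unknown race, one missing race.
raw = pd.DataFrame({
    "VIC_RACE": ["BLACK", "WHITE", "UNKNOWN", "ASIAN / PACIFIC ISLANDER", None],
    "VIC_SEX": ["F", "M", "F", "E", "M"],
    "BORO_NM": ["BROOKLYN", "QUEENS", "BRONX", "MANHATTAN", "BROOKLYN"],
    "VIC_AGE_GROUP": ["25-44", "-942", "18-24", "65+", "25-44"],
})
cleaned = clean(raw)
print(cleaned)
```

Combining the three isin masks with & applies all criteria in a single filter, so each row is kept only if it passes every check.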
Unused columns were deleted, and the remaining columns were renamed to more readable names, for example 'LAW_CAT_CD' to 'Crime_Category' and 'CMPLNT_FR_DT' to 'Occurance_Date'.
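Dropping and renaming can be done in one pass. In this sketch, the two mappings quoted in the post are kept as-is, while BORO_NM to Borough and OFNS_DESC to Crime_Type are hypothetical additions. Selecting only the mapped columns is used here as an alternative to calling drop on each unused column.

```python
import pandas as pd

# The first two mappings come from the text above; the other two are
# hypothetical examples of the same idea.
RENAMES = {
    "LAW_CAT_CD": "Crime_Category",
    "CMPLNT_FR_DT": "Occurance_Date",
    "BORO_NM": "Borough",       # hypothetical
    "OFNS_DESC": "Crime_Type",  # hypothetical
}


def select_and_rename(df, renames=RENAMES):
    # Selecting only the mapped columns drops every unused column in one step;
    # rename() then swaps the raw codes for readable names.
    return df[list(renames)].rename(columns=renames)


raw = pd.DataFrame({
    "CMPLNT_FR_DT": ["01/15/2019"],
    "LAW_CAT_CD": ["FELONY"],
    "BORO_NM": ["BROOKLYN"],
    "OFNS_DESC": ["FELONY ASSAULT"],
    "JURIS_DESC": ["N.Y. POLICE DEPT"],  # an unused column
})
tidy = select_and_rename(raw)
print(tidy.columns.tolist())
# ['Crime_Category', 'Occurance_Date', 'Borough', 'Crime_Type']
```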
The values of the crime type column were also mapped to general group names using the str.contains method.
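A keyword-based grouping with str.contains might look like the sketch below. The keyword list and the function name group_crime_types are hypothetical; the actual groups used in the project are not listed in the post.

```python
import pandas as pd

# Hypothetical keyword -> general group mapping. Order matters: when a value
# contains more than one keyword, the later entry wins.
CRIME_GROUPS = {
    "LARCENY": "LARCENY",
    "ASSAULT": "ASSAULT",
    "ROBBERY": "ROBBERY",
    "BURGLARY": "BURGLARY",
}


def group_crime_types(types, groups=CRIME_GROUPS, default="OTHER"):
    # Start with every row in the default group.
    result = pd.Series(default, index=types.index)
    for keyword, group in groups.items():
        # Plain substring match; na=False treats missing values as non-matches.
        mask = types.str.contains(keyword, case=False, regex=False, na=False)
        result.loc[mask] = group
    return result


types = pd.Series([
    "PETIT LARCENY", "GRAND LARCENY", "ASSAULT 3 & RELATED OFFENSES",
    "FELONY ASSAULT", "ROBBERY", None, "CRIMINAL MISCHIEF & RELATED OF",
])
print(group_crime_types(types).tolist())
# ['LARCENY', 'LARCENY', 'ASSAULT', 'ASSAULT', 'ROBBERY', 'OTHER', 'OTHER']
```

Using regex=False avoids surprises when a keyword contains characters such as "&" or "/" that could carry special meaning in a regular expression.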
The cleaned data was assigned to a new DataFrame called new_df, with a total of 5,373,251 records and 11 columns.
The Analysis and Visualization
1. General crime analysis
Using the value_counts method and some basic calculations, the following results were found.
As shown below, women have been the victims of reported crimes more often than any other gender, by at least 10%.
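The percentage breakdown can be computed with value_counts(normalize=True). This sketch uses made-up victims, and it assumes the "at least 10%" comparison means the gap in percentage points between the largest and second-largest group; the helper name share_by is hypothetical.

```python
import pandas as pd


def share_by(df, column):
    """Percentage of records in each category of `column`, largest first."""
    return df[column].value_counts(normalize=True).mul(100).round(1)


# Made-up victims: 6 female, 3 male, 1 other.
victims = pd.DataFrame({"VIC_SEX": ["FEMALES"] * 6 + ["MALES"] * 3 + ["OTHER"]})
shares = share_by(victims, "VIC_SEX")

# Gap between the largest and second-largest group, in percentage points.
gap = shares.iloc[0] - shares.iloc[1]
print(shares)
print(f"gap: {gap:.1f} points")
```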
Here are the top 10 crimes reported in NYC.
LARCENY is clearly the most frequently reported crime in NYC (25%), followed by OFFENCES (23%).
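A top-N table with each crime's share of all records can be sketched as below. The data is made up and the helper name top_n_with_share is hypothetical; note that the percentage is taken over all records, not only over the top N.

```python
import pandas as pd


def top_n_with_share(series, n=10):
    """The n most frequent values with their count and share of all records."""
    counts = series.value_counts()
    top = counts.head(n)
    return pd.DataFrame({
        "count": top,
        "percent": (100 * top / counts.sum()).round(1),
    })


# Made-up crime groups; with the real data this would be the crime type column.
crimes = pd.Series(["LARCENY"] * 5 + ["OFFENSES"] * 3 + ["ASSAULT"] * 2)
print(top_n_with_share(crimes, n=2))
```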
Among the boroughs, BROOKLYN has the highest number of crime occurrences, with almost 1.6 million reports, as demonstrated below.
2. Crime time analysis
To get a clear picture of crime occurrences across all years, the following analysis and graphs were created. The graph below shows the total number of crimes in each year.
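Yearly totals can be obtained by parsing the occurrence date and counting by year. This sketch assumes the dates are stored as MM/DD/YYYY strings and restricts the result to the 2006 to 2020 window stated above, in case some records carry occurrence dates outside it; the function name crimes_per_year is hypothetical.

```python
import pandas as pd


def crimes_per_year(dates, start=2006, end=2020):
    """Count records per year, keeping only years in [start, end]."""
    # Unparseable dates become NaT and are dropped.
    parsed = pd.to_datetime(dates, format="%m/%d/%Y", errors="coerce").dropna()
    per_year = parsed.dt.year.value_counts().sort_index()
    # Label slicing on the sorted integer index is inclusive at both ends.
    return per_year.loc[start:end]


dates = pd.Series(["01/15/2019", "07/04/2020", "02/01/2019", "12/31/2005", "not a date"])
print(crimes_per_year(dates))
```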
According to the per-year breakdown above, the number of reported crimes generally declined by a small percentage from 2006 to 2018, after which the curve started to turn upward, possibly due to the COVID-19 pandemic, which began in late 2019.
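To put numbers on "declining by a small percentage", the year-over-year change can be computed from the yearly totals. This is a minimal sketch on made-up counts (not the real NYPD totals), with a hypothetical helper name; it uses diff and shift rather than pct_change to keep the formula explicit.

```python
import pandas as pd


def yearly_change(per_year):
    """Year-over-year percentage change in a per-year count Series."""
    per_year = per_year.sort_index()
    # (this year - previous year) / previous year; the first year has no baseline.
    return (per_year.diff() / per_year.shift(1) * 100).round(1)


# Made-up counts, not the real NYPD totals.
per_year = pd.Series({2017: 100, 2018: 90, 2019: 99})
print(yearly_change(per_year))
```

A negative value marks a drop from the previous year, so a run of small negative values followed by positive ones would match the shape described above.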
Finally, as shown below, crimes tend to happen more often in the summer months of July and August and much less often in February. It may be worth investigating whether the crime rate correlates with the tourism season in NYC.
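Monthly totals come from the same parsed dates. Because February has fewer days than July or August, this sketch also adds a per-day average as an optional refinement that is not part of the original analysis. It assumes MM/DD/YYYY strings, approximates the day count by counting every day of each month in every year present, and uses a hypothetical helper name.

```python
import calendar

import pandas as pd


def crimes_per_month(dates):
    """Total records per calendar month, plus an average per day."""
    parsed = pd.to_datetime(dates, format="%m/%d/%Y", errors="coerce").dropna()
    totals = parsed.dt.month.value_counts().sort_index()

    # Number of calendar days each month spans over all years present,
    # so a 28-day February is compared fairly with a 31-day July.
    years = parsed.dt.year.unique()
    days = totals.index.map(
        lambda m: sum(calendar.monthrange(int(y), int(m))[1] for y in years)
    )

    return pd.DataFrame({
        "total": totals,
        "per_day": (totals / days.to_numpy()).round(3),
    })


dates = pd.Series(["02/10/2019", "02/20/2019", "07/04/2019", "07/05/2019", "07/06/2019"])
print(crimes_per_month(dates))
```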
3. A deeper look into a specific crime
As part of the analysis, I decided to dig deeper into assault specifically and learn more about it. The following results were found.
According to the graphs above, 60% of assaults were against Black (45%) and White Hispanic (26%) males, mostly between the ages of 25 and 44, with most of these incidents occurring in Brooklyn and the Bronx (60% of all incidents).
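The assault breakdown can be sketched by filtering the crime type with str.contains and then computing shares, plus a joint race by sex table that reads off, for example, the share of all assaults involving Black male victims. The column names Crime_Type, Victim_Race, Victim_Sex, and Borough are hypothetical renamed columns, and the rows are made up.

```python
import pandas as pd


def assault_records(df):
    # Substring match catches e.g. "FELONY ASSAULT" and "ASSAULT 3 & RELATED OFFENSES".
    mask = df["Crime_Type"].str.contains("ASSAULT", case=False, regex=False, na=False)
    return df[mask]


def share_by(df, column):
    return df[column].value_counts(normalize=True).mul(100).round(1)


# Made-up rows; column names are hypothetical renamed versions of the raw ones.
df = pd.DataFrame({
    "Crime_Type": ["FELONY ASSAULT", "ASSAULT 3 & RELATED OFFENSES", "PETIT LARCENY",
                   "ASSAULT 3 & RELATED OFFENSES", "FELONY ASSAULT"],
    "Victim_Race": ["BLACK", "WHITE HISPANIC", "WHITE", "BLACK", "WHITE"],
    "Victim_Sex": ["MALES", "MALES", "FEMALES", "FEMALES", "MALES"],
    "Borough": ["BROOKLYN", "BRONX", "QUEENS", "BROOKLYN", "MANHATTAN"],
})

assaults = assault_records(df)
by_borough = share_by(assaults, "Borough")
brooklyn_bronx = by_borough.reindex(["BROOKLYN", "BRONX"], fill_value=0).sum()

# Joint race x sex distribution, as a percentage of all assaults.
race_by_sex = pd.crosstab(assaults["Victim_Race"], assaults["Victim_Sex"], normalize="all") * 100
print(by_borough)
print(brooklyn_bronx)
print(race_by_sex.round(1))
```

With normalize="all", every cell of the crosstab is a share of all assaults, so cells can be added together directly.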
Recommendations and Future Work
The analysis above shows how crimes are distributed across several attributes, such as age group, race, borough, and crime type, over the years.
The analysis concentrated on victims because of the lack of solid data about suspects. Even so, I highly recommend that the NYPD reassess its patrol routines in certain parts of NYC, such as Brooklyn, and at certain times of the year, such as summer. It would also be important to enrich the records with more specific information, such as the exact location, the type of injuries, or the value of stolen goods.
In future work, I intend to add more attributes to my analysis, such as longitude and latitude, in order to create a heat map of crime occurrences, and to add each borough's population to the analysis. I also plan to add new factors and comparisons, such as investment rates versus crime rates in certain areas of NYC.
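As a starting point for that future work, the data behind a heat map can be built by binning coordinates into a grid with numpy.histogram2d, and a population factor can be applied as a rate per 100,000 residents. Everything here is a sketch: the helper names are hypothetical, and the coordinates and populations are made-up values, not real borough figures.

```python
import numpy as np
import pandas as pd


def heat_grid(lat, lon, bins=50):
    """Count points in a bins x bins latitude/longitude grid (a heat map's data)."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    # Drop rows missing either coordinate.
    ok = ~(np.isnan(lat) | np.isnan(lon))
    counts, lat_edges, lon_edges = np.histogram2d(lat[ok], lon[ok], bins=bins)
    return counts, lat_edges, lon_edges


def per_100k(counts, population):
    """Crimes per 100,000 residents; both Series share the same index (e.g. borough)."""
    return counts / population * 100_000


# Made-up coordinates and populations, for illustration only.
counts, lat_edges, lon_edges = heat_grid(
    [40.60, 40.60, 40.80, np.nan], [-73.90, -73.90, -73.95, -74.00], bins=2
)
rates = per_100k(pd.Series({"A": 50, "B": 30}), pd.Series({"A": 100_000, "B": 200_000}))
print(counts)
print(rates)
```

The grid rows follow latitude and the columns follow longitude, so a plotting call such as matplotlib's pcolormesh(lon_edges, lat_edges, counts) would be one way to draw it.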