Data Study on Crime and Demographics in New York City

Posted on Feb 5, 2018


Some American fixations: football, taxes, and crime. Like its kin, crime sits squarely in the national consciousness; untold resources have been devoted to understanding, dissecting, and analyzing all its facets. Obsession with criminal activity is perhaps nowhere more salient than in New York City, which found itself mired in crisis in the 1970s and 80s. The 4/5/6 subway line, which today handles the greatest share of riders, was affectionately called the "Mugger's Express" due to the high incidence of daylight robbery. Meanwhile, gangs, prostitutes, and corrupt officials roamed the city unchecked.

Of course, if you're reading this, you know the end of this story already. With mayors David Dinkins, Rudy Giuliani, and Michael Bloomberg in office, New York crime plunged to unprecedented lows. Soho, once an industrial wasteland of sweatshops and abandoned factories, is now one of the most gentrified neighborhoods on the Eastern Seaboard. Brooklyn, once afflicted with staggering amounts of criminal activity, is now a hot zone for the new generation of yuppies. Indeed, The Economist ranked New York City the 10th-safest city in the world on its Safe Cities Index, all but memorializing the Big Apple's transformation into an alpha city.

The Data

So how did New York dramatically reduce its crime rate? For any prospective analyst, the challenge lies not in finding an answer (there are many), but in crafting a succinct narrative from the enormous hoard of American crime data. Approaches could range from measuring the effectiveness of stop-and-frisk to evaluating the impact of the strict gun control measures introduced under Michael Bloomberg.

For my project, I chose to look at two separate data sets: the New York Police Department's (NYPD) Historical Crime Data, and the Census Bureau's American Community Survey (ACS).

The NYPD dataset grew out of Rudy Giuliani's Compstat initiative, introduced in 1994. This initiative enforced a statistically driven approach to crime reduction; since its inception, all criminal offenses have been logged in a central database, along with relevant data on geographical location, offense type, and time. These data are further grouped by precinct. Datasets are updated weekly, providing impressive granularity and access to New York's crime trends. The currently available data span 2000 to 2016.

The ACS is a nationwide demographic survey conducted by the United States Census Bureau; it was launched in 2005 out of a need for annually aggregated household data. The ACS contacts approximately 3.5 million households per year and publishes the data in an open, easily accessible format. Data are gathered on multiple categories, including income, education, and ethnicity. The Census Bureau also provides finer geographic resolution in the form of Public Use Microdata Areas (PUMAs), census-defined areas that each contain at least 100,000 residents. Notably, PUMA boundaries do not line up with NYPD precinct boundaries.

Vision and Limitations of Data

My initial vision was to unify the NYPD and ACS datasets. In doing so I would construct a longitudinal study comparing demographic data with crime rate, grouped by geographic sub-areas within New York City.

Ideally, I would have analyzed the initial decline in crime rate that occurred throughout the late 80s and 90s. However, the criminal offense data for those years were either not available online or not recorded at all. Thus any study using NYPD data could feasibly catch only the tail end of the crime decline, from 2000 on.

I ran into further limitations with the ACS data. While ACS data for New York are available online from 2000, data standardized into PUMAs are not available until 2011. Any longitudinal study combining NYPD and ACS data grouped by geography could therefore cover only 2011 and later.

But the most serious limitation came when I discovered that the two sets of geodata I had been using (NYPD precincts and ACS PUMAs) could not be overlaid on one another in my data visualization. And while it was possible to collate the data in a different format, I discovered the problem too close to the project deadline to make the change. When I revisit this project, I will seek to rectify this problem and give the visualization the treatment it deserves.

Ultimately, I could not combine the data geographically, and I could not compare the datasets directly. But I decided that I could construct two separate studies and qualitatively assess the impact of certain variables. What you see below is an amalgamation of two different data visualization studies: a longitudinal study of crime in New York grouped by NYPD precinct, and a demographic snapshot of the city grouped by ACS PUMA.
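As a rough illustration of what collating the two geographies could look like, the sketch below reallocates precinct-level offense counts to PUMAs in proportion to overlapping area. This is not something the study did; it is one common approach (areal weighting), and it assumes crime is spread evenly within each precinct. The precinct names, PUMA names, overlap shares, and counts are all invented, and real shares would have to come from intersecting the two sets of boundary polygons.

```python
from collections import defaultdict

# Hypothetical overlap table: the share of each precinct's area that falls
# inside each PUMA. These numbers are invented for illustration only.
OVERLAP_SHARE = {
    "precinct_A": {"puma_1": 0.75, "puma_2": 0.25},
    "precinct_B": {"puma_2": 1.0},
}

# Hypothetical offense counts per precinct for a single year.
OFFENSES_BY_PRECINCT = {"precinct_A": 400, "precinct_B": 100}


def reallocate_to_pumas(counts, overlap_share):
    """Spread each precinct's count across PUMAs in proportion to area."""
    totals = defaultdict(float)
    for precinct, count in counts.items():
        for puma, share in overlap_share[precinct].items():
            totals[puma] += count * share
    return dict(totals)


puma_counts = reallocate_to_pumas(OFFENSES_BY_PRECINCT, OVERLAP_SHARE)
print(puma_counts)
```

Because each precinct's shares sum to one, the citywide total is preserved; only its geographic allocation changes.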

Data Visualization

My first goal was to visualize crime and demographic data in choropleth maps. In the crime map, each precinct is color coded by crime volume; a drop-down menu selects the type of crime (major felonies, minor felonies, misdemeanors, or violations), and a slider selects a year from 2000 to 2016. Figure 1.1 depicts this map and showcases the hover-over function I implemented.


Figure 1.1 - Choropleth Map of New York City, color coded by volume of crime from 2000 - 2016
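The color coding behind a choropleth like this one can be sketched in a few lines. The example below assumes quantile buckets, which may not match the classification used in the actual map; the precinct labels, counts, and palette are invented for illustration.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical offense counts per precinct; invented for illustration.
counts = {
    "P1": 120, "P2": 340, "P3": 560, "P4": 90,
    "P5": 1020, "P6": 430, "P7": 275, "P8": 610,
}

# Hypothetical light-to-dark palette, one hex color per bucket.
PALETTE = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]


def assign_colors(values, palette):
    """Map each area to a palette color using quantile break points."""
    # quantiles() returns len(palette) - 1 cut points between the buckets.
    breaks = quantiles(values.values(), n=len(palette))
    # bisect_right() finds which bucket each value falls into.
    return {area: palette[bisect_right(breaks, v)] for area, v in values.items()}


colors = assign_colors(counts, PALETTE)
for area, color in sorted(colors.items()):
    print(area, counts[area], color)
```

Quantile buckets put roughly the same number of areas in each color, which keeps a few extreme precincts from washing out the rest of the map.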

Figure 1.2 is another close-up of the crime data choropleth. The data generally agree with commonly held assumptions: deep Brooklyn and the Bronx exhibit high rates of crime. An interesting outlier is the Midtown and NoHo area of Manhattan, which falls into the highest crime bucket. I could not glean a reason from the data I had, but it presents an interesting problem for future analysis, should I revisit the data.

Figure 1.2 - Choropleth Map, color coded by volume of major felonies in New York City from 2000 - 2016







My next step was to repeat the process with the ACS data. Figures 1.3 and 1.4 show the resulting choropleths, drawn with PUMAs instead of NYPD precincts. Areas of high crime in Figures 1.1 and 1.2 (e.g., the Bronx and South Brooklyn) roughly coincide with areas of high unemployment and high labor force disengagement, with the exception of Midtown Manhattan. The outer edges of New York proper also exhibit high rates of labor force disengagement. I posit this is because the outskirts are a more suitable residential area for retirees and families raising children, a pattern also seen in commuter suburbs.


Figure 1.3 - Choropleth Map of New York City, color coded by % population unemployed

Figure 1.4 - Choropleth Map, color coded by % population not in the labor force


My second goal was to construct a handful of graphs that visually represented the data and exposed interesting bivariate trends. I first confirmed that crime in New York had indeed dropped significantly (see Figure 2.1). What's astounding is that since the turn of the millennium, the citywide total dropped from just under 250,000 offenses per year to a little above 150,000, a decrease of almost 40% since the beginning of the NYPD dataset. Some inter-borough disparities in crime volume can be explained by each borough's population size, with Brooklyn having by far the largest population. In hindsight, a similar graph adjusted for population size would have been interesting to ponder.

Note: Staten Island data were incomplete between 2000 and 2012, so I excluded those years of Staten Island data.

Figure 2.1 - New York City crime rate (in # of offenses committed), color coded by Borough from 2000 - 2016
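The roughly 40% figure quoted above, and the population adjustment suggested in hindsight, both come down to simple arithmetic. The sketch below uses the approximate citywide totals from the text; the two boroughs and their offense and population numbers are invented purely to show how a per-capita rate can reorder a comparison based on raw volume.

```python
def percent_change(start, end):
    """Percentage change from start to end (negative means a decrease)."""
    return (end - start) * 100.0 / start


def per_100k(offenses, population):
    """Offenses per 100,000 residents."""
    return offenses * 100_000 / population


# Approximate citywide totals quoted in the text.
citywide_change = percent_change(250_000, 150_000)
print(f"Citywide change: {citywide_change:.0f}%")

# Hypothetical boroughs: (offenses, population). Invented for illustration.
boroughs = {
    "borough_X": (50_000, 2_500_000),
    "borough_Y": (30_000, 1_000_000),
}
rates = {name: per_100k(o, p) for name, (o, p) in boroughs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.0f} offenses per 100,000 residents")
```

In this invented example the borough with the larger raw count has the lower per-capita rate, which is exactly the kind of effect a population-adjusted version of Figure 2.1 would surface.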

I next looked at income bracket distributions throughout the city to see whether income correlated with crime rate. Not surprisingly, in 2015 Manhattan had the largest number of households making more than $200,000 a year. The Bronx stands apart with the most households in the lowest income brackets and the fewest in the highest. Brooklyn exhibits a similar pattern, but with a bolstered right tail, probably due to the gentrification of neighborhoods such as Williamsburg and Brooklyn Heights. Qualitatively, boroughs with fewer rich households in proportion to poor households seem to have a higher crime rate.


Figure 2.2 - New York City income bracket distribution, color-coded by Borough, 2015

Finally, I plotted mean and median household income against unemployment rate (Figures 2.3 and 2.4). In both plots there is a relatively strong correlation between income and unemployment.

Figure 2.3 - New York City mean household income vs. unemployment rate, color-coded by Borough, 2015

Figure 2.4 - New York City median household income vs. unemployment rate, color-coded by Borough, 2015
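A visual impression like "relatively strong correlation" can be backed by a number. The sketch below computes a Pearson correlation coefficient by hand; the income and unemployment values are invented stand-ins for the per-PUMA ACS figures, so the result says nothing about the real data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-area values; invented for illustration only.
median_income = [40_000, 55_000, 62_000, 75_000, 90_000]
unemployment_pct = [11.0, 9.5, 8.0, 6.5, 5.0]

r = pearson_r(median_income, unemployment_pct)
print(f"Pearson r = {r:.2f}")
```

Values near 1 or -1 indicate a strong linear relationship and values near 0 a weak one; the sign here reflects only the invented inputs.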


About Author

Mitchell Hung

How can data be used for societal introspection? What kind of civic solutions can be constructed from diving into deep data? I graduated with a B.A. in History from the University of Pennsylvania in 2016, before going on...