Crime and Demographics in New York City

Mitchell Hung
Posted on Feb 5, 2018


Some American fixations: football, taxes, and crime. Like its kin, crime sits squarely in the national consciousness; untold resources have been devoted to understanding, dissecting, and analyzing all its facets. Obsession with criminal activity is perhaps nowhere more salient than in New York City, which found itself mired in crisis in the 1970s and 80s. The 4/5/6 subway line, which today carries the greatest share of riders, was affectionately called the "Mugger's Express" because of its high incidence of daylight robbery. Meanwhile, gangs, prostitutes, and corrupt officials roamed the city unchecked.

Of course, if you're reading this, you already know how this story ends. Under mayors David Dinkins, Rudy Giuliani, and Michael Bloomberg, New York crime plunged to unprecedented lows. SoHo, once an industrial wasteland of sweatshops and abandoned factories, is now one of the most gentrified neighborhoods on the Eastern Seaboard. Brooklyn, once afflicted with staggering amounts of criminal activity, is now a hot zone for a new generation of yuppies. Indeed, The Economist ranked New York City the 10th safest city in the world on its Safe Cities Index, all but memorializing the Big Apple's transformation into an alpha city.

The Data

So how did New York dramatically reduce its crime rate? For any prospective analyst, the challenge lies not in finding an answer (there are many) but in crafting a succinct narrative from the enormous hoard of American crime data. Approaches could range from measuring the effectiveness of stop-and-frisk to evaluating the impact of the strict gun control introduced under Michael Bloomberg.

For my project, I chose to look at two separate data sets: the New York Police Department's (NYPD) Historical Crime Data, and the Census Bureau's American Community Survey (ACS).

The NYPD dataset grew out of Rudy Giuliani's CompStat initiative, introduced in 1994. The initiative enforced a statistically driven approach to crime reduction; since its inception, all criminal offenses have been logged in a central database along with relevant data on geographic location, offense type, and time, and these records are further grouped by precinct. Datasets are updated weekly, providing impressive granularity and access to New York's crime trends. The currently available data span 2000 to 2016.

The ACS is a nationwide demographic survey conducted by the United States Census Bureau; the survey itself was introduced in 2005 out of a need for annually aggregated household data. The ACS contacts approximately 3.5 million households per year and publishes the data in an open, easily accessible format. Data are gathered on multiple categories, including income, education, and ethnicity. The Census Bureau has also introduced higher geographic resolution in the form of Public Use Microdata Areas (PUMAs), which are, in essence, neighborhood-sized statistical areas containing at least 100,000 residents each. Interestingly, these areas do not correspond to any other geographic delineation, including NYPD precincts.

Vision and Limitations

My initial vision was to unify the NYPD and ACS datasets. In doing so, I would construct a longitudinal study comparing demographic data with crime rates, grouped by geographic sub-areas within New York City.

Ideally, I would have analyzed the initial decline in crime that occurred throughout the late 80s and 90s. However, offense data from that period were either not available online or were never recorded at all. Any study using NYPD data could therefore only catch the tail end of the crime decline, from 2000 on.

I ran into further limitations with the ACS data. While ACS data for New York as a whole are available online from 2000, data standardized into PUMAs are not available until 2011. Any longitudinal study combining NYPD and ACS data by geography could therefore only cover 2011 onward.
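
The usable window is simply the intersection of the two coverage periods. A minimal sketch, using the years stated above; the upper bound on the PUMA-level ACS range is an assumption for illustration, since it does not affect the result once the NYPD data end in 2016:

```python
# Coverage windows taken from the years stated in the text.
NYPD_YEARS = range(2000, 2017)      # NYPD historical crime data: 2000-2016
ACS_PUMA_YEARS = range(2011, 2017)  # PUMA-level ACS: 2011 onward (upper bound assumed)

def overlap(a: range, b: range) -> range:
    """Years covered by both ranges; an empty range if they do not overlap."""
    start = max(a.start, b.start)
    stop = min(a.stop, b.stop)
    return range(start, max(start, stop))

joint = overlap(NYPD_YEARS, ACS_PUMA_YEARS)
print(joint.start, joint[-1])  # 2011 2016
```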

But the most serious limitation came when I discovered that the two sets of geodata I had been using (NYPD precinct boundaries and ACS PUMA boundaries) could not be overlaid on each other in my data visualization. While it was indeed possible to collate the data in a different format, I discovered the problem too close to the project deadline to make the change. When I revisit this project, I will seek to rectify this problem and give the visualization the treatment it deserves.

Ultimately, I could not combine the data geographically, and I could not compare the datasets directly. But I decided that I could construct two separate studies and qualitatively assess the impact of certain variables. What you see below is an amalgamation of two different data visualization studies: a longitudinal study of crime in New York grouped by NYPD precinct, and a demographic snapshot of the city grouped by ACS PUMA.
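
For a future revision, one common way to reconcile mismatched boundaries (not something this project implemented) is an area-weighted crosswalk: intersect the PUMA and precinct shapes in a GIS tool, record what share of each PUMA's area falls inside each precinct, and reallocate PUMA counts accordingly. The sketch below assumes those area shares are already known; the IDs and weights are hypothetical:

```python
from collections import defaultdict

# Hypothetical crosswalk: share of each PUMA's area lying inside each precinct.
# Real weights would come from intersecting the two boundary files.
CROSSWALK = {
    ("PUMA_A", "PCT_1"): 0.75,
    ("PUMA_A", "PCT_2"): 0.25,
    ("PUMA_B", "PCT_2"): 1.0,
}

def puma_counts_to_precincts(puma_counts, crosswalk):
    """Reallocate PUMA-level counts (e.g. unemployed residents) to precincts,
    assuming residents are spread evenly across each PUMA's area."""
    out = defaultdict(float)
    for (puma, precinct), share in crosswalk.items():
        out[precinct] += puma_counts.get(puma, 0) * share
    return dict(out)

print(puma_counts_to_precincts({"PUMA_A": 1000, "PUMA_B": 400}, CROSSWALK))
# {'PCT_1': 750.0, 'PCT_2': 650.0}
```

This only makes sense for counts; rates such as unemployment should be recomputed from reallocated numerators and denominators rather than area-weighted directly, and the even-spread assumption is crude for PUMAs that mix parks, industry, and housing.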

Data Visualization

My first goal was to visualize crime and demographic data in choropleth maps. In the crime map, each precinct is color-coded by crime rate, with a drop-down menu for selecting the type of crime (major felonies, minor felonies, misdemeanors, or violations) and a slider for selecting a year from 2000 to 2016. Figure 1.1 depicts this map and showcases the hover-over function I implemented.

Figure 1.1 - Choropleth Map of New York City, color coded by volume of crime from 2000 - 2016
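
Behind the drop-down menu and slider, each map redraw boils down to a filter-then-count over offense records. A minimal sketch of that step, assuming a simplified hypothetical record layout (the actual NYPD files use their own column names) and made-up sample rows:

```python
from collections import Counter
from typing import NamedTuple

class Offense(NamedTuple):
    # Hypothetical record layout, not the NYPD file's actual schema.
    precinct: int
    year: int
    category: str  # "major felony", "minor felony", "misdemeanor", or "violation"

def precinct_totals(records, category, year):
    """Offense counts per precinct for the category and year chosen in the UI."""
    return Counter(r.precinct for r in records
                   if r.category == category and r.year == year)

# Made-up sample rows for illustration only.
records = [
    Offense(1, 2016, "major felony"),
    Offense(1, 2016, "major felony"),
    Offense(2, 2016, "major felony"),
    Offense(2, 2015, "major felony"),
    Offense(2, 2016, "misdemeanor"),
]
print(precinct_totals(records, "major felony", 2016))  # Counter({1: 2, 2: 1})
```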

Figure 1.2 is another close-up of the crime data choropleth. The data generally bear out commonly held assumptions: deep Brooklyn and the Bronx exhibit high rates of crime. An interesting outlier is Manhattan's Midtown and NoHo, where crime rates fall into the highest bucket. I could not glean a reason from the data I had, but it presents an interesting problem for future analysis should I revisit the data.

Figure 1.2 - Choropleth Map, color coded by volume of major felonies in New York City from 2000 - 2016
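
The post does not say how the color buckets were chosen. One common choice for choropleths is quantile classification, where each bucket holds roughly the same number of areas; the sketch below assumes that scheme (equal-interval bins would be an equally valid alternative) and uses made-up precinct counts:

```python
from bisect import bisect_right
from statistics import quantiles

def bucket_breaks(values, n_buckets=4):
    """Cut points splitting the values into n_buckets roughly equal-count groups."""
    return quantiles(values, n=n_buckets)

def bucket_of(value, breaks):
    """0 = lowest bucket ... len(breaks) = highest bucket."""
    return bisect_right(breaks, value)

# Made-up offense counts for eight precincts.
counts = [10, 20, 30, 40, 50, 60, 70, 80]
breaks = bucket_breaks(counts)
print([bucket_of(c, breaks) for c in counts])  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Quantile buckets guarantee that every color is used, but they can place areas with very different values in the same top bucket, which is worth keeping in mind when reading an outlier like Midtown.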

My next step was to repeat the process with the ACS data, using PUMAs instead of NYPD precincts (Figures 1.3 and 1.4). Areas of high crime in Figures 1.1 and 1.2 (i.e., the Bronx and South Brooklyn) roughly correspond to areas of high unemployment and high labor force disengagement, with the exception of Midtown Manhattan. The outer edges of New York proper also exhibit high rates of labor force disengagement. I posit this is because the outskirts are better suited as residential areas for retirees and families raising children, a trend we also see among suburban commuters.

Figure 1.3 - Choropleth Map of New York City, color coded by % population unemployed

Figure 1.4 - Choropleth Map, color coded by % population not in the labor force
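
The two maps measure different things: an unemployed person is in the labor force but looking for work, while a retiree or full-time parent is outside the labor force altogether. A minimal sketch of the two percentages, assuming the usual definitions with residents aged 16 and over as the base (the captions do not state the exact denominators used) and hypothetical counts:

```python
from dataclasses import dataclass

@dataclass
class PumaLabor:
    # Hypothetical per-PUMA counts for residents aged 16 and over.
    employed: int
    unemployed: int
    not_in_labor_force: int

    @property
    def labor_force(self) -> int:
        return self.employed + self.unemployed

    def unemployment_rate(self) -> float:
        """Unemployed as a percentage of the labor force."""
        return 100 * self.unemployed / self.labor_force

    def pct_not_in_labor_force(self) -> float:
        """People outside the labor force as a percentage of everyone 16+."""
        return 100 * self.not_in_labor_force / (self.labor_force + self.not_in_labor_force)

p = PumaLabor(employed=45_000, unemployed=5_000, not_in_labor_force=50_000)
print(p.unemployment_rate(), p.pct_not_in_labor_force())  # 10.0 50.0
```

Under these definitions, a PUMA full of retirees can show high labor force disengagement without high unemployment, which is consistent with the pattern at the city's outer edges.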

My second goal was to construct a handful of graphs that represent the data visually and expose interesting bivariate trends. I first confirmed that New York's crime rate had indeed dropped significantly (see Figure 2.1). Strikingly, since the turn of the millennium, the city-wide number of offenses fell from just under 250,000 per year to a little above 150,000 per year, a decrease of almost 40% since the beginning of the NYPD dataset. Some inter-borough disparities in crime volume can be explained by each borough's population size, with Brooklyn having the largest population. In hindsight, a similar graph adjusted for population size would have been worth including.

Note: Staten Island data were incomplete in the dataset between 2000 and 2012, so I made the executive decision to exclude those years for Staten Island.

Figure 2.1 - New York City crime rate (in # of offenses committed), color coded by Borough from 2000 - 2016
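
Both the headline decline and the population adjustment suggested above are one-line calculations. A short sketch; the round figures mirror the approximate totals quoted in the text (the exact endpoints give slightly less than 40%), and the borough population in the second call is a hypothetical value:

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means a decrease)."""
    return 100 * (after - before) / before

def per_100k(offenses: int, population: int) -> float:
    """Offenses per 100,000 residents, so areas of different size can be compared."""
    return 100_000 * offenses / population

print(round(pct_change(250_000, 150_000), 1))  # -40.0
print(per_100k(30_000, 2_500_000))             # 1200.0 (hypothetical borough)
```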

I next looked at income bracket distributions throughout the city to see whether income correlated with crime rate. Not surprisingly, in 2015 Manhattan had the largest number of families earning more than $200,000 a year. The Bronx stands apart, with the most households in the lowest income brackets and the fewest in the highest. Brooklyn exhibits a similar pattern but with a fatter right tail, probably due to the gentrification of neighborhoods such as Williamsburg and Brooklyn Heights. Qualitatively, areas with fewer rich households relative to poor households seem to have higher crime rates.

Figure 2.2 - New York City income bracket distribution, color-coded by Borough, 2015
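
The qualitative observation above can be turned into a single number per borough: high-income households per low-income household. A minimal sketch; the bracket names, thresholds, and counts are all hypothetical placeholders, not the ACS table's actual bins or values:

```python
# Hypothetical household counts by income bracket (placeholder bins and values).
brackets = {
    "Borough A": {"under_25k": 120_000, "25k_to_200k": 300_000, "over_200k": 90_000},
    "Borough B": {"under_25k": 180_000, "25k_to_200k": 250_000, "over_200k": 15_000},
}

def rich_to_poor_ratio(counts):
    """High-income households per low-income household."""
    return counts["over_200k"] / counts["under_25k"]

# List boroughs from the lowest ratio to the highest.
for borough, counts in sorted(brackets.items(), key=lambda kv: rich_to_poor_ratio(kv[1])):
    print(f"{borough}: {rich_to_poor_ratio(counts):.2f}")
# Borough B: 0.08
# Borough A: 0.75
```

Ranking areas by such a ratio and then by crime per 100,000 residents would make the "fewer rich households, more crime" pattern testable rather than purely visual.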

Finally, I plotted mean and median household income against unemployment rate. There is a relatively strong correlation between the two variables.

Figure 2.3 - New York City mean household income vs. unemployment rate, color-coded by Borough, 2015

Figure 2.4 - New York City median household income vs. unemployment rate, color-coded by Borough, 2015
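
The strength of the relationship in Figures 2.3 and 2.4 could be summarized with a Pearson correlation coefficient, which the post does not report. A self-contained sketch on made-up points (Python 3.10+ also ships `statistics.correlation` for the same calculation):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up area-level points: median household income ($) vs. unemployment rate (%).
income = [35_000, 50_000, 65_000, 80_000, 120_000]
unemployment = [12.0, 9.5, 8.0, 6.0, 4.0]
print(round(pearson_r(income, unemployment), 2))
```

Pearson's r only captures linear association and is sensitive to a few extreme areas (a handful of very wealthy Manhattan PUMAs, for instance), so a rank-based measure such as Spearman's rho is a reasonable cross-check. Either way, correlation alone does not establish that income drives unemployment or crime.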


About Author

Mitchell Hung

How can data be used for societal introspection? What kind of civic solutions can be constructed from diving into deep data? I graduated with a B.A. in History from the University of Pennsylvania in 2016, before going on...