How Safe Is Your Neighbourhood?

Posted on May 13, 2016

Contributed by Arda Kosar. He graduated from the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, which took place from April 11th to July 1st, 2016. This post is based on his second class project, R Shiny (due in the 4th week of the program).

Part 1 - Motivation

For the second project of the bootcamp, creating a Shiny app, I chose the 7 Major Felonies dataset for New York City, which can be accessed from the NYC Open Data website.

My main motivation for creating this app is to explore the safety of one's current neighborhood. If one decides to move, the app can also serve as a reference point for assessing the safety of the new neighborhood. Since New York City is one of the biggest metropolitan cities in the world, it is useful to see the crime data projected on a map.

Part 2 - Exploring The App

Part 2.1 - Insights Before Exploration

Screen Shot 2016-05-13 at 8.00.50 AM

When I first downloaded the dataset, I saw that there were some definitions to learn before I could start munging the data.

First of all, one of the sections in my outline is to group the data by time, because I want to know how crime rates change over time. However, I saw that there are 8 columns that give time information, as can be seen below:

Screen Shot 2016-05-13 at 8.39.31 AM

After some research I found that three of the columns, CompStat.Month, CompStat.Day and CompStat.Year, come from the portal named CompStat. I decided to move on with the CompStat values because the other time variables have 30-40% missing values. I also considered imputation; however, since the CompStat columns give me 9 complete years of data, I decided that using those three variables for time would be more beneficial for my analysis and app.
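The app itself was built in R, so the following Python sketch is only an illustration of the kind of check described above: measuring what fraction of each time column is missing before deciding which columns to keep. The toy rows and the `Occurrence.Year` column name are assumptions made for the example, not values taken from the real dataset.

```python
def missing_fraction(rows, columns):
    """Return {column: fraction of rows whose value is None or an empty string}."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {
        col: sum(1 for row in rows if row.get(col) in (None, "")) / total
        for col in columns
    }


# Toy rows invented for illustration only (hypothetical, not real records).
rows = [
    {"Occurrence.Year": 2015, "CompStat.Year": 2015},
    {"Occurrence.Year": None, "CompStat.Year": 2015},
    {"Occurrence.Year": 2014, "CompStat.Year": 2014},
    {"Occurrence.Year": None, "CompStat.Year": 2014},
]

fractions = missing_fraction(rows, ["Occurrence.Year", "CompStat.Year"])
for column, fraction in fractions.items():
    print(f"{column}: {fraction:.0%} missing")
```

A column with a high missing fraction is a candidate for either imputation or, as chosen here, being set aside in favor of a complete alternative.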

Part 2.2 - Map

The map is built with the Leaflet package. Since my dataset has more than 1 million observations, the map does not start with all of them selected, because rendering every point takes too much time.

An important note about the location data: for privacy reasons, the incidents have been moved to the street segment on which they occur.

Screen Shot 2016-05-13 at 8.59.40 AM

To start viewing points, the user has to choose a borough and a felony type. Once both are chosen, the map shows all the felonies of that type committed in that borough between 2006-01-01 and 2015-12-31. The user can also enter a date range of interest and filter the data accordingly. Below, the data is filtered for the Bronx, Burglary and the date range 2015-08-01 to 2015-10-19.

Screen Shot 2016-05-13 at 10.31.43 AM

In the filtered map, clusters can be seen in different parts of the borough. I found that even when the date range changes, the clusters on the map do not change much. Such clusters can be seen in every borough for the different types of felonies.
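The filtering behind the map (a borough, a felony type and an optional date range that defaults to the full 2006-01-01 to 2015-12-31 span) can be sketched as follows. This is a Python illustration of the logic only, not the app's R code; the field names `borough`, `felony` and `date` and the sample incidents are hypothetical.

```python
from datetime import date

# Default range matching the full period covered by the data.
FULL_START = date(2006, 1, 1)
FULL_END = date(2015, 12, 31)


def filter_incidents(incidents, borough, felony, start=FULL_START, end=FULL_END):
    """Keep incidents matching the borough and felony type within [start, end]."""
    return [
        incident
        for incident in incidents
        if incident["borough"] == borough
        and incident["felony"] == felony
        and start <= incident["date"] <= end
    ]


# Invented sample incidents (hypothetical, for illustration only).
incidents = [
    {"borough": "BRONX", "felony": "BURGLARY", "date": date(2015, 9, 3)},
    {"borough": "BRONX", "felony": "BURGLARY", "date": date(2012, 1, 15)},
    {"borough": "BRONX", "felony": "ROBBERY", "date": date(2015, 9, 10)},
    {"borough": "QUEENS", "felony": "BURGLARY", "date": date(2015, 8, 20)},
]

selected = filter_incidents(
    incidents, "BRONX", "BURGLARY", start=date(2015, 8, 1), end=date(2015, 10, 19)
)
print(len(selected), "incident(s) selected")
```

Requiring a borough and a felony type before anything is drawn keeps the number of points sent to the map small, which is the rendering concern mentioned earlier.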

Part 2.3 - Graphical Exploration

The third tab of the app is Graphical Exploration. When the tab is first clicked, it expects the user to select the x and y axes:

Screen Shot 2016-05-14 at 9.08.42 PM

If Borough is chosen as the x-axis and Number of Total Felonies as the y-axis, the graph looks like the following:

Screen Shot 2016-05-14 at 9.15.21 PM

By raw count, Brooklyn seems to have the highest number of total felonies; however, this can be misleading because the counts are not normalized by population. The y-axis of the graph can also be set to normalized. When the x-axis is Borough and the y-axis is normalized, the graph changes as follows:

Screen Shot 2016-05-14 at 9.18.58 PM

When the felony count is normalized by the population of each borough, the picture changes: Manhattan appears to have the highest rate of felonies. In the CompStat data, I did not have the occurrence hours corresponding to the CompStat dates. It would be more interesting to normalize the data by the flow population of the boroughs, which might give a more precise picture.
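To make the effect of normalization concrete, here is a small Python sketch (again an illustration, not the app's R code). The two boroughs, their counts and their populations are placeholder numbers invented for the example, and the "per 1,000 residents" scale is an assumption; the app's exact normalization is not specified above.

```python
def felonies_per_1000(counts, population):
    """Convert raw felony counts into a rate per 1,000 residents."""
    return {name: 1000 * counts[name] / population[name] for name in counts}


# Placeholder figures, invented for illustration (not real borough data).
counts = {"Borough A": 300, "Borough B": 200}
population = {"Borough A": 3000, "Borough B": 1000}

rates = felonies_per_1000(counts, population)

highest_by_count = max(counts, key=counts.get)
highest_by_rate = max(rates, key=rates.get)
print("Highest raw count:", highest_by_count)
print("Highest rate per 1,000:", highest_by_rate)
```

With these placeholder numbers, the borough with the most felonies is not the one with the highest rate, which is the same kind of reversal seen between the two graphs.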

In the Graphical Exploration tab, the data can also be faceted by rows and columns for more information. For example, if I want to see how felony rates change over the years in each borough, I can select years as the x-axis and normalized as the y-axis, and facet by column to get the following graph:

Screen Shot 2016-05-14 at 9.26.14 PM

This graph gives me more information about the felony rates in each borough over the 9-year period.
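The aggregation behind a faceted view like this, one panel per borough with one value per year, can be sketched in Python as below. It is only an illustration of the grouping step and uses raw counts; the field names and sample incidents are hypothetical, and the app's own R code may be organized differently.

```python
from collections import Counter, defaultdict


def counts_by_borough_and_year(incidents):
    """Return {borough: {year: count}}, i.e. one yearly series per facet panel."""
    panels = defaultdict(Counter)
    for incident in incidents:
        panels[incident["borough"]][incident["year"]] += 1
    # Sort the years so each panel reads left to right in time order.
    return {borough: dict(sorted(series.items())) for borough, series in panels.items()}


# Invented sample incidents (hypothetical, for illustration only).
incidents = [
    {"borough": "BRONX", "year": 2014},
    {"borough": "BRONX", "year": 2015},
    {"borough": "BRONX", "year": 2014},
    {"borough": "QUEENS", "year": 2015},
]

panels = counts_by_borough_and_year(incidents)
for borough, series in panels.items():
    print(borough, series)
```

Dividing each yearly count by the borough's population would turn these series into the normalized rates plotted on the y-axis.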

Part 2.4 - Data Table

The fourth tab, Data Table, displays the data frame that I used for plotting the map and the graphs. I think this tab will be useful if the user wants to see where the data on the map and graphs comes from without downloading the dataset from the link.

Part 3 - Results

I believe this app will be useful for assessing the safety of your current neighborhood, or, if you are moving, the neighborhood of your new home. I built this app because I believe that the safety of a neighborhood is a major factor in renting or purchasing decisions.

On the map, the safer neighborhoods can be identified from where the felonies cluster, which gives a useful insight.

The recent trends in felony numbers and their change over the years can be explored in the Graphical Exploration tab.

The app in its current structure demonstrates my purpose in building it. Further improvements could include: heat maps in a separate tab, a filtering option for seasons in the Graphical Exploration tab, and normalizing the counts by the daily flow population of the boroughs, which would help the user get more insights.

About Author

Arda Kosar

With a background in Mechatronics Engineering and an MBA, Arda started his career in data science at NYC Data Science Academy. Arda currently works as a Data Scientist at Publicis Worldwide, Search&Data Science Team. Arda works in...
