How Safe Is Your Neighbourhood?
Contributed by Arda Kosar. He graduated from the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, which took place from April 11th to July 1st, 2016. This post is based on his second class project, R Shiny (due in the 4th week of the program).
Part 1 - Motivation
For the second project of the bootcamp, creating a Shiny app, I chose the 7 Major Felonies dataset for New York City, which can be accessed from the NYC Open Data website.
My main motivation for creating this app is to explore the safety of one's current neighborhood. If one decides to move, the app can also serve as a reference point for assessing the safety of the new neighborhood. Since New York City is one of the biggest metropolitan cities in the world, it is useful to see its crime data projected on a map.
Part 2 - Exploring The App
Part 2.1 - Insights Before Exploration
When I first downloaded the dataset, I saw that there were some definitions to learn before I could start any data munging.
First of all, one of the items in my outline is to group the data by time, because I want to know how crime rates change over time. However, I saw that there are 8 columns that give me time information, as can be seen below:
After some research I found that three of the columns, CompStat.Month, CompStat.Day and CompStat.Year, hold the dates from the portal named CompStat. I decided to proceed with the CompStat values because the other time variables have 30-40% missing values. I also considered imputation; however, since I have complete 9-year data from CompStat, I decided that using those three variables for time would be more beneficial for my analysis and app.
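The column choice above rests on comparing how much of each time column is missing. A minimal sketch of that check follows, written in Python for illustration (the app itself is built in R). The rows are made-up placeholder records, and `Occurrence.Year` is a hypothetical name standing in for the non-CompStat time columns.

```python
# Placeholder records; None marks a missing value.
# "Occurrence.Year" is a hypothetical column name used only for illustration.
rows = [
    {"Occurrence.Year": 2015, "CompStat.Year": 2015},
    {"Occurrence.Year": None, "CompStat.Year": 2015},
    {"Occurrence.Year": 2014, "CompStat.Year": 2014},
    {"Occurrence.Year": None, "CompStat.Year": 2014},
]


def missing_fraction(records):
    """Return the fraction of missing (None) values for each column."""
    total = len(records)
    columns = records[0].keys()
    return {
        col: sum(1 for r in records if r.get(col) is None) / total
        for col in columns
    }


fractions = missing_fraction(rows)
for col, frac in fractions.items():
    print(f"{col}: {frac:.0%} missing")
```

Columns with a high missing fraction are candidates for either imputation or, as chosen here, replacement by a complete alternative.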
Part 2.2 - Map
The map is built with the Leaflet package. Since my dataset has more than 1 million observations, the map does not start with all of them selected, because rendering every point takes too much time.
An important note about the location data: for privacy reasons, the incidents have been moved to the street segment on which they occur.
To start viewing points, the user has to choose a borough and a felony type. Once both are chosen, the map shows all felonies of that type committed in that borough between 2006-01-01 and 2015-12-31. The user can also enter a date range of interest and filter the data accordingly. Below, the data is filtered for Bronx, Burglary and the date range 2015-08-01 to 2015-10-19.
In the filtered map, clusters can be seen in different parts of the borough. I found that even when the date range changes, the clusters on the map do not change much. Such clusters can be seen in every borough for the different types of felonies.
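The filtering described in this section (borough, felony type, optional date range) can be sketched as follows. This is an illustrative Python sketch, not the app's R code; the field names `borough`, `felony` and `date` and the sample records are assumptions.

```python
from datetime import date

# Made-up sample incidents; field names are hypothetical.
incidents = [
    {"borough": "Bronx", "felony": "Burglary", "date": date(2015, 9, 3)},
    {"borough": "Bronx", "felony": "Burglary", "date": date(2014, 1, 15)},
    {"borough": "Bronx", "felony": "Robbery", "date": date(2015, 9, 10)},
    {"borough": "Queens", "felony": "Burglary", "date": date(2015, 8, 20)},
]


def filter_incidents(records, borough, felony,
                     start=date(2006, 1, 1), end=date(2015, 12, 31)):
    """Keep records matching the borough and felony type within [start, end].

    The default range covers the full span mentioned in the post.
    """
    return [
        r for r in records
        if r["borough"] == borough
        and r["felony"] == felony
        and start <= r["date"] <= end
    ]


# Full date range, then the narrower range used in the example above.
all_bronx_burglaries = filter_incidents(incidents, "Bronx", "Burglary")
autumn_2015 = filter_incidents(
    incidents, "Bronx", "Burglary",
    start=date(2015, 8, 1), end=date(2015, 10, 19),
)
print(len(all_bronx_burglaries), len(autumn_2015))
```

Requiring a borough and a felony type before drawing anything keeps the number of plotted points small, which is the reason the map does not load the whole dataset at start-up.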
Part 2.3 - Graphical Exploration
The third tab of the app is graphical exploration. When the tab is first clicked, it expects the user to select the x and y axes:
If Borough is chosen as the x-axis and Number of Total Felonies as the y-axis, the graph looks like the following:
By raw count, Brooklyn appears to have the highest number of total felonies; however, this can be misleading because the counts are not normalized by population. The y-axis of the graph can also be set to normalized. When the x-axis is Borough and the y-axis is normalized, the graph changes as follows:
When the felony counts are normalized by the population of each borough, the picture changes: Manhattan appears to have the highest rate of felonies. The CompStat dates in my dataset do not come with occurrence hours. It would be more meaningful to normalize the data by the flow population of the boroughs, so that a more precise picture could be observed.
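To show why normalization can change the ranking, here is a small Python sketch (the app itself is in R). The area names, counts and populations are invented placeholders, not values from the dataset, and expressing the rate per 100,000 residents is an assumption about how one might scale it.

```python
# Invented placeholder numbers, chosen only to illustrate the effect.
felony_counts = {"Area A": 300, "Area B": 200}
populations = {"Area A": 2_000_000, "Area B": 1_000_000}


def rate_per_100k(counts, pops):
    """Return felonies per 100,000 residents for each area."""
    return {area: counts[area] * 100_000 / pops[area] for area in counts}


rates = rate_per_100k(felony_counts, populations)

highest_by_count = max(felony_counts, key=felony_counts.get)
highest_by_rate = max(rates, key=rates.get)
print(highest_by_count, highest_by_rate)
```

Area A leads on raw count, but Area B leads once population is taken into account. Swapping the resident population for a flow population would only change the denominator passed to `rate_per_100k`.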
The data in the graphical exploration tab can also be faceted by rows and columns for more information. For example, if I want to see how felony rates change over the years in each borough, I can select years as the x-axis and normalized as the y-axis, and facet by column to get the following graph:
This graph gives me more information about the felony rates in each borough over the 9-year period.
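Behind a faceted plot like this sits a grouping of the data by facet variable and x-axis variable. A minimal Python sketch of that grouping is shown here (the app itself uses R); the records and field names are made up, and raw counts are used where the app would plot normalized rates.

```python
from collections import Counter

# Made-up records; field names are hypothetical.
incidents = [
    {"borough": "Bronx", "year": 2014},
    {"borough": "Bronx", "year": 2014},
    {"borough": "Bronx", "year": 2015},
    {"borough": "Queens", "year": 2015},
]

# Count incidents for every (borough, year) pair.
counts = Counter((r["borough"], r["year"]) for r in incidents)

# One panel per borough, each holding (year, count) points sorted by year.
panels = {}
for (borough, year), n in sorted(counts.items()):
    panels.setdefault(borough, []).append((year, n))

for borough, points in panels.items():
    print(borough, points)
```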
Part 2.4 - Data Table
The fourth tab, Data Table, displays the data frame that I used for plotting the map and the graphs. I think this tab will be useful if the user is interested in seeing where the data on the map and graphs comes from, without downloading the dataset from the link.
Part 3 - Results
I believe this app will be useful for assessing the safety of your current neighborhood or, if you are moving, the neighborhood of your new home. I built this app because I believe that the safety of a neighborhood is a huge factor in rental and purchasing decisions.
On the map, the clusters of felonies show which neighborhoods are safer, which gives a useful insight.
Recent trends in felony numbers and their change over the years can be explored in the Graphical Exploration tab.
In its current structure, the app clearly demonstrates my purpose in building it. Possible further improvements are: implementing heat maps in a separate tab, adding a filtering option for seasons in the Graphical Exploration tab, and normalizing the counts by the daily flow population of the boroughs, all of which would help the user gain more insights.