R Shiny - Data Analysis of Public Safety in Boston
The skills I demonstrate here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
GitHub Repository
Covid-19 changed our lifestyle by keeping people at home and working remotely. I therefore moved from New Jersey to Massachusetts to stay with family. I had never stayed in Boston for such a long time, and that made me curious about the living environment and the social data of this city.
On the Boston government’s dataset website, under the public safety category, I found a dataset released by the Boston Police Department that contains crime incident reports from 2016 to 2019.
There are 544,660 samples in total, with 19 columns. Before analyzing the data, I noticed that the years 2016 to 2018 each have around 100,000 samples. However, the sample count for 2019 is extremely high, with 25,004 rows. Another thing I noticed is that even though the Boston government calls this dataset a crime incident report, I would consider it a violation report instead, because the incidents include vehicle accidents, towed vehicles, and other events that are not as severe as crimes.
I used the Shiny, ggplot2, leaflet, and dplyr packages in RStudio to create an interactive data visualization dashboard.
In this project, I would like to know:
- Which season or day of the week has the most violations?
- How many violations are there of each incident type?
- Among all the violations, did many involve a shooting?
- Which districts have the highest numbers of violations?
Data Cleaning
- The district column only contains district codes. I found the full names and codes online and converted the values of this column to the full district names.
- There is a column that contains the year, month, and day of each violation. I extracted the year and month as 2 new columns. Since I wanted to analyze the violations in each season, I also created a new column to indicate the season.
- There is a column named shooting with values of 1 and 0, where 1 indicates that a shooting was involved and 0 indicates that it was not. However, there are also many null values. I replaced them with “N/A” to avoid any effect the null values might have on the analysis.
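The three cleaning steps can be sketched roughly as follows. This is a minimal illustration in base R on a toy data frame, not the code used in the project (which relied on dplyr). The column names (DISTRICT, OCCURRED_ON_DATE, SHOOTING), the partial district lookup, and the month-to-season rule are all assumptions made for the example.

```r
# Toy stand-in for the incident data; column names are assumed for illustration.
incidents <- data.frame(
  DISTRICT = c("B2", "C11", "B3", "A1"),
  OCCURRED_ON_DATE = c("2016-07-04", "2017-01-15", "2018-10-31", "2019-04-02"),
  SHOOTING = c(1, NA, 0, NA),
  stringsAsFactors = FALSE
)

# Step 1: map district codes to full names (partial, illustrative lookup).
district_names <- c(A1 = "Downtown", B2 = "Roxbury",
                    B3 = "Mattapan", C11 = "Dorchester")
incidents$DISTRICT_NAME <- unname(district_names[incidents$DISTRICT])

# Step 2: extract year and month, then derive a season column.
dates <- as.Date(incidents$OCCURRED_ON_DATE)
incidents$YEAR <- as.integer(format(dates, "%Y"))
incidents$MONTH <- as.integer(format(dates, "%m"))

# Assumed rule: Dec-Feb winter, Mar-May spring, Jun-Aug summer, Sep-Nov fall.
month_to_season <- function(m) {
  ifelse(m %in% c(12, 1, 2), "Winter",
         ifelse(m %in% 3:5, "Spring",
                ifelse(m %in% 6:8, "Summer", "Fall")))
}
incidents$SEASON <- month_to_season(incidents$MONTH)

# Step 3: label missing shooting values explicitly instead of leaving nulls.
incidents$SHOOTING_LABEL <- ifelse(is.na(incidents$SHOOTING),
                                   "N/A",
                                   as.character(incidents$SHOOTING))

print(incidents)
```

Keeping the original columns and writing the cleaned values to new ones makes it easy to check the mapping against the raw data.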
Data Analysis
The four graphs below show the number of violations in each season, by year. In the data from 2016 to 2018, summer has the most violations and winter the fewest. I assume the difference arises because people spend more time outdoors in summer than in winter, which increases the number of violations in the warmer months. In 2019, however, violations increased in winter, which looks abnormal compared to the rest of the data.
[Figures: violations by season, one graph each for 2016, 2017, 2018, and 2019]
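The counts behind graphs like these come from a year-by-season tally. A minimal base R sketch on made-up rows, assuming YEAR and SEASON columns like the ones created during cleaning:

```r
# Made-up rows for illustration; the real data has hundreds of thousands.
incidents <- data.frame(
  YEAR = c(2016, 2016, 2016, 2017, 2017, 2017, 2017),
  SEASON = c("Summer", "Summer", "Winter", "Summer", "Summer", "Winter", "Spring")
)

# Fix the season order so empty seasons still appear in the table.
season_levels <- c("Winter", "Spring", "Summer", "Fall")
incidents$SEASON <- factor(incidents$SEASON, levels = season_levels)

# Cross-tabulate: one row per year, one column per season.
counts <- table(incidents$YEAR, incidents$SEASON)
print(counts)

# Season with the highest count in each year.
busiest <- colnames(counts)[apply(counts, 1, which.max)]
names(busiest) <- rownames(counts)
print(busiest)
```

Splitting by year before comparing seasons is what makes an unusual year visible instead of averaging it away.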
Since season is a factor that affects the number of violations, I then wondered whether the day of the week would be a factor as well. However, after analyzing the data, I found that there is not much difference among the days.
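One way to run the day-of-week check is to derive the weekday from each date and compare the share of violations per day. The sketch below uses base R and a handful of invented dates; it is an illustration of the idea, not the project's code.

```r
# Invented dates for illustration only.
dates <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-06",
                   "2018-01-07", "2018-01-08"))

# POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday),
# which avoids locale-dependent day names.
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
wday <- as.POSIXlt(dates)$wday
day_of_week <- factor(day_labels[wday + 1], levels = day_labels)

# Counts and shares per day; similar shares mean little day-to-day difference.
day_counts <- table(day_of_week)
day_share <- prop.table(day_counts)
print(round(day_share, 3))
```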
Below is a table that ranks the most frequent incidents. As we can see, vehicle accidents, larceny, and medical assistance are the top 3 violations in the Boston area. Some violations, such as medical assistance and towed vehicles, do not affect people’s safety. In other words, they do not harm the people who live in the Boston area.
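A ranking like this can be produced by counting each incident type and sorting the counts. A small base R sketch, with hypothetical category labels and counts standing in for the real values:

```r
# Hypothetical incident labels and frequencies, for illustration only.
offenses <- rep(
  c("Motor Vehicle Accident Response", "Larceny", "Medical Assistance", "Towed"),
  times = c(4, 3, 2, 1)
)

# Count each type, sort from most to least frequent, keep the top 3.
top <- head(sort(table(offenses), decreasing = TRUE), 3)

# Convert to a plain data frame for display in a dashboard table.
top_df <- data.frame(offense = names(top), count = as.integer(top))
print(top_df)
```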
When looking into the violations of each district, I found that the districts listed below had the most violations. I would not recommend these districts to anyone looking for a place to live in Boston.
The figure below shows the ratio of violations with and without a shooting involved. Among all violations, the share involving a shooting is only 0.004768 (about 0.48%), which is very small compared to non-shooting violations.
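A share like this is the number of violations flagged as shootings divided by the total number of violations. The base R sketch below shows the arithmetic on ten made-up rows. It assumes that rows with a missing shooting value count as non-shooting, which may differ from how the figure above was computed.

```r
# Made-up shooting flags: 1 = shooting, 0 = no shooting, NA = not recorded.
shooting <- c(1, 0, NA, 0, NA, 0, 0, 1, NA, 0)

# Assumption: a missing value is treated as "no shooting".
involved <- !is.na(shooting) & shooting == 1

# The mean of a logical vector is the proportion of TRUE values.
share <- mean(involved)
print(share)
```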
After plotting the locations of all the shootings on a map, we can easily tell that most of them took place in the southern part of Boston. Recalling the table above of violations by district, the areas below with the most blue dots are the districts shown in that table, such as Roxbury, Dorchester, and Mattapan.
Conclusion
This data analysis answered the questions I posed at the beginning of this project. From this project, we can tell that Boston is a safe place to live, with a low rate of shootings. Some violations are not harmful to Boston residents. However, some districts have more violations than others. With all the graphs and charts I have made, I hope this project provides not only answers to my personal curiosity, but also advice to people who plan to live in the Boston area.