R Shiny - Public Safety in Boston

Posted on Aug 3, 2021

GitHub Repository

Covid-19 changed our lifestyle by keeping people at home and working remotely, so I moved from New Jersey to Massachusetts to stay with family. I had never spent such a long time in Boston, and that made me curious about the city's living environment.

On the City of Boston's open data website, under the Public Safety category, I found a dataset released by the Boston Police Department that contains crime incident reports from 2016 to 2019. It has 544,660 samples in total with 19 columns. Before analyzing the data, I noticed that each year from 2016 to 2018 has around 100,000 samples, while 2019 has only 25,004 rows, far fewer than the other years. I also noticed that, even though the city named this dataset a crime incident report, I would consider it a violation report instead, because the incidents include vehicle accidents, towed vehicles, and other events that are not as severe as crimes.

I used the Shiny, ggplot2, leaflet, and dplyr packages in RStudio to create an interactive data visualization dashboard.

In this project, I wanted to answer the following questions:

  • Which season or day of the week has the most violations?
  • How many violations of each incident type were there?
  • How many of the violations involved a gun shooting?
  • Which districts have the highest numbers of violations?

Data Cleaning

  • The district column contains only district codes. I found the full name for each code online and converted the values of this column to full district names.
  • One column contains the year, month, and day of each violation. I extracted the year and month into two new columns. Since I wanted to analyze violations by season, I also created a new column indicating the season.
  • A column named shooting has the value 1 when a gun shooting was involved and 0 when it was not. However, it also contains many null values, so I replaced them with "N/A" to keep them from distorting the results.
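The three cleaning steps above can be sketched in base R as follows. This is a minimal illustration, not the project's actual code, which used dplyr. The column names `district`, `occurred_on_date` and `shooting` are assumptions. The district mapping is an assumed subset of Boston Police Department codes and should be checked against the official list. The season boundaries (December to February counted as winter) are also an assumption, because the post does not state them.

```r
# Assumed subset of BPD district codes mapped to neighborhood names;
# verify against the Boston Police Department's own list.
district_names <- c(
  A1 = "Downtown", A7 = "East Boston", A15 = "Charlestown",
  B2 = "Roxbury", B3 = "Mattapan", C6 = "South Boston",
  C11 = "Dorchester", D4 = "South End", D14 = "Brighton",
  E5 = "West Roxbury", E13 = "Jamaica Plain", E18 = "Hyde Park"
)

# Map month numbers (1-12) to seasons; Dec-Feb = winter is an assumption.
month_to_season <- function(month) {
  seasons <- c("Winter", "Winter", "Spring", "Spring", "Spring",
               "Summer", "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")
  seasons[month]
}

# Hypothetical cleaning helper; column names are assumed, not taken from the post.
clean_incidents <- function(df) {
  # Replace district codes with full names; unknown or missing codes become NA.
  df$district <- unname(district_names[as.character(df$district)])
  # Parse "YYYY-MM-DD ..." timestamps and derive year, month and season.
  dates <- as.Date(df$occurred_on_date)
  df$year <- as.integer(format(dates, "%Y"))
  df$month <- as.integer(format(dates, "%m"))
  df$season <- month_to_season(df$month)
  # Keep the 1/0 shooting flag as text and label missing values "N/A".
  df$shooting <- ifelse(is.na(df$shooting), "N/A", as.character(df$shooting))
  df
}
```

Keeping "N/A" as an explicit label, rather than dropping those rows, means the missing values stay visible when shootings are counted later.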

Data Analysis

The four graphs below show the number of violations in each season by year. The data from 2016 to 2018 show that summer has the most violations, while winter has the fewest. My assumption is that people spend more time outdoors in summer than in winter, which increases the number of violations in the warmer season. In 2019, however, violations increased in winter, which looks abnormal compared with the rest of the data.
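The per-season counts behind these graphs can be sketched as a year-by-season contingency table. This is a minimal sketch assuming cleaned `year` and `season` columns like those described in the data cleaning step. It also computes row-wise shares. The 2019 data has far fewer rows than the other years, and comparing each year's seasonal proportions is one way to put the years on the same scale.

```r
# Calendar order for the season columns; labels assumed to match the cleaning step.
season_levels <- c("Spring", "Summer", "Fall", "Winter")

# Contingency table of violation counts: one row per year, one column per season.
# Fixing the factor levels keeps seasons with zero incidents as explicit 0 cells.
count_by_year_season <- function(df) {
  table(year = df$year,
        season = factor(df$season, levels = season_levels))
}

# Share of each year's violations falling in each season (rows sum to 1).
season_shares <- function(tab) {
  prop.table(tab, margin = 1)
}
```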





Since season is a factor that affects the number of violations, I then wondered whether the day of the week was a factor as well. After analyzing the data, however, I found little difference among the days.
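A day-of-week comparison like this can be sketched by computing each weekday's share of all violations. This sketch assumes timestamps stored as "YYYY-MM-DD ..." strings. It uses the `wday` field of `POSIXlt`, which counts from 0 for Sunday, instead of `weekdays()`. The day labels therefore stay in English whatever the system locale is.

```r
# English day labels indexed by POSIXlt$wday + 1 (wday is 0 for Sunday).
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# Proportion of violations on each day of the week (shares sum to 1).
weekday_shares <- function(dates) {
  wday <- as.POSIXlt(as.Date(dates))$wday
  counts <- table(factor(day_names[wday + 1], levels = day_names))
  prop.table(counts)
}
```

Roughly equal shares of about 1/7 per day would match the finding that the day of the week makes little difference.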

Below is a frame that sorts incidents by frequency. As we can see, vehicle accidents, larceny, and medical assistance are the top three violations in the Boston area. Some violations, such as medical assistance and towed vehicles, do not affect people's safety. In other words, they do not harm people who live in the Boston area.
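A frequency frame like this can be sketched by tabulating incident descriptions and keeping the most common ones. The input `offenses` is an assumed character vector of incident descriptions. The post does not name the column it used.

```r
# Return the n most frequent incident types as a data frame, most common first.
top_incidents <- function(offenses, n = 3) {
  counts <- sort(table(offenses), decreasing = TRUE)
  top <- counts[seq_len(min(n, length(counts)))]
  data.frame(incident = names(top), count = as.integer(top),
             stringsAsFactors = FALSE)
}
```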

When looking at the violations in each district, I found that the districts below had the most violations. For anyone searching for a place to live in Boston, I would not recommend these districts.

The chart below shows the ratio of violations with and without a gun shooting involved. Among all violations, the shooting ratio is only 0.004768 (about 0.48%), which is very small compared with non-shooting violations.
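"Ratio" can mean two different quantities here. One is shootings over all violations and the other is shootings over non-shooting violations. The sketch below computes both and assumes the cleaned `shooting` values "1", "0" and "N/A". The post does not say whether the "N/A" rows were part of the denominator, so the `exclude_missing` switch is a hypothetical option for comparing both choices.

```r
# Shooting share and odds from a cleaned shooting column ("1", "0", "N/A").
shooting_stats <- function(shooting, exclude_missing = FALSE) {
  if (exclude_missing) shooting <- shooting[shooting != "N/A"]
  n_shoot <- sum(shooting == "1")
  n_total <- length(shooting)
  c(share = n_shoot / n_total,              # shootings / all violations
    odds  = n_shoot / (n_total - n_shoot))  # shootings / all other rows
}
```

With many violations and few shootings, the two values are nearly identical, so either reading gives a figure of roughly 0.005 for this dataset.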

After plotting the locations of all the gun shootings on a map, we can easily see that most of them took place in the southern part of Boston. Recalling the frame above with the number of violations by district, the areas with the most blue dots are the districts listed there, such as Roxbury, Dorchester, and Mattapan.
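Before the shootings can be mapped, they have to be filtered to rows with usable coordinates. The sketch below assumes `Lat` and `Long` columns and the cleaned `shooting` column. The bounding box is a loose, assumed envelope around Boston that drops missing or placeholder coordinates. It is not an official boundary.

```r
# Keep shooting incidents whose coordinates fall inside an assumed Boston box.
shooting_points <- function(df) {
  ok <- df$shooting %in% "1" &
    !is.na(df$Lat) & !is.na(df$Long) &
    df$Lat > 42 & df$Lat < 42.5 &      # assumed latitude range
    df$Long > -71.3 & df$Long < -70.8  # assumed longitude range
  df[ok, c("Lat", "Long")]
}
```

With the leaflet package, the returned frame could then be drawn with something like `leaflet(pts) %>% addTiles() %>% addCircleMarkers(lng = ~Long, lat = ~Lat, radius = 2)`.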


This analysis answered the questions I posed at the beginning of the project. It suggests that Boston is a safe place to live, with a low rate of gun shootings, and that some violations are not harmful to residents. However, some districts have more violations than others. With all the graphs and charts I've made, I hope this project not only satisfies my personal curiosity but also offers advice to people who plan to live in the Boston area.

About Author

Cassandra Jones

Cassandra Jones is a certified data scientist with a focus on data science technologies and banking. Cassandra has worked in client services at an investment bank for 4 years and is passionate about data-driven business insights going forward...