R Shiny - Data Analysis on the Public Safety in Boston

Posted on Aug 3, 2021
The skills I demoed here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Github Repository

Covid-19 changed our lifestyle by keeping people at home and working remotely. I therefore moved from New Jersey to Massachusetts to stay with family. I had never stayed in Boston for such a long time, and that made me curious about the living environment and the social data of the city.

Under the public safety category of the Boston government's dataset website, I found a dataset released by the Boston Police Department that contains crime incident reports from 2016 to 2019.

There are 544,660 samples in total with 19 columns. Before analyzing the data, I noticed that each year from 2016 to 2018 has around 100,000 samples. However, the 2019 sample is extremely high, with 25,004 rows. Another thing I found is that although the Boston government named this dataset a crime incident report, I would consider it a violation report instead, because the incidents include vehicle accidents, towing, and others that are not as severe as crimes.

I used the Shiny, ggplot2, leaflet, and dplyr packages in RStudio to create an interactive data visualization dashboard.
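As a rough illustration of how such a dashboard fits together, here is a minimal Shiny sketch. It is not the app from this project: the toy data, the `season_counts` helper, and the input and output IDs are all assumptions made for the example, and it draws with base R's `barplot` instead of ggplot2 to keep the dependencies small.

```r
# Toy incident data (made up); a real app would load the cleaned data set.
incidents <- data.frame(
  year   = c(2016, 2016, 2017, 2017, 2017),
  season = c("Summer", "Winter", "Summer", "Summer", "Fall"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of incidents per season for one year.
season_counts <- function(data, selected_year) {
  table(data$season[data$year == selected_year])
}

# Only build the app when the shiny package is installed.
if (requireNamespace("shiny", quietly = TRUE)) {
  ui <- shiny::fluidPage(
    shiny::titlePanel("Boston incident reports"),
    shiny::selectInput("year", "Year", choices = sort(unique(incidents$year))),
    shiny::plotOutput("season_plot")
  )

  server <- function(input, output, session) {
    # Redraw the bar chart whenever the selected year changes.
    output$season_plot <- shiny::renderPlot({
      barplot(season_counts(incidents, as.numeric(input$year)),
              xlab = "Season", ylab = "Incidents")
    })
  }

  app <- shiny::shinyApp(ui, server)
  # Launch interactively with: shiny::runApp(app)
}
```

The dropdown drives a reactive plot, which is the basic pattern behind each interactive chart in a dashboard like this one.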

In this project, I would like to know:

  • Which season or day has the most violations?
  • How many violations are there of each incident type?
  • Among all the violations, how many involved a shooting?
  • What are the districts with the highest numbers of violations?

Data Cleaning

  • The district column only contains district codes. I found the full names and codes online and converted the values of this column to the full district names.
  • There is a column that contains the year, month, and day of each violation. I extracted the year and month as 2 new columns. Since I wanted to analyze the violations in each season, I also created a new column to indicate the season accordingly.
  • There is a column named shooting with values of 1 and 0, where 1 indicates that a shooting was involved and 0 indicates that it was not. However, there are also many null values. I inserted “N/A” to avoid any potential effect of the null values.
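The three cleaning steps above can be sketched in base R as follows. This is only an illustration on made-up rows: the column names (`district`, `occurred_on`, `shooting`), the short code-to-name lookup, and the month-to-season mapping (December to February as winter, and so on) are assumptions, not the exact definitions used in this project.

```r
# Made-up rows standing in for the raw incident data.
incidents <- data.frame(
  district    = c("B2", "C11", "B3", "B2"),
  occurred_on = as.Date(c("2017-01-15", "2017-07-04", "2018-10-31", "2018-04-02")),
  shooting    = c(NA, 1, 0, NA),
  stringsAsFactors = FALSE
)

# Step 1: convert district codes to full names (illustrative lookup only).
district_names <- c(B2 = "Roxbury", B3 = "Mattapan", C11 = "Dorchester")
incidents$district <- unname(district_names[incidents$district])

# Step 2: extract year and month, then derive a season column.
incidents$year  <- as.integer(format(incidents$occurred_on, "%Y"))
incidents$month <- as.integer(format(incidents$occurred_on, "%m"))

season_of <- function(month) {
  # Assumed mapping: Dec-Feb winter, Mar-May spring, Jun-Aug summer, else fall.
  ifelse(month %in% c(12, 1, 2), "Winter",
         ifelse(month %in% 3:5, "Spring",
                ifelse(month %in% 6:8, "Summer", "Fall")))
}
incidents$season <- season_of(incidents$month)

# Step 3: replace null shooting values with the label "N/A".
incidents$shooting <- ifelse(is.na(incidents$shooting), "N/A",
                             as.character(incidents$shooting))

print(incidents)
```

Note that step 3 turns the shooting column into text, so later counts have to compare against "1" instead of the number 1.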

Data Analysis

The four graphs below show the number of violations in each season by year. In the data from 2016 to 2018, summer is the season with the most violations, while winter has the fewest. I assume the difference arises because people spend more time on outdoor activities in summer than in winter, which increases the number of violations during the warmer season. However, in 2019 there was an increase in winter violations, which seems abnormal compared to the rest of the data.
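The counts behind a season-by-year chart come from a simple cross-tabulation. Here is a small sketch on made-up rows, assuming `year` and `season` columns like the ones created during cleaning:

```r
# Made-up rows; the real data has one row per reported incident.
incidents <- data.frame(
  year   = c(2016, 2016, 2016, 2017, 2017, 2017),
  season = c("Summer", "Summer", "Winter", "Summer", "Summer", "Fall"),
  stringsAsFactors = FALSE
)

# Fix the season order so tables and plots read Spring -> Winter.
season_levels <- c("Spring", "Summer", "Fall", "Winter")
incidents$season <- factor(incidents$season, levels = season_levels)

# Rows are years, columns are seasons, cells are incident counts.
counts <- table(year = incidents$year, season = incidents$season)
print(counts)

# Season with the most incidents in each year.
busiest <- colnames(counts)[apply(counts, 1, which.max)]
names(busiest) <- rownames(counts)
print(busiest)
```

Comparing each year's row in the resulting table is also a quick way to spot an unusual pattern such as the 2019 winter increase.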



Since season is a factor that affects the number of violations, I then wondered whether the day of the week would be a factor as well. However, after analyzing the data, I found that there is not much difference among the days.
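One way to check the day-of-week effect is to compare each day's share of incidents. The sketch below uses a handful of made-up dates and derives the weekday from the date itself; if the data already has a day-of-week column, the derivation step is not needed.

```r
# Made-up incident dates for illustration.
dates <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-08", "2018-01-06"))

# as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday,
# which avoids locale-dependent weekday names.
day_labels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
day_of_week <- factor(day_labels[as.POSIXlt(dates)$wday + 1],
                      levels = day_labels)

# Count incidents per weekday and convert the counts to shares.
day_counts <- table(day_of_week)
day_share  <- prop.table(day_counts)
print(round(day_share, 3))
```

If the days really do not differ much, each share in the full data should sit close to 1/7, or about 0.143.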

Below is a table that ranks the most frequent incidents. As we can see, vehicle accidents, larceny, and medical assistance are the top 3 violations in the Boston area. Some violations, such as medical assistance and vehicle towing, do not affect people's safety. In other words, they do not harm people who live in the Boston area.
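A frequency ranking like this takes only a few lines. The sketch below uses made-up offense labels; the labels and counts are placeholders, not figures from the data set.

```r
# Made-up offense labels, one entry per incident.
offense_group <- c(
  rep("Motor Vehicle Accident Response", 4),
  rep("Larceny", 3),
  rep("Medical Assistance", 2),
  "Towed"
)

# Count each offense type and sort from most to least frequent.
offense_counts <- sort(table(offense_group), decreasing = TRUE)

# Keep the three most frequent types.
top3 <- head(offense_counts, 3)
print(top3)
```

Swapping the offense column for the district column gives the per-district ranking in the same way.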

When looking into the violations in each district, I found that the districts below had the most violations. If you are looking for a place to live in Boston, I would not suggest searching in these districts.

The chart below shows the ratio of violations with and without a shooting involved. Among all the violations, the share involving a shooting is only 0.004768 (about 0.48%), which is very small compared to the non-shooting violations.
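The ratio is the number of shooting incidents divided by the total number of incidents. Here is a sketch on made-up values, assuming the cleaned shooting column holds "1", "0", or "N/A" and that "N/A" rows count as non-shooting incidents (an assumption for this example):

```r
# Made-up cleaned shooting column: "1" = shooting, "0" = none, "N/A" = null.
shooting <- c("N/A", "N/A", "1", "0", "N/A", "N/A", "0", "N/A")

# Incidents flagged as shootings over all incidents.
shooting_count <- sum(shooting == "1")
shooting_ratio <- shooting_count / length(shooting)

print(shooting_ratio)
```

Dropping the "N/A" rows from the denominator instead would give a larger ratio, so it is worth stating which convention a reported figure uses.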

After plotting the locations of all the shootings on a map, we can easily tell that most of them took place in the southern part of Boston. Recalling the table above with the number of violations by district, the areas with the most blue dots are the districts shown in that table, such as Roxbury, Dorchester, and Mattapan.
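A map of this kind can be built with leaflet along the following lines. This is a sketch, not the project's code: the coordinates are rough, made-up points in the named districts, and the column names `lat` and `long` are assumptions.

```r
# Made-up shooting locations (approximate coordinates, for illustration only).
shootings <- data.frame(
  district = c("Roxbury", "Dorchester", "Mattapan"),
  lat      = c(42.3152, 42.3016, 42.2771),
  long     = c(-71.0914, -71.0676, -71.0914),
  stringsAsFactors = FALSE
)

# Only build the map when the leaflet package is installed.
if (requireNamespace("leaflet", quietly = TRUE)) {
  shooting_map <- leaflet::leaflet(data = shootings)
  # Add the default OpenStreetMap base tiles.
  shooting_map <- leaflet::addTiles(shooting_map)
  # One small blue dot per shooting, with the district name as a popup.
  shooting_map <- leaflet::addCircleMarkers(
    shooting_map,
    lng = ~long, lat = ~lat,
    radius = 4, color = "blue", stroke = FALSE, fillOpacity = 0.6,
    popup = ~district
  )
  # Printing shooting_map in an interactive session renders the map.
}
```

Inside a Shiny app, the same map object would typically be returned from `leaflet::renderLeaflet` and displayed with `leaflet::leafletOutput`.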


This data analysis answered my inquiries from the beginning of this project. From this project, we can tell that Boston is a safe place to live, with a low rate of shootings. Some violations are not harmful to Boston residents. However, some districts have more violations than others. With all the graphs and charts I have made, I hope this project not only answers my personal curiosity but also offers advice to people who plan to live in the Boston area.

About Author

Cassandra Jones

Cassandra Jones is a certified data scientist with a focus on data science technologies and banking, with 4 years of experience in client services at an investment bank. Passionate about data-driven business insights going forward...