Data Study on Safety in Los Angeles

Posted on Aug 8, 2016

 WHERE, WHEN, WHAT – How to live safely in Los Angeles?

-- 2011~2014 Crime and Collision in Los Angeles

Contributed by Shu Liu. He is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, which runs from July 5th to September 23rd, 2016. This post is based on his second class project, Shiny (due in the 4th week of the program).


Crime and collision have always been issues of concern, especially for people who live in Los Angeles or plan to move there. Data shows that Los Angeles, a big city with a large population, has problems such as traffic accidents, burglary, and robbery. This Shiny visualization project is designed for people who care about their safety in Los Angeles.


Three aspects of crime and collision were investigated: geographical distribution, time of occurrence, and type. The Shiny app described here includes detailed information about the type, location, time, and date of each incident. The data covers the period from 2011 to 2014. Readers are welcome to explore the app and learn more about crime and collision in Los Angeles.

You can explore this project via Shiny. (It may take you several seconds to load the app.)

You can check the R code on Github.

Data on where Crime and Collision Occur:


The Shiny app lets users choose the area and year they want to focus on and see the details of every crime and collision recorded from 2011 to 2014. On another page, “Crimes Area”, visitors can explore crime and collision trends or drill down to learn more about specific incidents.

Which Area has the most crime and collision, and which one has the least?

The LAPD oversees 21 areas in Los Angeles. The bar chart shows the number of crime and collision incidents in each area as the LAPD defines them. According to the graph on the right, Hollenbeck has the fewest incidents and 77th Street has the most. This ranking is based on the total volume of crime and collision rather than on incident density.

Is volume a good index for safety evaluation?


The Shiny app lets one check historical trends in crime and collision, not just the current totals. In some areas, Hollenbeck for example, a low volume of incidents can be misleading. Although it appears safe because of its minimal volume of crime and collision, its incident count rose over the four years.
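To make the difference between volume and trend concrete, here is a minimal sketch in Python (the app itself is written in R, and this is not the author's code). It assumes yearly incident counts per area are already available. The two areas and their counts are made up for illustration and are not LAPD figures, and the helper name `yearly_trend` is hypothetical.

```python
def yearly_trend(counts_by_year):
    """Least-squares slope of incident count against year (incidents per year)."""
    years = sorted(counts_by_year)
    n = len(years)
    mean_year = sum(years) / n
    mean_count = sum(counts_by_year[y] for y in years) / n
    num = sum((y - mean_year) * (counts_by_year[y] - mean_count) for y in years)
    den = sum((y - mean_year) ** 2 for y in years)
    return num / den


# Made-up counts for two hypothetical areas, NOT the LAPD figures.
toy_counts = {
    "Area A": {2011: 900, 2012: 880, 2013: 860, 2014: 840},
    "Area B": {2011: 300, 2012: 340, 2013: 380, 2014: 420},
}

# For each area: (total volume, yearly trend).
summary = {
    area: (sum(counts.values()), yearly_trend(counts))
    for area, counts in toy_counts.items()
}

for area, (total, slope) in summary.items():
    print(f"{area}: total={total}, trend={slope:+.1f} incidents/year")
```

In this toy data, Area B has the lower total but a rising trend, which is the kind of pattern a volume-only ranking would hide.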


Data on When Crime and Collision Appear.

When total crime and collision is broken down by day of the week, Friday has the most incidents while Sunday has the fewest. I cannot offer a proper explanation without professional knowledge of criminology, but the graph does show a weekly pattern in crime.
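The day-of-week breakdown can be sketched as follows. This is an illustrative Python version (the app is written in R), assuming each incident carries a date. The function name `incidents_by_weekday` and the sample dates are hypothetical and do not come from the LAPD data.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def incidents_by_weekday(dates):
    """Count incidents per weekday, keeping days with zero incidents."""
    counts = Counter(WEEKDAYS[d.weekday()] for d in dates)  # Monday is 0
    return {day: counts.get(day, 0) for day in WEEKDAYS}


# Hypothetical incident dates: two Fridays and one Sunday in January 2014.
sample_dates = [date(2014, 1, 3), date(2014, 1, 10), date(2014, 1, 5)]
weekday_counts = incidents_by_weekday(sample_dates)
print(weekday_counts)
```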



In addition, the three-dimensional graph reveals more noteworthy trends. It shows how the day of the week and the time of day together influence crime and collision occurrence. From the right angle, it’s obvious that crimes are much more frequent after noon than before noon. The fewest crimes occur between 5:00 and 6:00am.


Rotating the graph to view it from another side, we focus on 12:00am (midnight) of every day. Crime and collision are more likely to happen at midnight on Saturday and Sunday, while the mornings and afternoons of weekends are relatively safer than those of weekdays.
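The data behind a day-by-hour surface like the one described above can be thought of as a 7 x 24 grid of counts. Below is a minimal Python sketch under that assumption (the app is written in R, and its actual data preparation may differ). `weekday_hour_grid` and the sample timestamps are hypothetical.

```python
from datetime import datetime


def weekday_hour_grid(timestamps):
    """Return a 7 x 24 grid: grid[weekday][hour] = number of incidents."""
    grid = [[0] * 24 for _ in range(7)]  # rows: Monday (0) to Sunday (6)
    for ts in timestamps:
        grid[ts.weekday()][ts.hour] += 1
    return grid


# Hypothetical timestamps: two just after midnight on a Saturday,
# one on a Monday at 5pm.
sample_times = [
    datetime(2014, 1, 4, 0, 15),
    datetime(2014, 1, 4, 0, 40),
    datetime(2014, 1, 6, 17, 5),
]
grid = weekday_hour_grid(sample_times)
print(grid[5][0], grid[0][17])
```

Each row of the grid is one weekday and each column is one hour, so comparing column 0 across rows corresponds to comparing midnight across days.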


Data on Types of Crime and Collision.

In this interactive interface, users are free to check crime and collision types across different time periods and seasons. The temperature in Los Angeles is stable all year round, so, based on the historical record, the year is divided into only two seasons, hot and warm. The hot season runs from May to October (6 months), and the warm season consists of the other months (6 months).
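The two-season split can be written as a small lookup. This is a Python sketch of the rule stated above (May to October is hot, the rest is warm). The function name `season` is hypothetical, and the app's own R code may express this differently.

```python
HOT_MONTHS = {5, 6, 7, 8, 9, 10}  # May through October


def season(month):
    """Map a month number (1-12) to 'Hot' or 'Warm'."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return "Hot" if month in HOT_MONTHS else "Warm"


print([season(m) for m in range(1, 13)])
```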

In addition, I divided a day into seven periods for deeper exploration:

  • Early morning: 4am - 8am
  • Morning: 8am - 11am
  • Noon: 11am - 1pm
  • Afternoon: 1pm - 5pm
  • Evening: 5pm - 8pm
  • Night: 8pm - 11pm
  • Midnight: 11pm - 4am
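The seven periods can be sketched as a mapping from the hour of an incident to a label. This Python version rests on two assumptions that the post does not state: each period includes its start hour and excludes its end hour, and Night ends at 11pm, where Midnight begins. The name `period_of_day` is hypothetical.

```python
# (start hour, end hour, label); start inclusive, end exclusive (an assumption).
PERIODS = [
    (4, 8, "Early morning"),
    (8, 11, "Morning"),
    (11, 13, "Noon"),
    (13, 17, "Afternoon"),
    (17, 20, "Evening"),
    (20, 23, "Night"),
]


def period_of_day(hour):
    """Map an hour (0-23) to one of the seven periods."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    for start, end, label in PERIODS:
        if start <= hour < end:
            return label
    return "Midnight"  # 11pm to 4am wraps past 0:00, so it is the fallback


print([period_of_day(h) for h in (0, 4, 12, 22, 23)])
```

Treating Midnight as the fallback avoids special-casing the wrap around 0:00.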

Generally speaking, the 'Hot' season (May to Oct.) has slightly more crime and collision than the 'Warm' season (Nov. to Apr.) in all periods of the day. As for crime and collision types, 'Traffic' related issues make up the major part, and burglary and theft related issues are also worth noting.

Further Steps:

  • LA Open Data (the data source) provides crime and collision data from 2011 to 2015, but the 2015 data is incomplete, so I removed it from my analysis to avoid bias. In the future, I will add the newest data to my Shiny app once the LAPD fills in the missing data.
  • The crime and collision map on the front page loads only a small sample of the incidents for the sake of overall efficiency. Actually, I tried my best to improve the efficiency of the algorithm, but the key point is that RStudio is a little bit mean to free users and limits the computation capability available to my Shiny app. Maybe I will upgrade my membership to premium to fix this problem when I get a good job.
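One common way to load only part of the data on a map is to draw a fixed-size random sample of the records. The Python sketch below illustrates that idea only; the post does not say how the app picks its sample, so the function `sample_for_map`, the fixed seed, and the sample size are all assumptions.

```python
import random


def sample_for_map(records, k, seed=0):
    """Return up to k records chosen uniformly at random, without replacement."""
    if k >= len(records):
        return list(records)
    # A fixed seed keeps the displayed sample stable between page loads.
    return random.Random(seed).sample(records, k)


# Hypothetical stand-in for incident records.
all_incidents = list(range(1000))
shown = sample_for_map(all_incidents, 50)
print(len(shown))
```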

About Author


Shu is currently a master’s student studying financial engineering at University of Southern California, and he has a multidisciplinary background in math, economics, and financial engineering. Being able to look at problems from both marketing and technical perspectives,...