Data Study on Safety in Los Angeles
WHERE, WHEN, WHAT: How to live safely in Los Angeles?
-- 2011~2014 Crime and Collision in Los Angeles
Contributed by Shu Liu. He is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, which runs from July 5th to September 23rd, 2016. This post is based on his second class project, Shiny (due in the 4th week of the program).
Motivation:
Crime and collision have always concerned people, especially those who live in Los Angeles or plan to move there. Los Angeles is a big city with a large population, and the data show that it has problems such as traffic accidents, burglary, and robbery. This Shiny visualization project is designed for people who care about their safety in Los Angeles.
Overview:
Three features of crime and collision were investigated: geographical distribution, time of occurrence, and type. The Shiny app described here includes detailed information about the type, location, time, and date of each incident. The data cover the period from 2011 to 2014. Readers are welcome to explore the app and learn more about crime and collision in Los Angeles.
You can explore this project via Shiny. (It may take you several seconds to load the app.)
You can check the R code on Github.
Data on where Crime and Collision Occur:
The Shiny app lets users choose the area and year they'd like to focus on and see the details of every crime and collision recorded during the four years covered. On another page, "Crimes Area", visitors can explore crime and collision trends, or drill down to learn more about specific incidents.
Which Area has the most crime and collision, and which one has the least?

LAPD oversees 21 areas in Los Angeles. The bar chart shows the number of crime and collision incidents in each area as defined by the LAPD. According to the graph on the right, Hollenbeck has the fewest incidents and 77th Street has the most. This ranking is based on the total volume of crime and collision, not on the density of occurrences.
Is volume a good index for safety evaluation?
The Shiny app lets one check the historical trend in crime and collision, not just the current amount. In some areas, for example Hollenbeck, a low volume of incidents can be misleading. Although Hollenbeck appears safe because of its minimal volume of crime and collision, its incident count rose over the four years.
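The difference between volume and trend can be sketched in a few lines. The app itself is written in R; the Python below is only an illustration, and the incident counts, the helper names (`make_records`, `yearly_counts`, `trend_slope`), and the choice of a least-squares slope as the trend measure are all assumptions made for this example, not the author's data or method.

```python
from collections import Counter

YEARS = [2011, 2012, 2013, 2014]


def make_records(area, counts_by_year):
    """Build made-up (area, year) incident records for the example."""
    return [(area, year) for year, n in counts_by_year.items() for _ in range(n)]


# Invented numbers: a low-volume area that is rising, a high-volume area that is flat.
incidents = (
    make_records("Hollenbeck", {2011: 1, 2012: 2, 2013: 3, 2014: 4})
    + make_records("77th Street", {2011: 5, 2012: 5, 2013: 5, 2014: 5})
)


def yearly_counts(records, years):
    """Count incidents per area for each year, in the order given by `years`."""
    counts = Counter(records)
    areas = sorted({area for area, _ in records})
    return {area: [counts[(area, y)] for y in years] for area in areas}


def trend_slope(years, counts):
    """Least-squares slope of counts against years (incidents per year)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    den = sum((x - mean_x) ** 2 for x in years)
    return num / den


per_area = yearly_counts(incidents, YEARS)
summary = {
    area: {"total": sum(counts), "slope": trend_slope(YEARS, counts)}
    for area, counts in per_area.items()
}

for area, stats in summary.items():
    print(area, stats["total"], round(stats["slope"], 2))
```

In this toy data the area with the smaller total has the positive slope, which is the situation the paragraph above warns about: ranking by total alone would hide it.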
Data on When Crime and Collision Occur:
When total crime and collision are broken down by day of the week, it's interesting to find that Friday has the most incidents while Sunday has the fewest. I cannot give a proper explanation without professional knowledge of criminology, but the graph does reveal a weekly pattern.
In addition, there are more noteworthy trends in this three-dimensional graph, which shows how the day of the week and the time of day together influence the occurrence of crime and collision. Viewed from the right angle, it's obvious that crimes are much more frequent after noon than before noon. The fewest crimes occur between 5:00 and 6:00am.
Rotating the graph to view it from the other side, we focus on 0:00 (midnight) of each day. Crime and collision are more likely to happen at midnight on Saturday and Sunday, while the mornings and afternoons of weekends are relatively safer than those of weekdays.
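The day-by-hour breakdown behind a graph like this amounts to counting incidents per (weekday, hour) pair. The following is a minimal Python sketch of that tabulation, not the app's R code; the timestamps, their format, and the function name `weekday_hour_counts` are hypothetical.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Made-up incident timestamps, assumed to be in "YYYY-MM-DD HH:MM" form.
timestamps = [
    "2014-03-07 18:30",
    "2014-03-07 18:45",
    "2014-03-07 05:10",
    "2014-03-08 00:05",
    "2014-03-09 14:00",
]


def weekday_hour_counts(stamps):
    """Count incidents for each (weekday name, hour of day) pair."""
    counts = Counter()
    for stamp in stamps:
        dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        counts[(DAY_NAMES[dt.weekday()], dt.hour)] += 1
    return counts


counts = weekday_hour_counts(timestamps)

# Collapse the hour dimension to get the per-day totals of the bar chart.
by_day = Counter()
for (day, _hour), n in counts.items():
    by_day[day] += n

print(by_day.most_common())
```

The `counts` table holds the heights of a three-dimensional day-by-hour plot, and `by_day` gives the simpler weekly totals.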
Data on Types of Crime and Collision:
In this interactive interface, users are free to check crime and collision types across different time periods and seasons. The temperature in Los Angeles is stable year-round, so the year is divided into only two seasons, hot and warm, according to the historical record. The hot season runs from May to October (6 months), and the warm season consists of the other months (6 months).
In addition, I divided the day into seven periods for deeper exploration:
- Early morning: 4am - 8am
- Morning: 8am - 11am
- Noon: 11am - 1pm
- Afternoon: 1pm - 5pm
- Evening: 5pm - 8pm
- Night: 8pm - 11pm
- Midnight: 11pm - 4am
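The season and time-of-day buckets described above can be written as two small lookup functions. This is a Python sketch rather than the app's R code. It assumes each period includes its start hour and excludes its end hour, and that the Night period ends at 11pm where Midnight begins; the function names are hypothetical.

```python
# (start hour inclusive, end hour exclusive, label), on a 24-hour clock.
PERIODS = [
    (4, 8, "Early morning"),
    (8, 11, "Morning"),
    (11, 13, "Noon"),
    (13, 17, "Afternoon"),
    (17, 20, "Evening"),
    (20, 23, "Night"),
]


def season(month):
    """Return 'Hot' for May through October, 'Warm' for the other months."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    return "Hot" if 5 <= month <= 10 else "Warm"


def period_of_day(hour):
    """Map an hour (0-23) to one of the seven periods of the day."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be 0-23, got {hour}")
    for start, end, label in PERIODS:
        if start <= hour < end:
            return label
    # The remaining hours (23, 0, 1, 2, 3) wrap around midnight.
    return "Midnight"


print(season(7), period_of_day(18))
```

Handling Midnight as the fallback avoids a special case for the one period that crosses the day boundary.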
Generally speaking, the 'Hot' season (May to Oct.) has slightly more crime and collision than the 'Warm' season (Nov. to Apr.) in all periods of the day. As for crime and collision types, 'Traffic' related issues make up the largest share, and burglary and theft related issues are also worth noting.
Further Steps:
- LA Open Data (the data source) provides crime and collision data from 2011 to 2015, but the 2015 data is incomplete, so I removed it from my analysis to avoid bias. In the future, I will add the newest data to my Shiny app once the LAPD fills in the missing records.
- The crime and collision map on the front page loads only a small sample of the data for the sake of overall efficiency. Actually, I tried my best to improve the efficiency of the algorithm, but the key point is that RStudio is a little bit mean to free users, and it limits the computing capacity available to my Shiny app. Maybe I will upgrade my membership to premium to fix this problem when I get a good job.
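One common way to keep a map responsive under a compute limit is to plot a fixed-size random sample of the incidents. The post does not say how the app chooses its subset, so the Python sketch below is only an assumption about one possible approach; `sample_for_map`, the point cap, and the coordinate records are hypothetical.

```python
import random


def sample_for_map(records, max_points, seed=0):
    """Return at most `max_points` records, chosen uniformly at random.

    A fixed seed keeps the sample stable between page loads.
    """
    if len(records) <= max_points:
        return list(records)
    rng = random.Random(seed)
    return rng.sample(records, max_points)


# Made-up (latitude, longitude) points standing in for incident locations.
all_points = [(34.0 + i * 0.0001, -118.2 - i * 0.0001) for i in range(10_000)]

map_points = sample_for_map(all_points, max_points=500)
print(len(map_points))
```

A uniform sample keeps the overall geographic pattern roughly intact, although rare incident types may drop out of a small sample.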