Murder Data Statistics in the United States

Posted on Aug 12, 2019

I analyzed murder statistics in the United States from 1980 to 2014. I chose this subject because I enjoy watching American crime dramas. They are full of dramatic stories, so I was curious whether real events like those actually happen. When I started this Shiny project, I found data on murders in the United States and took the opportunity to analyze it.

Here are my Shiny web app and GitHub repository.


Data & Question

The data used for this project is "Homicide Reports, from 1980 to 2014", uploaded to Kaggle. The raw data consists of 24 columns and about 630,000 records. The columns cover agency information, whether the crime was solved, the year and month of occurrence, victim information, perpetrator information, relationship, weapon, and record source.

I used a total of 9 columns for the analysis: State, Year, the victim's and perpetrator's age and sex, Crime Solved, Relationship, and Weapon.
For each question, I selected the columns it needed. The overall data cleansing pre-processed records in which the perpetrator's age was recorded as 0 or the victim's age as 998. Further cleansing was carried out per question where necessary, and some new columns were created and used to draw the graphs.
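The cleansing step above can be sketched roughly as follows. This is a minimal illustration in Python, not the code of the Shiny app itself. It assumes the age columns are named "Victim Age" and "Perpetrator Age" and that rows with the placeholder ages are simply dropped; the sample rows are made up.

```python
import csv
import io

# Tiny made-up sample standing in for the Kaggle CSV.
# The column names are an assumption, not taken from the post.
SAMPLE = """State,Year,Victim Age,Perpetrator Age
California,1993,25,30
Texas,1993,998,41
New York,1994,33,0
"""


def clean_rows(rows):
    """Keep only rows without a placeholder age (perpetrator 0, victim 998)."""
    kept = []
    for row in rows:
        if int(row["Perpetrator Age"]) == 0:
            continue  # perpetrator age recorded as 0
        if int(row["Victim Age"]) == 998:
            continue  # victim age recorded as 998
        kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = clean_rows(rows)
print(len(rows), "rows before,", len(cleaned), "rows after")
```

Whether such rows should be dropped for every question or only for the age-based graphs is a design choice; a filter like this is probably best applied per question.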

Data Observation

Q1. Where?

<Murders by state (Crime Solved Type)>

The graph on the left shows the total number of murders in each state from 1980 to 2014. California has the most murders of any state, with 99,783 cases, followed by Texas, New York, and Florida. In the middle graph, where Crime Solved is No, California again has the highest number of unsolved cases, followed by New York, Texas, and Florida. The graph on the right shows the share of solved crimes by state. North Dakota has the highest solved rate, followed by Montana and South Dakota, states with a low number of murders.

California has a high number of murders and a 63% solved rate. Texas has the second-highest number of murders but a 77% solved rate, which is higher than that of New York, the state with the third-highest number. New York has an unsolved rate of 55%, and the lowest solved rate belongs to the District of Columbia, at 34%.
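The difference between a raw count and a solved rate can be illustrated with a short sketch. The function name and the "Yes"/"No" values for Crime Solved are assumptions, and the records are invented for the example, not the real counts.

```python
from collections import defaultdict


def solved_rate_by_state(records):
    """Return {state: fraction of cases solved} from (state, crime_solved) pairs."""
    totals = defaultdict(int)
    solved = defaultdict(int)
    for state, crime_solved in records:
        totals[state] += 1
        if crime_solved == "Yes":
            solved[state] += 1
    return {state: solved[state] / totals[state] for state in totals}


# Made-up records for illustration only.
records = [
    ("California", "Yes"), ("California", "No"),
    ("California", "Yes"), ("California", "No"),
    ("Texas", "Yes"), ("Texas", "Yes"), ("Texas", "Yes"), ("Texas", "No"),
]

rates = solved_rate_by_state(records)
# Sort by solved rate, highest first, as in the right-hand graph.
for state, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{state}: {rate:.0%}")
```

Ranking states by this fraction rather than by the number of cases is what lets small states such as North Dakota appear at the top.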

Q2. When? 

<Number of murders per year>

This graph shows the number of murders by year. As the graph shows, 1993 had the highest number of murders, and since 2000 the yearly count has stayed roughly flat.

<Number of murders per year (solved/not solved)>

Looking at the number of crimes solved in each year, the number of unsolved cases was also highest in 1993, the year with the most murders. On average, about 70 percent of cases are solved.
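Counting cases per year and finding the peak year can be sketched as below. The list of years is invented for the example, and the function name is hypothetical.

```python
from collections import Counter


def murders_per_year(years):
    """Count how many cases fall in each year."""
    return Counter(years)


# Made-up year values, one per case, for illustration only.
years = [1992, 1993, 1993, 1993, 1994, 1994, 2001]

counts = murders_per_year(years)
# most_common(1) returns a list with the single (year, count) pair
# that has the highest count.
peak_year, peak_count = counts.most_common(1)[0]
print(peak_year, peak_count)
```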

Q3. Who?

<Who killed whom (Gender)>

This graph shows the gender distribution of victims and perpetrators. As expected, male killed male is the largest group at 65.8%, followed by male killed female, female killed male, and female killed female. The graph shows that both perpetrators and victims are more often male than female.

<Number of perpetrator by male, female, age>
<Number of victim by male, female, age>

What are the statistics based on the gender and age of the victim and the perpetrator? 

In the first graph, male perpetrators are most numerous in their teens and twenties, and the figure gradually declines after the 40s. Female perpetrators are also concentrated in their teens through thirties, after which the figure decreases, as with males.

The second graph shows the number of murders by the victim's gender and age. The distributions for male and female victims are similar, especially in the number of victims killed in infancy. Male victims increase rapidly from the teens to the 20s, while female victims stay at a similar level from the 20s to the 40s.

Q4. What was used?

What kind of weapon do the perpetrators use? 

<Use of perpetrator weapons>

The first graph shows that guns are used the most, at an overwhelming 64.4%. The raw data divides guns into Shotgun, Handgun, Rifle, and Firearm, which I combined into a single "Gun" category for these statistics. Next come Knife, Blunt Object, Unknown, and so on.

The second graph shows that guns were used the most in 1993, the year with the most murders. Although the figures have declined since the 2000s, guns remain at the top of the list, with a wide gap over the other weapons.
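Combining the gun types into one category before computing shares might look like the sketch below. The four gun labels come from the description above; the function names and the sample list of weapons are made up for illustration.

```python
from collections import Counter

# Gun types named in the raw data, per the description above.
GUN_TYPES = {"Shotgun", "Handgun", "Rifle", "Firearm"}


def group_weapon(weapon):
    """Map every gun type to "Gun"; leave other weapons unchanged."""
    return "Gun" if weapon in GUN_TYPES else weapon


def weapon_shares(weapons):
    """Return {weapon group: share of all cases}."""
    counts = Counter(group_weapon(w) for w in weapons)
    total = sum(counts.values())
    return {weapon: count / total for weapon, count in counts.items()}


# Made-up sample, one weapon per case.
weapons = ["Handgun", "Rifle", "Knife", "Shotgun", "Blunt Object",
           "Firearm", "Handgun", "Knife", "Unknown", "Handgun"]

shares = weapon_shares(weapons)
for weapon, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{weapon}: {share:.1%}")
```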


The question is whether the preferred weapon depends on the gender of the perpetrator.

<Use of perpetrator weapons (Gender)>

These two graphs show the proportion of weapons used by the perpetrator's gender. As in the previous graph, guns rank first and knives second, but the shares differ by gender. Males use guns in 66.8% of cases, while females use them in 44.7%, a difference of about 22 percentage points. Males use knives in 15% of cases, while females use them in 28.3%. This shows that a larger share of female perpetrators use knives than male perpetrators.
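The gaps quoted above are differences between two percentages, so they are measured in percentage points. A small check using only the figures stated in the text (the dictionary layout is just for illustration):

```python
# Shares stated in the text, in percent of each gender's cases.
male = {"Gun": 66.8, "Knife": 15.0}
female = {"Gun": 44.7, "Knife": 28.3}

# Differences in percentage points, rounded to one decimal place
# to hide floating-point noise.
gun_gap = round(male["Gun"] - female["Gun"], 1)
knife_gap = round(female["Knife"] - male["Knife"], 1)

print(f"Gun gap: {gun_gap} points, knife gap: {knife_gap} points")
```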

So, what about changes in the use of weapons by year?

<Use of perpetrator weapons by year (Gender)>

According to the year-by-year graph, males consistently use guns far more than other weapons. For females, however, the difference between guns and knives was huge in 1980, but as overall weapon use fell, the gap gradually narrowed. Since the 2000s the ranking of guns and knives has not changed, but the gap between the two has shrunk considerably.

Q5. Who killed Whom?

<Who killed whom (Relationship)>

This graph shows the percentage of murders by relationship. Unknown tops the list with 42.8%, followed by Acquaintance with 20.7% and Stranger with 15.1%. The others are Family, Friend, Husband/Wife, and Work.
Grouping the categories into people you know and people you don't know, the ratio is 58 percent for those you don't know and 42 percent for those you know.
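One way to arrive at the 58/42 split is to treat Unknown and Stranger together as "don't know". That grouping is an assumption inferred from the stated percentages, since the two figures add up to about 58; the post does not spell it out.

```python
# Shares stated in the text, in percent of all cases.
stated = {"Unknown": 42.8, "Acquaintance": 20.7, "Stranger": 15.1}

# Assumed grouping: these categories count as "don't know".
NOT_KNOWN = {"Unknown", "Stranger"}

not_known = sum(share for name, share in stated.items() if name in NOT_KNOWN)
# Everything else (Acquaintance, Family, Friend, Husband/Wife, Work).
known = 100.0 - not_known

print(f"don't know: {round(not_known)}%, know: {round(known)}%")
```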

What about a more detailed breakdown of each category?

<Number of murders by Relationship detail>

This graph breaks the previous relationship categories into more detail. The first, second, and third places are the same as before, but the fourth is different. In the previous graph the fourth category was Family, but in the detailed breakdown Wife is fourth. Wife belongs to the Husband/Wife group, which was the sixth largest in the previous graph, that is, second from last.

Next, it is worth looking at each category in detail.

<Who killed whom? : Wife/Husband>

This graph shows the percentage of murders within the Wife/Husband relationship. Wife killed by Husband tops the list with 59.4%, about three-fifths of the total. Second is Husband killed by Wife, followed by Common-law Wife and Ex-wife. The year-by-year graph shows that the number of murders in the Wife/Husband relationship generally decreases.

<Who killed whom? : Employer/Employee>

This graph shows murders in the Employer/Employee relationship.
The overall totals differ by about 14%, with Employer killed by Employee more common than Employee killed by Employer.

In the year-by-year graph, the gap between the two swings sharply.
Employer killed by Employee has not always been the larger number: in 1984, 1985, 2006, 2013, and 2014, there were more cases of Employee killed by Employer.

<Who killed whom? : Friend>

This graph is about friendship. Male friend killed by male is the most common, matching the overall gender distribution of victims and perpetrators. Next comes Girlfriend killed by her lover, then Boyfriend killed by his lover, and then Female friend killed by male.

According to the year-by-year graph, Male friend outnumbered Girlfriend killed by her lover before 1995. After 1995, the order between the two is no longer consistent.

<Who killed whom? : Family>

This graph shows the percentage of murders in family relationships. Family other than immediate family ranks first with 26.6%. Second is Son killed by Father, and third is Brother killed by Brother. Again, this matches the male-dominated gender distribution of victims and perpetrators. Fourth through ninth places are all similar, in the 7% range.

In the year-by-year graph, family other than immediate family is always number one. The rankings from second to ninth change places repeatedly.


#1. California is the state with the highest number of murders, but despite that volume, its solved rate is not the lowest.

#2. 1993 had the highest number of murders. Since 2000 the yearly count has been flat, and the solved rate has been similarly stable.

#3. Looking at the sex and age of victims and perpetrators, male killed by male is the largest group at 66%.
Perpetrators are concentrated in their teens and 20s.
Victims are concentrated from infancy through the teens and in their 30s.

#4. Perpetrators use guns the most, the proportion of weapons used differs by gender, and a larger share of female perpetrators use knives than male perpetrators.

#5. Grouping the relationship categories into people you know and people you don't know, the ratio is 58 percent for those you don't know and 42 percent for those you know.

Future work

#1. Check whether the ages of the perpetrator and the victim are related to each factor by splitting them at age 18.

#2. Find new insight by analyzing the results of this project in more detail. 

Thank you for viewing my project!
