Murder Data Statistics in the United States

hyelee lee

Posted on Aug 12, 2019

The skills the author demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.

I analyzed murder statistics in the United States from 1980 to 2014. I chose this subject because I enjoy watching American crime dramas. They are full of dramatic stories, so I was curious whether real events like those actually happen. When I started this Shiny project, I found data on murders in the United States and took the opportunity to analyze it.

Here are my Shiny Web App and GitHub.

Data & Question

The data used for this project is "Homicide Reports, from 1980 to 2014", uploaded to Kaggle. The raw data consists of 24 columns and about 630,000 records. The columns cover agency info, crime solved, year and month of occurrence, victim information, perpetrator information, relationship, weapon, and record source.


I used a total of 9 data columns for the analysis: State, Year, the victim's and perpetrator's info (Age, Sex), Crime Solved, Relationship, and Weapon.
I selected the columns needed for each question. Overall data cleansing handled records in which the perpetrator's age was 0 or the victim's age was 998. Additional cleansing was carried out per question where necessary, and some new columns were created to draw the graphs.
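The age cleansing described above could look roughly like the sketch below. It is written in plain Python rather than the R used for the Shiny app, the column names "Victim Age" and "Perpetrator Age" are assumptions based on the dataset description, and dropping the rows is only one possible reading of the cleansing step. The sample rows are made up for illustration.

```python
# Hypothetical sample rows; column names are assumed, not taken from the project code.
records = [
    {"State": "California", "Victim Age": 34, "Perpetrator Age": 27},
    {"State": "Texas", "Victim Age": 998, "Perpetrator Age": 41},
    {"State": "New York", "Victim Age": 19, "Perpetrator Age": 0},
]


def has_valid_ages(row):
    """Return True when neither age field holds a placeholder value."""
    return row["Perpetrator Age"] != 0 and row["Victim Age"] != 998


# Keep only rows whose ages look like real values.
cleaned = [row for row in records if has_valid_ages(row)]
print(len(cleaned))
```

Whether to drop such rows or keep them for questions that do not involve age is a per-question choice, which matches the per-query cleansing mentioned above.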

Data Observation

Q1. Where?

The graph on the left shows the total number of murders by state from 1980 to 2014. California has the most murders of any state, with 99,783 cases, followed by Texas, New York and Florida. The graph in the middle shows cases where Crime Solved is "No": California has the most unsolved cases, followed by New York, Texas and Florida. The graph on the right shows the crime solved rate by state. North Dakota has the highest crime solved rate, followed by Montana and South Dakota, states with a low number of crimes.

California has the highest number of murders and a 63% crime solved rate. Texas has the second-highest number of murders but a 77% crime solved rate, higher than that of New York, which has the third-highest number. New York has an unsolved rate of 55%, and the state with the lowest crime solved rate is the District of Columbia, at 34%.
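A solved rate per state is the number of solved cases divided by the total number of cases in that state. The sketch below shows one way to compute it in plain Python. The field names "State" and "Crime Solved" and the "Yes"/"No" values are assumptions, and the sample rows use placeholder state names rather than real figures.

```python
from collections import defaultdict


def solved_rate_by_state(rows):
    """Return {state: share of cases marked as solved} for the given rows."""
    totals = defaultdict(int)
    solved = defaultdict(int)
    for row in rows:
        totals[row["State"]] += 1
        if row["Crime Solved"] == "Yes":
            solved[row["State"]] += 1
    return {state: solved[state] / totals[state] for state in totals}


# Made-up rows for illustration only.
sample = [
    {"State": "State A", "Crime Solved": "Yes"},
    {"State": "State A", "Crime Solved": "Yes"},
    {"State": "State A", "Crime Solved": "Yes"},
    {"State": "State A", "Crime Solved": "No"},
    {"State": "State B", "Crime Solved": "Yes"},
    {"State": "State B", "Crime Solved": "No"},
]

rates = solved_rate_by_state(sample)
print(rates)
```

Because the rate is a ratio, a state with few cases can rank high on solved rate while ranking low on total count, which is the pattern described for North Dakota, Montana and South Dakota.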

Q2. When?

This graph shows the number of murders by year. As the graph shows, 1993 had the highest number of crimes, and since 2000 the number of crimes has been roughly flat.

Looking at the number of crimes solved in each year, the number of unsolved cases was also highest in 1993, the year with the most crimes. On average, about 70 percent of crimes are solved.

Q3. Who?

스크린샷-2019-08-10-오후-8.44.57 | Data Science Blog — <Who killed whom (Gender)>

This graph shows the gender distribution of perpetrators and victims. As expected, male killed male is the largest group at 65.8%, followed by male killed female, female killed male, and female killed female. The graph shows that both perpetrators and victims are more often male than female.

스크린샷-2019-08-10-오후-8.46.26 | Data Science Blog — <Number of perpetrator by male, female, age>

스크린샷-2019-08-10-오후-8.46.34 | Data Science Blog — <Number of victim by male, female, age>

What are the statistics based on the gender and age of the victim and the perpetrator?

In the first graph, male perpetrators are most numerous in their teens and twenties, and the number declines gradually after the 40s. Female perpetrators are also concentrated from their teens through their thirties, after which the number decreases, as with male perpetrators.

The second graph shows the number of murders by the victim's gender and age. The patterns for male and female victims are similar, especially in the number of victims killed in infancy. Male victims show a rapid increase from the teens to the 20s, while female victims show similar numbers from the 20s to the 40s.

Q4. What was used?

What kind of weapon do the perpetrators use?

스크린샷-2019-08-10-오후-8.55.16 | Data Science Blog — <Use of perpetrator weapons>

The first graph shows that guns are used the most by far, at 64.4%. The raw data divides guns into Shotgun, Handgun, Rifle and Firearm, which I combined into a single "Gun" category for the statistics. Knife, Blunt Object, Unknown, and others follow.

The second graph shows that gun use peaked in 1993, the year with the most murders. Although the figures have declined since the 2000s, guns remain at the top of the list, with a wide gap over the other weapons used for murder.
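Combining the four gun types into one "Gun" category and then computing shares can be sketched as follows. This is a plain Python illustration, not the project's own code: the helper names are hypothetical and the sample list is made up, while the four gun type labels come from the description above.

```python
from collections import Counter

# Gun types named in the raw data, merged into a single category.
GUN_TYPES = {"Shotgun", "Handgun", "Rifle", "Firearm"}


def weapon_group(weapon):
    """Map a raw weapon label to its grouped category."""
    return "Gun" if weapon in GUN_TYPES else weapon


def weapon_shares(weapons):
    """Return {category: share of all cases}, most common category first."""
    counts = Counter(weapon_group(w) for w in weapons)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.most_common()}


# Made-up weapon labels for illustration only.
sample = ["Handgun", "Knife", "Rifle", "Blunt Object",
          "Shotgun", "Knife", "Firearm", "Unknown"]
shares = weapon_shares(sample)
print(shares)
```

Grouping before counting keeps the comparison between guns and other weapons from being split across four separate bars.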

Gender

The question is: does the preferred weapon depend on the gender of the perpetrator?

스크린샷-2019-08-10-오후-8.58.23 | Data Science Blog — <Use or perpetrator weapons (Gender)>

These two graphs show the proportion of weapons used by the perpetrator's gender. As in the previous graph, guns rank first and knives second for both genders, but the proportions differ. Males use guns in 66.8% of cases, while females use them in 44.7%, a difference of about 22 percentage points. Males use knives in 15% of cases, while females use them in 28.3%. This shows that women use knives proportionally more than men.

Use of Weapons

So, what about changes in the use of weapons by year?

스크린샷-2019-08-10-오후-8.59.46 | Data Science Blog — <Use or perpetrator weapons by year (Gender)>

According to the year-by-year graph, male perpetrators consistently use guns more than any other weapon. For female perpetrators, the difference between guns and knives was large in 1980, but as overall weapon use fell, the gap gradually narrowed. Since the 2000s the ranking of guns and knives has not changed, but the gap between the two has shrunk considerably.

Q5. Who killed Whom?

스크린샷-2019-08-10-오후-9.05.07 | Data Science Blog — <Who killed whom (Relationship)>

This graph shows the percentage of murders by relationship. Unknown tops the list with 42.8%, followed by Acquaintance with 20.7% and Stranger with 15.1%. The others are Family, Friend, Husband/Wife, and Work.
Grouping these into two categories, people the victim knew and people the victim did not know, gives 58 percent for those not known and 42 percent for those known.

What about a more detailed breakdown of each category?
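The 58/42 split can be checked with a little arithmetic. The sketch below assumes that "people you don't know" means the Unknown and Stranger categories combined, which is an inference from the percentages quoted above rather than something the project states explicitly.

```python
# Shares (in percent) quoted in the text above.
unknown_share = 42.8
stranger_share = 15.1

# Assumption: "not known" = Unknown + Stranger; everything else counts as known.
not_known = unknown_share + stranger_share
known = 100.0 - not_known

print(round(not_known), round(known))  # 58 42
```

Under that assumption the rounded figures match the ratio reported for the two groups.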

스크린샷-2019-08-10-오후-10.29.46 | Data Science Blog — <Number of murders by Reltionship detail>

This graph breaks down the previous relationship categories. The first, second and third places are the same as before, but the fourth is different. In the previous graph the fourth category was Family, but in the detailed breakdown Wife is fourth. Wife belongs to the Husband/Wife group, which ranked sixth in the previous graph, second from last.

Murders in Relationships

Next, let's look at each category in detail.

스크린샷-2019-08-10-오후-10.32.05 | Data Science Blog — <Who killed whom? : Wife/Husband>

This graph shows the percentage of murders in the Wife/Husband relationship. Wife killed by Husband tops the list with 59.4%, nearly three-fifths of the total. Husband killed by Wife is second, followed by Common-law wife and Ex-wife. The year-by-year graph shows that the number of murders in the Wife/Husband relationship generally decreases.

Relationship between Employer and Employee

스크린샷-2019-08-10-오후-10.36.05 | Data Science Blog — <Who killed whom? : Employer/Employee>

This graph shows murders in the Employer/Employee relationship.
The overall totals differ by about 14%, with Employer killed by Employee more common than Employee killed by Employer.

In the year-by-year graph, the two figures fluctuate sharply.
Employer has not always had more murders than Employee: in 1984, 1985, 2006, 2013 and 2014, there were more for Employee.

스크린샷-2019-08-10-오후-10.41.19 | Data Science Blog — <Who killed whom? : Friend>

This graph is about friendship. Friend Male killed by Male is the largest group, consistent with the overall gender distribution of perpetrators and victims. Girlfriend killed by her lover is second, followed by Boyfriend killed by his lover, and then Friend Female killed by Male.

According to the year-by-year graph, there were more Friend Male cases than Girlfriend killed by her lover cases before 1995. After 1995, the two figures no longer show a consistent ordering.

스크린샷-2019-08-10-오후-10.43.53 | Data Science Blog — <Who killed whom? : Family>

This graph shows the percentage of murders in family relationships. Family other than immediate family ranks first with 26.6%. Second is Son killed by Father, and third is Brother killed by Brother, again reflecting the male-dominated gender distribution of perpetrators and victims. The fourth through ninth categories are similar, each in the 7% range.

In the year-by-year graph, family other than immediate family is always first, while the rankings from second to ninth change repeatedly.

Summary

#1. California is the state with the highest number of crimes, but despite the large number of crimes, its crime solved rate is not the lowest.

#2. 1993 was the year with the highest number of crimes. Since 2000, the number of crimes has been roughly flat, and so has the crime solved rate.

#3. Looking at the sex and age of victims and perpetrators, male killed by male is the largest group at 66%.
Perpetrators are mostly in their teens and twenties.
Victims are concentrated from infancy through the teens and in their 30s.

#4. Perpetrators use guns the most, but the proportion of weapons used differs by gender: females use knives proportionally more than males.

#5. Grouping relationships into people the victim knew and people the victim did not know gives 58 percent for those not known and 42 percent for those known.

Future work

#1. Check whether the ages of the perpetrator and the victim are related to each factor by splitting them at age 18.

#2. Find new insights by analyzing the results of this project in more detail.

Thank you for viewing my project!
