A Brief Look at Gun Violence

Michael Dollar
Posted on Jul 1, 2019

Gun violence is a polarizing subject in our country.  It is impossible to traverse social pathways without being expected to submit an opinion at some point.  Having been in many conversations about various shootings over the years, I can usually tell where the conversation is headed:  a seemingly endless back-and-forth between conversationalists about the 2nd Amendment, gun regulations, mental health, race, and, of course, getting rid of guns altogether.

I have often wondered what effect gun regulations really have on gun violence in our country, and I’ve been curious about the demographics of the shooters who are involved.  Do they tend to be from poorer areas of the United States?  Are the majority of them of a particular race?  Is gun violence more prevalent in states whose majority voted for Donald Trump?  Are most of them actually mentally ill, as the media would have you believe?

With the work I put into this application, I was able to get a better idea of how to find answers to some of these questions.  You can see the full visualization here.

Origin of the Data Set

There are two main data sets used in this presentation:  General Gun Violence and Mass Shootings.  The general gun violence data set consists of values from 2016 aggregated by state:  number of incidents per population, number of victims, median household income, number of regulations, and the vote percentages for Hillary Clinton and Donald Trump.  Part of it was constructed from a large data set found on Kaggle, and the rest was taken from census data and Wikipedia.  The mass shootings data mainly comes from Kaggle and goes back to 1966, but I only focus on the five consecutive years from 2013 to 2017.  This decision was made for simplicity.
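As a rough illustration of the state-level aggregation described above, here is a minimal Python sketch (the app itself was built in Shiny, so this is not the code behind it).  The records, the population figures, the per-100,000 scaling, and the name aggregate_by_state are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical incident-level records: (state, number of victims).
incidents = [
    ("Alaska", 2),
    ("Alaska", 1),
    ("New York", 3),
    ("New York", 0),
    ("New York", 1),
]

# Hypothetical populations (made-up round numbers, not census figures).
population = {"Alaska": 700_000, "New York": 20_000_000}


def aggregate_by_state(records, populations, per=100_000):
    """Return incident counts, victim totals, and incidents per `per` residents for each state."""
    counts = defaultdict(int)
    victims = defaultdict(int)
    for state, n_victims in records:
        counts[state] += 1
        victims[state] += n_victims
    return {
        state: {
            "incidents": counts[state],
            "victims": victims[state],
            "incidents_per_capita": counts[state] / populations[state] * per,
        }
        for state in counts
    }


summary = aggregate_by_state(incidents, population)
for state, row in summary.items():
    print(state, row)
```

Scaling by population matters here because raw counts would mostly reflect how large a state is rather than how common incidents are within it.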


The first tab illustrates the number of gun regulations and incidents by state.  The user is able to switch between the two and see that states with a high number of regulations tend to show a lower number of incidents per capita.



The correlation tab allows the user to visualize the correlation between different variables in the data set.  The options were originally meant to be number of gun regulations, number of violent incidents, and median household income by state.  However, I decided to add the vote percentage per state for both Hillary Clinton and Donald Trump for 2016 just to see if anything interesting showed up.  Generally, I think it can be seen that there is a negative correlation between the number of gun regulations and the number of incidents, and there is a clearer negative correlation between median household income and the vote percentage for Donald Trump.
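For readers who want a number to go with the scatter plots, a correlation like this can be summarized with the Pearson coefficient.  The sketch below is a plain Python illustration, not the code behind the app; the function name pearson and the four data points are invented for the example and are not values from the data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-state values, made up for illustration only.
regulations = [10, 50, 100, 30]
incidents_per_100k = [20.0, 12.0, 5.0, 15.0]

r = pearson(regulations, incidents_per_100k)
print(f"r = {r:.2f}")  # a negative r means the two variables move in opposite directions
```

A coefficient near -1 or 1 indicates a strong linear relationship, but it says nothing about which variable, if either, causes the other.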

Mass Shooting Demographics

While looking through different data sets, I found one that focused on mass shootings going back to 1966.  The United States' Congressional Research Service acknowledges that the definition of a mass shooting will vary.  In this data set, the minimum number of victims to qualify is three.  Variables in this set included race, gender, and 'mental health issues.'  I saw an opportunity to allow the user to compare five different years to see how the demographics changed.  I chose to focus on a recent five-year range for simplicity, and also because there were gaps in the dates.  As expected, most mass shootings in that time were perpetrated by one or two males.  Also, as one might expect, the two races recorded most often for the shooters were white and black.
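The year-by-year comparison described above boils down to a filter followed by a count.  Here is a minimal Python sketch under stated assumptions: the field names (year, victims, gender, race), the sample records, and the function demographics_by_year are hypothetical and do not reflect the actual column names in the Kaggle data.

```python
from collections import Counter

# Hypothetical records; field names and values are invented for illustration.
shootings = [
    {"year": 2012, "victims": 5, "gender": "Male", "race": "White"},
    {"year": 2013, "victims": 4, "gender": "Male", "race": "Black"},
    {"year": 2015, "victims": 2, "gender": "Male", "race": "White"},
    {"year": 2015, "victims": 7, "gender": "Female", "race": "White"},
    {"year": 2017, "victims": 3, "gender": "Male", "race": "White"},
]


def demographics_by_year(records, field, first_year=2013, last_year=2017, min_victims=3):
    """Count values of `field` per year, keeping only records inside the
    year range that meet the minimum-victim threshold."""
    by_year = {}
    for rec in records:
        in_range = first_year <= rec["year"] <= last_year
        if in_range and rec["victims"] >= min_victims:
            by_year.setdefault(rec["year"], Counter())[rec[field]] += 1
    return by_year


gender_counts = demographics_by_year(shootings, "gender")
race_counts = demographics_by_year(shootings, "race")
```

Keeping the victim threshold as a parameter makes it easy to see how the counts shift under a different definition of a mass shooting.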

Mental health is a widely acknowledged factor in the discussions involving mass shootings.  The mental health charts are constructed from a variable with three possibilities: yes, no, and unclear.  In order to analyze the results from the mental health charts, however, one would need to know how these labels were assigned.  The data set is lacking that information.

Data Tables

The data page allows the user to peruse the data sets used in the construction of the project.  The mass shooting data set is particularly interesting, because it includes a summary for each mass shooting.  Each table is searchable in more than one way, and you are encouraged to check them out.

What's Next?

Clearly, there is a lot of work that can be done with this data set in order to further investigate some of the original questions that led me to research this topic.  It would be interesting to group the larger, more general gun violence data set by city, county, age group, or even congressional voting district.  Doing this would have required more time spent cleaning multiple data sets to match statistics.

Thank you for taking the time to read my post.  I hope you found the Shiny app to be at least an inspiration for further investigation.  If you have any questions or suggestions, please don't hesitate to reach out at:


[email protected]

