Data Study on Tornado Damage Since 1996

Posted on Feb 5, 2017

Introduction - Data Set Overview

Data shows tornadoes occur all across the US almost all year round. Some states, such as Alabama and Oklahoma, see them frequently, while others, like Massachusetts, see only a couple every year. Because of this, different states have different levels of preparedness for tornadoes.


To analyze this, I started with a data set from the Storm Prediction Center (SPC), consisting of tornadoes across the United States from 1950 to 2015. The set consisted of over 60,000 rows of storms with twenty-two columns. These columns included (but were not limited to) attributes such as date and time, state, starting and ending coordinates, injuries, fatalities, and property loss.

However, not all of this data proved to be useful. According to the set description, prior to 1996 property loss was recorded on a 1-9 scale. Each step in the scale represented a range: the digit 2 represented $50-$500, 3 represented $500-$5,000, and so on in powers of 10. From 1996 onward, it was listed in millions of dollars. To allow consistent comparisons, the set was limited to storms from 1996 onward.
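As a rough illustration of the pre-1996 scale, the sketch below maps a loss code to its dollar range. It is anchored only on the two ranges quoted above and on the "powers of 10" rule; the function name is hypothetical, and codes 1 and 9 are left out because the text does not give their ranges.

```python
from typing import Tuple


def loss_code_to_range(code: int) -> Tuple[int, int]:
    """Return the (low, high) dollar range for a pre-1996 loss code.

    Assumption: each step is a factor of 10, anchored on the ranges
    quoted in the text (2 -> $50-$500, 3 -> $500-$5,000). The end
    codes 1 and 9 are not described in the text, so they are rejected.
    """
    if not 2 <= code <= 8:
        raise ValueError("only codes 2-8 are extrapolated in this sketch")
    low = 5 * 10 ** (code - 1)
    return low, low * 10
```

Because each code covers a tenfold range, the pre-1996 values cannot be compared directly with the dollar amounts recorded later.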

A second filter was also added so that only tornadoes in the continental US were considered. This was largely so that the country could be viewed as a whole more easily, as in the map below. This filter removed only thirty-three storms between 1996 and 2015.


Tornado Occurrence Map

One interesting thing to note from this map is that there are very few tornadoes, although not none, west of the Rocky Mountains; most are localized in the Great Plains. Viewing the country at this distance, however, does not paint as interesting a picture as viewing it at the state level. This is the level on which the Shiny application focuses.

The Shiny Data Application

The main purpose of this application is to allow users to see how different states are affected by tornadoes. It is broken up into three sections: Map, State Comparison, and State at a Glance. The user can select these on the left side of the application. The left panel also allows the user to select the state they wish to focus on and to change options such as grouping data by year or storm and including zero-valued data. These options are covered in more detail where they are relevant.



Map of Storm Paths in Arkansas

The map feature of the application shows the path taken by all tornadoes within the selected ranges in the map box, for the state selected in the left panel. The map box allows users to filter by severity, on the Fujita scale, as well as by the year the tornado took place. The default values here are all severity levels, and the years 2005-2015.

Not all tornadoes in the data set for these selections will show up. Those that are missing are tornadoes where the data appeared to be in error. Tornadoes had to pass two tests to show up in these maps. First, the coordinates had to be between the latitudes of 10° and 60°, and the longitudes of -130° and -50°. Second, the Haversine distance based on starting and ending locations had to be within twenty percent of the recorded distance traveled. This filtered out some of the storms, but not so many that the map stopped being useful.
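The two tests can be sketched roughly as follows. This is a minimal Python illustration, not the application's own code: the function names are hypothetical, the recorded path length is assumed to be in miles, and "within twenty percent" is read as the absolute difference being at most 20% of the recorded length.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; assumes lengths are in miles


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def in_bounds(lat, lon):
    """Test 1: coordinates fall inside the box described in the text."""
    return 10 <= lat <= 60 and -130 <= lon <= -50


def passes_map_filter(slat, slon, elat, elon, recorded_miles, tolerance=0.20):
    """Return True if a storm passes both plausibility tests."""
    if not (in_bounds(slat, slon) and in_bounds(elat, elon)):
        return False
    computed = haversine_miles(slat, slon, elat, elon)
    # Test 2 (assumed reading): |computed - recorded| <= 20% of recorded.
    return abs(computed - recorded_miles) <= tolerance * recorded_miles
```

A storm whose end point was recorded as (0, 0), for example, fails the first test and never reaches the distance check.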

State Comparison Data


State Comparison of Alabama, Grouped by Year

One interesting way to see how tornadoes affect a state is to compare the damage the state endures, both financially and through casualties, with that of other states. This comparison is shown on the state comparison page, which is itself broken up into three sections: Average Financial Loss Bar Plot, Average Casualties Bar Plot, and Position Scatter Plot. Here, casualties are defined as both injuries and fatalities. Also of note is that for both financial loss and casualties, NA values were always removed. This is similar to how apparently incorrect data was filtered out of the map.

Flexible Data/Panels

This section makes use of the options on the left-hand panel. First, the user can change whether or not zero values are included for attributes. By default, cases where there was no property loss and cases where there were no casualties are both left out. They are left out by default because they would artificially bring down the averages, and it was more important to see what actual damage (either financial or casualties) was being done by the storms.

The options panel also allows the user to group by year or by storm. Grouping by year gives the average total damage per year (i.e., sum up all storms per year, and then average over those yearly totals), while grouping by storm gives the average over all storms (i.e., average damage done per storm). By default this is set to per year, so the user can compare it to yearly state budgets or similar expenditures.
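The difference between the two groupings can be sketched as below. This is a hedged Python illustration with hypothetical names and made-up numbers, not the application's code. It assumes each storm is a (year, loss) pair, that NA is represented as None, and that years left with no storms after filtering are simply not counted.

```python
from collections import defaultdict
from statistics import mean


def average_loss(storms, group_by="year", include_zeros=False):
    """Average loss per year or per storm for (year, loss) pairs."""
    # NA values are always dropped, as described in the text.
    values = [(year, loss) for year, loss in storms if loss is not None]
    if not include_zeros:
        # Default option: leave out storms that caused no loss.
        values = [(year, loss) for year, loss in values if loss != 0]
    if not values:
        return None
    if group_by == "storm":
        return mean(loss for _, loss in values)
    if group_by == "year":
        totals = defaultdict(float)
        for year, loss in values:
            totals[year] += loss  # sum all storms within a year
        return mean(totals.values())  # then average the yearly totals
    raise ValueError("group_by must be 'year' or 'storm'")


# Made-up sample: losses in millions of dollars.
storms = [(2010, 1.0), (2010, 3.0), (2011, 2.0), (2011, 0.0), (2012, None)]
per_year = average_loss(storms, group_by="year")
per_storm = average_loss(storms, group_by="storm")
per_storm_with_zeros = average_loss(storms, group_by="storm", include_zeros=True)
```

With this sample, the per-year average is higher than the per-storm average because 2010 contains two damaging storms, and including the zero-loss storm pulls the per-storm average down.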

For ease of use, the selected state is always highlighted in the graphs. Further, in the bar plots, the bars shown are those four above and four below the selected state. In edge cases, more bars are shown on the unconstrained side. Alabama, for example, is all the way at the right, so more bars are shown to the left.
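One way to pick which bars to draw is a fixed-width window that slides toward the unconstrained side near the edges. The Python sketch below is an assumption about how this could work, with a hypothetical function name; the text does not say exactly how many extra bars appear in edge cases, so a constant total of nine bars is assumed.

```python
def bars_to_show(ranked_states, selected, neighbors=4):
    """Return the slice of a ranked list of states centered on `selected`.

    Assumption: the window always holds 2 * neighbors + 1 bars when
    enough states exist, shifting inward when the selected state is
    near either end of the ranking.
    """
    i = ranked_states.index(selected)
    width = 2 * neighbors + 1
    n = len(ranked_states)
    if n <= width:
        return list(ranked_states)
    # Clamp the window start so the slice stays inside the list.
    start = max(0, min(i - neighbors, n - width))
    return list(ranked_states[start:start + width])


# Hypothetical ranking of twelve states, lowest to highest.
ranking = list("ABCDEFGHIJKL")
middle = bars_to_show(ranking, "F")
right_edge = bars_to_show(ranking, "L")
```

Here `middle` is centered on "F", while `right_edge` ends at "L" with all eight neighbors drawn to its left, matching the Alabama case described above.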


By using these graphs, several interesting things can be seen. For both financial loss and casualties per year, Alabama, Oklahoma, and Missouri see the most damage. Texas is fourth for financial loss, but drops rather far down when looking at casualties, where it is replaced, somewhat unexpectedly, by Massachusetts, which isn't even in the top 10 for financial damage per year. If grouping is changed to per storm instead of per year, some possible explanations start to show up.


State Comparison of Massachusetts, Grouped by Storm

As can be seen in the graphs, Massachusetts shoots to the top when looking at states on a per-storm basis. In average casualties per storm, Massachusetts has far more than any other state at 102, more than four times the next highest, Alabama, with only 25.

The final chart on this page shows the relative positions of each state from the graphs above, along with a trend line showing, as could be expected, that as financial damage increases, so does the number of casualties. However, some states see relatively more financial damage, while others see relatively more casualties. There is more scatter here when grouping per storm (left) as opposed to per year (right). In the charts below, Massachusetts is highlighted.



Grouped by Storm (Left) and Year (Right)

State at a Glance



Massachusetts at a Glance

The final page in the application is the State at a Glance page. This allows the user to see more state-specific data. Along the top, the average storms per year, average casualties per year, and average financial loss per year are shown. The latter two can be switched to per storm using the options on the left panel. As with the previous page, NA values are dropped, and zero values are omitted by default but can be added in the options.

In the picture above, Massachusetts is shown to have only two storms a year on average. Compare this to other states, such as Alabama, which has 53. Note that in the graph above, the chart is in standard scale, while the chart below is set to log-log. This is because the lower-severity storms overlap and become harder to read.



Alabama at a Glance, Log-Log Scale

Of note is that Alabama sees more tornadoes, and more severe tornadoes, but has fewer casualties per storm and less financial loss per storm. The graphs allow us to dig deeper and see that only the F4s and F5s in Alabama are more damaging than the F3s in Massachusetts, which are that state's most damaging. It is important to look at the per-storm damage here, as it shows more about how each storm affects the state.

Most likely, the effects we see in Massachusetts are the result of the state's preparedness rather than the severity of the storms. Since states like Massachusetts rarely see tornadoes, they, and especially their citizens, are less prepared for them. States that see many storms are hurt only by the most severe ones.

Further Work

By grouping data by storm rather than by year, interesting trends start to emerge. States that see little damage per year can see great damage per storm. It is clear that different states handle single storms very differently. This is where it would be worthwhile to dive deeper. For example, it would be interesting to look more into how frequently different states see different severities of storms.

This could also be followed up by looking at how much states spent during that time on storm preparedness. Are states that see less severe storms less prepared for any level of storm? Are there states that see roughly the same severity of storm, with the same frequency, but spend different amounts? Do they take different amounts of damage? By looking more into how each state fares on a strictly by-storm basis, these and many other questions could be answered.

The original data set can be found on Kaggle, with a description available here.

About Author

William Best

Over the years I have held several different programming roles, and the projects that interested me the most were the data-intensive ones. I received a BS and BE from NYU and Stevens respectively, and did my MEng at...
