Data Analysis on Covid-19: Flattening the Curve?
The skills I demonstrated here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Inspiration and Goals
It goes without saying that Covid-19 has rocked the world as hard as any event in modern human history, and because of this, I have been trying my best to keep track of the ever-evolving situation in our country. During this time, I noticed many of the news networks focusing on the total number of national cases. As the numbers continued to rise, they started to lose meaning, particularly for people in states with lower case counts. Furthermore, in mid-March I left New York City and went to Phoenix, Arizona to stay with my family. I was shocked at the difference in mentality between the two places: in NYC, it seemed as though the world was ending, while in Phoenix, it seemed to be business as usual.
Even though the situation in Arizona has deteriorated rapidly since my arrival, noticing these differences in behavior got me asking questions. Was the situation really that different from place to place? Does the national case count accurately describe what is happening across America? If not, how can I tell what is going on locally?
With these questions in mind, I decided to create a tool that people could use to track Covid-19 at the state level. The goal of this R Shiny app was threefold:
- Personalization: I wanted the app to focus on state and local areas, so that users can visualize the coronavirus situation where they are.
- Contextualization: The user can compare multiple states side by side. This relative view helps the user understand the scope of the crisis in different regions.
- Simplicity: The app is smooth, functional, and easy to use so that the user can quickly get the information they need.
The Data
For this project I used two datasets maintained and updated daily by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). The first dataset contains state-level data on a variety of variables, including total case count, mortality rate, and testing rate. I used it to construct the interactive US geochart on the first page with the googleVis package. By hovering over a state, users can view up-to-date information on its number of confirmed cases.
The second dataset contains time series data for every county in the United States going back to January. For this project, I filtered the data to start on March 1, because before that date many counties were not reporting. I then transformed this data in different ways to build the remaining visuals in the State Level Analysis tab.
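The date filter described above could look roughly like the sketch below. It is written in Python with only the standard library, not the R code the app uses. The wide layout (one row per county, one column per date, with labels such as "3/1/20"), the column names, the year 2020, and the example row are all assumptions made for illustration.

```python
from datetime import date, datetime

# Assumed cutoff: the article says March 1; the year 2020 is inferred from context.
START_DATE = date(2020, 3, 1)

def parse_column_date(label):
    """Return a date for labels like '3/1/20', or None for non-date columns."""
    try:
        return datetime.strptime(label, "%m/%d/%y").date()
    except ValueError:
        return None

def filter_from_start(row, start=START_DATE):
    """Keep only date columns on or after `start`, as a sorted list of (date, count)."""
    series = []
    for label, value in row.items():
        day = parse_column_date(label)
        if day is not None and day >= start:
            series.append((day, int(value)))
    return sorted(series)

# Made-up example row (hypothetical column names and counts, not real data).
example_row = {
    "Admin2": "Example County",
    "Province_State": "Example State",
    "2/28/20": "0",
    "2/29/20": "0",
    "3/1/20": "1",
    "3/2/20": "3",
}
filtered = filter_from_start(example_row)
```

Columns that do not parse as dates, such as the county and state names, are simply skipped, so the same function works regardless of how many metadata columns come first.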
The Features
In addition to the National Overview on the first page, the bulk of the app's functionality sits in the State Level Analysis tab. In the first subtab, the user can select a state to display its daily new cases. On the graph, the underlying bar chart shows the exact number of new cases reported each day, while the line shows the 5-day moving average. I added the moving average because daily case reporting is highly variable, and averaging gives a smoother picture of the growth trajectory. The user can also add multiple states to the graph for comparison and adjust the date slider to see how the trajectory has changed over time.
In the second subtab, the user can select a state and then view county-level data for that state through a density map, which gives a more granular view of which areas are most affected by coronavirus. One of my favorite features of the density map is the date slider, as it lets you visualize how the county-level situation evolved over time. From a usability standpoint, this can show users whether their county is at risk or doing a good job of stopping the spread. It also provides a warning if, for example, neighboring counties start to show heightened case counts.
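As a rough illustration of the smoothing in the daily new cases graph, the sketch below derives daily new cases from cumulative totals and then applies a 5-day moving average. It is Python rather than the app's R code, the numbers are made up, and the choice of a trailing window (rather than a centered one) is an assumption.

```python
def daily_new_cases(cumulative):
    """Difference consecutive cumulative totals; the first day has no prior value."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]

def moving_average(values, window=5):
    """Trailing mean over `window` days; None until a full window is available."""
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            averages.append(sum(chunk) / window)
    return averages

# Made-up cumulative case counts for seven consecutive days.
cumulative = [10, 12, 17, 20, 30, 45, 47]
new_cases = daily_new_cases(cumulative)   # [2, 5, 3, 10, 15, 2]
smoothed = moving_average(new_cases)      # [None, None, None, None, 7.0, 7.0]
```

In this toy series the reported count swings from 15 to 2 in a single day, while the averaged value stays at 7.0, which is the kind of reporting noise the line on the chart is meant to absorb.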
Total Case Count
The two remaining tabs show total case count on a linear scale and on a logarithmic scale. Used together, they let the user see whether or not a state is “flattening the curve”: on the logarithmic scale, steady exponential growth appears as a straight line, so a line that bends toward horizontal indicates slowing growth. As in the Growth tab, users can select multiple states to compare and adjust the date slider to the time period they are interested in. At the time of writing, a few states have started loosening their social distancing policies and reopening businesses. Moving forward, I think it will be interesting to use these graphs to monitor whether there is a “re-steepening” of the curve in these states.
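One way to put a number on what the logarithmic chart shows is to measure the slope of log10(total cases) over a recent window and convert it to a doubling time. The sketch below is a Python illustration of that idea, not something the app computes; the function names, the window sizes, and the case counts are hypothetical, and totals must be positive for the logarithm to be defined.

```python
import math

def log_slope(totals, window=7):
    """Average daily change in log10(total cases) over the last `window` days."""
    start, end = totals[-window - 1], totals[-1]
    return (math.log10(end) - math.log10(start)) / window

def doubling_time(totals, window=7):
    """Days needed to double at the recent growth rate; infinite if the curve is flat."""
    slope = log_slope(totals, window)
    return math.inf if slope == 0 else math.log10(2) / slope

# Made-up totals: an early phase that doubles daily and a later, flatter phase.
early = [100, 200, 400, 800]
late = [800, 850, 900, 950]

early_slope = log_slope(early, window=3)
late_slope = log_slope(late, window=3)
is_flattening = late_slope < early_slope
```

A falling slope, or equivalently a lengthening doubling time, corresponds to the log-scale line bending toward horizontal, and a slope that rises again after reopening would be the "re-steepening" mentioned above.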
Going Forward
While the app is functional, there are more features I would like to add to make it even more useful. First, I would like to add a “Recent News” tab that lets users enter their location and then shows them recent Covid-19 news for their area. This would further help get the most targeted information to users anywhere in the country. Second, I think adding more points of comparison within the app, showing coronavirus relative to something people already understand, would help change behavior by making the data easier to grasp. For example, I could integrate car accident or seasonal flu data and compare the Covid-19 mortality rate with these in different areas. I think this would open people's eyes to how dangerous the spread of the virus actually is and what they can expect within their local communities.