Analyzing Data to Track COVID-19

Posted on Feb 19, 2020

A new coronavirus, SARS-CoV-2, emerged in Wuhan, China in late 2019. With more cases appearing every day, there was major concern that the outbreak would grow into a deadly pandemic. I developed an R Shiny app in January 2020 to track COVID-19 cases and compare patient outcomes to previous viral outbreaks. The app updates daily with the most recent data from the CSSE group at Johns Hopkins University.

What's in a name?

Coronavirus. 2019-nCoV. SARS-CoV-2. COVID-19. Many names and acronyms are being used to talk about this virus, which can be very confusing. This is partly because it took some time for scientists to agree on what to call this brand-new virus. The other reason is that some of these names refer to things other than the virus itself. Here’s a short primer on these terms.

Coronaviruses are a family of related viruses. The family gets its name from the viruses' shape, which resembles a crown, or “corona” in Latin. Several different viruses belong to the coronavirus family, some of which are relatively harmless.

SARS-CoV-2 is the actual virus that is currently infecting people across the globe. It is a newly discovered member of the coronavirus family. Currently, scientists think SARS-CoV-2 spreads through liquid droplets passed along by close human contact. Washing your hands for 20 seconds and practicing social distancing are the best current methods to prevent the spread of this virus.

COVID-19, or coronavirus disease 2019, is the respiratory disease that a person gets after being infected by SARS-CoV-2. If you get infected with the virus, you may not necessarily get COVID-19. Many infected people do not show symptoms but can still spread the virus. This is why social distancing is so important! You could infect others even if you feel perfectly fine!

Where is COVID-19 now?

Global SARS-CoV-2 cases as of April 5th, 2020

The virus first infected people in Hubei province, China, in late 2019. Cases were largely restricted to China until late February 2020. Despite having months to prepare, countries around the world failed to contain the spread of SARS-CoV-2, and the outbreak became a pandemic in March 2020. Infections often follow a sigmoid curve, in which a phase of exponential growth is followed by a tapering off of new infections. Currently, we are still in the exponential phase. If the number of daily cases starts to decrease, we will know that we are starting to contain the virus.
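To make the sigmoid idea concrete, here is a minimal Python sketch of a logistic curve, one common form of sigmoid. It is not the model behind the app, and the parameters (`CAPACITY`, `GROWTH_RATE`, `MIDPOINT`) are hypothetical values chosen only for illustration, not fitted to any real case data.

```python
import math


def logistic(t, capacity, growth_rate, midpoint):
    """Cumulative cases on day t under a simple logistic (sigmoid) model."""
    return capacity / (1.0 + math.exp(-growth_rate * (t - midpoint)))


# Hypothetical parameters, for illustration only (not fitted to real data).
CAPACITY = 100_000   # cumulative case count the curve levels off at
GROWTH_RATE = 0.2    # per-day growth rate during the early exponential phase
MIDPOINT = 50        # day on which the curve is steepest

# Cumulative cases for days 0..100, and new cases between consecutive days.
cumulative = [logistic(t, CAPACITY, GROWTH_RATE, MIDPOINT) for t in range(101)]
daily_new = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

# Daily new cases rise during the exponential phase, peak near the midpoint,
# and then fall as the cumulative curve flattens.
peak_day = max(range(len(daily_new)), key=daily_new.__getitem__)
print(f"Daily new cases peak around day {peak_day}")
```

In this toy model, daily new cases climb until the midpoint and shrink afterwards, which is why a sustained drop in daily cases is the signal to watch for.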

Current hotspots for COVID-19 include Spain, Italy, and the United States, each of which has over 100,000 cases. Germany also has a high number of cases, but its mortality rate is quite low, around 1.5%. Other countries, like South Korea, have had remarkable success in containing the virus.

Data on COVID Numbers

Cumulative and Daily Case Numbers Across the Globe

As of April 5th, 2020, 1.2 million people have been confirmed infected with SARS-CoV-2 and nearly 70,000 people have died. The apparent mortality rate is about 5%, but this number is likely an overestimate. Because of a lack of testing resources, the true number of infected people is very likely much higher than the confirmed count. Professional epidemiologists estimate the likely mortality rate to be around 1%.
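The arithmetic behind the apparent mortality rate can be sketched in a few lines of Python. The case and death counts below are the rounded figures quoted above, so the result (just under 6%) is only approximate. The undercount factors are hypothetical values, used to show how missed infections pull the rate down; they are not estimates of actual undercounting.

```python
def mortality_rate(deaths, infections):
    """Deaths as a fraction of infections."""
    return deaths / infections


def adjusted_rate(deaths, confirmed_cases, undercount_factor):
    """Mortality rate if true infections are undercount_factor times the confirmed count."""
    return mortality_rate(deaths, confirmed_cases * undercount_factor)


# Rounded figures quoted in the post for April 5th, 2020.
CONFIRMED_CASES = 1_200_000
DEATHS = 70_000

apparent = mortality_rate(DEATHS, CONFIRMED_CASES)
print(f"Apparent mortality rate: {apparent:.1%}")

# Hypothetical undercount factors, for illustration only.
for factor in (2, 5):
    rate = adjusted_rate(DEATHS, CONFIRMED_CASES, factor)
    print(f"If true infections are {factor}x the confirmed count: {rate:.1%}")
```

If five times as many people were infected as were confirmed, the same number of deaths would imply a rate close to the 1% that epidemiologists estimate.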

Data Comparison with Other Viruses

COVID-19 has both infected and killed more people than the 2003 SARS epidemic. However, the mortality rate for SARS was around 10%, at least twice the apparent rate for COVID-19 and about 10 times the more rigorous estimate of around 1%. These viruses are both in the coronavirus family, but they affect people very differently.

How about this year's flu? It has infected about 31 million people, roughly 26 times the number of people confirmed to have COVID-19. However, COVID-19 has killed more people than the flu and has a much higher mortality rate. The flu's mortality rate is only 0.1%, while estimates for COVID-19 range from 1% to 5%. Both of these diseases are serious health threats, and neither should be discounted in terms of the cost of human life.
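As a rough check on this comparison, the short Python sketch below uses only the rounded figures quoted in this post (the April 5th COVID-19 counts and the flu numbers above). The flu death toll it prints is implied by those figures, not an official count.

```python
# Rounded figures quoted in the post.
FLU_INFECTED = 31_000_000
FLU_MORTALITY_RATE = 0.001      # 0.1%
COVID_CONFIRMED = 1_200_000     # as of April 5th, 2020
COVID_DEATHS = 70_000           # "nearly 70,000"

# How many times more people the flu has infected.
infection_ratio = FLU_INFECTED / COVID_CONFIRMED

# Flu deaths implied by the quoted infection count and mortality rate.
implied_flu_deaths = FLU_INFECTED * FLU_MORTALITY_RATE

print(f"Flu infections are about {infection_ratio:.0f}x the confirmed COVID-19 cases")
print(f"Implied flu deaths: about {implied_flu_deaths:,.0f}")
print(f"Reported COVID-19 deaths: about {COVID_DEATHS:,}")
```

With these inputs, the flu has infected far more people, yet its implied death toll is less than half the reported COVID-19 deaths.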

The 1918 influenza pandemic infected an estimated 500 million people and killed about 50 million, or 10%. The COVID-19 numbers are lower than those of this earlier pandemic, but we should be cautious because case counts are still rising rapidly every day. The WHO has officially declared COVID-19 a pandemic, and we must do what we can to "flatten the curve" and lower the number of cases.

Lastly, how does COVID-19 compare to Ebola, one of the scariest viruses known to man? COVID-19 has infected and killed more people than the 2014 Ebola outbreak, but the mortality rate is perhaps the most important point of comparison. Ebola has a mortality rate of 62%, meaning that about 3 out of every 5 people infected with Ebola died. COVID-19 is far less fatal than Ebola, which is somewhat reassuring.

About Author

Josefa Sullivan

Josefa has a PhD in Neuroscience from the Icahn School of Medicine at Mount Sinai and a BA in Biochemistry & Molecular Biology from Boston University. Her interests include applying data science to the healthcare & biotech fields,...