Data Study on Healthcare Innovation in the Startup World
The skills demonstrated in this post can be learned in the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Anyone with one foot inside the medical world can tell you that there's a lot of buzz about the future of medicine in big data and genomics. Our improving ability to derive insight from medical data on larger and larger scales is auspicious for the precision of clinical medicine as well as the effectiveness of health policy. And of course, unlocking the treasure trove of information stored away in the human genome has the potential to reshape the delivery of healthcare in a myriad of ways--some predictable, some not.
However, from my brief foray into the medical education system (first year of medical school), it became very clear to me that this type of innovation was certainly not being bred in typical medical school curricula; clinicians will not be leading the charge on advanced analytics in medicine.
Having watched a number of friends join the big data gold rush in other industries, predominantly in the startup sphere, I set out to understand what part healthcare startups are playing in medical innovation--with a particularly curious eye on the big data analytics model. What questions are healthcare startups seeking to answer? What problems are they attempting to solve--and how? Where is this all happening?
And then, could I make generalizations about which types of companies seem to do well? What type of healthcare innovation lends itself well to the startup model?
Using Selenium and BeautifulSoup in Python, I scraped data from the healthcare marketplace page of AngelList. The page links to the detail pages of 400 healthcare startups. From each detail page, I scraped the company name, location, market tags, the "blurb," and number of employees.
Additionally, I had my scraper navigate to each company's "Activity" page and take everything recorded under the "People" tab. The "People" tab contains dated activity entries of predominantly investors, but also incubators, advisors and more.
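As a rough illustration of the field extraction step, here is a minimal sketch. It is not the scraper used for this post: it swaps BeautifulSoup for Python's standard-library html.parser so it runs without dependencies, and the sample markup, the class names (name, location, tag, blurb, employees), and the helper parse_company are all hypothetical stand-ins for whatever the real detail pages contained.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real detail pages used their own structure and class names.
SAMPLE_HTML = """
<div class="company">
  <h1 class="name">Acme Health</h1>
  <span class="location">San Francisco</span>
  <a class="tag">Mobile Health</a>
  <a class="tag">Fitness</a>
  <p class="blurb">Personal fitness coaching on your phone.</p>
  <span class="employees">11-50</span>
</div>
"""


class CompanyParser(HTMLParser):
    """Collects one record per page: single-valued fields plus a list of tags."""

    SINGLE_FIELDS = {"name", "location", "blurb", "employees"}

    def __init__(self):
        super().__init__()
        self.record = {"tags": []}
        self._field = None  # class of the element currently being read, if relevant

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.SINGLE_FIELDS or css_class == "tag":
            self._field = css_class

    def handle_data(self, data):
        text = data.strip()
        if self._field is None or not text:
            return
        if self._field == "tag":
            self.record["tags"].append(text)
        else:
            self.record[self._field] = text

    def handle_endtag(self, tag):
        # Stop collecting once the current element closes.
        self._field = None


def parse_company(html):
    parser = CompanyParser()
    parser.feed(html)
    return parser.record


record = parse_company(SAMPLE_HTML)
print(record["name"], record["location"], record["tags"])
```

In a real run, the HTML string would come from the browser session (for example, Selenium's page_source) rather than a literal.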
With the data in workable form, I was able to begin to tackle the questions I set out to answer. First:
Where are healthcare startups?
Clearly, no other city comes close to San Francisco's count of just under 150 companies (of the 400 scraped). However, the lopsidedness is even more extreme than it first appears: many of the other top locations are near San Francisco. Palo Alto, Mountain View, Menlo Park, Silicon Valley, San Mateo, Redwood City, San Carlos, and Oakland are all within a short drive of the city. Binning them together as the "Bay Area" shows the sheer dominance of that region in the healthcare startup world: almost 200 of the 400 entries.
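The binning step can be sketched as follows. The set of place names comes from the list above; the function names and the sample input are illustrative assumptions, not the scraped data.

```python
from collections import Counter

# Locations named in the post as being near San Francisco.
BAY_AREA = {
    "San Francisco", "Palo Alto", "Mountain View", "Menlo Park",
    "Silicon Valley", "San Mateo", "Redwood City", "San Carlos", "Oakland",
}


def bin_location(city):
    """Map any Bay Area location to a single 'Bay Area' label."""
    return "Bay Area" if city in BAY_AREA else city


def count_by_region(locations):
    """Count companies per region after binning."""
    return Counter(bin_location(city) for city in locations)


# Made-up sample, only to show the shape of the input and output.
sample = ["San Francisco", "Palo Alto", "New York", "Oakland", "Boston", "New York"]
print(count_by_region(sample).most_common())
```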
What are healthcare startups?
The data taken from AngelList provided two sources of information on this question: market tags and the "blurb."
Looking at the top tags, two categories jump out: mobile/software and personal health/fitness. It appears that a large portion of young healthcare companies are seeking to improve healthcare through the development of mobile apps. And interestingly, personal fitness seems to be a problem that many entrepreneurs feel equipped to take on.
Perhaps the most unexpected finding on this list was the "Elder Care" market tag, especially given how high it ranks. Because of our aging population, geriatrics is a field of medicine in incredibly high demand, yet it consistently attracts little interest from graduating medical students. It's very interesting to see its strong presence in the medical startup sphere.
"Big data" ranked fairly high--and even higher if you combine it with "big data analytics."
NB: Many (if not most) companies list multiple market tags. There is undoubtedly much overlap among these categories.
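A minimal sketch of how tag frequencies might be tallied when companies carry several tags, including the option of folding "Big Data Analytics" into "Big Data". The function name tag_frequencies, the merge argument, and the sample data are assumptions for illustration.

```python
from collections import Counter


def tag_frequencies(companies_tags, merge=None):
    """Count how many companies list each tag.

    merge optionally maps a tag to the tag it should be combined with.
    Each company counts at most once per (merged) tag.
    """
    merge = merge or {}
    counts = Counter()
    for tags in companies_tags:
        counts.update({merge.get(tag, tag) for tag in tags})
    return counts


# Made-up sample: each inner list is one company's market tags.
sample = [
    ["Mobile", "Big Data"],
    ["Big Data Analytics", "Big Data"],
    ["Elder Care"],
]
print(tag_frequencies(sample))
print(tag_frequencies(sample, merge={"Big Data Analytics": "Big Data"}))
```

Because a company with several tags is counted under each of them, the counts sum to more than the number of companies.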
Considering that "Big Data" and "Big Data Analytics" appeared with high frequency, genomics--medicine's premier big data analytics problem--was notably absent. Aware of the possibility that genomics simply wasn't a market tag, I analyzed the language of each company's "blurb" to see if a focus on genomics would appear.
The blurbs of all the companies were combined into one string so that each word could be ranked by its frequency. Only words longer than 5 letters were included, and some of the top words (health, medical, etc.) were removed.
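The word-frequency step might look something like the sketch below. The length cutoff and the two excluded words follow the description above; the lowercasing, the regular expression used to split words, and the sample blurbs are assumptions.

```python
import re
from collections import Counter


def top_blurb_words(blurbs, min_len=6, exclude=("health", "medical"), n=10):
    """Rank words across all blurbs, keeping only words of at least min_len letters."""
    text = " ".join(blurbs).lower()
    words = re.findall(r"[a-z]+", text)  # assumption: plain ASCII words only
    kept = [w for w in words if len(w) >= min_len and w not in exclude]
    return Counter(kept).most_common(n)


# Made-up blurbs, only to show the behavior.
sample = [
    "Mobile health analytics for cancer patients",
    "Cancer analytics with big data",
]
print(top_blurb_words(sample))
```

Note that a cutoff of more than 5 letters drops "big" and "data" before any counting happens.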
The "blurb" analysis looked very similar to the market tag analysis. Mobile apps and personal health/fitness are featured at the top once again, and there's no indication of genome analysis. "Big" and "data" were too short to pass the length filter, but "analytics" is featured. Interestingly, "cancer" made the list as well: a substantial number of healthcare companies focus on cancer specifically.
What kinds of healthcare startups do well?
To evaluate how well a company had done, a growth rate metric (number of employees divided by company age) was calculated for each startup from the data. The following plot of market tag by the growth metric was helpful in understanding which kinds of healthcare startups have done well. Since many companies listed multiple market tags, each company's top market tag (its tag with the highest frequency across all companies) was used.
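A minimal sketch of the growth metric and the top-tag assignment described above. Several details are assumptions not stated in the post: employee counts are treated as single numbers, age is measured in years, and growth is summarized per tag by its mean. The sample companies are made up.

```python
from collections import Counter


def growth_rate(employees, age_years):
    """Employees divided by company age (assumed here to be in years)."""
    if age_years <= 0:
        raise ValueError("age_years must be positive")
    return employees / age_years


def top_tag(tags, tag_counts):
    """Pick the company's tag that is most frequent across all companies."""
    return max(tags, key=lambda tag: tag_counts[tag])


def mean_growth_by_top_tag(companies):
    """Average growth rate per top tag (the mean is an assumed summary)."""
    tag_counts = Counter(tag for c in companies for tag in c["tags"])
    rates = {}
    for c in companies:
        tag = top_tag(c["tags"], tag_counts)
        rates.setdefault(tag, []).append(growth_rate(c["employees"], c["age_years"]))
    return {tag: sum(values) / len(values) for tag, values in rates.items()}


# Made-up sample companies.
sample = [
    {"tags": ["Mobile", "Fitness"], "employees": 10, "age_years": 2},
    {"tags": ["Mobile"], "employees": 30, "age_years": 3},
    {"tags": ["Health Insurance"], "employees": 40, "age_years": 2},
]
print(mean_growth_by_top_tag(sample))
```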
Notably, the order of market tags by growth rate is not the same as the order by frequency. Mobile, fitness, and personal health companies--all at the top of the list by count--are overtaken by the previously quiet "health and insurance" tag. From this we can see that the most common kinds of healthcare startups are not necessarily those that have grown fastest by this metric.
It's all happening in the Bay Area.
Healthcare startups are going mobile.
Healthcare entrepreneurs think it's time to focus on your fitness.
Though healthcare startups are focusing on big data, companies focusing on genome analysis--medicine's Mount Everest of data analytics problems--are rare.
With more time...
Look for patterns in investment activity among particular types of healthcare companies--what kind of startup attracts the most financial support?
Conduct a time series analysis on early investment activity. Can we predict growth and valuation based on early investment and the company profile variables discussed so far?