Understanding Healthcare Innovation in the Startup World

William Bartlett
Posted on Aug 22, 2016

Anyone with one foot inside the medical world can tell you that there's a lot of buzz about big data and genomics as the future of medicine. Our improving ability to derive insight from medical data at ever-larger scales bodes well for the precision of clinical medicine as well as the effectiveness of health policy. And of course, unlocking the treasure trove of information stored in the human genome has the potential to reshape the delivery of healthcare in a myriad of ways, some predictable and some not. However, from my brief foray into the medical education system (my first year of medical school), it became very clear to me that this type of innovation was not being bred in typical medical school curricula; clinicians will not be leading the charge on advanced analytics in medicine. Having watched a number of friends join the big data gold rush in other industries, predominantly in the startup sphere, I set out to understand what role healthcare startups are playing in medical innovation, with a particularly curious eye on the big data analytics model. What questions are healthcare startups seeking to answer? What problems are they attempting to solve, and how? Where is all this happening?

And then, could I make generalizations about which types of companies seem to do well?  What type of healthcare innovation lends itself well to the startup model?



Using Selenium and BeautifulSoup in Python, I scraped data from the healthcare marketplace page of AngelList. The page links to the detail pages of 400 healthcare startups. From each of these pages, I scraped the company name, location, market tags, "blurb," and employee-count range.
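The post doesn't include its scraper code, so the following is only a minimal sketch of the field-extraction step, not the author's implementation. The original used Selenium to load pages and BeautifulSoup to parse them; to keep the sketch dependency-free, it swaps in Python's built-in html.parser for BeautifulSoup. The class names in FIELD_CLASSES are hypothetical placeholders, not AngelList's actual markup.

```python
from html.parser import HTMLParser

# Hypothetical class names standing in for the real page markup;
# they are NOT AngelList's actual HTML classes.
FIELD_CLASSES = {
    "company-name": "name",
    "company-blurb": "blurb",
    "company-location": "location",
    "market-tag": "market_tags",
    "employee-range": "employees",
}


class ProfileParser(HTMLParser):
    """Collect text from elements whose class appears in FIELD_CLASSES.

    Assumes each field's text sits directly inside its tagged element.
    """

    def __init__(self):
        super().__init__()
        self.record = {"market_tags": []}
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in FIELD_CLASSES:
                self._field = FIELD_CLASSES[cls]

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not self._field or not text:
            return
        if self._field == "market_tags":
            # A company can carry several market tags.
            self.record["market_tags"].append(text)
        else:
            self.record[self._field] = text


def parse_profile(html):
    """Return a dict of the fields found in one company page's HTML."""
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    return parser.record
```

With a real page, the html argument would be the rendered page source (for example Selenium's driver.page_source), and the class names would need to match the site's markup at scrape time.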

[Screenshot, 2016-08-21: example healthcare company entries on AngelList]

Above, one can see that each company lists a "blurb" under its name, along with its location, several market tags, and an employee-count range.

Additionally, I had my scraper navigate to each company's "Activity" page and record every entry under the "People" tab. The "People" tab contains dated activity entries, predominantly from investors but also from incubators, advisors, and others.
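As a sketch of how the "People" entries might be summarized, the function below counts entries per role and finds the earliest dated entry. The (iso_date, role, name) tuple shape is an assumption about how the scraped entries could be stored, not the format AngelList used, and summarize_activity is a hypothetical helper.

```python
from collections import Counter
from datetime import date


def summarize_activity(entries):
    """Summarize one company's "People" activity entries.

    `entries` is assumed to be a list of (iso_date, role, name) tuples,
    e.g. ("2015-03-02", "investor", "Some Fund"); this shape is hypothetical.
    Returns a Counter of entries per role and the earliest entry date.
    """
    roles = Counter(role for _, role, _ in entries)
    dates = [date.fromisoformat(d) for d, _, _ in entries]
    first = min(dates) if dates else None
    return roles, first
```

Counting roles per company is one way to check the "predominantly investors" observation across all 400 startups.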

With the data in workable form, I was able to begin to tackle the questions I set out to answer. First:

Where are healthcare startups?  

Plotting the frequency of each location on a horizontal bar chart gives the following:

[Bar chart: number of companies by location]
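The post doesn't say which plotting library produced the chart. Here is a minimal, dependency-free sketch of the counting step. It assumes each company is a dict with a "location" key (a hypothetical record shape) and prints a rough text bar chart where a plotting library's horizontal bar chart (such as matplotlib's barh) would normally be used.

```python
from collections import Counter


def location_counts(companies):
    """Return (location, count) pairs, most frequent first.

    Each company is assumed to be a dict with a "location" key;
    blank locations are skipped.
    """
    locations = (c.get("location", "").strip() for c in companies)
    return Counter(loc for loc in locations if loc).most_common()


def text_barh(pairs, width=40):
    """Render (label, count) pairs as a text horizontal bar chart."""
    if not pairs:
        return ""
    top = max(count for _, count in pairs)
    label_w = max(len(label) for label, _ in pairs)
    lines = []
    for label, count in pairs:
        # Scale each bar relative to the largest count; keep at least one mark.
        bar = "#" * max(1, round(width * count / top))
        lines.append(f"{label:<{label_w}} {bar} {count}")
    return "\n".join(lines)
```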

Clearly, no other city comes close to San Francisco's count of just under 150 companies (of the 400 scraped). However, the lopsidedness is even more extreme than it first appears: looking closer at the list, one notices how many of the other top locations are near San Francisco. In fact, Palo Alto, Mountain View, Menlo Park, Silicon Valley, San Mateo, Redwood City, San Carlos, and Oakland are all within a short drive of San Francisco! Binning them all together into a single "Bay Area" category reveals the sheer dominance of that region in the healthcare startup world: almost 200 of the 400 entries!
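The binning step can be sketched as a lookup against the locations named above. This assumes location counts are already held in a Counter keyed by location name; bin_regions is a hypothetical helper, not code from the post.

```python
from collections import Counter

# Locations the post groups into the Bay Area (taken from the text above).
BAY_AREA = {
    "San Francisco", "Palo Alto", "Mountain View", "Menlo Park",
    "Silicon Valley", "San Mateo", "Redwood City", "San Carlos", "Oakland",
}


def bin_regions(location_counts, region_members=BAY_AREA, region="Bay Area"):
    """Merge the counts of member locations into a single region label.

    Locations outside `region_members` keep their own labels.
    """
    binned = Counter()
    for location, count in location_counts.items():
        binned[region if location in region_members else location] += count
    return binned
```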


Next question:

What are healthcare startups?

The AngelList data provided two sources of information on this question: market tags and the "blurb."


After removing three uninformative top tags (health, healthcare system, and medicine), the most frequent market tags among healthcare startups listed on AngelList are as follows:

[Bar chart: most frequent market tags]
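A minimal sketch of the tag count, assuming each company record holds a list of tags under a hypothetical "market_tags" key. Lower-casing tags and counting each tag at most once per company are assumptions; the post doesn't say how capitalization or duplicates were handled.

```python
from collections import Counter

# The three uninformative tags the post removes.
EXCLUDED_TAGS = {"health", "healthcare system", "medicine"}


def tag_frequencies(companies, excluded=EXCLUDED_TAGS):
    """Count how many companies carry each market tag.

    Tags are lower-cased and stripped so variants match, and each tag
    is counted at most once per company.
    """
    counts = Counter()
    for company in companies:
        tags = {t.strip().lower() for t in company.get("market_tags", [])}
        counts.update(tags - excluded)
    return counts
```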

Looking at the top tags, two categories jump out: mobile/software and personal health/fitness.  It appears that a large portion of young healthcare companies are seeking to improve healthcare through the development of mobile apps.  And interestingly, personal fitness seems to be a problem that many entrepreneurs feel equipped to take on.

Perhaps the most unexpected finding on this list was the "Elder Care" market tag, especially given how high it ranks. Because of our aging population, geriatrics is a field of medicine in incredibly high demand, yet it consistently attracts little interest from graduating medical students. It's very interesting to see its strong presence in the medical startup sphere.

"Big Data" ranked fairly high, and higher still when combined with "Big Data Analytics."

NB: Many (if not most) companies list multiple market tags.  There is undoubtedly much overlap among these categories.
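Because of this overlap, combining "Big Data" with "Big Data Analytics" by adding their two counts would count a company that lists both tags twice. A sketch that counts distinct companies instead, assuming the same hypothetical "market_tags" record key (companies_with_any is a hypothetical helper):

```python
def companies_with_any(companies, tags):
    """Count companies carrying at least one of `tags` (case-insensitive).

    Unlike summing per-tag counts, a company that lists several of the
    tags is counted only once.
    """
    wanted = {t.lower() for t in tags}
    return sum(
        1
        for c in companies
        if wanted & {t.strip().lower() for t in c.get("market_tags", [])}
    )
```

For four companies tagged ["Big Data"], ["Big Data", "Big Data Analytics"], ["Big Data Analytics"], and ["Fitness"], this returns 3, whereas summing the two per-tag counts would give 4.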

Given how frequently "Big Data" and "Big Data Analytics" appeared, genomics, medicine's premier big data analytics problem, was notably absent. Aware of the possibility that genomics simply wasn't a market tag, I analyzed the language of each company's "blurb" to see if a focus on genomics would appear.


The blurbs of all the companies were combined into one string so that word frequencies could be counted. Only words longer than 5 letters were included, and some uninformative top words (health, medical, etc.) were removed.
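A sketch of that word count, assuming blurbs are plain strings. "Longer than 5 letters" is read as a minimum length of 6, and the stop-word set contains only the two examples named in the post, since the full list of removed words isn't given.

```python
import re
from collections import Counter

# Only the examples named in the post; the full removal list isn't given.
STOP_WORDS = {"health", "medical"}


def blurb_word_counts(blurbs, min_len=6, stop_words=STOP_WORDS):
    """Combine all blurbs into one string and count words of min_len+ letters.

    min_len=6 keeps only words longer than 5 letters.
    """
    text = " ".join(blurbs).lower()
    words = re.findall(r"[a-z]+", text)
    return Counter(w for w in words if len(w) >= min_len and w not in stop_words)
```

Under this length filter, "big" and "data" are dropped regardless of how often they occur.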


The "blurb" analysis looked very similar to the market tag analysis. Mobile apps and personal health/fitness are featured at the top once again, and there's no indication of genome analysis. "Big" and "data" were too short to make the cut, but "analytics" is featured. Interestingly, "cancer" made the list as well: a substantial number of healthcare companies focus on cancer specifically.


What kinds of healthcare startups do well?

To evaluate how well a company had done, I calculated a growth-rate metric for each startup from the data: number of employees divided by company age. The following plot of growth rate by market tag was helpful in understanding which kinds of healthcare startups have done well. Since many companies listed multiple market tags, each company was assigned its top market tag (the tag with the highest overall frequency across all companies).
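The top-tag assignment can be sketched as picking, for each company, the tag with the largest overall count. Here tag_counts is assumed to map each tag to the number of companies carrying it, breaking ties alphabetically is an assumption (the post doesn't say), and assign_top_tags is a hypothetical helper.

```python
def assign_top_tags(companies, tag_counts):
    """Label each company with its most frequent tag overall.

    `tag_counts` maps tag -> number of companies with that tag; tags
    missing from it count as 0. Ties are broken alphabetically.
    Companies with no tags get None.
    """
    labels = []
    for company in companies:
        tags = company.get("market_tags", [])
        if not tags:
            labels.append(None)
            continue
        # Highest count first; alphabetical order among equal counts.
        labels.append(min(tags, key=lambda t: (-tag_counts.get(t, 0), t)))
    return labels
```

Passing counts that omit the uninformative tags (health, healthcare system, medicine) keeps those from being chosen whenever a company also has a tag that appears in the counts.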


The growth rate is in employees added per month.
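The post defines growth as number of employees divided by age but doesn't say how employee-count ranges became numbers or how age was measured. This sketch assumes a range such as "11-50" is replaced by its midpoint and that a founding date is available; both are assumptions, and the helper names are hypothetical.

```python
from datetime import date


def range_midpoint(employee_range):
    """Midpoint of an employee-count range such as "11-50".

    Using the midpoint is an assumption. An open-ended value such as
    "5000+" is taken as its stated number.
    """
    parts = employee_range.replace("+", "").split("-")
    low, high = int(parts[0]), int(parts[-1])
    return (low + high) / 2


def months_between(start, end):
    """Whole calendar months from `start` to `end` (both datetime.date)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def growth_rate(employee_range, founded, as_of):
    """Employees added per month: employee estimate / age in months."""
    age = max(months_between(founded, as_of), 1)  # avoid dividing by zero
    return range_midpoint(employee_range) / age
```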

Notably, the order of market tags by growth rate is not the same as their order by frequency. Mobile, fitness, and personal health companies, all at the top of the list by count, are overtaken by the previously quiet "health and insurance" tag. This suggests that the most common kinds of healthcare startups are not necessarily those with the most growth potential.
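A sketch of the ranking behind that comparison: group growth rates by each company's top tag and sort tags by their mean. Using the mean is an assumption (the plot's exact summary isn't stated), and rank_tags_by_growth is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def rank_tags_by_growth(top_tags, growth_rates):
    """Return (tag, mean growth, company count), highest mean growth first.

    `top_tags` and `growth_rates` are parallel lists: each company's top
    market tag and its growth rate in employees per month. Companies
    without a tag (None) are skipped.
    """
    by_tag = defaultdict(list)
    for tag, rate in zip(top_tags, growth_rates):
        if tag is not None:
            by_tag[tag].append(rate)
    rows = [(tag, mean(rates), len(rates)) for tag, rates in by_tag.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Reporting the number of companies per tag alongside the mean shows when a high average rests on only a few firms.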


It's all happening in the Bay Area.

Healthcare startups are going mobile.

Healthcare entrepreneurs think it's time to focus on your fitness.

Elder care--finally!

Though healthcare startups are focusing on big data, companies working on genome analysis, medicine's Mount Everest of data analytics problems, are rare.

With more time...

Look for patterns in investment activity among particular types of healthcare companies--what kind of startup attracts the most financial support?

Conduct a time series analysis on early investment activity.  Can we predict growth and valuation based on early investment and the company profile variables discussed so far?






About Author

William Bartlett

Will Bartlett is a History of Science and Medicine Major from Yale University who recently took a leave of absence from medical school to explore data science. As an undergraduate, he studied the role of data in medicine...