Seed Accelerators and Social Media: What Made VCs Fund These Startups?
Contributed by Shu Liu. Shu is currently enrolled in the NYC Data Science Academy 12-week full-time Data Science Bootcamp, which runs from July 5th to September 23rd, 2016. This post is based on his third class project, Web Scraping (due in the 6th week of the program).
The R and Python code and the data for this project are available on GitHub.
Introduction:
The success of a startup depends on many factors, such as the founders, funding, and the environment of the industry in which it is established. Startups never stop searching for ways to improve their probability of success. The same is true of venture capitalists (VCs), who work hard to select the best targets to invest in so as to maximize their profit.
Navigating the early part of its existence well is crucial to a startup’s success. A good seed accelerator can provide mentorship and funding support for startups. Mentorship helps founders clearly understand what they want and what they should focus on. This is why many successful startups come out of the same seed accelerators.
After a seed accelerator, VCs play an important role in helping startups become stronger. However, it’s difficult for VCs to know whether a young startup will succeed or fail. It’s common to apply the Discounted Cash Flow method to a public company, but this method can’t be applied to a startup, because most startups don’t have clear financial records or formal financial reports. Relative Valuation is therefore an option for evaluating fast-growing startups. A critical part of Relative Valuation for online companies is finding a related company and assessing whether the two are similar based on the number of users. We can also explore how startups behave on social media to indirectly assess their number of users.
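To make the user-based comparison concrete, here is a minimal sketch of one way it could be computed. It assumes a simple value-per-user multiple taken from a comparable company. That multiple, the function names, and all the figures are illustrative assumptions, not something taken from the project.

```python
def value_per_user(comparable_value: float, comparable_users: int) -> float:
    """Value of the comparable company divided by its number of users."""
    if comparable_users <= 0:
        raise ValueError("comparable_users must be positive")
    return comparable_value / comparable_users


def relative_valuation(target_users: int,
                       comparable_value: float,
                       comparable_users: int) -> float:
    """Scale the comparable's per-user value by the target's user count."""
    return target_users * value_per_user(comparable_value, comparable_users)


# Hypothetical figures, for illustration only.
estimate = relative_valuation(
    target_users=200_000,
    comparable_value=50_000_000.0,
    comparable_users=1_000_000,
)
print(f"Estimated value: {estimate:,.0f} dollars")
```

A multiple like this is only meaningful when the two companies really are similar, which is why the similarity check comes first.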
This project focuses on scraping data from Seed-DB.com and Twitter.com. The former contains data about seed accelerators, while the latter serves as the source of social media data on startups.
Data Source:
Extracted from:
Seed-DB.com, using web scraping
Twitter, using the Twitter API via Tweepy
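As a rough sketch of the Twitter step, the snippet below maps the count fields of a Twitter user object onto the variable names used later in this post. In Tweepy, such an object would typically come from a user lookup such as `API.get_user`. That call is not made here because it needs credentials, so a stand-in object with made-up numbers is used instead. The mapping and the helper name `extract_counts` are assumptions for illustration.

```python
from types import SimpleNamespace

# Attribute names on a Twitter user object -> variable names used in this post.
FIELD_MAP = {
    "followers_count": "followers_num",
    "friends_count": "friends_num",
    "statuses_count": "statuses_num",
    "favourites_count": "favourites_num",
}


def extract_counts(user) -> dict:
    """Pull the social media counts out of a user-like object."""
    return {ours: getattr(user, theirs) for theirs, ours in FIELD_MAP.items()}


# Stand-in for the object a real user lookup would return (made-up values).
fake_user = SimpleNamespace(
    screen_name="example_startup",
    followers_count=1200,
    friends_count=300,
    statuses_count=4500,
    favourites_count=80,
)

row = extract_counts(fake_user)
print(row)
```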
Variables Selection:
Startup (funding > 1 million):
name, website, number of followers, number of friends, number of statuses, amount of funding, rounds of funding
Corresponding Seed Accelerators:
name, address, established year, website, amount exited, amount funded, number of startups exited, number of startups funded
Initial Analysis:
I first extracted the ten seed accelerators with the most total funding. Y Combinator dominates this ranking, partly because it is older than the others: according to Wikipedia, it was the first seed accelerator. Y Combinator’s creation was followed by TechStars (2006) and Seedcamp (2007). The bar chart to the left underscores the importance of age when it comes to seed accelerators.
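A ranking like this can be sketched in a few lines. The record layout, the `amount_funded` key, and the sample values below are hypothetical and stand in for the scraped Seed-DB data.

```python
def top_n_by_funding(accelerators, n=10):
    """Return the n records with the largest 'amount_funded', largest first."""
    return sorted(accelerators, key=lambda a: a["amount_funded"], reverse=True)[:n]


# Made-up records, for illustration only.
sample = [
    {"name": "Accelerator A", "amount_funded": 5_000_000},
    {"name": "Accelerator B", "amount_funded": 12_000_000},
    {"name": "Accelerator C", "amount_funded": 800_000},
]

ranked = top_n_by_funding(sample, n=2)
for record in ranked:
    print(record["name"], record["amount_funded"])
```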
The top ten startups, drawn from companies with more than 1 million dollars in funding, are ordered by their total amount of funding. Most of them are very popular today. All of them are online companies, which suggests that the number of users matters in the valuation of startups.
The data scraped from Twitter contains some interesting insights. friends_num, statuses_num, and favourites_num are more correlated with each other than with followers_num, but these three variables are less correlated with funding (total amount of funding) than followers_num is. This suggests that followers_num is more directly associated with how much funding a startup can get. This is a reasonable interpretation, since the number of followers on Twitter depends on how popular the startup is, and people who follow the company’s Twitter account are more likely to be users of its product. The other three variables, however, are not direct indicators of how popular the startup’s business is, because a startup can post as many statuses as it likes on Twitter even if it has only a few followers.
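The comparison above rests on pairwise correlations. The sketch below shows how such a correlation matrix could be computed, assuming Pearson correlation (the post does not say which coefficient was used). The column values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values, for illustration only.
columns = {
    "followers_num": [100, 200, 300, 400],
    "statuses_num": [50, 40, 45, 60],
    "funding": [1.0, 2.0, 3.0, 4.0],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

Note that a correlation only measures linear association; on its own it does not show that more followers cause more funding.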
Further Steps: Multiple Linear Regression
The analyses above serve as a guide for applying multiple linear regression to this problem. The initial form of the model is given below.
Dependent variables:
Amount of funding/rounds of funding
Independent Variables:
Seed factor (year, state, amount funded (number of startups funded)) and
Users factor (number of followers)
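Written out, the model might take a form like the one below. This is only a sketch that assumes a linear, additive specification with funding as the dependent variable. The symbols are introduced here for illustration, and the categorical variable state is assumed to be dummy-encoded, with one coefficient per state other than a reference state.

```latex
\text{funding}_i = \beta_0
  + \beta_1\,\text{year}_i
  + \sum_{k} \gamma_k\,\mathbb{1}[\text{state}_i = k]
  + \beta_2\,\text{amount\_funded}_i
  + \beta_3\,\text{followers\_num}_i
  + \varepsilon_i
```

Here $i$ indexes startups, year, state, and amount_funded describe the startup's seed accelerator, and $\varepsilon_i$ is the error term.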
However, regression assumptions, such as the absence of strong multicollinearity among the independent variables, need to be checked before building an effective regression model. It is also possible that the model will explain only a small share of the variance if the variables lack predictive power.
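One common diagnostic for multicollinearity is the variance inflation factor (VIF). With exactly two predictors it reduces to 1 / (1 - r²), where r is their Pearson correlation. The sketch below covers only that two-predictor special case, with made-up data. The rule of thumb that a VIF above about 10 signals a problem is a convention, not a hard threshold.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two of them."""
    r_squared = pearson(x1, x2) ** 2
    if r_squared >= 1.0:
        return math.inf  # perfectly collinear predictors
    return 1.0 / (1.0 - r_squared)


# Made-up predictor values, for illustration only.
uncorrelated = vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1])
near_collinear = vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5])

print(f"VIF, uncorrelated predictors: {uncorrelated:.2f}")
print(f"VIF, nearly collinear predictors: {near_collinear:.2f}")
```

With more than two predictors, each VIF would instead come from regressing one predictor on all the others.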