NYC Data Science Academy| Blog

Is there a clear path to success for startups?

Bryce Ferraro
Posted on Jan 22, 2024

Introduction

Imagine you're an eager entrepreneur who wants to know what it takes to increase your chances of success when founding a startup. You might ask yourself several questions:

  1. Where and when should I establish a startup?
  2. Which insights from leading industries can be applied to startups?
  3. What are the chances of success vs. failure?

This project explores these inquiries to better inform founders on what decisions to make to increase their chances of success.

Data and Pre-Processing

An informative source for startup data is Crunchbase, a website that aggregates information on startups, established companies, and investors, including what each company does, media postings, funding, and much more. Unfortunately, the website is purposely difficult to scrape, to prevent bots from draining its proprietary data and server resources. To work around this, I used a data table from Kaggle called Startup Success/Fail Dataset from Crunchbase. This was a data-rich table (53,583 rows, 14 columns), though it required some feature engineering.

For pre-processing I created and transformed several columns to better visualize the data:

  • Change date columns from string to datetime
  • Create year and month columns for founding year, first funding year, and last funding year
  • Calculate duration between various landmark years (e.g. years between founding and first funding)
  • Normalize total funding raised by rounds
  • Create a country name column from country code
  • Create an industry column with each value a single industry name
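
The steps above can be sketched for a single record as follows. This is a minimal illustration using only the Python standard library, not the code used in the project. The field names (`founded_at`, `first_funding_at`, `funding_total_usd`, `funding_rounds`, `category_list`), the "|" separator, and the sample values are all assumptions, and "normalize by rounds" is read here as total funding divided by the number of rounds.

```python
from datetime import date

# Hypothetical raw record; field names and values are assumptions.
raw = {
    "founded_at": "2010-03-15",
    "first_funding_at": "2012-09-01",
    "funding_total_usd": "6000000",
    "funding_rounds": "3",
    "category_list": "Software|Mobile",
}


def preprocess(row):
    """Return the engineered fields for one raw record."""
    # Convert date strings to date objects.
    founded = date.fromisoformat(row["founded_at"])
    first_funding = date.fromisoformat(row["first_funding_at"])
    total = float(row["funding_total_usd"])
    rounds = int(row["funding_rounds"])
    return {
        "founded_year": founded.year,
        "founded_month": founded.month,
        "first_funding_year": first_funding.year,
        "first_funding_month": first_funding.month,
        # Duration in years between founding and first funding.
        "years_to_first_funding": (first_funding - founded).days / 365.25,
        # Total funding divided by the number of rounds.
        "funding_per_round": total / rounds,
        # One industry name per entry, split on the assumed "|" separator.
        "industries": row["category_list"].split("|"),
    }


clean = preprocess(raw)
```

In a real table, rows with missing or malformed dates would need to be handled before this step.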

Data Limitations

One caveat with this dataset is that there isn't any information after the year 2015. Consequently, the findings will not reflect the most current trends. Regardless, there are plenty of impactful insights from the nearly 100 years of data.

Analysis:

Where and when to establish a startup?

Based on this dataset, the United States, Great Britain, Canada, India, and China are major global hubs for startups; however, the United States has by far the greatest number of startup companies. In fact, Great Britain, the country with the second most startups, has roughly one-tenth as many companies as the United States. The disparity between the US and the rest of the world is clearly shown in Figure 1, a heat map of the count of companies per country.

Figure 1. Heat map of number of startup companies per country

Not only does the US have the most startup companies, it also has the greatest number of startup industries, as shown in Figure 2. Based on this analysis, the exponential rise in US startup industries in the 1980s occurred years ahead of the other leading startup nations, reflecting an early focus on entrepreneurship. This boom in US startup industries could have been the catalyst for the surge in other nations that followed.

Figure 2. Number of startup industries per country over time

Focusing next on the United States, I performed a similar analysis but grouped the data by city rather than country (Figure 3). While New York has consistently had the highest number of startup industries, the US cities with the most startup industries were relatively similar in density until the early 2000s. Since then, however, New York and San Francisco have seen a significant increase in new industries, distinguishing them as the global hubs for startups.

Figure 3. Number of startup industries per US city over time

In trying to understand if certain times of year are more or less ideal to start a company, I analyzed the number of startup companies that received their first round of funding during each month, as shown in Figure 4. Based on these results, most months have a relatively similar spread in the number of companies that received first-round funding. There are, however, two months with standout distributions:

  • December has a lower 75% quantile
  • January has a higher 25% quantile/median 

The hypothesis for why there are differences in these particular months is that investors and companies likely shut down operations during December due to the holidays, reducing the number of deals. Therefore, those discussions are pushed to the new year, increasing the number of funding rounds in January.

Figure 4. Distribution of the number of companies that received their first funding per month
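
One way to produce the per-month distributions is to count first-funding events for each year and month, then compute quartiles of the yearly counts for each calendar month. The sketch below assumes that approach; the `(year, month)` pairs are made-up toy values, not data from the project.

```python
from collections import Counter, defaultdict
from statistics import quantiles

# Toy (year, month) pairs for first-funding events; made-up values.
first_fundings = [
    (2011, 1), (2011, 1), (2011, 1), (2011, 12),
    (2012, 1), (2012, 1), (2012, 12), (2012, 12),
    (2013, 1), (2013, 1), (2013, 1), (2013, 1), (2013, 12),
]

# Number of companies first funded in each (year, month) cell.
per_year_month = Counter(first_fundings)

# Collect the yearly counts for each calendar month.
counts_by_month = defaultdict(list)
for (year, month), n in sorted(per_year_month.items()):
    counts_by_month[month].append(n)

# Quartile cut points (25%, 50%, 75%) of the yearly counts, per month.
quartiles_by_month = {
    month: quantiles(counts, n=4) for month, counts in counts_by_month.items()
}
```

Comparing the cut points across months is what reveals a lower upper quartile for one month or a higher median for another.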

Assuming one has the privilege to start a company anywhere in the world, this analysis suggests that the ideal location is within the United States, specifically in New York or San Francisco. This recommendation is based on the assumption that being in an area with a large number of startup companies and industries would offer unparalleled access to networking and a talent pool. In terms of what time of year to plan to raise the first round of funding, there isn't a clearly optimal month, although fewer deals close in December and more close in January.

What are the insights from leading industries?

Based on this data, the Software, Mobile, Biotechnology, E-Commerce, and Social Media industries have the greatest number of startups. The subsequent analysis therefore focuses on trends within these five industries.

Figure 5. Top 15 industries with the highest number of startups

An aspiring founder would be interested in understanding the average value of a first-funding round, both to bolster negotiations with investors and to anticipate resource allocation after the round closes. In Figure 6, I assessed how the average value of first-funding rounds has changed over time and observed a significant increase in the mean around 2000, when it ranged from 12 to 27 million dollars. In 2005, however, there was a downward shift in the mean, which then remained relatively constant for the next decade. Based on the last decade in this dataset, a founder should expect around 3 to 15 million dollars for a first round of funding.

Figure 6. Average value of first-round funding per year within top 5 industries

Interestingly, the total amount of first-round capital raised per year continued to increase despite the downward shift in average round size (Figure 7). In other words, venture capital firms and investors are spending less per deal, but as a whole, more money is being raised for the startup industry. Is this explained by an increase in the number of deals?

Figure 7. Total sum of first-round funding per year within top 5 industries

To answer this question, I investigated the total number of companies that received their first-funding round per year, as shown in Figure 8. Based on this analysis, it's clear there was a significant increase in the number of first-funding rounds between 2005 and 2014, which would explain the increase in total startup spending despite the decrease in average round size.

Interestingly, growth in the number of startups receiving first-round funding slowed between 2013 and 2014, and the slope turned negative between 2014 and 2015. My initial thought was that this dip was due to poor data integrity. However, I confirmed that the dataset contains records through the entire year of 2015, indicating a real downturn. That raises the question: what is the cause?

Figure 8. Total count of number of companies that received first round of funding within top 5 industries
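
The relationship between Figures 6, 7, and 8 follows from the identity total = mean × count. The sketch below, which uses made-up toy amounts rather than values from the dataset, shows how the yearly mean can fall while the yearly total rises, provided the number of rounds grows enough.

```python
from collections import defaultdict

# Toy (first_funding_year, amount_usd) pairs; made-up values.
rounds = [
    (2004, 12_000_000), (2004, 18_000_000),
    (2012, 4_000_000), (2012, 5_000_000), (2012, 6_000_000),
    (2012, 7_000_000), (2012, 8_000_000), (2012, 6_000_000),
]

# Group the round amounts by year.
amounts_by_year = defaultdict(list)
for year, amount in rounds:
    amounts_by_year[year].append(amount)

# Count, total, and mean of first-round funding for each year.
summary = {}
for year, amounts in sorted(amounts_by_year.items()):
    count = len(amounts)
    total = sum(amounts)
    summary[year] = {"count": count, "total": total, "mean": total / count}
```

In this toy example the mean drops from 15 million to 6 million dollars, yet the total rises from 30 million to 36 million because the number of rounds triples.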

My hypothesis is that the downward shift in companies receiving first-round funding is due to a decrease in the number of companies founded two years prior. When overlaying the number of companies founded and the number first funded per year, it's clear that both metrics increased during the first decade of the 2000s (Figure 9). In 2011-2012, however, the number of companies founded per year reached a tipping point and began to fall.

Focusing on the years in which the number of companies founded and the number first funded peaked (2012-2014) reveals a gap of around two years between the two metrics' high points. This two-year delta is consistent with the historical data, in which the average time between founding and first funding is 2.67 years (Table 1). The decrease in the number of companies receiving first funding from 2014 to 2015 is therefore plausibly due to fewer companies being founded after 2012.

Figure 9. Number of founded and first-funded companies per year

Table 1. Summary statistics for durations between founding vs. first-funding for startups
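
The two comparisons above, the lag between peak years and the summary of founding-to-funding durations, can be sketched as follows. All numbers here are made-up toy values chosen for illustration; they are not the counts behind Figure 9 or the statistics in Table 1.

```python
from statistics import mean, median

# Toy counts per year; made-up values, not taken from the dataset.
founded_per_year = {2009: 50, 2010: 70, 2011: 90, 2012: 95, 2013: 60, 2014: 30}
first_funded_per_year = {
    2009: 20, 2010: 35, 2011: 50, 2012: 65, 2013: 80, 2014: 85, 2015: 55,
}

# Year in which each series reaches its maximum.
peak_founded = max(founded_per_year, key=founded_per_year.get)
peak_funded = max(first_funded_per_year, key=first_funded_per_year.get)
peak_lag_years = peak_funded - peak_founded

# Toy per-company durations (years between founding and first funding).
durations = [0.5, 1.0, 2.0, 3.0, 7.0]
duration_summary = {
    "mean": mean(durations),
    "median": median(durations),
    "min": min(durations),
    "max": max(durations),
}
```

A lag between peaks that roughly matches the mean duration is consistent with the hypothesis, though it does not prove the causal link on its own.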

From the leading startup industries (Software, Mobile, Biotech, E-Commerce, and Social Media), I learned that the average value of first-round funding has decreased since the early 2000s, landing between 3 and 15 million dollars. Although the average round is smaller, more companies are receiving first funding, and the average time between founding and first funding is 2.67 years. All of this information is useful for a founder's planning and resource allocation (e.g. managing burn rate, establishing a hiring plan, and asset management).

What are the chances of success vs. failure?

This dataset categorizes each company's status as "operating", "acquired", "IPO", or "closed". In this project, a successful company is one that has been acquired or has gone public (IPO), and a failed company is one that has closed. While not all founders aim to launch an IPO or be acquired, this analysis assumes that founders are interested in making a return on their shares.

After calculating the frequency of each company status, it's clear that the chances of success are quite slim (Figure 10). The majority of companies are operating (79.2%), and IPO is the least common status, with a frequency of only 2.69%. The frequencies of being acquired and being closed are similar, at 8.94% and 9.17%, respectively.

Figure 10. Frequency of each company status (acquired, closed, IPO, and operating)
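
Status frequencies of this kind are simple proportions. The sketch below uses a made-up list of 20 statuses (not the dataset's real counts) and follows the post's definition of success as acquired or IPO.

```python
from collections import Counter

# Toy status column; made-up counts for illustration only.
statuses = ["operating"] * 16 + ["acquired"] * 2 + ["ipo"] + ["closed"]

counts = Counter(statuses)
total = sum(counts.values())

# Percentage of companies in each status.
frequency_pct = {status: 100 * n / total for status, n in counts.items()}

# Success is defined as acquired or IPO; failure as closed.
success_pct = frequency_pct.get("acquired", 0) + frequency_pct.get("ipo", 0)
failure_pct = frequency_pct.get("closed", 0)
```

Running the same calculation on a subset of rows, such as only the top five industries, gives the comparison shown in the next figure.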

Interestingly, even when calculating the status frequencies for the five biggest startup industries (Software, Mobile, Biotechnology, E-Commerce, and Social Media), similar results are observed (Figure 11). Therefore, founding a startup within the "hottest" industries alone does not change the chances of success or failure.

Figure 11. Frequency of each status (acquired, closed, IPO, and operating) for the top 5 industries

I also investigated whether there was a relationship between successful or failed companies and the total amount of funding raised (Figure 12). The hope was that the amount of funding raised could serve as a Key Performance Indicator (KPI) and a benchmark for how a startup is performing compared to historically successful or failed companies. The results show that successful companies (acquired, IPO) have a higher median than the other statuses. However, there are many closed companies that raised more money than IPO or acquired companies, and vice versa. Therefore, there's not a clear relationship between success and total funding.

Figure 12. Distribution of total funding raised by each company status

Another KPI that could potentially be used as a performance benchmark is the number of funding rounds for successful vs. failed companies. I therefore plotted the distribution of the number of funding rounds for each company status (Figure 13). This shows that successful companies have higher medians than the other statuses. However, again, there are many closed companies that have more funding rounds than IPO or acquired companies, and vice versa. Therefore, there's not a clear relationship between success and number of funding rounds.

Figure 13. Distribution of number of funding rounds by each company status

Finally, in exploring the behavior of successful vs. failed companies, I studied the relationship between the number of funding rounds and capital raised (Figure 14). This could help founders anticipate the number of rounds needed to raise the desired capital and serve as a point of reference for their rounds' performance. As expected, there is a positive, linear relationship between the number of funding rounds and total funding raised. However, there is significant variation, suggesting that status and number of funding rounds alone will not accurately predict total funding raised. IPO companies have a similar slope to the other statuses, though the higher y-intercept suggests that IPO companies generally raise more money for a given number of rounds.

Figure 14. Relationship between number of funding rounds vs. total funding raised, colored by the various company statuses
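
Comparing slopes and intercepts across statuses amounts to fitting one line per status. The sketch below uses an ordinary least-squares fit written by hand; the helper name `fit_line` and the data points are hypothetical, and the post does not say which fitting method was used for Figure 14.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy (funding_rounds, total_funding_in_millions) points; made-up values.
points = {
    "ipo": [(1, 30.0), (2, 40.0), (3, 50.0)],
    "closed": [(1, 10.0), (2, 20.0), (3, 30.0)],
}

# One (slope, intercept) pair per company status.
fits = {status: fit_line(*zip(*pts)) for status, pts in points.items()}
```

In this toy data both statuses gain 10 million dollars per additional round, but the IPO line sits 20 million higher, which mirrors the "similar slope, higher intercept" pattern described above.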

During this analysis, I learned that the chance of success is 2.69% for an IPO and 8.94% for an acquisition, while the chance of failure is 9.17%. Also, comparing status against total funding raised and number of funding rounds showed that successful companies had higher medians for both metrics, though there was significant variation. Overall, there wasn't a clear path to success using the features in this dataset.

Conclusions and Next Steps

Although there isn't a clear path to success for any startup, this analysis produced a data package with actionable, data-driven takeaways and benchmarks for company-wide planning (Figure 15). Whether it's a better understanding of where to build a company or how much time it takes to achieve the first round of funding after founding, these insights are useful guides for a potential startup founder.

Figure 15. Data package from this analysis

In the future, this analysis could be expanded upon by:

  • Applying inferential statistics
  • Applying classification machine learning to predict outcome (i.e. IPO, closure, operating, acquired)
  • Using an updated dataset to gain insights from the last decade of startup trends
  • Exploring why the number of founded startups declined around 2011-2012


About Author

Bryce Ferraro

My work experience in biotechnology is diverse, ranging from R&D, process development, and scale-up, to overseeing data management within fermentation, protein purification, and high throughput screening. Having worked in two different early-stage startups, my roles and priorities evolved...