A Study of Kickstarter Projects

Posted on Jul 30, 2018


The skills the author demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

My main goal for this project was to look for a category-independent trend in how much money Kickstarter projects raise. Unfortunately, the only category-independent data available to me was time-based: how long each project was open. Looking at that data alone, I found no correlation between how long a project was open and the percentage of its goal it raised.
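One way to check a "no correlation" claim is to compute a correlation coefficient between days open and percentage of goal raised. The sketch below is a minimal from-scratch Python version (the original analysis was done in R), and the sample numbers are hypothetical, not taken from the dataset. It computes both Pearson's r and Spearman's rank correlation, because a handful of projects that raised thousands of percent of their goal can dominate Pearson's r.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / (sx * sy)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every value tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical sample: days each project was open and % of goal raised.
days = [15, 21, 30, 30, 45, 60]
pct = [0.0, 120.0, 0.0, 3.5, 0.0, 80.0]
print(f"Pearson r = {pearson(days, pct):.2f}")
print(f"Spearman rho = {spearman(days, pct):.2f}")
```

Values near 0 for both coefficients would support the "no correlation" reading; Spearman only looks at order, so a single 40,000% project cannot swamp it.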

My dataset described almost 375,000 Kickstarter projects from 2008 to 2018. The following plots show how Kickstarter projects are distributed across categories and how many backers each category has.

The categories with the most projects are Film & Video and Music, but Games has the highest number of backers. Since Games has fewer projects yet more backers in total, each Games project attracts far more backers on average, which makes Games one of the most popular categories on the site. We can also look at each category's project success rates in the following plots:
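The project counts, backer totals, and backers-per-project figures behind this comparison can be tallied with a few lines of Python. This is a minimal sketch that assumes the Kaggle file's "main_category" and "backers" column names; the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def category_summary(rows):
    """Return {main_category: (projects, total_backers, backers_per_project)}."""
    projects = Counter()
    backers = Counter()
    for row in rows:
        cat = row["main_category"]
        projects[cat] += 1
        backers[cat] += int(row["backers"])
    return {
        cat: (projects[cat], backers[cat], backers[cat] / projects[cat])
        for cat in projects
    }


# Hypothetical sample rows. With the real file you would instead pass
# csv.DictReader(open("ks-projects-201801.csv", newline="")).
SAMPLE = """main_category,backers
Film & Video,12
Film & Video,0
Music,30
Music,5
Games,400
"""

summary = category_summary(csv.DictReader(io.StringIO(SAMPLE)))
for cat, (n, total, per) in sorted(summary.items(), key=lambda kv: kv[1][1], reverse=True):
    print(f"{cat}: {n} projects, {total} backers, {per:.1f} backers/project")
```

Dividing total backers by project count is what separates "many projects" from "many backers", which is the distinction drawn between Film & Video, Music, and Games above.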


Only a few categories, such as Music, Comics, and Theater, have more successful projects than failed ones. Technology has the lowest ratio of successful to failed projects.
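The success-to-failure ratio can be computed per category by counting projects in the "successful" and "failed" states. This is a minimal Python sketch that assumes the Kaggle file's "main_category" and "state" columns and ignores other states such as canceled or live; the rows are hypothetical. A ratio above 1 means a category has more successful projects than failed ones.

```python
from collections import defaultdict


def success_failure_ratio(rows):
    """Ratio of 'successful' to 'failed' projects per main_category.

    Other states (e.g. canceled or live) are ignored. A category with
    successes but no failures gets float('inf').
    """
    counts = defaultdict(lambda: {"successful": 0, "failed": 0})
    for row in rows:
        state = row["state"]
        if state in ("successful", "failed"):
            counts[row["main_category"]][state] += 1
    return {
        cat: c["successful"] / c["failed"] if c["failed"] else float("inf")
        for cat, c in counts.items()
    }


# Hypothetical rows using the assumed column names.
rows = [
    {"main_category": "Music", "state": "successful"},
    {"main_category": "Music", "state": "successful"},
    {"main_category": "Music", "state": "failed"},
    {"main_category": "Technology", "state": "successful"},
    {"main_category": "Technology", "state": "failed"},
    {"main_category": "Technology", "state": "failed"},
    {"main_category": "Technology", "state": "failed"},
    {"main_category": "Technology", "state": "canceled"},
]
ratios = success_failure_ratio(rows)
for cat, r in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{cat}: {r:.2f} successes per failure")
```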

For the category-independent data, I looked at how long projects were open and how successful they were. I didn't find a time-related trend, but with some extra inspection I was able to explain why some of the summary plots looked the way they did. First, consider the time-dependent data for Art projects that were open between 0 and 80 days.
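Both quantities in these plots come from a few raw columns: days open is the gap between launch and deadline, and success is the amount pledged as a percentage of the goal. The sketch below assumes the Kaggle columns "launched", "deadline", "usd_pledged_real", and "usd_goal_real" and their timestamp formats; the helper names are hypothetical. It counts calendar days, so the time of day a project launched does not change its duration.

```python
from datetime import datetime


def days_open(launched: str, deadline: str) -> int:
    """Calendar days from the launch date to the deadline date.

    Assumed formats: launched like "2015-08-11 12:12:28" and deadline like
    "2015-10-09".
    """
    start = datetime.strptime(launched, "%Y-%m-%d %H:%M:%S").date()
    end = datetime.strptime(deadline, "%Y-%m-%d").date()
    return (end - start).days


def percent_raised(pledged: float, goal: float) -> float:
    """Money pledged as a percentage of the funding goal."""
    if goal <= 0:
        raise ValueError("goal must be positive")
    return 100.0 * pledged / goal


# Hypothetical project row using the assumed column names.
row = {"launched": "2015-08-11 12:12:28", "deadline": "2015-10-09",
       "usd_pledged_real": 1500.0, "usd_goal_real": 2000.0}
print(days_open(row["launched"], row["deadline"]),
      percent_raised(row["usd_pledged_real"], row["usd_goal_real"]))
```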

It looks like there are more successful projects in the 15 to 30 day range. But look at the y-axis: we are only seeing percentages raised from 0 to 200%. When you zoom out, as in this Games plot, the data appears to show a trend, with a peak around 18 days.
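Because the y-axis in the Art plot stops at 200%, any project that raised more than twice its goal falls outside the view. A small helper like the hypothetical one below (Python, with made-up sample points) counts how many points a given window hides, which is a quick check before trusting what a zoomed-in plot appears to show.

```python
def split_by_plot_window(points, max_days=80, max_pct=200.0):
    """Split (days_open, pct_raised) pairs against a fixed plot window.

    Projects open longer than max_days are dropped. Of the rest, returns
    (visible, clipped): points at or below max_pct, and points that would
    fall above the top of the y-axis.
    """
    in_range = [(d, p) for d, p in points if 0 <= d <= max_days]
    visible = [(d, p) for d, p in in_range if p <= max_pct]
    clipped = [(d, p) for d, p in in_range if p > max_pct]
    return visible, clipped


# Hypothetical (days_open, % of goal raised) pairs.
points = [(10, 50.0), (20, 250.0), (30, 200.0), (18, 1800.0), (90, 500.0)]
visible, clipped = split_by_plot_window(points)
print(f"{len(clipped)} of {len(visible) + len(clipped)} points lie above the 200% line")
```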


If you look closely, there is a line running through the bottom data points. This line is the median percentage raised for each project duration (number of days open). It shows that the scattered points visible across the plot are far from what a typical project raised. On closer inspection, the line is almost perfectly flat, apart from a few anomalous days.
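The median line can be reproduced by grouping projects by the number of days they were open and taking the median percentage raised in each group. This is a minimal Python sketch with hypothetical data; statistics.median comes from the standard library, and the function name is illustrative. Unlike the mean, the median ignores how extreme the outliers are, which is why the line hugs the bottom of the plot.

```python
from collections import defaultdict
from statistics import median


def median_by_duration(points):
    """Median percentage raised for each distinct days-open value."""
    groups = defaultdict(list)
    for days, pct in points:
        groups[days].append(pct)
    # Sorting by duration gives x-ordered values ready for a line plot.
    return {days: median(pcts) for days, pcts in sorted(groups.items())}


# Hypothetical (days_open, % of goal raised) pairs.
points = [(30, 0.0), (30, 0.0), (30, 150.0), (21, 10.0), (21, 30.0)]
for days, mid in median_by_duration(points).items():
    print(f"{days} days: median {mid:.1f}% of goal")
```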

The line stays flat because so many projects raised 0% of their goals. Durations like 21 days (3 weeks) and 30 days (1 month) are the most popular. They are easy lengths for creators to pick, and because these durations contain so many projects, most of which fail (as the plots above showed), the median is pulled down to zero. Box-and-whisker plots didn't help either (I removed them from the app because they showed nothing useful); there is little to glean from this data when the median is pulled down so far. Plotting only successful projects might reveal a trend. (I plan to try this later this week out of my own interest and will update the post accordingly.)
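The effect of so many 0% projects is easy to see on a small example: when more than half of the values are exactly zero, the median is zero no matter how much the other projects raised, while the mean is pulled up by the few big successes. The sketch below uses hypothetical percentages for a single 30-day bucket and, as an assumption, treats reaching 100% of the goal as "successful" (the Kaggle file also has a "state" column that records this directly).

```python
from statistics import mean, median


def summarize(pcts):
    """Return (mean, median, share of exact zeros) for percentages raised."""
    zero_share = sum(1 for p in pcts if p == 0) / len(pcts)
    return mean(pcts), median(pcts), zero_share


# Hypothetical percentages raised by projects that all ran for 30 days.
pcts_30_days = [0, 0, 0, 0, 0, 0, 8.0, 104.0, 130.0, 260.0]
# Assumption: reaching 100% of the goal counts as "successful".
successful_only = [p for p in pcts_30_days if p >= 100]

for label, data in (("all", pcts_30_days), ("successful only", successful_only)):
    avg, mid, zeros = summarize(data)
    print(f"{label}: mean={avg:.1f}% median={mid:.1f}% zeros={zeros:.0%}")
```

Filtering to successful projects removes the zeros entirely, so any duration-related differences among the remaining projects are no longer hidden under a median of 0%.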

In conclusion, more than anything about the trends of Kickstarter projects, I think this project shows how important it is to analyze your data to make sure it is accurately represented in the graphs. Looking at different graphs, or at the same graphs with different scales, leads to different conclusions. As far as Kickstarter goes, Games seems to have the widest and most enthusiastic audience: Games also holds the highest percentage of goal reached, with Energy Hook raising 40,000% of its goal. That figure is misleading too, though, because the goal requested was only $1. The project that actually raised the most money was a smartwatch in the Design category, which raised $20,338,986 in 2015.
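Percentages of goal reached can mislead when goals are tiny: 40,000% of a $1 goal is only about $400 in pledges. The sketch below ranks hypothetical projects by percentage of goal and by dollars pledged, with an optional minimum-goal filter; the names, figures, and the $100 floor are illustrative assumptions, not values from the dataset.

```python
def top_project(projects, key, min_goal=0.0):
    """Name of the project maximizing key(project) among goals >= min_goal."""
    eligible = [p for p in projects if p["goal"] >= min_goal]
    return max(eligible, key=key)["name"]


def pct_of_goal(p):
    """Pledged amount as a percentage of the goal."""
    return 100.0 * p["pledged"] / p["goal"]


# Hypothetical projects (USD); not real Kickstarter figures.
projects = [
    {"name": "one-dollar game", "goal": 1.0, "pledged": 400.0},
    {"name": "hardware gadget", "goal": 500_000.0, "pledged": 2_000_000.0},
    {"name": "indie album", "goal": 1_000.0, "pledged": 6_000.0},
]

print(top_project(projects, pct_of_goal))                  # ranked by % of goal
print(top_project(projects, lambda p: p["pledged"]))       # ranked by dollars
print(top_project(projects, pct_of_goal, min_goal=100.0))  # hypothetical floor
```

Each ranking answers a different question, which is the same caution the conclusion raises about reading the Energy Hook figure next to the smartwatch's total.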

You can find my code on my GitHub account here: https://github.com/susarip/test/blob/master/app.R. The dataset I used is available on Kaggle here: https://www.kaggle.com/kemical/kickstarter-projects#ks-projects-201801.csv
