A Study of Kickstarter Projects

Posted on Jul 30, 2018

My main goal for this project was to look for a category-independent trend in the money raised by Kickstarter projects. Unfortunately, the only category-independent data I had to look at was time dependent. Looking at just the time-dependent data, I found no correlation between how long a project was open and the percentage of its goal that it raised.

My dataset describes almost 375,000 Kickstarter projects from 2008 to 2018. In the following plots, you can see the distribution of Kickstarter projects across categories and how many backers each category has.

You can see that the most popular categories by project count are Film & Video and Music. However, while Film & Video and Music have the most projects, Games has the highest number of backers. This mismatch means that Games projects attract far more backers per project, making Games one of the most popular Kickstarter categories on the site. We can also look at each category's project success rates in the following plots:
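To make the "backers per project" comparison concrete, here is a minimal sketch of the calculation. The rows are made up for illustration, and the field names `main_category` and `backers` are assumptions about how the data is labeled, not something taken from my app.

```python
from collections import defaultdict

# Made-up rows for illustration only; these are NOT values from the real dataset.
# The field names "main_category" and "backers" are assumptions.
projects = [
    {"main_category": "Games", "backers": 900},
    {"main_category": "Games", "backers": 300},
    {"main_category": "Music", "backers": 40},
    {"main_category": "Music", "backers": 60},
    {"main_category": "Music", "backers": 20},
]


def backers_per_project(rows):
    """Return {category: mean number of backers per project}."""
    totals = defaultdict(int)  # total backers per category
    counts = defaultdict(int)  # number of projects per category
    for row in rows:
        totals[row["main_category"]] += row["backers"]
        counts[row["main_category"]] += 1
    return {category: totals[category] / counts[category] for category in totals}


averages = backers_per_project(projects)
print(averages)
```

In this toy data, Music has more projects but Games has far more backers per project, which is the same kind of mismatch the plots show.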

Only a few categories, such as Music, Comics, and Theater, have more successful projects than failed ones. Technology has the lowest ratio of successful to failed projects.
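The success-to-failure ratio can be sketched as below. The rows are invented, and the state labels `successful`, `failed`, and `canceled` are assumptions about how outcomes are recorded; anything that is neither a success nor a failure is simply ignored here.

```python
from collections import Counter

# Made-up (category, state) pairs for illustration only.
# The state labels are assumptions about how outcomes are recorded.
rows = [
    ("Music", "successful"),
    ("Music", "successful"),
    ("Music", "failed"),
    ("Technology", "successful"),
    ("Technology", "failed"),
    ("Technology", "failed"),
    ("Technology", "failed"),
    ("Technology", "canceled"),  # neither a success nor a failure; ignored
]


def success_to_failure_ratio(records):
    """Return {category: successes / failures} for categories with any failures."""
    successes = Counter()
    failures = Counter()
    for category, state in records:
        if state == "successful":
            successes[category] += 1
        elif state == "failed":
            failures[category] += 1
    # Categories with zero failures are left out to avoid dividing by zero.
    return {category: successes[category] / failures[category] for category in failures}


ratios = success_to_failure_ratio(rows)
print(ratios)
```

A ratio above 1 means more successes than failures, so in this toy example Music would sit above the line and Technology well below it.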

Now, for the category-independent data, I looked at how long projects were open and how successful they were. I didn't find a time-related trend here, but with some extra inspection I was able to explain why some of the summary plots looked the way they did. First, you can look at the time-sensitive data for Art projects that were open between 0 and 80 days.

It looks like there are more successful projects in the 15 to 30 day range. If you look at the y-scale, though, you can see that we are only looking at success rates from 0 to 200%. If you zoom out, as in this Games plot, the data looks like it has a trend with a peak around 18 days.

If you look closely, there is a line going through the bottom data points. This line is the median percentage of the goal raised for each campaign length. It shows us that the data points we see spread around this plot are really far from the typical project. On a closer look, you'll see that this line is almost perfectly horizontal, with a few days that are anomalies.

The line stays so flat because so many projects raised 0% of their goals. Time frames like 21 days (3 weeks) and 30 days (1 month) are the most popular campaign lengths. They are easy durations for people to choose, and with so many projects, most of them failing (as we saw in the plots above), the median gets pulled down to zero. Even box and whisker plots, which I took off the app because they didn't show any useful information, had nothing to offer when the median is pulled down so significantly. Maybe if these plots were made with only successful projects, we would see a trend. (I will be doing this some time later this week just for my own interest, and the blog will be adjusted accordingly.)
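Here is a minimal sketch of why a pile of 0% projects flattens the median line, and of what restricting the data to successful projects might look like. The numbers are made up, and treating "successful" as "raised at least 100% of the goal" is an assumption for this example, not necessarily how the dataset labels success.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up (days_open, percent_of_goal_raised) pairs for illustration only.
campaigns = [
    (30, 0.0), (30, 0.0), (30, 0.0), (30, 5.0), (30, 400.0),
    (18, 120.0), (18, 150.0), (18, 0.0),
]


def summarize_by_duration(rows):
    """Return {days_open: {"median": ..., "mean": ...}} of percent raised."""
    grouped = defaultdict(list)
    for days, percent in rows:
        grouped[days].append(percent)
    return {
        days: {"median": median(values), "mean": mean(values)}
        for days, values in grouped.items()
    }


all_projects = summarize_by_duration(campaigns)

# Assumption: a project counts as successful if it raised at least 100% of its goal.
successful_only = summarize_by_duration(
    [(days, percent) for days, percent in campaigns if percent >= 100.0]
)

print(all_projects)
print(successful_only)
```

In the toy data, the 30-day group has a median of 0% even though one project raised 400%, because more than half of the group raised nothing. Dropping the unsuccessful projects moves the median off the floor, which is why a successful-only plot might show something the full plot hides.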

In conclusion, more than anything about Kickstarter trends, I think this project shows how important it is to analyze your data to make sure it is accurately represented in your graphs. Looking at different graphs, or at the same graph with different scales, leads to different conclusions. But as far as Kickstarter goes, it seems Games has the widest and most enthusiastic audience; Games also has the project with the highest percentage of its goal reached, Energy Hook, at 40,000% of its goal. This too is not an honest representation, though, as the goal requested was $1. The project that actually raised the most money was a smartwatch in the Design category, at $20,338,986 in 2015.
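A quick sketch of why percentage of goal raised can mislead when the goal is tiny. Both projects below are hypothetical, with invented names and dollar amounts; only the arithmetic matters.

```python
# Hypothetical projects with invented names and amounts, for illustration only.
projects = [
    {"name": "tiny-goal project", "goal": 1.0, "pledged": 400.0},
    {"name": "big-goal project", "goal": 500_000.0, "pledged": 2_000_000.0},
]


def percent_of_goal(project):
    """Percentage of the goal raised: 100 * pledged / goal."""
    return 100.0 * project["pledged"] / project["goal"]


# Rank the same projects two ways: by percent of goal and by raw dollars pledged.
top_by_percent = max(projects, key=percent_of_goal)
top_by_pledged = max(projects, key=lambda project: project["pledged"])

print(top_by_percent["name"], percent_of_goal(top_by_percent))
print(top_by_pledged["name"], top_by_pledged["pledged"])
```

The project with a $1 goal wins on percentage while raising a tiny fraction of the money, so the two rankings disagree about which project did "best".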

You can find my code on my GitHub account here: https://github.com/susarip/test/blob/master/app.R. The data set I used is available on Kaggle: https://www.kaggle.com/kemical/kickstarter-projects#ks-projects-201801.csv
