A Study of Kickstarter Projects
My main goal for this project was to find a category-independent trend in the money raised by Kickstarter projects. Unfortunately, the only category-independent data I had to look at was time-dependent. Analyzing just that time-dependent data, I found no correlation between how long a project was open and the percentage of its goal that it raised.
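As a rough illustration of that check (written in Python rather than the R used for the app), the sketch below computes how long each project was open and what percentage of its goal it raised, then takes the Pearson correlation of the two. The records are made-up toy values, and the field names launched, deadline, usd_pledged_real and usd_goal_real are assumptions about how the Kaggle file is laid out, not something confirmed here.

```python
from datetime import datetime
from math import sqrt

# Made-up toy records for illustration only; the field names are assumed.
projects = [
    {"launched": "2015-01-01", "deadline": "2015-01-31",
     "usd_pledged_real": 0.0, "usd_goal_real": 5000.0},
    {"launched": "2015-02-01", "deadline": "2015-02-22",
     "usd_pledged_real": 1200.0, "usd_goal_real": 1000.0},
    {"launched": "2015-03-01 12:00:00", "deadline": "2015-04-30",
     "usd_pledged_real": 250.0, "usd_goal_real": 10000.0},
    {"launched": "2015-05-01", "deadline": "2015-05-16",
     "usd_pledged_real": 300.0, "usd_goal_real": 300.0},
]


def days_open(project):
    """Whole days between launch and deadline (any time of day is ignored)."""
    fmt = "%Y-%m-%d"
    start = datetime.strptime(project["launched"][:10], fmt)
    end = datetime.strptime(project["deadline"][:10], fmt)
    return (end - start).days


def percent_raised(project):
    """Pledged amount as a percentage of the goal."""
    return 100.0 * project["usd_pledged_real"] / project["usd_goal_real"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (spread_x * spread_y)


days = [days_open(p) for p in projects]
percents = [percent_raised(p) for p in projects]
r = pearson(days, percents)
print(days, percents, round(r, 3))
```

A value of r near zero on the real data would match the "no correlation" finding, although Pearson's r only measures linear relationships.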
My dataset describes almost 375,000 Kickstarter projects from 2008 to 2018. In the following plots, you can see the distribution of Kickstarter projects in each category and how many backers each category has.
You can see that the categories with the most projects are Film & Video and Music. However, while Film & Video and Music have the most projects, Games has the highest number of backers. This mismatch means that Games draws far more backers relative to its number of projects, which makes it one of the most popular Kickstarter categories on the site. We can also look at each category's project success rates in the following plots:
Only a few categories, such as Music, Comics, and Theater, have more successful projects than failed ones. Technology has the lowest ratio of successful to failed projects.
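The per-category numbers behind plots like these can be sketched as follows. This is a minimal Python illustration, not the R code from the app: the rows are made-up toy values, and the field names main_category, state and backers (along with the state labels "successful" and "failed") are assumptions about the Kaggle file.

```python
from collections import Counter, defaultdict

# Made-up toy rows for illustration only; field names and state labels are assumed.
rows = [
    {"main_category": "Film & Video", "state": "failed", "backers": 0},
    {"main_category": "Film & Video", "state": "failed", "backers": 12},
    {"main_category": "Film & Video", "state": "successful", "backers": 40},
    {"main_category": "Film & Video", "state": "canceled", "backers": 3},
    {"main_category": "Music", "state": "successful", "backers": 25},
    {"main_category": "Music", "state": "successful", "backers": 30},
    {"main_category": "Music", "state": "failed", "backers": 1},
    {"main_category": "Games", "state": "successful", "backers": 900},
    {"main_category": "Games", "state": "failed", "backers": 20},
    {"main_category": "Technology", "state": "failed", "backers": 2},
    {"main_category": "Technology", "state": "failed", "backers": 0},
    {"main_category": "Technology", "state": "failed", "backers": 5},
    {"main_category": "Technology", "state": "successful", "backers": 150},
]


def summarize_categories(rows):
    """Project count, backer total, and success-to-failure ratio per category."""
    projects = Counter()
    backers = Counter()
    states = defaultdict(Counter)
    for row in rows:
        category = row["main_category"]
        projects[category] += 1
        backers[category] += row["backers"]
        states[category][row["state"]] += 1

    summary = {}
    for category in projects:
        successful = states[category]["successful"]
        failed = states[category]["failed"]
        summary[category] = {
            "projects": projects[category],
            "backers": backers[category],
            "backers_per_project": backers[category] / projects[category],
            # Other states (canceled, live, ...) are left out of the ratio.
            "success_to_failure": successful / failed if failed else float("inf"),
        }
    return summary


summary = summarize_categories(rows)
for category, stats in summary.items():
    print(category, stats)
```

Comparing "projects" with "backers_per_project" is what separates a category with many projects from one with a large audience per project.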
Now, for the category-independent data, I looked at how long the projects were open and how successful they were. I didn't find a time-related trend here, but with some extra inspection I was able to explain why some of the summary plots looked the way they did. First, you can look at the time-sensitive data for Art projects that were open between 0 and 80 days.
It looks like there are more successful projects in the 15 to 30 day range. If you look at the y-scale, though, you can see that we are only looking at success rates from 0% to 200%. When you zoom out, as in this Games plot, the data looks like it has a trend with a peak around 18 days.
If you look closely, there is a line going through the bottom data points. This line is the median success rate of the projects at each open duration. It shows that the data points we see spread around this plot are really far from where the typical project sits. On closer inspection, you'll see that this line is almost completely horizontal, with a few days that are anomalies.
The line stays so flat because so many projects raised 0% of their goals. Durations like 21 days (3 weeks) and 30 days (1 month) are the most popular. They are an easy amount of time for people to choose, and having so many projects at those durations, most of them failing (as we saw in the plots above), pulls the median down to zero. Even with box-and-whisker plots, which I took off the app because they didn't show any useful information, there was nothing really to glean from this data when the median gets pulled down so significantly. Maybe if these plots were made with only successful projects, we would see a trend. (I will be doing this some time later this week just for my own interest, and the blog will be adjusted accordingly.)
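To make the flat line concrete, here is a small Python sketch (again, not the R code from the app) that groups made-up (days open, percent of goal raised) pairs by duration and reports the median next to the mean. The optional min_percent filter is a hypothetical stand-in for "only successful projects", under the assumption that raising at least 100% of the goal counts as success.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up toy (days_open, percent_of_goal_raised) pairs for illustration only.
observations = [
    (18, 110.0), (18, 95.0), (18, 130.0),
    (21, 0.0), (21, 0.0), (21, 0.0), (21, 5.0), (21, 150.0),
    (30, 0.0), (30, 0.0), (30, 0.0), (30, 2.0), (30, 4000.0),
]


def summarize_by_duration(observations, min_percent=0.0):
    """Count, median and mean percent raised for each duration.

    Observations below min_percent are dropped first; min_percent=100.0
    keeps only projects that reached their goal (an assumed definition
    of success).
    """
    by_days = defaultdict(list)
    for days, percent in observations:
        if percent >= min_percent:
            by_days[days].append(percent)
    return {
        days: {"n": len(values), "median": median(values), "mean": mean(values)}
        for days, values in sorted(by_days.items())
    }


all_projects = summarize_by_duration(observations)
funded_only = summarize_by_duration(observations, min_percent=100.0)
for days, stats in all_projects.items():
    print(days, stats)
```

In this toy data, the popular 21 and 30 day durations are mostly zeros, so their medians sit at 0 even though a single large outlier lifts the mean; the less common 18 day duration has few projects and shows up as an anomaly in the median line.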
In conclusion, more than anything about the trends of Kickstarter projects, I think this project shows how important it is to analyze your data to make sure it is accurately represented in the graphs. Looking at different graphs, or at the same graphs with different scales, leads to different conclusions. As far as Kickstarter goes, Games seems to have the widest and most enthusiastic audience: it also has the highest percentage of goal reached, with Energy Hook raising 40,000% of its goal. That figure is not an honest representation either, though, because the goal requested was $1. The project that actually raised the most money was a smartwatch in the Design category, at $20,338,986 in 2015.
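The gap between "highest percentage of goal" and "most money raised" is easy to reproduce. The Python sketch below uses three hypothetical projects (the names and amounts are invented for illustration and are not taken from the dataset) to show how a tiny goal inflates the percentage.

```python
# Hypothetical projects for illustration only; none of these values are real.
projects = [
    {"name": "tiny-goal project", "goal": 1.0, "pledged": 500.0},
    {"name": "big-raise project", "goal": 100000.0, "pledged": 2000000.0},
    {"name": "typical project", "goal": 5000.0, "pledged": 6000.0},
]


def percent_of_goal(project):
    """Pledged amount as a percentage of the goal."""
    return 100.0 * project["pledged"] / project["goal"]


# The two rankings pick different "winners".
top_by_percent = max(projects, key=percent_of_goal)
top_by_pledged = max(projects, key=lambda p: p["pledged"])

print(top_by_percent["name"], percent_of_goal(top_by_percent))
print(top_by_pledged["name"], top_by_pledged["pledged"])
```

Ranking by percentage rewards whoever set the smallest goal, so reporting the absolute amount alongside it gives a fairer picture.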
You can find my code on my GitHub account here: https://github.com/susarip/test/blob/master/app.R. The data set I used is available on Kaggle here: https://www.kaggle.com/kemical/kickstarter-projects#ks-projects-201801.csv