What Makes a FundRazr Online Fundraising Campaign Successful?
Created in Canada in 2009, FundRazr is a crowdfunding site that lets individuals, nonprofits, or companies set up a campaign for a cause, and lets people who believe in that cause contribute online. The website sorts campaigns into 18 categories. Each campaign sets a target amount that the campaigner wants to raise, with the option to set an end date as well.
The goal of this project was to scrape data on the different campaigns on FundRazr and to use that data to draw insights on what makes a crowdfunding campaign successful.
The first step was to create a Scrapy script that would iterate through each of the 18 categories on FundRazr.
I created a loop that gathers the URL of the starting page for each category. A second loop takes the URLs from the first loop and appends a page number from 1 to 10 for each category. I limited the number of pages scraped because I had to work within a time constraint and could not allow unlimited scraping time. The second loop also extracts the URL of every campaign listed on each page of each category. The last loop scrapes the data needed for analysis from each individual campaign page.
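The first two loops described above can be sketched in plain Python as follows. This is only an illustration of the pagination logic: the base URL, the category names, and the `category`/`page` query parameters are hypothetical placeholders, since the write-up does not give the actual URL layout used on the site.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: the real URL layout and category names
# are not given in the write-up.
BASE_URL = "https://example.com/find"
CATEGORIES = ["health", "sports", "business"]  # the real site has 18
MAX_PAGES = 10  # page cap described in the text


def category_start_urls(base_url, categories):
    """First loop: build one starting URL per category."""
    return [f"{base_url}?{urlencode({'category': c})}" for c in categories]


def paginated_urls(start_urls, max_pages=MAX_PAGES):
    """Second loop: add page numbers 1..max_pages to each category URL."""
    return [
        f"{url}&page={page}"
        for url in start_urls
        for page in range(1, max_pages + 1)
    ]


start_urls = category_start_urls(BASE_URL, CATEGORIES)
page_urls = paginated_urls(start_urls)
print(len(page_urls))  # 3 categories x 10 pages = 30
```

In a Scrapy spider, each of these page URLs would typically be issued as a request whose callback collects the campaign links, and each campaign link would in turn be requested and parsed by a third callback.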
The data scraped for each campaign was the title of the project, the category the project was sorted in, the currency the donations were given in, the number of contributors, the target amount of donations, the amount raised, when the campaign ends, the number of updates on the page, and the number of comments on the page.
The picture below illustrates one campaign with these factors:
After extracting the data, I had to clean and modify it to prepare it for analysis. Below is a list of the adjustments that I made:
- Dropped rows with NA values, treating them as missing completely at random (MCAR), without imputing any values.
- Removed the currency labels ($|£|€|₽|kr|₱|Fr|₪|฿|¥), commas, and periods using regular expressions.
- Created a function that converts strings in the target column into numeric values (e.g. 2.5k to 2500).
- Changed the start date and end date columns to datetime and subtracted them to create a total days column. Afterwards, I took the absolute value of the days and divided by seven to get the number of weeks the campaign was open.
- Divided the amount raised by the number of contributors to get the average contribution.
- Divided the amount raised by the target to get percent_complt, the metric used to measure how successful a campaign is.
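A minimal sketch of these cleaning steps in plain Python is shown here. The function names, the date format, and the `avg_contribution` key are assumptions made for illustration. The sketch also keeps the decimal point until after the "k" suffix has been handled, so that "2.5k" becomes 2500, and scales `percent_complt` to a 0 to 100 range to match the percentages discussed in the analysis. Both choices are assumptions about how the original steps were ordered.

```python
import re
from datetime import datetime

# Currency labels listed in the text; multi-letter labels are tried first.
CURRENCY_RE = re.compile(r"kr|Fr|[$£€₽₱₪฿¥]")


def parse_amount(text):
    """Convert a scraped amount such as '$2.5k' or '£1,200' to a float."""
    cleaned = CURRENCY_RE.sub("", text).replace(",", "").strip().lower()
    multiplier = 1
    if cleaned.endswith("k"):  # e.g. '2.5k' means 2500
        multiplier = 1000
        cleaned = cleaned[:-1]
    return float(cleaned) * multiplier


def duration_weeks(start, end, fmt="%Y-%m-%d"):
    """Campaign length in weeks; the date format is an assumption."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return abs(delta.days) / 7  # absolute value keeps the duration positive


def derive_metrics(raised, target, contributors):
    """Average contribution and percent of target raised (0-100 scale)."""
    avg = raised / contributors if contributors else 0.0
    pct = raised * 100 / target if target else 0.0
    return {"avg_contribution": avg, "percent_complt": pct}


target = parse_amount("$2.5k")
weeks = duration_weeks("2018-01-01", "2018-01-29")
metrics = derive_metrics(raised=500.0, target=target, contributors=10)
print(target, weeks, metrics)
```

Campaigns with zero contributors or a zero target are given 0.0 here rather than raising an error. Dropping such rows instead would be equally reasonable.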
The table below shows these fields for each campaign:
Below is a look at the success of campaigns in general:
As you can see above, the campaigns that are funded most often, and funded the most, are those that last from 0 to 25 weeks. There are a few outliers, so I created a second scatter plot restricted to campaigns between 0 and 100 percent funded:
Most campaigns raise only around 20 percent of their target. A distribution plot of percent funded shows this more clearly:
The data indicates that campaigns tend to reach their goal when their target falls below 5,000 dollars:
Next, I analyzed the different categories to see whether the category affects the success rate:
Unsurprisingly, the nonprofit, health, and sports categories were the best funded, with campaigns in those categories raising 20 to 40 percent of their target on average. However, campaigns in these categories also set fairly low targets. The opposite holds for categories that show a low percent funded, as their targets tend to be too high.
The health category had the highest average contribution, though it also had some of the smallest numbers of contributors. This could be because the target usually set by campaigns in this category is so low. The business category shows a similar contribution pattern. However, because its average target is set too high to achieve funding, its percent funded is low.
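The per-category comparison can be sketched as a simple group-and-average step. The sample rows here are made-up illustrative values, not figures from the scraped dataset, and the dictionary keys other than `percent_complt` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_category(campaigns):
    """Average percent funded, target, and contribution per category."""
    groups = defaultdict(list)
    for campaign in campaigns:
        groups[campaign["category"]].append(campaign)

    summary = {}
    for category, rows in groups.items():
        summary[category] = {
            "mean_percent_funded": mean(r["percent_complt"] for r in rows),
            "mean_target": mean(r["target"] for r in rows),
            "mean_avg_contribution": mean(r["avg_contribution"] for r in rows),
        }
    return summary


# Made-up rows for illustration only.
sample = [
    {"category": "health", "target": 2000, "percent_complt": 40.0, "avg_contribution": 80.0},
    {"category": "health", "target": 4000, "percent_complt": 20.0, "avg_contribution": 60.0},
    {"category": "business", "target": 50000, "percent_complt": 5.0, "avg_contribution": 70.0},
]

summary = summarize_by_category(sample)
for category, stats in summary.items():
    print(category, stats)
```

Putting mean target next to mean percent funded makes it easier to see whether a category's low funding rate goes together with high targets.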
Lastly, I checked the effect of two features that make a campaign more visible to viewers: comments and updates. The number of times a campaign creator updated the page had no direct impact on the extent to which the campaign was funded. As you can see below, average funding varied with no clear pattern across the number of updates:
The same can be said for comments, as average funding showed no consistent trend across the number of comments made on each campaign page:
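One way to put a number on "no direct impact" is a Pearson correlation coefficient between the update (or comment) count and percent funded. This was not part of the original analysis, which relied on plots. It is offered only as a possible check, and the sample values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Made-up values: number of updates and percent funded for five campaigns.
updates = [0, 1, 2, 3, 4]
percent_funded = [20.0, 5.0, 30.0, 10.0, 20.0]

r = pearson(updates, percent_funded)
print(f"correlation between updates and percent funded: {r:.2f}")
```

A coefficient near zero would be consistent with the plots, though it only rules out a linear relationship and says nothing about causation.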
Future work:
- Gather a lot more data.
- Make a graph to determine whether the number of comments and updates per category makes a difference in percent funded.