What makes a crowdfunding campaign a success?
The third project of the bootcamp focused on web scraping. I decided to scrape GoFundMe.com, a crowdfunding platform that allows people to raise money for everything from accidents to trips. My preliminary goals for this project were to explore patterns in the data, find factors that influence a fundraising campaign, quantify the "success" of a campaign, and build a model to predict it.
- Webscraping using Scrapy
- Data Munging
- Exploratory Data Analysis
- Future Work
The website’s front page has a simple layout, with previews of a few campaigns and a list of categories on the left side of the page. I scraped the first 100 ads from the Medical, Memorials, Emergencies, Volunteer, Charity, Animals, Sports, and Education categories.
The first campaign from the snapshot above provides a general template of the data available. It has a title, a monetary goal, the current amount raised, a creation date, the number of contributors, the creator’s name and location, the category under which the campaign was set up, the number of likes, the number of Facebook shares, a story section describing the fundraising cause, a photo, the name and time of each donation, and comments left by people.
Due to time limitations, I decided to scrape only the numerical data for this project so the processing and analysis could be completed quickly.
2 Webscraping using Scrapy
I used Scrapy to crawl the website and extract the data, starting by setting up containers to collect the scraped items.
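In Scrapy, such containers are usually `scrapy.Item` subclasses whose attributes are declared with `scrapy.Field()`. The sketch below mimics that idea with a standard-library dataclass so it runs without Scrapy installed. The class name `CampaignItem` and its field names are hypothetical, chosen to mirror the numerical fields described above; they are not taken from the original spider.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class CampaignItem:
    """Container for one scraped campaign (hypothetical field names).

    Values are kept as raw strings here; cleaning happens later.
    """
    category: str = ""
    goal: str = ""          # raw text, e.g. "$5,000 goal"
    raised: str = ""        # raw text, e.g. "$1,250"
    created: str = ""       # raw creation-date text
    contributors: str = ""  # raw text, e.g. "12 people"
    likes: str = ""
    shares: str = ""        # Facebook shares


def csv_header() -> list[str]:
    """Column names in declaration order, e.g. for csv.DictWriter."""
    return [f.name for f in fields(CampaignItem)]


# Usage: fill an item and turn it into a plain dict (one CSV row).
item = CampaignItem(category="Medical", goal="$5,000 goal")
row = asdict(item)
```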
The most time-intensive part of building the scraping framework was finding the relevant XPaths for the data. Once that was done, the Scrapy spider collected data from 100 campaigns across eight categories and stored it in a CSV file. It took me a while to get a handle on finding XPaths and building a Scrapy spider, but ultimately tweaking the Scrapy tutorial proved sufficient for the task at hand.
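A minimal sketch of the XPath-to-CSV step, assuming a simplified, made-up markup for a campaign card (real GoFundMe pages look different and change over time). Inside a Scrapy spider the expressions would typically be evaluated with `response.xpath(...).get()`, which supports full XPath 1.0. To stay runnable with only the standard library, this version uses `xml.etree.ElementTree`, which understands just a subset of XPath and needs well-formed markup. The `SAMPLE` markup, the `PATHS` mapping, its class names, and both helpers are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for one campaign card.
SAMPLE = """
<div class="campaign">
  <span class="category">Medical</span>
  <h2 class="raised">$1,250</h2>
  <span class="goal">raised of $5,000 goal</span>
  <span class="contributors">12 people</span>
</div>
"""

# Field name -> relative XPath-style expression (ElementTree subset).
PATHS = {
    "category": ".//span[@class='category']",
    "raised": ".//h2[@class='raised']",
    "goal": ".//span[@class='goal']",
    "contributors": ".//span[@class='contributors']",
}


def extract(markup: str) -> dict:
    """Return the text of the first match for each path ('' if missing)."""
    root = ET.fromstring(markup.strip())
    return {name: (root.findtext(path) or "").strip() for name, path in PATHS.items()}


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text; fields containing commas get quoted."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(PATHS))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


row = extract(SAMPLE)
print(row)
print(to_csv([row]))
```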
3 Data Munging
I cleaned and processed the data in Python. Cleaning mostly involved encoding strings and removing punctuation and numerical abbreviations. I then created two new features: "days", the number of days an ad had been up since its creation date, and the percentage of the monetary target met. Finally, I focused on the averages of the scraped and new features across the eight categories.
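The cleaning and feature steps could look roughly like the helpers below. The post does not show its code, so the input formats (amounts such as `$5,000`, `$1.2k`, or `1.5M`) and the helper names `parse_amount`, `days_up`, and `percent_funded` are assumptions for illustration only.

```python
import re
from datetime import date

# Multipliers for abbreviated amounts such as "$1.2k" or "$1.5M" (assumed formats).
_SUFFIX = {"k": 1_000, "m": 1_000_000}
# First number (commas allowed, optional decimals), then an optional k/M suffix.
_AMOUNT = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*([kKmM]?)\b")


def parse_amount(text: str) -> float:
    """Turn strings like '$5,000', '$1.2k' or '1.5M' into a float.

    Only the first number in the string is used; punctuation such as
    '$' and ',' is dropped.
    """
    match = _AMOUNT.search(text)
    if match is None:
        raise ValueError(f"no amount in {text!r}")
    number = float(match.group(1).replace(",", ""))
    return number * _SUFFIX.get(match.group(2).lower(), 1)


def days_up(created: date, scraped: date) -> int:
    """Feature 'days': days between the creation date and the scrape date."""
    return (scraped - created).days


def percent_funded(raised: float, goal: float) -> float:
    """Share of the monetary target met, in percent."""
    return 100.0 * raised / goal


print(parse_amount("$1,250 raised of $5,000 goal"), percent_funded(600, 1000))
```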
4 Exploratory Data Analysis
I started by plotting the distribution of the number of days an ad had been up for each category. Unsurprisingly, urgent ads, such as those for memorial services, emergencies, and medical needs, had been up for the shortest time (a median of 3 days). Personal categories such as sports, volunteering, and charity had been up longer (a median of about 10 days). To my surprise, the Animals category also had a low median (about 4 days). Americans love animals!
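Grouping by category and taking a per-group median can be sketched with the standard library. The rows below are made-up values for illustration, not the scraped data, and `median_by_category` is a hypothetical helper. The median is a sensible summary here because a few long-running ads would pull a mean upward.

```python
from collections import defaultdict
from statistics import median


def median_by_category(rows, field):
    """Group rows (dicts) by their 'category' key and take the median of one field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["category"]].append(row[field])
    return {category: median(values) for category, values in groups.items()}


# Made-up rows for illustration only; these are not the scraped values.
rows = [
    {"category": "Medical", "days": 2},
    {"category": "Medical", "days": 3},
    {"category": "Medical", "days": 5},
    {"category": "Sports", "days": 8},
    {"category": "Sports", "days": 12},
]
result = median_by_category(rows, "days")
print(result)
```

With an even number of ads, `statistics.median` averages the two middle values, which is why the Sports median is 10 here.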
In terms of campaign targets, the Medical category has high values, including a few statistical outliers with targets of half a million dollars. Another stark outlier is in Education, where a campaign had a target of one million dollars. I would personally like to know what that person is studying!
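The post does not say how the outliers were identified. One common rule, the one behind the whiskers of a standard box plot, flags values more than 1.5 times the interquartile range (IQR) beyond the quartiles (Tukey's fences). Here is a sketch with made-up goal amounts, using `statistics.quantiles` (Python 3.8+); `iqr_outliers` is a hypothetical helper.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default "exclusive" method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical campaign goals in dollars (not the scraped data).
goals = [1_000, 2_000, 3_000, 4_000, 5_000, 6_000, 1_000_000]
print(iqr_outliers(goals))
```

Quartile conventions differ between libraries, so values close to a fence may be classified differently depending on the tool.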
The average contribution per person seemed to have a more uniform distribution across categories, at about $80. The highest value occurred in the Emergencies category, with an average of $100. I suspect that most of the time people contribute what they can afford or are comfortable with, regardless of the category.
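"Average contribution per person" can be computed in two reasonable ways, and the post does not say which one was used: the mean of each campaign's raised-to-contributors ratio, or the total money raised divided by the total number of contributors. This sketch computes both on hypothetical campaigns to show they can differ noticeably. The helper names are illustrative.

```python
from statistics import mean


def mean_of_ratios(campaigns):
    """Average of each campaign's raised/contributors ratio.

    Campaigns with zero contributors are skipped to avoid dividing by zero.
    """
    return mean(c["raised"] / c["contributors"] for c in campaigns if c["contributors"] > 0)


def ratio_of_totals(campaigns):
    """Total raised divided by total contributors (weights large campaigns more)."""
    donors = sum(c["contributors"] for c in campaigns)
    return sum(c["raised"] for c in campaigns) / donors


# Hypothetical campaigns, not scraped values.
campaigns = [
    {"raised": 800, "contributors": 10},    # $80 per person
    {"raised": 1_200, "contributors": 10},  # $120 per person
    {"raised": 4_000, "contributors": 80},  # $50 per person
]
print(mean_of_ratios(campaigns), ratio_of_totals(campaigns))
```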
Looking at the current funding status as a percentage of the target amount, there is a striking uniformity across all categories, at about 60%! I am impressed that someone who sets up a $1000 campaign can expect to raise about $600 on average.
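The 600-dollar figure follows from the definition of percentage funded. Writing r_i and g_i for the amount raised by and the goal of campaign i, and N for the number of campaigns (notation introduced here, not from the original analysis):

```latex
p_i = 100 \times \frac{r_i}{g_i}, \qquad
\bar{p} = \frac{1}{N} \sum_{i=1}^{N} p_i \approx 60, \qquad
\frac{\bar{p}}{100} \times \$1000 \approx \$600
```

Because \bar{p} is an average over campaigns, individual campaigns vary around this value.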
My next investigation looked at the average number of contributors per campaign and the average number of social media shares per category. Campaigns with urgent needs are more active in this regard: the Medical category has the most contributors on average, and Memorials have the highest number of Facebook shares.
The correlation plot of all the features shows two strong correlations. One seems trivial: the money raised increases with the number of contributors. The other, between the money raised and the number of Facebook shares, is more interesting. It suggests that whether or not a person contributes, spreading the word and passing the ad around on social media helps.
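The coefficients in a correlation plot are typically Pearson's r, computed for every pair of features; the post does not state which measure its plot used. Below is a from-scratch sketch on made-up columns (Python 3.10+ also ships `statistics.correlation`). Keep in mind that r measures linear association only and does not by itself show that shares cause donations.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_pairs(columns):
    """Pearson r for every pair of named numeric columns."""
    return {(a, b): pearson(columns[a], columns[b]) for a, b in combinations(columns, 2)}


# Hypothetical feature columns for five campaigns (illustrative only).
columns = {
    "raised": [500, 1_000, 1_500, 2_000, 2_500],
    "contributors": [5, 10, 15, 20, 25],
    "shares": [3, 1, 4, 1, 5],
}
for pair, r in correlation_pairs(columns).items():
    print(pair, round(r, 2))
```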
- Campaigns in urgent categories set higher targets on average.
- Not only that, they also see a lot of activity in terms of the number of contributors and social media shares, and they reach their targets in the shortest amount of time compared to other categories.
- The average contribution per person is roughly $80 and is more or less uniform across all categories.
- On average, campaigns in all categories are at about 60% of their target amount.
- Sharing on social media is one of the most influential factors in a campaign’s success.
6 Future Work
The major focus of this project was to scrape and analyze data from GoFundMe. I believe there is more to learn from the data I collected, and even more potential insights in the data I did not collect. Some of the things I want to explore in the future are:
- Extending the analysis to all other categories.
- Using NLP for feature selection from the text data.
- Time-series analysis of donations for a campaign and exploration of completion rate and patterns.
- Using machine learning to predict what percentage of a project will be completed in a given time interval.