What makes a crowdfunding campaign a success?

Deepak Khurana
Posted on Aug 22, 2016

Introduction 

The third project of the bootcamp was geared towards web scraping. I decided to scrape GoFundMe.com, a crowdfunding platform that lets people raise money for everything from accidents to trips. My preliminary goals for this project were to explore patterns, find factors that influence a fundraising campaign, quantify the "success" of a campaign, and build a model to predict it.

Contents

  1. GoFundMe
  2. Webscraping using Scrapy
  3. Data Munging
  4. Exploratory Data Analysis
  5. Conclusions
  6. Future Work

 

1 GoFundMe

The website’s front page has a very simple layout, with previews of a few campaigns and a list of categories on the left side of the page. I scraped the first 100 ads from each of the Medical, Memorials, Emergencies, Volunteer, Charity, Animals, Sports, and Education categories.

Screen Shot 2016-08-18 at 2.13.22 PM

The first campaign in the snapshot above provides a general template of the data available. It has the customary title, monetary goal, current amount raised, creation date, number of contributors, the creator’s name and location, the category under which the campaign was set up, the number of likes and Facebook shares, a story section describing the fundraising cause, a photo, the names and times of recent donations, and comments left by supporters.

Screen Shot 2016-08-18 at 2.15.02 PM

Due to time limitations, I decided to scrape only the numerical data for this project so the processing and analysis could be completed quickly. 

2 Webscraping using Scrapy

I used Scrapy for crawling the web pages and extracting the data. I started by setting up item containers to collect the data.

https://gist.github.com/dkhurana1306/07631f85b623e7e6ead2f9177fdf5062

The most time-intensive part of building the scraping framework was finding the relevant XPaths for the data. Once that was done, the Scrapy spider collected data from 100 campaigns across eight categories and stored it in a CSV file. It took me a while to get the hang of finding XPaths and building a Scrapy spider, but ultimately tweaking the Scrapy tutorial proved sufficient for the task at hand.

https://gist.github.com/dkhurana1306/d5d394421441f42f625ceac93b122978

 

 

3 Data Munging

I cleaned and processed the data in Python. The cleaning mostly involved encoding strings and removing punctuation and numerical abbreviations. I then created two new features: "days", the number of days an ad had been up since its creation date, and the percentage of the monetary target met. I then focused on averages of the scraped and new features across the eight categories.

https://gist.github.com/dkhurana1306/facff5d03fd7a5bdf939b15432ad925b
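The cleaning and feature-engineering steps can be sketched with pandas as follows. The column names, abbreviation rules, and sample rows are assumptions for illustration, not the scraped schema:

```python
import pandas as pd


def clean_money(s):
    """Strip '$' and commas, and expand abbreviations like '5k' to floats.

    The set of abbreviations handled here is an assumption.
    """
    s = s.strip().lstrip("$").replace(",", "")
    if s.lower().endswith("k"):
        return float(s[:-1]) * 1_000
    if s.lower().endswith("m"):
        return float(s[:-1]) * 1_000_000
    return float(s)


# Made-up sample rows standing in for the scraped CSV
df = pd.DataFrame({
    "raised": ["$1,200", "5k", "$300"],
    "goal": ["$2,000", "10k", "$1,000"],
    "created": ["2016-08-01", "2016-07-20", "2016-08-10"],
})

df["raised"] = df["raised"].map(clean_money)
df["goal"] = df["goal"].map(clean_money)

# "days" the ad has been up, measured from a fixed scrape date
scrape_date = pd.Timestamp("2016-08-18")
df["days"] = (scrape_date - pd.to_datetime(df["created"])).dt.days

# percentage of the monetary target met
df["pct_of_goal"] = 100 * df["raised"] / df["goal"]
```

Category-level averages then fall out of a single `df.groupby("category").mean()` once a category column is present.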

 

4 Exploratory Data Analysis

I started by plotting the distribution of the number of days an ad was up for each category. Unsurprisingly, urgent ads such as those for memorial services, emergencies, and medical needs were up for the shortest time (a median of 3 days). Personal categories such as sports, volunteering, and charity were up for longer (a median of about 10 days). To my surprise, the animals category also had a low median (about 4 days). Americans love animals!

Screen Shot 2016-08-18 at 1.29.51 PM

From the perspective of campaign targets, the medical category has high values, with a few statistical outliers with targets of half a million dollars. Another stark outlier is in Education, where a campaign had a target of one million dollars. I personally would like to know what that person is studying!

Screen Shot 2016-08-18 at 1.54.14 PM

The average contribution per person had a more uniform distribution, with an average of about $80. The highest value occurred in the Emergency category, with an average of $100. I suspect that most of the time people contribute what they can afford or are comfortable with, regardless of category.

 

Screen Shot 2016-08-18 at 1.51.12 PM

If we look at current funding status with respect to the percentage of the target amount raised, there is uniformity across all categories at about 60%! I am impressed that someone who sets up a $1,000 campaign can expect to raise, on average, at least $600.

 

Screen Shot 2016-08-18 at 1.50.33 PM

 

My next investigation looked at the average number of people who contributed to a campaign and the average number of social media shares per category. Campaigns with urgent needs are more active in this regard, while medical needs draw more contributors on average. Memorials have the highest number of Facebook shares.

 

Screen Shot 2016-08-18 at 1.51.54 PM

Screen Shot 2016-08-18 at 1.48.43 PM

 

The correlation plot of all the features shows two strong correlations. One is fairly trivial: the money raised is proportional to the number of contributors. The other, between money raised and the number of Facebook shares, is more interesting. It means that whether or not a person contributes, it always helps to spread the word and pass the ad around on social media.
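The correlation check behind that plot is a one-liner on the cleaned table. The rows below are made up for illustration, shaped so that money raised tracks both contributors and shares:

```python
import pandas as pd

# Fabricated illustrative rows, not the scraped data
df = pd.DataFrame({
    "raised": [500, 1200, 300, 4000, 900],
    "contributors": [10, 25, 6, 80, 18],
    "shares": [40, 110, 20, 350, 75],
    "days": [3, 10, 2, 30, 7],
})

# Pairwise Pearson correlations across all numeric features
corr = df.corr()

# The two strong relationships discussed above
print(corr.loc["raised", ["contributors", "shares"]])
```

Plotting `corr` with a heatmap (e.g. `seaborn.heatmap(corr)`) gives the correlation plot shown in the screenshot above.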

Screen Shot 2016-08-18 at 1.47.19 PM 

 

5 Conclusions

  • Urgent-category funding campaigns set higher targets on average.
  • They also see the most activity in terms of the number of contributors and social media shares, and they reach their targets in the shortest time compared to other categories.
  • The average contribution per person is roughly $80 and is more or less uniform across all categories.
  • All categories are, on average, at about 60% completion with respect to their target amount.
  • Sharing on social media is one of the most influential factors in the success of a campaign.

6 Future Work 

The major focus of this project was to scrape and analyze data from GoFundMe. I believe there is more to learn from the data I collected, and even more potential insights in the data I left uncollected. Some of the things I want to explore in the future are:

  • Extending the analysis to include all other categories.
  • Using NLP for feature selection from text data.
  • Time-series analysis of donations for a campaign, and exploration of completion rates and patterns.
  • Using machine learning to predict what percentage of a project will be completed in a given time interval.

About Author

Deepak Khurana


Deepak holds a Masters Degree in Physics from the Indian Institute of Technology Kharagpur, one of the top engineering school in India. He was then awarded the Henry M. MacCracken fellowship at New York University to pursue a...