Crawling in a Plastic Jungle

Ricky Yue
Posted on Aug 22, 2016

Would you turn down a 35% discount on a purchase you are going to make anyway? If your answer is no, would you be interested in using Python for web scraping or Shiny for interactive data visualization? If your answer is still no, you can skip the rest of this post. Cardpool.com is one of the major gift card exchange websites, where you can buy and sell discounted gift cards from hundreds of merchants. Thousands of gift cards are exchanged through the website every day. Some gift cards are often available for sale, while others are usually out of stock.

With my recently acquired knowledge of web scraping in Python, I set out to scrape cardpool.com and build a Shiny app to answer some questions I had been curious about:

What cards are easy to obtain?

What cards are short in supply?

What cards are more popular?

Is a gift card’s popularity related to its discount?

 

How was the data scraped?

Since the availability of gift cards on cardpool.com changes in real time as cards are bought and sold, the basic idea was to scrape the website every few hours on a given day so that the changes in stock within that day could be monitored.

Cardpool.com classifies merchant gift cards into 33 categories, such as Apparel & Accessories, Birthday, Electronics and Grocery. It is useful to compare availability across categories so that you know how easy it is to buy the type of card you are interested in. The screenshot below shows three components of the cardpool.com main page for buyers. Part 1, in the red rectangle, shows the list of categories. Clicking any category in Part 1 leads to a new page with Part 2, the merchant cards available in that category, and Part 3, the merchant cards that are out of stock. The scraping strategy is to obtain the list of URLs for the categories in Part 1, and then to extract the merchant name, discount percentage and stock count for every merchant under each category in both Part 2 and Part 3.

pic1

The Python code for the first scraping strategy is below.
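As an illustration of this strategy, and not the scraper actually used for this post, here is a minimal sketch that pulls the merchant name, discount and stock count out of one category page. The HTML snippet, the class names (merchant, name, discount) and the data-stock attribute are all hypothetical, because the real page markup is not reproduced here, and the stock numbers are made up. The sketch uses only Python's standard library and parses a string; a real run would first download the page for each category URL.

```python
from html.parser import HTMLParser

# Hypothetical markup for one category page (NOT the real cardpool.com HTML).
# Stock numbers are invented for the example.
SAMPLE_HTML = """
<ul class="merchants">
  <li class="merchant" data-stock="12"><span class="name">Macy's</span><span class="discount">8%</span></li>
  <li class="merchant" data-stock="40"><span class="name">Nordstrom</span><span class="discount">5%</span></li>
  <li class="merchant" data-stock="0"><span class="name">Shell</span></li>
</ul>
"""


class MerchantParser(HTMLParser):
    """Collects one dict per <li class="merchant"> element."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None  # merchant being parsed, if any
        self._field = None    # which <span> we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "merchant" in classes:
            self._current = {
                "name": "",
                "discount_pct": None,
                "stock": int(attrs.get("data-stock", 0)),
            }
        elif tag == "span" and self._current is not None:
            if "name" in classes:
                self._field = "name"
            elif "discount" in classes:
                self._field = "discount"

    def handle_data(self, data):
        text = data.strip()
        if self._current is None or self._field is None or not text:
            return
        if self._field == "name":
            self._current["name"] += text
        else:
            self._current["discount_pct"] = float(text.rstrip("%"))

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.rows.append(self._current)
            self._current = None


def parse_category_page(html):
    """Return a list of merchant dicts found in one category page."""
    parser = MerchantParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = parse_category_page(SAMPLE_HTML)
in_stock = [r["name"] for r in rows if r["stock"] > 0]       # "Part 2"
out_of_stock = [r["name"] for r in rows if r["stock"] == 0]  # "Part 3"
print(in_stock, out_of_stock)
```

Running the same function over every category URL and stamping each row with the scrape time would give one snapshot of the site.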

 

Next, it would be interesting to scrape more information, such as the face value, sale price and discount percentage of every merchant card available. On cardpool.com, gift cards are sold in three forms: electronic, physical and mobile. My observation is that a card sold electronically could also be sold physically by mail or by mobile. As a result, the stock numbers shown for each type may overlap for a given merchant, so you can’t simply add them together when comparing stock availability between merchants. The scraping flow starts by choosing a merchant from the main page and opening the merchant page to obtain the available card information for each card type. For instance, as shown below, we first chose “1-800-Flowers.com” from the merchant list in Part 1 and loaded the merchant page with Part 2 and Part 3; we then selected a card type in Part 2 and got the card information for that type in Part 3.

pic2

The code for the second scraping strategy is below.
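Again as an illustration rather than the original scraper, the sketch below shows one way the per-card records from this second pass could be represented once they are extracted. The CardListing class, its field names and the sample values are assumptions made for the example. The discount is computed as (face value - price) / face value, and stock is counted per card type without being summed across types, since the types may overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardListing:
    """One card offered for sale (hypothetical record layout)."""
    merchant: str
    card_type: str     # "electronic", "physical" or "mobile"
    face_value: float  # value printed on the card, in dollars
    price: float       # sale price on the site, in dollars

    @property
    def discount_pct(self):
        # Percentage off the face value, rounded to one decimal place.
        return round((self.face_value - self.price) / self.face_value * 100, 1)


def stock_by_type(listings):
    """Count listings per (merchant, card_type).

    Types are kept separate on purpose: the same card may be offered
    in more than one form, so the counts should not be added together.
    """
    counts = {}
    for listing in listings:
        key = (listing.merchant, listing.card_type)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Made-up sample values, for illustration only.
listings = [
    CardListing("1-800-Flowers.com", "electronic", 50.0, 42.5),
    CardListing("1-800-Flowers.com", "electronic", 25.0, 21.25),
    CardListing("1-800-Flowers.com", "physical", 100.0, 85.0),
]

counts = stock_by_type(listings)
for listing in listings:
    print(listing.merchant, listing.card_type, listing.discount_pct)
```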

I had planned to scrape the website several times at a constant interval. However, not every scheduled scraping task completed, so I ended up with data from tasks started at 2 am, 5 am, 7 am, 8 am, 9 am, 2 pm, 3 pm, 5 pm, 8 pm, 9 pm and 11 pm (EST) on Saturday, 8/13/2016. Each task took about 95 to 110 minutes to complete.

After almost 24 hours of frantic scraping, I had the stock numbers of all gift cards listed on the website on 8/13/2016. For every card available at the moment of scraping, the merchant name, category, face value, sale price, discount percentage and card type were extracted, and the scraping time for that card (H:M:S) was also recorded. The full website was scraped 11 times in one day, which allows us to look at card availability as a time series.

Finally, the Shiny app took in all the data obtained and addressed the questions I raised at the beginning of this post.

As shown in the Likert chart below, gas cards seem really hard to buy. Only 1 out of 9 merchants (11.1%) in that category had cards available. In contrast, more than 80% of the merchants in Baby & Maternity, Pet Supplies and Children & Toy had cards in stock.

pic3
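The percentages in the chart are simply the share of merchants in each category with at least one card in stock. A minimal sketch of that calculation follows, assuming the scraped data has been reduced to (category, merchant, stock) rows with one row per merchant; the function name and the placeholder merchant names are hypothetical, and the sample rows only reproduce the 1-out-of-9 gas example.

```python
from collections import defaultdict


def availability_by_category(rows):
    """Return {category: percent of merchants with stock > 0}.

    rows: iterable of (category, merchant, stock) tuples,
    assumed to contain one row per merchant.
    """
    total = defaultdict(int)
    in_stock = defaultdict(int)
    for category, _merchant, stock in rows:
        total[category] += 1
        if stock > 0:
            in_stock[category] += 1
    return {c: round(100 * in_stock[c] / total[c], 1) for c in total}


# Nine placeholder gas merchants, only the first one with cards in stock.
gas_rows = [("Gas", f"gas_merchant_{i}", 2 if i == 1 else 0) for i in range(1, 10)]

shares = availability_by_category(gas_rows)
print(shares)
```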

To look further into how card availability changes during a day, a time series chart was plotted for every category. The chart below shows the time series of stock for physical Electronics and Gas cards. Electronics cards were relatively abundant, with about 1200 cards available throughout the day. Gas cards were short in stock, with only 2 cards available. Popular gas merchants such as Shell, BP and Chevron were all out of stock. As shown in the Likert chart above, Fast Food and Grocery cards were also in short supply. This may reflect customers being less willing to sell gift cards that they can use for basic needs.

pic4

Within each category, it is interesting to compare competing merchants by the availability of their gift cards. As the graph below shows, Macy’s seemed to have far fewer physical gift cards on the market than Nordstrom. That could indicate that Macy’s cards were either shorter in supply or greater in demand than Nordstrom cards. In fact, on cardpool.com Macy’s cards were 8% off while Nordstrom cards were only 5% off. This suggests that the discount percentage might be correlated with card availability: a higher percentage off the face value might make a merchant card more popular.

pic5

So is it true that a higher discount makes a merchant card more popular? To answer this question, we first have to define the popularity of a gift card. If the stock of a card is small and barely changes during a day, we can’t say it’s popular; the card is more likely low in both supply and demand. If the stock of a card is large and changes only within a small range during a day, we can’t say it’s popular either; this card is more likely high in supply but low in demand. Only when a card is low in supply and high in demand would we call it popular. For each merchant card, I calculated the average stock over the day and the range between the maximum and the minimum stock. Considering both the average and the range of the stock gives a better picture of a card’s popularity. For instance, as shown in the table below, Best Buy had an average stock of 298 cards during the day, which changed only slightly, with a range of 7 cards. On the same day, Bass Pro Shops had a smaller average of 248 and a larger range of 32. This suggests that Bass Pro Shops might be more popular than Best Buy. Interestingly, Bass Pro Shops cards were 15% off while Best Buy cards were only 3% off. However, a better discount does not necessarily mean higher popularity. Fleming’s Steakhouse had the largest discount in this table, 20% off, yet it had a very small range of 4 and an average stock of 202, indicating that it might not be popular among card buyers.

pic6
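The two statistics in the table can be sketched as follows, assuming the snapshots have been collected into a mapping from merchant to the list of stock counts seen at each scrape. The function name, the placeholder merchants and the counts are made up for illustration.

```python
from statistics import mean


def stock_summary(snapshots):
    """Return {merchant: {"average": ..., "range": ...}}.

    snapshots: dict mapping merchant name to a list of stock counts,
    one count per scrape of the site.
    """
    summary = {}
    for merchant, counts in snapshots.items():
        summary[merchant] = {
            "average": mean(counts),
            "range": max(counts) - min(counts),
        }
    return summary


# Invented counts for two placeholder merchants over three scrapes.
snapshots = {
    "Merchant A": [100, 101, 99],  # large, stable stock
    "Merchant B": [80, 60, 70],    # smaller stock that moves more
}

summary = stock_summary(snapshots)
print(summary)
```

Under the definition used in this post, Merchant B (lower average, wider range) would be the more popular of the two.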

The correlation between popularity and discount was further analyzed in the scatter plot below. Merchant cards whose stock was high in range and low in average (upper left region) did not necessarily have a bigger discount than those whose stock was low in range and high in average (lower right region). This suggests that, across all gift cards, popularity and discount percentage might not be correlated. However, within each category we might still observe some correlation, especially when comparing the gift cards of competing merchants.

pic7
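One way to put a number on the relationship in the scatter plot is a Pearson correlation coefficient between the discount and the stock range. This is a sketch of that idea and not the analysis behind the plot: the pearson helper is written out by hand for the example, and it is applied only to the three merchants quoted from the table above, far too few points to draw a conclusion from.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# (merchant, discount %, average stock, stock range), as quoted in the post.
table = [
    ("Best Buy", 3, 298, 7),
    ("Bass Pro Shops", 15, 248, 32),
    ("Fleming's Steakhouse", 20, 202, 4),
]

discounts = [row[1] for row in table]
ranges = [row[3] for row in table]

r_discount_range = pearson(discounts, ranges)
print(f"discount vs. stock range: r = {r_discount_range:.2f}")
```

Applying the same function to all merchants, or to the merchants of one category at a time, would test the within-category idea mentioned above.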

