Crawling in a Plastic Jungle

Posted on Aug 22, 2016

Would you turn down a 35% discount on a purchase you are going to make? If your answer is yes, would you at least be interested in using Python for web scraping or Shiny for interactive data visualization? If not, you can skip the rest of the blog. The site I scraped is one of the major gift card exchange websites, where you can buy and sell discounted gift cards from hundreds of merchants. Thousands of gift cards are exchanged through the website every day. Some gift cards are often available for sale, while others are usually out of stock.

With my recently acquired knowledge of web scraping with Python, I planned to scrape the site and develop a Shiny app to answer some questions I had been curious about:

What cards are easy to obtain?

What cards are short in supply?

What cards are more popular?

Is a gift card’s popularity related to its discount?


How was the data scraped?

Since the availability of gift cards on the site changes in real time as cards are bought and sold, the basic idea was to scrape the website every few hours on a given day so that changes in exchange activity within that day could be monitored. The site classifies merchant gift cards into 33 categories, such as Apparel & Accessories, Birthday, Electronics and Grocery. Comparing availability across categories tells you whether it is easy to buy the type of card you are interested in.

Below are three components of the main page for buyers. Part 1, in the red rectangle, shows the list of categories. Clicking on any category in Part 1 leads to a new page with Part 2, the merchant cards available in that category, and Part 3, the merchant cards out of stock. The scraping strategy is to obtain the URL list of the categories in Part 1, and then to extract the merchant name, percentage of discount and stock count for every merchant under each category in both Part 2 and Part 3.
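To make the category-level strategy concrete, here is a minimal parsing sketch using only the standard library's `html.parser`. It is not the original script: the markup it expects (`<a class="category">` links for Part 1, and `<li class="merchant">` rows holding `name`, `discount` and `stock` spans for Parts 2 and 3) is hypothetical, since the site's real HTML is not shown, and page downloading is left out.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class CategoryPageParser(HTMLParser):
    """Collect category links (Part 1) and merchant rows (Parts 2 and 3).

    All tag and class names here are hypothetical placeholders.
    """

    FIELDS = ("name", "discount", "stock")

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.category_urls = []  # absolute URLs of the category pages
        self.merchants = []      # one dict per merchant row
        self._field = None       # merchant field the current text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "category" in classes and attrs.get("href"):
            self.category_urls.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "li" and "merchant" in classes:
            self.merchants.append({"name": "", "discount": 0.0, "stock": 0})
        elif tag == "span" and self.merchants:
            # Remember which field (if any) the upcoming text fills in.
            self._field = next((f for f in self.FIELDS if f in classes), None)

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field is None:
            return
        row, text = self.merchants[-1], data.strip()
        if self._field == "name":
            row["name"] += text
        elif self._field == "discount":
            match = re.search(r"(\d+(?:\.\d+)?)\s*%", text)  # "8% off" -> 8.0
            if match:
                row["discount"] = float(match.group(1))
        else:
            match = re.search(r"\d+", text)  # "12 cards" -> 12; "Out of stock" stays 0
            if match:
                row["stock"] = int(match.group())


# Hypothetical fragment standing in for a downloaded category page.
sample_html = """
<a class="category" href="/buy/gas">Gas</a>
<ul>
  <li class="merchant"><span class="name">Merchant A</span>
      <span class="discount">2% off</span><span class="stock">Out of stock</span></li>
  <li class="merchant"><span class="name">Merchant B</span>
      <span class="discount">8% off</span><span class="stock">12 cards</span></li>
</ul>
"""

parser = CategoryPageParser("https://example.com/")
parser.feed(sample_html)
print(parser.category_urls)  # ['https://example.com/buy/gas']
print(parser.merchants)
```

In a real run, each URL in `category_urls` would be fetched in turn and fed to a fresh parser; the post does not say which HTTP library was used.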


The Python code for the first scraping strategy is below.


Next, it would be interesting to scrape more information, such as the card face value, sale price and percentage of discount, for all available merchant cards. On the site, gift cards are sold in three forms: electronic, physical and mobile. My observation is that a card sold electronically could also be sold physically by mail or by mobile, so the stock shown for each type may overlap with the others for the same merchant card. You can't simply add them together when comparing stock availability between merchant cards. The scraping flow starts by choosing a merchant on the main page and entering the merchant page to obtain the available card information for each card type. For instance, as shown below, we first chose “” from the merchant list in Part 1 and loaded the merchant page with Part 2 and Part 3; then we selected a card type in Part 2 and got the card information for that type in Part 3.
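A minimal sketch of the per-card record this second pass produces is below. It assumes the face value and price are scraped as dollar labels for each card type (the `rows_by_type` structure is hypothetical), computes the discount as the percentage off face value, and reports stock per type without summing, because of the overlap described above.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


def scrape_clock() -> str:
    """Current time as H:M:S, recorded with every card."""
    return datetime.now().strftime("%H:%M:%S")


@dataclass
class CardListing:
    merchant: str
    category: str
    card_type: str        # "electronic", "physical" or "mobile"
    face_value: float     # dollars
    price: float          # asking price in dollars
    scraped_at: str = field(default_factory=scrape_clock)

    @property
    def discount_pct(self) -> float:
        """Percentage off face value, e.g. a $100 card for $92 -> 8.0."""
        return round(100 * (1 - self.price / self.face_value), 1)


def parse_dollars(text: str) -> float:
    """Turn a label such as '$1,025.50' into 1025.5."""
    return float(re.sub(r"[^\d.]", "", text))


def listings_for_merchant(merchant, category, rows_by_type):
    """Build listings from scraped labels.

    rows_by_type (hypothetical): card type -> list of
    (face value label, price label) pairs read after selecting that type.
    """
    return [
        CardListing(merchant, category, card_type,
                    parse_dollars(face), parse_dollars(price))
        for card_type, rows in rows_by_type.items()
        for face, price in rows
    ]


def stock_per_type(listings):
    """Count cards per type. Types are kept separate, not summed,
    because the same card may be offered in more than one form."""
    return dict(Counter(card.card_type for card in listings))


cards = listings_for_merchant(
    "Merchant A", "Apparel & Accessories",
    {"physical": [("$100.00", "$92.00")],
     "electronic": [("$25.00", "$23.75"), ("$1,025.50", "$1,000.00")]},
)
print([card.discount_pct for card in cards])  # [8.0, 5.0, 2.5]
print(stock_per_type(cards))                  # {'physical': 1, 'electronic': 2}
```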


The code for the second scraping strategy is below.

I had planned to scrape the website several times at a constant interval. However, not every scheduled scraping task completed, and I ended up with data from tasks started at 2 am, 5 am, 7 am, 8 am, 9 am, 2 pm, 3 pm, 5 pm, 8 pm, 9 pm and 11 pm (EST) on Saturday, 8/13/2016. Each task took about 95 to 110 minutes to complete.
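The post does not say how the runs were scheduled. As a minimal sketch, assuming a single Python loop drives them, a failed run can be logged with its start time while the schedule carries on. The function name `run_every` and the `scrape_all` task in the usage comment are hypothetical; runs here are sequential, so one that overruns its interval simply delays the next.

```python
import time
from datetime import datetime


def run_every(task, interval_s, n_runs, sleep=time.sleep, clock=time.monotonic):
    """Start `task` n_runs times, roughly interval_s seconds apart.

    A run that raises is logged instead of stopping the whole schedule.
    If a run takes longer than interval_s, the next one starts right away.
    """
    log = []
    next_start = clock()
    for _ in range(n_runs):
        delay = next_start - clock()
        if delay > 0:
            sleep(delay)
        started = datetime.now().strftime("%H:%M:%S")
        try:
            task()
            log.append((started, "ok"))
        except Exception as exc:  # keep going after a failed scrape
            log.append((started, f"failed: {exc!r}"))
        next_start += interval_s
    return log


# Example (hypothetical task): run_every(scrape_all, interval_s=3 * 3600, n_runs=8)
```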

After almost 24 hours of intensive scraping, I had obtained the stock numbers of all gift cards offered on the website on 8/13/2016. For every card available at the moment of scraping, the merchant name, category, face value, sale price, percentage of discount and card type were extracted, and the scraping time for that card (H:M:S) was also recorded. The full website was scraped 11 times in one day, which allows us to look into the time series of card availability.

Finally, the Shiny app combined all the data obtained and addressed the questions I raised at the beginning of the blog.

As shown in the Likert chart below, gas cards seem to be really hard to buy. Only 1 out of 9 merchants (11.1%) in that category had cards available. In contrast, more than 80% of the merchants in Baby & Maternity, Pet Supplies and Children & Toy had cards in stock.
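Each percentage in the Likert chart is the share of a category's merchants with at least one card in stock. A sketch of that count, assuming one (category, merchant, stock) row per merchant from the category scrape; the merchant names below are placeholders that only reproduce the 1-in-9 ratio for Gas.

```python
from collections import defaultdict


def in_stock_share(rows):
    """Percentage of merchants per category with stock > 0.

    rows: iterable of (category, merchant, stock) tuples.
    """
    merchants = defaultdict(set)
    stocked = defaultdict(set)
    for category, merchant, stock in rows:
        merchants[category].add(merchant)
        if stock > 0:
            stocked[category].add(merchant)
    return {
        category: round(100 * len(stocked[category]) / len(names), 1)
        for category, names in merchants.items()
    }


# Placeholder rows: 9 gas merchants, only the first one has stock.
gas_rows = [("Gas", f"gas merchant {i}", 2 if i == 0 else 0) for i in range(9)]
print(in_stock_share(gas_rows))  # {'Gas': 11.1}
```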


To look further into how card availability changed during the day, time series charts were plotted for all categories. Below is the time series of stock for physical Electronics and Gas cards. Electronics cards were relatively abundant, with about 1200 cards available throughout the day. Gas cards were short in stock, with only 2 cards available. Popular gas merchants such as Shell, BP and Chevron were all out of stock. As the Likert chart above shows, Fast Food and Grocery cards were not plentiful either. This may reflect that customers are less willing to sell gift cards that cover their basic needs.
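Each time series line is a count of available cards of one category and card type at each scrape run. A sketch, assuming every scraped card is stored as a dict with `category`, `card_type` and `run` keys (hypothetical names), where `run` is the zero-padded start time of the task so that sorting the strings is chronological. Runs with no matching cards report 0 instead of vanishing from the chart.

```python
from collections import Counter


def stock_series(records, runs, category, card_type):
    """(run, available cards) pairs for one category and card type.

    records: one dict per card seen, with 'category', 'card_type', 'run'.
    runs: every run start time of the day, e.g. "02:00", "14:00".
    """
    counts = Counter(
        r["run"] for r in records
        if r["category"] == category and r["card_type"] == card_type
    )
    # Counter returns 0 for runs where nothing matched.
    return [(run, counts[run]) for run in sorted(runs)]


records = [
    {"category": "Gas", "card_type": "physical", "run": "02:00"},
    {"category": "Gas", "card_type": "physical", "run": "02:00"},
    {"category": "Gas", "card_type": "electronic", "run": "05:00"},
    {"category": "Electronics", "card_type": "physical", "run": "05:00"},
]
print(stock_series(records, ["05:00", "02:00"], "Gas", "physical"))
# [('02:00', 2), ('05:00', 0)]
```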


Within each category, it is interesting to compare competing merchants with regard to the availability of their gift cards. As the graph below shows, Macy’s seemed to have far fewer physical gift cards available in the market than Nordstrom. That could indicate Macy’s cards were either shorter in supply or greater in demand than Nordstrom cards. In fact, on the site Macy’s cards had 8% off while Nordstrom cards had only 5% off. This suggests the percentage of discount might be correlated with card availability: a higher percentage off the face value might make a merchant card more popular.


So is it true that a higher discount makes a merchant card more popular? To answer this question, we first have to define the popularity of a gift card. If the stock of a card is small and barely changes during the day, we can't say it is popular; the card is more likely low in both supply and demand. If the stock of a card is large and changes only within a small range during the day, we can't call it popular either; the card is more likely high in supply but low in demand. Only when a card is low in supply and high in demand would we call it popular. For each merchant card, I calculated the average stock over the day and the range between the maximum and minimum stock. Considering both the average and the range gives a better picture of a gift card's popularity.

For instance, as shown in the table below, Best Buy had an average stock of 298 cards during the day, which changed only slightly, with a range of 7 cards. On the same day, Bass Pro Shops had a smaller average of 248 and a larger range of 32, suggesting Bass Pro Shops might be more popular than Best Buy. Interestingly, Bass Pro Shops had 15% off while Best Buy had only 3% off. However, a better discount does not necessarily mean higher popularity. Fleming’s Steakhouse had the largest discount in this table, 20% off, yet it had a very small range of 4 and an average stock of 202, indicating it might not be popular among card buyers.
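The average and range can be computed from each merchant card's stock counts across the day's runs. A minimal sketch, assuming those counts are already collected in one list per merchant; the numbers below are made up, and ranking by largest range, then smallest average, is only one hypothetical way to encode the "low supply, high demand" idea, not the post's own rule.

```python
def stock_summary(series):
    """Map merchant -> (average stock, range = max - min) over one day's runs."""
    return {
        merchant: (round(sum(counts) / len(counts), 1), max(counts) - min(counts))
        for merchant, counts in series.items()
    }


def rank_by_activity(summary):
    """Largest range first; ties broken by the smaller average stock."""
    return sorted(summary, key=lambda m: (-summary[m][1], summary[m][0]))


# Made-up stock counts from three runs for two hypothetical merchants.
series = {"Merchant X": [300, 295, 302], "Merchant Y": [250, 232, 264]}
summary = stock_summary(series)
print(summary)                    # {'Merchant X': (299.0, 7), 'Merchant Y': (248.7, 32)}
print(rank_by_activity(summary))  # ['Merchant Y', 'Merchant X']
```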


The correlation between popularity and discount was analyzed further in the scatter plot below. Merchant cards with a high stock range and a low average stock (upper-left region) did not necessarily have a bigger discount than those with a low range and a high average (lower-right region). This suggests that, across all gift cards, popularity and the percentage of discount might not be correlated. However, within each category some correlation might be observable, especially when comparing competing merchants’ gift cards.
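The scatter plot judges the relationship visually. As a complementary check that the original analysis does not perform, a Pearson correlation coefficient between discount and stock range can be computed with the standard library. The sketch uses the three merchants discussed above (Best Buy, Bass Pro Shops and Fleming’s Steakhouse); three points are far too few to support any conclusion, so this only demonstrates the mechanics.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Discount (%) and stock range: Best Buy, Bass Pro Shops, Fleming's Steakhouse.
discounts = [3, 15, 20]
ranges = [7, 32, 4]
print(round(pearson(discounts, ranges), 2))  # 0.14
```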


About Author

Ricky Yue

As a data enthusiast, Ricky loves to think about real-life issues in a quantitative way. He likes to talk about probability and alternatives. He’s proud of his Bayesian skepticism based on years of scientific training. He was...