Web Scraping to Visualize Trends in Deals Using Data
Motivation:
In a world full of deals and coupons, have you ever wondered which deals are actually good deals?
Anyone familiar with consumer psychology can tell you that people love deals. Those huge red signs saying "30% off" or "Buy 1 Get 1 Free" are so attractive to consumers that many companies keep items on sale all year round.
This raises a question about the quality of these deals. Do they exist because the items are of poorer quality (e.g. a jacket with a scratch on the back)? Because the functionality is obsolete (e.g. floppy disks)? Because the inventory is low or the item is out of season? Whatever it may be, there is always a reason. The interesting question is: is this deal a good deal, and will it save me money?
About dealmoon.com:
Dealmoon.com is very similar to groupon.com: it gathers information on deals and coupons from merchants in the U.S. and groups them into categories (e.g. Clothing, Electronics, Baby). All of this information is available on their website for free.
Web Scraping:
I used the Selenium package in Python to scrape all data.
Some logistics about the data I scraped:
- Total of ~45,000 deals from 8 categories (i.e. Clothing, Beauty, Nutrition, Baby, Home, Electronics, Travel, Finance)
- Total of 6 attributes (i.e. category of deal, deal title, deal description, posted time, number of comments, number of bookmarks)
- The entire crawling process took ~6 hours.
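To make the scraping step concrete, here is a minimal sketch of how one category page might be turned into records with the six attributes listed above. It is not the code used for this project: the CSS selectors, the `.deal-card` container, and the function names are all hypothetical, because the page markup is not described here. It also assumes the comment and bookmark counters are shown as plain integers.

```python
import csv
from dataclasses import asdict, dataclass, fields

# Same string value as selenium's By.CSS_SELECTOR, so the sketch itself
# does not need to import selenium.
CSS = "css selector"

# Hypothetical selectors; the real ones depend on the site's markup.
SELECTORS = {
    "title": ".deal-title",
    "description": ".deal-description",
    "posted_time": ".deal-time",
    "comments": ".comment-count",
    "bookmarks": ".bookmark-count",
}


@dataclass
class Deal:
    """One scraped deal with the six attributes listed above."""
    category: str
    title: str
    description: str
    posted_time: str
    comments: int
    bookmarks: int


def to_count(text):
    """Turn a counter label such as '1,204' into an int; blank means 0."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else 0


def scrape_category(driver, url, category, card_selector=".deal-card"):
    """Load one category page and return a Deal per deal card found on it."""
    driver.get(url)
    deals = []
    for card in driver.find_elements(CSS, card_selector):
        text = {name: card.find_element(CSS, selector).text.strip()
                for name, selector in SELECTORS.items()}
        deals.append(Deal(
            category=category,
            title=text["title"],
            description=text["description"],
            posted_time=text["posted_time"],
            comments=to_count(text["comments"]),
            bookmarks=to_count(text["bookmarks"]),
        ))
    return deals


def save_deals(deals, path):
    """Write the scraped deals to a CSV file with one column per attribute."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(Deal)])
        writer.writeheader()
        writer.writerows(asdict(deal) for deal in deals)
```

With a real browser, `driver` would be a Selenium WebDriver such as `webdriver.Chrome()`, and `scrape_category` would be called once per category page before passing the combined list to `save_deals`.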
Visualisations:
What are the popular deals?
When I try to find good deals, I always check the popular deals: deals with a lot of bookmarks and comments. My rationale is that if a deal has high popularity, it must be good; the chance of a group of people bookmarking a bad deal is low. Under this assumption, I first explored the popular deals.
Because not everyone defines popularity the same way, the app lets users define "popularity" by whichever metric they like: the number of bookmarks, the number of comments, or both.
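A rough sketch of how such a user-selectable popularity metric could work is shown below. The function names and the sample records are made up for illustration, and treating "both" as the simple sum of bookmarks and comments is an assumption; the app may combine them differently.

```python
def popularity(deal, metric="both"):
    """Score one deal by the chosen metric.

    Assumption: "both" is the plain sum of bookmarks and comments.
    """
    if metric == "bookmarks":
        return deal["bookmarks"]
    if metric == "comments":
        return deal["comments"]
    if metric == "both":
        return deal["bookmarks"] + deal["comments"]
    raise ValueError(f"unknown metric: {metric!r}")


def top_deals(deals, metric="both", n=10):
    """Return the n most popular deals under the chosen metric."""
    return sorted(deals, key=lambda d: popularity(d, metric), reverse=True)[:n]


# Made-up records, only to show how the ranking changes with the metric.
sample = [
    {"title": "Deal A", "bookmarks": 120, "comments": 5},
    {"title": "Deal B", "bookmarks": 40, "comments": 60},
    {"title": "Deal C", "bookmarks": 90, "comments": 45},
]

for metric in ("bookmarks", "comments", "both"):
    best = top_deals(sample, metric=metric, n=1)[0]
    print(metric, "->", best["title"])
```

On this sample the top deal differs for each of the three metrics, which is why leaving the choice to the user matters.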
Which stores always have good deals?
By now, you know enough about the functionality of this app to explore this topic on your own, my dear reader. Find the link to the app at the end of this post, and find out which stores always have good deals. You may be surprised!
When are there the most deals?
Future Directions:
All the above-mentioned visualisations will help us understand which deals are good deals and which stores always have good deals. However, one drawback is that these are all post-hoc analyses: they only tell users which deals they should take advantage of AFTER other users have used the deal. By then, it may be too late: the deal is no longer valid or the item has sold out. Therefore, to take full advantage of past deals, one approach is to use Natural Language Processing to extract patterns from previous good deals and use them to classify new deals as good or bad in real time.
The patterns may be the type of deal (e.g. 'Buy 1 get 1 free', 'Free shipping for orders over $100'), the duration of the deal (e.g. 'Today only', 'Valid for this weekend'), or the discount percentage (e.g. '$50, originally $100', '$100, originally $250').
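As a first step in that direction, the patterns above could be pulled out of deal titles with a few regular expressions before any classifier is trained. The sketch below is a simple rule-based stand-in rather than full Natural Language Processing. The pattern list, the feature names, and the function names are assumptions chosen to cover only the example phrasings mentioned here.

```python
import re

# Hypothetical patterns covering only the example phrasings mentioned above.
BOGO = re.compile(r"buy\s+(\d+|one)\s+get\s+(\d+|one)\s+free", re.I)
FREE_SHIPPING = re.compile(r"free\s+shipping", re.I)
DURATION = re.compile(r"today only|this weekend", re.I)
PRICE_PAIR = re.compile(
    r"\$(\d+(?:\.\d+)?),?\s*originally\s*\$(\d+(?:\.\d+)?)", re.I)
PERCENT_OFF = re.compile(r"(\d+(?:\.\d+)?)\s*%\s*off", re.I)


def discount_percent(text):
    """Return the discount as a percentage, or None if none is stated."""
    match = PRICE_PAIR.search(text)
    if match:
        sale, original = float(match.group(1)), float(match.group(2))
        if original > 0:
            return round(100 * (1 - sale / original), 1)
    match = PERCENT_OFF.search(text)
    return float(match.group(1)) if match else None


def extract_features(text):
    """Map a deal title or description to a small feature dictionary."""
    if BOGO.search(text):
        deal_type = "bogo"
    elif FREE_SHIPPING.search(text):
        deal_type = "free_shipping"
    else:
        deal_type = None
    duration = DURATION.search(text)
    return {
        "deal_type": deal_type,
        "duration": duration.group(0).lower() if duration else None,
        "discount_percent": discount_percent(text),
    }


for title in ["Buy 1 get 1 free, Today only",
              "Free shipping for orders over $100",
              "$100, originally $250"]:
    print(title, "->", extract_features(title))
```

Features like these could then be paired with the bookmark and comment counts of past deals to train a classifier for new ones.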