Web scraping dealmoon.com to visualise trends in deals

Jielei (Emma) Zhu
Posted on Aug 22, 2016

Motivation:

In a world full of deals and coupons, have you ever wondered which deals are actually good deals?

Anyone familiar with consumer psychology can tell you that people love deals. Those huge, red signs saying "30% off" or "Buy 1 Get 1 Free" are very attractive to consumers, so much so that many companies keep items on sale all year round. This raises a question about the quality of these deals. Do they exist because the items are of poorer quality (e.g. a jacket with a scratch on the back)? Because the functionality is obsolete (e.g. floppy disks)? Because the inventory is low or the item is out of season? Whatever it may be, there is always a reason. The interesting question is: is this deal a good deal, and will it save me money?

 

About dealmoon.com:

Dealmoon.com is very similar to groupon.com: it gathers information about deals and coupons from merchants in the U.S. and groups them into categories (e.g. Clothing, Electronics, Baby). All of this information is available on the website for free.

Web Scraping:

I used the Selenium package in Python to scrape all data.

Some logistics about the data I scraped:

    • A total of ~45,000 deals from 8 categories (i.e. Clothing, Beauty, Nutrition, Baby, Home, Electronics, Travel, Finance)
    • A total of 6 attributes per deal (i.e. category of deal, deal title, deal description, posted time, number of comments, number of bookmarks)
    • The entire crawling process took ~6 hours.
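The six attributes listed above could be held in a small record type before being written out. The sketch below is an assumption about how one might structure the data, not the code used for this project: the `Deal` class, its field names, and the `deals_to_csv` helper are all hypothetical, and the sample row is made up. The Selenium calls themselves are left out because they depend on the site's page structure, which is not described here.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class Deal:
    """One scraped deal; the six fields mirror the attributes listed above."""
    category: str
    title: str
    description: str
    posted_time: str   # kept as raw text; the site's timestamp format is not assumed
    n_comments: int
    n_bookmarks: int


def deals_to_csv(deals):
    """Serialise a list of Deal records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Deal)])
    writer.writeheader()
    for deal in deals:
        writer.writerow(asdict(deal))
    return buffer.getvalue()


# Made-up example record, not real scraped data.
sample = [Deal("Clothing", "Example jacket sale", "30% off", "2016-08-01 09:00", 12, 40)]
print(deals_to_csv(sample))
```

Keeping every deal in the same flat shape makes it straightforward to load the results into whatever tool draws the charts later.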

 

Visualisations:

  • What are the popular deals?

When I look for good deals, I always check the popular ones: deals with a lot of bookmarks and comments. My rationale is that if a deal is highly popular, it is likely to be good; the chance of a large group of people bookmarking a bad deal is low. Under this assumption, I first explored the popular deals.

 

[Screenshot of the app]

 

Because not everyone defines popularity the same way, the app lets users define "popularity" by whichever metric they like: the number of bookmarks, the number of comments, or both.
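A minimal sketch of such a switchable popularity metric is shown here. It is an illustration only: the function names, the dictionary keys, and the sample deals are hypothetical, and treating "both" as the plain sum of bookmarks and comments is an assumption, since the post does not say how the app combines the two.

```python
def popularity(deal, metric="both"):
    """Score a deal by bookmarks, comments, or (assumed) their sum."""
    if metric == "bookmarks":
        return deal["bookmarks"]
    if metric == "comments":
        return deal["comments"]
    if metric == "both":
        # Assumption: "both" means the simple sum of the two counts.
        return deal["bookmarks"] + deal["comments"]
    raise ValueError(f"unknown metric: {metric!r}")


def top_deals(deals, metric="both", n=10):
    """Return the n highest-scoring deals under the chosen metric."""
    return sorted(deals, key=lambda d: popularity(d, metric), reverse=True)[:n]


# Made-up deals to show that the ranking depends on the metric.
deals = [
    {"title": "A", "bookmarks": 40, "comments": 5},
    {"title": "B", "bookmarks": 10, "comments": 30},
    {"title": "C", "bookmarks": 25, "comments": 25},
]

for metric in ("bookmarks", "comments", "both"):
    print(metric, [d["title"] for d in top_deals(deals, metric)])
```

With these three made-up deals, each metric puts a different deal first, which is why letting the user choose matters.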

  • What are the popular stores?

[Screenshot of the app]

 

  • Which stores always have good deals?

By now, you know enough about the functionality of this app to explore this topic on your own, my dear reader. Find the link to the app at the end of this post, and find out which stores always have good deals. You may be surprised!

 

  • When are there most deals?

[Screenshot of the app]

[Screenshot of the app]
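One simple way to approach this question is to count deals by the day of the week on which they were posted. The sketch below is an assumption, not the app's code: the post does not say which time granularity the charts use, the timestamp format `%Y-%m-%d %H:%M` is hypothetical, and the sample timestamps are made up.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def deals_per_weekday(posted_times, fmt="%Y-%m-%d %H:%M"):
    """Count deals per weekday; `fmt` is an assumed timestamp format."""
    counts = Counter()
    for text in posted_times:
        posted = datetime.strptime(text, fmt)
        counts[WEEKDAYS[posted.weekday()]] += 1   # weekday(): Monday == 0
    return counts


# Made-up posted times, not real scraped data.
posted_times = [
    "2016-08-01 09:00",
    "2016-08-01 18:30",
    "2016-08-05 12:00",
    "2016-08-06 08:15",
]
print(deals_per_weekday(posted_times))
```

The same grouping could be done by hour or by month by swapping the key that is counted.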

 

Future Directions:

The visualisations above help us understand which deals are good deals and which stores always have good deals. However, one drawback is that they are all post-hoc analyses: they only tell users which deals to take advantage of AFTER other users have used them. By then, it may be too late: the deal may no longer be valid, or the item may have sold out. Therefore, to take full advantage of past deals, one approach is to use Natural Language Processing to extract patterns from previous good deals and use them to classify new deals as good or bad in real time. The patterns may include the type of deal (e.g. 'Buy 1 get 1 free', 'Free shipping for orders over $100'), the duration of the deal (e.g. 'Today only', 'Valid for this weekend'), and the discount percentage (e.g. '$50, originally $100', '$100, originally $250').
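As a first step towards this idea, the three kinds of pattern could be pulled out of deal text with regular expressions. This is only a sketch under stated assumptions, not a full NLP pipeline and not something implemented in the app: the pattern names, the `extract_features` helper, and the exact phrasings matched are hypothetical and based solely on the example strings above. Prices with thousands separators (such as $1,000) are not handled.

```python
import re

# Matches phrasings like "$50, originally $100" (taken from the examples above).
PRICE_PAIR = re.compile(
    r"\$(\d+(?:\.\d+)?),\s*originally\s*\$(\d+(?:\.\d+)?)", re.IGNORECASE
)

# Hypothetical pattern tables; a real system would need many more phrasings.
DEAL_TYPES = {
    "bogo": re.compile(r"buy\s*1\s*get\s*1\s*free", re.IGNORECASE),
    "free_shipping": re.compile(r"free\s+shipping", re.IGNORECASE),
}
DURATIONS = {
    "today_only": re.compile(r"today\s+only", re.IGNORECASE),
    "weekend": re.compile(r"this\s+weekend", re.IGNORECASE),
}


def discount_percent(text):
    """Return the percentage off implied by 'sale, originally original', or None."""
    match = PRICE_PAIR.search(text)
    if match is None:
        return None
    sale, original = float(match.group(1)), float(match.group(2))
    if original <= 0:
        return None
    return round(100 * (original - sale) / original, 1)


def extract_features(text):
    """Collect deal type, duration, and discount features from one deal string."""
    return {
        "deal_types": [name for name, pat in DEAL_TYPES.items() if pat.search(text)],
        "durations": [name for name, pat in DURATIONS.items() if pat.search(text)],
        "discount_percent": discount_percent(text),
    }


print(extract_features("$50, originally $100"))
print(extract_features("Buy 1 get 1 free, today only"))
```

Features like these could then be fed to a classifier trained on past deals, with popularity serving as the label for "good".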

 

 

To check out the App, click here.

All code used to generate the App can be found on my Github.

About Author

Jielei (Emma) Zhu

Emma (Jielei) Zhu graduated from New York University in May 2016 with a B.A. in Computer Science and Psychology and a minor in Mathematics. In school, Emma was able to explore her interests to the fullest by taking...