Scraping Chewy.com

Posted on May 14, 2017

The pet industry has grown three-fold since 1996. Even through the Great Recession, it grew more than 10%. People will cut back on eating out and vacations, but they'll continue to provide for their pets. And more young men and women are "picking pets over people," according to an article in the Washington Post. With the lifetime cost of a child at $233,000, compared to $32,850 for a dog, millennials are delaying parenthood and spending the extra money on their pets.

More and more customers are shopping at online retailers rather than their brick-and-mortar counterparts. Chewy.com is an online pet food and supplies company headquartered in Dania Beach, Florida. Its website has a layout typical of many e-stores, with sections divided by pet and item category.

The Scraping

To scrape the website, I created a spider using Scrapy in Python. Most of the information was easy to pull with XPath expressions, as seen in the image below.

Code:
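
The following is a minimal sketch of this kind of XPath extraction, not the original spider. To stay self-contained it uses Python's built-in xml.etree.ElementTree instead of Scrapy, and the markup, class names, and field names are all hypothetical; the real Chewy.com HTML differs.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified product markup (an assumption for illustration).
SAMPLE_PAGE = """
<html><body>
  <div id="product">
    <h1 class="name">Example Dog Food, 30-lb bag</h1>
    <span class="brand">ExampleBrand</span>
    <span class="price">$42.99</span>
    <span class="list-price">$49.99</span>
  </div>
</body></html>
"""


def parse_product(page: str) -> dict:
    """Pull a few product fields out of the page with XPath expressions."""
    root = ET.fromstring(page)

    def text(xpath: str):
        # Return the stripped text of the first matching node, or None.
        node = root.find(xpath)
        return node.text.strip() if node is not None and node.text else None

    return {
        "name": text(".//h1[@class='name']"),
        "brand": text(".//span[@class='brand']"),
        "sale_price": float(text(".//span[@class='price']").lstrip("$")),
        "original_cost": float(text(".//span[@class='list-price']").lstrip("$")),
    }


item = parse_product(SAMPLE_PAGE)
print(item)
```

Inside a Scrapy spider's parse method, the same idea would typically be written against the response object, for example response.xpath("//span[@class='brand']/text()").get().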

However, the information became more difficult to scrape as I moved down the page. What looks like a table in the image below was actually an ordered list with no distinguishing attributes in the HTML code. What's more, the categories in the table, and their order, changed depending on the item.

My solution was to grab the list, zip it with itself to create a list of (label, value) tuples, convert that to a dictionary, and then load it into my data frame.
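
The zip step can be sketched as follows. The list contents are made up, and the sketch assumes the scraped list alternates label, value, label, value; the helper name pair_up is hypothetical.

```python
# Hypothetical flat list as it might come back from an XPath query:
# labels and values alternate, and the labels vary from product to product.
cells = ["Item Number", "12345", "Weight", "30 pounds", "Lifestage", "Adult"]


def pair_up(flat):
    """Turn [label, value, label, value, ...] into {label: value, ...}."""
    labels = flat[0::2]  # items at even positions
    values = flat[1::2]  # items at odd positions
    return dict(zip(labels, values))


attributes = pair_up(cells)
print(attributes)
```

Because each product becomes a dictionary keyed by label, it no longer matters which categories a page lists or in what order. A list of such dictionaries can be passed to pandas.DataFrame, which leaves missing labels as empty cells.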

The Data: 

My initial dataset contained 31 features and 14,000 observations. Numeric columns included:

  • sale price
  • original cost
  • number of reviews
  • average rating (out of 5 stars)
  • product customer recommendation percent

Categorical columns included:

  • product name
  • product description
  • brand
  • item category
  • product-specific features

The Analysis:

First, I explored what the data looked like when grouped by pet. Not only do dogs and cats have the largest number of foods and supplies to choose from, but their items also tend to be more expensive than those for other pets, as shown in the graph below.

Below is a table showing the average cost per item, sale percentage per item, and customer recommendation percent grouped by pet. Small pets include gerbils, hamsters, guinea pigs, rabbits, ferrets, and more.

Not only are dog and cat items more expensive with lower sale percentages, but their owners are also pickier about which items they recommend!
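
A per-pet summary like the one described above could be computed along these lines. The rows are made up for illustration and are not values from the scraped dataset, the field names are hypothetical, and the sketch assumes "sale percentage" means the discount relative to the original cost.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; not values from the scraped data.
rows = [
    {"pet": "dog", "sale_price": 40.0, "original_cost": 50.0, "recommend_pct": 90},
    {"pet": "dog", "sale_price": 18.0, "original_cost": 20.0, "recommend_pct": 80},
    {"pet": "fish", "sale_price": 6.0, "original_cost": 10.0, "recommend_pct": 95},
]


def summarize_by_pet(rows):
    """Average cost, sale percentage, and recommendation percent per pet."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["pet"]].append(row)

    summary = {}
    for pet, items in groups.items():
        summary[pet] = {
            "avg_cost": mean(r["sale_price"] for r in items),
            # Assumed definition: discount as a percent of the original cost.
            "avg_sale_pct": mean(
                100 * (r["original_cost"] - r["sale_price"]) / r["original_cost"]
                for r in items
            ),
            "avg_recommend_pct": mean(r["recommend_pct"] for r in items),
        }
    return summary


summary = summarize_by_pet(rows)
for pet, stats in summary.items():
    print(pet, stats)
```

With the data already in a pandas data frame, a groupby on the pet column followed by mean would do the same job in one line.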

I then looked at the products grouped by item category, such as food, grooming, and toys. The graph below shows the average cost by category. The associated table shows the average rating by category.

Dog food items make up over 40% of the total data, so I took a closer look at that category. As shown in the first graph, more highly recommended brands are more expensive.

The highest-rated brand, Royal Canin with 4.82 stars, sells food for $67 on average. I then dug deeper into the data to see why some brands may be higher-rated and more expensive. I noticed that a lot of pricey items listed a special diet. Selecting for this in the data, I retrieved the following information:

Special diet items not only have higher ratings on average, they also cost about $10 more per item! These findings are consistent with the pet food industry's trend toward more natural diet options.
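
The selection step might look like the sketch below. The rows are made up, and special_diet is a hypothetical field assumed to hold the scraped special-diet labels, left empty when a product page listed none.

```python
from statistics import mean

# Made-up rows for illustration only; not values from the scraped data.
dog_food = [
    {"price": 60.0, "rating": 4.8, "special_diet": ["Grain-Free"]},
    {"price": 50.0, "rating": 4.6, "special_diet": ["Natural", "Gluten Free"]},
    {"price": 45.0, "rating": 4.4, "special_diet": []},
    {"price": 35.0, "rating": 4.2, "special_diet": []},
]


def compare_special_diet(items):
    """Mean price and rating of special-diet items minus the rest."""
    special = [i for i in items if i["special_diet"]]
    regular = [i for i in items if not i["special_diet"]]
    return {
        "price_gap": mean(i["price"] for i in special)
        - mean(i["price"] for i in regular),
        "rating_gap": mean(i["rating"] for i in special)
        - mean(i["rating"] for i in regular),
    }


gaps = compare_special_diet(dog_food)
print(gaps)
```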

Conclusions:

1. Pet owners care about quality:

  • Prefer natural foods
  • Prefer recommended brands
  • Willing to pay significantly more

2. Chewy should:

  • Feature nutrition more prominently on their site
  • Look into other areas for high-quality products for millennial pet owners (travel, grooming)

Chewy already has all it needs. The website is user-friendly with a clean design, nutritional information, and even videos for the different products. However, this information is only visible once you scroll down the page. Rearranging the layout of the item page so that it appears higher up could improve sales.

Future Work:

Further analysis could bring insight into customer buying habits. I would be interested in looking into:

  • product sales by the numbers
  • trends over time of products, brands, and individual customers
  • textual analysis of reviews
  • other online pet stores

 

About Author

Sean Kickham

Sean migrated from the Midwest to New York City after graduating with a BS in Mathematics from the University of Notre Dame. He taught middle school math for five years in city schools. Equipped with a Masters in...

Leave a Comment

Sean Kickham, June 20, 2017
Thanks for the compliment! You ask a very good ethical question. Since data science is still a relatively new field, I am not sure if these topics are being discussed enough. What I have found is that, in general, websites wish for you to use the scraped data in an academic way that does not harm the privacy of their users. I used Python's Scrapy library in my scraping, which comes with pre-written code in the settings file that looks like this:

    # Obey robots.txt rules
    ROBOTSTXT_OBEY = True

This setting makes the spider check the robots.txt file of the website you are trying to scrape and figure out what is pre-approved for scraping. Search engines use robots.txt as well when crawling the internet to generate relevant results. Hope this helps! -Sean

saurabh pundir, June 17, 2017
Hi Sean, I just happened to land on your project page while googling web scraping. I want to tell you that you have done very good work. I am a student myself and liked how you approached it. I have a question about scraping. I always get scared that scraping might get me into some trouble with website owners. How do you get permission from the website owner? What steps should I follow before and during scraping data from a website? Thanks in advance.
