Data Scraping Chewy.com

Sean Kickham

Posted on May 14, 2017

The skills the author demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Introduction

Data shows that the pet industry has grown three-fold since 1996. Even through the Great Recession, it grew by more than 10%: people will cut back on eating out and vacations, but they continue to provide for their pets. According to an article in the Washington Post, more young men and women are also "picking pets over people." With the lifetime cost of raising a child at $233,000, compared to $32,850 for a dog, millennials are delaying parenthood and spending the extra money on their pets.

Customers are increasingly shopping at online retail stores rather than their brick-and-mortar counterparts. Chewy.com is an online pet food and supplies company headquartered in Dania Beach, Florida. Its website has a layout typical of many e-stores, with sections divided by pet and item category.

The Data Scraping

To scrape the website, I created a spider using Scrapy in Python. With XPath expressions, I could pull most of the information easily, as seen in the image below.

Code:
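The XPath lookups described above can be sketched as follows. This is an illustration, not the spider used for this project: the markup, element ids, class names, and values are hypothetical (not Chewy's real HTML), and it uses the limited XPath subset in Python's built-in `xml.etree.ElementTree` so it runs without Scrapy. In a Scrapy spider, the same lookup would typically be written as `response.xpath("//span[@class='sale-price']/text()").get()` inside the spider's `parse` method.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified product-page markup. The ids and class names are
# invented for illustration; real retail pages differ and are rarely valid XML.
SAMPLE_PAGE = """
<html><body>
  <div id="product-title"><h1>Example Dry Dog Food, 30-lb bag</h1></div>
  <span class="sale-price">$54.99</span>
  <span class="list-price">$61.99</span>
  <span class="rating">4.7</span>
  <span class="review-count">1,234 Reviews</span>
</body></html>
"""


def to_number(raw, cast=float):
    """Turn strings like '$54.99' or '1,234 Reviews' into numbers (None if missing)."""
    if not raw:
        return None
    parts = raw.replace("$", "").replace(",", "").split()
    return cast(parts[0]) if parts else None


def parse_product(html_text):
    """Extract one product's fields using ElementTree's XPath subset."""
    root = ET.fromstring(html_text.strip())

    def text_at(path):
        # findtext returns the text of the first matching element, or None.
        value = root.findtext(path)
        return value.strip() if value is not None else None

    return {
        "name": text_at(".//div[@id='product-title']/h1"),
        "sale_price": to_number(text_at(".//span[@class='sale-price']")),
        "original_price": to_number(text_at(".//span[@class='list-price']")),
        "rating": to_number(text_at(".//span[@class='rating']")),
        "review_count": to_number(text_at(".//span[@class='review-count']"), int),
    }


print(parse_product(SAMPLE_PAGE))
```

Real pages are usually HTML rather than well-formed XML, which is why Scrapy parses them with a lenient HTML parser; ElementTree appears here only to keep the sketch dependency-free.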

However, the information became more difficult to scrape further down the page. What looks like a table in the image below was actually an ordered list with no distinguishing attributes in the HTML. Moreover, the categories in the table, and the order they appeared in, changed from item to item.

My solution was to grab the list, zip it with itself to create a list of (category, value) tuples, convert that to a dictionary, and then load it into my data frame.
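A minimal sketch of that idea, assuming the scraped list alternates label, value, label, value. Slicing the list at even and odd positions and zipping the two slices pairs each label with the value that follows it, and `dict()` turns those pairs into one record. Because each product can have different labels, the helper `align_rows` (a hypothetical name introduced here) collects the union of keys so every record fits the same columns. The labels and values below are invented for illustration.

```python
def spec_list_to_dict(items):
    """Pair an alternating [label, value, label, value, ...] list into a dict."""
    # Drop whitespace-only strings that often surround list items in scraped HTML.
    cleaned = [s.strip() for s in items if s.strip()]
    labels = cleaned[0::2]  # even positions: category labels
    values = cleaned[1::2]  # odd positions: the matching values
    return dict(zip(labels, values))


def align_rows(records):
    """Give every record the same columns, filling absent labels with None."""
    columns = sorted({key for record in records for key in record})
    return columns, [[record.get(col) for col in columns] for record in records]


# Hypothetical scraped lists: the labels and their order differ by product.
page_a = ["Brand", "Acme", "Food Form", "Dry Food", "Lifestage", "Adult"]
page_b = ["Lifestage", "Puppy", "Brand", "Example Co", "\n  "]

records = [spec_list_to_dict(page_a), spec_list_to_dict(page_b)]
columns, rows = align_rows(records)
print(columns)
for row in rows:
    print(row)
```

Dropping blank strings assumes no value is legitimately empty; otherwise the label/value pairs would shift. With pandas, `pd.DataFrame(records)` performs the same alignment automatically, filling missing labels with NaN.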

The Data:

My initial dataset contained 31 features and 14,000 observations. Numeric columns included:

sale price
original cost
number of reviews
average rating (out of 5 stars)
percentage of customers who recommend the product

Categorical columns included:

product name
product description
brand
item category
product-specific features

The Data Analysis:

First, I explored the data grouped by pet. Dogs and cats not only have the largest selection of foods and supplies, but their items also tend to be more expensive than those for other pets, as shown in the graph below.

Below is a table showing the average cost per item, sale percentage per item, and customer recommendation percentage, grouped by pet. Small pets include gerbils, hamsters, guinea pigs, rabbits, ferrets, and more.
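The per-pet summary can be sketched with the standard library. This is a sketch under stated assumptions: the field names (`pet`, `sale_price`, `original_price`, `recommend_pct`) are hypothetical, "average cost" is taken to mean the average sale price, and the sale percentage is assumed to be the discount relative to the original price, (original - sale) / original * 100. The toy rows are invented and are not the scraped data.

```python
from collections import defaultdict
from statistics import mean


def sale_percent(row):
    """Discount as a percent of the original price (assumed definition)."""
    original, sale = row["original_price"], row["sale_price"]
    if not original:
        return 0.0
    return 100.0 * (original - sale) / original


def summarize_by(rows, key):
    """Average price, sale percent, and recommendation percent per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    summary = {}
    for group, members in groups.items():
        summary[group] = {
            "items": len(members),
            # "Average cost" is assumed to be the mean sale price.
            "avg_price": round(mean(r["sale_price"] for r in members), 2),
            "avg_sale_pct": round(mean(sale_percent(r) for r in members), 1),
            "avg_recommend_pct": round(mean(r["recommend_pct"] for r in members), 1),
        }
    return summary


# Hypothetical toy rows for illustration only.
toy_rows = [
    {"pet": "dog", "sale_price": 40.0, "original_price": 50.0, "recommend_pct": 90},
    {"pet": "dog", "sale_price": 60.0, "original_price": 60.0, "recommend_pct": 80},
    {"pet": "bird", "sale_price": 9.0, "original_price": 12.0, "recommend_pct": 95},
]
for pet, stats in summarize_by(toy_rows, "pet").items():
    print(pet, stats)
```

Passing a different key, such as `summarize_by(rows, "category")`, gives per-category averages. With pandas, `DataFrame.groupby(...)` followed by `.mean()` or `.agg(...)` does the same job in one expression.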

Not only are dog and cat items more expensive, with lower sale percentages, but their owners are also pickier about which items they recommend!

Average Cost by Category

I then looked at the products grouped by item category, such as food, grooming, and toys. The graph below shows the average cost by category. The associated table shows the average rating by category.

Dog food items make up over 40% of the total data, so I took a closer look at that category. As shown in the first graph, brands with higher recommendation rates tend to be more expensive.
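One way to check the brand-level pattern is to average price and recommendation percentage per brand within dog food, then compute a Pearson correlation across brands. The sketch below does this with the standard library; the field names (`category`, `brand`, `sale_price`, `recommend_pct`), the category string `"dog food"`, and the toy rows are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sx * sy)


def brand_price_vs_recommendation(rows, category="dog food"):
    """Per-brand (mean recommend %, mean price) within one category, plus their correlation."""
    by_brand = defaultdict(list)
    for row in rows:
        if row["category"] == category:
            by_brand[row["brand"]].append(row)
    brands = sorted(by_brand)
    rec = [mean(r["recommend_pct"] for r in by_brand[b]) for b in brands]
    price = [mean(r["sale_price"] for r in by_brand[b]) for b in brands]
    return dict(zip(brands, zip(rec, price))), pearson(rec, price)


# Hypothetical toy rows; the cat-toy row is excluded by the category filter.
toy_rows = [
    {"category": "dog food", "brand": "Brand A", "sale_price": 30.0, "recommend_pct": 80},
    {"category": "dog food", "brand": "Brand B", "sale_price": 50.0, "recommend_pct": 90},
    {"category": "dog food", "brand": "Brand C", "sale_price": 70.0, "recommend_pct": 100},
    {"category": "cat toys", "brand": "Brand A", "sale_price": 5.0, "recommend_pct": 60},
]
per_brand, r = brand_price_vs_recommendation(toy_rows)
print(per_brand, round(r, 3))
```

On Python 3.10 and later, `statistics.correlation(rec, price)` computes the same Pearson coefficient.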

The highest-rated brand, Royal Canin, with 4.82 stars, sells food for $67 per item on average. I then dug deeper into the data to see why some brands might be both higher-rated and more expensive. I noticed that many pricey items were labeled with a special diet. Filtering the data on this attribute, I retrieved the following information:

Special diet items not only have higher ratings on average; they also cost about $10 more per item! These findings are consistent with the pet food industry's trend toward more natural diet options.
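The special-diet comparison can be sketched as a filter on the product-specific features scraped from the ordered list. The sketch assumes each row stores those features as a dictionary under a hypothetical `features` key and that special-diet items carry a hypothetical `"Special Diet"` label; the toy rows are invented and are not the scraped results.

```python
from statistics import mean


def compare_special_diet(rows, label="Special Diet"):
    """Mean rating and price for items with vs. without a special-diet label."""
    with_diet = [r for r in rows if r["features"].get(label)]
    without = [r for r in rows if not r["features"].get(label)]

    def summarize(group):
        # statistics.mean raises StatisticsError if a group is empty.
        return {
            "n": len(group),
            "avg_rating": mean(r["rating"] for r in group),
            "avg_price": mean(r["sale_price"] for r in group),
        }

    special, regular = summarize(with_diet), summarize(without)
    price_gap = special["avg_price"] - regular["avg_price"]
    return special, regular, price_gap


# Hypothetical toy rows for illustration only.
toy_rows = [
    {"features": {"Special Diet": "Grain-Free"}, "rating": 4.8, "sale_price": 62.0},
    {"features": {"Special Diet": "Limited Ingredient Diet"}, "rating": 4.6, "sale_price": 48.0},
    {"features": {"Lifestage": "Adult"}, "rating": 4.4, "sale_price": 40.0},
    {"features": {}, "rating": 4.2, "sale_price": 44.0},
]
special, regular, gap = compare_special_diet(toy_rows)
print(special, regular, gap)
```

Treating a missing label as "no special diet" is a design choice: it works only if the site lists the label whenever it applies.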

Conclusions:

1. Pet owners care about quality:

Prefer natural foods
Prefer recommended brands
Willing to pay significantly more

2. Chewy should:

feature nutrition more prominently on their site
look into other areas for high-quality products aimed at millennial pet owners (e.g., travel, grooming)

Chewy actually already has everything it needs. The website is user-friendly, with a clean design, nutritional information, and even videos for the different products. However, this information only becomes visible as you scroll down the page. Rearranging the item page so that it appears higher up could improve sales.

Future Work:

Further analysis could provide more insight into customer buying habits. I would be interested in looking at:

product sales by the numbers
trends over time of products, brands, and individual customers
textual analysis of reviews
other online pet stores

About Author

Sean Kickham

Sean migrated from the Midwest to New York City after graduating with a BS in Mathematics from the University of Notre Dame. He taught middle school math for five years in city schools. Equipped with a Masters in...


Comments


Sean Kickham June 20, 2017

Thanks for the compliment! You ask a very good ethical question. Since data science is still a relatively new field, I am not sure these topics are being discussed enough. What I have found is that, in general, websites want you to use scraped data in an academic way that does not harm the privacy of their users. I used Python's Scrapy library for my scraping, and the settings file it generates includes this pre-written code:

# Obey robots.txt rules
ROBOTSTXT_OBEY = True

With this setting, Scrapy downloads the site's robots.txt file and skips the pages the site has asked crawlers not to visit. Search engines also follow robots.txt when crawling the internet to generate relevant results. Hope this helps! -Sean
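The check that `ROBOTSTXT_OBEY` enables can be approximated with Python's standard-library `urllib.robotparser`. The sketch below parses a hypothetical robots.txt (the rules, the example.com URLs, and the user-agent string `my-research-bot` are invented) and asks whether specific URLs may be fetched.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download
# https://<site>/robots.txt, e.g. with RobotFileParser.set_url() and .read().
ROBOTS_TXT = """
User-agent: *
Disallow: /checkout/
Disallow: /account/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ["https://www.example.com/dog/food", "https://www.example.com/checkout/cart"]:
    allowed = parser.can_fetch("my-research-bot", url)
    print(url, "allowed" if allowed else "disallowed")

# Seconds the site asks crawlers to wait between requests (None if unspecified).
print("crawl delay:", parser.crawl_delay("my-research-bot"))
```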

saurabh pundir June 17, 2017

Hi Sean, I just happened to come across your project page while googling web scraping. I want to tell you that you have done very good work. I am a student myself and liked how you approached it. I have a question about scraping. I always worry that scraping might get me into trouble with website owners. Could you suggest how to get permission from the website owner? What steps should I follow before I start scraping and while scraping data from the website? Thanks in advance.
