Data Scraping Chewy.com
The skills the author demonstrates here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Data shows the pet industry has grown three-fold since 1996, and even through the Great Recession it grew by more than 10%. People will cut back on eating out and vacations, but they'll continue to provide for their pets. According to a Washington Post article, more young men and women are also "picking pets over people." With the lifetime cost of raising a child at $233,000, compared to $32,850 for a dog, millennials are delaying parenthood and spending the extra money on their pets.
More and more customers are shopping at online retailers rather than their brick-and-mortar counterparts. Chewy.com is an online pet food and supplies company headquartered in Dania Beach, Florida. Its website has a layout typical of many e-stores, with sections divided by pet and item category.
The Data Scraping
To scrape the website, I built a spider with Scrapy in Python. Using XPaths, most of the information was easy to pull, as seen in the image below.
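As a rough illustration of XPath-based extraction, here is a minimal sketch. In a Scrapy spider each field would come from a call like `response.xpath("//span[@class='price']/text()").get()`; to keep the example dependency-free, it instead uses the limited XPath subset in Python's standard-library `xml.etree.ElementTree`. The markup, element names, and class names are hypothetical and are not Chewy.com's actual HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified product-page markup for illustration only.
# Real Chewy.com pages are HTML (not guaranteed to be well-formed XML),
# and their actual element names and classes are not shown in this post.
PAGE = """
<html>
  <body>
    <div id="product-title"><h1>Example Brand Dry Dog Food, 30-lb Bag</h1></div>
    <span class="price">$24.99</span>
    <span class="list-price">$29.99</span>
    <span class="review-count">1,204 Reviews</span>
  </body>
</html>
"""

# One XPath-style path per field. ElementTree supports only a subset of
# XPath: descendant search (.//), child steps (/), and [@attr='value'].
FIELD_PATHS = {
    "name": ".//div[@id='product-title']/h1",
    "sale_price": ".//span[@class='price']",
    "original_cost": ".//span[@class='list-price']",
    "num_reviews": ".//span[@class='review-count']",
}


def extract_fields(page_source):
    """Return each field's stripped text, or None when its path matches nothing."""
    root = ET.fromstring(page_source.strip())
    record = {}
    for field, path in FIELD_PATHS.items():
        text = root.findtext(path)
        record[field] = text.strip() if text is not None else None
    return record


print(extract_fields(PAGE))
```

Real-world HTML is often not valid XML, which is one reason Scrapy (whose selectors parse HTML leniently and support full XPath 1.0) is the better tool for the actual crawl.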
However, the information became harder to scrape further down the page. What looks like a table in the image below was actually an ordered list with no distinguishing attributes in the HTML. Moreover, which categories appeared in the table, and in what order, changed from item to item.
My solution was to grab the list, zip it with itself to create a list of (label, value) tuples, convert those tuples to a dictionary, and then load the dictionary into my data frame.
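The pairing trick can be sketched as follows, assuming the scraped list alternates label, value, label, value. Passing the same iterator to `zip()` twice makes it consume two elements per tuple. The `to_rows` helper is hypothetical, added here to show how items whose attribute lists differ can be lined up on a common set of columns.

```python
def pair_up(cells):
    """Turn a flat [label, value, label, value, ...] list into a dict.

    Passing the *same* iterator to zip() twice makes each tuple take two
    consecutive elements: (label, value). A trailing unpaired element is dropped.
    """
    it = iter(cells)
    return dict(zip(it, it))


def to_rows(records):
    """Align dicts with differing keys on the union of their keys.

    Columns keep first-seen order; attributes an item lacks become None.
    """
    columns = list(dict.fromkeys(key for rec in records for key in rec))
    rows = [[rec.get(col) for col in columns] for rec in records]
    return columns, rows


# Made-up examples: the attributes and their order differ between items.
item_a = ["Item Number", "101", "Brand", "Acme", "Food Form", "Dry"]
item_b = ["Item Number", "202", "Lifestage", "Puppy", "Brand", "Zeta"]

columns, rows = to_rows([pair_up(item_a), pair_up(item_b)])
print(columns)  # ['Item Number', 'Brand', 'Food Form', 'Lifestage']
print(rows)     # [['101', 'Acme', 'Dry', None], ['202', 'Zeta', None, 'Puppy']]
```

With pandas, `pd.DataFrame(records)` on a list of such dictionaries performs the same alignment, filling missing attributes with `NaN`. Because `zip()` silently drops a trailing unpaired element, an odd-length list is a sign the scrape went wrong for that item.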
My initial dataset contained 31 features and 14,000 observations. Numeric columns included:
- sale price
- original cost
- number of reviews
- average rating (out of 5 stars)
- percentage of customers who recommend the product
Categorical columns included:
- product name
- product description
- item category
- product-specific features
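Scraped values arrive as text, so the numeric columns listed above need converting before any analysis. A minimal sketch, assuming the raw strings look like `"$1,234.56"`, `"1,204 Reviews"`, or `"96%"` (the exact formats on the site are not shown in this post, so these are assumptions):

```python
import re

# First run of digits, allowing thousands separators and one decimal part.
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def to_number(text):
    """Return the first number found in a scraped string as a float, or None."""
    if text is None:
        return None
    match = _NUMBER.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


for raw in ["$1,234.56", "1,204 Reviews", "96% would recommend", "4.8 out of 5", "N/A"]:
    print(raw, "->", to_number(raw))
```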
The Data Analysis
First, I explored the data grouped by pet. Not only do dogs and cats have the largest selection of food and supplies, their items also tend to be more expensive than those for other pets, as shown in the graph below.
Below is a table showing the average cost per item, sale percentage per item, and customer recommendation percentage, grouped by pet. "Small pets" include gerbils, hamsters, guinea pigs, rabbits, ferrets, and more.
Dog and cat items are not only more expensive and less heavily discounted; their owners are also pickier about which items they recommend!
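A per-pet summary like the one in the table can be computed with a plain group-and-average. This is a minimal sketch: the field names are hypothetical, "cost" is assumed to mean the price a customer pays, and sale percentage is assumed to be the discount relative to the original cost, (original - sale) / original * 100.

```python
from collections import defaultdict
from statistics import mean


def sale_percent(sale_price, original_cost):
    """Discount as a percentage of the original cost (0 if the item is not on sale)."""
    return (original_cost - sale_price) / original_cost * 100


def summarize_by(items, key):
    """Item count plus average price, sale %, and recommend % for each value of `key`."""
    groups = defaultdict(list)
    for item in items:
        groups[item[key]].append(item)
    return {
        value: {
            "n_items": len(members),
            "avg_price": mean(m["sale_price"] for m in members),
            "avg_sale_pct": mean(
                sale_percent(m["sale_price"], m["original_cost"]) for m in members
            ),
            "avg_recommend_pct": mean(m["recommend_pct"] for m in members),
        }
        for value, members in groups.items()
    }


# Made-up records for illustration only.
items = [
    {"pet": "Dog", "sale_price": 40.0, "original_cost": 50.0, "recommend_pct": 90.0},
    {"pet": "Dog", "sale_price": 20.0, "original_cost": 20.0, "recommend_pct": 80.0},
    {"pet": "Bird", "sale_price": 10.0, "original_cost": 12.5, "recommend_pct": 95.0},
]
for pet, stats in summarize_by(items, "pet").items():
    print(pet, stats)
```

Calling `summarize_by(items, "category")` gives per-category averages the same way. In pandas, the equivalent is roughly `df.groupby("pet")[["sale_price", "sale_pct", "recommend_pct"]].mean()` once a sale-percentage column has been added.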
Average Cost by Category
I then looked at the products grouped by item category, such as food, grooming, and toys. The graph below shows the average cost by category. The associated table shows the average rating by category.
Dog food items make up over 40% of the data, so I took a closer look at that category. As shown in the first graph, more highly recommended brands tend to be more expensive.
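One way to check the dog food share and put a single number on the brand-price relationship is sketched below. The post shows the trend in a graph; this sketch instead computes a Pearson correlation between each brand's average recommendation percentage and its average price, which is a swapped-in summary rather than the author's method. Field names and sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def is_dog_food(item):
    # Hypothetical field names and values for pet and item category.
    return item["pet"] == "Dog" and item["category"] == "Food"


def share(items, predicate):
    """Fraction of items for which predicate(item) is true."""
    return sum(1 for item in items if predicate(item)) / len(items)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (var_x * var_y) ** 0.5


def brand_price_vs_recommendation(items):
    """Correlate each brand's mean recommend % with its mean sale price."""
    by_brand = defaultdict(list)
    for item in items:
        by_brand[item["brand"]].append(item)
    avg_recommend = [mean(i["recommend_pct"] for i in g) for g in by_brand.values()]
    avg_price = [mean(i["sale_price"] for i in g) for g in by_brand.values()]
    return pearson(avg_recommend, avg_price)


# Made-up records for illustration only.
items = [
    {"pet": "Dog", "category": "Food", "brand": "A", "recommend_pct": 80.0, "sale_price": 40.0},
    {"pet": "Dog", "category": "Food", "brand": "B", "recommend_pct": 90.0, "sale_price": 50.0},
    {"pet": "Dog", "category": "Food", "brand": "C", "recommend_pct": 70.0, "sale_price": 30.0},
    {"pet": "Cat", "category": "Food", "brand": "D", "recommend_pct": 99.0, "sale_price": 5.0},
    {"pet": "Dog", "category": "Toys", "brand": "E", "recommend_pct": 10.0, "sale_price": 99.0},
]
print(share(items, is_dog_food))  # 0.6
dog_food = [i for i in items if is_dog_food(i)]
print(brand_price_vs_recommendation(dog_food))  # 1.0
```

A correlation near 1 would mean brands with higher recommendation rates are consistently pricier; it says nothing about why, which is what the special-diet analysis below starts to probe.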
The highest-rated brand, Royal Canin, averages 4.82 stars and sells food for $67 per item on average. I then dug deeper into the data to see why some brands might be higher-rated and more expensive, and noticed that many pricey items were labeled as a special diet. Filtering the data for these items, I retrieved the following information:
Special diet items not only have higher ratings on average, they also cost about $10 more per item! These findings match the pet food industry's trend toward more natural diet options.
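The special-diet comparison can be sketched as a simple split on the product-features text. This assumes, hypothetically, that special-diet items mention the phrase "special diet" in a `features` field; the records below are made up for illustration.

```python
from statistics import mean


def compare_special_diet(items, marker="special diet"):
    """Compare average rating and price for items whose features mention `marker`."""
    marker = marker.lower()
    special = [i for i in items if marker in i["features"].lower()]
    other = [i for i in items if marker not in i["features"].lower()]
    # statistics.mean raises StatisticsError if either group is empty.
    return {
        "special_avg_rating": mean(i["rating"] for i in special),
        "other_avg_rating": mean(i["rating"] for i in other),
        "price_gap": mean(i["sale_price"] for i in special)
        - mean(i["sale_price"] for i in other),
    }


# Made-up records for illustration only.
items = [
    {"features": "Special Diet: Grain-Free", "rating": 4.8, "sale_price": 50.0},
    {"features": "Special Diet: Limited Ingredient", "rating": 4.6, "sale_price": 54.0},
    {"features": "Flavor: Chicken", "rating": 4.4, "sale_price": 42.0},
]
print(compare_special_diet(items))
```

Matching on free text is fragile; if the scraped attribute table has a dedicated "Special Diet" label, keying on that column would be more reliable.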
Conclusions
1. Pet owners care about quality:
- Prefer natural foods
- Prefer recommended brands
- Willing to pay significantly more
2. Chewy should:
- Feature nutritional information more prominently on the site
- Explore other categories of high-quality products for millennial pet owners (e.g., travel, grooming)
Chewy already has everything it needs. The website is user-friendly, with a clean design, nutritional information, and even videos for the different products. However, this information only appears after you scroll down the page. Rearranging the item page to surface it sooner could improve sales.
Further analysis could bring more insight into customer buying habits. I would be interested in looking at:
- actual sales figures for each product
- trends over time of products, brands, and individual customers
- textual analysis of reviews
- other online pet stores