Web Scraping Product Details from Sunglass Hut and Woot!
Sunglasses product details were scraped from the Sunglass Hut and Woot! websites to perform an exploratory data analysis (EDA) and to compare the deals on Woot! with the retail prices on Sunglass Hut. The above word cloud was produced using the descriptions of the sunglasses on Sunglass Hut. The code used to scrape and analyze this data may be found on GitHub.
Web scraping
The Sunglass Hut website uses Ajax to load more sunglasses on each page when you click a button at the bottom of the screen, so Selenium was needed to interact with this dynamic site. The main brand page was visited, and the "load more" button was clicked programmatically until all sunglasses were visible on the page. The URL of each pair was then scraped and saved in a CSV file. These URLs served as the start URLs for a Scrapy spider that visited each one and collected the URLs of all the color variants of the same pair. This was necessary because a pair that comes in multiple colors can have different product details for each color. For example, some colors may have polarized lenses while others do not. Finally, each individual variant's URL was visited and the product details were scraped.
For each pair, the following fields were scraped: brand, description, name, price, whether it is on sale and by how much, whether the lenses are polarized, frame color, frame material, lens color, lens material, lens technology, shape, URL of the product page, and face shape for best look.
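The "click until everything is loaded, then collect the links" step can be sketched roughly as follows. This is a minimal illustration, not the code from the project: the helper names, the CSS selectors passed in, and the fixed pause are all assumptions. The helpers only rely on the `find_elements`, `is_displayed`, `click`, and `get_attribute` methods that a Selenium WebDriver and its elements provide.

```python
import time

# Selenium's By.CSS_SELECTOR is the plain string "css selector"; it is spelled
# out here so the helpers do not need Selenium imported to be read or tested.
CSS = "css selector"


def click_until_gone(driver, button_css, pause=1.0, max_clicks=500):
    """Click the "load more" button until it is no longer shown.

    Returns the number of clicks performed. `max_clicks` is a safety cap
    (an assumption) so a misbehaving page cannot loop forever.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = [b for b in driver.find_elements(CSS, button_css) if b.is_displayed()]
        if not buttons:
            break  # nothing left to load
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # crude wait for the Ajax response to render
    return clicks


def collect_links(driver, link_css):
    """Return the unique href of every matching anchor, in page order."""
    hrefs = [a.get_attribute("href") for a in driver.find_elements(CSS, link_css)]
    return list(dict.fromkeys(h for h in hrefs if h))
```

With a real browser session, one would create a driver, call `driver.get(...)` on the brand page, run `click_until_gone` with whatever selector identifies the button on that page, and write the result of `collect_links` to a CSV file. An explicit wait on the number of product tiles would be more robust than the fixed pause used here.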
What are the most expensive brands?
We break down the median price per pair for each brand.
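As a rough sketch of how such a breakdown can be computed, the snippet below groups scraped rows by brand and takes the median price of each group. The field names `brand` and `price` and the sample rows are assumptions made for illustration; the sample values are made up and are not scraped prices.

```python
from collections import defaultdict
from statistics import median


def median_price_by_brand(rows):
    """Return (brand, median price) pairs, most expensive brand first."""
    prices = defaultdict(list)
    for row in rows:
        prices[row["brand"]].append(float(row["price"]))
    medians = {brand: median(values) for brand, values in prices.items()}
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)


# Made-up rows for illustration only; these are not real prices.
sample_rows = [
    {"brand": "Brand A", "price": "100"},
    {"brand": "Brand A", "price": "300"},
    {"brand": "Brand A", "price": "200"},
    {"brand": "Brand B", "price": "400"},
    {"brand": "Brand B", "price": "500"},
]

for brand, price in median_price_by_brand(sample_rows):
    print(f"{brand}: {price:.2f}")
```

The median is used rather than the mean because a handful of extremely expensive pairs would otherwise dominate a brand's figure.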
How do the price distributions of the most expensive two brands compare?
We see that, although Fendi has a higher median price, Bulgari has a few pairs that are extremely expensive. In fact, the most expensive pair on the entire Sunglass Hut website is from Bulgari.
It is also worth noting that Bulgari has many more models available than Fendi, as the next graph demonstrates.
Let's choose a few brands
For the purpose of an exploratory data analysis, let's pick a few brands to analyze. The following graph shows the number of pairs available on Sunglass Hut from each brand. We see that Ray-Ban has by far the most pairs, followed by Oakley, Vogue and Prada. We will also include Gucci because it is a popular brand, and we will count the Prada Linea Rossa sunglasses together with the Prada sunglasses.
We see that, among these brands, Gucci seems to be the most expensive overall, followed by Prada. Ray-Ban and Oakley seem similarly priced, while Vogue is the cheapest among these brands.
Lens polarization
How does lens polarization affect the price of the sunglasses? One would assume this feature would result in an increased price. Do the numbers bear this out? From the below graph, we see that for most brands, the polarized sunglasses tend to be more expensive than the non-polarized sunglasses. The notable exception seems to be Gucci, where the median price of polarized sunglasses is lower than that of non-polarized sunglasses. This is because many of the most high-end sunglasses are not polarized. For Gucci, the price distribution of polarized sunglasses is much more strongly peaked near its median, whereas the non-polarized pairs span a wider range: Gucci has some relatively cheap non-polarized sunglasses and also some very expensive ones.
Below is the distribution of prices across all brands for lenses that are polarized (red, right) and lenses that are not polarized (blue, left). The difference between these distributions was found to be highly significant (p-value less than 1e-14) according to the two-sample Kolmogorov–Smirnov test.
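For readers unfamiliar with the test, the Kolmogorov–Smirnov statistic is the largest vertical gap between the empirical cumulative distribution functions of the two samples. The sketch below computes only that statistic in plain Python. It does not compute the p-value quoted above, and it is not the code used for the analysis. In practice a library routine such as `scipy.stats.ks_2samp` returns both the statistic and a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the maximum gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every point of the pooled sample.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of sample_a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample_b that is <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

A statistic near 0 means the two price distributions overlap almost entirely, while a value near 1 means they barely overlap. For example, calling `ks_statistic` on the prices of polarized pairs and the prices of non-polarized pairs gives the quantity that the test then converts into a p-value.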
Further EDA
Further EDA can be done on this data set. For example, the following graphs give price by frame color as well as price by face shape for best look.
Woot!
A Scrapy spider was also written to scrape the product details of the sunglasses on Woot!. This data set was then joined with the Sunglass Hut data set wherever possible matches could be found. Many pairs have a label made up of letters and digits in their name, which can be matched across both sites. Further exploration can then determine whether the deals on Woot! are as good as they seem.
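One way to do this kind of matching is sketched below, under stated assumptions: the regular expression for the label (one to three letters, three or four digits, an optional trailing letter), the `name` and `price` field names, and the sample rows are all hypothetical and are not taken from the project. The sample prices are made up.

```python
import re

# Hypothetical label pattern: 1-3 letters, optional space, 3-4 digits, optional letter.
MODEL_CODE = re.compile(r"\b[A-Z]{1,3}\s?\d{3,4}[A-Z]?\b")


def model_code(name):
    """Return the normalized label found in a product name, or None."""
    match = MODEL_CODE.search(name.upper())
    return "".join(match.group(0).split()) if match else None


def match_deals(woot_rows, retail_rows):
    """Pair each Woot! row with every retail row that shares its label."""
    retail_by_code = {}
    for row in retail_rows:
        code = model_code(row["name"])
        if code:
            retail_by_code.setdefault(code, []).append(row)
    matches = []
    for deal in woot_rows:
        code = model_code(deal["name"])
        for row in retail_by_code.get(code, []):
            saving = 1 - deal["price"] / row["price"]  # fraction off the retail price
            matches.append((code, deal["price"], row["price"], round(saving, 2)))
    return matches


# Made-up rows for illustration only.
woot = [{"name": "Brand X AB 1234 Classic", "price": 75.0},
        {"name": "Mystery shades", "price": 10.0}]
retail = [{"name": "AB1234 Classic Black", "price": 150.0}]
print(match_deals(woot, retail))
```

A match on the label alone is only a candidate: the same label can cover several colors with different lenses and prices, so each candidate still needs to be checked by hand.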
The first pair is an easy match and a good deal.
Items 0 through 6 in the above table all correspond to Ray-Ban Wayfarer sunglasses in different colors. The exact color from Woot! could not be found on Sunglass Hut. See the images below:
Best use of this data
This data is rich enough to explore several features, whether to find sunglasses that you like or to match pairs with Woot! to find deals. This kind of brand analysis may be useful if you are opening a shop or thinking of manufacturing sunglasses. There is also a market for reselling sunglasses on sites like Poshmark and TheRealReal. This data could be used to find deals on Woot! that may be resold on Poshmark or TheRealReal, although a further analysis of sales on those websites would be necessary.