Data Analysis on Athletic Shoes and Sneakers
The skills the author demonstrated here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Overview
Using data from zappos.com, I was able to paint a picture of what consumers value in top-selling men's athletic shoes and sneakers by brand. My motivation stems from wanting to know what consumers of men's athletic shoes and sneakers have to say about them. I wanted to analyze which qualities consumers tend to base their purchasing decisions on. To achieve this, I posed some questions of interest. Which brands reign supreme in different ratings, such as comfort and style? What do consumers value in shoes? Is it the price? What about arch support? Could it be the true size feeling, true width feeling, or style?
True size feeling tells us if the consumers thought a shoe was too small, too large, or just right. Similarly, true width feeling tells us if the consumers thought a shoe was too narrow, too wide, or just right. I also wanted to see what consumers thought about different products, so I asked, "What are consumers saying about the shoes?" Finally, are there trends based on different shoe qualities? Without further ado, let's examine how the data for this project was collected, cleaned, and analyzed. We will then see what the findings uncovered.
Process
Using the Scrapy web-crawling framework in Python, I collected data on the top 200 selling men's athletic shoes and sneakers from zappos.com. After collecting basic information on each shoe, such as price, brand name, and product name, I scraped ratings and comments from individual reviews. This data was cleaned and then organized in descending order of best true size feeling, true width feeling, and arch support. The structured data was then used to generate statistics for the shoe features, plots showing the variation in shoe ratings and the correlations between them, and WordClouds showing what consumers are saying about different shoe brands.
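The cleaning and ordering step described above can be sketched in a few lines. This is only an illustration using the Python standard library, not the author's actual pipeline: the field names (`true_size`, `true_width`, `arch_support`), the sample records, and the rule of dropping records with missing ratings are all assumptions.

```python
# Hypothetical review summaries; field names and values are illustrative only.
shoes = [
    {"product": "Shoe A", "true_size": 88.0, "true_width": 91.0, "arch_support": 80.0},
    {"product": "Shoe B", "true_size": 88.0, "true_width": 93.0, "arch_support": 75.0},
    {"product": "Shoe C", "true_size": None, "true_width": 90.0, "arch_support": 85.0},
    {"product": "Shoe D", "true_size": 92.0, "true_width": 85.0, "arch_support": 70.0},
]

RATING_KEYS = ("true_size", "true_width", "arch_support")


def clean(records):
    """Drop records that are missing any of the rating fields (an assumed rule)."""
    return [r for r in records if all(r.get(k) is not None for k in RATING_KEYS)]


def rank(records):
    """Sort best-first by true size, then true width, then arch support."""
    return sorted(records, key=lambda r: tuple(r[k] for k in RATING_KEYS), reverse=True)


ranked = rank(clean(shoes))
for r in ranked:
    print(r["product"], r["true_size"], r["true_width"], r["arch_support"])
```

Sorting on a tuple of the three ratings means ties in true size are broken by true width, and ties in both are broken by arch support.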
Data Findings
After completing analysis using the shoes data, I sought out the top 5 selling brands and products. They are listed below:
![webscrape 1 | Data Science Blog](https://nycdsa-blog-files.s3.us-east-2.amazonaws.com/2019/09/e18fe309ab333c54b1ac05061717ffdd/webscrape-1.png)
Below are some statistics for different shoe attributes of the 19 unique brands that comprise the top 200 selling products:
![webscrape 2 | Data Science Blog Data Analysis on Athletic Shoes and Sneakers](https://nycdsa-blog-files.s3.us-east-2.amazonaws.com/2019/09/cfcfdb0a92767c3419ab85d7109834e5/webscrape-2.png)
From the table above, we can see that most of the top-selling shoes are priced at $65 or below. On average, customers rate the shoes as having good true size feelings and even better true width feelings. Notice how the mean arch support rating drops compared to the previous two ratings. This tells us consumers are not that satisfied with the arch support most shoe products offer. We also see that most consumers are, in general, generous when it comes to rating shoe style. They seem most critical of comfort and less critical in their overall and style ratings.
Specifics
Let's now look at how the 19 unique brands performed individually in true size and true width, in addition to their mean product price. I produced horizontal bar plots detailing the performance of each brand in the three categories. Each brand's respective mean values are listed next to the plots.
![webscrape_new3 | Data Science Blog Data Analysis on Athletic Shoes and Sneakers](https://nycdsa-blog-files.s3.us-east-2.amazonaws.com/2019/09/072c58c5ed1491d49d5c01f387d2a753/webscrape_new3.png)
![webscrape_new2 | Data Science Blog Data Analysis on Athletic Shoes and Sneakers](https://nycdsa-blog-files.s3.us-east-2.amazonaws.com/2019/09/bbf2e62916d85344c5d18ed6de89e613/webscrape_new2.png)
![webscrape_new1 | Data Science Blog](https://nycdsa-blog-files.s3.us-east-2.amazonaws.com/2019/09/193d64ec535f27843dccb382e0f3e4dd/webscrape_new1-1.png)
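The per-brand means behind plots like these can be computed by grouping products by brand and averaging each column. The sketch below uses only the standard library; the brand names, the numbers, and the `brand_means` helper are hypothetical and are not the scraped data or the author's code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (brand, price, true-size %, true-width %) rows, one per product.
products = [
    ("Brand A", 50.00, 80.0, 90.0),
    ("Brand A", 60.00, 84.0, 92.0),
    ("Brand B", 100.00, 88.0, 86.0),
]


def brand_means(rows):
    """Return {brand: (mean price, mean true size, mean true width)}."""
    grouped = defaultdict(list)
    for brand, *values in rows:
        grouped[brand].append(values)
    return {
        brand: tuple(round(mean(col), 2) for col in zip(*vals))
        for brand, vals in grouped.items()
    }


means = brand_means(products)
# Sort brands by mean price, cheapest first, as one might for a bar plot.
for brand, (price, size, width) in sorted(means.items(), key=lambda kv: kv[1][0]):
    print(f"{brand}: ${price:.2f}, size {size}, width {width}")
```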
Looking at the "Mean Price by Brand" plot, we can see right away that Vans and Converse sit on the lower end of the price spectrum at $53.03 and $52.99, respectively. Their low product prices and timeless styles make them quite popular; it's no secret why they collectively hold 4 of the top 5 spots for best-selling products. Joining them are Saucony Originals and SKECHERS. SKECHERS has always been known for making highly affordable shoes that serve general-purpose athletics. Saucony, a pioneer of running shoes, seems to have begun making cheaper shoes to increase sales.
In the "Mean True Size Feeling by Brand" plot, we see high-quality names like ASICS, Brooks, Nike, and New Balance leading the way at 86.5, 88.23, 83.97, and 83.71, respectively. Joining them is the classic Vans with a value of 82.93. Poorer performers include Superga at 56.0 and Converse at 63.13. I myself used to own a pair of Converse Chuck Taylor All Star Core Ox and found that it did not fit true to the size the company claimed; it ran rather large.
In the "Mean True Width Feeling by Brand" plot, we observe that the 19 brands perform well as a group. Leaders in this category include Saucony Originals at 91.99, Vans at 91.89, Nike at 91.73, Converse at 91.54, and SKECHERS at 91.29. This was expected; makers of athletic shoes strive for true-to-size and true-to-width fits. Judging from the plots, Vans seems adept at making properly fitting, everyday, stylish shoes at affordable prices; it has found its specialization in the shoe market. Upon review, I recommend that Superga, having performed poorly in true width feeling with a score of 71.0, overhaul its shoe design process. Consumers of its products are critical of their actual sizes and widths.
Overall Rating vs Price
Below are a couple of jointplots and a boxplot that I made with the help of the Python data visualization libraries Seaborn and Matplotlib. A jointplot shows, in addition to a scatterplot, a histogram for each variable of the scatterplot. In the examples below, the variables of interest are the mean overall rating vs price and mean style rating vs price. The tables next to the plots detail the statistics of different ratings for the top 5 selling brands.
![webscrape 6 | Data Science Blog](https://nycdsa-blog-files.s3.us-east-2.amazonaws.com/2019/09/da337a74c6fac244b1b955cc027d3301/webscrape-6.png)
![webscrape 7 | Data Science Blog](https://nycdsa-blog-files.s3.us-east-2.amazonaws.com/2019/09/c559eb644ee22e94b954f2eac23703ea/webscrape-7.png)
![webscrape 8 | Data Science Blog](https://nycdsa-blog-files.s3.us-east-2.amazonaws.com/2019/09/8cd0afeb00b6f3d4e100ca8df489c9a5/webscrape-8.png)
Data Analysis
The jointplot of overall rating vs price tells us that most reviews gave an overall rating of 4-5 stars for shoes in the $45-60 price range. Among the top 5 brands, Converse attains the highest mean overall rating at 4.73, while Nike earns the lowest at 4.41. The standard deviations of their respective ratings are 0.68 and 1.12. This tells us that, of the top 5, consumers of Nike products were the most critical and the most likely to give lower overall ratings. Consumers of Converse products were the least critical and more likely to give high overall ratings.
We see in the jointplot of style rating vs price that consumers are, for the most part, generous in their style ratings; the histogram of style ratings tells us most consumers gave 5 stars for style.
Most reviews gave a style rating of 4-5 stars for shoes in the $45-60 price range. Converse leads the top 5 brands in style with a mean rating of 4.90, while Nike is last with a rating of 4.67. Consumers are very pleased with the style of Converse products; a standard deviation of 0.36 tells us that consumers consistently give high ratings and are not that critical of Converse style. Nike has the highest standard deviation of style rating at 0.717. This tells us consumers of Nike products are more divided on style and more likely to give lower ratings.
Comfort Rating
The boxplot shows the variation in the comfort ratings for the top 5 brands. With outliers present it is difficult to gauge exactly what consumer opinion on comfort is, so a table detailing the statistics of the ratings is given on the right. Interestingly, we see that Converse and Nike swap places for comfort.
Converse received the lowest comfort rating, with a mean score of 4.49. It also has the lowest standard deviation, 0.81. The lower mean suggests consumers are less satisfied with the comfort of Converse products than with that of the other 4 leading brands. This may be attributed to the shoes' poor arch support; we'll see evidence of this below. The standard deviation of 0.81 tells us that people who wear Converse generally agree with one another on this point.
Nike has the highest comfort rating with a mean score of 4.53. It also has the highest standard deviation, 0.97. I expected the high mean since, as a master maker of athletic shoes across various sports, Nike should have a competitive edge in comfort and arch support. The standard deviation of 0.97 tells us that Nike's reviewers disagree with one another more than reviewers of the other brands do.
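To make the mean and standard deviation comparison concrete, here is a small sketch with made-up star ratings for two hypothetical brands. It uses the sample standard deviation (`statistics.stdev`); whether the tables in this post use the sample or population formula is not stated, so that choice is an assumption.

```python
from statistics import mean, stdev

# Hypothetical star ratings (1-5) for two made-up brands.
ratings = {
    "Brand A": [5, 5, 5, 4, 5, 5],
    "Brand B": [5, 5, 1, 5, 5, 5],
}


def summarize(values):
    """Return (mean, sample standard deviation), rounded to 2 places."""
    return round(mean(values), 2), round(stdev(values), 2)


summary = {brand: summarize(vals) for brand, vals in ratings.items()}
for brand, (avg, spread) in summary.items():
    print(f"{brand}: mean={avg}, std={spread}")
```

Both brands receive mostly 5-star ratings, but a single 1-star review pulls Brand B's mean down and widens its spread considerably. A higher standard deviation therefore signals that reviewers disagree more, not necessarily that most of them are unhappy.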
Word Clouds
The following WordClouds, made using the WordCloud package for Python, tell us what consumers are saying about shoe products for different brands.
![webscrape 9 | Data Science Blog](https://nycdsa-blog-files.s3.us-east-2.amazonaws.com/2019/09/82e096b3987072e47d810b9f7aeadef1/webscrape-9.png)
![webscrape 10 | Data Science Blog](https://nycdsa-blog-files.s3.us-east-2.amazonaws.com/2019/09/bdee3e4528ef7f3489c13d97345ccce9/webscrape-10.png)
![webscrape 11 | Data Science Blog](https://nycdsa-blog-files.s3.us-east-2.amazonaws.com/2019/09/72dbcfa9f15ee2242dfc49300c6e9862/webscrape-11.png)
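A word cloud is essentially a picture of word frequencies: the more often a word appears in the reviews, the larger it is drawn. The counting step can be sketched with the standard library alone. The review snippets, the tiny stop-word list, and the `word_counts` helper below are illustrative assumptions, not the scraped reviews or the author's code.

```python
import re
from collections import Counter

# A small, illustrative stop-word list; real tools ship much larger ones.
STOPWORDS = {
    "the", "a", "and", "i", "is", "are", "these", "this",
    "it", "to", "of", "my", "for", "but", "so",
}


def word_counts(reviews):
    """Count lowercase words across reviews, ignoring stop words."""
    counts = Counter()
    for text in reviews:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


# Hypothetical review snippets, not taken from the scraped data.
reviews = [
    "Love these shoes, so comfortable!",
    "Great fit and comfortable. Love the classic look.",
    "Comfortable, but the arch support is weak.",
]

counts = word_counts(reviews)
print(counts.most_common(2))
```

A dictionary of counts like this can be passed to the WordCloud package's `generate_from_frequencies` method to draw the image. Note that single-word counting splits phrases such as "arch support" or "slip on" into separate words.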
Vans
In the Vans WordCloud, we see words such as love, comfortable, great, feel, slip on, good, and recommend. This supports our earlier finding that Vans is great at making affordable, stylish, everyday shoes that fit properly. It seems that people are happiest with the slip-on shoes.
Converse
In the Converse WordCloud, we see words such as love, good, great, classic, Chuck Taylor, and arch support. As mentioned in the discussion of the comfort rating boxplot above and seen in the WordCloud, consumers feel the arch support of Converse shoes is a bit weak. Consumers, however, still love the timeless look Converse has to offer; it seems they are particularly fond of the Chuck Taylor shoes.
Saucony
In the Saucony WordCloud, we see words such as great, running, comfortable, fit, look, good, love, and M574. This may indicate that consumers are very pleased with the M574 running shoe, as it is highly affordable, comfortable, properly fitting, and stylish. Saucony's running shoes generally tend to be on the pricier end, so their push to deliver cheaper shoes seems to be paying off.
Conclusion and Future Pursuits
As a recap to all of this:
![webscrape 12 | Data Science Blog](https://nycdsa-blog-files.s3.us-east-2.amazonaws.com/2019/09/7be9562a4adbf6e44514b430f7785a93/webscrape-12.png)
- People love a timeless look. Everyone needs a back-to-basics option to fall back on, and Converse has that classic feel that transcends time.
- Consumers prioritize affordability and style when it comes to decision making.
- Everyday, all-purpose sneakers in the price range of $40-60 sell.
- Companies should look to improve comfort in shoes. Consumers may not shout it out loud, but they will always appreciate great arch support.
- Comfort is the third pillar of a shoe's success, with affordability and style as one and two. Athletic shoes have better true fit. Leisure shoes rank higher in style. This whole "ath-leisure" craze makes sense now, as people love style that doesn't break the bank and is comfy. Since mean comfort ratings have been shown to be lower than style and overall ratings, companies may benefit from figuring out how to improve comfort.
This project provided a lot of valuable insight into men's athletic shoes and sneakers. But, as always, we can learn more from analysis of greater volume and variety. With more data and time, I would explore a connection between best-selling products and social media influencers. Using sentiment analysis and more WordClouds, I'd look to see how shoe trends change over time. I would also repeat this entire process to see what makes athletic shoes and sneakers sell in the women's market.
Thanks for reading!