Whiskey Advocate Data
The skills the author demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.
BACKGROUND AND INSPIRATION
Over the years, I've tasted a very limited selection of whiskies, mostly common brands such as Johnnie Walker and Glenlivet. I've always wondered: is there a strong relationship between price and quality? A bottle of whiskey could set you back $10 or, in some cases, even $100,000. Whiskey prices are dictated by supply, demand, age, brand, etc. My analysis of the gathered data focuses on price and rating.
I scraped the data from Whisky Advocate, America's leading whisky publication and a premier source of whisky information, education and entertainment for enthusiasts. The site hosts more than 5,000 whiskey reviews, and it is this data I use here.
PROJECT GOALS
- Is there a strong relationship between price and rating?
- Select top-rated whiskies that cost less than $150
DATA DICTIONARY
- Category – Whiskey category
- Brand – Whiskey brand
- Title – Whiskey title
- Alcohol Percentage – Alcohol content
- Price – Price of the whiskey bottle (USD)
- Reviewer – Reviewed by
- Review – Rating (out of 100):
  - 95-100 points – Classic: a great whisky
  - 90-94 points – Outstanding: a whisky of superior character and style
  - 85-89 points – Very good: a whisky with special qualities
  - 80-84 points – Good: a solid, well-made whisky
  - 75-79 points – Mediocre: a drinkable whisky that may have minor flaws
  - 50-74 points – Not recommended
Note – Rows with prices outside the 95% confidence interval were removed as price outliers.
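The note does not say exactly how the 95% interval was computed, so the following is only a sketch of one plausible reading: keep the central 95% of prices, from the 2.5th to the 97.5th percentile. The function name, the pandas DataFrame, and the column name `Price` (borrowed from the data dictionary) are assumptions; the original analysis may instead have used a mean ± 1.96 standard deviations rule.

```python
import pandas as pd


def drop_price_outliers(df: pd.DataFrame, price_col: str = "Price") -> pd.DataFrame:
    """Keep rows whose price lies inside the central 95% of observed prices.

    Assumption: "95% interval" is read as the 2.5th-97.5th percentile range.
    """
    lower = df[price_col].quantile(0.025)
    upper = df[price_col].quantile(0.975)
    # between() is inclusive on both ends by default
    return df[df[price_col].between(lower, upper)]
```

A percentile-based cut is less sensitive to a handful of $100,000 bottles than a rule based on the mean and standard deviation, which is why it is used in this sketch.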
DATA ANALYSIS
Price vs. Review Data
Does a higher price come with a better overall rating? As expected, the overall rating tends to rise as price increases.
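One way to put a number on this relationship is a correlation coefficient. The sketch below is not the author's code; it assumes a pandas DataFrame with columns named `Price` and `Review` (names taken from the data dictionary). It reports Pearson correlation, which measures linear association, and Spearman correlation, computed here as Pearson correlation on ranks, which only asks whether rating rises monotonically with price.

```python
import pandas as pd


def price_rating_correlation(
    df: pd.DataFrame, price_col: str = "Price", rating_col: str = "Review"
) -> tuple:
    """Return (pearson, spearman) correlation between price and rating."""
    pearson = df[price_col].corr(df[rating_col])  # default method is Pearson
    # Spearman correlation is Pearson correlation applied to the ranks
    # (ties receive average ranks, pandas' default ranking method).
    spearman = df[price_col].rank().corr(df[rating_col].rank())
    return pearson, spearman
```

Because whiskey prices span several orders of magnitude, the rank-based figure is probably the more informative of the two.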
Review Distribution Data
Most reviews fall between 85 and 95 out of 100. In general, if you were to pick a whiskey bottle at random, you would most likely pick one that reviewed well. Note – The probability density function (PDF) is skewed to the left.
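Both observations can be checked numerically. This is a minimal sketch, assuming the review scores are available as a pandas Series; the function name and the 85-95 band defaults are illustrative choices. A negative sample skewness corresponds to a left-skewed distribution.

```python
import pandas as pd


def review_shape(reviews: pd.Series, low: float = 85, high: float = 95) -> dict:
    """Summarise the review distribution: skewness and share inside [low, high]."""
    return {
        # Negative skewness means a longer tail toward low scores (left skew)
        "skewness": reviews.skew(),
        # Fraction of reviews that fall in the band, inclusive on both ends
        "share_in_band": reviews.between(low, high).mean(),
    }
```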
Data of Top 15 Whiskey Categories
Let's now focus on the top 15 categories. It should come as no surprise that Single Malt Scotch leads with more than 2,000 different bottles, followed by Bourbon and Rye.
Average - Category Review vs Price – Top 15
- On average, the 'Irish Single Pot Still' category scored the highest review, 92 out of 100, at a price of < $200.
- Japanese whiskies ranked second, which is no surprise given their popularity.
- However, I would have expected a Single Malt Scotch in the top 15 considering there are over 2,000 different whiskies in this category.
Price (<$150) Box Plot – Top 15
- Single Malt Scotch and American whiskies cost ~$100 on average.
- Canadian whiskies were priced at around $65.
- Blended Malt Scotch whiskies showed the greatest price dispersion.
Review (<$150) Box Plot – Top 15
- Bourbon/Tennessee yielded the highest average review, 93/100, while Irish/American Single Malts were reviewed less favorably.
- Before we highlight a final selection of top whiskies, let's look at the review points per $1 spent.
Review Data Points/$1 – Top 15
- For every $1 spent, English Grain Whiskey yielded 1.4 review points.
- Bourbon yielded 1.15 review points for every $1 spent.
- For every $1 spent, Irish single malts yielded 0.29 review points.
- While Japanese whiskies are popular, you would need to spend more for a good bottle – 0.43 points per dollar spent.
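The points-per-dollar figures above can be reproduced along these lines. This is a sketch under stated assumptions: a pandas DataFrame with columns `Category`, `Price` and `Review` (names from the data dictionary), and the metric defined as each bottle's review score divided by its price, averaged within a category. The original may have divided the category's average review by its average price instead, which gives different numbers.

```python
import pandas as pd


def points_per_dollar(
    df: pd.DataFrame,
    category_col: str = "Category",
    price_col: str = "Price",
    rating_col: str = "Review",
) -> pd.Series:
    """Average review points per $1 spent, by category, best value first."""
    # Per-bottle value: review points earned for each dollar of price
    per_bottle = df[rating_col] / df[price_col]
    return per_bottle.groupby(df[category_col]).mean().sort_values(ascending=False)
```

Under this definition, a 92-point bottle priced at $80 contributes 92 / 80 = 1.15 points per dollar to its category's average.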
CONCLUSION
- A higher price does indeed tend to come with a higher rating.
- The top whiskies of choice purely based on price/review were two Bourbons:
- Parker's Heritage Collection, 'Golden Anniversary' – $150 – Scored 97/100.
- Four Roses Limited Edition Small Batch (2013 Release) – $85 – Scored 97/100.
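A selection like this can be sketched as a filter followed by a sort. The code below is illustrative rather than the author's method: it assumes a pandas DataFrame with `Price` and `Review` columns, treats the $150 cutoff as inclusive (since one of the picks is priced at exactly $150), and breaks ties in review score by preferring the cheaper bottle.

```python
import pandas as pd


def top_picks(
    df: pd.DataFrame,
    max_price: float = 150,
    n: int = 2,
    price_col: str = "Price",
    rating_col: str = "Review",
) -> pd.DataFrame:
    """Return the n highest-rated bottles priced at or below max_price."""
    # Assumption: the cutoff is inclusive, so a $150 bottle still qualifies
    affordable = df[df[price_col] <= max_price]
    # Highest review first; among equal reviews, the cheaper bottle first
    ranked = affordable.sort_values(
        by=[rating_col, price_col], ascending=[False, True]
    )
    return ranked.head(n)
```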
FUTURE WORK
- To expand this analysis, I would extract key terms from the review text and compare them against the overall score. This would provide further insight and a more meaningful analysis.
- The overall conclusion assumes all whiskies are the same. However, there are many differences between bottles within a category and across categories. For example, Bourbons are sweet, and a subset of Scotch whiskies are smoky in flavor. This analysis needs to be refined to capture these differences and to identify top picks tailored to the reader.
- As a closing comment, the prices in this data set need to be refreshed, as they are stale. For example, the Parker's Heritage Collection 'Golden Anniversary' bottle now costs $4,000, not $150.
About the Author
I am currently a Director at RBC working in the Equities Derivatives Technology Group. I graduated with a BSc in Computer Science from University College London and recently completed my MBA at Chicago Booth. I am keen to explore the data science world and create actionable insight.