Beer Reviews: An Analysis
For the web-scraping assignment I chose to indulge a hobby of mine: craft beer. I'm a beer enthusiast and thought it would be fun to analyze a beer review magazine. From the format you can tell that they are imitating wine rating systems. Beers are so different, though. There are so many categories of beer, and the measures of a delicious beer are so subjective. Take coffee beer, for instance: it's a new invention. Before you can rate it for "mouthfeel", as a reviewer, you first need to decide whether you think coffee beer is a good idea at all. But I'm getting ahead of myself. First let me take you through my process step by step.
I started by sifting through beer review websites. I wanted one that scored each beer on a few parameters and had many reviews. In the end, I chose https://beerconnoisseur.com/reviews, which has about 2000 reviews. Each review is scored on Judges Rating, Aroma, Appearance, Flavor, Mouthfeel, and Overall Impression. There's also a text essay that contains the official category, but this was too difficult to scrape because the critique is not written in a standard format.
Python with Selenium was my tool of choice. I clicked my way through the reviews: drilling down into each individual review, then moving to the next page until all the pages were scraped. There were a couple of challenges. For one, an ad popup appeared at random, and my code had to click past it to continue the traversal. For another, clicking back after drilling down into an individual review returned me to the first page of reviews instead of the page I had been on. (My code is linked at the end of this post.)
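A minimal sketch of one way to handle both problems, offered as an illustration rather than the exact code used for the project. Instead of clicking back, it gathers the review links from each listing page first, so every review can later be opened directly with driver.get(url) and the back-button reset never comes up. The CSS selectors, the ?page=N URL scheme, and the function names are all assumptions and would need to be checked against the live site.

```python
try:
    from selenium.webdriver.common.by import By
    CSS = By.CSS_SELECTOR
except ImportError:
    # Same locator string Selenium uses; lets the logic run without Selenium installed.
    CSS = "css selector"

LIST_URL = "https://beerconnoisseur.com/reviews"

# Hypothetical selectors: inspect the real page and replace these.
REVIEW_LINK_SEL = "a.review-link"
POPUP_CLOSE_SEL = "button.popup-close"


def page_url(page):
    """Build a listing-page URL. The ?page=N scheme is an assumption."""
    return f"{LIST_URL}?page={page}"


def dismiss_popup(driver):
    """Click the ad popup's close button if it is present.

    find_elements returns an empty list when nothing matches, so this
    never raises just because the popup did not appear.
    """
    buttons = driver.find_elements(CSS, POPUP_CLOSE_SEL)
    if buttons:
        buttons[0].click()
        return True
    return False


def collect_review_urls(driver, max_pages):
    """Walk the listing pages by URL and return every review link found."""
    urls = []
    for page in range(max_pages):
        driver.get(page_url(page))
        dismiss_popup(driver)
        links = driver.find_elements(CSS, REVIEW_LINK_SEL)
        if not links:
            break  # ran past the last listing page
        urls.extend(link.get_attribute("href") for link in links)
    return urls
```

With a real browser this would be called as collect_review_urls(webdriver.Chrome(), max_pages=200), after which each returned URL can be loaded on its own and scraped. Tracking the page number explicitly is the design choice that matters here: the scraper always knows where it is, whatever the site does on "back".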
From a business point of view, I found the scores were very inflated. This is a website for people who love all beer, which is perhaps reflective of beer drinkers in general. The reviewers evidently do not intend to be very discerning. It's a PR tool more than anything else.
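A minimal sketch of how that inflation could be quantified once the ratings are in a plain Python list. The function name, the 0 to 100 scale, and the threshold of 90 are assumptions for illustration; no actual scraped numbers are included here.

```python
from statistics import mean, median


def summarize_scores(scores, threshold=90):
    """Return the mean, the median, and the share of scores at or above threshold.

    The threshold of 90 is an arbitrary cut-off for "high"; adjust it to
    whatever the rating scale treats as exceptional.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    high = sum(1 for s in scores if s >= threshold)
    return {
        "mean": mean(scores),
        "median": median(scores),
        "share_high": high / len(scores),
    }
```

Calling summarize_scores on the scraped Judges Rating column would show at a glance how tightly the scores cluster near the top of the scale.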
My code and presentation can be found here.
Enjoy and let me know your feedback!