Beer Reviews: An Analysis

Posted on Feb 19, 2018

 

For the web-scraping assignment I chose to indulge a hobby of mine: craft beer.  I'm a beer enthusiast and thought it would be fun to analyze a beer review magazine.  From the format you can tell that the magazine is imitating wine rating systems.  Beer is different, though.  There are so many categories of beer, and the measures of a delicious beer are so subjective.  Take coffee beer, for instance: it's a new invention.  Before you can rate it for "mouthfeel," as a reviewer you first need to decide whether you think coffee beer is a good idea at all.  But I'm getting ahead of myself.  First, let me take you through my process step by step.

I started by sifting through beer review websites, looking for one that rated beers on several parameters and had many reviews.  In the end, I chose https://beerconnoisseur.com/reviews, which has about 2000 reviews.  The scored fields are Judges Rating, Aroma, Appearance, Flavor, Mouthfeel, and Overall Impression.  Each review also includes a written critique that mentions the beer's official category, but this was too difficult to scrape because the critiques are not written in a standard format.
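To make the scraped fields concrete, here is a minimal sketch of how one review could be stored and written out as CSV. It is not the code from my project: the `BeerReview` class, the `name` field, and the `reviews_to_csv` helper are hypothetical names chosen for illustration, and the scores are left as optional numbers because the site's rating scales are not described here.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable, Optional


@dataclass
class BeerReview:
    """One scraped review. Field names mirror the score labels on the site."""
    name: str  # hypothetical: an identifier for the beer being reviewed
    judges_rating: Optional[float] = None
    aroma: Optional[float] = None
    appearance: Optional[float] = None
    flavor: Optional[float] = None
    mouthfeel: Optional[float] = None
    overall_impression: Optional[float] = None


def reviews_to_csv(reviews: Iterable[BeerReview]) -> str:
    """Serialize reviews to CSV text; missing scores become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(BeerReview)])
    writer.writeheader()
    for review in reviews:
        writer.writerow(asdict(review))
    return buffer.getvalue()
```

Keeping every score optional means a review with a missing field still produces a row, which makes gaps easy to spot later in the analysis.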

 

Python with Selenium was my tool of choice.  I navigated the reviews by clicking: drilling down into each individual review and moving to the next page until all the pages were scraped.  There were a couple of challenges.  For one, an ad popup appeared at random, and my code had to click past it to continue the traversal.  For another, clicking back after drilling down into a review returned me to the first page of reviews instead of the page I had been on.  (See below for my code.)
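Both challenges can be handled in a few lines. The sketch below is not my project code and differs from the click-and-go-back approach described above: it reads the review links and the next-page link from each listing page before leaving it, then loads every URL directly, so the back button is never needed. The three CSS selectors are hypothetical placeholders (the site's real markup is not shown in this post), and `driver` is assumed to be a Selenium WebDriver.

```python
from typing import Callable, List

# Locator strategy string; this is the value of selenium's By.CSS_SELECTOR.
CSS = "css selector"

# Hypothetical selectors: replace with the ones found on the real pages.
POPUP_CLOSE = ".popup-close"
REVIEW_LINK = "a.review-link"
NEXT_PAGE = "a.next-page"


def dismiss_popup(driver) -> bool:
    """Close the ad popup if it is present and visible; return True if clicked."""
    for button in driver.find_elements(CSS, POPUP_CLOSE):
        if button.is_displayed():
            button.click()
            return True
    return False


def review_urls_on_page(driver) -> List[str]:
    """Collect the links to individual reviews from the current listing page."""
    dismiss_popup(driver)
    return [link.get_attribute("href") for link in driver.find_elements(CSS, REVIEW_LINK)]


def crawl(driver, start_url: str, visit_review: Callable) -> None:
    """Walk every listing page, calling visit_review(driver, url) per review."""
    page_url = start_url
    while page_url:
        driver.get(page_url)
        urls = review_urls_on_page(driver)
        # Remember where the next listing page is before navigating away.
        next_links = driver.find_elements(CSS, NEXT_PAGE)
        next_url = next_links[0].get_attribute("href") if next_links else None
        for url in urls:
            driver.get(url)
            dismiss_popup(driver)
            visit_review(driver, url)
        page_url = next_url
```

With a real browser this would be called as something like `crawl(webdriver.Chrome(), "https://beerconnoisseur.com/reviews", parse_review)`, where `parse_review` is a hypothetical function that extracts the scores from the open review page. Using `find_elements` (plural) returns an empty list when the popup is absent, so no exception handling is needed for the common case.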

From a business point of view, I found the scores very inflated.  This is a website for people who love all beer, which is perhaps reflective of beer drinkers in general.  The reviewers evidently do not intend to be very discerning.  It's a PR tool more than anything else.

 

My code and presentation can be found here.

Enjoy and let me know your feedback!

About Author


Zipporah Polinsky-Nagel

Zipporah is pivoting to data science from her creative and energetic experience as a 15-year veteran at an international Financial Technology company. Zipporah's experience there involved: developing the trade cash flow generation and API, project managing, and leading...