User Data Analysis on BeerAdvocate.com

Charles Cohen

Posted on Jul 15, 2019

The skills the author demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

User data is one of the fastest-growing commodities in today's marketplace. Every time you click, share, post or tag, that action is recorded and analyzed, bought and sold. Companies pay huge sums for this data and the insight it provides.

BeerAdvocate.com is a popular online community that catalogs beer information and user reviews and ratings. In 2017, one admin boasted that the site lists nearly 300,000 individual beers, many of which have been reviewed by anywhere from dozens to thousands of users. To tap into this wealth of information, I built a Scrapy spider and analyzed the results.

Webscraping Data

The spider worked by first exploring each style of beer, from American IPA to Wheat Beer, and then iteratively opening each beer in that style. I limited the spider to beers with more than 100 ratings, both to improve efficiency and to focus on the bulk of user activity.
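The crawl order described above (styles first, then beers within each style, skipping beers with 100 or fewer ratings) can be sketched in plain Python. This is not the author's spider and leaves Scrapy out entirely: `fetch_listing` and `fetch_reviews` are hypothetical stand-ins for the page requests and parsing, and the `(beer_id, rating_count)` record shape is an assumption.

```python
# A plain-Python sketch of the crawl order and the >100-ratings filter.
# fetch_listing and fetch_reviews are hypothetical stand-ins for the
# spider's page requests and parsing; they are not Scrapy APIs.

MIN_RATINGS = 100  # only beers with MORE than this many ratings are opened


def crawl(styles, fetch_listing, fetch_reviews, min_ratings=MIN_RATINGS):
    """Visit every style, then every sufficiently rated beer in that style.

    fetch_listing(style) -> iterable of (beer_id, rating_count) pairs
    fetch_reviews(beer_id) -> list of review records for that beer
    Returns {beer_id: reviews}.
    """
    reviews_by_beer = {}
    for style in styles:
        for beer_id, rating_count in fetch_listing(style):
            if rating_count <= min_ratings:
                continue  # skip lightly rated beers to save requests
            if beer_id not in reviews_by_beer:  # don't refetch a beer seen twice
                reviews_by_beer[beer_id] = fetch_reviews(beer_id)
    return reviews_by_beer


# Usage with stub data in place of real pages:
listings = {
    "American IPA": [("ipa-1", 250), ("ipa-2", 40)],
    "Wheat Beer": [("wheat-1", 101)],
}
reviews = {"ipa-1": [{"user": "a", "score": 4.2}], "wheat-1": [{"user": "b", "score": 3.9}]}
result = crawl(listings, listings.__getitem__, lambda beer: reviews.get(beer, []))
print(sorted(result))  # ['ipa-1', 'wheat-1']
```

In a real Scrapy spider the two fetch functions would become separate parse callbacks chained by follow-up requests; the filtering logic stays the same.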

After more than 3 hours, my spider had scraped data from nearly 10,000 individual beers and 1.7 million reviews. The data was stored in relationally linked CSV files and then processed.
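"Relationally linked" CSV files can be as simple as two tables sharing a key. The sketch below assumes a hypothetical layout (the column names are illustrative, not the author's schema): `beers.csv` holds one row per beer, `reviews.csv` one row per review, and `beer_id` links them like a foreign key.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical two-table layout: reviews.csv points back to beers.csv
# through the shared beer_id column, like a foreign key.
BEER_FIELDS = ["beer_id", "name", "style", "ba_score", "num_ratings"]
REVIEW_FIELDS = ["review_id", "beer_id", "user", "score", "date"]


def write_tables(beers, reviews, out_dir):
    """Write one row per beer and one row per review."""
    out = Path(out_dir)
    for filename, fields, rows in (("beers.csv", BEER_FIELDS, beers),
                                   ("reviews.csv", REVIEW_FIELDS, reviews)):
        with open(out / filename, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fields)
            writer.writeheader()
            writer.writerows(rows)


def load_reviews_with_style(out_dir):
    """Join each review to its beer's style via beer_id (values come back as strings)."""
    out = Path(out_dir)
    with open(out / "beers.csv", newline="") as fh:
        style_of = {row["beer_id"]: row["style"] for row in csv.DictReader(fh)}
    with open(out / "reviews.csv", newline="") as fh:
        return [dict(row, style=style_of[row["beer_id"]]) for row in csv.DictReader(fh)]


with tempfile.TemporaryDirectory() as tmp:
    write_tables(
        [{"beer_id": "b1", "name": "Example Ale", "style": "American IPA",
          "ba_score": 4.1, "num_ratings": 150}],
        [{"review_id": "r1", "beer_id": "b1", "user": "u1", "score": 4.0,
          "date": "2019-01-02"}],
        tmp,
    )
    joined = load_reviews_with_style(tmp)
print(joined[0]["style"], joined[0]["score"])  # American IPA 4.0
```

Keeping beers and reviews in separate files avoids repeating beer metadata on each of the 1.7 million review rows.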

Numerical Data Analysis

One insight that stood out when examining the data was the distribution of user activity. Since BeerAdvocate is primarily a review website, I looked at how many reviews each reviewer has written.

What I found was that more than 75% of the user base has written fewer than 10 reviews each, whereas just 10 reviewers are responsible for more than 65% of all reviews.
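Both statistics can be computed from a flat list of reviewer names, one entry per review. This is a minimal sketch on toy input; it does not reproduce the figures above, which come from the scraped data, and the function name is hypothetical.

```python
from collections import Counter


def activity_summary(review_users, top_n=10, light_threshold=10):
    """review_users: one username per review (e.g. the user column of a reviews table).

    Returns the share of users with fewer than `light_threshold` reviews and
    the share of all reviews written by the `top_n` most active users.
    """
    counts = Counter(review_users)  # reviews written per user
    total_reviews = sum(counts.values())
    light_users = sum(1 for c in counts.values() if c < light_threshold)
    top_reviews = sum(c for _, c in counts.most_common(top_n))
    return {
        "share_light_users": light_users / len(counts),
        "top_n_review_share": top_reviews / total_reviews,
    }


# Toy input: two heavy reviewers and four one-off reviewers.
users = ["a"] * 50 + ["b"] * 30 + ["c", "d", "e", "f"]
summary = activity_summary(users, top_n=2)
print(round(summary["share_light_users"], 3), round(summary["top_n_review_share"], 3))  # 0.667 0.952
```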

The following graphs show that while the vast majority of the user base is relatively inactive (fewer than 10 reviews each), a small minority of users contributes the majority of the content.

User Review Distribution

Taking a closer look at the monthly activity of the 10 most active users, we see two things: 1) their activity is rather stable, and 2) collectively they contribute nearly 70% of the content.
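A month-by-month view like this can be built by bucketing each review by month and counting how many came from a chosen set of heavy users (for example, the ten most prolific reviewers). The sketch assumes dates are stored as `YYYY-MM-DD` strings; that format and the function name are assumptions, not details from the post.

```python
from collections import Counter


def monthly_top_share(reviews, top_users):
    """reviews: iterable of (user, 'YYYY-MM-DD') pairs (assumed date format).

    Returns {month: fraction of that month's reviews written by top_users}.
    """
    top = set(top_users)
    total, from_top = Counter(), Counter()
    for user, date in reviews:
        month = date[:7]  # 'YYYY-MM'
        total[month] += 1
        if user in top:
            from_top[month] += 1
    return {month: from_top[month] / total[month] for month in sorted(total)}


reviews = [("a", "2019-01-03"), ("b", "2019-01-20"),
           ("a", "2019-02-01"), ("c", "2019-02-14"), ("a", "2019-02-15")]
print(monthly_top_share(reviews, top_users=["a"]))
```

A roughly flat series of these fractions is what "stable activity" looks like; summing the heavy users' reviews over all months instead gives their lifetime share.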

Lifetime activity of the top 10 super users as a fraction of the whole.

It's one thing to identify the extremely active user base; it's another to identify its monetary value. So I looked for a correlation between the number of reviews a product has received and that product's rating.

BA Score by Number of Reviews

We see that the vast majority of beers are rated fewer than 1,000 times, and for these the BA Score (BeerAdvocate's average rating) is extremely variable. However, for products with 2,000 to 4,000 reviews, the score stabilizes around 4.4.
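One way to quantify "extremely variable" versus "stabilizes" is to bin beers by review count and compare the spread of BA Scores within each bin. This is a sketch with illustrative bin edges and toy data, not the author's analysis.

```python
from statistics import mean, pstdev


def score_spread_by_bin(beers, edges=(100, 1000, 2000, 4000)):
    """beers: iterable of (num_reviews, ba_score) pairs.

    Groups beers into review-count bins [edges[i], edges[i+1]) and returns
    {(low, high): (mean score, population std dev)} for non-empty bins.
    Beers outside the outer edges are ignored.
    """
    bins = {(lo, hi): [] for lo, hi in zip(edges, edges[1:])}
    for n, score in beers:
        for lo, hi in bins:
            if lo <= n < hi:
                bins[(lo, hi)].append(score)
                break
    return {b: (mean(s), pstdev(s)) for b, s in bins.items() if s}


beers = [(150, 3.2), (500, 4.6), (2500, 4.4), (3500, 4.4)]
print(score_spread_by_bin(beers))
```

A shrinking standard deviation in the higher bins is the numerical counterpart of the scatter plot narrowing as review counts grow.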

BeerAdvocate maintains a Top 250 list of its most popular beers by BA Score. The lowest-rated beer on that list is Imperial Eclipse Stout, with a score of 4.46.

So the more times a beer is rated, the higher its BA Score tends to be, and the greater the likelihood that it makes the Top 250. This in turn leads to higher visibility on the website.

Beer Advisor (A Recommender System)

Another way social media sites use user data is in building recommender systems. Just as Netflix suggests content to watch or Amazon suggests items to purchase, we can use BeerAdvocate user data to recommend beers to try.

I built such a recommender system employing a User-Item Collaborative Filter. This filter takes your preferences, finds other users with similar preferences, and uses their history to suggest products.
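The idea described above can be sketched as user-based collaborative filtering: score every other user by cosine similarity to you, keep the closest `k`, and rank beers you haven't rated by the neighbours' similarity-weighted scores. This is a generic textbook version on toy data, not the author's implementation; the function names and the choice of cosine similarity (treating unrated beers as 0) are assumptions.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two {beer: score} dicts, treating unrated beers as 0."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(target, ratings, k=2, n=3):
    """ratings: {user: {beer: score}}.

    Finds the k users most similar to `target` and ranks beers `target`
    hasn't rated by the neighbours' similarity-weighted average score.
    """
    mine = ratings[target]
    neighbours = sorted(
        ((cosine(mine, theirs), user) for user, theirs in ratings.items() if user != target),
        reverse=True,
    )[:k]
    weighted, weights = {}, {}
    for sim, user in neighbours:
        if sim <= 0:
            continue  # no overlap means no evidence
        for beer, score in ratings[user].items():
            if beer in mine:
                continue  # only recommend beers the target hasn't tried
            weighted[beer] = weighted.get(beer, 0.0) + sim * score
            weights[beer] = weights.get(beer, 0.0) + sim
    ranked = sorted(weighted, key=lambda b: weighted[b] / weights[b], reverse=True)
    return ranked[:n]


ratings = {
    "alice": {"ipa": 5, "stout": 1},
    "bob": {"ipa": 5, "stout": 1, "porter": 2, "saison": 5},
    "carol": {"stout": 5, "porter": 5},
    "dave": {"lager": 4},
}
print(recommend("alice", ratings))  # ['saison', 'porter']
```

Bob's ratings closely match Alice's, so his high score for the saison dominates; Dave shares nothing with Alice and contributes nothing.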

Collaborative filtering is a common technique due to its simplicity and computational ease; however, it tends to promote popularity bias and cannot handle obscure taste profiles. For example, if you have only reviewed extremely obscure beers, few other users will share your history, so the filter has little basis for recommending anything to you.
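The obscure-taste limitation is easy to see by counting how many beers a user shares with everyone else: a user-based filter can only draw on neighbours who overlap with you. This is a hypothetical illustration on toy data, not part of the author's system.

```python
def overlap_counts(target, rated):
    """rated: {user: set of beers the user has reviewed}.
    Returns how many beers each other user shares with `target`."""
    mine = rated[target]
    return {user: len(mine & beers) for user, beers in rated.items() if user != target}


def can_find_neighbours(target, rated, min_overlap=1):
    """A user-based filter needs at least one user who overlaps with `target`."""
    return any(n >= min_overlap for n in overlap_counts(target, rated).values())


rated = {
    "alice": {"ipa", "stout"},
    "bob": {"ipa", "porter"},
    "carol": {"stout", "porter"},
    "niche": {"obscure-sour"},  # reviewed only a beer nobody else has
}
print(can_find_neighbours("alice", rated))  # True
print(can_find_neighbours("niche", rated))  # False
```

The same mechanism drives popularity bias: widely reviewed beers appear in many neighbours' histories, so they accumulate weight more easily than niche ones.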

This recommender system can be used in all manner of scenarios, from targeted advertising to intelligent sales promotion.

Future Work

There is a wealth of textual data that I omitted from this analysis, and much of it could be used to vastly improve my recommender system. Additionally, review content can be examined using Natural Language Processing to provide insight into user favorability, polarity and product clustering.

Thank You

To examine the code that went into this project and to check out my other work, please see my GitHub.

About Author

Charles Cohen

Charles Cohen is currently teaching at the NYC Data Science Academy. Charles studied Physical Sciences at the City College of New York and subsequently worked in research and non-profit environments. Charles is a self-motivated learner who eagerly adapts...
