A Comedy of Errors: Or How to Identify Pattern Issues Using Selenium
This project began as a simple demonstration of scraping a website with Scrapy in Python. It eventually became a demonstration of how to recreate the Scrapy shell using Selenium. You can find the Shiny app created from the data here, and the code for the Shiny app and the Python scraping here.
The data for this project came from two separate websites. The first website I scraped was beerconnoisseur.com. An example page can be seen in the photo above. Some of the variables found on each page are:
- Type of beer
- Where the brewer is located
- Description of the beer
- Scores for the beer
- A review by a professional
However, while the website is very uniform in how it was coded, it did not provide the level of information I was looking for.
The second website I scraped was beerandbrewing.com. This website provided much more information, including:
- Descriptions of the beers provided by the brewer.
- Individual reviews for the aroma, the flavor, and the overall score.
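To make the shape of the scraped data concrete, here is a minimal sketch of one record per beer. The class name, field names, and placeholder values are all hypothetical and only mirror the variables described in this post; the actual scraping code may store them differently.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class BeerReview:
    """One scraped review. All field names here are hypothetical."""
    name: str
    brewer: str
    style: str
    brewer_description: str       # description provided by the brewer
    aroma_review: str             # panel's notes on the aroma
    flavor_review: str            # panel's notes on the flavor
    overall_review: str           # panel's overall notes
    abv: Optional[float] = None   # alcohol by volume, in percent
    ibu: Optional[int] = None     # International Bitterness Units


# Placeholder values for illustration, not real scraped data.
example = BeerReview(
    name="Example Beer",
    brewer="Example Brewing Co.",
    style="Pale Ale",
    brewer_description="Placeholder description from the brewer.",
    aroma_review="Placeholder aroma notes.",
    flavor_review="Placeholder flavor notes.",
    overall_review="Placeholder overall notes.",
)

# A plain dict is convenient for writing rows out to CSV or JSON.
row = asdict(example)
```

Leaving `abv` and `ibu` optional is a design choice for pages where those values are missing, so one incomplete page does not break the whole scrape.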
The goal was to use this data in an app that filters beers by user preference, narrowing them down to a list that users can look through and consider buying for themselves.
The app contains two tabs: the About Me tab and the Beer Table. The Beer Table is an interactive table of all the beer reviews scraped from beerandbrewing.com. If you are interested in a particular brewer or a particular style of beer, you can type your query into the search box, and it will filter the entire table for you. In the photo above, searching for "Pale Ale" reduced the table from 1000 different beers to 78. To the right of the table, there is a box with three tabs, each containing different information. The first tab, "Description", contains the ABV (alcohol by volume, a measure of how much alcohol is in the beer) and the IBU (International Bitterness Units, a measure of the beer's bitterness), as well as the brewer's description of the beer.
The second tab gives you the panel's reviews of the aroma, the flavor, and the beer overall. The last tab is simply a picture of the beer itself, for those who are curious what the label looks like and want to go looking for it in their local grocery store.
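The search box behavior described above can be approximated in a few lines of Python. This is only a rough analogue, assuming the search is a case-insensitive substring match across every column; it is not the Shiny app's actual code, and the function name `search_rows` and the sample rows are made up for illustration.

```python
def search_rows(rows, query):
    """Keep rows where any field contains the query, ignoring case."""
    needle = query.strip().lower()
    if not needle:
        # An empty search box shows the whole table.
        return list(rows)
    return [
        row for row in rows
        if any(needle in str(value).lower() for value in row.values())
    ]


# Placeholder rows, not real scraped data.
beers = [
    {"name": "Example One", "brewer": "Brewery A", "style": "American Pale Ale"},
    {"name": "Example Two", "brewer": "Brewery B", "style": "Imperial Stout"},
    {"name": "Pale Ale Example", "brewer": "Brewery C", "style": "IPA"},
]

matches = search_rows(beers, "pale ale")
print(len(matches), "of", len(beers), "beers match")
```

Because the match runs over every column, a query can hit on the beer's name, the brewer, or the style, which is why one search box is enough for either kind of lookup.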