Web Scraping Amazon Monitor Product Data
As the digital world continues to expand, more and more businesses are using Amazon as a platform to advertise and sell their products. With the rise of online shopping, it is becoming increasingly important for businesses to be able to track their product information on Amazon. One way to do this is to use a web scraper. Computer monitors are one product category with many metrics that both consumers and sellers look at. For this project I scrape computer monitor features and parse the words in reviews to compare what consumers are looking for in a monitor.
Selenium is an open-source web automation tool that lets you write scripts to automate browser activity. It is a good fit for scraping Amazon pages because it can interact with page elements and extract the required data. To build a Selenium scraper in Python for computer monitor product information, we start by installing Selenium and a web driver that matches the browser we want to use. I used Chrome. Selenium opens a Chrome browser, visits the page for each monitor, and collects information from it. The parsed data is then stored in an output file that serves as the working dataset.
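The scraping loop described above might look roughly like the sketch below. It assumes Selenium 4 with Chrome and a driver that Selenium can locate on its own. The CSS selectors, the field names, and the helper names (scrape_search_page, write_rows) are illustrative assumptions rather than the project's actual code, and Amazon's markup changes often enough that the selectors would need checking against the live page.

```python
import csv

# Columns of the working dataset (assumed; the real project collects more).
FIELDS = ["title", "price", "rating"]


def _first_text(card, by, selector):
    """Return the text of the first element matching `selector`, or ''."""
    found = card.find_elements(by, selector)
    if not found:
        return ""
    # textContent also returns text that is visually hidden on the page.
    return (found[0].get_attribute("textContent") or "").strip()


def scrape_search_page(url):
    """Open `url` in Chrome and return one dict per product card found."""
    # Imported here so write_rows() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    rows = []
    try:
        driver.get(url)
        # Assumed selector for a single search-result card.
        cards = driver.find_elements(
            By.CSS_SELECTOR, "div[data-component-type='s-search-result']"
        )
        for card in cards:
            rows.append({
                "title": _first_text(card, By.CSS_SELECTOR, "h2"),
                "price": _first_text(card, By.CSS_SELECTOR, "span.a-price span.a-offscreen"),
                "rating": _first_text(card, By.CSS_SELECTOR, "span.a-icon-alt"),
            })
    finally:
        # Always close the browser, even if a lookup fails.
        driver.quit()
    return rows


def write_rows(rows, path):
    """Write the scraped rows to a CSV file that becomes the working dataset."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# Example use (opens a real browser, so it is left commented out):
# rows = scrape_search_page("https://www.amazon.com/s?k=computer+monitor")
# write_rows(rows, "monitors.csv")
```

Keeping the browser work and the file output in separate functions means the CSV step can be rerun or checked without launching Chrome again.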
The dataset covers 200 products. From this working dataset we can collect features such as ratings, price, display size, refresh rate, height adjustability, and more. For review data, we treat reviews of 3 stars and up as positive and 1- and 2-star reviews as negative. We create heat maps to lay out which product features and which keywords in reviews correlate with higher-rated products.
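As a rough illustration of the two steps above, the sketch below labels a review from its star count and computes the pairwise correlations that a heat map would display. The function names and the three sample rows are invented for the example and are not taken from the project's data. Pearson correlation is assumed here as the measure, since the write-up does not say which one was used.

```python
from math import sqrt


def label_review(stars):
    """Return 'positive' for 3 stars and up, 'negative' for 1 or 2 stars."""
    return "positive" if stars >= 3 else "negative"


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; report 0.0 instead.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_matrix(rows, columns):
    """Nested dict of pairwise correlations between the given columns."""
    return {
        a: {b: pearson([r[a] for r in rows], [r[b] for r in rows]) for b in columns}
        for a in columns
    }


# Made-up rows purely for illustration; 0/1 encodes a yes/no feature.
sample = [
    {"rating": 4.6, "display_size": 32, "height_adjustable": 1},
    {"rating": 4.1, "display_size": 24, "height_adjustable": 0},
    {"rating": 4.4, "display_size": 27, "height_adjustable": 1},
]
matrix = correlation_matrix(sample, ["rating", "display_size", "height_adjustable"])
```

The nested dictionary holds the same values a heat map shows as colored cells. After conversion to a 2D array or DataFrame, it could be handed to a plotting library such as seaborn's heatmap function.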
Some key takeaways: monitors with larger screens and more adjustable stands often received higher ratings than those with smaller screens. Features such as a curved screen or built-in speakers were not indicative of higher ratings. In reviews, phrases such as "HDMI", "picture quality", and "resolution" were highly correlated with higher ratings.
Future improvements to this project include comparing different brands to generate insight into where a manufacturer lags behind its rivals or where its strengths lie. Improvements to the scraper itself could include redefining which features are useful for shoppers. A larger dataset could be used to strengthen or disprove the results. The scraper could also target specific features such as HDMI ports, USB-C ports, mounting options, and panel type.
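One possible way to pull such specific features out of a product title or description is simple pattern matching, sketched below. The function name extract_specs, the patterns, and the list of panel types are assumptions made for this example. Real listings are worded inconsistently, so these patterns would miss some cases.

```python
import re


def extract_specs(text):
    """Pull a few monitor specs out of free-form product text (best effort)."""
    size = re.search(r'(\d+(?:\.\d+)?)[\s-]*(?:inch|")', text, re.IGNORECASE)
    refresh = re.search(r"(\d+)\s*Hz", text, re.IGNORECASE)
    # Panel names are matched case-sensitively to avoid hits on ordinary words.
    panel = re.search(r"\b(IPS|VA|TN|OLED)\b", text)
    return {
        "display_size_in": float(size.group(1)) if size else None,
        "refresh_rate_hz": int(refresh.group(1)) if refresh else None,
        "panel_type": panel.group(1) if panel else None,
        "hdmi": "HDMI" in text.upper(),
        "usb_c": bool(re.search(r"USB[\s-]?(?:Type[\s-]?)?C\b", text, re.IGNORECASE)),
        "vesa_mount": bool(re.search(r"VESA", text, re.IGNORECASE)),
    }


# Invented product title, used only to show the shape of the result.
specs = extract_specs("Acme 27-inch IPS Monitor, 165Hz, HDMI, USB-C, VESA mount")
```

Returning None for a spec that is not mentioned keeps "not found" distinct from a real value, which matters once these columns feed into the correlation step.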