Web Scraping Amazon Monitor Product Data

Posted on Jan 10, 2023

As the digital world continues to expand, more and more businesses are using Amazon as a platform to advertise and sell their products. With the rise of online shopping, it is becoming increasingly important for businesses to be able to track their product information on Amazon. One way to do this is to use a web scraper. Computer monitors are a good example of a product that consumers and sellers judge on many metrics. For this project I will scrape computer monitor features and parse the words in reviews to compare what consumers are looking for in a monitor.

Selenium is an open-source web automation tool that lets you write scripts to automate web browser activities. It's a great tool for scraping Amazon pages because it can interact with page elements and extract the required data. To build a web scraper in Python that uses Selenium to collect product information for computer monitors from Amazon, we start by installing selenium and an appropriate web driver for the browser we want to use. I used Chrome. Selenium opens a Chrome browser, visits the pages of different monitors, and collects information from each page. The parsed data is then written to an output file that serves as the working dataset.
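A minimal sketch of that workflow, assuming Selenium 4 (`pip install selenium`) and Chrome. The CSS selectors, the helper names (`parse_price`, `parse_rating`, `scrape_product`, `write_rows`, `scrape_all`), and the CSV column names are illustrative assumptions, not the scraper used for this project. Amazon's markup changes often, so the selectors should be checked against a live page in the browser's developer tools first.

```python
import csv
import re
from typing import Dict, List, Optional

# Hypothetical CSS selectors: verify them against a live product page.
SELECTORS = {
    "title": "#productTitle",
    "price": ".a-price .a-offscreen",
    "rating": "#acrPopover",
}
FIELDNAMES = ["url", "title", "price", "rating"]


def parse_price(text: str) -> Optional[float]:
    """Turn a price string such as '$1,299.99' into 1299.99 (None if no number)."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


def parse_rating(text: str) -> Optional[float]:
    """Turn '4.5 out of 5 stars' into 4.5 (None if the phrase is missing)."""
    match = re.search(r"(\d(?:\.\d)?) out of 5", text)
    return float(match.group(1)) if match else None


def scrape_product(driver, url: str) -> Dict[str, object]:
    """Open one product page in an existing Selenium driver and pull a few fields."""
    from selenium.webdriver.common.by import By  # imported here so the helpers work without Selenium

    driver.get(url)
    row: Dict[str, object] = {"url": url}
    for field, css in SELECTORS.items():
        elements = driver.find_elements(By.CSS_SELECTOR, css)
        # textContent also returns text of elements that are not rendered on screen.
        row[field] = elements[0].get_attribute("textContent").strip() if elements else ""
    row["price"] = parse_price(str(row["price"]))
    row["rating"] = parse_rating(str(row["rating"]))
    return row


def write_rows(path: str, rows: List[Dict[str, object]]) -> None:
    """Store the parsed rows as a CSV file: the working dataset."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


def scrape_all(urls: List[str], out_path: str) -> None:
    """Drive Chrome through every product URL, then save the results."""
    from selenium import webdriver  # needs `pip install selenium` and a Chrome install

    driver = webdriver.Chrome()
    try:
        rows = [scrape_product(driver, url) for url in urls]
    finally:
        driver.quit()  # always close the browser, even if a page fails
    write_rows(out_path, rows)
```

Usage would be `scrape_all(product_urls, "monitors.csv")` with a list of monitor product URLs. The price and rating are read through `textContent` rather than Selenium's `.text` because `.text` only returns rendered text, and off-screen elements such as the hidden price span can come back empty.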

A total of 200 products were used for this dataset. From this working dataset we were able to collect features such as rating, price, display size, refresh rate, height adjustability, and more. For the review data, we consider reviews of 3 stars and up as positive, while 1- and 2-star reviews are negative. We then create heat maps to lay out which product features and review keywords are correlated with higher-rated products.
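The labeling rule and the numbers behind a correlation heat map can be sketched in plain Python. This assumes each feature has already been turned into a numeric column (for example 1/0 for height adjustability, or 1/0 for whether a review mentions "picture quality") and that Pearson correlation is the measure being plotted; both are assumptions, and the helper names are hypothetical. Constant columns return 0.0 here, whereas pandas would return NaN.

```python
import math
from typing import Dict, List, Sequence


def label_review(stars: int) -> str:
    """3 stars and up counts as positive; 1 and 2 stars count as negative."""
    return "positive" if stars >= 3 else "negative"


def mentions(text: str, phrase: str) -> int:
    """1 if the review contains the phrase (case-insensitive), else 0."""
    return int(phrase.lower() in text.lower())


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations: the values a heat map would colour."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Passing the nested dict to `pandas.DataFrame` and then to `seaborn.heatmap(..., annot=True)` would draw the heat map. If the data already lives in a DataFrame, `DataFrame.corr()` computes the same Pearson matrix directly.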

Some key takeaways: monitors with larger screens and more adjustable stands often received higher ratings than those with smaller screens. Features such as a curved screen or built-in speakers were not indicative of higher scores. In the reviews, positive phrases such as "HDMI", "picture quality", and "resolution" were highly correlated with better-received reviews.

Future improvements to this project include comparing different brands to find where a manufacturer lags behind its rivals, or to target its strengths. Improvements to the scraper itself could include redefining which features are useful for shoppers. A larger dataset could be used in the future to strengthen or disprove the results. Another improvement would be scraping for specific features such as HDMI ports, USB-C ports, mounting options, and panel type.
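Scraping for those specific features could start from the product text the scraper already collects, such as the title and bullet points. A minimal sketch, assuming that text is available as a plain string: regular expressions count HDMI and USB-C ports, pick out a panel type, and read a VESA mount size as a stand-in for mounting ability. The patterns and the `extract_features` helper are hypothetical; real listings phrase specs inconsistently, so the patterns would need tuning.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns: tune them against the text the scraper actually collects.
PORT_PATTERNS = {
    "hdmi_ports": r"HDMI",
    "usb_c_ports": r"USB[- ]?(?:Type[- ]?)?C\b",
}
PANEL_TYPES = ("OLED", "IPS", "VA", "TN")  # checked in this order


def count_ports(text: str, pattern: str) -> int:
    """Sum explicit counts like '2 x HDMI'; otherwise 1 if the port is mentioned, else 0."""
    counts = re.findall(rf"(\d+)\s*x\s*(?:{pattern})", text, flags=re.IGNORECASE)
    if counts:
        return sum(int(c) for c in counts)
    return int(re.search(pattern, text, flags=re.IGNORECASE) is not None)


def extract_features(text: str) -> Dict[str, Optional[object]]:
    """Pull port counts, panel type, and VESA mount size out of product text."""
    features: Dict[str, Optional[object]] = {
        name: count_ports(text, pattern) for name, pattern in PORT_PATTERNS.items()
    }
    # Panel acronyms are matched case-sensitively as whole words to limit false hits.
    features["panel_type"] = next(
        (p for p in PANEL_TYPES if re.search(rf"\b{p}\b", text)), None
    )
    # Assumption: a pattern such as 'VESA 100 x 100' indicates mounting ability.
    vesa = re.search(r"VESA\s*(\d{2,3})\s*x\s*(\d{2,3})", text, flags=re.IGNORECASE)
    features["vesa_mount"] = f"{vesa.group(1)}x{vesa.group(2)}" if vesa else None
    return features
```

Each returned dict could be added as extra columns in the working dataset, so the new features flow into the same correlation heat maps as the existing ones.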

About Author


Hey everyone, my name is Erjon Brucaj and I originally studied chemical engineering. I worked as an engineer for a few years before deciding it wasn't the career path for me. I've always enjoyed developing solutions to problems...