Scraping the Best Buy Website to Track Phone Rating Changes

Posted on Jul 25, 2019

 

The skills the author demonstrates here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

GitHub link to the project

Introduction

Smartphones have become an essential part of modern life. The sheer size of the smartphone market creates fierce competition among manufacturers, which periodically roll out new phones with new features to attract buyers and win market share. Customer ratings indicate how popular a phone is, and they influence whether potential buyers decide to purchase it. It is therefore worth tracking how phone ratings change over time.

How do popular phones perform over time? Does the release of new phones from other companies affect the ratings of existing phones? What types of phones does each carrier sell? To address these questions, I used Scrapy to scrape the Best Buy website for phone models, ratings, the date each rating was posted, and the carrier each phone is tied to. I scraped around 274 phones and collected 257,870 phone ratings.
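Scrapy spiders usually extract fields as raw strings, which need cleaning before any analysis. The post does not show the spider or Best Buy's page markup, so the sketch below covers only that cleaning step, under loud assumptions: the `Review` record, the field names, and the raw text formats ("Rated 4 out of 5 stars", "Jun 03, 2016") are all hypothetical, not taken from the author's code or the live site.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Review:
    """One cleaned customer review. Field names are hypothetical."""
    model: str
    brand: str
    carrier: str
    rating: float
    posted: date


# Hypothetical raw rating text, e.g. "Rated 4 out of 5 stars".
_RATING_RE = re.compile(r"Rated\s+(\d+(?:\.\d+)?)\s+out of\s+5")


def clean_review(raw: dict) -> Review:
    """Convert one raw scraped dict (all strings) into a typed Review."""
    match = _RATING_RE.search(raw["rating_text"])
    if match is None:
        raise ValueError(f"unrecognized rating text: {raw['rating_text']!r}")
    # Hypothetical date format, e.g. "Jun 03, 2016".
    posted = datetime.strptime(raw["date_text"].strip(), "%b %d, %Y").date()
    return Review(
        model=raw["model"].strip(),
        brand=raw["brand"].strip(),
        carrier=raw["carrier"].strip(),
        rating=float(match.group(1)),
        posted=posted,
    )


# A made-up raw item, shaped like what a spider might yield.
raw = {
    "model": " Apple - iPhone 6s 64GB - Silver ",
    "brand": "Apple",
    "carrier": "Sprint",
    "rating_text": "Rated 4 out of 5 stars",
    "date_text": "Jun 03, 2016",
}
print(clean_review(raw))
```

In a Scrapy project, a function like this would fit naturally inside an item pipeline's `process_item` method, so every scraped item is validated the same way before it is stored.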

Data on Popular Phones

To see which phones and features are popular, I analyzed the features of the phones listed on the site. The number of listed phones with a given feature is a good indicator of that feature's popularity.

As the plots above show, Apple has the most phones on the list, followed by the Android manufacturers Samsung and LG. The most popular memory size is 32GB, followed by 64GB and 128GB. Silver is the most common color, followed by Gold, Rose Gold, Space Gray, and Black. Of the three main carriers, Sprint has the most phones on the list, followed by Verizon and AT&T, and Sprint also has the most diverse listings. Apple phones are the main items each carrier sells, while Google phones are currently sold exclusively through Verizon.
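Counting listings per feature value takes only a few lines with `collections.Counter`. A minimal sketch, assuming each listing was scraped into a dict with hypothetical `brand`, `memory`, `color`, and `carrier` keys; the four records are made up for illustration and are not the scraped data. The `carriers_by_brand` helper (also hypothetical) shows how a carrier-exclusive brand such as Google would surface.

```python
from collections import Counter, defaultdict

# Made-up listings for illustration only; the real data came from the scraper.
phones = [
    {"brand": "Apple", "memory": "32GB", "color": "Silver", "carrier": "Sprint"},
    {"brand": "Apple", "memory": "64GB", "color": "Gold", "carrier": "Verizon"},
    {"brand": "Samsung", "memory": "32GB", "color": "Silver", "carrier": "Sprint"},
    {"brand": "Google", "memory": "128GB", "color": "Black", "carrier": "Verizon"},
]


def feature_counts(listings, feature):
    """Number of listings per value of `feature`, most common first."""
    return Counter(p[feature] for p in listings).most_common()


def carriers_by_brand(listings):
    """Set of carriers that list at least one phone from each brand."""
    carriers = defaultdict(set)
    for p in listings:
        carriers[p["brand"]].add(p["carrier"])
    return dict(carriers)


for feature in ("brand", "memory", "color", "carrier"):
    print(feature, feature_counts(phones, feature))
print(carriers_by_brand(phones))
```

`most_common()` breaks ties in the order values were first seen, so ties should not be read as a ranking.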

Data on Average Rating

As expected, Apple phones have accumulated the highest total number of customer ratings. Per phone, however, Samsung leads with an average of about 1,261 ratings, followed by Apple with around 1,054.
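Per-phone figures like these come from dividing each brand's review count by its number of listed phones. A minimal sketch with made-up toy data (the model names are used only as labels) showing how a brand with fewer total reviews can still lead per phone; `ratings_per_phone` is a hypothetical helper, not the author's code.

```python
from collections import Counter


def ratings_per_phone(phone_brands, review_models):
    """Average number of reviews per listed phone, grouped by brand.

    phone_brands: dict mapping each listed phone model to its brand
    review_models: iterable giving the phone model of each scraped review
    """
    phones = Counter(phone_brands.values())  # listed phones per brand
    reviews = Counter(phone_brands[m] for m in review_models)  # reviews per brand
    # Phones with zero reviews still count in the denominator.
    return {brand: reviews[brand] / n for brand, n in phones.items()}


# Toy data: Apple has more reviews in total (4 vs 3) but fewer per phone.
phone_brands = {"iPhone 6s": "Apple", "iPhone 7": "Apple", "Galaxy S7": "Samsung"}
review_models = ["iPhone 6s"] * 3 + ["iPhone 7"] + ["Galaxy S7"] * 3
print(ratings_per_phone(phone_brands, review_models))
# {'Apple': 2.0, 'Samsung': 3.0}
```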

Data on Rating Changes Over Time

iPhones generally have higher ratings than other brands, staying between 4.75 and 5. Interestingly, between April and July 2016, when Samsung phones appeared in the listings, iPhone ratings dropped before bouncing back.

I selected the main phones Apple and Samsung have released in recent years and tracked how their ratings changed over time. When the iPhone 6s came out, it had very high ratings, which then fluctuated between 4.7 and 5. Interestingly, when the Galaxy S7 came out in April 2016, its average rating was around 4.6, and as its rating rose, the iPhone 6s rating dropped. The iPhone 6s rating hit a low point around June 2016 but then bounced back as Galaxy S7 ratings continued to decline.
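One way to build rating-over-time series like these is to bucket reviews by calendar month and average within each bucket. The post does not say exactly how its curves were computed (monthly means of newly posted ratings versus a running cumulative average, for instance), so this is a sketch assuming monthly means. The `key` argument lets the same hypothetical function group by model or by brand, and the reviews below are toy values, not the scraped data.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean


def monthly_mean_ratings(reviews, key):
    """Mean star rating per (group, "YYYY-MM") bucket.

    reviews: iterable of dicts with 'model', 'brand', 'posted' (date), 'rating'
    key: field to group by, e.g. "model" or "brand"
    """
    buckets = defaultdict(list)
    for r in reviews:
        buckets[(r[key], r["posted"].strftime("%Y-%m"))].append(r["rating"])
    # Sorting the keys puts each group's months in chronological order.
    return {k: fmean(v) for k, v in sorted(buckets.items())}


# Toy reviews for illustration only.
reviews = [
    {"model": "iPhone 6s", "brand": "Apple", "posted": date(2016, 4, 2), "rating": 5},
    {"model": "iPhone 6s", "brand": "Apple", "posted": date(2016, 4, 20), "rating": 4},
    {"model": "iPhone 6s", "brand": "Apple", "posted": date(2016, 6, 11), "rating": 4},
    {"model": "Galaxy S7", "brand": "Samsung", "posted": date(2016, 4, 5), "rating": 5},
    {"model": "Galaxy S7", "brand": "Samsung", "posted": date(2016, 4, 9), "rating": 4},
    {"model": "Galaxy S7", "brand": "Samsung", "posted": date(2016, 4, 28), "rating": 5},
]

for (group, month), avg in monthly_mean_ratings(reviews, "model").items():
    print(group, month, round(avg, 2))
```

Months with only a handful of reviews can swing sharply, so a short dip in a monthly series is worth checking against that month's review count before reading much into it.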

Conclusion

Apple dominates the smartphone listings, with the most phones on the list for each carrier. In terms of customer feedback, however, Samsung phones receive more ratings per phone than Apple's. Competition between Apple and Samsung may affect the ratings of phones released within a short window of each other: a rise in one phone's ratings is accompanied by a decline in its competitor's. This effect was short-lived, though, and iPhones eventually maintained higher ratings than their competitors.

About Author

Jun Kui Chen

Jun obtained a Ph.D. in Immunology from Columbia University. He currently works at a fintech startup as a data analyst.