Data Analysis of the TV Market with API Data from BestBuy

Posted on Oct 24, 2021

The skills I demonstrate here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Data Analysis of the TV Market with API Data from BestBuy's online shop

Today lots of people are holed up in their own rooms, streaming Netflix or Disney+ on their phones. But there are still many of us who watch TV shows on an actual television. The TV market, despite existing for nearly a century, is thriving. Last year (2020) was a record year in terms of total market revenue, and the market is forecast to maintain a 10% annual growth rate through at least 2024. However, the market is fiercely competitive, and a few brands from different countries vie for market-share dominance. There are also new market entrants from China (e.g. TCL, Hisense, Xiaomi) quickly gaining ground.

BestBuy is a specialty retailer focused on consumer electronics, and its online shop is considered one of the best platforms for buying TVs (behind Amazon and Walmart, neither of which specializes in electronics). Using BestBuy's online retail data as a proxy for today’s TV marketplace, I sought insight into the competitive landscape and customer purchasing behavior. The insight should prove useful to TV manufacturers in their quest for market share.

BestBuy API

Raw data was obtained through BestBuy's API and covered 465 TV models with 34 attributes. After cleaning the data, i.e., eliminating the models no longer available either in-store or online, the resulting dataset contained 387 rows. For additional detail on data munging (like determining outliers, binning, and calculating the TV “bezel”) and data harmonization (extracting warranty length and converting it to years), please refer to my GitHub.
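As a rough illustration of the cleaning and warranty harmonization steps, here is a minimal sketch in pandas. The rows are made-up toy data, and the column names (`inStoreAvailability`, `onlineAvailability`, `warrantyLabor`) and the helper `warranty_years` are assumptions for illustration; the actual code in my repository may differ.

```python
import re
import pandas as pd

# Toy rows standing in for the API response (not real data).
# Column names are assumptions for illustration.
raw = pd.DataFrame({
    "sku": [1, 2, 3],
    "inStoreAvailability": [True, False, False],
    "onlineAvailability": [False, True, False],
    "warrantyLabor": ["1 year", "90 days", "2 Years"],
})

# Keep a model if it can still be bought through at least one channel.
available = raw[raw["inStoreAvailability"] | raw["onlineAvailability"]].copy()

def warranty_years(text):
    """Convert strings like '90 days' or '2 Years' to a length in years."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*(day|month|year)", str(text), flags=re.I)
    if match is None:
        return float("nan")
    value, unit = float(match.group(1)), match.group(2).lower()
    return value / {"day": 365.0, "month": 12.0, "year": 1.0}[unit]

available["warranty_years"] = available["warrantyLabor"].map(warranty_years)
print(available[["sku", "warranty_years"]])
```

Warranty strings that do not match the pattern come back as NaN, so they can be inspected by hand rather than silently coerced.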

Market Landscape

Televisions are categorized into four size classes: extra-large, large, mid, and small. Today’s TVs all utilize some form of light-emitting diode (LED) display, with about a third using ultra-high definition output technology, such as OLED or QLED. In addition, 91% of all TVs on the market are smart capable (meaning they have internet connectivity and support for a range of apps, such as streaming services and even internet browsers).

Figure 1: Market proportions (pie charts). Size classes by screen measurement in inches: extra-large (> 60"), large (45"-60"), mid (32"-45"), small (< 31").
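The size classes can be assigned with a simple binning step. The sketch below uses `pandas.cut`; the column name `screen_inches` and the exact handling of boundary values (for example, whether a 60" set counts as large or extra-large) are assumptions, since the figure legend leaves the edges ambiguous.

```python
import numpy as np
import pandas as pd

# Toy screen sizes; the column name is an assumption.
tvs = pd.DataFrame({"screen_inches": [24, 32, 43, 55, 65, 75]})

# Left-closed bins: [0, 32), [32, 45), [45, 60), [60, inf).
# Boundary handling is an assumption made for this sketch.
edges = [0, 32, 45, 60, np.inf]
labels = ["small", "mid", "large", "extra-large"]

tvs["size_class"] = pd.cut(
    tvs["screen_inches"], bins=edges, labels=labels, right=False
)
print(tvs)
```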

Signs of Market Segmentation

The stacked bar charts below show, from left to right, the proportion of TV models offered by each brand, the total review counts received by each brand, and review counts by brand, separated by size class. Although actual sales figures aren't available, I use review counts as a proxy for sales, especially since BestBuy boasts that they are “verified purchase reviews.” The three leading brands, Samsung, LG, and Sony, account for over 60% of the TVs available on the market, and they also account for about 60% of all sales as measured by review counts. However, the size-segmented figures show stark differences. Those same three brands have over 80% of all extra-large TV sales, but when it comes to small TVs, they account for less than 25%.

Figure 2: Product distribution, review counts, and review counts by size, by brand (three stacked bar charts)
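The share of review counts held by the three leading brands within each size class can be computed along these lines. This is a sketch on invented toy numbers, not the BestBuy data, and the column names are assumptions.

```python
import pandas as pd

# Invented toy data for illustration only.
tvs = pd.DataFrame({
    "brand": ["Samsung", "LG", "Sony", "TCL", "Insignia", "Samsung"],
    "size_class": ["extra-large", "extra-large", "small", "small", "small", "small"],
    "review_count": [400, 300, 50, 250, 150, 50],
})

big_three = {"Samsung", "LG", "Sony"}

# Review counts attributed to the big three; zero for every other brand.
tvs["big_three_reviews"] = tvs["review_count"].where(
    tvs["brand"].isin(big_three), 0
)

totals = tvs.groupby("size_class")[["big_three_reviews", "review_count"]].sum()
share = totals["big_three_reviews"] / totals["review_count"]
print(share)
```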

TVs today come in a wide range of prices, and we can see additional signs of market segmentation through a strip plot of price categorized by brand. The three leading brands mentioned previously price their products on the high end, with their median prices and a large portion of their interquartile ranges falling well above the median price of all TVs. The remaining brands, on the other hand, price most of their products below the median line.

Figure 3: TV prices by brand (box plot with overlaid strip plot)
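The comparison against the market-wide median can also be checked numerically rather than by eye. A minimal sketch, again on invented prices and with assumed column names:

```python
import pandas as pd

# Invented toy prices for illustration only.
tvs = pd.DataFrame({
    "brand": ["Sony", "Sony", "TCL", "TCL", "Insignia", "Insignia"],
    "price": [1500.0, 900.0, 250.0, 450.0, 150.0, 300.0],
})

# Median price across all models, regardless of brand.
overall_median = tvs["price"].median()

# Median price per brand, flagged against the market-wide median.
by_brand = tvs.groupby("brand")["price"].median().to_frame("median_price")
by_brand["above_market_median"] = by_brand["median_price"] > overall_median
print(overall_median)
print(by_brand)
```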

From both the differences in distribution of sales in each size class, and the pricing strategy divergence among different brands, we can discern a market that is segmented into two halves: (1) more expensive premium TVs of a larger size and better LED tech, and (2) non-premium TVs of mid/small size and perhaps basic LED tech.

Finding Correlates with Review Score and Review Count

Moving forward with the market analysis, we plot a correlation heatmap with review score and review count as target attributes. Unfortunately, no attribute exhibited a notable positive or negative correlation with either target, with the exception of a -0.51 coefficient between price and review score for mid-size TVs. This suggests buyers of those TVs may be price sensitive, something we will investigate further in the next section.

Figure 4: Correlation heatmaps by size class
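The per-class correlation matrices behind heatmaps like these can be built by grouping on size class first. The sketch below uses invented values chosen to be perfectly linear, so the coefficients are trivially -1 and +1; the column names are assumptions, and only the grouping pattern is the point.

```python
import pandas as pd

# Invented toy data; values are deliberately linear within each class.
tvs = pd.DataFrame({
    "size_class": ["mid"] * 4 + ["large"] * 4,
    "price": [200, 250, 300, 350, 500, 600, 700, 800],
    "review_score": [4.8, 4.6, 4.4, 4.2, 4.1, 4.3, 4.5, 4.7],
    "review_count": [120, 80, 60, 90, 30, 45, 20, 70],
})

cols = ["price", "review_score", "review_count"]

# One Pearson correlation matrix per size class.
corr_by_class = {
    name: group[cols].corr() for name, group in tvs.groupby("size_class")
}

print(corr_by_class["mid"].loc["price", "review_score"])
```

Each matrix in `corr_by_class` can then be passed to a heatmap function such as `seaborn.heatmap`.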

Investigating Consumer Purchasing Behavior

On each product webpage, BestBuy recommends other products under the label “Ultimately Bought” (UB). It's a list of the top 10 products purchased by other shoppers who also viewed the original product. Fortunately, this is offered as an endpoint on the API, so a second round of requests was made with the SKU code for each TV, returning 10 additional “UB” product SKUs.

With the additional data, some aspects of consumer buying could be obtained, namely price sensitivity and brand loyalty.

Price Sensitivity or Shopper Price “Stickiness”

The price difference between the original TV and the average price of UB TVs was calculated, and the density histogram is displayed below. 

Figure 5: Price differences between viewed TV and ultimately bought TV (unsegmented)
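The price-difference calculation can be sketched as follows. The SKUs and prices are invented, the column names are assumptions, and the sign convention (average UB price minus viewed price, so a positive value means shoppers traded up) is my choice for this sketch.

```python
import pandas as pd

# Invented toy prices keyed by SKU.
prices = pd.Series(
    {101: 300.0, 102: 500.0, 103: 1200.0, 104: 1500.0, 105: 300.0, 106: 1200.0},
    name="price",
)

# One row per (viewed TV, ultimately bought TV) pair.
ub = pd.DataFrame({
    "viewed_sku": [101, 101, 103, 103],
    "ub_sku": [102, 105, 104, 106],
})

# Look up the price of each UB product, then average per viewed TV.
ub["ub_price"] = ub["ub_sku"].map(prices)
mean_ub_price = ub.groupby("viewed_sku")["ub_price"].mean()

# Positive difference: shoppers ultimately bought something pricier.
price_diff = mean_ub_price - prices.reindex(mean_ub_price.index)
print(price_diff)
```

A density histogram of `price_diff` gives a plot of the same kind as Figure 5.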

Most shoppers did not stray far from the original price point. However, when the market is segmented into premium and budget (non-premium) TVs, stark differences between customers can be observed. Premium TVs are represented in the figure below in yellow, and their price differences did not cluster near 0. Premium TV buyers often ultimately bought a higher priced TV.

Budget TV buyers, on the other hand, exhibited even more price “stickiness” between the viewed and UB products: the median viewed TV price was almost equal to the median UB TV price. This price sensitivity of budget buyers may explain the negative correlation between review score and price for mid-sized televisions seen previously.

Figure 6: Price differences between viewed TV and ultimately bought TV (Premium vs. Budget)

Customer Brand Loyalty

For each viewed TV, I counted how many of its ultimately bought TVs were of the same brand, and graphed the counts as a boxplot to get a sense of the degree of customer loyalty. The three dominant brands exhibited relatively high customer loyalty. The interquartile ranges for both Sony and Samsung sit above the average of all brands, while LG sits on the average and all remaining brands fall short.

Sony, Samsung, and LG price a large portion of their TV models in the premium TV range. Accordingly, the data suggests premium TV customers favor a particular brand while budget TV customers are more willing to switch during browsing.

Figure 7: Count of ultimately bought TVs that are of the same brand as the viewed TV (boxplot)
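The same-brand tally is a small lookup-and-count exercise. A minimal sketch on invented SKUs, with assumed column names:

```python
import pandas as pd

# Invented toy brand lookup keyed by SKU.
brands = pd.Series({
    101: "Sony", 102: "Sony", 103: "TCL",
    104: "Samsung", 105: "Sony", 106: "TCL",
})

# One row per (viewed TV, ultimately bought TV) pair.
ub = pd.DataFrame({
    "viewed_sku": [101, 101, 101, 103, 103],
    "ub_sku": [102, 104, 105, 104, 106],
})

# True where the ultimately bought TV shares the viewed TV's brand.
ub["same_brand"] = ub["viewed_sku"].map(brands) == ub["ub_sku"].map(brands)

# Count of same-brand UB products per viewed TV (out of up to 10).
loyalty = ub.groupby("viewed_sku")["same_brand"].sum()
print(loyalty)
```

Joining `loyalty` back to the viewed TV's brand gives one count per model, which is what a per-brand boxplot needs.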


Conclusion

The data showed obvious signs of market segmentation into premium and budget TVs. Premium TV customers are not sensitive to price, often buying at a higher price than the TV originally viewed, although they tend to stick with the brand. Manufacturers of premium televisions should, therefore, focus on marketing to boost brand recognition and improve customer perception.

Budget TV customers exhibited price “stickiness,” and review scores of mid-sized televisions correlated negatively with price. Consequently, budget TV sellers should focus on cutting costs and pricing competitively.


Limitations

BestBuy’s online retail shop, although comprehensive, may not represent the market as a whole. Xiaomi’s televisions are not currently sold on BestBuy. BestBuy also sells its own Insignia brand of TVs, which could have benefited from preferential promotions on the website. The count of reviews for a product was used as a proxy for sales, which may be inaccurate and biased due to possible review filtering practices by the site.

Further Analysis

As next steps, pricing and customer purchasing data can be cross-referenced with that of Amazon, Walmart, B&H, and others to increase the robustness of the study and reduce any bias toward BestBuy’s own “Insignia” TV brand. And although customer review scores did not exhibit much correlation with most attributes in this analysis, that doesn’t mean they fail to represent customer satisfaction. NLP sentiment analysis can be performed on the actual text of the reviews.

About Author

Daniel Nie

Data Scientist with background experience in both Healthcare Administration and Finance. A versatile thinker that enjoys deep data exploration and generating business value with machine learning in both Python and R