Sentiment-Enhanced Product Recommendation System for E-Commerce
Overview
This project uses Natural Language Processing (NLP) techniques to analyze customer sentiment in e-commerce product reviews. By extracting emotional context from user-generated content, the system builds a recommendation engine that goes beyond traditional star ratings. The implementation compares two widely used sentiment analysis approaches, RoBERTa (an optimized BERT variant released by Facebook AI in 2019) and VADER (a lexicon-based analyzer), to assess which provides a more accurate and nuanced understanding of customer opinions.
Data
The analysis uses a Flipkart e-commerce dataset containing thousands of product reviews across diverse categories. Each record includes product information (name and price) alongside the customer's rating, review text, and review summary, providing the context needed for sentiment extraction (opinion mining) and product evaluation.
Methodology
The project implements a modular architecture with dedicated classes for:
- Data loading, validation and preprocessing
- Sentiment extraction using RoBERTa transformer models
- Lexicon-based sentiment analysis with VADER
- Comparative performance evaluation between approaches
- Visualization of relationships between price points, customer ratings, and sentiment scores
- Product ranking based on sentiment polarity metrics
The system extracts sentiment from both the review text and the concise summary (typically one or two sentences long), giving two complementary views of each customer's opinion.
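To compare the two models, their outputs need a common label scale. The sketch below is one way to do that, not the project's actual code. It assumes the RoBERTa checkpoint emits three logits in the order negative, neutral, positive (check the model card of the checkpoint in use), and it applies the conventional 0.05 cutoff to VADER's compound score. The function names are hypothetical. In practice the compound score would come from NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` and the logits from a Hugging Face sequence-classification model.

```python
import math

# Assumed output order of the RoBERTa sentiment head (an assumption, verify per checkpoint).
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Turn raw classifier logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def roberta_label(logits):
    """Return the most probable label for a three-class logit vector."""
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))]


def vader_label(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a three-way label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical scores for a single review.
print(roberta_label([-1.0, 0.2, 2.5]))
print(vader_label(0.0))
```

Keeping the mapping separate from model inference makes it easy to change the threshold or label order without re-running the slow transformer pass.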
Discussion
The word cloud of customer review summaries is dominated by positive words such as "nice" and "excellent", as shown below:
We converted the 5-point numerical rating scale to categorical labels, with 1 representing "Very Poor" and 5 representing "Excellent". As shown in the figure below, our analysis revealed that 75% of reviews fell into the "Good" or "Excellent" categories, indicating overall positive customer satisfaction:
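A minimal sketch of this conversion follows. Only the labels for 1 ("Very Poor"), 5 ("Excellent"), and "Good" appear in the text above; the labels for 2 and 3, the position of "Good" at 4, and the sample ratings are assumptions made for illustration.

```python
from collections import Counter

# Labels for 2 and 3 are assumed; 1, 5 and "Good" come from the write-up.
RATING_LABELS = {
    1: "Very Poor",
    2: "Poor",
    3: "Average",
    4: "Good",
    5: "Excellent",
}


def label_ratings(ratings):
    """Convert numeric 1-5 ratings to categorical labels."""
    return [RATING_LABELS[int(r)] for r in ratings]


def share_of(labels, wanted):
    """Percentage of labels that fall into the `wanted` categories."""
    counts = Counter(labels)
    hits = sum(counts[name] for name in wanted)
    return 100.0 * hits / len(labels)


sample_ratings = [5, 4, 5, 3, 1, 4, 5, 2]  # made-up data, not from the dataset
labels = label_ratings(sample_ratings)
positive_share = share_of(labels, {"Good", "Excellent"})
print(positive_share)  # 62.5 for this made-up sample
```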
Drawing on the derived rankings and the categorical rating labels, we explored the following:
- Correlation patterns between product price and sentiment polarity
- Comparative performance analysis of transformer-based (RoBERTa) vs. lexicon-based (VADER) sentiment extraction
- Identification of sentiment-price-rating relationships that can inform pricing strategies
- Optimized recommendation algorithms that prioritize products with consistently positive sentiment patterns
The comparison shows distinct patterns between the two sentiment analysis models. RoBERTa assigns sentiment more cautiously, with more neutral classifications (43.5% of reviews vs. VADER's 39.4%), while VADER assigns positive sentiment more often (53.4% of reviews and 75.8% of summaries, compared to RoBERTa's 46.9% and 69.4%). Although RoBERTa likely offers better contextual understanding and nuance detection, it took approximately 1000x longer to process the data than VADER's lexicon-based approach. A definitive accuracy assessment would require human-annotated ground truth labels to evaluate which model better captures actual sentiment, particularly where context significantly affects interpretation.
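The label shares and the runtime gap quoted above can be computed with a few helpers like the ones below. This is a sketch under stated assumptions: the helper names are hypothetical, the label lists are made up, and `score_fn` stands in for whichever model call is being timed.

```python
import time
from collections import Counter


def label_shares(labels):
    """Percentage of items per label, rounded to one decimal place."""
    counts = Counter(labels)
    total = len(labels)
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


def agreement_rate(labels_a, labels_b):
    """Fraction of items on which two models assign the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def seconds_per_item(score_fn, texts):
    """Average wall-clock seconds that score_fn spends per text."""
    start = time.perf_counter()
    for text in texts:
        score_fn(text)
    return (time.perf_counter() - start) / len(texts)


# Made-up labels for five reviews, not results from the dataset.
roberta = ["neutral", "positive", "neutral", "negative", "positive"]
vader = ["positive", "positive", "neutral", "negative", "positive"]

print(label_shares(roberta))
print(label_shares(vader))
print(agreement_rate(roberta, vader))
```

Agreement between the two models is not accuracy: without human-annotated labels it only shows where the models diverge, which is a useful place to start manual inspection.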
We examined potential relationships between review sentiment and product prices. The correlation heatmap below visualizes the strength of associations between sentiment scores (from both reviews and summaries) and product variables such as price:
Interestingly, the heatmap reveals only weak correlations between review sentiment and product price. This suggests that customer satisfaction, as measured through sentiment analysis, is not strongly related to how expensive a product is, and that value perception likely depends on factors beyond the price point.
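Each cell of such a heatmap is a pairwise correlation coefficient. The sketch below computes Pearson's r in plain Python to make the calculation explicit; it assumes Pearson correlation was used, which the write-up does not state. With pandas and Seaborn the same matrix would typically come from `DataFrame.corr()` passed to `seaborn.heatmap`. The prices and sentiment scores are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only.
prices = [1.50, 16.16, 54.99, 76.99, 423.39]
sentiment = [0.62, 0.71, 0.55, 0.80, 0.66]

r = pearson(prices, sentiment)
print(round(r, 3))
```

Values of r near 0 indicate little linear association, which is the pattern the heatmap shows for price against sentiment.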
The top 10 recommended products based on our algorithm:
Product Name | Price (USD) |
---|---|
IFB Neptune VX Free Standing 12 Place Settings Dishwasher | $423.39 |
IFB Neptune SX1 Free Standing 15 Place Settings Dishwasher | $489.39 |
Voltas Beko DF14W Free Standing 14 Place Settings Dishwasher | $347.49 |
TP-Link TL-WA850RE(IN) 300 Mbps WiFi Range Extender | $16.16 |
Hold up Triangle Shape Mobile Holder For Table | $1.50 |
Airtel Regular Digital TV DTH Remote Compatible | $2.37 |
KENT 16079 - Wet Grinder (White) | $54.99 |
Google Chromecast 3 Media Streaming Device | $29.16 |
Google Nest Hub (2nd gen), Display with Google Assistant | $76.99 |
Butterfly Rapid Plus Wet Grinder with Coconut Scraper | $46.19 |
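The exact ranking algorithm is not spelled out here, so the following is only one plausible sketch of ranking by sentiment polarity: average each product's polarity scores, drop products with too few reviews, and sort. The function name, the `min_reviews` cutoff, the tie-breaking rule, and the sample rows are all assumptions.

```python
from collections import defaultdict


def rank_products(rows, top_n=10, min_reviews=2):
    """Rank products by mean sentiment polarity.

    rows: iterable of (product_name, polarity) pairs, polarity in [-1, 1].
    Returns a list of (product_name, mean_polarity, review_count) tuples.
    """
    scores = defaultdict(list)
    for name, polarity in rows:
        scores[name].append(polarity)

    ranked = [
        (name, sum(values) / len(values), len(values))
        for name, values in scores.items()
        if len(values) >= min_reviews  # skip products with too little evidence
    ]
    # Highest mean first; ties go to the product with more reviews, then by name.
    ranked.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return ranked[:top_n]


# Hypothetical products and scores, not taken from the dataset.
rows = [
    ("Dishwasher A", 0.9),
    ("Dishwasher A", 0.7),
    ("Mobile Holder B", 0.8),
    ("Mobile Holder B", 0.4),
    ("Remote C", 1.0),  # only one review, filtered out by min_reviews
]

for name, mean_polarity, count in rank_products(rows):
    print(f"{name}: {mean_polarity:.2f} ({count} reviews)")
```

The minimum-review filter matters because a single glowing review would otherwise outrank products with many consistently positive ones.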
Applications
This sentiment-enhanced recommendation system offers significant advantages for e-commerce platforms. By analyzing emotional context within reviews rather than just star ratings, it connects customers with products that truly meet their expectations.
The system directly addresses key business challenges: increasing conversion rates by recommending genuinely satisfying products, and reducing costly returns by identifying potential issues that ratings alone might miss. Recent trends like "frequently returned item" labels on major platforms demonstrate the industry's recognition of this challenge.
Additionally, the sentiment analysis provides valuable feedback to vendors and manufacturers for product improvement, ultimately creating a more satisfying shopping experience that builds customer loyalty in the competitive e-commerce landscape.
Technologies
Python, PyTorch, Hugging Face Transformers (BERT, RoBERTa), NLTK, pandas, Matplotlib and Seaborn