E-Commerce Data Analysis of Locks and Safes
For this capstone project, we were given the opportunity to conduct an e-commerce data analysis for MasterLock. MasterLock’s main objective was to use e-commerce data analysis to design strategies that would increase its product sales. Our task was to scrape customer reviews and product and website details from e-commerce sources that sold MasterLock's products. With this data, we would create a data-driven analytics report to help MasterLock achieve its goals.
This project comprises an E-Commerce Data Analysis Report, along with raw data deliverables and a reusable data analytics interface, to understand the web presence of a large-scale Consumer-Packaged Goods (CPG) company.
Our data-driven approach to these business questions and goals involved the following steps. (It is important to note that data science is a cyclical process, so many of these steps were repeated, revisited, and improved.)
Step 1: Data Mining
Working with web scraping tools, we built scripts to collect the relevant data from each retailer in question. This left us with over 30,000 rows of data covering MasterLock's (and its competitors') product reviews, dates, prices, product names, and meta content about the retailer websites.
Step 2: Understanding the Data
The team used natural language processing algorithms (regular-expression filtering, Term Frequency–Inverse Document Frequency statistics, Latent Dirichlet Allocation, Word2Vec modeling, and K-Means clustering of word matrices) to estimate the topics that reviewers were likely to write about.
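As an illustration of the regular-expression filtering step, here is a minimal sketch using only the Python standard library. The pattern, the stopword list, and the function name tokenize are our assumptions for this example, not the project's actual cleaning rules.

```python
import re

# Hypothetical cleaning rules: keep lowercase letters and whitespace only.
NON_LETTERS = re.compile(r"[^a-z\s]")

# A tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "to", "of", "this", "i", "my"}


def tokenize(review):
    """Lowercase a review, strip non-letter characters, and drop stopwords."""
    text = NON_LETTERS.sub(" ", review.lower())
    return [word for word in text.split() if word not in STOPWORDS]


print(tokenize("The keypad died after 2 weeks... battery issue?"))
```

Token lists like these can then feed the downstream statistics, such as TF-IDF or topic modeling.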
Numerical and Categorical Data:
With a cleaned dataset at our disposal, we could explore the data in many ways. A few of our key outputs included visualizations filterable by data category, aggregations of product ratings by price, and cross-website comparisons of how many suggested items on a product page belong to competitor brands.
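An aggregation of ratings by price could look roughly like the sketch below. The rows, the field names, and the 50-dollar band width are made up for illustration; they are not taken from the project data.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; not project data.
rows = [
    {"brand": "A", "price": 45.0, "rating": 4},
    {"brand": "A", "price": 120.0, "rating": 5},
    {"brand": "B", "price": 60.0, "rating": 2},
    {"brand": "B", "price": 130.0, "rating": 3},
]


def price_band(price, width=50):
    """Bucket a price into a band label such as '$0-49' or '$100-149'."""
    low = int(price // width) * width
    return f"${low}-{low + width - 1}"


def mean_rating_by_band(rows, width=50):
    """Group ratings by price band and average each group."""
    ratings = defaultdict(list)
    for row in rows:
        ratings[price_band(row["price"], width)].append(row["rating"])
    return {band: mean(values) for band, values in ratings.items()}


print(mean_rating_by_band(rows))
```

The same grouping idea extends to any categorical field, for example averaging ratings per brand or per retailer.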
Step 3: Sharing Insights
An essential step in data analytics is operationalization. We wanted our findings to be easily accessible to our clients, without anyone having to dig into any code, so we built a data processing toolkit in the form of a custom web application. Through a user interface, the app can execute the same dataframe filtering, aggregation, and natural language processing methods our team explored. Metrics can be produced by review keyword, product name, review rating, or any other feature of our data set. With an approach that ultimately seeks to generalize methods for gathering insight, we gave our clients the tools to explore the data on their own, as it suits their needs. We also presented our own findings in a succinct and actionable manner, with a slideshow report and a Tableau presentation.
In our analysis, our goal was to come to an actionable conclusion for our E-Commerce report, so our client could walk away with an observation backed by numbers.
1. Understanding customer satisfaction between MasterLock and its competitors
There is a saying in China, “If you know yourself and your enemies, you will win hundreds of battles.” To achieve success, it is necessary to know not just MasterLock’s own strengths but those of its competitors as well. Based on customers’ ratings, reviews were divided into two groups: ‘good reviews’ and ‘bad reviews’. We used the TF-IDF algorithm (Term Frequency–Inverse Document Frequency) to identify key descriptors in documents. Words with a high TF-IDF score not only occur frequently in a document but are also rare across the other documents, so they provide the most information about that specific document.
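To make the scoring concrete, here is a minimal sketch of TF-IDF over a "good" and a "bad" review group, written with the standard library only. The sample reviews, the 4-star threshold, and the choice to pool each group into a single document are our assumptions for illustration; the post does not state how the team defined documents or the rating cutoff.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {doc_name: {term: score}} where score = tf * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        scores[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Made-up reviews as (star rating, text); not project data.
reviews = [
    (5, "great safe great price"),
    (4, "safe works great"),
    (1, "safe arrived with dent"),
    (2, "battery died safe dent"),
]
GOOD_THRESHOLD = 4  # assumption: 4 stars and above counts as a good review

# Pool each group's words into one document per group.
docs = {"good": [], "bad": []}
for stars, text in reviews:
    docs["good" if stars >= GOOD_THRESHOLD else "bad"].extend(text.split())

scores = tf_idf(docs)
top_good = max(scores["good"], key=scores["good"].get)
top_bad = max(scores["bad"], key=scores["bad"].get)
print(top_good, top_bad)
```

In this toy data, ‘safe’ appears in both groups and scores zero, while words unique to one group rise to the top. Library implementations usually smooth the inverse document frequency so that shared terms are down-weighted rather than zeroed out.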
From the positive reviews, we can see that customers trust MasterLock’s safes with their important documents. They consider the safes to work well and to be well priced. In negative reviews, the key descriptors include ‘battery’, ‘keyboard’, and ‘handle’. However, customers raise these complaints about every safe brand, so the battery, keyboard, and handle are likely things that need continual improvement. One term in the negative reviews that did catch our attention was “dent.” Dents are likely caused by a lack of care in shipping, which is currently handled by Amazon. MasterLock could ask Amazon to add more protective material, such as foam, to the boxes when shipping the safes.
Some of Barska's safes are biometric. From the key descriptors in the positive reviews, we can see that customers really like this feature. However, the biometric feature seems to be a double-edged sword, as it was also mentioned in the negative reviews.
1.3 First Alert
From the plots, we can see that customers said a lot of good things about First Alert safes, such as ‘easy to use’, ‘high quality’, and ‘exactly as described’. The terms ‘mold’ and ‘moisture’ appear in their negative reviews, which indicates that First Alert safes may have a problem keeping valuables dry. MasterLock can improve its design to block out moisture and then advertise that its safes keep valuables safe from mold, targeting customers who have suffered from the mold issue with First Alert safes.
When looking at the key descriptors from Paragon’s negative reviews, you understand what made their customers frustrated. Their safes stopped working, and they had to find a locksmith to open the safe.
The phrase ‘wooden crate’ in their positive reviews caught our eye, so we looked it up in the customers’ reviews. Customers are very positive about receiving safes that Amazon ships in wooden crates. This could be a good solution to the dent complaint found in MasterLock's safe reviews. MasterLock can ask Amazon to also ship its safes in wooden crates, which should improve the condition of the safes upon arrival. Another design feature worth noting is that customers said they like having a backup key in case of power loss.
Customers mainly buy security safes from GunVault to store firearms at home or in their vehicles, as we can see from the key descriptors in the positive reviews. Negative reviews include the phrase ‘customer service’, indicating that customers are not happy with GunVault’s customer service. MasterLock should keep its customer service at a high level, which will increase customers’ satisfaction with the brand and can lead to increased sales.
MasterLock can learn the strengths of their competitors’ safes from the key descriptors in the positive reviews so that they can add those strengths and features to their own safes. At the same time, they can also check if they have the same issues their competitors have. When advertising, MasterLock can target customers who suffered from the problems caused by their competitors' safes. That’s how you can use data science to beat your competitors.
2. Know what customers’ concerns are
In addition to reviews, we also scraped the questions customers asked on Amazon about MasterLock’s safes. Using Latent Dirichlet Allocation, a natural language processing model for uncovering latent topics across a collection of documents, we identified the five most important topics for customers of MasterLock’s safes on Amazon: size, combination, battery, mount, and fire. From these topics, we can conclude that customers like to ask questions such as:
- What is the size of the safe?
- How to set the combination?
- How often does the battery need to be charged?
- Can the safe be mounted to the wall?
- Is the safe fireproof?
Using this information, we can adjust the product pages on Amazon and on other websites. If a product page does not answer these questions, MasterLock should add the answers to the description. Even if a product page does answer these questions, we think it would be better to highlight the answers so that customers can easily find the topics they care about.
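One way to act on this recommendation is to scan each product description for the five topics and flag the ones it never mentions. The sketch below is a rough illustration: the topic names come from the analysis above, but the keyword lists, the function name missing_topics, and the sample description are our assumptions.

```python
# Topic names come from the analysis above; the keyword lists are
# illustrative guesses, not the terms the topic model produced.
TOPIC_KEYWORDS = {
    "size": ["dimension", "inches", "cubic"],
    "combination": ["combination", "code"],
    "battery": ["battery", "batteries"],
    "mount": ["mount", "bolt"],
    "fire": ["fire"],
}


def missing_topics(description):
    """Return the topics that no keyword in the description covers."""
    text = description.lower()
    return [
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if not any(keyword in text for keyword in keywords)
    ]


# Hypothetical product description for illustration.
description = "Digital safe, 0.5 cubic feet, programmable code, runs on 4 AA batteries."
print(missing_topics(description))
```

Plain substring matching is crude (for example, ‘fire’ also matches ‘firearm’), so a production check would likely need word-boundary matching or manual review.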
With a data science approach, we were able to offer our client a quantifiable understanding of their products' baseline across retailer websites, give them an interface to explore the data according to their needs, and share impactful insights gathered with natural language processing and statistical modeling.
Thank you for reading this post. Please feel free to reach out to any of our team members.