E-Commerce Data Analysis of Locks and Safes



For this capstone project, we were given the opportunity to conduct an e-commerce data analysis for MasterLock. MasterLock’s main objective was to use e-commerce data analysis to design strategies that would increase its product sales. Our task was to scrape customer reviews and product and website details from the e-commerce sites that sold MasterLock's products. With this data, we would create a data-driven analytics report to help MasterLock achieve its goals.

Project Overview:

This project comprises an E-Commerce Data Analysis Report, along with raw data deliverables and a reusable data analytics interface, to understand the web presence of a large-scale Consumer-Packaged Goods (CPG) company.

Project Solution:

Our data-driven approach to these business questions and goals involved the following steps. (Note that data science is a cyclical process, so many of these steps were carried out, revisited, and improved more than once.)

Step 1: Data Mining

Working with web scraping tools, we built scripts to collect all relevant data from every retailer in question. This left us with over 30,000 rows of data covering MasterLock's (and its competitors') product reviews, dates, prices, and product names, plus meta content about the retailer websites.
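As a rough illustration of the parsing half of this step, the sketch below pulls review fields out of a saved HTML snippet using only Python's standard library. The markup, the class names (review, rating, date, body), and the ReviewParser class are all hypothetical; every retailer's pages are structured differently, and the scraping tools the team actually used are not specified here.

```python
from html.parser import HTMLParser

# Hypothetical markup: real retailer pages use different tags and class names.
SAMPLE_HTML = """
<div class="review">
  <span class="rating">5</span>
  <span class="date">2019-03-01</span>
  <p class="body">Great lock for my school locker.</p>
</div>
<div class="review">
  <span class="rating">2</span>
  <span class="date">2019-04-12</span>
  <p class="body">Safe arrived with a dent.</p>
</div>
"""

FIELDS = {"rating", "date", "body"}


class ReviewParser(HTMLParser):
    """Collect one dict per <div class="review"> block."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per review found so far
        self._field = None  # name of the field currently being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "review":
            self.rows.append({})       # a new review starts here
        elif css_class in FIELDS and self.rows:
            self._field = css_class    # next text chunk belongs to this field

    def handle_data(self, data):
        if self._field and data.strip():
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None             # stop capturing at any closing tag


parser = ReviewParser()
parser.feed(SAMPLE_HTML)
reviews = parser.rows  # list of dicts; every value is still a string
```

Each parsed row can then be written out as one line of the raw dataset. Fields such as the rating would still need converting from strings to numbers before analysis.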

Step 2: Understanding the Data

Text Data:

The team used several natural language processing techniques (regular-expression filtering, Term Frequency–Inverse Document Frequency statistics, Latent Dirichlet Allocation, Word2Vec modeling, and K-Means clustering of word matrices) to estimate the topics that reviewers were most likely to write about.
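The regular-expression filtering mentioned above might look something like the following minimal sketch. The stop-word list, the token pattern, and the clean_review function are assumptions made for illustration, not the team's actual preprocessing code.

```python
import re

# Assumed stop-word list for illustration; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "this", "for", "my", "to", "of"}

# Words made of letters, optionally with one internal apostrophe (e.g. "don't").
TOKEN_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def clean_review(text):
    """Lowercase, strip HTML remnants and URLs, and return content-word tokens."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)       # drop leftover HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    tokens = TOKEN_PATTERN.findall(text)
    return [t for t in tokens if t not in STOP_WORDS]


tokens = clean_review("This lock is <b>GREAT</b> for my locker! See http://example.com")
# tokens == ["lock", "great", "locker", "see"]
```

Token lists like this are the usual input to the downstream methods named above, such as TF-IDF, LDA, and Word2Vec.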


We were able to partition groups of reviews into clusters. This particular cluster suggests a positive review of locks as used for school lockers.


Numerical and Categorical Data:

With a cleaned dataset at our disposal, we could explore the behavior of our data in many ways. A few of our key outputs were visualizations filterable by data category, aggregations of product ratings by price, and cross-website comparisons of how many of the suggested items on a product page belong to competitor brands.
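To make "aggregations of product ratings by price" concrete, here is a small sketch that groups ratings into price bands and averages them. The rows are made-up sample data, the $50 band width is an arbitrary choice, and plain Python stands in for whatever dataframe tooling the team used.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration; the real dataset is not reproduced here.
rows = [
    {"brand": "MasterLock", "price": 24.99, "rating": 5},
    {"brand": "MasterLock", "price": 129.00, "rating": 3},
    {"brand": "Barska", "price": 139.50, "rating": 4},
    {"brand": "First Alert", "price": 19.99, "rating": 4},
]


def price_band(price, width=50):
    """Label a price with its band, e.g. 24.99 -> '$0-49'."""
    low = int(price // width) * width
    return f"${low}-{low + width - 1}"


def mean_rating_by_band(rows, width=50):
    """Average the ratings of all rows that fall in the same price band."""
    by_band = defaultdict(list)
    for row in rows:
        by_band[price_band(row["price"], width)].append(row["rating"])
    return {band: mean(ratings) for band, ratings in by_band.items()}


summary = mean_rating_by_band(rows)
# summary == {"$0-49": 4.5, "$100-149": 3.5}
```

The same grouping pattern works for any other key in the data, such as brand or retailer.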


Step 3: Sharing Data Insights

An essential step in data analytics is operationalization. We wanted our findings to be easily accessible to our clients without anyone having to dig into any code, so we built a data processing toolkit in the form of a custom web application. Through its user interface, the app can execute the same dataframe-filtering, aggregating, and natural language processing methods our team explored.

Metrics can be produced by review keyword, product name, review rating, or any other feature of our data set. With an approach that ultimately seeks to generalize methods for gathering insight, we gave our clients the tools to explore the data on their own, as it suits their needs. We also presented our own findings in a succinct and actionable manner, with a slideshow report and a Tableau presentation.


An example of our app searching reviews by keywords.
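The keyword search described above can be sketched in a few lines. This is not the app's code: the sample reviews and the search_reviews and summarize functions are hypothetical, and the whole-word matching rule is one design choice among several.

```python
import re
from statistics import mean

# Made-up reviews for illustration only.
reviews = [
    {"product": "Safe A", "rating": 2, "text": "Arrived with a dent in the door."},
    {"product": "Safe A", "rating": 5, "text": "Works great, no dent or scratch."},
    {"product": "Lock B", "rating": 4, "text": "Perfect for a school locker."},
    {"product": "Safe C", "rating": 1, "text": "Dented corner and dead battery."},
]


def search_reviews(reviews, keyword):
    """Return reviews whose text contains the keyword as a whole word."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return [r for r in reviews if pattern.search(r["text"])]


def summarize(matches):
    """Count the matching reviews and average their ratings."""
    if not matches:
        return {"count": 0, "mean_rating": None}
    return {"count": len(matches), "mean_rating": mean(r["rating"] for r in matches)}


result = summarize(search_reviews(reviews, "dent"))
# result == {"count": 2, "mean_rating": 3.5}
```

Whole-word matching means "Dented" is not counted as a hit for "dent". A real interface might stem words first so that such variants are grouped together.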

Actionable Conclusions

In our analysis, our goal was to come to an actionable conclusion for our E-Commerce report, so our client could walk away with an observation backed by numbers.

1. Understanding customer satisfaction between MasterLock and its competitors

There is a saying in China: “If you know yourself and your enemies, you will win hundreds of battles.” To succeed, MasterLock needs to know not just its own strengths but those of its competitors as well. Based on customers’ ratings, reviews were divided into two groups: ‘good reviews’ and ‘bad reviews’. We used TF-IDF (Term Frequency–Inverse Document Frequency) to identify key descriptors in documents. A word with a high TF-IDF score not only occurs frequently in a document but is also rare in the other documents, so it provides the most information about that specific document.
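For readers unfamiliar with TF-IDF, here is a minimal sketch of one common variant: term frequency times log(N / document frequency). The weighting the team actually used is not stated, libraries differ in smoothing and normalization, and the two tiny "documents" below are made up.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per tokenized document.

    Uses tf = count / document length and idf = log(N / document frequency),
    a plain textbook variant chosen here for illustration.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Made-up token lists standing in for the pooled good and bad reviews.
good = "safe works great great price safe".split()
bad = "safe battery died battery dent".split()

good_scores, bad_scores = tf_idf([good, bad])
top_good = max(good_scores, key=good_scores.get)  # "great"
top_bad = max(bad_scores, key=bad_scores.get)     # "battery"
```

Note that "safe" scores zero in both documents because it appears in both. That is the behavior that lets TF-IDF surface descriptors specific to good or bad reviews rather than words common to all of them.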

1.1 MasterLock

From their positive reviews, we can see that customers trust MasterLock’s safes with their important documents. They consider the safes not only to work well but also to be well priced. In negative reviews, the key descriptors are ‘battery’, ‘keyboard’, ‘handle’, and so on. However, customers generally raise these complaints about every safe brand, so aspects like the battery, keyboard, and handle are likely things that need continual improvement.

However, one term in the negative reviews that caught our attention was the word “dent.” Dents are likely caused by a lack of care in shipping, which is currently handled by Amazon. MasterLock can simply call Amazon and ask them to put more protective material, such as foam, in the boxes when they ship the safes.

1.2 Barska

Some of Barska's safes are biometric. From the key descriptors in the positive reviews, we can see that customers really like this feature. However, the biometric feature seems to be a double-edged sword, as it was also mentioned in the negative reviews.

1.3 First Alert

From the plots, we can see that customers said a lot of good things about First Alert safes, such as ‘easy to use’, ‘high quality’, and ‘exactly as described’. The terms ‘mold’ and ‘moisture’ appear in their negative reviews, which indicates that First Alert safes may have a problem keeping valuables dry. MasterLock can improve its design to block out moisture and then advertise that its safes protect valuables from mold, targeting customers who have suffered from the mold issue with First Alert safes.

1.4 Paragon

When looking at the key descriptors from Paragon’s negative reviews, you understand what made their customers frustrated. Their safes stopped working, and they had to find a locksmith to open the safe.

1.5 AmazonBasics

The phrase ‘wooden crate’ in their positive reviews caught our eye, so we looked it up in the customers’ reviews. Customers are very positive about receiving safes that Amazon ships in wooden crates. This could be a good solution to the dent complaints in MasterLock's safe reviews: MasterLock can ask Amazon to ship its safes in wooden crates as well, which should improve the condition of the safes upon arrival. Another design feature worth noting is that customers said they like having a backup key in case of power loss.

1.6 GunVault

As we can see from the key descriptors in their positive reviews, customers mainly buy GunVault security safes to store firearms at home or in their vehicles. Negative reviews include the phrase ‘customer service’, indicating that customers are not happy with GunVault’s customer service. MasterLock should keep its customer service at a high level, which will increase customers’ satisfaction with the brand and help increase sales.

1.7 Results

MasterLock can learn the strengths of their competitors’ safes from the key descriptors in the positive reviews so that they can add those strengths and features to their own safes. At the same time, they can also check if they have the same issues their competitors have. When advertising, MasterLock can target customers who suffered from the problems caused by their competitors' safes. That’s how you can use data science to beat your competitors.

2. Know what customers’ concerns are

In addition to reviews, we also scraped customers’ questions asked on Amazon about MasterLock’s safes. Using Latent Dirichlet Allocation, a model for uncovering latent topics from all documents in natural language processing, we identified the five most important topics for customers of MasterLock’s safes on Amazon: size, combination, battery, mount, and fire. From these topics, we can conclude that customers like to ask questions such as:

  1. What is the size of the safe?
  2. How to set the combination?
  3. How often does the battery need to be charged?
  4. Can the safe be mounted to the wall?
  5. Is the safe fireproof?

Using this information, we can adjust the product pages on Amazon and on other websites. If a product page does not answer these questions, MasterLock should add the answers to the description. Even if a product page does answer them, we think it would be better to highlight the answers so that customers can easily find the topics of interest.
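One way to act on this recommendation is to check each product description against the five topics automatically. The sketch below is only an illustration: the topic names come from the analysis above, but the keyword lists, the sample description, and the missing_topics function are assumptions.

```python
# Topics from the LDA analysis above; the keyword lists are illustrative guesses.
TOPIC_KEYWORDS = {
    "size": ["dimensions", "inches", "cubic"],
    "combination": ["combination", "code"],
    "battery": ["battery", "batteries"],
    "mount": ["mount", "bolt"],
    "fire": ["fire", "fireproof"],
}


def missing_topics(description, topic_keywords=TOPIC_KEYWORDS):
    """List the topics that none of the description's words appear to cover."""
    text = description.lower()
    return [
        topic
        for topic, words in topic_keywords.items()
        if not any(word in text for word in words)  # simple substring match
    ]


# Hypothetical product description.
description = (
    "Digital safe, 16.9 inches wide, runs on 4 AA batteries. "
    "Bolt-down hardware included."
)
gaps = missing_topics(description)
# gaps == ["combination", "fire"]
```

Substring matching is deliberately crude (for example, "code" would also match "decode"), so flagged gaps are best treated as prompts for a human review of the page rather than as definitive results.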


With a data science approach, we were able to offer our client a quantifiable understanding of their products' baseline across retailer websites, give them an interface to explore the data according to their needs, and share impactful insights gathered with natural language processing and statistical modeling.

Thank you for reading this post. Please feel free to reach out to any of our team members.

About Authors

Dean Goldman

Dean Goldman is based in New York City. He is a creative thinker with experience in web programming, data science, and design. Seeking to apply skills in problem solving, coding, and data analytics.

Daniel (Donghyun) Kang

Daniel (Donghyun) got a Ph.D. in Electronic Engineering (Wireless Communication Systems) from Sungkyunkwan University, South Korea. Since 2002, he has served as a wireless communication system design engineer for Samsung Electronics, where he has been recognized for...

Tyler Williams

Tyler graduated from Bowdoin College with a degree in Mathematics and a concentration in Probability and Statistics. After college, he ran a trading book for two years at Trillium Management, LLC, a proprietary trading firm specializing in equities....

Neha Chanu

Ms. Chanu, 2017 Hesselbein Student Leader Fellow, was one of 50 selected from more than 800 student leader nominees from around the world. She is an honors graduate of the University of Pittsburgh and the Cornell Pre-Law Summer...

Fatima Hamdan

Fatima got her bachelor's degree in Computer Engineering from Lebanese American University. She was chosen as one of the 24 women in engineering change makers from all over the world to attend the Women in Engineering conference in...

Zhe Yang

Hi, My name is Zhe Yang. I got my master degree in Financial Analyst at Rutgers University. I love challenges and solving difficult problems. I used to be a trader in the T3 trading company. During I worked...
