Posted on Apr 9, 2021


Data science offers businesses many opportunities to increase their profits, and customer data is among the most valuable resources for doing so. In particular, customer segmentation allows companies to target their customers in a more customized way.

This exploratory data visualization project aims to uncover revenue opportunities for a chain of supermarkets. The Company was losing revenue and needed a strategic review. The dataset includes 1,000 observations.

Exploratory data visualization

The results of the customer segmentation analysis are displayed in a Shiny app that the user can navigate. The analysis reveals some important insights that can inform the Company’s marketing strategy.

Below is a graph that visualizes the current state of the Company’s sales in its three locations.

Figure 1: Total sales per city

Figure 1 shows the first tab of the app, "Location". Sales differ between the three supermarkets, and the branch located in Naypyitaw is the most successful. However, the differences are small, which suggests that location is not one of the variables with an important impact on the Company's performance.
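The aggregation behind a chart like Figure 1 can be sketched in a few lines of Python. This is only an illustration, not the code of the app: the rows are made up, "City A" and "City B" are placeholders for the two branches not named in this post, and the helper name total_by_key is hypothetical.

```python
from collections import defaultdict

# Made-up rows for illustration only: (city, sale total).
# "City A" and "City B" are placeholders, not names from the dataset.
rows = [
    ("City A", 100.0),
    ("City B", 80.0),
    ("Naypyitaw", 120.0),
    ("City A", 50.0),
]


def total_by_key(records):
    """Sum the amounts of (key, amount) pairs for each distinct key."""
    totals = defaultdict(float)
    for key, amount in records:
        totals[key] += amount
    return dict(totals)


sales_per_city = total_by_key(rows)

# Print the branches from highest to lowest total sales.
for city, total in sorted(sales_per_city.items(), key=lambda item: -item[1]):
    print(f"{city}: {total:.2f}")
```

The same helper works for any single grouping variable, such as product type or gender.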

Figure 2: Average sales per weekday and hour

Sales follow the expected pattern across weekdays and hours. Average sales are higher on Mondays and Fridays than on the other days of the week. Time of day also follows a pattern, as sales are concentrated between 12:00 and 15:00.
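For readers who want to reproduce a view like Figure 2, here is a minimal sketch of averaging sales per weekday and hour. The timestamps, amounts, and timestamp format are assumptions made for the example and do not come from the dataset.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up transactions for illustration only: (timestamp, sale total).
transactions = [
    ("2019-01-07 12:30", 200.0),  # a Monday
    ("2019-01-07 12:50", 100.0),  # same Monday, same hour
    ("2019-01-08 19:45", 60.0),   # a Tuesday
    ("2019-01-11 14:05", 150.0),  # a Friday
]


def average_by_weekday_and_hour(records):
    """Return {(weekday name, hour): mean sale total} for the records."""
    buckets = defaultdict(list)
    for timestamp, amount in records:
        moment = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
        slot = (WEEKDAYS[moment.weekday()], moment.hour)
        buckets[slot].append(amount)
    return {slot: mean(amounts) for slot, amounts in buckets.items()}


averages = average_by_weekday_and_hour(transactions)
for (weekday, hour), value in averages.items():
    print(f"{weekday} {hour:02d}:00 -> {value:.2f}")
```

Using a 24-hour clock for the hour avoids the ambiguity of am/pm labels when reading the chart.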

Figure 3: Total sales per type of product

The distribution of sales by type of product shows no large differences in total sales among categories, only minor ones. "Food and beverages," in particular, is the biggest contributor to total sales, while "Health and beauty" is the smallest.

Figure 4: Total sales per gender

Overall, women account for a slightly larger share of total sales than men. We can check whether this holds for every type of product or whether there are differences between them.

Figure 5: Total sales per type of product and gender

As shown in Figure 5, women buy more than men across most product types. "Food and beverages" is a category in which women clearly spend more than men. Interestingly, the biggest gap in consumption occurs in "Health and beauty", where men account for the larger share of total sales. Both genders are close to even for "Electronic accessories" sales. These results suggest that targeting men and women for different types of products could increase sales in particular categories.
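A breakdown like the one in Figure 5 amounts to a two-way grouping. The sketch below computes each gender's share of sales within every product type. The amounts are invented for the example, and the labels "Female" and "Male" are assumed rather than taken from the dataset.

```python
from collections import defaultdict

# Made-up rows for illustration only: (product type, gender, sale total).
rows = [
    ("Food and beverages", "Female", 60.0),
    ("Food and beverages", "Male", 40.0),
    ("Health and beauty", "Female", 30.0),
    ("Health and beauty", "Male", 70.0),
]


def gender_share_by_product(records):
    """Return {product: {gender: share of that product's total sales}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for product, gender, amount in records:
        totals[product][gender] += amount

    shares = {}
    for product, by_gender in totals.items():
        product_total = sum(by_gender.values())
        shares[product] = {
            gender: amount / product_total
            for gender, amount in by_gender.items()
        }
    return shares


shares = gender_share_by_product(rows)
for product, by_gender in shares.items():
    summary = ", ".join(f"{g}: {s:.0%}" for g, s in by_gender.items())
    print(f"{product} -> {summary}")
```

Looking at shares rather than raw totals makes categories of different sizes directly comparable.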

Figure 6: Correlation between total sales and customer rating

Customer rating is a measure of a customer's satisfaction summarized in a single score. Figure 6 shows a very weak correlation between sales and customer ratings, meaning that highly satisfied customers do not currently spend noticeably more than other customers. Targeting highly satisfied customers could therefore boost sales. One way to do this is by improving the Company's membership program, which we analyze below.
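To make "very weak correlation" concrete, here is a minimal sketch of the Pearson correlation coefficient, a common way to quantify a linear relationship like the one plotted in Figure 6. We assume Pearson's r purely for illustration, and the ratings and totals below are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(spread_x * spread_y)


# Made-up values for illustration only.
ratings = [4.0, 6.0, 8.0, 10.0]
totals = [100.0, 300.0, 300.0, 100.0]

print(f"r = {pearson_r(ratings, totals):.2f}")
```

A value of r near zero, as in this toy data, means that knowing a customer's rating tells us almost nothing about how much they spend.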

Figure 7: Total sales by membership

As shown in Figure 7, the difference in sales between members and non-members is very small: members spend on average 327.79, while non-members spend 318.12. We can take a deeper look to provide a more nuanced analysis.

Figure 8: Membership by gender

As pointed out before, women account for a slightly larger share of sales than men. Therefore, women could be a better target for the membership program, especially given their larger spend on the "Food and beverages" category. We already see a higher rate of participation among women in the membership program, 52%, which could be increased.

Figure 9: Total sales per type of product and membership

In Figure 9, we break down total sales by type of product for members and non-members. Participants in the membership program spend more on products that are bought more frequently. The biggest positive difference is in "Food and beverages," and the biggest negative difference is in "Electronic accessories." This explains why Figure 7 showed only slightly higher total sales for members: their lower spending on "Electronic accessories" offsets part of their higher spending in other categories. That negative association is probably not causal.
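The offsetting effect described above is easy to see with a small sketch. The numbers are invented for the example, and the customer type labels "Member" and "Normal" are assumptions, not values confirmed by the dataset.

```python
from collections import defaultdict

# Made-up rows for illustration only: (product type, customer type, total).
rows = [
    ("Food and beverages", "Member", 120.0),
    ("Food and beverages", "Normal", 100.0),
    ("Electronic accessories", "Member", 90.0),
    ("Electronic accessories", "Normal", 105.0),
]


def member_gap_by_product(records):
    """Return {product: member total minus non-member total}."""
    totals = defaultdict(lambda: {"Member": 0.0, "Normal": 0.0})
    for product, customer_type, amount in records:
        totals[product][customer_type] += amount
    return {
        product: by_type["Member"] - by_type["Normal"]
        for product, by_type in totals.items()
    }


gaps = member_gap_by_product(rows)
overall_gap = sum(gaps.values())

for product, gap in gaps.items():
    print(f"{product}: {gap:+.2f}")
print(f"Overall: {overall_gap:+.2f}")
```

In this toy data, a positive gap in one category and a negative gap in another leave only a small overall difference, which is the same pattern that hides the category-level effects in an aggregate chart.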

Figure 10: Membership by customer rating

As we can see in Figure 10, customer satisfaction does not seem to affect participation in the membership program. This should also be considered in light of the weak correlation between customer rating and total sales shown in Figure 6. Together, these findings suggest there is a pool of satisfied customers that could be targeted.


We provide evidence to support possible strategic changes for a loss-making supermarket chain. These changes involve targeting specific segments of customers (men and women, by type of product, satisfied customers, members of the loyalty program) in a customized way.

The Shiny app discussed in this blog post together with the relevant code and data can be found here.

About Author

Guillermo Ruiz

Data Science Professional and Economist with a demonstrated history of data analysis and machine learning modeling with a focus on storytelling with data. Passionate about helping companies to gather and analyze data to make more informed decisions to...