DATA STUDY ON RETAIL SALES AND CUSTOMER SEGMENTATION

Posted on Apr 9, 2021
The skills I demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Introduction

Data science presents great opportunities for individual businesses to increase their profits. Customer data, in particular, is becoming extremely appealing to companies. Customer segmentation, for example, allows companies to target their customers in a more customized way.

This is an exploratory data visualization project that aims to uncover opportunities to increase revenue for a chain of supermarkets. The Company was losing revenue and needed a strategic review. The dataset includes 1,000 observations.

Exploratory data visualization

The results from the customer segmentation analysis are displayed in a Shiny app that the user can navigate. The analysis reveals some important insights that can inform the Company's marketing strategy.

Below is a graph that visualizes the current state of the Company’s sales in its three locations.

Figure 1: Total sales per city

The first tab, "Location", is shown in Figure 1. There are differences in sales between the three supermarkets, and the branch located in Naypyitaw is the most successful. However, these differences are small, which suggests that location is probably not one of the variables with an important impact on the Company's performance.
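
The post does not reproduce the app's code, so the following is only a rough illustration of how a per-location total like the one behind Figure 1 could be computed with a simple group-and-sum. It is a Python sketch using the standard library; the field names ("city", "total"), the placeholder names "Branch B" and "Branch C", and all values are hypothetical toy data, not figures from the Company's dataset. The relative gap between the best and worst location is one simple way to judge whether differences are "small".

```python
from collections import defaultdict

# Hypothetical toy rows for illustration only; NOT values from the real dataset.
# The field names "city" and "total" and the names "Branch B"/"Branch C" are assumptions.
sales = [
    {"city": "Naypyitaw", "total": 160.0},
    {"city": "Branch B", "total": 120.0},
    {"city": "Branch C", "total": 140.0},
    {"city": "Branch B", "total": 30.0},
]

def total_by(rows, key):
    """Sum the 'total' field of each row, grouped by the value of `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["total"]
    return dict(totals)

city_totals = total_by(sales, "city")
best = max(city_totals, key=city_totals.get)
worst = min(city_totals, key=city_totals.get)
# Gap between the best and worst location, as a share of the best one.
relative_gap = (city_totals[best] - city_totals[worst]) / city_totals[best]
print(city_totals, best, round(relative_gap, 3))
```

The same `total_by` helper would produce the product-type and gender totals by changing the `key` argument.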

Figure 2: Average sales per weekday and hour

Sales are distributed across weekdays and hours as expected. There are more sales on Mondays and Fridays than on the other days of the week. Time of day also follows a pattern, as sales are concentrated between 12:00 and 15:00.
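
A weekday-by-hour grid like Figure 2 can be built by bucketing each transaction by its weekday and hour and averaging within each bucket. This is a minimal sketch under several assumptions: dates are stored as "YYYY-MM-DD" strings, times as 24-hour "HH:MM" strings, and "average sales" means the mean transaction total in each cell. The rows are made-up toy values.

```python
from collections import defaultdict
from datetime import datetime

# Fixed English names so the output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# Hypothetical toy transactions (NOT real data): (date, time, sale total).
rows = [
    ("2019-01-07", "12:15", 100.0),  # a Monday
    ("2019-01-07", "12:40", 50.0),
    ("2019-01-08", "19:05", 30.0),   # a Tuesday
    ("2019-01-11", "14:30", 90.0),   # a Friday
]

def average_by_weekday_hour(rows):
    """Return {(weekday, hour): mean sale total} over the given transactions."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for date_str, time_str, total in rows:
        stamp = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M")
        key = (WEEKDAYS[stamp.weekday()], stamp.hour)
        sums[key] += total
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

averages = average_by_weekday_hour(rows)
print(averages)
```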

Figure 3: Total sales per type of product

The distribution of sales by type of product shows no large differences in total sales among categories, although there are minor ones. "Food and beverages," in particular, stands out as the biggest contributor to total sales, while "Health and beauty" is the smallest.

Figure 4: Total sales per gender

Overall, women account for a slightly larger share of total sales than men. We can check whether this holds for every type of product or whether there are differences between them.

Figure 5: Total sales per type of product and gender

As shown in Figure 5, women buy more than men across most product types. "Food and beverages" is a category in which women clearly spend more than men. Interestingly, the biggest gap in consumption occurs in "Health and beauty", where men account for the larger share of total sales. Sales of "Electronic accessories" are split almost evenly between the genders. These results suggest that targeting men and women with different types of products could increase sales in particular categories.
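
To locate the largest gender gap, one can cross-tabulate totals by product type and gender and then take the female-minus-male difference in each category. In this sketch the rows are hypothetical toy values and the "Female"/"Male" labels are assumed; they illustrate the calculation only, not the Company's results.

```python
from collections import defaultdict

# Hypothetical toy rows (NOT real data): (product type, gender, sale total).
rows = [
    ("Food and beverages", "Female", 200.0),
    ("Food and beverages", "Male", 120.0),
    ("Health and beauty", "Female", 60.0),
    ("Health and beauty", "Male", 150.0),
    ("Electronic accessories", "Female", 100.0),
    ("Electronic accessories", "Male", 95.0),
]

def totals_by_product_and_gender(rows):
    """Cross-tabulate sale totals as {product: {gender: total}}."""
    table = defaultdict(lambda: defaultdict(float))
    for product, gender, total in rows:
        table[product][gender] += total
    return table

table = totals_by_product_and_gender(rows)
# Female minus male spending per product; positive means women spend more.
# A missing gender in a category counts as 0.0 because the inner dict is a defaultdict.
gaps = {p: g["Female"] - g["Male"] for p, g in table.items()}
largest_gap = max(gaps, key=lambda p: abs(gaps[p]))
print(gaps, largest_gap)
```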

Figure 6: Correlation between total sales and customer rating

Customer rating is a measure of a customer's satisfaction summarized in a single score. Figure 6 shows a very weak correlation between sales and customer ratings: highly satisfied customers do not spend noticeably more than other customers. Targeting highly satisfied customers could therefore boost sales. One way to do this is by improving the Company's membership program, which we analyze below.
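
The post does not state which correlation coefficient Figure 6 uses. Assuming it is Pearson's r, the sketch below computes it from scratch: values near 0 indicate a weak linear relationship, and values near +1 or -1 a strong one. The sample totals and ratings are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances share the same 1/n factor, so it cancels and is omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values (NOT real data).
totals = [100.0, 250.0, 80.0, 300.0, 150.0]
ratings = [7.0, 6.5, 8.0, 7.5, 6.0]
r = pearson(totals, ratings)
print(round(r, 3))
```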

Figure 7: Total sales by membership

As shown in Figure 7, the difference in sales between members and non-members is very small: members spend 327.79 on average, while non-members spend 318.12. A closer look provides a more nuanced analysis.
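
The reported averages make the size of the gap easy to quantify: members spend about 9.67 more on average, roughly 3% more than non-members. The sketch below shows that arithmetic, plus a hypothetical `mean_by` helper that would produce such group averages from raw rows. The field names and toy rows are assumptions, not the Company's data.

```python
from collections import defaultdict

def mean_by(rows, key):
    """Average the 'total' field of each row, grouped by the value of `key`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[row[key]] += row["total"]
        counts[row[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Hypothetical toy rows (NOT real data); "member" and "total" are assumed field names.
toy = [
    {"member": "Member", "total": 300.0},
    {"member": "Member", "total": 350.0},
    {"member": "Non-member", "total": 320.0},
]
toy_means = mean_by(toy, "member")

# Gap between the averages reported in the text: 327.79 vs 318.12.
member_avg, non_member_avg = 327.79, 318.12
gap = member_avg - non_member_avg
print(f"{gap:.2f} ({gap / non_member_avg:.1%})")  # 9.67 (3.0%)
```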

Figure 8: Membership by gender

As pointed out before, women account for a slightly larger share of sales than men. Therefore, women could be a better target for the membership program, especially given their larger spending in the "Food and beverages" category. Participation in the membership program is already higher among women (52%), and it could be increased further.

Figure 9: Total sales per type of product and membership

In Figure 9, we break down total sales by type of product for members and non-members. Members of the program spend more on products that are bought more frequently. The biggest positive difference is in "Food and beverages," and the biggest negative difference is in "Electronic accessories." This explains why Figure 7 showed only slightly higher total sales for members: their lower spending on "Electronic accessories" offsets their higher spending elsewhere, although this relationship is probably not causal.
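
The per-category breakdown also explains the small overall gap arithmetically: the member-minus-non-member difference in total sales is the sum of the per-category differences, so a large negative gap in one category can cancel positive gaps elsewhere. The sketch below illustrates this with hypothetical toy rows; the `is_member` flag is an assumed representation of membership.

```python
from collections import defaultdict

# Hypothetical toy rows (NOT real data): (product type, is_member, sale total).
rows = [
    ("Food and beverages", True, 180.0),
    ("Food and beverages", False, 110.0),
    ("Electronic accessories", True, 70.0),
    ("Electronic accessories", False, 130.0),
    ("Health and beauty", True, 90.0),
    ("Health and beauty", False, 85.0),
]

def member_gap_by_product(rows):
    """Member minus non-member total sales for each product type."""
    gaps = defaultdict(float)
    for product, is_member, total in rows:
        gaps[product] += total if is_member else -total
    return dict(gaps)

gaps = member_gap_by_product(rows)
biggest_positive = max(gaps, key=gaps.get)
biggest_negative = min(gaps, key=gaps.get)
# The overall member/non-member gap is the sum of the per-product gaps,
# so one large negative category can cancel out positive ones.
overall_gap = sum(gaps.values())
print(gaps, biggest_positive, biggest_negative, overall_gap)
```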

Figure 10: Membership by customer rating

As Figure 10 shows, customer satisfaction does not seem to affect participation in the membership program. Taken together with the weak correlation between customer rating and total sales shown in Figure 6, this suggests there is a pool of satisfied customers who could be targeted.
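
One way to check whether satisfaction affects participation, as in Figure 10, is to group customers into rating bands and compute the share of members in each band; a roughly flat profile means rating and membership are unrelated. This is a sketch under assumptions: the band width of 2 rating points is an arbitrary choice, and the rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy rows (NOT real data): (customer rating, is_member).
rows = [(4.5, True), (5.2, False), (6.8, True), (6.1, False), (8.9, True), (9.4, False)]

def membership_rate_by_band(rows, width=2.0):
    """Share of members within rating bands of the given width, e.g. [4, 6)."""
    members, counts = defaultdict(int), defaultdict(int)
    for rating, is_member in rows:
        band = int(rating // width) * width  # lower edge of the band
        counts[band] += 1
        members[band] += int(is_member)
    return {band: members[band] / counts[band] for band in sorted(counts)}

rates = membership_rate_by_band(rows)
print(rates)
```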

Conclusion

We provide evidence to support possible strategic changes for a loss-making supermarket chain. These changes involve targeting specific customer segments (men and women by type of product, satisfied customers, and members of the loyalty program) in a customized way.

The Shiny app discussed in this blog post, together with the relevant code and data, can be found here.

About Author

Guillermo Ruiz

Data Science Professional and Economist with a demonstrated history of data analysis and machine learning modeling with a focus on storytelling with data. Passionate about helping companies to gather and analyze data to make more informed decisions to...