Market Basket Analysis - Instacart Dataset
Market basket analysis is one of the key techniques large retailers use to uncover associations between items. Companies like Instacart use it to boost sales by predicting which products their customers are likely to purchase next. Instacart, a grocery ordering and delivery service, lets users place grocery orders through its website or app, which are then fulfilled by a personal shopper. In 2017, Instacart open-sourced an anonymized dataset containing a sample of over 3 million grocery orders from more than 200,000 users. The company currently uses transactional data to develop models that predict which products a user will buy again, try for the first time, or add to their cart next during a session. The users are anonymized, and there is no demographic data such as gender or age. There are fields for the day of the week and the hour of the day each order was placed, and a relative measure of time between orders. The dataset is a relational set of files describing customers’ orders over time. The goal of this analysis is to examine the variables that describe customer buying patterns before moving on to inferential analysis. To keep the code fast to run, I’m using only 10% of the data.
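The text does not say how the 10% subset was drawn. One plausible approach, sketched here purely as an assumption, is to sample 10% of the users rather than 10% of the rows, so that each retained customer keeps a complete order history. The helper name sample_users, the seed, and the toy (order_id, user_id) pairs are all hypothetical.

```python
import random


def sample_users(user_ids, fraction=0.10, seed=42):
    """Return a reproducible random subset of the distinct user ids."""
    unique_ids = sorted(set(user_ids))
    k = max(1, round(len(unique_ids) * fraction))
    return set(random.Random(seed).sample(unique_ids, k))


# Toy stand-in for the orders table: 500 orders spread over 50 users.
orders = [(order_id, order_id % 50) for order_id in range(1, 501)]

kept_users = sample_users(user_id for _, user_id in orders)

# Keep every order that belongs to a sampled user.
sampled_orders = [row for row in orders if row[1] in kept_users]

print(len(kept_users), "users kept,", len(sampled_orders), "orders kept")
```

Sampling by user keeps reorder behaviour intact for the customers who remain, which a plain row-level sample would break up.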
First, let’s understand the data
The dataset is a set of relational files. There are six data tables in total, all in CSV format:
aisles, departments, order_products_prior, order_products_train, orders and products
The data types in my exploratory analysis are numerical and string types. They are as expected and do not need to be converted. Just over 5% of the values for days since the last order are missing. However, as explained in the description of the variables, NA there marks order_number 1 for that customer, that is, a first order with no earlier order to measure from. The first five rows of each of the CSV files I’m using for my analysis are as follows:
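As a minimal sketch of how such a preview and the missing-value check could be produced, the snippet below reads a small made-up stand-in for the orders table, prints its first five rows, and confirms that every missing value falls on a customer's first order. The column names are assumptions about the file layout, and the rows are illustrative rather than real data.

```python
import csv
import io

# Made-up stand-in for the orders file; an empty last field means "missing".
raw = "\n".join([
    "order_id,user_id,order_number,days_since_prior_order",
    "1,1,1,",
    "2,1,2,7.0",
    "3,1,3,14.0",
    "4,2,1,",
    "5,2,2,30.0",
    "6,2,3,5.0",
])

rows = list(csv.DictReader(io.StringIO(raw)))

# Preview: first five rows of the table.
for row in rows[:5]:
    print(row)

# Share of rows where the gap since the previous order is missing.
missing = [r for r in rows if r["days_since_prior_order"] == ""]
missing_share = len(missing) / len(rows)

# Check that every missing gap belongs to a first order.
all_first_orders = all(r["order_number"] == "1" for r in missing)

print(f"missing share: {missing_share:.1%}, all first orders: {all_first_orders}")
```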
Exploratory Data Analysis
To understand the buying patterns of Instacart’s customers, it helps to explore each variable in turn, since together they give an overall view of the data.
On which day do people place the most orders?
There are significantly more orders on days 0 and 1. The dataset does not say which day of the week each value represents, so it is not clear whether day 0 is Saturday or Sunday. However, we can assume that days 0 and 1 are the weekend, since that is when customers most plausibly do their weekly grocery shopping. There isn't a huge gap among the other days of the week.
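A count like the one behind this observation can be sketched as follows. The values stand in for a day-of-week column (assumed here to be coded 0 to 6) and are made up for illustration.

```python
from collections import Counter

# Made-up day-of-week codes, one per order.
order_dow = [0, 0, 0, 1, 1, 1, 2, 3, 4, 5, 6, 0, 1, 2]

counts = Counter(order_dow)

# Text bar chart: one row per day code.
for day in range(7):
    print(day, counts[day], "#" * counts[day])

# The two day codes with the most orders.
busiest = [day for day, _ in counts.most_common(2)]
print("busiest days:", busiest)
```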
At what hour do people place the most orders?
Order volume rises from early morning until 4 pm. This insight can help the company schedule more shoppers during this period. It also indicates when it matters most that the website and the app are free of usability issues.
In which part of the day do people order most?
The pattern across the parts of the day is common to all days of the week, and the distributions are mostly similar.
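The comparison above depends on how hours are grouped into parts of the day, which the text does not define. The sketch below uses illustrative cut-offs (an assumption, as are the function name part_of_day and the toy data) and computes, for each day, the share of orders falling in each part.

```python
from collections import Counter, defaultdict


def part_of_day(hour):
    """Map an hour (0-23) to a coarse label; the cut-offs are illustrative."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 22:
        return "evening"
    return "night"


# Made-up (day_of_week, hour_of_day) pairs, one per order.
orders = [(0, 9), (0, 14), (0, 15), (0, 19), (1, 10), (1, 13), (1, 16), (1, 23)]

# Count orders per part of day, separately for each day.
by_day = defaultdict(Counter)
for dow, hour in orders:
    by_day[dow][part_of_day(hour)] += 1

# Convert counts to within-day shares so days can be compared directly.
shares = {
    dow: {part: n / sum(parts.values()) for part, n in parts.items()}
    for dow, parts in by_day.items()
}

for dow in sorted(shares):
    print(dow, shares[dow])
```

Comparing shares rather than raw counts keeps busy days from dominating the picture.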
How many products do people usually order?
From the right-skewed distribution, we can observe that people usually order around 5-8 products. What could be the reason that customers order so few products per order? Instacart can look into various ways to increase the number of products per order by fulfilling more of its customers’ grocery needs.
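Basket size is simply the number of product lines per order. A minimal sketch, using made-up (order_id, product_id) pairs in place of the order-products table:

```python
from collections import Counter
from statistics import median

# Made-up (order_id, product_id) pairs, one per product line.
order_products = [
    (1, 101), (1, 102), (1, 103),
    (2, 101), (2, 104),
    (3, 101), (3, 102), (3, 105), (3, 106), (3, 107),
    (4, 103), (4, 104), (4, 108),
]

# Number of products in each order.
basket_sizes = Counter(order_id for order_id, _ in order_products)

# How many orders have each basket size (the histogram behind the plot).
size_distribution = Counter(basket_sizes.values())

typical_size = median(basket_sizes.values())
print("median basket size:", typical_size)
print("orders per basket size:", dict(size_distribution))
```

For a right-skewed distribution the median is a steadier summary than the mean, which a few very large orders would pull upward.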
Reordered or not & repeat items?
Looking at the above plot, we can observe that around 60% of ordered products are reorders and 40% are first-time orders. The table shows that dairy products make it into the top 10 repeat products among the reorders. This insight is important for Instacart to understand what actions it can take to increase the percentage of reorders. It can be further leveraged to predict what the next product in the customer’s cart will be.
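Both figures can be derived from a per-line reorder flag. The sketch below assumes a 0/1 flag per ordered product; the product names and values are made up.

```python
from collections import Counter

# Made-up (product_name, reordered_flag) pairs, one per product line.
lines = [
    ("Milk", 1), ("Milk", 1), ("Banana", 1), ("Banana", 0), ("Yogurt", 1),
    ("Bread", 0), ("Milk", 0), ("Yogurt", 1), ("Eggs", 0), ("Milk", 1),
]

# Share of product lines that are reorders.
reorder_share = sum(flag for _, flag in lines) / len(lines)

# Products ranked by how often they were reordered.
reorder_counts = Counter(name for name, flag in lines if flag == 1)
top_reordered = reorder_counts.most_common(2)

print(f"reordered: {reorder_share:.0%}")
print("top repeat products:", top_reordered)
```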
What are the best selling products - Top 10
It is interesting to note that Instacart’s top-selling products are fresh fruits and vegetables. Products from other aisles and departments did not make it into the top 10, and customers showed a preference for organic produce. Almost 8,000 products were ordered only once. What could be a plausible reason for such a low count? Are they marked up heavily relative to in-store prices?
Which departments and aisles are the most popular?
From the above plot, we can observe that certain departments are clearly more popular. The produce department accounted for far more sales than any other department.
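Both the top sellers and the products ordered only once come from the same per-product count. A minimal sketch, with a made-up product lookup and made-up order lines:

```python
from collections import Counter

# Made-up product lookup: product_id -> product_name.
products = {
    1: "Banana",
    2: "Organic Strawberries",
    3: "Organic Baby Spinach",
    4: "Saffron",
}

# Made-up product ids, one per ordered product line.
ordered_product_ids = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4]

counts = Counter(ordered_product_ids)

# Best sellers, translated from ids to names.
top_products = [(products[pid], n) for pid, n in counts.most_common(3)]

# Products that appear exactly once across all orders.
ordered_once = [products[pid] for pid, n in counts.items() if n == 1]

print("top products:", top_products)
print("ordered once:", ordered_once)
```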
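Department popularity needs a join from ordered products to their departments. The sketch below does that join with plain dictionaries. The ids, department names, and order lines are made up, and the lookup structure is an assumption about how the relational files link together.

```python
from collections import Counter

# Made-up lookups: department_id -> name, product_id -> department_id.
departments = {1: "produce", 2: "dairy eggs", 3: "snacks"}
product_department = {101: 1, 102: 1, 103: 2, 104: 3}

# Made-up product ids, one per ordered product line.
ordered_product_ids = [101, 101, 102, 103, 102, 104, 101, 103]

# Join each order line to its department name and count.
dept_counts = Counter(
    departments[product_department[pid]] for pid in ordered_product_ids
)

# Share of all ordered products per department, most popular first.
total = sum(dept_counts.values())
dept_share = {dept: n / total for dept, n in dept_counts.most_common()}

for dept, share in dept_share.items():
    print(f"{dept}: {share:.1%}")
```

The same pattern works for aisles by swapping in a product-to-aisle lookup.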
Exploring each variable in the dataset descriptively lays the foundation for a more in-depth analysis of customers’ purchasing patterns, and it is crucial for understanding the business. From this Instacart analysis, we can summarize our insights and the actions that could be recommended for better customer engagement and profitability. Since the number of products per order mostly stays in the range of 4-8, there is a lot of room for improvement. To encourage customers to add more to their carts, Instacart can recommend products related to those already in their baskets or to items from their past orders. Instacart customers tend to purchase either weekly or monthly. How could Instacart improve customer loyalty? Ensuring the website is intuitive and easy to use is imperative to making sure customers complete their transactions and return to shop again. Extracting these insights and knowing which items are most frequently purchased is the first step for Instacart to optimize its software product and recommend items to customers while they shop.
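As a pointer toward the recommendation idea, the sketch below shows one common way to score how strongly two products are associated: lift, the ratio of how often a pair is bought together to how often that would happen if the two were bought independently. This is a generic illustration, not a method stated in the analysis, and the baskets are made up.

```python
from collections import Counter
from itertools import combinations

# Made-up baskets: each set holds the products in one order.
baskets = [
    {"banana", "milk", "bread"},
    {"banana", "milk"},
    {"banana", "yogurt"},
    {"milk", "bread"},
    {"banana", "milk", "yogurt"},
]

n = len(baskets)

# How many baskets contain each item, and each unordered pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def lift(a, b):
    """Lift of the pair (a, b): support(a, b) / (support(a) * support(b))."""
    pair = tuple(sorted((a, b)))
    support_ab = pair_counts[pair] / n
    return support_ab / ((item_counts[a] / n) * (item_counts[b] / n))


print("lift(bread, milk):", lift("bread", "milk"))
print("lift(banana, milk):", lift("banana", "milk"))
```

A lift above 1 suggests the two products appear together more often than chance would predict, which makes the pair a candidate for a cart recommendation. A lift below 1 suggests the opposite.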