The Business Case to Scale CatskillProvisions.com
BACKGROUND:
CatskillProvisions.com
Catskill Provisions is a tale of two businesses. The first is a wholesaler that sells products such as honey, honey-based whiskey and NY maple syrup directly to restaurants and liquor stores. The business was first conceived through a love of beekeeping.
The bulk of the company's revenue comes from the wholesale business. The second business, the ecommerce website CatskillProvisions.com, generates little revenue compared to wholesale. However, the ecommerce website has been a branding engine and online sales presence for the company since inception.
CatskillProvisions.com sells specialty food items such as honey, truffles, marinades, gift sets and more. Organic honey was the first item sold online in 2010. Honey continues to be the predominant product sold through the site.
Given its position, CatskillProvisions.com could generate revenue in much greater proportion to the larger wholesale business. With a general understanding of the customer base and some underlying investment, CatskillProvisions.com is positioned for growth.
ECOMMERCE CUSTOMER PROFILE
Based on a review of the dataset and simple exploratory data analysis (EDA), the average customer who purchases from CatskillProvisions.com has the following profile:
- Female, 76%
- Lives in the North Eastern part of the US, 67%
- One-time purchaser of products, 85%
- Shops Tuesdays and Thursdays (from work)
- Purchased Honey, Truffles and Gift Sets
- Used email domain: Gmail, Yahoo or work domain
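Profile shares like these can be derived from order records with a few lines of Python. The following is a minimal sketch, not the analysis actually run for this article: the field names (`customer`, `gender`, `region`) and the sample records are hypothetical.

```python
from collections import Counter

# Hypothetical order records; field names and values are illustrative only.
orders = [
    {"customer": "a@gmail.com", "gender": "F", "region": "Northeast"},
    {"customer": "b@yahoo.com", "gender": "F", "region": "Northeast"},
    {"customer": "c@work.example", "gender": "M", "region": "West"},
    {"customer": "a@gmail.com", "gender": "F", "region": "Northeast"},
]

def share(records, field):
    """Return each value's share of distinct customers for one field."""
    # Keep one record per customer so repeat orders do not inflate shares.
    by_customer = {r["customer"]: r for r in records}
    counts = Counter(r[field] for r in by_customer.values())
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def one_time_share(records):
    """Return the fraction of customers with exactly one order."""
    orders_per_customer = Counter(r["customer"] for r in records)
    single = sum(1 for n in orders_per_customer.values() if n == 1)
    return single / len(orders_per_customer)

print(share(orders, "gender"))
print(one_time_share(orders))
```

Counting distinct customers rather than orders is a design choice here; counting orders instead would weight repeat purchasers more heavily.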
WEB STORE CUMULATIVE PRODUCT SALES
The following bar chart represents not only the products purchased through the ecommerce site but also the basket of goods consumers purchased at one time. For example, if a consumer purchased honey, then a secondary purchase was truffles or marinade, and so on.
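One simple way to quantify which products are bought together is to count product pairs within each order. The sketch below assumes each order is available as a list of product names; the sample baskets are invented for illustration and are not taken from the site's data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: one list of product names per order.
baskets = [
    ["honey", "truffles"],
    ["honey", "marinade"],
    ["honey", "truffles", "gift set"],
    ["gift set"],
]

def pair_counts(baskets):
    """Count how often each unordered pair of products shares a basket."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so ("honey", "truffles") and
        # ("truffles", "honey") are counted as the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

for pair, n in pair_counts(baskets).most_common(3):
    print(pair, n)
```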
Roughly 20% of all ecommerce consumers are repeat purchasers. Such a high percentage of repeat purchasers illustrates the quality of the products and the customer service provided. However, the fundamental issue for the ecommerce site is web traffic. Traffic and sales go hand in hand, yet traffic has declined over time, as shown in the chart below.
TOP SEO SEARCH TERMS
Looking further into the traffic issue, the SEO (search engine optimization) search terms seem generic at best and do not differentiate the ecommerce website from any other website. In addition, the website is not engaged in an active SEM (search engine marketing) program to drive traffic or promote products. The lack of promotion is the fundamental cause of the traffic issue and, subsequently, of weak product sales.
Top SEO terms are represented in the following word cloud.
DATASET
The CatskillProvisions.com data is transactional. Through feature engineering that combined traffic data with the transactional data, the dataset was expanded for machine learning purposes. Key features included:
- Transaction date
- Transaction day
- Customer information
- Shipping/Billing information
- Repeat Purchased
- Traffic by day
- Purchase total
- Sales
- Sales to Web Visit Conversion
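As an illustration of the feature engineering step, a sales-to-web-visit conversion feature can be built by joining daily order counts to daily traffic. This is a sketch under assumptions: the article does not state how the conversion feature was computed, so orders divided by visits is assumed here, and the dates and counts are hypothetical.

```python
# Hypothetical daily traffic and order counts keyed by ISO date.
daily_visits = {"2019-03-05": 120, "2019-03-06": 80, "2019-03-07": 0}
daily_orders = {"2019-03-05": 3, "2019-03-06": 2}

def conversion_by_day(visits, orders):
    """Return orders / visits per day; None when a day has no visits."""
    rates = {}
    for day, n_visits in visits.items():
        n_orders = orders.get(day, 0)  # days without orders count as zero
        rates[day] = n_orders / n_visits if n_visits else None
    return rates

print(conversion_by_day(daily_visits, daily_orders))
```

Returning `None` for zero-traffic days avoids a division by zero and leaves the choice of how to handle those days to the modeling step.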
CORRELATION
Given the nature of the data and the small size of the dataset, the data remains highly correlated despite additional feature engineering. Even so, the machine learning models revealed the structure of the segmentation by identifying top features in their training output.
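Correlation between two numeric features can be checked directly with the Pearson coefficient. The sketch below is a plain-Python illustration; the two sample columns are hypothetical and stand in for features such as traffic by day and purchase total.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical feature columns for illustration.
traffic = [100, 120, 90, 150, 130]
purchase_total = [40.0, 52.0, 35.0, 61.0, 55.0]
print(round(pearson(traffic, purchase_total), 3))
```

Values near 1 or -1 indicate that two features carry largely the same information, which is the situation described above.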
Predictive features from each model showed great promise. The predictive features identified were shipping region, repeat purchasers and gender. Despite this progress, more data is needed to fully test the models, understand their predictive quality and apply them to the website.
MACHINE LEARNING
CatskillProvisions.com FEATURE: SHIPPING REGION
Testing the shipping region feature in Dataiku, a logistic regression model showed promise, predicting the feature with an ROC AUC score of .76 and accuracy of .69. This model scored higher than the others tested, including SVM, random forest and XGBoost. Reviewing the data for shipping region, it was fairly easy to see why the model ranked as well as it did: one region dominates the data, which makes the feature easy to categorize. Following is a chart quantifying purchasers by shipping region.
Clearly, the eastern US predominates in the dataset as the region where purchases originate and ship to, enabling the model to categorize this feature accurately. A density chart further illustrates the predictive quality of the model for this feature.
The ROC AUC chart shows solid prediction of the shipping region. However, with a larger dataset, the curve would likely be smoother and would better demonstrate the strength of the model's predictive accuracy.
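For readers unfamiliar with the two scores quoted for this model, the sketch below shows how ROC AUC and accuracy can be computed from predicted scores. It is not the Dataiku implementation: it assumes a binary framing (for example, eastern US versus elsewhere), and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Share of records classified correctly at the given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical labels (1 = eastern US) and model scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(roc_auc(labels, scores), accuracy(labels, scores))
```

Because accuracy depends on the chosen threshold while ROC AUC does not, the two scores can diverge, as they do for the shipping region model.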
CatskillProvisions.com FEATURE: REPEAT PURCHASES
Analyzing the repeat purchases feature, two models proved robust at categorizing and predicting the feature: Lasso regression and XGBoost. Both had similar R² values while also reporting high correlation. However, the Lasso regression model showed better results when reviewing overall model output.
Checking the model errors for normality, the Lasso error distribution is illustrated as follows:
The errors for the Lasso model fall close to zero but are highly clustered, with a non-normal distribution. Again, the distribution indicates correlation.
XGBoost
The XGBoost error distribution looks a little better than the Lasso distribution but is also non-normal and likewise indicates correlation.
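Beyond eyeballing the error charts, a rough numeric check for non-normal errors is to compute the skewness and excess kurtosis of the residuals, both of which are near zero for a normal distribution. The sketch below uses hypothetical residuals, not the actual Lasso or XGBoost errors.

```python
from statistics import mean, pstdev

def skew_and_excess_kurtosis(residuals):
    """Return (skewness, excess kurtosis) using population moments."""
    mu = mean(residuals)
    sigma = pstdev(residuals)
    z = [(r - mu) / sigma for r in residuals]  # standardized residuals
    skew = mean([v ** 3 for v in z])
    excess_kurtosis = mean([v ** 4 for v in z]) - 3
    return skew, excess_kurtosis

# Hypothetical residuals: tightly clustered near zero with one large error.
residuals = [0.0, 0.01, -0.01, 0.02, -0.02, 0.0, 0.9]
print(skew_and_excess_kurtosis(residuals))
```

Large positive values for either statistic would be consistent with errors that cluster near zero with a few large misses.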
CatskillProvisions.com FEATURE: GENDER
The best model for categorizing gender as a predominant feature is an SVM model with an ROC AUC score of .87 and lift of 2.10. Like the previous three models, the SVM model categorized gender as a feature, outranking all other candidate models. With an accuracy of .84 and a dataset that is primarily female, it is no surprise that the SVM model emerged as the strongest predictor and best classifier. A chart of ecommerce purchasers follows (primarily female):
GENDER FEATURE
The density curve below illustrates the SVM model's ability to distinguish male from female purchasers:
The gender SVM lift chart further substantiates the gender prediction.
The SVM ROC AUC chart for gender demonstrates the model's ability to accurately predict gender. With a bigger dataset, the curve would likely be smoother.
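Lift compares how concentrated the target class is among the model's highest-scored records with its overall rate in the data. The sketch below uses one common definition, lift within the top fraction of records ranked by score; the exact calculation behind the 2.10 figure reported by Dataiku may differ, and the labels and scores here are hypothetical.

```python
def lift_at(labels, scores, fraction=0.2):
    """Positive rate in the top-scored fraction divided by the base rate."""
    ranked = sorted(zip(scores, labels), reverse=True)
    k = max(1, round(len(ranked) * fraction))  # size of the top slice
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Hypothetical labels (1 = target class) and model scores.
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.3, 0.8, 0.2, 0.1, 0.15, 0.05, 0.25, 0.12, 0.07]
print(lift_at(labels, scores))
```

A lift of 1 means the model does no better than random selection; higher values mean the top-scored records contain proportionally more of the target class.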
BUSINESS RECOMMENDATIONS FOR CatskillProvisions.com
The machine learning conducted on the CatskillProvisions.com dataset highlighted the strongest predictive features for the ecommerce website. With this information, CatskillProvisions.com should focus on these key features to create promotions that scale web traffic, sales conversion and revenue.
Based on the data, focusing marketing efforts on the eastern US through stronger SEO campaigns, including content promotion, would improve traffic to the website. Adding SEM campaigns, both to drive traffic and to promote the top three selling products, would also stimulate sales and add to the web traffic/sales mix.
However, the most important promotional effort for this website is maintaining regular contact with its one-time purchasers to turn them into multi-buyers. Adding a simple CRM that uses transactional trigger data and targeted messaging will help with this effort. Simple to implement, email messages would be additive to the website's traffic and sales so long as the offers provide a strong value proposition and a reason to return to the website. Discount promotions for repeat purchasers could also be utilized, or at least tested.
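A transactional trigger of this kind can be as simple as selecting customers who have ordered exactly once and whose order is at least a set number of days old. The sketch below illustrates the idea; the 30-day wait, the function name and the sample orders are all assumptions, not part of any existing CRM setup.

```python
from collections import defaultdict
from datetime import date

def follow_up_targets(orders, today, wait_days=30):
    """Return emails of one-time purchasers whose only order is at
    least wait_days old, sorted alphabetically."""
    dates_by_email = defaultdict(list)
    for email, order_date in orders:
        dates_by_email[email].append(order_date)
    return sorted(
        email
        for email, dates in dates_by_email.items()
        if len(dates) == 1 and (today - dates[0]).days >= wait_days
    )

# Hypothetical (email, order date) pairs.
orders = [
    ("a@gmail.com", date(2019, 1, 10)),
    ("b@yahoo.com", date(2019, 3, 1)),
    ("c@work.example", date(2019, 1, 5)),
    ("c@work.example", date(2019, 2, 20)),
]
print(follow_up_targets(orders, today=date(2019, 3, 10)))
```

The resulting list could feed a targeted email with an offer, and varying `wait_days` or the offer itself is one way to test which promotions bring one-time buyers back.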
The upside for CatskillProvisions.com is considerable, and the above recommendations represent a small start toward reversing the traffic decline and enabling growth.