Data Analysis on Liquor Sales in Iowa: Insights into Stores

Posted on Sep 10, 2021
The skills I demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.


Since before written history, alcohol has always played a huge roll in the social lives of humans. Even during the COVID-19 pandemic, while many industries suffered, the alcohol industry, and specifically the liquor industry, actually saw sizable growth. In 2020, data on spirits or liquor saw a monthly per capita sales increase of 5% compared to the prior three year average, while wine and beer saw a 0.6% increase and 2.9% decrease, respectively (NIAAA).

The purpose of this project is to provide actionable business insights to both distilleries and liquor stores through retail sales analytics.

More specifically, recommendations will be made around pricing, procurement, stocking, and production strategies, and will be broken down on a month-by-month basis in order to maximize sales and profits by meeting variable consumer demand, identifying gaps and saturation points in the market, minimizing overstocking costs, and identifying optimal price points for different liquor categories.

Dataset, Methodology, and Analysis

Dataset Used:

This dataset includes over 12 million alcoholic beverage sales in Iowa (2012 - 2017) reported by the Iowa Department of Commerce.

It includes information such as the store name, location, product sold, wholesale price and volume sold to the store, retail price and volume sold to the end consumer, and a multitude of other variables

Methodology and Data Analysis

Cleaned and Subset Data

  • Removed unnecessary data (ex. "Volume Sold in Gallons" as the "Volume Sold in Liters" was present)
  • Samples 1 out of every 5 sales (done to cut down the amount of time it took to perform the below analysis
  • Modified column names and values (ex. Removed spaces from column names, and "$" from column values
  • Added additional columns (ex. Month and Year columns)

Analysis of Overall Trends

  • Most popular liquor categories
  • YOY trends
  • Profit margins by category
  • % Unsold by category

Breakdown of Specific Products

  • Profit margins by product
  • Product performance by margin
  • Top sellers

Breakdown of Specific Counties and Cities

  • Total sales
  • Average Sales per Store
  • Liquor category preferences by category

Data Analysis of Common Price Points for Whiskey and Vodka

  • Popular price points
  • Potential competitors for craft distilleries

Monthly Data Trends

  • Trends by liquor category
  • Large deviations from monthly normal sales by category

Overall Trends by Category

Breaking down Percent Total Sales by category (Figure 1A), nearly 60% of sales are whiskey and vodka, while Tequila, Schnapps, and Gin combine for less than 15% of total sales. Additionally, plotting Total Sales versus Percent Unsold (Figure 1B), there is a clear correlation between an increase in Total Sales and a decrease in Percent Unsold.

In fact, the linear model indicates significance (p-value < 0.05) for all coefficients and the overall model, while the coefficient of determination (R-squared) indicates that over 50% of the variation in sales can be explained by it's relationship with the % Unsold (Figure 1C). Two important takeaways here are that Vodka shows only 6% Unsold, while Gin and Schnapps are hovering at 40% Unsold.

Data Analysis on Liquor Sales in Iowa: Insights into Stores

Figure 1A: % Sales by Liquor Category

Data Analysis on Liquor Sales in Iowa: Insights into Stores

Figure 1B: Total Sales versus % Unsold by Liquor Category

Data Analysis on Liquor Sales in Iowa: Insights into Stores

Figure 1C: Statistics on Linear Model in Figure 1B

For distilleries, this is a good indication that Vodka would be a profitable endeavor,  while Gin and Schnapps would not be.

For liquor stores, this is a good indication that less Gin and Schnapps should be bought and stocked.

It is also important for Liquor Stores and Distillers alike to understand where trends are going. When considering Percent Sales by Category trends YOY (Figure 2), it is clear that consumer trends have remained stable for the duration of this dataset. Whiskey and Vodka are the clear top preferences, followed by Rum and Liquor, with Tequila, Schnapps, and Gin showing minimal sales comparatively to the other top categories.

Figure 2: Percentage of Yearly Sales by Category

Margins are also an important consideration for liquor stores when stocking their shelves. However, regardless of the type of liquor, the average margins are near identical (Figure 3A). For all categories, margins are tightly clustered at 50.0 - 50.3%. When considering a violin plot (Figure 3B), these trends become even more clear as there is a clear density at about 50% for each, and a steep drop off at slight deviations outside of 50%. Note how tight the y-axis labels are as well (49.9 - 50.3%).

Figure 3A: Margin by Liquor Category (Bar)

Figure 3B: Margin by Liquor Category (Violin)

Stocking the Shelves

Margins for individual items are also very tightly clustered around 50% (Figure 4). Product with sales over $2.5mm exclusively fall within 1% of 50% margins. However, there are some products clearly slightly above the 50% margin mark with significant sales. For liquor stores, the products sold at margins higher than 50% could be a great way to maximize profit if they sell as well as similar products sold at the normal 50% margin (similar % Unsold).

Figure 4: Total Sales vs Margin by Product. Line indicates 50% margin.

Data on Possible Drawbacks

One drawback to the analysis done here is that for products with low sales, the Percent Unsold is unreliable as only 1/5 sales were subset for this analysis. Considering the products with the highest sales at >55% margins, Shellback Silver Tequila and Shellback Spiced Rum show significantly higher sales than the rest of the products at this higher margin (Figure 5A), which makes these products the most reliable to analyze for % Unsold.

These products undersell significantly when compared with their category averages (Figure 5B). In fact, for only a 5% increase in margins, there is about a 20% increase in unsold products. This pattern holds true for all of the Top 20 products by sales sold above 55% margins, however, the rest are less significant due to the drawbacks of this analysis mentioned above.

The main takeaway for liquor stores here is that although selling products at higher margins would increase profit, the decrease in sales far outweighs the gain from the increased margins.

Figure 5A: Products at > 55% Margins by Total Sales

Figure 5B: Shellback Products % Unsold vs Category Average

Liquor stores should also be aware of the Top Sellers statewide and stock their shelves accordingly (Figure 6).

Data on Sales by County and City

For someone opening up a new liquor store or distillery, location choice is an important factor. Maximizing the total sales and average sales per store of a given area should maximize revenue for a store in that area. Looking at all counties in Iowa, Polk County stands above the rest when considering both of these variables (Figure 7A).

Other counties worth considering according to these variables are Linn, Scott, Johnson, Black Hawk, and Woodbury. If one were to open a liquor store or distillery in one of these counties, it is important to know the consumer preferences for each area. All counties show extremely similar preferences of Whiskey and Vodka, followed by Rum and Liqueur, with Gin, Schnapps, and Tequila with the least sales. This is directly in line with the overall trends (Figure 7B).

Figure 7A: Total Sales vs Average Sales Per Store by County

Figure 7B: Top Counties Sales by Category

A new distillery should consider a town in Polk County with low rent and cost-of-living as this would allow the new distillery to be integrated into a community with high liquor sales. However, for liquor stores, it is important to zone in on a specific city.

Three cities in Polk County, Des Moines, West Des Moines, and Windsor Heights are all high in either total sales or average sales per store (Figure 8A). Where Des Moines has the highest sales, West Des Moines has the third highest sales, and Windsor Heights has the second highest average sales per store of all cities in Iowa. These cities are also extremely close to each other with less than 7 square miles separating all three (Figure 8B). Due to its proximity to the high sales of Des Moines and West Des Moines, as well as the extremely high average sales per store, Windsor Heights seems to be the best location for a new liquor store.

Figure 8A: Top 20 Cities in Iowa by Sales vs Average Sales Per Store

Figure 8B: Proximity of Des Moines, West Des Moines, and Windsor Heights

Price Points for Whiskey and Vodka

Distillers should also be aware of the prices vendors are willing to pay for their product. As Whiskey and Vodka are clearly the two most popular liquor types by Sales and Percent Unsold, these should be the two candidates for a distillery looking to maximize sales.

Due to the fact that a new distillery will not be able to compete with the cheaper, larger distilleries, it is important to zone in on popular price points for higher end Whiskey and Vodka. Per Liter, vendors are willing to pay $15 - 25 for higher end Whiskey, and $20 - 25 for higher end Vodka (Figure 9).

Figure 9: Density Plot for $/L for Whiskey and Vodka

It is also important for distillers to understand their top competitors at these price points. For Whiskey, it's Jack Daniels, Jameson, Crown Royal, Maker’s Mark, and other smaller craft brands, while for Vodka, it's Grey Goose, Ciroc, Belvedere, Titos, and smaller other craft brands. This is not only useful when marketing your product to vendors, but also to those going on tours of your distillery, a great way to market your product to the end consumer.

Month-by-Month Data Analysis

As with most products, monthly trends are important to consider for distillers in what they produce and when, and liquor stores in what they buy. The most savvy lean business owners will attempt to minimize inventory as much as possible. This is due to the fact that inventory is actually a form of waste in lean methodology as inventory ties up capital and therefore costs money.

The perfect business would attempt to react to each change in monthly demand for each product (Figure 10A). However, due to constraints around vendor contracts, raw materials, production schedules and other factors, this may not be possible. More realistically, a savvy business owner should at least be reacting to product demand changes more than 10% or 20% (Figures 10B and 10C, respectively).

Figure 10A: Monthly Changes in Demand by Category

Figure 10B: Monthly Changes in Demand by Category over 10%

Figure 10C: Monthly Changes in Demand by Category over 20%

Distillers should keep an especially close watch on these monthly trends as they are also at risk of unnecessary movement, over-production, over-processing, and waiting - 4 other forms of lean waste if they are producing above demand, and not meeting vendor and store demand needs if they are producing below demand.

Recommendations and Conclusions

Liquor Stores Data

  • Regardless of location, expect to see whiskey and vodka to encompass ~60% of sales - stock accordingly
  • Stock less Schnapps and Gin (nearly 40% unsold)
  • Ensure you are stocking Top Best Sellers
  • Ensure you are adjusting order volumes to reflect monthly changes in demand as much as possible
  • Price all products at 50% profit margins (don’t be greedy!)
  • Polk County is a great county for a new liquor store - specifically in Windsor Heights

Distilleries Data

  • Whiskey and Vodka are best bets
  • Price Whiskey at $15 - 25/L and Vodka at $20 - 25/L
  • Know and compare  yourself to competitors (good information for distillery tours)
  • Ensure you are adjusting production volumes, spending, and raw material orders to reflect monthly changes in demand
  • Polk County is a great place to start a distillery

Vodka over Whiskey for New Distilleries

Leveraging both the knowledge gained through this analysis, and my personal knowledge as a former process engineer at a distillery, I would strongly recommend new distilleries to look at making Vodka instead of Whiskey.

  • Cheaper (raw materials and no barrels) 
  • More Flexibility (raw materials)
  • Less Labor and Time Intensive (no aging, easier mash creation)
  • Nearly 3x Less Unsold Bottles
  • Higher Price Point Range ($20 - 25 vs $15 - 25)
  • Less Space Required (no barrel area needed)
  • More Instantaneous Profits (no aging)
  • Less Saturated Craft Market
  • Less Regulation and Preferences around Naming

Future Work to be Explored

  • Use more current data
  • Understand impact of COVID on specific liquor categories 
  • Don’t subset data to dig into individual products and trends with higher accuracy
  • Break down data by “craft” vs larger scale distilleries
  • Look at rent prices and demographics of counties and cities to factor into location recommendations for liquor stores and distilleries

About Author

Jonah Gerstel

Jonah graduated Tulane University in 2017 with a B.S. in Chemical Engineering. He then went on to work in the CPG and Food and Beverage Industries as a Process Engineer, gaining his Lean Six Sigma Green Belt. He...
View all posts by Jonah Gerstel >

Related Articles

Leave a Comment

No comments found.

View Posts by Categories

Our Recent Popular Posts

View Posts by Tags

#python #trainwithnycdsa 2019 2020 Revenue 3-points agriculture air quality airbnb airline alcohol Alex Baransky algorithm alumni Alumni Interview Alumni Reviews Alumni Spotlight alumni story Alumnus ames dataset ames housing dataset apartment rent API Application artist aws bank loans beautiful soup Best Bootcamp Best Data Science 2019 Best Data Science Bootcamp Best Data Science Bootcamp 2020 Best Ranked Big Data Book Launch Book-Signing bootcamp Bootcamp Alumni Bootcamp Prep boston safety Bundles cake recipe California Cancer Research capstone car price Career Career Day citibike classic cars classpass clustering Coding Course Demo Course Report covid 19 credit credit card crime frequency crops D3.js data data analysis Data Analyst data analytics data for tripadvisor reviews data science Data Science Academy Data Science Bootcamp Data science jobs Data Science Reviews Data Scientist Data Scientist Jobs data visualization database Deep Learning Demo Day Discount disney dplyr drug data e-commerce economy employee employee burnout employer networking environment feature engineering Finance Financial Data Science fitness studio Flask flight delay gbm Get Hired ggplot2 googleVis H20 Hadoop hallmark holiday movie happiness healthcare frauds higgs boson Hiring hiring partner events Hiring Partners hotels housing housing data housing predictions housing price hy-vee Income Industry Experts Injuries Instructor Blog Instructor Interview insurance italki Job Job Placement Jobs Jon Krohn JP Morgan Chase Kaggle Kickstarter las vegas airport lasso regression Lead Data Scienctist Lead Data Scientist leaflet league linear regression Logistic Regression machine learning Maps market matplotlib Medical Research Meet the team meetup methal health miami beach movie music Napoli NBA netflix Networking neural network Neural networks New Courses NHL nlp NYC NYC Data Science nyc data science academy NYC Open Data nyc property NYCDSA NYCDSA Alumni Online Online Bootcamp Online Training Open Data painter pandas Part-time performance phoenix pollutants Portfolio Development precision measurement prediction Prework Programming public safety PwC python Python Data Analysis python machine learning python scrapy python web scraping python webscraping Python Workshop R R Data Analysis R language R Programming R Shiny r studio R Visualization R Workshop R-bloggers random forest Ranking recommendation recommendation system regression Remote remote data science bootcamp Scrapy scrapy visualization seaborn seafood type Selenium sentiment analysis sentiment classification Shiny Shiny Dashboard Spark Special Special Summer Sports statistics streaming Student Interview Student Showcase SVM Switchup Tableau teachers team team performance TensorFlow Testimonial tf-idf Top Data Science Bootcamp Top manufacturing companies Transfers tweets twitter videos visualization wallstreet wallstreetbets web scraping Weekend Course What to expect whiskey whiskeyadvocate wildfire word cloud word2vec XGBoost yelp youtube trending ZORI