Data Analysis on Alcohol Sales in Iowa

Posted on Dec 11, 2020
The skills I demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.


            Data shows revenue for the alcoholic drinks market in the U.S. has been estimated at 222 million dollars for 2020. Notably, the demand in the U.S. alcoholic beverages market specifically for hard liquors has steadily increased since the 2000s and now accounts for 37.3% of U.S. alcoholic beverage consumption.

While beer has and continues to hold overall dominance in the consumer market and thus in industry space, further research into the U.S. liquor market potentially anticipates the emerging dominance or at least equivalence of liquor and beer consumption and popularity. While recent economic upheavals like the ongoing pandemic have significantly impacted the sales and operations of the alcohol industry, alcoholic beverage demand and consumption in a major market like the U.S. makes an industry recovery in the near future highly likely.


            This project specifically seeks to investigate sales and distribution dynamics in the alcoholic beverage industry via a case study of the Iowa liquor market in order to uncover insights and recommendations useful for a prospective entrant vendor. Besides assessing potential predictive parameters for product profit, the primary research question posed in this project more broadly addresses the status of Iowa’s liquor market.

For example, who are the dominant vendors? What are the most popular product categories, and within those top categories what are the best-selling products? What are some general buyer and product trends that might optimize product success? Thus, this project presents an overview and analysis of some of the more significant regional behaviors and factors that shape Iowa’s liquor market.



            This project primarily used the Iowa Alcoholic Beverages Division liquor sales and distribution dataset (collected from January 2012 through October 2020), with supplementary product data (product aging and proof) supplied by the same organizational entity.

Different Models

The training dataset used a sample of two years, 2012 and 2013, in order to make regression analysis and market basket analysis more feasible as per device memory limitations. The training dataset used for the two analyses contained a little over 3.1 million observations. Data analysis was performed on the full dataset and examined over 19 million observations. Packages used include pandas, numpy, sklearn, and mlxtend. Visualizations were made with the Python packages matplotlib and seaborn.

            Preprocessing for the basket analysis involved creating a transaction dataset by isolating the product descriptions and quantity sold (‘Bottles Sold’).   The analysis omitted transactions where quantity sold was less than 0 and encoded any quantities greater than (or equal to) 1 as 1.

            Preprocessing for the regularized regression analysis involved omitting redundant or correlated features and omitting missing observations from the analysis for the columns used. Imputation was deemed at best imprecise, and at worst misleading or inaccurate for aggregate analyses. The proportion of missing values was negligible relative to the size of the dataset, and considered missing at random from random sampling and informal inspection. Sklearn’s OneHotEncoder was used to transform nominal variables.

       Feature Engineering 

    Feature engineering for the regularized regression analysis in this project involved creating one additional dependent variable, which was used as the target variable.   Subtracting the state bottle cost (the price at which local liquor sellers purchased the products from the state) from the state bottle retail cost (the cost at which the state purchased the products from vendors) created a proxy for product profit.

Another feature involved the unitized price of products (price per liter), but was ultimately discarded. Additional features were appended from the previously mentioned supplementary dataset provided by the IABD, including product age and proof.

    Lasso Regression

        This project specifically utilized lasso regression in order to assess the predictive influence of dataset parameters on product profit. The train test split was 80/20.   Multiple iterations of the regression were run on different feature combinations of the nominal columns selected due to device memory limitations (a vendor regression, a proof regression, and a county and product category regression).


Data Results

Data Analysis

            Data analysis uncovered general context and some concrete trends in the data.

Market Size

Elaborating, in terms of Iowa’s liquor market size fluctuation, the total amount of unique products in a year seems to hover around 3000 to 3500. A large spike in new liquor products was seen in 2016 and 2017, though this spike quickly subsided to normal levels in the following year. Overall, this combined with the approximate 39.7% survival rate of products sold over the 8 year span of the dataset suggests that the Iowa liquor market remains relatively competitive – that is, while new brands regularly surface, less than half of the brands sold in 2012 were still being sold in 2020.

Data Analysis on Alcohol Sales in Iowa

Fig. 1: Overall liquor market size in Iowa State from Jan. 2012 to Oct. 2020.



Data Analysis on Alcohol Sales in Iowa

Fig. 2: Survival rate of products sold in 2012 tracked through 2020.



         Then, aggregating by profit over the time span of the entire dataset, whiskies by far dominated Iowa’s market as a top-selling product category. Of the top ten most profitable categories, whiskies held six positions on the list including the top selling position with Canadian whiskies. Black Velvet, Crown Royal Canadian Whisky, and Crown Royal were the top three (in descending order of performance) best selling Canadian whisky products.

Notably, examining vendor competition, while Black Velvet is a whisky brand owned by the Kentucky-based Heaven Hill, the top Canadian whisky vendor by total profit was Diageo Americas, a London-based company. Interestingly, although Diageo Americas owns the Crown Royal brand, it also once owned the Black Velvet brand before selling it to competing vendor Constellation Brands in 1999 – which ultimately sold the brand to Heaven Hill. Thus, while a highly profitable liquor category, this hints at the competitiveness and narrowness of the Canadian whisky market and the overall difficulty a new product may in gaining traction.

Data Analysis on Alcohol Sales in Iowa

Fig. 3: Illustrating the profitability and also possible saturation of the whiskey market in Iowa State.


        Raw Profit

    Significantly, here, aggregating vendors by raw profit in the Iowa liquor market demonstrates how the top vendors (like Diageo Americas, Heaven Hill, Beam Suntory/Jim Beam, and Sazerac) are visible in and compete across Iowa’s bestselling product categories (though with varying levels of success).

Additionally, while some specialty premium products can be very costly, the price for a liter of Canadian whisky, straight bourbon whisky, or American vodka ranged from 8 dollars at the cheapest (cost of a liter of Five O’Clock Vodka) to around 43 dollars at the most expensive (cost for a liter of Crown Royal). This underscores both the accessibility of these products necessary in order for them to succeed, but also the likely slimmer profit margins of vendors who employ this pricing strategy.

This quandary – selling enough product volume to offset production costs at a relatively low per bottle profit margin – suggests that this pricing strategy is likely most sustainable for multinational companies like Diageo Americas, Jim Beam Brands, and Sazerac Company Inc. because they have multiple product lines across liquor categories and brands at a range of prices in many countries. This simply further highlights some of the difficulties and market pressure a new product and vendor must overcome to gain traction in the liquor industry.

Sales Distribution

Fig. 4: Demonstrating the dominant presence of a few, major multinational companies competing across the top product categories.

            Exploring other methods of maximizing chances of product success, in terms of sales distribution at liquor stores, Des Moines was the overall top city in Iowa for liquor sales, followed by Cedar Rapids and Davenport. Additionally, aggregating sales by profit illustrates how the regional (Midwest) supermarket chain Hy-Vee was by far the most common place where liquor purchases were made in Iowa.

Net Profit

Occupying six of the top ten most profitable store locations, the Des Moines Hy-Vee location took first place in terms of net profit for liquor sales. This suggests that supermarkets in Iowa are an important and high-exposure distribution and sales channel for new products.

Fig. 5:  More densely populated urban centers/cities are better for higher product exposure/consumption.


Fig. 6: Based on profitability by store, supermarket chain Hy-Vee claims the top spot as the optimal regional sales and distribution channel.


        Temporal Trends

   Finally, examining temporal trends, the overall frequency of liquor purchase has noticeable dips in the fall and late winter (i.e., after December). Demand spikes begin around March and peak in the summer, and then spike again in December. By category, Canadian whiskies and straight bourbon whiskies were especially popular during the holiday season in December, whereas American vodkas actually drop in popularity starting around September, only picking up again in the following spring.

Here, American vodkas actually enjoyed their highest demand during the summer (while whisky sales, relative to their holiday spike, were comparatively depressed).  Throughout the year, however, demand was stronger and thus higher in profit for the two whiskey categories compared to American vodkas.

Fig. 7: From top to bottom, trend lines for consumption of Canadian whiskey, straight bourbon whiskey, and American vodka.


Market Basket Analysis

            Performing further market research and market basket analysis on the dataset revealed a key consumer demographic driving Iowa’s liquor market. Elaborating, setting the minimum support threshold for products to 5%, support for three top products – Black Velvet (Canadian whisky), Hawkeye Vodka (American vodka), and Five O’Clock Vodka (American vodka) – was found to be predictably high throughout the entire dataset (though Five O’Clock vodka dipped to around 42% support in 2017-2018).

While Black Velvet’s appeal seems to reach a broader demographic (due to its consistently high support and impressive aggregate profit), further research revealed that Five O’Clock Vodka and Hawkeye Vodka (both roughly $10 a liter) in particular enjoy both a high amount of support and a certain degree of infamy amongst Iowa college students for its high alcohol content and inexpensive price.

Analysis Findings

            Performing a basket analysis on transactions made in 2020 reveals how Black Velvet seems resilient to the recent pandemic (and has a more positive reputation as a better-tasting liquor at an equally affordable price of approximately $12 a liter). Interestingly, coinciding with its top profits in the product category, a new competitor in the American vodka market in the form of Tito’s Handmade Vodka rose to 54% support in 2020.

Here, while both Hawkeye and Five O’Clock vodka seemed to be suffering lower levels of support (the latter, in fact, seemed to drop below 5% support as it was not present in the 2020 frequent item set), suggests that while possibly fickle, the college student population remains an incredibly strong and voracious consumer demographic.

This demonstrates that while historically Iowan college students may have favored cheaper American vodkas, the market has room for a higher quality product like Tito’s, which has enjoyed steady growth likely from the same consumer demographic and at a higher profit margin (at around 21 dollars a liter). As an important aside, this conclusion must be revisited to ensure complete accuracy, as data was only available for transactions recorded through October 1st, 2020.

Fig. 8: Association rules demonstrating the recent surge in support for Tito's Handmade Vodka.  (Transaction lift levels were omitted in this figure as they were close to the  minimum threshold and held minimal new insight into buying behaviors).


Data on Lasso Regression

            Summarizing the results of the regularized linear regression, none of the features analyzed were found to specifically predict product profit. Formal model parameter selection from lasso cross validation was deemed unnecessary as none of the coefficients deviated from a slope of zero, even when the lasso regression alpha level was set to zero.



            Ultimately, in response to the research questions, a prospective new vendor must acknowledge the competitiveness of the Iowan liquor market stemming from a few, sometimes multinational companies. With an approximate 40% product attrition rate over an eight-year period, a new vendor selling in Iowa would find strong demand for whisky products, though may be outcompeted in this top-selling category by established multinational vendors like Diageo Americas and Beam Suntory/Jim Beam.

Additionally, selling products in a reasonable price range (i.e., $12 - $40 per liter) in more urban areas like Des Moines and Cedar Rapids would likely give a new vendor the best product exposure and likelihood of succeeding, especially if they were able to gain enough brand traction to sell products through a distribution channel like the regional supermarket chain Hy-Vee.

Finally, while demand remands fairly consistent throughout the year, launching a product during the fall or late winter (i.e., post-December), regardless of product category, would likely lead to lower than typical demand, as even popular liquor categories see a sharp drop in sales during those periods.

            Significantly, in overcoming the profit gap between selling sufficiently high volumes of an affordably priced product in order to offset production costs, market research and basket analyses illustrate that while cheaper liquors perform well specifically in the college student demographic, those same consumers also seem willing to buy higher quality products for a higher (but still reasonable) price.

Thus, while whiskies overall are an extremely competitive product category, American vodkas seem like a comparatively less competitive market and present a potential product niche opportunity that brands like Tito’s Handmade Vodka have successfully taken advantage of.

Future Data Research

            Future research might examine which product features are more specifically tied to predicting product profit. For example, flavors, ingredients, and branding/marketing (i.e., endorsed by a specific celebrity or seen in a memorable commercial) may all play a much stronger predictive role than more general factors like product age, proof, or sale location. Additionally, future analyses might more efficiently map demand given more typical demographic information – for example, what young adult versus older buyers tend to purchase, or segmenting the market by race or gender. Limitations of this project’s conclusions mostly rest on the limited generalizability of Iowa’s market dynamics, which may only be relevant, for example, for other liquor markets in the U.S.’s Upper Midwest region.

            Overall, then, in investigating some of the regional patterns and trends in Iowa’s alcoholic beverage purchases, a new vendor should strategically evaluate their specific product category niche and target consumer demographic – including what kind of market gap their product may be able to fill, like the affordable, higher quality of Tito’s Handmade Vodka.

This entails investigating the nature and stature of the new product's competition (i.e., a multinational versus a domestic vendor), selecting an optimal product launch time, and determining the best distribution locations and sales channels in order to gain traction in the market and fully maximize the product’s chance of success and future profitability.

Link to code: Github repository



Alcoholic Drinks. Oct, 2020.

The Alcohol Industry in Data. Oct, 2020.

Drizly: Your Online Liquor Store. Oct, 2020.

Iowa Data, Iowa Liquor Products. Oct, 2020.

Iowa Data, Iowa Liquor Sales. Oct, 2020. 

About Author

Leave a Comment

No comments found.

View Posts by Categories

Our Recent Popular Posts

View Posts by Tags

#python #trainwithnycdsa 2019 2020 Revenue 3-points agriculture air quality airbnb airline alcohol Alex Baransky algorithm alumni Alumni Interview Alumni Reviews Alumni Spotlight alumni story Alumnus ames dataset ames housing dataset apartment rent API Application artist aws bank loans beautiful soup Best Bootcamp Best Data Science 2019 Best Data Science Bootcamp Best Data Science Bootcamp 2020 Best Ranked Big Data Book Launch Book-Signing bootcamp Bootcamp Alumni Bootcamp Prep boston safety Bundles cake recipe California Cancer Research capstone car price Career Career Day citibike classic cars classpass clustering Coding Course Demo Course Report covid 19 credit credit card crime frequency crops D3.js data data analysis Data Analyst data analytics data for tripadvisor reviews data science Data Science Academy Data Science Bootcamp Data science jobs Data Science Reviews Data Scientist Data Scientist Jobs data visualization database Deep Learning Demo Day Discount disney dplyr drug data e-commerce economy employee employee burnout employer networking environment feature engineering Finance Financial Data Science fitness studio Flask flight delay gbm Get Hired ggplot2 googleVis H20 Hadoop hallmark holiday movie happiness healthcare frauds higgs boson Hiring hiring partner events Hiring Partners hotels housing housing data housing predictions housing price hy-vee Income Industry Experts Injuries Instructor Blog Instructor Interview insurance italki Job Job Placement Jobs Jon Krohn JP Morgan Chase Kaggle Kickstarter las vegas airport lasso regression Lead Data Scienctist Lead Data Scientist leaflet league linear regression Logistic Regression machine learning Maps market matplotlib Medical Research Meet the team meetup methal health miami beach movie music Napoli NBA netflix Networking neural network Neural networks New Courses NHL nlp NYC NYC Data Science nyc data science academy NYC Open Data nyc property NYCDSA NYCDSA Alumni Online Online Bootcamp Online Training Open Data painter pandas Part-time performance phoenix pollutants Portfolio Development precision measurement prediction Prework Programming public safety PwC python Python Data Analysis python machine learning python scrapy python web scraping python webscraping Python Workshop R R Data Analysis R language R Programming R Shiny r studio R Visualization R Workshop R-bloggers random forest Ranking recommendation recommendation system regression Remote remote data science bootcamp Scrapy scrapy visualization seaborn seafood type Selenium sentiment analysis sentiment classification Shiny Shiny Dashboard Spark Special Special Summer Sports statistics streaming Student Interview Student Showcase SVM Switchup Tableau teachers team team performance TensorFlow Testimonial tf-idf Top Data Science Bootcamp Top manufacturing companies Transfers tweets twitter videos visualization wallstreet wallstreetbets web scraping Weekend Course What to expect whiskey whiskeyadvocate wildfire word cloud word2vec XGBoost yelp youtube trending ZORI