NYC Data Science Academy| Blog
Bootcamps
Lifetime Job Support Available Financing Available
Bootcamps
Data Science with Machine Learning Flagship ๐Ÿ† Data Analytics Bootcamp Artificial Intelligence Bootcamp New Release ๐ŸŽ‰
Free Lesson
Intro to Data Science New Release ๐ŸŽ‰
Find Inspiration
Find Alumni with Similar Background
Job Outlook
Occupational Outlook Graduate Outcomes Must See ๐Ÿ”ฅ
Alumni
Success Stories Testimonials Alumni Directory Alumni Exclusive Study Program
Courses
View Bundled Courses
Financing Available
Bootcamp Prep Popular ๐Ÿ”ฅ Data Science Mastery Data Science Launchpad with Python View AI Courses Generative AI for Everyone New ๐ŸŽ‰ Generative AI for Finance New ๐ŸŽ‰ Generative AI for Marketing New ๐ŸŽ‰
Bundle Up
Learn More and Save More
Combination of data science courses.
View Data Science Courses
Beginner
Introductory Python
Intermediate
Data Science Python: Data Analysis and Visualization Popular ๐Ÿ”ฅ Data Science R: Data Analysis and Visualization
Advanced
Data Science Python: Machine Learning Popular ๐Ÿ”ฅ Data Science R: Machine Learning Designing and Implementing Production MLOps New ๐ŸŽ‰ Natural Language Processing for Production (NLP) New ๐ŸŽ‰
Find Inspiration
Get Course Recommendation Must Try ๐Ÿ’Ž An Ultimate Guide to Become a Data Scientist
For Companies
For Companies
Corporate Offerings Hiring Partners Candidate Portfolio Hire Our Graduates
Students Work
Students Work
All Posts Capstone Data Visualization Machine Learning Python Projects R Projects
Tutorials
About
About
About Us Accreditation Contact Us Join Us FAQ Webinars Subscription An Ultimate Guide to
Become a Data Scientist
    Login
NYC Data Science Acedemy
Bootcamps
Courses
Students Work
About
Bootcamps
Bootcamps
Data Science with Machine Learning Flagship
Data Analytics Bootcamp
Artificial Intelligence Bootcamp New Release ๐ŸŽ‰
Free Lessons
Intro to Data Science New Release ๐ŸŽ‰
Find Inspiration
Find Alumni with Similar Background
Job Outlook
Occupational Outlook
Graduate Outcomes Must See ๐Ÿ”ฅ
Alumni
Success Stories
Testimonials
Alumni Directory
Alumni Exclusive Study Program
Courses
Bundles
financing available
View All Bundles
Bootcamp Prep
Data Science Mastery
Data Science Launchpad with Python NEW!
View AI Courses
Generative AI for Everyone
Generative AI for Finance
Generative AI for Marketing
View Data Science Courses
View All Professional Development Courses
Beginner
Introductory Python
Intermediate
Python: Data Analysis and Visualization
R: Data Analysis and Visualization
Advanced
Python: Machine Learning
R: Machine Learning
Designing and Implementing Production MLOps
Natural Language Processing for Production (NLP)
For Companies
Corporate Offerings
Hiring Partners
Candidate Portfolio
Hire Our Graduates
Students Work
All Posts
Capstone
Data Visualization
Machine Learning
Python Projects
R Projects
About
Accreditation
About Us
Contact Us
Join Us
FAQ
Webinars
Subscription
An Ultimate Guide to Become a Data Scientist
Tutorials
Data Analytics
  • Learn Pandas
  • Learn NumPy
  • Learn SciPy
  • Learn Matplotlib
Machine Learning
  • Boosting
  • Random Forest
  • Linear Regression
  • Decision Tree
  • PCA
Interview by Companies
  • JPMC
  • Google
  • Facebook
Artificial Intelligence
  • Learn Generative AI
  • Learn ChatGPT-3.5
  • Learn ChatGPT-4
  • Learn Google Bard
Coding
  • Learn Python
  • Learn SQL
  • Learn MySQL
  • Learn NoSQL
  • Learn PySpark
  • Learn PyTorch
Interview Questions
  • Python Hard
  • R Easy
  • R Hard
  • SQL Easy
  • SQL Hard
  • Python Easy
Data Science Blog > Student Works > Data Analysis on Alcohol Sales in Iowa

Data Analysis on Alcohol Sales in Iowa

Bram W
Posted on Dec 11, 2020
The skills I demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.

Introduction

            Data shows revenue for the alcoholic drinks market in the U.S. has been estimated at 222 million dollars for 2020. Notably, the demand in the U.S. alcoholic beverages market specifically for hard liquors has steadily increased since the 2000s and now accounts for 37.3% of U.S. alcoholic beverage consumption.

While beer has and continues to hold overall dominance in the consumer market and thus in industry space, further research into the U.S. liquor market potentially anticipates the emerging dominance or at least equivalence of liquor and beer consumption and popularity. While recent economic upheavals like the ongoing pandemic have significantly impacted the sales and operations of the alcohol industry, alcoholic beverage demand and consumption in a major market like the U.S. makes an industry recovery in the near future highly likely.

Objective

            This project specifically seeks to investigate sales and distribution dynamics in the alcoholic beverage industry via a case study of the Iowa liquor market in order to uncover insights and recommendations useful for a prospective entrant vendor. Besides assessing potential predictive parameters for product profit, the primary research question posed in this project more broadly addresses the status of Iowaโ€™s liquor market.

For example, who are the dominant vendors? What are the most popular product categories, and within those top categories what are the best-selling products? What are some general buyer and product trends that might optimize product success? Thus, this project presents an overview and analysis of some of the more significant regional behaviors and factors that shape Iowaโ€™s liquor market.

 

Methods

            This project primarily used the Iowa Alcoholic Beverages Division liquor sales and distribution dataset (collected from January 2012 through October 2020), with supplementary product data (product aging and proof) supplied by the same organizational entity.

Different Models

The training dataset used a sample of two years, 2012 and 2013, in order to make regression analysis and market basket analysis more feasible as per device memory limitations. The training dataset used for the two analyses contained a little over 3.1 million observations. Data analysis was performed on the full dataset and examined over 19 million observations. Packages used include pandas, numpy, sklearn, and mlxtend. Visualizations were made with the Python packages matplotlib and seaborn.

            Preprocessing for the basket analysis involved creating a transaction dataset by isolating the product descriptions and quantity sold (โ€˜Bottles Soldโ€™).   The analysis omitted transactions where quantity sold was less than 0 and encoded any quantities greater than (or equal to) 1 as 1.

            Preprocessing for the regularized regression analysis involved omitting redundant or correlated features and omitting missing observations from the analysis for the columns used. Imputation was deemed at best imprecise, and at worst misleading or inaccurate for aggregate analyses. The proportion of missing values was negligible relative to the size of the dataset, and considered missing at random from random sampling and informal inspection. Sklearnโ€™s OneHotEncoder was used to transform nominal variables.

       Feature Engineering 

    Feature engineering for the regularized regression analysis in this project involved creating one additional dependent variable, which was used as the target variable.   Subtracting the state bottle cost (the price at which local liquor sellers purchased the products from the state) from the state bottle retail cost (the cost at which the state purchased the products from vendors) created a proxy for product profit.

Another feature involved the unitized price of products (price per liter), but was ultimately discarded. Additional features were appended from the previously mentioned supplementary dataset provided by the IABD, including product age and proof.

    Lasso Regression

        This project specifically utilized lasso regression in order to assess the predictive influence of dataset parameters on product profit. The train test split was 80/20.   Multiple iterations of the regression were run on different feature combinations of the nominal columns selected due to device memory limitations (a vendor regression, a proof regression, and a county and product category regression).

 

Data Results

Data Analysis

            Data analysis uncovered general context and some concrete trends in the data.

Market Size

Elaborating, in terms of Iowaโ€™s liquor market size fluctuation, the total amount of unique products in a year seems to hover around 3000 to 3500. A large spike in new liquor products was seen in 2016 and 2017, though this spike quickly subsided to normal levels in the following year. Overall, this combined with the approximate 39.7% survival rate of products sold over the 8 year span of the dataset suggests that the Iowa liquor market remains relatively competitive โ€“ that is, while new brands regularly surface, less than half of the brands sold in 2012 were still being sold in 2020.

Data Analysis on Alcohol Sales in Iowa

Fig. 1: Overall liquor market size in Iowa State from Jan. 2012 to Oct. 2020.

 

 

Data Analysis on Alcohol Sales in Iowa

Fig. 2: Survival rate of products sold in 2012 tracked through 2020.

 

   Profit

         Then, aggregating by profit over the time span of the entire dataset, whiskies by far dominated Iowaโ€™s market as a top-selling product category. Of the top ten most profitable categories, whiskies held six positions on the list including the top selling position with Canadian whiskies. Black Velvet, Crown Royal Canadian Whisky, and Crown Royal were the top three (in descending order of performance) best selling Canadian whisky products.

Notably, examining vendor competition, while Black Velvet is a whisky brand owned by the Kentucky-based Heaven Hill, the top Canadian whisky vendor by total profit was Diageo Americas, a London-based company. Interestingly, although Diageo Americas owns the Crown Royal brand, it also once owned the Black Velvet brand before selling it to competing vendor Constellation Brands in 1999 โ€“ which ultimately sold the brand to Heaven Hill. Thus, while a highly profitable liquor category, this hints at the competitiveness and narrowness of the Canadian whisky market and the overall difficulty a new product may in gaining traction.

Data Analysis on Alcohol Sales in Iowa

Fig. 3: Illustrating the profitability and also possible saturation of the whiskey market in Iowa State.

 

        Raw Profit

    Significantly, here, aggregating vendors by raw profit in the Iowa liquor market demonstrates how the top vendors (like Diageo Americas, Heaven Hill, Beam Suntory/Jim Beam, and Sazerac) are visible in and compete across Iowaโ€™s bestselling product categories (though with varying levels of success).

Additionally, while some specialty premium products can be very costly, the price for a liter of Canadian whisky, straight bourbon whisky, or American vodka ranged from 8 dollars at the cheapest (cost of a liter of Five Oโ€™Clock Vodka) to around 43 dollars at the most expensive (cost for a liter of Crown Royal). This underscores both the accessibility of these products necessary in order for them to succeed, but also the likely slimmer profit margins of vendors who employ this pricing strategy.

This quandary โ€“ selling enough product volume to offset production costs at a relatively low per bottle profit margin โ€“ suggests that this pricing strategy is likely most sustainable for multinational companies like Diageo Americas, Jim Beam Brands, and Sazerac Company Inc. because they have multiple product lines across liquor categories and brands at a range of prices in many countries. This simply further highlights some of the difficulties and market pressure a new product and vendor must overcome to gain traction in the liquor industry.

Sales Distribution

Fig. 4: Demonstrating the dominant presence of a few, major multinational companies competing across the top product categories.

            Exploring other methods of maximizing chances of product success, in terms of sales distribution at liquor stores, Des Moines was the overall top city in Iowa for liquor sales, followed by Cedar Rapids and Davenport. Additionally, aggregating sales by profit illustrates how the regional (Midwest) supermarket chain Hy-Vee was by far the most common place where liquor purchases were made in Iowa.

Net Profit

Occupying six of the top ten most profitable store locations, the Des Moines Hy-Vee location took first place in terms of net profit for liquor sales. This suggests that supermarkets in Iowa are an important and high-exposure distribution and sales channel for new products.

Fig. 5:  More densely populated urban centers/cities are better for higher product exposure/consumption.

 

Fig. 6: Based on profitability by store, supermarket chain Hy-Vee claims the top spot as the optimal regional sales and distribution channel.

 

        Temporal Trends

   Finally, examining temporal trends, the overall frequency of liquor purchase has noticeable dips in the fall and late winter (i.e., after December). Demand spikes begin around March and peak in the summer, and then spike again in December. By category, Canadian whiskies and straight bourbon whiskies were especially popular during the holiday season in December, whereas American vodkas actually drop in popularity starting around September, only picking up again in the following spring.

Here, American vodkas actually enjoyed their highest demand during the summer (while whisky sales, relative to their holiday spike, were comparatively depressed).  Throughout the year, however, demand was stronger and thus higher in profit for the two whiskey categories compared to American vodkas.

Fig. 7: From top to bottom, trend lines for consumption of Canadian whiskey, straight bourbon whiskey, and American vodka.

 

Market Basket Analysis

            Performing further market research and market basket analysis on the dataset revealed a key consumer demographic driving Iowaโ€™s liquor market. Elaborating, setting the minimum support threshold for products to 5%, support for three top products โ€“ Black Velvet (Canadian whisky), Hawkeye Vodka (American vodka), and Five Oโ€™Clock Vodka (American vodka) โ€“ was found to be predictably high throughout the entire dataset (though Five Oโ€™Clock vodka dipped to around 42% support in 2017-2018).

While Black Velvetโ€™s appeal seems to reach a broader demographic (due to its consistently high support and impressive aggregate profit), further research revealed that Five Oโ€™Clock Vodka and Hawkeye Vodka (both roughly $10 a liter) in particular enjoy both a high amount of support and a certain degree of infamy amongst Iowa college students for its high alcohol content and inexpensive price.

Analysis Findings

            Performing a basket analysis on transactions made in 2020 reveals how Black Velvet seems resilient to the recent pandemic (and has a more positive reputation as a better-tasting liquor at an equally affordable price of approximately $12 a liter). Interestingly, coinciding with its top profits in the product category, a new competitor in the American vodka market in the form of Titoโ€™s Handmade Vodka rose to 54% support in 2020.

Here, while both Hawkeye and Five Oโ€™Clock vodka seemed to be suffering lower levels of support (the latter, in fact, seemed to drop below 5% support as it was not present in the 2020 frequent item set), suggests that while possibly fickle, the college student population remains an incredibly strong and voracious consumer demographic.

This demonstrates that while historically Iowan college students may have favored cheaper American vodkas, the market has room for a higher quality product like Titoโ€™s, which has enjoyed steady growth likely from the same consumer demographic and at a higher profit margin (at around 21 dollars a liter). As an important aside, this conclusion must be revisited to ensure complete accuracy, as data was only available for transactions recorded through October 1st, 2020.

Fig. 8: Association rules demonstrating the recent surge in support for Tito's Handmade Vodka.  (Transaction lift levels were omitted in this figure as they were close to the  minimum threshold and held minimal new insight into buying behaviors).

 

Data on Lasso Regression

            Summarizing the results of the regularized linear regression, none of the features analyzed were found to specifically predict product profit. Formal model parameter selection from lasso cross validation was deemed unnecessary as none of the coefficients deviated from a slope of zero, even when the lasso regression alpha level was set to zero.

 

Conclusion

            Ultimately, in response to the research questions, a prospective new vendor must acknowledge the competitiveness of the Iowan liquor market stemming from a few, sometimes multinational companies. With an approximate 40% product attrition rate over an eight-year period, a new vendor selling in Iowa would find strong demand for whisky products, though may be outcompeted in this top-selling category by established multinational vendors like Diageo Americas and Beam Suntory/Jim Beam.

Additionally, selling products in a reasonable price range (i.e., $12 - $40 per liter) in more urban areas like Des Moines and Cedar Rapids would likely give a new vendor the best product exposure and likelihood of succeeding, especially if they were able to gain enough brand traction to sell products through a distribution channel like the regional supermarket chain Hy-Vee.

Finally, while demand remands fairly consistent throughout the year, launching a product during the fall or late winter (i.e., post-December), regardless of product category, would likely lead to lower than typical demand, as even popular liquor categories see a sharp drop in sales during those periods.

            Significantly, in overcoming the profit gap between selling sufficiently high volumes of an affordably priced product in order to offset production costs, market research and basket analyses illustrate that while cheaper liquors perform well specifically in the college student demographic, those same consumers also seem willing to buy higher quality products for a higher (but still reasonable) price.

Thus, while whiskies overall are an extremely competitive product category, American vodkas seem like a comparatively less competitive market and present a potential product niche opportunity that brands like Titoโ€™s Handmade Vodka have successfully taken advantage of.

Future Data Research

            Future research might examine which product features are more specifically tied to predicting product profit. For example, flavors, ingredients, and branding/marketing (i.e., endorsed by a specific celebrity or seen in a memorable commercial) may all play a much stronger predictive role than more general factors like product age, proof, or sale location. Additionally, future analyses might more efficiently map demand given more typical demographic information โ€“ for example, what young adult versus older buyers tend to purchase, or segmenting the market by race or gender. Limitations of this projectโ€™s conclusions mostly rest on the limited generalizability of Iowaโ€™s market dynamics, which may only be relevant, for example, for other liquor markets in the U.S.โ€™s Upper Midwest region.

            Overall, then, in investigating some of the regional patterns and trends in Iowaโ€™s alcoholic beverage purchases, a new vendor should strategically evaluate their specific product category niche and target consumer demographic โ€“ including what kind of market gap their product may be able to fill, like the affordable, higher quality of Titoโ€™s Handmade Vodka.

This entails investigating the nature and stature of the new product's competition (i.e., a multinational versus a domestic vendor), selecting an optimal product launch time, and determining the best distribution locations and sales channels in order to gain traction in the market and fully maximize the productโ€™s chance of success and future profitability.

Link to code: Github repository

 

Sources

Alcoholic Drinks. Oct, 2020. https://www.statista.com/outlook/10000000/109/alcoholic-drinks/united-states

The Alcohol Industry in Data. Oct, 2020.

https://www.alcohol.org/guides/the-alcohol-industry-in-data/

Drizly: Your Online Liquor Store. Oct, 2020.

https://drizly.com/

Iowa Data, Iowa Liquor Products. Oct, 2020.

https://data.iowa.gov/Sales-Distribution/Iowa-Liquor-Products/gckp-fe7r

Iowa Data, Iowa Liquor Sales. Oct, 2020.

https://data.iowa.gov/Sales-Distribution/Iowa-Liquor-Sales/m3tr-qhgy 

About Author

Bram W

View all posts by Bram W >

Leave a Comment

No comments found.

View Posts by Categories

All Posts 2399 posts
AI 7 posts
AI Agent 2 posts
AI-based hotel recommendation 1 posts
AIForGood 1 posts
Alumni 60 posts
Animated Maps 1 posts
APIs 41 posts
Artificial Intelligence 2 posts
Artificial Intelligence 2 posts
AWS 13 posts
Banking 1 posts
Big Data 50 posts
Branch Analysis 1 posts
Capstone 206 posts
Career Education 7 posts
CLIP 1 posts
Community 72 posts
Congestion Zone 1 posts
Content Recommendation 1 posts
Cosine SImilarity 1 posts
Data Analysis 5 posts
Data Engineering 1 posts
Data Engineering 3 posts
Data Science 7 posts
Data Science News and Sharing 73 posts
Data Visualization 324 posts
Events 5 posts
Featured 37 posts
Function calling 1 posts
FutureTech 1 posts
Generative AI 5 posts
Hadoop 13 posts
Image Classification 1 posts
Innovation 2 posts
Kmeans Cluster 1 posts
LLM 6 posts
Machine Learning 364 posts
Marketing 1 posts
Meetup 144 posts
MLOPs 1 posts
Model Deployment 1 posts
Nagamas69 1 posts
NLP 1 posts
OpenAI 5 posts
OpenNYC Data 1 posts
pySpark 1 posts
Python 16 posts
Python 458 posts
Python data analysis 4 posts
Python Shiny 2 posts
R 404 posts
R Data Analysis 1 posts
R Shiny 560 posts
R Visualization 445 posts
RAG 1 posts
RoBERTa 1 posts
semantic rearch 2 posts
Spark 17 posts
SQL 1 posts
Streamlit 2 posts
Student Works 1687 posts
Tableau 12 posts
TensorFlow 3 posts
Traffic 1 posts
User Preference Modeling 1 posts
Vector database 2 posts
Web Scraping 483 posts
wukong138 1 posts

Our Recent Popular Posts

AI 4 AI: ChatGPT Unifies My Blog Posts
by Vinod Chugani
Dec 18, 2022
Meet Your Machine Learning Mentors: Kyle Gallatin
by Vivian Zhang
Nov 4, 2020
NICU Admissions and CCHD: Predicting Based on Data Analysis
by Paul Lee, Aron Berke, Bee Kim, Bettina Meier and Ira Villar
Jan 7, 2020

View Posts by Tags

#python #trainwithnycdsa 2019 2020 Revenue 3-points agriculture air quality airbnb airline alcohol Alex Baransky algorithm alumni Alumni Interview Alumni Reviews Alumni Spotlight alumni story Alumnus ames dataset ames housing dataset apartment rent API Application artist aws bank loans beautiful soup Best Bootcamp Best Data Science 2019 Best Data Science Bootcamp Best Data Science Bootcamp 2020 Best Ranked Big Data Book Launch Book-Signing bootcamp Bootcamp Alumni Bootcamp Prep boston safety Bundles cake recipe California Cancer Research capstone car price Career Career Day ChatGPT citibike classic cars classpass clustering Coding Course Demo Course Report covid 19 credit credit card crime frequency crops D3.js data data analysis Data Analyst data analytics data for tripadvisor reviews data science Data Science Academy Data Science Bootcamp Data science jobs Data Science Reviews Data Scientist Data Scientist Jobs data visualization database Deep Learning Demo Day Discount disney dplyr drug data e-commerce economy employee employee burnout employer networking environment feature engineering Finance Financial Data Science fitness studio Flask flight delay football gbm Get Hired ggplot2 googleVis H20 Hadoop hallmark holiday movie happiness healthcare frauds higgs boson Hiring hiring partner events Hiring Partners hotels housing housing data housing predictions housing price hy-vee Income industry Industry Experts Injuries Instructor Blog Instructor Interview insurance italki Job Job Placement Jobs Jon Krohn JP Morgan Chase Kaggle Kickstarter las vegas airport lasso regression Lead Data Scienctist Lead Data Scientist leaflet league linear regression Logistic Regression machine learning Maps market matplotlib Medical Research Meet the team meetup methal health miami beach movie music Napoli NBA netflix Networking neural network Neural networks New Courses NHL nlp NYC NYC Data Science nyc data science academy NYC Open Data nyc property NYCDSA NYCDSA Alumni Online Online Bootcamp Online Training Open Data painter pandas Part-time performance phoenix pollutants Portfolio Development precision measurement prediction Prework Programming public safety PwC python Python Data Analysis python machine learning python scrapy python web scraping python webscraping Python Workshop R R Data Analysis R language R Programming R Shiny r studio R Visualization R Workshop R-bloggers random forest Ranking recommendation recommendation system regression Remote remote data science bootcamp Scrapy scrapy visualization seaborn seafood type Selenium sentiment analysis sentiment classification Shiny Shiny Dashboard Spark Special Special Summer Sports statistics streaming Student Interview Student Showcase SVM Switchup Tableau teachers team team performance TensorFlow Testimonial tf-idf Top Data Science Bootcamp Top manufacturing companies Transfers tweets twitter videos visualization wallstreet wallstreetbets web scraping Weekend Course What to expect whiskey whiskeyadvocate wildfire word cloud word2vec XGBoost yelp youtube trending ZORI

NYC Data Science Academy

NYC Data Science Academy teaches data science, trains companies and their employees to better profit from data, excels at big data project consulting, and connects trained Data Scientists to our industry.

NYC Data Science Academy is licensed by New York State Education Department.

Get detailed curriculum information about our
amazing bootcamp!

Please enter a valid email address
Sign up completed. Thank you!

Offerings

  • HOME
  • DATA SCIENCE BOOTCAMP
  • ONLINE DATA SCIENCE BOOTCAMP
  • Professional Development Courses
  • CORPORATE OFFERINGS
  • HIRING PARTNERS
  • About

  • About Us
  • Alumni
  • Blog
  • FAQ
  • Contact Us
  • Refund Policy
  • Join Us
  • SOCIAL MEDIA

    ยฉ 2025 NYC Data Science Academy
    All rights reserved. | Site Map
    Privacy Policy | Terms of Service
    Bootcamp Application