Data Analysis of Iowa Liquor Sales in College Towns
College binge drinking culture is a notable part of the college experience in America. Data shows that college students have a higher prevalence of occasions of heavy drinking, and are also more frequently intoxicated than their non-college counterparts.
To determine what kind of impact college students have on the liquor market, I selected the top three college towns in Iowa and compared liquor sales in these towns to liquor sales in non-college towns. Using Python, I analyzed the volume, product, and price distributions, as well as the seasonality of products and prices, of liquor store orders from 2018 to the present. The insights from this analysis can serve as a guide for liquor store owners in college towns when deciding what products to stock and when to stock them.
The Iowa Liquor Sales dataset, obtained from the Alcoholic Beverages Division of the Iowa Department of Commerce, contains purchasing information for liquor products bought by Iowa Class "E" liquor licensees. The dataset contained 23.3 million observations, each of which is a transaction between a liquor store and its liquor vendor, dated from January 1, 2012 to the present. Since the dataset was obtained in March 2022, it contained data through February 28, 2022.
However, to cut down on size and use only relevant data, I filtered out all transactions from before 2018, leaving around ten million observations. Liquor trends from five to ten years ago most likely provide little real insight into present-day trends, especially considering the vast changes in alcohol products in recent years, such as the popularization of hard seltzers.
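The date filter described above can be sketched in a few lines. This is only an illustration using the Python standard library; the row layout, the field names (`invoice`, `date`, `bottles_sold`), and the sample values are assumptions, not the dataset's actual schema.

```python
from datetime import date

# Hypothetical sample rows; the real dataset has 24 variables per transaction.
transactions = [
    {"invoice": "A1", "date": date(2012, 1, 3), "bottles_sold": 12},
    {"invoice": "A2", "date": date(2017, 12, 31), "bottles_sold": 6},
    {"invoice": "A3", "date": date(2018, 1, 1), "bottles_sold": 24},
    {"invoice": "A4", "date": date(2022, 2, 28), "bottles_sold": 3},
]

CUTOFF = date(2018, 1, 1)  # keep transactions dated on or after this day


def filter_recent(rows, cutoff=CUTOFF):
    """Return only the rows whose date is on or after the cutoff."""
    return [row for row in rows if row["date"] >= cutoff]


recent = filter_recent(transactions)
print([row["invoice"] for row in recent])
```

On a dataset of this size, the same condition would more likely be applied while reading the file or inside a database query rather than on an in-memory list.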
For each observation in the dataset, there were 24 variables that detailed information about the transaction, such as store and vendor information, bottle size, bottle price, sales date, etc. Since the dataset was so massive, I created a relational database to better organize the data and to make it easier to analyze. The database contained seven tables: transactions, counties, vendors, products, prices, stores, and liquor categories.
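To make the relational layout more concrete, here is a reduced sketch using Python's built-in `sqlite3` module. It covers only four of the seven tables, and every column name, the `is_college_town` flag, and the sample rows are assumptions made for illustration; the article does not describe the actual schema or database engine.

```python
import sqlite3

# In-memory database with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (
    store_id INTEGER PRIMARY KEY,
    city TEXT NOT NULL,
    is_college_town INTEGER NOT NULL
);
CREATE TABLE categories (
    category_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES categories(category_id)
);
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    store_id INTEGER NOT NULL REFERENCES stores(store_id),
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    sale_date TEXT NOT NULL,
    bottles_sold INTEGER NOT NULL
);
""")

# Made-up rows, just enough to exercise the joins.
conn.executemany("INSERT INTO stores VALUES (?, ?, ?)",
                 [(1, "Town A", 1), (2, "Town B", 0)])
conn.executemany("INSERT INTO categories VALUES (?, ?)",
                 [(1, "Vodka"), (2, "Whiskey")])
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(10, "Example Vodka 750 ml", 1), (11, "Example Whiskey 1000 ml", 2)])
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 10, "2018-03-01", 12),
    (2, 1, 11, "2018-03-02", 6),
    (3, 2, 10, "2018-03-01", 24),
    (4, 2, 11, "2018-03-05", 36),
    (5, 1, 10, "2019-01-10", 8),
])

# Total bottles per town type and liquor category.
rows = conn.execute("""
    SELECT s.is_college_town, c.name, SUM(t.bottles_sold)
    FROM transactions AS t
    JOIN stores AS s ON s.store_id = t.store_id
    JOIN products AS p ON p.product_id = t.product_id
    JOIN categories AS c ON c.category_id = p.category_id
    GROUP BY s.is_college_town, c.name
    ORDER BY s.is_college_town, c.name
""").fetchall()
print(rows)
```

Splitting store, product, and category details into their own tables keeps the large transactions table narrow, which is one plausible reason a relational layout helps with a dataset of this size.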
Volume Distribution Data
First, I sought to determine if there is a significant difference in liquor bottle volume in college town vs non-college town liquor stores. The boxplot above creates a useful visualization for comparing the volume distributions across different liquor categories and the two types of towns, but it does not tell the whole story. The different bottle volumes are in discrete categories (500 ml, 750 ml, 1000 ml, etc.), essentially splitting up the data into groups. So even if the boxplots between the two towns look the same, that does not mean that the distributions are actually the same.
To determine whether the two types of towns differ significantly, I conducted a t-test for each of the ten liquor categories listed in the boxplot above. Every t-test rejected the null hypothesis of equal mean bottle volumes, meaning that the mean bottle volumes differ significantly between college and non-college towns across all ten liquor categories.
After further investigation, I found that mean bottle volumes were higher in non-college towns in every liquor category except Unknown (to clarify, the Unknown category consisted of various unspecified liquors, as well as many seasonal drinks). Although there is no clear reason why this is the case, it could be due to older adults occasionally buying larger bottles of liquor and going through them slowly over time, whereas college students might tend to buy smaller bottles of liquor more frequently for parties and similar events.
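A per-category comparison of this kind could look roughly like the sketch below. The article does not say which t-test variant or library was used, so this assumes Welch's unequal-variance form and approximates the p-value with a normal distribution, which is only reasonable for large samples such as the ones in this dataset. The bottle volumes are invented.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(sample_a, sample_b):
    """Return (t statistic, approximate two-sided p-value) for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Standard error of the difference in means, without assuming equal variances.
    se = sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / se
    # Normal approximation to the t distribution (assumption: large samples).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# Hypothetical bottle volumes in ml, keyed by liquor category and town type.
volumes = {
    "Vodka": {
        "college": [375, 750, 750, 750, 1000, 750],
        "non_college": [750, 1000, 1750, 1000, 1750, 750],
    },
}

for category, groups in volumes.items():
    t_stat, p_value = welch_t_test(groups["college"], groups["non_college"])
    direction = "lower" if t_stat < 0 else "higher"
    print(f"{category}: college mean is {direction}, t={t_stat:.2f}, p~{p_value:.4f}")
```

A negative t statistic here means the college-town mean is below the non-college mean. With millions of observations per group, even very small differences in means will be flagged as significant, so the size of the difference matters as much as the p-value.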
Product Distribution Data
Next I sought to determine if there is a difference in the types of liquor that are stocked in college town liquor stores vs non-college town liquor stores. The boxplot above depicts the spread of the percentage of liquor store orders that each individual type of liquor takes up in both kinds of towns. A simple glance at the plot reveals that some liquor categories appear to have the same or similar trends in both types of towns, but some also appear different.
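The per-store percentages behind a plot like this can be computed as sketched below. This assumes that each order line counts once toward a store's total; the article does not say whether shares were measured by order count or by bottles, and the store names and orders here are made up.

```python
from collections import Counter, defaultdict

# Hypothetical (store, liquor category) pairs, one per order line.
orders = [
    ("store_1", "Vodka"), ("store_1", "Vodka"), ("store_1", "Rum"), ("store_1", "Gin"),
    ("store_2", "Whiskey"), ("store_2", "Whiskey"), ("store_2", "Whiskey"), ("store_2", "Vodka"),
]


def category_shares(order_rows):
    """Return {store: {category: percent of that store's orders}}."""
    counts = defaultdict(Counter)
    for store, category in order_rows:
        counts[store][category] += 1
    shares = {}
    for store, counter in counts.items():
        total = sum(counter.values())
        shares[store] = {cat: 100 * n / total for cat, n in counter.items()}
    return shares


shares = category_shares(orders)
for store, by_category in shares.items():
    print(store, by_category)
```

Each store contributes one percentage per category, and the spread of those percentages within each town type is what a boxplot would summarize.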
I again conducted t-tests for each liquor category to determine whether these differences were significant. The tests failed to reject the null hypothesis in some categories and rejected it in others, meaning that certain liquors see the same buying patterns in college and non-college towns, whereas others differ significantly. Of those that were different, brandy, gin, tequila, and vodka were all more popular in college towns, and rum and whiskey were more popular in non-college towns. Cocktails, liqueurs, mezcal, and unknown liquors did not see any significant differences.
Price Distribution Data
To analyze the price distribution, I split the data up into two groups: liquor bottles under $100, and liquor bottles over $1,000. The first group contained the vast majority of the data, as most people do not buy luxury spirits, and thus showcased the buying patterns of the bulk of the population. The second group showcases the most expensive spirits stocked in liquor stores, and gives us a deeper look at buying patterns for expensive liquors.
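As a small illustration of the split, the sketch below partitions a list of bottle prices using strict inequalities, so bottles priced between $100 and $1,000 inclusive fall into neither group. The exact boundary handling is an assumption, and the prices are invented.

```python
def split_by_price(prices, low_cap=100.0, high_floor=1000.0):
    """Split prices into (under low_cap, over high_floor); the middle is dropped."""
    under = [p for p in prices if p < low_cap]
    over = [p for p in prices if p > high_floor]
    return under, over


# Hypothetical bottle prices in dollars.
bottle_prices = [12.99, 99.99, 100.0, 450.0, 1000.0, 1500.0]
under_100, over_1000 = split_by_price(bottle_prices)
print(under_100, over_1000)
```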
Liquor under $100
As previously stated, the vast majority of the liquor orders were for bottles priced under $100. The plot above shows the price densities for both college and non-college towns, and we can see that the two densities are almost identical. The same patterns are occurring for both types of towns, with peaks and valleys happening at the same price values.
The only noticeable difference is in the sharpness of the peaks: non-college towns see higher peaks that come to a sharper point, whereas the peaks for college towns are lower and generally more rounded. This difference could be due to more variety in college towns; if college students buy a slightly larger variety of products, then the price data won't be as concentrated around specific price points.
Liquor over $1,000
The price range represented in the violin plot above, liquor bottles over $1,000, represents the most expensive bottles of liquor ordered. The gap between this group and the previous group, liquor bottles under $100, was almost entirely empty for college towns.
Overall, buying expensive liquor was much less common in college towns, and there is also much less variance in the prices of expensive liquor being purchased in college towns. We see this in the graph above, where the college towns violin plot is much more condensed, while the non-college towns violin plot is more spread out and sees two separate peaks, as opposed to one in college towns.
In general, most products typically do not sell at the same rate throughout the year: water guns and ice cream sell more in the summer, ice scrapers and hot chocolate sell more in the winter, and so on. Liquor products are no different. To analyze the seasonal patterns in Iowa liquor stores, I compared both product and price seasonality in college and non-college towns.
The line graphs above depict the bottles of each type of liquor sold per month from January 2018 through February 2022 in college and non-college towns. It is important to note that the y-axis in the non-college towns graph is an order of magnitude larger than in the college towns graph. Both types of towns are seeing the same major annual peak: December (on both graphs, the blue vertical lines are dated December 31, representing the sum of all sales throughout the month of December). However, this does not mean that there aren't other seasonal trends for certain kinds of liquor.
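The monthly totals behind line graphs like these can be built with a simple group-and-sum, sketched below with the standard library. The tuple layout, the town-type labels, and the bottle counts are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (sale date, town type, liquor category, bottles sold).
sales = [
    (date(2018, 12, 3), "college", "Vodka", 120),
    (date(2018, 12, 20), "college", "Vodka", 80),
    (date(2018, 11, 15), "college", "Vodka", 90),
    (date(2018, 12, 5), "non_college", "Vodka", 1500),
    (date(2018, 11, 9), "non_college", "Vodka", 1100),
]


def monthly_bottles(rows):
    """Sum bottles sold per (town type, category, year, month)."""
    totals = defaultdict(int)
    for sale_date, town_type, category, bottles in rows:
        totals[(town_type, category, sale_date.year, sale_date.month)] += bottles
    return dict(totals)


def peak_month(totals, town_type, category):
    """Return the (year, month) with the most bottles for one series."""
    series = {k: v for k, v in totals.items() if k[0] == town_type and k[1] == category}
    best = max(series, key=series.get)
    return best[2], best[3]


totals = monthly_bottles(sales)
print(peak_month(totals, "college", "Vodka"))
print(peak_month(totals, "non_college", "Vodka"))
```

Because the two town types differ by roughly an order of magnitude in volume, the series are easier to compare on separate axes or after normalizing each one by its own total.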
Next, we will examine three particular liquors that saw different annual trends and analyze their meaning.
One Peak: Liqueurs
Liqueurs have one major peak every year: a holiday spike in December, which coincides with the spike in the previous graph of all of the liquor products. However, if we look more closely at the line graph for college towns, we see that there is a smaller spike in the Fall every year that precedes the December spike. This pattern is not occurring in non-college towns. The difference in these patterns is most likely because of Halloween, which is much more of a drinking holiday in college towns than it is in non-college towns.
Two Peaks: Vodka
For vodka, both college and non-college towns see the same holiday spike in December that we've consistently seen for other liquors. In addition, there is a Fall spike in either September or October, depending on the year. However, the behavior of this spike differs depending on the type of town. In college towns, the Fall spike is typically larger than the December spike, but in non-college towns, it is usually smaller or nonexistent. This most likely has the same cause as the pattern we saw for liqueurs: college towns partying more for Halloween.
Covid Peak: Cocktails
Cocktails saw a significant rise in liquor store orders in April and May of 2020, which marked the beginning phase of the Covid-19 pandemic. This was most likely caused by the lockdowns that were in place across the country; if you're at home by yourself, maybe attending a Zoom happy hour, it's much easier to crack open a ready-made cocktail than it is to prepare a drink on your own. Even after the peak came back down, orders for cocktails were still notably higher than they were beforehand. Aside from the Covid spike, there are two major annual peaks: a Spring/Summer peak, and a holiday peak in December.
Lastly, we will analyze the average price of bottles that liquor stores ordered over time in order to study price seasonality. The green vertical line marks March 31, 2020. Since the data is grouped by month, March 2020 represents the last month that contained any pre-Covid data. Before the pandemic began, both college and non-college towns followed the same patterns and shape, alternating in which one averaged slightly higher than the other. After the pandemic began, college towns broke away and averaged consistently higher bottle prices than non-college towns.
One possible explanation is that when the lockdowns began, college students all went home. If they were buying cheaper alcohol than the rest of the town residents, then the average prices of liquor being purchased would go up once they left.
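A rough sketch of this monthly comparison is shown below. It assumes a plain unweighted mean over order lines (the article does not say whether prices were weighted by bottles ordered) and treats March 2020 as the last pre-Covid month, as stated above. All prices are made up.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

COVID_CUTOFF = (2020, 3)  # last month treated as pre-Covid, per the text

# Hypothetical rows: (order date, town type, bottle price in dollars).
price_rows = [
    (date(2020, 2, 10), "college", 14.0),
    (date(2020, 2, 12), "college", 16.0),
    (date(2020, 2, 11), "non_college", 15.0),
    (date(2020, 5, 4), "college", 18.0),
    (date(2020, 5, 6), "college", 20.0),
    (date(2020, 5, 5), "non_college", 16.0),
]


def monthly_mean_price(rows):
    """Average bottle price per (year, month, town type)."""
    grouped = defaultdict(list)
    for order_date, town_type, price in rows:
        grouped[(order_date.year, order_date.month, town_type)].append(price)
    return {key: mean(values) for key, values in grouped.items()}


def price_gap(means):
    """College minus non-college mean price for months present in both groups."""
    gaps = {}
    for (year, month, town_type), value in means.items():
        if town_type == "college" and (year, month, "non_college") in means:
            gaps[(year, month)] = value - means[(year, month, "non_college")]
    return gaps


gaps = price_gap(monthly_mean_price(price_rows))
for (year, month), gap in sorted(gaps.items()):
    period = "pre-Covid" if (year, month) <= COVID_CUTOFF else "post-Covid"
    print(year, month, period, gap)
```

Tracking the gap directly, rather than the two series separately, makes it easier to see when college towns start averaging consistently higher prices.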
We also see that in the Fall of 2021, the gap between college and non-college towns narrows, which could have been caused by college students moving back in and thus driving the average prices down a little bit. Another important pattern to notice is that there seems to be a consistent spike in average prices in December, most likely due to customers buying more expensive liquor as gifts or for holiday parties.
After performing the analysis above, my key findings were as follows:
- Volume Distribution:
  - Mean bottle volumes differ significantly between college and non-college towns in every liquor category
  - Non-college towns tend to buy larger bottles of liquor
- Product Distribution:
  - Certain liquors are more popular depending on the type of town
  - Brandy, gin, tequila, and vodka are more popular in college towns
- Price Distribution:
  - Liquor under $100: college towns and non-college towns follow the same patterns
  - Liquor over $1,000: less common and less spread out in college towns
- Seasonality:
  - Main peak: December
  - Some liquors see other seasonal peaks
  - Covid-19 has had an impact on the price trends and sales in certain liquor categories