A Data Analysis on Prescription Drug Discount Coupons

Posted on Feb 21, 2021



Prescription drug spending in the United States rose from $121 billion in 2000 to almost $360 billion in 2020, based on the available data. The average American spends $1,200 per year on prescription drugs, the highest in the world. This is not the result of above-average consumption but of the high prices of prescription drugs in the United States. For those with insurance, this burden can be lightened, but for the 10.3% of Americans who do not have insurance, the out-of-pocket outlay can be a huge burden.

Prescription drug discount coupons are one way to curb out-of-pocket prescription costs. These coupons are available on numerous websites. A consumer enters the drug name and a ZIP code, and the website returns a list of coupon prices for the drug along with the name of the store offering each price. The consumer can then choose the pharmacy with the lowest price and take the coupon to that store to purchase the drug.

Pharmaceutical companies use drug discount coupons as a marketing tool. This practice came under attack by consumer advocates, who found that most of these coupons were offered for brand-name drugs and would lead consumers to buy more expensive brand-name drugs when cheaper generic alternatives were available. In recent years, however, consumer watchdogs have started recommending drug discount coupons, as drug discount websites have evolved to offer more coupon choices and put money back into the pockets of consumers.

Objective and Data set

The objective of this project was to scrape drug discount websites for the Pittsburgh area and answer four questions: (1) Which types of stores offer the best deals on drug discounts? (2) Do more popular drugs have better deals than less popular drugs? (3) Do more expensive drugs have better deals than cheaper drugs? (4) How can local pharmacies (the small guys) use drug discounts to be more competitive with larger pharmacy chains?

To perform the analysis, the following data was gathered:

  1. Data on the top 200 prescription drugs, ranked by number of prescriptions filled. This data was scraped from a website that teaches pharmacology students about the most popular drugs. Each drug had the following fields: drug name, number of prescriptions filled in 2018, and ranking.
  2. Three prescription drug discount websites were scraped for the 200 drugs above using a Pittsburgh ZIP code. The coupon-discounted price was retrieved for 154 drugs from Site 1, 102 drugs from Site 2 and 110 drugs from Site 3. Along with each price, the store offering it was also collected. Site 1 had 10 stores, Site 2 had 8 stores and Site 3 had 6 stores.
  3. Store category information was added for each store. There were five categories of stores: Local Pharmacy, Pharmacy Chain (CVS, Walgreens etc.), Grocery Chain (Kroger, Giant etc.), Department Store (Walmart, Target etc.) and Wholesale (Costco).
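
The three data sets above can be joined into one table of price rows before any analysis. The following is a minimal sketch of such a join in plain Python, so that it runs without third-party packages. The field names, drug names, store names and numbers are hypothetical placeholders, not values from the scraped data, and the original project may have structured its data differently.

```python
# Hypothetical sample records; none of these values come from the scraped data.
top_drugs = {
    "drug_a": {"rank": 1, "prescriptions": 1000},
    "drug_b": {"rank": 2, "prescriptions": 800},
}
store_category = {
    "Corner Drugs": "Local Pharmacy",
    "MegaMart": "Department Store",
}
site_prices = [
    {"site": 1, "drug": "drug_a", "store": "Corner Drugs", "price": 6.0},
    {"site": 1, "drug": "drug_a", "store": "MegaMart", "price": 4.0},
    {"site": 1, "drug": "drug_z", "store": "MegaMart", "price": 9.0},  # not in the popularity list
]


def enrich(price_rows, drugs, categories):
    """Attach popularity data and store category to each scraped price row."""
    enriched = []
    for row in price_rows:
        info = drugs.get(row["drug"])
        if info is None:
            continue  # skip drugs that are not in the popularity list
        store_type = categories.get(row["store"], "Unknown")
        enriched.append({**row, **info, "store_type": store_type})
    return enriched


rows = enrich(site_prices, top_drugs, store_category)
for r in rows:
    print(r["drug"], r["store_type"], r["price"], r["prescriptions"])
```

Each enriched row then carries everything the later steps need: the site, drug, store, store type, price and popularity.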

Data analysis

Initial view of the variables of interest

An initial view of the data is shown in the graphs below. Most drugs fell in similar ranges (in terms of price and popularity) on all three websites, which is to be expected since all three websites were given the same drug list. The concentration of data points in the lower left grid needed to be studied further.


Scatter plot for three

Drilling into the price related statistics of each site showed an anomaly in the standard deviation of Site 1.  This difference was attributed to an outlier drug.


Stat summary for Price column

Removing that drug produced the following box plot, which showed that the three sites were quite similar in terms of drug price composition.


Box plot for Price for all three web sites
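
The summary-and-removal step described above can be sketched as follows. This is only an illustration: the prices are made up, and treating the single most expensive drug as the outlier is an assumption, since the post does not say how the outlier drug was identified.

```python
from statistics import mean, median, stdev


def price_summary(prices):
    """Basic descriptive statistics for a list of prices."""
    return {
        "count": len(prices),
        "mean": mean(prices),
        "median": median(prices),
        "std": stdev(prices),  # sample standard deviation
        "max": max(prices),
    }


# Hypothetical Site 1 prices keyed by drug; drug_x plays the role of the outlier.
site1 = {"drug_a": 10.0, "drug_b": 12.0, "drug_c": 15.0, "drug_d": 9.0, "drug_x": 900.0}

before = price_summary(list(site1.values()))
outlier = max(site1, key=site1.get)  # assumption: the outlier is the single priciest drug
after = price_summary([p for d, p in site1.items() if d != outlier])

print(outlier, round(before["std"], 1), round(after["std"], 1))
```

Comparing the standard deviation before and after the removal shows how strongly a single drug can distort a site's price statistics.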

Although combining the data from the three websites would have been preferable, the drug composition differed across the three websites. Consequently, each site had to be evaluated independently.

Store composition by website

The data had stores in five different categories: (1) local pharmacies, which were smaller stores or local groceries with pharmacies; (2) pharmacy chains, which were large national drug stores; (3) grocery chains, which were large grocery stores; (4) department stores, which were large national department stores; and (5) wholesale warehouses.

A very interesting finding was that each site had a different makeup of store types.  Sites 2 and 3 seemed to have more local pharmacies than Site 1.  Site 1 had all five categories of stores.  Site 2 had four categories and Site 3 had three categories.  The store compositions were as follows:


Web site 1 store types

Web site 2 store types

Web site 3 store types

Which store had the lowest prices and which one was charging the most?

To find the store types offering the lowest prices and charging the highest prices, the minimum and maximum price for each drug were computed. These minimum and maximum values were then aggregated by store type.
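
One way to read "aggregated by store type" is to count how often each store type offered the lowest (or highest) price for a drug. The sketch below follows that reading, which is an assumption; the rows, store types and prices are hypothetical.

```python
from collections import Counter, defaultdict


def extremes_by_store_type(rows):
    """Count how often each store type has the lowest / highest price for a drug."""
    offers_by_drug = defaultdict(list)
    for r in rows:
        offers_by_drug[r["drug"]].append(r)

    lowest, highest = Counter(), Counter()
    for offers in offers_by_drug.values():
        lo = min(o["price"] for o in offers)
        hi = max(o["price"] for o in offers)
        # On ties, every store type at the extreme price is counted once.
        lowest.update({o["store_type"] for o in offers if o["price"] == lo})
        highest.update({o["store_type"] for o in offers if o["price"] == hi})
    return lowest, highest


# Hypothetical price rows for one site.
rows = [
    {"drug": "drug_a", "store_type": "Grocery Chain", "price": 4.0},
    {"drug": "drug_a", "store_type": "Pharmacy Chain", "price": 12.0},
    {"drug": "drug_a", "store_type": "Local Pharmacy", "price": 6.0},
    {"drug": "drug_b", "store_type": "Grocery Chain", "price": 9.0},
    {"drug": "drug_b", "store_type": "Pharmacy Chain", "price": 20.0},
    {"drug": "drug_b", "store_type": "Department Store", "price": 8.0},
]

lowest, highest = extremes_by_store_type(rows)
print(lowest.most_common())
print(highest.most_common())
```

Because each site listed a different mix of store types, these counts are only comparable within a site, not across sites.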

In Site 1, grocery chains had the lowest prices, followed by department stores. This is mainly because grocery chains and department stores tend to offer low prescription drug prices to draw customers into the store. Local pharmacies were not as cutthroat when it came to drug prices. In Site 2 and Site 3, local pharmacies offered the lowest prices, but they were only beating pharmacy chains and department stores. It is possible that local pharmacies looked like leaders on these sites because there were no grocery chains and few department stores.


Web site 1 store types with the lowest price

Web site 2 store types with the lowest price

Web site 3 store types with the lowest price

All three sites showed that pharmacy chains were charging the most for prescription drugs, followed by department stores. Local pharmacies were competitive with pharmacy chains, department stores and wholesale stores.


Web site 1 store types with the highest price

Web site 2 store types with the highest price

Web site 3 store types with the highest price

The takeaway for local pharmacies is that they need to study drug prices of grocery chains and department stores to determine how they can be competitive.

Data on the Competitiveness in Each Store

In addition to the lowest and highest prices, the analysis also looked at which stores offered the most competitive prices. This was calculated by subtracting the lowest price for each drug from every price offered for that drug. The resulting value was labelled the “Price Difference.” Stores that were competitive had a low “Price Difference.”
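
The “Price Difference” calculation can be sketched as below. The per-drug subtraction follows the description above; summarizing it as a mean per store type is an assumption made for illustration, and the rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def add_price_difference(rows):
    """Add price minus the lowest price offered for the same drug."""
    lowest = {}
    for r in rows:
        lowest[r["drug"]] = min(r["price"], lowest.get(r["drug"], r["price"]))
    return [{**r, "price_difference": r["price"] - lowest[r["drug"]]} for r in rows]


def mean_difference_by_store_type(rows):
    """Average Price Difference per store type (lower means more competitive)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["store_type"]].append(r["price_difference"])
    return {store_type: mean(values) for store_type, values in groups.items()}


# Hypothetical price rows for one site.
rows = [
    {"drug": "drug_a", "store_type": "Grocery Chain", "price": 4.0},
    {"drug": "drug_a", "store_type": "Pharmacy Chain", "price": 12.0},
    {"drug": "drug_a", "store_type": "Local Pharmacy", "price": 6.0},
    {"drug": "drug_b", "store_type": "Grocery Chain", "price": 9.0},
    {"drug": "drug_b", "store_type": "Pharmacy Chain", "price": 20.0},
    {"drug": "drug_b", "store_type": "Department Store", "price": 8.0},
]

with_diff = add_price_difference(rows)
summary = mean_difference_by_store_type(with_diff)
for store_type, value in sorted(summary.items(), key=lambda kv: kv[1]):
    print(store_type, value)
```

A store type with a Price Difference of zero on a drug is the price leader for that drug, so the measure rewards being close to the best price, not just being the cheapest.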

Site 1 showed that even though local pharmacies were not offering the lowest price, they ranked behind grocery chains but ahead of department stores. Site 2 and Site 3 showed that local pharmacies were very competitive on price.


Web site 1 - price competitiveness

Web site 2 - price competitiveness

Web site 3 - price competitiveness

The takeaway from the above graphs is that local pharmacies need not offer the lowest price for their drugs, but they must gain ground on grocery chains by being even more competitive.

Data on Drug popularity and competitiveness

Using the same competitiveness measure, “Price Difference,” the analysis compared the ten most popular drugs with the ten least popular drugs. Drug popularity was based on the number of prescriptions filled for each drug.
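
Selecting the most and least popular drugs can be sketched as follows. Restricting the ranking to drugs that actually returned a price on the site is an assumption, as are all names and numbers; the example uses n=2 only to keep the sample data small.

```python
def popularity_slices(prescriptions, available, n=10):
    """Return (most popular, least popular) lists of n drug names.

    prescriptions: drug name -> number of prescriptions filled
    available: drug names that have at least one price on the site
    """
    ranked = sorted(
        (drug for drug in prescriptions if drug in available),
        key=prescriptions.get,
        reverse=True,
    )
    return ranked[:n], ranked[-n:]


# Hypothetical prescription counts and the drugs found on one site.
prescriptions = {
    "drug_a": 1000, "drug_b": 800, "drug_c": 600,
    "drug_d": 400, "drug_e": 200, "drug_f": 100,
}
available = {"drug_a", "drug_b", "drug_d", "drug_e", "drug_f"}

top, bottom = popularity_slices(prescriptions, available, n=2)
print(top, bottom)
```

If a site has fewer than 2n priced drugs, the two slices overlap, so that case would need to be handled before comparing them.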

Since Site 3 was already dominated by local pharmacies, this analysis focused on Sites 1 and 2. In Site 1, local pharmacies and grocery chains were neck and neck for the ten most popular drugs, while grocery chains were ahead for the ten least popular drugs. In Site 2, local pharmacies were ahead of grocery chains for the ten most popular drugs. So, when it came to popular drugs, local pharmacies were very competitive, even against the grocery chains. This suggests that local pharmacies need to study the mid-range drugs to be more competitive with grocery chains.


Web site 1 competitiveness among top 10 popular drugs

Web site 2 competitiveness among top 10 popular drugs

Web site 1 competitiveness among 10 least popular drugs

Web site 2 competitiveness among 10 least popular drugs

Drug price and competitiveness

The final part of the analysis studied the “Price Difference” variable for the ten most expensive drugs and the ten cheapest drugs. In Site 1, there was a huge variation in the distribution for the ten most expensive drugs, which suggests there are many opportunities to be competitive in this area. Further study is required to determine what kinds of coupons are offered in these categories and whether local pharmacies can afford to be competitive in this area. Among the least expensive drugs, local pharmacies were very competitive.

In Site 2, local pharmacies were not competitive in the most expensive category but were competitive in the least expensive category. The most expensive category is probably an area local pharmacies can ignore.
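
The price-based slices and the variation within them can be sketched as below. Ranking drugs by their lowest coupon price and measuring variation as the gap between the highest and lowest price per drug are both assumptions, since the post does not state which price was used for ranking; the rows are hypothetical and n=1 keeps the sample small.

```python
def price_slices(rows, n=10):
    """Return (most expensive, least expensive) drug lists, ranked by lowest price."""
    lowest = {}
    for r in rows:
        lowest[r["drug"]] = min(r["price"], lowest.get(r["drug"], r["price"]))
    ranked = sorted(lowest, key=lowest.get, reverse=True)
    return ranked[:n], ranked[-n:]


def price_spread(rows, drugs):
    """Gap between the highest and lowest price for each selected drug."""
    prices = {drug: [] for drug in drugs}
    for r in rows:
        if r["drug"] in prices:
            prices[r["drug"]].append(r["price"])
    return {drug: max(p) - min(p) for drug, p in prices.items() if p}


# Hypothetical price rows for one site (two stores per drug).
rows = [
    {"drug": "drug_a", "price": 4.0}, {"drug": "drug_a", "price": 12.0},
    {"drug": "drug_b", "price": 90.0}, {"drug": "drug_b", "price": 150.0},
    {"drug": "drug_c", "price": 20.0}, {"drug": "drug_c", "price": 25.0},
    {"drug": "drug_d", "price": 300.0}, {"drug": "drug_d", "price": 700.0},
]

expensive, cheap = price_slices(rows, n=1)
print(price_spread(rows, expensive), price_spread(rows, cheap))
```

A large spread among expensive drugs is what leaves room for a store to undercut competitors, which is the opportunity discussed above.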


Web site 1 competitiveness among ten most expensive drugs

Web site 2 competitiveness among ten most expensive drugs

Web site 1 competitiveness among ten least expensive drugs

Web site 2 competitiveness among ten least expensive drugs


Based on the above analysis, the following recommendations can be made to local pharmacies when it comes to offering drug discount coupons:

  1. Local pharmacies need a larger presence on more drug discount websites so they are more visible to price-conscious customers.
  2. Pharmacies need to study the pricing of grocery chains and determine whether they can cut into the chains' margins, especially for drugs in the mid-price category.
  3. Pharmacies can expand their competitiveness in the popular drug category.

Future work

The following areas need to be studied further:

  1. Analysis based on median or mean
  2. Get more data on drugs and add analysis on those features: drug types (i.e. generic vs. brand) and dosages
  3. Normalize data across sites based on exactly the same drug types and perform analysis across sites
  4. Broaden the study to other geographical areas

About Author

Chitra Sharathchandra

Chitra Sharathchandra is a software engineer who is passionate about technology. Her current focus is on data science and data engineering. Chitra enjoys teaching South Indian classical music.