Prescription Drug Pricing, A Case Study in Market Failure

Posted on Jul 29, 2019

Project GitHub | LinkedIn:   Niki   Moritz   Hao-Wei   Matthew   Oren

Living in the only developed nation without universal health coverage, Americans are at the mercy of pharmaceutical companies, insurers, and pharmacy benefit managers when it comes to the cost of prescription drugs. As a result, we pay substantially more for our medication than people in other countries:

Drug prices by country

The out-of-pocket cost of a prescription drug largely depends on the patient’s insurance. Data on insurance formularies is spread across numerous disparate sources. Even for patients paying out of pocket, US pharmacies do not publish their prices online. The resulting confusion, plus the inelastic demand for medicine, creates a highly dysfunctional market environment, with consumers footing the bill. This is a textbook example of market failure, wherein the market’s natural price-setting mechanisms become wildly distorted and inefficient, leading to a net social welfare loss.

Current Drug Pricing Mechanisms

The current pricing model involves three major players: pharmaceutical companies, insurers, and pharmacy benefit managers. With so many intermediaries taking a piece of the pie, prices have become inflated. The actual workings of this system are too complex to summarize here, but if you want to know more, Hasan Minhaj gives an informative (and hilarious) take on the situation.

Take the example of Lipitor: bought directly from the manufacturer, a month’s supply costs around $3. Because pharmaceutical companies know that insurance companies (by way of pharmacy benefit managers) negotiate heavy discounts, they price their products exorbitantly in anticipation of these negotiations. In the case of a widely prescribed drug like Lipitor (sold generically as atorvastatin), most insured patients will end up paying a small co-pay. But patients without insurance are left to foot the bill. As a result, a $3 bottle of Lipitor can cost an uninsured or underinsured patient as much as $130:

Prescription drug pricing scheme

In recent years, prescription savings clubs such as GoodRx, WellRx, and FamilyWize have emerged to help combat this problem. They negotiate directly with pharmacies and provide search aggregation tools for consumers to compare prices across pharmacies in their area. However, in spite of these tools, the market is still comically inefficient. A quick search on WellRx shows that same $3 bottle of Lipitor selling for between $9 and $57 in New York as of this writing. In Dallas, it could cost as little as $5.50.

That got me thinking — if prices vary so much within a city and between cities, are there any other patterns that could help consumers make more educated decisions about where they buy their prescriptions?

The Data

Given the utter lack of transparency in this industry, it is extremely difficult to find large, authoritative datasets on retail drug pricing. However, aggregators provide a useful, albeit imperfect, view of the state of the market. To get a clearer picture, I built a scraping tool to perform automated searches on WellRx and log the prices of the 30 most prescribed drugs across the 30 largest US cities, yielding approximately 12,000 data points after outliers were removed.
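
One common way to remove outliers from price data is Tukey's interquartile-range (IQR) rule. The sketch below assumes that rule and uses made-up prices for a single drug; it is not necessarily the filter applied to the scraped data. The function name `remove_outliers` and the multiplier `k=1.5` are illustrative choices.

```python
from statistics import quantiles


def remove_outliers(prices, k=1.5):
    """Drop prices outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    if len(prices) < 4:
        return list(prices)  # too few points to estimate quartiles
    q1, _median, q3 = quantiles(prices, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [p for p in prices if low <= p <= high]


# Made-up prices for one drug, including one implausible listing.
sample = [9.0, 11.5, 12.0, 14.0, 15.5, 18.0, 21.0, 57.0, 480.0]
cleaned = remove_outliers(sample)
```

In practice a filter like this would be applied per drug, since a price that is extreme for one medication may be ordinary for another.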

A histogram grid gives a sense of the price variation among different drugs. I added red dashed lines to indicate the price paid by Medicaid, based on National Average Drug Acquisition Cost (NADAC) data. It doesn’t take a data scientist to note that the red lines tend toward the left of their respective charts, sometimes dramatically so.

It’s a poignant indication of how the government uses its bargaining power to bring down prices. In all developed countries, with the exception of the US, citizens expect their governments to apply this leverage to keep prices low for everyone. Here, only a select few — limited to low-income people and those over 65 — benefit from that leverage. The rest of us, meanwhile, are left to fend for ourselves. The data reveals this hasn't been working out in our favor.

Location, Location, Location

To better visualize geographic price variation, I constructed a heatmap of z-scores for each city-drug pair relative to the average price of each drug across all cities, with higher (red) z-scores indicating higher-than-average prices:

Most Expensive            Least Expensive
1. San Francisco, CA      1. Los Angeles, CA
2. Chicago, IL            2. Riverside, CA
3. St. Louis, MO          3. Minneapolis, MN
4. Portland, OR           4. Houston, TX
5. Austin, TX             5. Detroit, MI

Interestingly, the price variations do not appear to correlate with differences in cost of living: Los Angeles, a high cost-of-living city, tops the least expensive list. Nor do they necessarily reflect the effectiveness of state policy, considering that the most expensive city, San Francisco, is in the same state.
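
The z-score calculation behind the heatmap and the city rankings can be sketched roughly as follows. This is a minimal illustration using only the Python standard library and made-up prices. The function names (`city_drug_zscores`, `rank_cities`), the use of the population standard deviation, and the ranking of cities by their mean z-score across drugs are all assumptions rather than the exact method used for the analysis.

```python
from collections import defaultdict
from statistics import mean, pstdev


def city_drug_zscores(records):
    """records: iterable of (city, drug, price) tuples.

    Returns {(city, drug): z}, where z compares a city's mean price for a
    drug with the distribution of city means for that same drug.
    """
    # Step 1: average price per (city, drug) pair.
    by_pair = defaultdict(list)
    for city, drug, price in records:
        by_pair[(city, drug)].append(price)
    pair_mean = {pair: mean(prices) for pair, prices in by_pair.items()}

    # Step 2: collect the city means for each drug.
    by_drug = defaultdict(list)
    for (_city, drug), avg in pair_mean.items():
        by_drug[drug].append(avg)

    # Step 3: standardize each city mean against its drug's mean and spread.
    zscores = {}
    for (city, drug), avg in pair_mean.items():
        mu = mean(by_drug[drug])
        sigma = pstdev(by_drug[drug])
        zscores[(city, drug)] = (avg - mu) / sigma if sigma > 0 else 0.0
    return zscores


def rank_cities(zscores):
    """Average each city's z-scores across drugs; most expensive first."""
    by_city = defaultdict(list)
    for (city, _drug), z in zscores.items():
        by_city[city].append(z)
    return sorted(((mean(zs), city) for city, zs in by_city.items()), reverse=True)


# Made-up prices for illustration only; not taken from the scraped dataset.
records = [
    ("City A", "drug_x", 30.0), ("City A", "drug_x", 34.0),
    ("City B", "drug_x", 20.0), ("City B", "drug_x", 22.0),
    ("City C", "drug_x", 10.0),
    ("City A", "drug_y", 8.0),
    ("City B", "drug_y", 6.0),
    ("City C", "drug_y", 4.0),
]
z = city_drug_zscores(records)
ranking = rank_cities(z)
```

Standardizing per drug puts a $5 generic and a $200 brand-name drug on the same scale, so a city's scores can be averaged across drugs without the expensive drugs dominating.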

Turning my attention to pharmacies, I calculated average z-scores for 20 large pharmacy franchises, with lower z-scores indicating lower prices:

Average z-score by pharmacy franchise

Given the distorted state of the market, this data does not give a definitive answer as to how much people are paying for their prescription drugs. However, the emergence of prescription price aggregators such as WellRx is making the market more transparent. The transparency they provide creates a quasi-free market among their users as pharmacies compete to appear at the top of the search results.

The data are also influenced by external factors, such as WellRx’s ability to negotiate in a given market or the arrival of a major competitor such as Walmart. Furthermore, the z-score metric is based on the average price, which fails to take user behavior into account. After all, most consumers shop around, which is why WellRx exists in the first place. That said, using the minimum price as a metric carries its share of caveats as well, as the cheapest option may not be available to all residents of a particular city.
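
To see how much the choice of metric can matter, here is a small sketch comparing the average listed price with the minimum listed price for a single drug in two cities. The listings are made up for illustration, and the helper `summarize_by_city` is a hypothetical name, not part of the original analysis.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_city(listings, how):
    """listings: iterable of (city, pharmacy, price) tuples for a single drug.

    how: a function such as statistics.mean or min, applied to each
    city's list of prices.
    """
    by_city = defaultdict(list)
    for city, _pharmacy, price in listings:
        by_city[city].append(price)
    return {city: how(prices) for city, prices in by_city.items()}


# Made-up listings for one drug, for illustration only.
listings = [
    ("City A", "Pharmacy 1", 9.0),
    ("City A", "Pharmacy 2", 57.0),
    ("City A", "Pharmacy 3", 54.0),
    ("City B", "Pharmacy 1", 25.0),
    ("City B", "Pharmacy 2", 27.0),
    ("City B", "Pharmacy 3", 29.0),
]
avg_price = summarize_by_city(listings, mean)
min_price = summarize_by_city(listings, min)

# The "cheaper" city depends on which metric is used.
cheaper_by_mean = min(avg_price, key=avg_price.get)
cheaper_by_min = min(min_price, key=min_price.get)
```

With these numbers, City B looks cheaper on average, while City A has the single cheapest listing. A ranking built on averages and one built on minimums can therefore disagree, which is worth keeping in mind when reading the city and pharmacy comparisons.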

Next Steps

This analysis serves as a starting point for further research on drug pricing. It would be interesting to overlay more data, such as demographic data and cost-of-living indices, to identify further trends. With more granular data, geocoding would facilitate analysis of higher-level regional trends as well as neighborhood-level micro-trends that could highlight underserved communities.
