Data Visualization of CVS's Next Location

Posted on Aug 3, 2020
The skills I demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

For this project, CVS location and US population data were analyzed to suggest where CVS should consider opening new stores.


CVS Pharmacy is the largest pharmacy chain in the United States, with almost 10,000 locations as of August 2020. Last year they reported $256B in revenue and were listed as #5 on the Fortune 500 list.

According to CVS's SEC filing, their major sources of income are pharmacy and in-store sales. In addition, the majority of pharmacy sales are made within the store network, as opposed to online or by mail order. Therefore, the vast majority of revenue for CVS comes from people physically walking into the store.

If CVS wants to continue growing, the easiest way seems to be opening new stores. However, opening new stores is challenging because CVS already operates in so many locations. Therefore, CVS should target areas with higher demand when opening stores.

Data Collection

In this project, the location and count of every CVS store in the United States were scraped from the CVS website.¹ Store location data was collected and processed with the Python BeautifulSoup package.
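
The post does not include its scraping code, so the following is only a sketch of the counting step under stated assumptions. The markup (a `store` class with hypothetical `data-city` and `data-state` attributes) is invented for illustration, since the real CVS store locator pages are structured differently, and the sketch swaps in the standard library's `html.parser` for BeautifulSoup so it runs without extra packages.

```python
from collections import Counter
from html.parser import HTMLParser


class StoreParser(HTMLParser):
    """Collect (city, state) pairs from HYPOTHETICAL store-listing markup.

    Assumes each store is a tag with class="store" plus data-city and
    data-state attributes; the real CVS locator pages differ.
    """

    def __init__(self):
        super().__init__()
        self.stores = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        classes = (attrs.get("class") or "").split()
        if "store" in classes:
            self.stores.append((attrs.get("data-city"), attrs.get("data-state")))


def count_stores(html):
    """Return store counts per state and per (city, state) pair."""
    parser = StoreParser()
    parser.feed(html)
    parser.close()
    by_state = Counter(state for _, state in parser.stores)
    by_city = Counter(parser.stores)
    return by_state, by_city


# Invented sample page with placeholder city names, for illustration only
sample_html = """
<ul>
  <li class="store" data-city="City A" data-state="WA">Store 1</li>
  <li class="store" data-city="City A" data-state="WA">Store 2</li>
  <li class="store" data-city="City B" data-state="WA">Store 3</li>
  <li class="store" data-city="City C" data-state="CO">Store 4</li>
</ul>
"""

by_state, by_city = count_stores(sample_html)
print(by_state)
print(by_city)
```

With BeautifulSoup the same idea would be a loop over `soup.find_all(class_="store")`; either way, the per-state and per-city counts are what feed the later merge with population data.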

Population data was downloaded from the US Census website, and 2019 population estimates were used for all states (including the District of Columbia and Puerto Rico) and for all cities with more than 50,000 residents.

These two datasets were combined using Python Pandas to generate a data frame of CVS store counts alongside state and city populations.
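
A minimal pandas sketch of that combining step, using made-up numbers and placeholder city names rather than the real scraped or Census data. It also applies the 50,000-resident cutoff for cities and uses a left merge so that large cities with no CVS stores stay in the table with a count of 0; whether the original analysis handled zero-store cities this way is an assumption.

```python
import pandas as pd

# Store counts per city, e.g. from the scraping step (invented numbers)
stores = pd.DataFrame({
    "city": ["City A", "City B", "City C"],
    "state": ["WA", "CO", "FL"],
    "stores": [12, 9, 25],
})

# Population estimates per city (invented numbers, not Census data)
population = pd.DataFrame({
    "city": ["City A", "City B", "City C", "City D", "City E"],
    "state": ["WA", "CO", "FL", "FL", "WA"],
    "population": [400_000, 300_000, 500_000, 30_000, 90_000],
})

# Keep only cities with more than 50,000 residents
large_cities = population[population["population"] > 50_000]

# Left merge keeps large cities that have no CVS stores (their count becomes NaN)
merged = large_cities.merge(stores, on=["city", "state"], how="left")
merged["stores"] = merged["stores"].fillna(0).astype(int)
print(merged)
```

The same merge on `state` alone would build the state-level table.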


Data Analysis

This data was plotted with Python Matplotlib, graphing the number of stores against population to show stores per capita. A linear trendline was fitted, and its slope was approximately 1 CVS store per 33,000 people. This trend was the same for both cities and states.
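
The trendline can be computed as an ordinary least-squares fit of store count against population. The sketch below uses the closed-form formulas in plain Python. The sample points are synthetic, chosen to lie exactly on a 1-per-33,000 line. The post does not say whether its fit included an intercept, so the intercept term here is an assumption.

```python
def fit_trendline(populations, store_counts):
    """Ordinary least squares for: stores = slope * population + intercept."""
    n = len(populations)
    mean_x = sum(populations) / n
    mean_y = sum(store_counts) / n
    # Covariance-style and variance-style sums used by the closed-form OLS slope
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(populations, store_counts))
    sxx = sum((x - mean_x) ** 2 for x in populations)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic points lying exactly on a 1-store-per-33,000-people line
populations = [330_000, 660_000, 990_000, 1_320_000]
store_counts = [10, 20, 30, 40]

slope, intercept = fit_trendline(populations, store_counts)
print(f"about 1 store per {1 / slope:,.0f} people")
```

On real data the points scatter around the line, and `1 / slope` gives the people-per-store figure quoted above.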

These states and cities can be grouped by whether they fall above or below the trendline (the average number of stores per capita). The regions in green have more than 1 store per 33,000 people, and the regions in red have fewer.
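
Sorting locations into the green and red groups amounts to comparing each actual store count with the count the trendline predicts for that population. In this sketch the slope and zero intercept are hypothetical values standing in for a real fit, and the location tuples are invented.

```python
# Hypothetical trendline parameters standing in for a real fit
SLOPE = 1 / 33_000
INTERCEPT = 0.0


def classify(locations, slope=SLOPE, intercept=INTERCEPT):
    """Label each (name, population, stores) tuple relative to the trendline.

    "above": more stores than predicted (green, more stores per capita than average)
    "below": fewer stores than predicted (red); exact ties also land here
    """
    labels = {}
    for name, population, stores in locations:
        expected = slope * population + intercept
        labels[name] = "above" if stores > expected else "below"
    return labels


# Invented locations: both have 660,000 residents, so about 20 stores are expected
labels = classify([("Region X", 660_000, 30), ("Region Y", 660_000, 12)])
print(labels)
```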

Using this framework, we can assume that locations with below-average CVS counts have a higher demand for new CVS stores. We can also rank the locations with the highest and lowest demand relative to their population.
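
One way to turn "below average" into a ranking is a store gap: the number of stores the trendline predicts minus the number actually present. A larger positive gap suggests more unmet demand. This metric is an assumption for illustration, since the post does not spell out its ranking formula. Population per store would be a reasonable alternative. The trendline parameters and locations below are hypothetical.

```python
# Hypothetical trendline parameters standing in for a real fit
SLOPE = 1 / 33_000
INTERCEPT = 0.0


def rank_by_demand(locations, slope=SLOPE, intercept=INTERCEPT):
    """Sort (name, population, stores) tuples by store gap, highest first.

    store gap = stores predicted by the trendline - actual stores;
    a positive gap means fewer stores than average for that population.
    """
    gaps = [
        (name, slope * population + intercept - stores)
        for name, population, stores in locations
    ]
    return sorted(gaps, key=lambda item: item[1], reverse=True)


# Invented locations
ranking = rank_by_demand([
    ("Region Q", 330_000, 10),  # about 10 expected, gap near 0
    ("Region R", 660_000, 35),  # about 20 expected, gap about -15
    ("Region P", 990_000, 10),  # about 30 expected, gap about +20
])
for name, gap in ranking:
    print(f"{name}: {gap:+.1f}")
```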

We can see that states like Washington and Colorado have higher demand, while states like Florida and Massachusetts have lower demand. This could be due to the difference in ages between the states: Florida and Massachusetts have older populations, who would need more prescriptions than younger populations.

Looking at the cities, New York City and Los Angeles have higher demand, while Miami and Washington, DC have lower demand. The shortfall in NYC and LA could be due to the high cost of real estate there, where operating costs might hurt the profitability of a store.


We can cross-reference the city and state information to form a strategy about where CVS should expand.

If a state has a high demand for CVS and also has cities with a high demand, then CVS should build a larger presence in those cities. However, if a state has a high demand for CVS but low demand in its cities, then CVS should try expanding to new cities in that state.
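
The cross-referencing rule can be written as a small decision function. This is a hypothetical sketch. Each gap value is assumed to be the expected minus actual store count from the trendline, so a positive value means high demand. The post's rule only covers high-demand states, so the low-demand branch simply reports that.

```python
def expansion_strategy(state_gap, city_gaps):
    """Hypothetical encoding of the cross-referencing rule.

    state_gap: expected minus actual stores for the state (positive = high demand)
    city_gaps: dict mapping each large city to its own expected-minus-actual gap
    """
    if state_gap <= 0:
        # The rule only addresses high-demand states
        return "not covered by the rule", []
    high_demand_cities = [city for city, gap in city_gaps.items() if gap > 0]
    if high_demand_cities:
        # High-demand state with high-demand cities: grow in those cities
        return "add stores in high-demand cities", high_demand_cities
    # High-demand state whose large cities are already well served
    return "expand into new towns in the state", []


print(expansion_strategy(5.0, {"City A": 3.0, "City B": -2.0}))
print(expansion_strategy(5.0, {"City C": -1.0}))
```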


In conclusion, if CVS wants to continue adding new stores, they could focus on expanding their presence in these cities where demand is high:

  • Seattle, WA
  • Denver, CO
  • Portland, OR
  • San Juan, PR

Alternatively, they could add stores in these states where demand is high by expanding to new towns:

  • Wisconsin
  • Arkansas
  • Utah
  • Iowa

Future Work

This analysis could be expanded by adding more demographic data, such as age and income levels, to better predict where a store would be profitable. In addition, competitor data from Walgreens, Walmart, Rite Aid, etc. could be scraped from the web to look for areas with less competition.


1 Data was scraped from the CVS Store Locator online. The CVS robots.txt file was reviewed beforehand, and it permitted access to the store locator pages.

About Author

Stephen Kita

Stephen is a biomedical engineer who likes to work with data and develop innovative healthcare products. He is an excellent problem-solver with a diverse background in entrepreneurship.