Data Visualization of CVS's Next Location
For this project, CVS location and US population data were analyzed to suggest where CVS should consider opening new stores.
CVS Pharmacy is the largest pharmacy chain in the United States, with almost 10,000 locations as of August 2020. Last year they reported $256B in revenue and were listed as #5 on the Fortune 500 list.
According to CVS's SEC filing, their major sources of income are pharmacy and in-store sales. In addition, the majority of the pharmacy sales are within network, as opposed to online or mail-order. Therefore, the vast majority of revenue for CVS comes from people physically walking into the store.
If CVS wants to continue growing, the easiest way seems to be opening new stores. However, finding room for new stores is a challenge, because CVS already operates in so many locations. Therefore, CVS should target areas with higher demand when opening stores.
In this project, the location and count of every CVS store in the United States were scraped from the CVS website (see note 1). Store location data was collected and processed with the Python BeautifulSoup package.
Population data was downloaded from the US Census website. Population estimates for 2019 were used for all states (including the District of Columbia and Puerto Rico) and for all cities with more than 50,000 residents.
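As a rough illustration of the processing step, here is a minimal sketch of counting stores per state with BeautifulSoup. The HTML snippet, the class names, and the `count_stores_by_state` helper are all hypothetical; the real store locator pages are structured differently, and this is not the scraper used for the project.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real pages use a different structure.
SAMPLE_HTML = """
<ul class="stores">
  <li class="store"><span class="city">Seattle</span>, <span class="state">WA</span></li>
  <li class="store"><span class="city">Spokane</span>, <span class="state">WA</span></li>
  <li class="store"><span class="city">Denver</span>, <span class="state">CO</span></li>
</ul>
"""


def count_stores_by_state(html: str) -> Counter:
    """Count store entries per state in one page of store listings."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for item in soup.find_all("li", class_="store"):
        state = item.find("span", class_="state")
        if state is not None:  # skip entries without a state label
            counts[state.get_text(strip=True)] += 1
    return counts


store_counts = count_stores_by_state(SAMPLE_HTML)
print(store_counts)
```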
These two datasets were combined using Python pandas to generate a data frame of CVS store counts and population by state and city.
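A join like this might look like the sketch below. The column names and all of the numbers are made up for illustration; they are not the scraped counts or the Census estimates. The sketch assumes a left join on the population table, so that regions with no stores are kept with a count of zero.

```python
import pandas as pd

# Made-up values for illustration only.
stores = pd.DataFrame(
    {"state": ["WA", "CO", "FL"], "store_count": [10, 8, 40]}
)
population = pd.DataFrame(
    {
        "state": ["WA", "CO", "FL", "TX"],
        "population": [700_000, 500_000, 900_000, 1_000_000],
    }
)

# Left join on the population table so regions without any stores are kept.
combined = population.merge(stores, on="state", how="left")
combined["store_count"] = combined["store_count"].fillna(0).astype(int)

# Stores per 100,000 residents, a simple per-capita measure.
combined["stores_per_100k"] = (
    combined["store_count"] / combined["population"] * 100_000
)
print(combined)
```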
This data was graphed using Python Matplotlib to show the number of stores per capita. A linear trendline was calculated, and the slope was approximately 1 CVS store per 33,000 people. This trend was the same for both cities and states.
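One way to calculate such a trendline is an ordinary least-squares fit with `numpy.polyfit`, sketched below. The data points are invented so that the fit lands near the reported slope, and fitting with an intercept is an assumption; the original analysis may have fitted the line differently.

```python
import numpy as np

# Invented data points for illustration only.
population = np.array([330_000, 660_000, 990_000, 1_650_000], dtype=float)
store_count = np.array([11, 19, 31, 49], dtype=float)

# Degree-1 least-squares fit: store_count ~ slope * population + intercept.
slope, intercept = np.polyfit(population, store_count, deg=1)

# The slope is in stores per person, so its inverse is people per store.
people_per_store = 1 / slope
print(f"about 1 store per {people_per_store:,.0f} people")
```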
These states/cities can be grouped by whether they fall above or below the trendline (average number of stores per capita). The regions in green have more than 1 store per 33,000 people, and the regions in red have fewer.
Using this framework, we can assume that locations with below-average CVS counts have a higher demand for new CVS stores. We can also rank the locations from highest to lowest demand relative to population.
We can see that states like Washington and Colorado have a higher demand while states like Florida and Massachusetts have a lower demand. This could be due to the difference in ages between the states. Florida and Massachusetts have older populations, who would need more prescriptions than younger populations, so CVS may already have built more stores there.
Looking at the cities, New York City and Los Angeles have a higher demand while Miami and Washington, DC have a lower demand. The shortfall in NYC and LA could be due to the high cost of real estate, where the operating costs might hurt the profitability of a store.
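A ranking like this can be sketched by comparing each region's actual store count with the count the trendline would predict. The regions and numbers below are hypothetical, and the sketch assumes the trendline passes through the origin, so the expected count is simply population divided by 33,000.

```python
import pandas as pd

# Approximate trendline slope from the analysis: 1 store per 33,000 people.
STORES_PER_PERSON = 1 / 33_000

# Hypothetical regions and values for illustration only.
df = pd.DataFrame(
    {
        "region": ["A", "B", "C"],
        "population": [660_000, 330_000, 990_000],
        "store_count": [10, 15, 30],
    }
)

# Expected stores if the region sat exactly on the trendline.
df["expected_stores"] = df["population"] * STORES_PER_PERSON

# Positive gap: fewer stores than expected, read here as higher demand.
df["store_gap"] = df["expected_stores"] - df["store_count"]

ranked = df.sort_values("store_gap", ascending=False).reset_index(drop=True)
print(ranked[["region", "store_gap"]])
```

Ranking by the absolute gap favors large regions; dividing the gap by population would give a per-capita ranking instead.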
We can cross-reference the city and state information to form a strategy about where CVS should expand.
If a state has a high demand for CVS and also has cities with a high demand, then CVS should build a larger presence in those cities. However, if a state has a high demand for CVS but low demand in its cities, then CVS should try expanding to new cities in that state.
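The rule above can be written as a small decision function. The function name and the return labels are made up for illustration, and the branch for low-demand states is an assumption, since the analysis only covers states with high demand.

```python
def expansion_strategy(state_demand_high: bool, city_demand_high: bool) -> str:
    """Map state-level and city-level demand to a suggested strategy."""
    if not state_demand_high:
        # Assumption: the analysis does not cover low-demand states.
        return "not addressed"
    if city_demand_high:
        return "add stores in existing cities"
    return "expand to new cities"


# High state demand and high city demand: grow within those cities.
print(expansion_strategy(state_demand_high=True, city_demand_high=True))
# High state demand but low city demand: look at cities without stores.
print(expansion_strategy(state_demand_high=True, city_demand_high=False))
```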
In conclusion, if CVS wants to continue adding new stores, they could focus on expanding their presence in these cities where demand is high:
- Seattle, WA
- Denver, CO
- Portland, OR
- San Juan, PR
Alternatively, they could add stores in these states where demand is high by expanding to new towns:
This analysis could be expanded by adding more demographic data, such as age and income levels, to better predict where a store would be profitable. In addition, competitor data from Walgreens, Walmart, Rite Aid, etc. could be scraped from the web to look for areas with less competition.
1. Data was scraped from the CVS Store Locator online. The CVS robots.txt file was reviewed beforehand, and the store locator pages were allowed.