Data Study on Starbucks Coffee Store Amenities
The skills the author demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
As I walk from Grand Central Station to the NYC Data Science Academy for my first day of class, I’m wondering which Starbucks Coffee store serves breakfast sandwiches. Finding one is a journey of trial and error and wasted time: I walk into stores along my path until I find the right one. Providing a solution to this problem became the basis for project three of the NYC Data Science bootcamp.
The scope of the project (solve a business problem using web scraping technology and present your insights) was a great opportunity to use Scrapy (a web scraping framework) for data capture, R Studio for data analysis, and CARTO to prototype a web-based product. My solution allows end users to view Starbucks Coffee store amenities and their locations in one place. Go ahead, give it a try here.
Tools and techniques used:
- Scrapy for web scraping store information, including amenities, from the Starbucks Coffee Store Locator website
- R Studio for EDA (Exploratory Data Analysis)
- CARTO for rapid web-based product development
- MCA (Multiple Correspondence Analysis) for analyzing commonalities between the amenities
You can view the CARTO-based end-user product here.
Store locations across New York City are spread over multiple URLs. The web browser image below highlights the geospatial (longitude / latitude) coordinates encoded within the URL.
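Because each URL encodes a coordinate pair, one way to cover the city is to generate a grid of coordinates and feed the resulting URLs to a spider as its start URLs. The sketch below assumes a hypothetical URL template (`URL_TEMPLATE`), a hypothetical zoom parameter, and an illustrative bounding box; the real locator's URL format and the author's actual approach may differ.

```python
from itertools import product

# Hypothetical URL template: the real locator encodes latitude/longitude in
# its URL, but this exact format is an assumption for illustration only.
URL_TEMPLATE = "https://store-locator.example.com/?map={lat:.4f},{lng:.4f},{zoom}z"


def frange(start, stop, step):
    """Yield evenly spaced floats from start up to and including stop."""
    n_steps = int(round((stop - start) / step))
    for i in range(n_steps + 1):
        yield start + i * step


def grid_urls(south, north, west, east, step, zoom=14):
    """Build one locator URL per point of a latitude/longitude grid."""
    return [
        URL_TEMPLATE.format(lat=lat, lng=lng, zoom=zoom)
        for lat, lng in product(frange(south, north, step), frange(west, east, step))
    ]


# Approximate box around Manhattan (illustrative values, not from the project).
start_urls = grid_urls(south=40.70, north=40.88, west=-74.02, east=-73.92, step=0.02)
```

In a Scrapy spider, a list like this could be assigned to the spider's `start_urls` attribute; a finer `step` trades more requests for fewer stores missed at the edges of each map view.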
Most of the processing work is performed within Scrapy, the magic sauce that lets you scrape websites, munge and process the data for analysis, and feed it to CARTO for map visualization. The following steps were performed within Scrapy:
- Regex (regular expressions) for creating amenity features, used as filter criteria in CARTO map layers
- HTML to wrap store locations as hyperlinks, since CARTO renders fields in HTML format out of the box
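Both steps fit naturally in a Scrapy item pipeline, which Scrapy calls as `process_item(item, spider)` for every scraped item once the class is registered in the `ITEM_PIPELINES` setting. The sketch below is an illustration, not the author's code: the field names (`name`, `url`, `amenities_text`) and the keyword patterns in `AMENITY_PATTERNS` are assumptions, and items are assumed to be plain dicts (a declared `scrapy.Item` would need these fields defined).

```python
import html
import re

# Hypothetical keyword patterns for a few of the amenity codes in this post;
# the actual wording on the store locator pages is an assumption.
AMENITY_PATTERNS = {
    "LB": re.compile(r"\bla\s?boulange\b", re.IGNORECASE),
    "WA": re.compile(r"\boven[- ]warmed\b", re.IGNORECASE),
    "XO": re.compile(r"\bmobile order\b", re.IGNORECASE),
    "DT": re.compile(r"\bdrive[- ]thr(?:ough|u)\b", re.IGNORECASE),
    "FZ": re.compile(r"\bfizzio\b", re.IGNORECASE),
    "hrs24": re.compile(r"\bopen 24 hours\b", re.IGNORECASE),
}


class AmenityPipeline:
    """Derive 0/1 amenity columns and an HTML link for each scraped store."""

    def process_item(self, item, spider):
        text = item.get("amenities_text", "")
        # One 0/1 column per amenity code, usable as a CARTO layer filter.
        for code, pattern in AMENITY_PATTERNS.items():
            item[code] = int(bool(pattern.search(text)))
        # Wrap the store name in an anchor tag; CARTO renders HTML fields.
        item["store_link"] = '<a href="{}" target="_blank">{}</a>'.format(
            html.escape(item["url"], quote=True), html.escape(item["name"])
        )
        return item
```

It would be enabled with something like `ITEM_PIPELINES = {"myproject.pipelines.AmenityPipeline": 300}`, where `myproject` is a hypothetical project name.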
The Scrapy framework integrates web scraping and Python programming into a flexible, adaptable solution for capturing and processing web-based content. R Studio provides a smooth interface and great libraries for EDA to gain insight from the data. The CARTO dataset upload and mapping process is intuitive and lets you visualize your data on base maps within minutes.
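Getting the processed items into CARTO can be as simple as a CSV file with latitude and longitude columns, which CARTO can typically use to place points on import. Scrapy's built-in feed export (`scrapy crawl <spider_name> -o stores.csv`) is one route; the sketch below shows an explicit alternative with Python's `csv` module. The column names and the helper `write_carto_csv` are hypothetical.

```python
import csv
import io


def write_carto_csv(items, stream, amenity_codes):
    """Write store items as CSV with lat/lng and amenity columns for CARTO."""
    fieldnames = ["name", "store_link", "latitude", "longitude", *amenity_codes]
    # extrasaction="ignore" drops any scraped fields not listed above.
    writer = csv.DictWriter(stream, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)


# Usage with an in-memory buffer; for a real file use open(path, "w", newline="").
buffer = io.StringIO()
write_carto_csv(
    [{"name": "Store 1", "store_link": '<a href="https://example.com/1">Store 1</a>',
      "latitude": 40.7527, "longitude": -73.9772, "WA": 1, "DT": 0}],
    buffer,
    amenity_codes=["WA", "DT"],
)
```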
EDA in R Studio identified the most and least common amenities across stores.
Five most common amenities:
- LB: La Boulange
- WA: Oven-warmed Food
- LU: Lunch
- DR: Digital Rewards
- XO: Mobile Order and Pay
Five least common amenities:
- DT: Drive-Through
- EM: Starbucks Evenings
- WT: tbd - Walk-T
- FZ: Fizzio Handcrafted Sodas
- hrs24: Open 24 hours per day
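The ranking above comes down to counting how many stores offer each amenity. The author did this in R Studio; the sketch below is a Python stand-in, assuming each store's amenities arrive as a list of codes. Ties between equal counts come back in no guaranteed order.

```python
from collections import Counter


def amenity_frequencies(stores):
    """Count how many stores offer each amenity code."""
    counts = Counter()
    for amenities in stores:
        counts.update(set(amenities))  # set(): count each store at most once
    return counts


def most_and_least_common(stores, k=5):
    """Return the k most common and k least common (rarest first) amenities."""
    ranked = amenity_frequencies(stores).most_common()  # descending by count
    return ranked[:k], ranked[-k:][::-1]
```

Note that an amenity offered by no store never appears in the counts, so it would not show up among the least common.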
MCA (Multiple Correspondence Analysis) was performed to analyze the systematic patterns of variation among the amenities. The method requires the features to be categorical (factors in an R data frame).
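MCA can be computed as a correspondence analysis of the one-hot indicator matrix: standardize the residuals of the correspondence matrix, take an SVD, and scale the singular vectors into principal coordinates. The NumPy sketch below shows that computation; it is a swapped-in illustration, not the R routine the author used, and the variable naming (`v0`, `v1`, ...) is hypothetical.

```python
import numpy as np


def indicator_matrix(rows, names=None):
    """One-hot encode a table of categorical values (n rows x Q variables)."""
    rows = np.asarray(rows, dtype=object)
    n_vars = rows.shape[1]
    names = names or [f"v{q}" for q in range(n_vars)]
    columns, labels = [], []
    for q in range(n_vars):
        for level in sorted(set(rows[:, q])):
            columns.append((rows[:, q] == level).astype(float))
            labels.append(f"{names[q]}={level}")
    return np.column_stack(columns), labels


def mca(rows, n_components=2, names=None):
    """MCA as correspondence analysis of the indicator matrix."""
    Z, labels = indicator_matrix(rows, names)
    P = Z / Z.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column (category) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    eigenvalues = sigma ** 2                 # principal inertias per axis
    # Category principal coordinates: D_c^{-1/2} V Sigma
    col_coords = (Vt.T * sigma) / np.sqrt(c)[:, None]
    return eigenvalues[:n_components], col_coords[:, :n_components], labels
```

For amenities, each row would be one store and each variable a yes/no flag per amenity code; plotting the first two columns of `col_coords` gives a category map like the one discussed next.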
Based on the eigenvector values (the amenities' coordinates on the principal axes), the clusters group amenities with the most in common. In the diagram above, the cluster on the bottom right represents the most common amenities across 200 store locations. The cluster on the left contains amenities offered at fewer store locations. Amenity FZ (Fizzio Handcrafted Sodas) stands out as lying far from the origin. It is the only amenity found at just one store location within NYC, which perhaps makes it worth highlighting to Starbucks Coffee consumers.
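A standard property of MCA helps explain why a rare amenity lands far from the origin. In the full MCA space, the squared distance of a category k from the origin depends only on the share of individuals (here, stores) in that category, where n is the number of individuals and n_k the number belonging to category k:

```latex
d^2(k, O) = \frac{n}{n_k} - 1
```

With n = 200 stores and n_k = 1 store offering FZ, this gives 200/1 - 1 = 199, while an amenity offered at most stores has a distance close to 0. The two-dimensional map shows a projection of this distance, so the plotted gap is smaller, but the ordering by rarity tends to show through.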
Combining open source and vendor applications (Scrapy, R Studio, CARTO) allowed me to deliver, within a two-week timeline, an interactive product that uses a website as its data source. The web app prototype enables end users to visually explore, analyze, and find Starbucks Coffee stores with the most / least common amenities. Most importantly, you can view a store's amenities with a minimum number of clicks.
Future enhancements:
- Highlight the most / least common amenities when the user hovers over a store location in CARTO
- Create a map layer for the Shiny application Citibike Analysis, allowing users to locate Starbucks Coffee stores based on amenities and proximity
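Locating stores by amenity and proximity amounts to filtering on an amenity flag and ranking by great-circle distance. The Shiny app itself is R; the sketch below only illustrates the idea in Python with the haversine formula, assuming store records with `latitude`, `longitude`, and 0/1 amenity fields. The helper names and sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_with_amenity(stores, lat, lon, amenity, radius_km=1.0):
    """Stores offering `amenity` within `radius_km`, nearest first."""
    hits = [
        (haversine_km(lat, lon, s["latitude"], s["longitude"]), s)
        for s in stores
        if s.get(amenity) == 1
    ]
    return [s for d, s in sorted(hits, key=lambda t: t[0]) if d <= radius_km]


# Illustrative records; coordinates are approximate and not project data.
stores = [
    {"name": "A", "latitude": 40.7527, "longitude": -73.9772, "WA": 1},
    {"name": "B", "latitude": 40.7580, "longitude": -73.9855, "WA": 0},
    {"name": "C", "latitude": 40.8000, "longitude": -73.9500, "WA": 1},
]
```

Sorting on the distance alone (the `key`) avoids comparing the store dicts when two distances tie.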
- Concept / Development / Design: Chris Valle, Joshua Litven, Fred Cheung, Conred Wang, Chris Makris, Zheyu (Sammy) Zhang
- CARTO End User Testing: Carlos Peguero, Jeffrey Regalado, Yasmin Regalado, Cris Macario, Alexander Ryzhkov
Source code is available on GitHub.