Data Visualization on Starbucks Collectors

Posted on Aug 22, 2016
Contributed by Amy Tzu-Yu Chen. Amy is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, running from July 5th to September 23rd, 2016. This post is based on her third class project, Web Scraping (due in the 6th week of the program).

You may also explore this project via the Starbucks Secondhand Market Explorer on Shiny, and find the R and Python code on GitHub.

Introduction

Many of us have hobbies that outsiders cannot easily understand. Poster collectors tirelessly hunt for vintage posters on eBay and at local antique shops. Sneaker collectors camp outside Foot Locker overnight just to get a pair of limited-edition basketball shoes. Outsiders call these devoted collectors crazy for spending so much time and money chasing incomprehensible collectibles. Collectors, however, need not care about such judgments, because they have often formed their own communities, frequently online, where they make friends with others who share the same passion and insider language.

This project focuses on using web scraping, data visualization, and K-means clustering (an unsupervised method) to understand a special group of collectors: Starbucks collectors. These collectors often refer to themselves as "muggers" and are usually active on Facebook, eBay, and other online social platforms. Muggers actively exchange or purchase secondhand Starbucks products from other collectors around the world.

Web Scraping: Data Acquisition

One of the biggest online platforms for the Starbucks collector community is a user-contributed website called Fredorange. An Austrian Starbucks collector created the site as a virtual platform for muggers to share and contribute information about Starbucks products. Muggers can also use the site to set up deals. Fredorange.com has become the go-to website for new and veteran muggers who want to learn about the latest Starbucks product releases or simply showcase their collections.

For this project, I wanted to build a database of Starbucks collectibles and collector profiles in order to find patterns in the supply and demand of mugs. I decided to scrape the following attributes (marked with red boxes in the site screenshots) from the website.

  1. Mug Profile: name, city, country, edition, # of owners/users, # of seekers, # of traders
  2. User/Mugger Profile: username, city, country, # and percentage of mugs owned, # of mugs seeking, # of mugs trading

 

Using the BeautifulSoup and pandas packages in Python, I scraped the desired attributes from Fredorange.com. The following is the code for scraping user profiles. You can find the code for scraping mug profiles here.
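
The author's original script is not reproduced in this text. As a rough stand-in, here is a minimal sketch of the idea: walk a listing page and collect one record per user. Everything about the markup is an assumption made for illustration (the `user` row class, the cell class names, and the sample values are hypothetical, not Fredorange's real HTML). To stay runnable without third-party packages, the sketch uses Python's built-in `html.parser` rather than BeautifulSoup, which would express the same extraction more compactly.

```python
from html.parser import HTMLParser

# Hypothetical markup: one <tr class="user"> per collector, one <td> per attribute.
SAMPLE_HTML = """
<table id="users">
  <tr class="user">
    <td class="username">mugfan01</td><td class="city">Vienna</td>
    <td class="country">Austria</td><td class="owned">120</td>
    <td class="seeking">15</td><td class="trading">8</td>
  </tr>
  <tr class="user">
    <td class="username">tumbler_tom</td><td class="city">Los Angeles</td>
    <td class="country"></td><td class="owned">45</td>
    <td class="seeking">30</td><td class="trading">2</td>
  </tr>
</table>
"""

FIELDS = ("username", "city", "country", "owned", "seeking", "trading")
NUMERIC_FIELDS = ("owned", "seeking", "trading")


class UserTableParser(HTMLParser):
    """Collects one dict per <tr class="user"> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # dict for the row being read, or None
        self._field = None  # name of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr" and attrs.get("class") == "user":
            self._row = dict.fromkeys(FIELDS, "")
        elif tag == "td" and self._row is not None and attrs.get("class") in FIELDS:
            self._field = attrs["class"]

    def handle_data(self, data):
        # Only keep text that sits inside a recognised cell.
        if self._row is not None and self._field is not None:
            self._row[self._field] += data

    def handle_endtag(self, tag):
        if tag == "td" and self._field is not None:
            self._row[self._field] = self._row[self._field].strip()
            self._field = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_users(html):
    """Return a list of user records with the count columns cast to int."""
    parser = UserTableParser()
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        for key in NUMERIC_FIELDS:
            row[key] = int(row[key] or 0)  # blank counts become 0
    return parser.rows


users = parse_users(SAMPLE_HTML)
for user in users:
    print(user)
```

A list of dictionaries like `users` can be passed directly to `pandas.DataFrame` for the later cleaning and analysis steps. Note that the second sample user has a blank country, which is the kind of gap the cleaning step has to deal with.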

Data Cleaning

Data cleaning was difficult because all of the data were contributed by users. Fredorange does not offer a fixed list of cities and countries when users sign up for an account. As a result, the city and country attributes contained many typos, alternative spellings, and names written in foreign languages. Some users also listed only their city and left the country blank, so inferring each user's country from the city was another tedious task. The code below reflects a small part of this data cleaning process.
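
The author's cleaning script is not included in this text, so the following is only a sketch of the two steps described above: mapping spelling variants to one canonical country name, and falling back to the city when the country is blank. The alias table, the city lookup, and the function name `clean_country` are hypothetical examples, not the author's actual mappings.

```python
# Hypothetical lookup tables; a real cleaning pass would need many more entries.
COUNTRY_ALIASES = {
    "usa": "United States",
    "u.s.a.": "United States",
    "united states": "United States",
    "deutschland": "Germany",
    "germany": "Germany",
    "österreich": "Austria",
    "austria": "Austria",
}

CITY_TO_COUNTRY = {
    "vienna": "Austria",
    "berlin": "Germany",
    "los angeles": "United States",
}


def clean_country(country, city):
    """Return a canonical country name, or None if the row needs manual review."""
    country_key = (country or "").strip().lower()
    if country_key in COUNTRY_ALIASES:
        return COUNTRY_ALIASES[country_key]
    city_key = (city or "").strip().lower()
    # Only infer from the city when the user left the country blank.
    if not country_key and city_key in CITY_TO_COUNTRY:
        return CITY_TO_COUNTRY[city_key]
    return None


rows = [
    ("  USA ", "Seattle"),
    ("Österreich", "Wien"),
    ("", "Vienna"),
    ("Germny", "Berlin"),  # typo that is not in the alias table
]
for country, city in rows:
    print(repr(country), repr(city), "->", clean_country(country, city))
```

Returning `None` for unrecognised values, rather than guessing, keeps the ambiguous rows visible so they can be fixed by hand or by extending the tables.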

Starbucks Secondhand Market Explorer on R Shiny

Starbucks Secondhand Market Explorer is a Shiny app I created to help non-collectors understand how Starbucks muggers communicate and to help collectors visualize the current supply and demand of secondhand Starbucks products. The app consists of four tabs. The first tab shows the geographic distribution of Starbucks products and the collector community. The second tab shows a ranking system of product values based on the K-means clustering result. The third tab shows a Scarcity vs. Popularity graph for collectible mug editions. The last tab shows a Scarcity vs. Popularity graph for popular collectible countries of origin.

Visualizing the Starbucks Collector Community

After data cleaning, it was possible to visualize the geographic distribution of Starbucks products and collectors. Unsurprisingly, the United States has the largest number of products and collectors. To see the geographic distribution of collectors and products outside the United States, select one of the maps labeled "... excluding USA".

This is how the geographic distribution map of collectors outside the USA looks in the Shiny app. Outside the United States, collecting Starbucks products is popular in Canada, parts of Western Europe, and parts of East Asia.

[Screenshot: map of collectors outside the USA in the Shiny app]

K-means Clustering on Mugs

Using the K-means clustering algorithm with the numbers of owners, seekers, and traders as inputs, I separated all mugs into 5 distinct groups. The choice of k was based on comparing the within-cluster variances for different values of k. The nstart parameter was set to 100, so the algorithm was run from 100 random starting configurations and the run with the lowest total within-cluster variance was kept.
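
To make the role of nstart concrete, here is a small Python sketch of K-means with random restarts. It is not the author's code (the analysis was done in R): it implements plain Lloyd's algorithm, the nine sample mugs are invented, and it uses k = 3 because the toy data has three obvious groups, whereas the project used k = 5.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from k randomly chosen starting points."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    # Total within-cluster sum of squares for this run.
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return wss, labels, centers


def kmeans(points, k, nstart=100, seed=0):
    """Run kmeans_once nstart times and keep the run with the lowest WSS."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(nstart))
    return min(runs, key=lambda run: run[0])


# Invented (owners, seekers, traders) counts for nine mugs.
mugs = [
    (200, 5, 30), (210, 8, 28), (190, 6, 35),  # widely owned, rarely sought
    (10, 90, 2), (12, 85, 1), (8, 95, 3),      # rarely owned, heavily sought
    (3, 2, 0), (5, 1, 1), (4, 3, 0),           # little activity of any kind
]

# Compare within-cluster sums of squares across k to look for an "elbow".
wss_by_k = {k: kmeans(mugs, k)[0] for k in range(1, 6)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 1))

best_wss, labels, centers = kmeans(mugs, 3)
print(labels)
```

Because each run starts from random centers, a single run can settle on a poor grouping. Repeating the run and keeping the lowest within-cluster sum of squares makes the final result much less sensitive to an unlucky start.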

The K-means clustering result clearly separates the mugs into five groups with distinct characteristics, which I labeled Easy to Find Mugs (Purple), Medium Difficulty (Red), Hard to Get Mugs (Yellow), Very Hard to Get Mugs (Green), and Inconclusive (Blue). The screenshot below shows how the ranking system based on K-means clustering looks in the Shiny app.

Note that K-means clustering is an unsupervised algorithm, so the ranking system is not a classification model. However, the clustering result does give a useful sense of each mug's value in the secondhand market. The five categories are self-explanatory except for Inconclusive. Mugs with few traders, seekers, and owners cannot be labeled easily because there are two possible explanations: they might be new products that have not started trading yet, or they could simply be unpopular.

[Screenshot: K-means ranking tab in the Shiny app]

Visualizing Collectible Editions & Countries of Origin

Four editions and six countries were included in the last two tabs of the Shiny app because they have relatively large numbers of "high difficulty" mugs. Users can select the editions or countries of interest to visualize each mug's supply and demand in the secondhand market. Details about a mug appear when users hover over its data point.

This is how the Scarcity vs. Popularity graph looks in the Shiny app. Two new variables derived from seekers and owners help visualize the supply and demand of mugs from popular editions and countries of origin. Popularity is an index ranging from 0 to 1, computed as Seeker/max(Seeker). Scarcity is another index ranging from 0 to 1, computed as |Owner/max(Owner) - 1|, so the most widely owned mug has a scarcity of 0. For example, mugs with high popularity and high scarcity are most likely Hard to Get Mugs. Similarly, mugs with low popularity and low scarcity are most likely Easy to Find Mugs.
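
As a quick illustration of the two formulas, the sketch below computes both indices for three invented mugs. The names and counts are made up for the example; only the formulas come from the text above.

```python
# Invented sample data: owner and seeker counts per mug.
mugs = [
    {"name": "Mug A", "owners": 200, "seekers": 5},
    {"name": "Mug B", "owners": 10, "seekers": 90},
    {"name": "Mug C", "owners": 50, "seekers": 45},
]

max_seekers = max(m["seekers"] for m in mugs)
max_owners = max(m["owners"] for m in mugs)

for m in mugs:
    # Popularity: share of the highest seeker count (0 to 1).
    m["popularity"] = m["seekers"] / max_seekers
    # Scarcity: distance from the most widely owned mug (0 to 1).
    m["scarcity"] = abs(m["owners"] / max_owners - 1)
    print(m["name"], round(m["popularity"], 3), round(m["scarcity"], 3))
```

Here Mug B, which few people own and many want, lands in the high-popularity, high-scarcity corner, while Mug A, owned by many and sought by few, lands in the opposite corner. Both indices are relative to the maximum within whichever set of mugs is being plotted, so the values change if the selection changes.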

[Screenshot: Scarcity vs. Popularity graph in the Shiny app]

 


Python Packages used:

  • BeautifulSoup
  • pandas

R Packages used:

  • shiny
  • dplyr
  • plotly
  • countrycode

View the complete R code here.

View the Shiny application here.

About Author

Amy Tzu-Yu Chen

Amy Tzu-Yu Chen is a recent college graduate who earned her BS in Statistics with three minors in German, Japanese, and Urban/Regional Studies from University of California, Los Angeles (UCLA). As a statistician, she is deeply passionate about...