Data Visualization on Starbucks Collectors

Amy Tzu-Yu Chen

Posted on Aug 22, 2016

The skills demonstrated in this post can be learned in the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Contributed by Amy Tzu-Yu Chen. Amy is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, which runs from July 5th to September 23rd, 2016. This post is based on her third class project, Web Scraping (due in the 6th week of the program).

You can also explore this project through the Starbucks Secondhand Market Explorer on Shiny and the R and Python code on GitHub.

Introduction

Many of us have hobbies that outsiders cannot easily understand. Poster collectors tirelessly hunt for vintage posters on eBay and at local antique shops. Sneaker collectors camp outside Foot Locker overnight just to get a pair of limited-edition basketball shoes. Outsiders call these addicted collectors crazy for spending so much time and money chasing incomprehensible collectibles. Collectors, however, need not care about such judgments, because they have often formed their own communities, frequently online, where they befriend others who share the same passion and insider language.

This project focuses on using web scraping, data visualization, and K-means clustering (an unsupervised method) to understand a special group of collectors: Starbucks collectors. These collectors often refer to themselves as "muggers" and are usually active on Facebook, eBay, and other online social platforms. Muggers actively exchange or purchase secondhand Starbucks products from other collectors around the world.

Web Scraping: Data Acquisition

One of the biggest online platforms for the Starbucks collector community is a user-contributed website called Fredorange. An Austrian Starbucks collector created the site as a virtual platform for muggers to share and contribute information about Starbucks products. Muggers can also use the site to set up deals. Fredorange.com has become the go-to website whenever old and new muggers want to find out about the latest Starbucks product releases or simply showcase their collections.

For this project, I wanted to build a database of Starbucks collectibles and collector profiles in order to find patterns in the supply and demand of mugs. I decided to scrape the following attributes (marked with red boxes) from the website.

Mug Profile: name, city, country, edition, # of owners/users, # of seekers, # of traders
User/Mugger Profile: username, city, country, # and percentage of mugs owned, # of mugs seeking, # of mugs trading

Using the BeautifulSoup and pandas packages in Python, I scraped the desired attributes from Fredorange.com. The following is the code for scraping user profiles. You can find the code for scraping mug profiles here.
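The author's script is not shown in this copy of the post. As a stand-in, here is a minimal sketch of how one user profile might be pulled out of a page. It makes two loud assumptions: the HTML markup and class names (`username`, `city`, `owned`, and so on) are invented for illustration and do not reflect the real Fredorange pages, and it uses Python's built-in `html.parser` rather than BeautifulSoup so that it runs without extra installs.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real Fredorange pages are not shown in the post,
# so the structure, class names, and values below are made up.
SAMPLE_HTML = """
<div class="profile">
  <span class="username">mugfan01</span>
  <span class="city">Vienna</span>
  <span class="country">Austria</span>
  <span class="owned">120</span>
  <span class="seeking">15</span>
  <span class="trading">8</span>
</div>
"""

FIELDS = ("username", "city", "country", "owned", "seeking", "trading")


class ProfileParser(HTMLParser):
    """Collect the text of every <span> whose class is one of FIELDS."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in FIELDS:
            self._current = css_class

    def handle_data(self, data):
        if self._current is not None:
            self.record[self._current] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def parse_profile(html):
    """Return one profile as a dict, with the count fields as integers."""
    parser = ProfileParser()
    parser.feed(html)
    record = parser.record
    for key in ("owned", "seeking", "trading"):
        if key in record:
            record[key] = int(record[key])
    return record


profile = parse_profile(SAMPLE_HTML)
print(profile)
```

A list of such dictionaries, one per profile page, could then be passed to `pandas.DataFrame` to build the kind of table described above.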

Data Cleaning

Data cleaning was difficult because all of the data were contributed by users. Fredorange does not offer users a fixed list of cities and countries to choose from when they sign up for an account. As a result, the city and country attributes contained many typos, alternative spellings, and names written in foreign languages. Some users also indicated only their city and left their country blank, so working out each user's country from their city was another tedious task. The code below reflects a small part of this data cleaning process.
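The author's cleaning code is not shown in this copy of the post. The sketch below only illustrates the kind of steps described: mapping foreign-language names and abbreviations to one spelling, catching simple typos with fuzzy matching, and falling back to the city when the country is blank. The lookup tables and the function name `clean_country` are assumptions made for this example, not the author's.

```python
import difflib

# Illustrative lookup tables; the entries are assumptions, not the author's lists.
CANONICAL_COUNTRIES = ["United States", "Germany", "Japan", "Canada", "Austria"]
ALIASES = {
    "usa": "United States",
    "u.s.a.": "United States",
    "deutschland": "Germany",
    "nippon": "Japan",
}
CITY_TO_COUNTRY = {"vienna": "Austria", "tokyo": "Japan", "toronto": "Canada"}


def clean_country(raw_country, raw_city=""):
    """Return a canonical country name, or None if nothing matches."""
    text = (raw_country or "").strip().lower()
    if text:
        if text in ALIASES:
            return ALIASES[text]
        # Fuzzy matching catches simple typos such as "germny".
        lowered = {c.lower(): c for c in CANONICAL_COUNTRIES}
        close = difflib.get_close_matches(text, list(lowered), n=1, cutoff=0.8)
        if close:
            return lowered[close[0]]
        return None
    # Country left blank: fall back to the city lookup.
    return CITY_TO_COUNTRY.get((raw_city or "").strip().lower())


print(clean_country("Deutschland"))
print(clean_country(" germny "))
print(clean_country("", "Vienna"))
```

Entries that return `None` still need manual review, which matches the tedious part of the process described above.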

Starbucks Secondhand Market Explorer on R Shiny

Starbucks Secondhand Market Explorer is a Shiny app I created to help non-collectors understand how Starbucks muggers communicate and to help collectors visualize the current supply and demand of secondhand Starbucks products. The app consists of four tabs. The first tab shows the geographic distribution of Starbucks products and the collector community. The second tab shows a ranking system of product values based on the K-means clustering result. The third tab shows a Scarcity vs. Popularity graph for collectible mug editions. The last tab shows a Scarcity vs. Popularity graph for popular collectible countries of origin.

Visualizing the Starbucks Collector Community

After data cleaning, it was possible to visualize the geographic distribution of Starbucks products and collectors. Unsurprisingly, the United States has the largest number of both products and collectors. To see the geographic distribution of collectors and products outside the United States, select the maps labeled "... excluding USA".

This is how the geographic distribution map of collectors outside the USA looks in the Shiny app. Beyond the United States, collecting Starbucks products is a popular activity in Canada, parts of western Europe, and parts of East Asia.

K-means Clustering on Mugs

Using the K-means clustering algorithm with the numbers of owners, seekers, and traders as inputs, I separated all mugs into 5 distinct groups. The choice of k was based on examining the within-cluster variance for different values of k. The nstart parameter was set to 100, so the algorithm was run from 100 different random starts and the solution with the lowest within-cluster variance was kept.
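To make the role of nstart concrete, here is a minimal plain-Python sketch of K-means with random restarts. It is an illustrative re-implementation, not the author's code: the function names, the toy (owners, seekers, traders) values, and the choice of k = 3 for the toy data are all assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random start.

    Returns (centers, labels, total within-cluster sum of squares).
    """
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, wss


def kmeans(points, k, nstart=100, seed=0):
    """Run kmeans_once nstart times and keep the lowest-variance result."""
    rng = random.Random(seed)
    best = None
    for _ in range(nstart):
        result = kmeans_once(points, k, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best


# Hypothetical (owners, seekers, traders) counts for nine mugs.
mugs = [
    (200, 5, 40), (210, 8, 35), (190, 6, 45),  # widely owned and traded
    (20, 90, 2), (25, 100, 1), (15, 95, 3),    # few owners, many seekers
    (3, 2, 0), (5, 1, 1), (4, 3, 0),           # little activity of any kind
]
centers, labels, wss = kmeans(mugs, k=3, nstart=100)
print(labels, round(wss, 2))
```

Running `kmeans` for several values of k and comparing the returned within-cluster sum of squares is one way to carry out the examination of different k's mentioned above.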

The K-means clustering result clearly separates the mugs into 5 groups with distinct characteristics, which I labeled as follows: Easy to Find Mugs (Purple), Medium Difficulty (Red), Hard to Get Mugs (Yellow), Very Hard to Get Mugs (Green), and Inconclusive (Blue). The screenshot below shows how the ranking system based on K-means clustering looks in the Shiny app.

Note that K-means clustering is an unsupervised algorithm, so the ranking system is not a classification model. However, the clustering result does give a useful indication of each mug's value in the secondhand market. The five categories are self-explanatory except for Inconclusive. Mugs with few traders, seekers, and owners cannot easily be labeled because there are two possible explanations: they might be new products that have not started trading yet, or they could simply be unpopular.

Visualizing Collectible Editions & Countries of Origin

Four editions and six countries were included in the last two tabs of the Shiny app because they have relatively large numbers of "high difficulty mugs". Users can select only the editions or countries of interest to visualize each mug's supply and demand in the secondhand market. Details about a mug appear when users hover over its data point.

This is how the Scarcity vs. Popularity graph looks in the Shiny app. Two new variables derived from seekers and owners help visualize the supply and demand of mugs from popular editions and countries of origin. Popularity is an index that ranges from 0 to 1, computed as Seeker/max(Seeker). Scarcity is another index that ranges from 0 to 1, computed as |Owner/max(Owner) - 1|, so the most widely owned mug scores 0. For example, mugs with high popularity and high scarcity are most likely Hard to Get Mugs. Similarly, mugs with low popularity and low scarcity are most likely Easy to Find Mugs.
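The two formulas can be sketched in a few lines of Python. The mug names and counts below are hypothetical, and the function name `popularity_and_scarcity` is an assumption made for this example. The sketch also assumes that at least one mug has a nonzero number of seekers and of owners, since otherwise the division is undefined.

```python
def popularity_and_scarcity(mugs):
    """Add the two 0-to-1 indices to each mug record.

    popularity = seekers / max(seekers)
    scarcity   = |owners / max(owners) - 1|
    """
    max_seekers = max(m["seekers"] for m in mugs)
    max_owners = max(m["owners"] for m in mugs)
    scored = []
    for m in mugs:
        scored.append({
            "name": m["name"],
            "popularity": m["seekers"] / max_seekers,
            "scarcity": abs(m["owners"] / max_owners - 1),
        })
    return scored


# Hypothetical counts, for illustration only.
sample = [
    {"name": "Mug A", "owners": 200, "seekers": 10},
    {"name": "Mug B", "owners": 20, "seekers": 100},
]
scores = popularity_and_scarcity(sample)
for row in scores:
    print(row)
```

In this toy data, Mug B has few owners and many seekers, so it lands in the high-popularity, high-scarcity corner of the graph, while Mug A lands in the opposite corner.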


Python Packages used:

BeautifulSoup
pandas

R Packages used:

shiny
dplyr
plotly
countrycode

View the complete R code here.

View Shiny Application here

About Author

Amy Tzu-Yu Chen

Amy Tzu-Yu Chen is a recent college graduate who earned her BS in Statistics with three minors in German, Japanese, and Urban/Regional Studies from University of California, Los Angeles (UCLA). As a statistician, she is deeply passionate about...
