Starbucks Collectors on Fredorange.com

Amy Tzu-Yu Chen
Posted on Aug 22, 2016

Contributed by Amy Tzu-Yu Chen. Amy is currently enrolled in the NYC Data Science Academy 12-week full-time Data Science Bootcamp, which runs from July 5th to September 23rd, 2016. This post is based on her third class project, Web Scraping (due in the 6th week of the program).

You may also explore this project via the Starbucks Secondhand Market Explorer on Shiny and the R and Python code on GitHub.

Introduction

Lots of us have hobbies that outsiders cannot easily understand. Poster collectors tirelessly hunt for vintage posters on eBay and at local antique shops. Sneaker collectors camp outside Foot Locker overnight just to get a pair of limited-edition basketball shoes. Outsiders call these addicted collectors crazy for spending so much time and money chasing incomprehensible collectibles. However, collectors need not care about outsiders' judgments because they have often formed their own communities, frequently online, where they make friends with collectors who share the same passion and insider language.

This project focuses on using web scraping, data visualization, and K-means clustering (an unsupervised method) to understand a special group of collectors: Starbucks collectors. These collectors often refer to themselves as "muggers" and are usually active on Facebook, eBay, and other online social platforms. Muggers actively exchange or purchase secondhand Starbucks products from other collectors around the world.

Web Scraping: Data Acquisition

One of the biggest online platforms for the Starbucks collector community is a user-contributed website called Fredorange. An Austrian Starbucks collector created the site as a virtual platform for muggers to share and contribute information about Starbucks products. Muggers can also use the site to set up deals. Fredorange.com has become the go-to website whenever old and new muggers want to find out about the latest Starbucks product releases or simply showcase their collections.

For this project, I wanted to build a database of Starbucks collectibles and collector profiles in order to find patterns in the supply and demand of mugs. I decided to scrape the following attributes (marked with red boxes in the screenshots) from the website.

  1. Mug Profile: name, city, country, edition, # of owners/users, # of seekers, # of traders
  2. User/Mugger Profile: username, city, country, # and percentage of mugs owned, # of mugs seeking, # of mugs trading

 

Screen Shot 2016-08-21 at 21.06.43

Screen Shot 2016-08-21 at 21.11.54

Using the BeautifulSoup and pandas packages in Python, I scraped the desired attributes from Fredorange.com. The following is the code for scraping user profiles. You can find the code for scraping mug profiles here.
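As a rough illustration of the approach (a sketch, not the project's actual script), a user profile page could be parsed along these lines. The HTML snippet, the class names, and the function name parse_user_profile are all invented for the example, because Fredorange's real markup is not shown in this post. Only the list of attributes comes from the text above.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup: the tags and class names are made up for
# illustration and do not reflect Fredorange's real pages.
SAMPLE_HTML = """
<div class="profile">
  <h1 class="username">mugger_one</h1>
  <span class="city">Vienna</span>
  <span class="country">Austria</span>
  <span class="owned">120 (4.5%)</span>
  <span class="seeking">30</span>
  <span class="trading">12</span>
</div>
"""


def parse_user_profile(html):
    """Pull one user's attributes out of a profile page."""
    soup = BeautifulSoup(html, "html.parser")

    def text_of(css_class):
        # Return the stripped text of the first matching tag, or None.
        tag = soup.find(class_=css_class)
        return tag.get_text(strip=True) if tag else None

    # "120 (4.5%)" is split into the count (120) and the percentage (4.5).
    owned_raw = text_of("owned") or ""
    count_part, _, pct_part = owned_raw.partition("(")

    return {
        "username": text_of("username"),
        "city": text_of("city"),
        "country": text_of("country"),
        "mugs_owned": int(count_part) if count_part.strip() else None,
        "pct_owned": float(pct_part.rstrip("%)")) if pct_part else None,
        "mugs_seeking": int(text_of("seeking") or 0),
        "mugs_trading": int(text_of("trading") or 0),
    }


# One dictionary per profile page becomes one row of the user table.
users = pd.DataFrame([parse_user_profile(SAMPLE_HTML)])
```

In a real run, the HTML string would come from downloading each profile page, and the list passed to pd.DataFrame would hold one dictionary per user.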

Data Cleaning

Data cleaning was difficult because all of the data were contributed by users. Fredorange does not offer a fixed list of cities and countries when users sign up for an account, so the city and country attributes contained many typos, alternate spellings, and entries in foreign languages. Some users also entered only their city and left the country blank, so inferring each user's country from the city was another tedious task. The code below reflects a small part of this data cleaning process.
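A minimal sketch of this kind of cleanup, assuming a hand-built lookup table, might look like the following. The alias and city tables, the sample rows, and the function name clean_country are hypothetical. The real project would have needed many more entries than shown here.

```python
import pandas as pd

# Hypothetical lookup tables, kept tiny for illustration.
COUNTRY_ALIASES = {
    "usa": "United States",
    "u.s.a.": "United States",
    "united states": "United States",
    "deutschland": "Germany",
    "germany": "Germany",
    "österreich": "Austria",
    "austria": "Austria",
}
CITY_TO_COUNTRY = {
    "vienna": "Austria",
    "los angeles": "United States",
}


def clean_country(country, city):
    """Map a free-text country to one spelling, falling back to the city."""
    key = (country or "").strip().lower()
    if key in COUNTRY_ALIASES:
        return COUNTRY_ALIASES[key]
    city_key = (city or "").strip().lower()
    # Returns None when neither field can be resolved, so the row can be
    # reviewed by hand instead of being guessed.
    return CITY_TO_COUNTRY.get(city_key)


# Made-up rows showing a blank country, a foreign-language spelling,
# an abbreviation, and a typo that the tables do not cover.
users = pd.DataFrame({
    "city": ["Vienna", "Berlin", "Los Angeles", "Springfield"],
    "country": ["", " Deutschland ", "USA", "Untied States"],
})
users["country_clean"] = [
    clean_country(country, city)
    for country, city in zip(users["country"], users["city"])
]
```

The last row stays unresolved on purpose: a typo that is not in the alias table is left empty for manual review, which is where most of the tedious work described above would come from.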

Starbucks Secondhand Market Explorer on R Shiny

Starbucks Secondhand Market Explorer is a Shiny app I created so that non-collectors can understand how Starbucks muggers communicate and collectors can visualize the current supply and demand of secondhand Starbucks products. The app consists of four tabs. The first tab shows the geographic distribution of Starbucks products and the collector community. The second tab shows a ranking of product values based on the K-means clustering result. The third tab shows a Scarcity vs. Popularity graph for collectible mug editions. The last tab shows a Scarcity vs. Popularity graph for popular collectible countries of origin.

Visualizing Starbucks Collector Community

After data cleaning, it was possible to visualize the geographic distribution of Starbucks products and collectors. Unsurprisingly, the United States has the largest numbers of products and collectors. To view the geographic distributions of collectors and products outside the United States, select the maps labeled "... excluding USA".

This is how the geographic distribution map of collectors outside the USA looks in the Shiny app. Outside the United States, collecting Starbucks products is popular in Canada, parts of Western Europe, and parts of East Asia.

Screen Shot 2016-08-21 at 23.06.04

K-means Clustering on Mugs

Using the K-means clustering algorithm with the numbers of owners, seekers, and traders as inputs, I separated all mugs into 5 distinct groups. I chose k by examining the within-cluster variance for different values of k. The nstart parameter was set to 100, so the algorithm was run from 100 random starting configurations and the run with the lowest total within-cluster variance was kept.
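To make the role of nstart and the choice of k concrete, here is a small NumPy sketch. It is a re-implementation for illustration, not the code behind the app: nstart is the name R's kmeans() uses for the number of random starts, and the helper names, the synthetic data, and the seeds below are all assumptions. The three columns merely stand in for owners, seekers, and traders.

```python
import numpy as np


def assign(X, centers):
    """Return each point's nearest center and its squared distance to it."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    return labels, dists[np.arange(len(X)), labels]


def kmeans_once(X, k, rng, n_iter=100):
    """One run of Lloyd's algorithm from a random set of starting centers."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels, _ = assign(X, centers)
        # Move each center to the mean of its points; keep the old
        # center if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels, sq_dists = assign(X, centers)
    return labels, centers, sq_dists.sum()


def kmeans(X, k, nstart=100, seed=0):
    """Run nstart random starts, keep the lowest within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    runs = (kmeans_once(X, k, rng) for _ in range(nstart))
    return min(runs, key=lambda run: run[2])


# Synthetic data: three well-separated groups of 40 made-up "mugs".
data_rng = np.random.default_rng(42)
blob_centers = np.array([[5, 5, 5], [50, 10, 5], [10, 60, 30]], dtype=float)
X = np.vstack([c + data_rng.normal(scale=2.0, size=(40, 3)) for c in blob_centers])

# Within-cluster sum of squares for each candidate k. The k after which
# the curve flattens is a reasonable choice.
wss_by_k = {k: kmeans(X, k)[2] for k in range(1, 7)}
labels, centers, wss = kmeans(X, 3)
```

On this toy data the within-cluster sum of squares drops sharply up to k = 3 and only slightly afterwards, which is the kind of pattern one looks for when comparing different values of k.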

The K-means clustering result clearly separates all mugs into 5 groups with distinct characteristics, which I labeled as five categories: Easy to Find Mugs (Purple), Medium Difficulty (Red), Hard to Get Mugs (Yellow), Very Hard to Get Mugs (Green), and Inconclusive (Blue). The screenshot below shows how the ranking system based on K-means clustering looks in the Shiny app.

Note that K-means clustering is an unsupervised algorithm, so the ranking system is not a classification model. However, the clustering result does give a useful idea of the value of each mug in the secondhand market. The five categories are self-explanatory except for Inconclusive. Mugs with few traders, seekers, and owners cannot easily be labeled because there are two possible explanations: they might be new products that have not started trading yet, or they could simply be unpopular.

Screen Shot 2016-08-21 at 23.47.41

Visualizing Collectible Editions & Countries of Origin

Four editions and six countries were included in the last two tabs of the Shiny app because they have relatively large numbers of "high difficulty mugs". Users can select only the editions or countries of interest to visualize each mug's supply and demand in the secondhand market. Details about each mug appear when users hover over a data point.

This is how the Scarcity vs. Popularity graph looks in the Shiny app. Two new variables derived from seekers and owners are used to help visualize the supply and demand of mugs from popular editions and countries of origin. Popularity is an index ranging from 0 to 1, computed as Seeker/max(Seeker). Scarcity is another index ranging from 0 to 1, computed as |Owner/max(Owner)-1|, which equals 1 - Owner/max(Owner). For example, mugs with high popularity and high scarcity are most likely Hard to Get Mugs. Similarly, mugs with low popularity and low scarcity are most likely Easy to Find Mugs.
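The two indices can be computed in a couple of lines with pandas. The counts below are made up for three hypothetical mugs, and the column names Seeker and Owner are taken from the formulas above rather than from the actual dataset.

```python
import pandas as pd

# Made-up counts for three hypothetical mugs.
mugs = pd.DataFrame({
    "mug": ["A", "B", "C"],
    "Seeker": [80, 20, 5],
    "Owner": [10, 200, 400],
})

# Popularity: share of the largest seeker count, so the most sought-after
# mug scores 1.
mugs["Popularity"] = mugs["Seeker"] / mugs["Seeker"].max()

# Scarcity: |Owner/max(Owner) - 1|, so the most widely owned mug scores 0.
mugs["Scarcity"] = (mugs["Owner"] / mugs["Owner"].max() - 1).abs()
```

Mug A, with many seekers and few owners, ends up near the top right of a Scarcity vs. Popularity plot, while mug C, with few seekers and many owners, sits near the bottom left.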

Screen Shot 2016-08-22 at 00.22.23

 


Python Packages used:

  • BeautifulSoup
  • pandas

R Packages used:

  • shiny
  • dplyr
  • plotly
  • countrycode

View the complete R code here.

View the Shiny application here.

About Author

Amy Tzu-Yu Chen

Amy Tzu-Yu Chen is a recent college graduate who earned her BS in Statistics, with three minors in German, Japanese, and Urban/Regional Studies, from the University of California, Los Angeles (UCLA). As a statistician, she is deeply passionate about...

