
What Toys Can Tell Us: Insight and Discussion

Emanuel Pizana
Posted on Oct 21, 2019

Introduction

eBay is second only to Amazon in terms of e-commerce sales volume in North America, surpassing Apple and Walmart.

While 'electronics' is the largest category in terms of sales, the 'toys' category is uniquely positioned to give insight into current consumer trends, historical appetite, and, ultimately, the strength of a brand. This information has implications for both the individual and the institution.

Propagation of names such as 'Iron Man' and 'Thanos' has facilitated a transition from obscure references to near-household status. While quantifying this transition is beyond the scope of this project, it has led to the main question: Is there a way to track this propagation that is neatly encapsulated by a consumer product? There is, and the answer is toys. Thus, let's begin by asking additional questions: How much does a franchise matter to a brand? What are the spending habits of the toy shopper?

In our exploration, we will specifically take a look at the Action Figure Category.

Approach and Challenges

In theory, the questions are straightforward, but in practice, retrieving sold listings proved to be a challenge. On any given day, the completed items in eBay's Action Figure category number over 1 million (reflecting several weeks' worth of data, and an ideal starting sample size). At 100 results per page, this implies over 10,000 pages of completed listings, but retrieving all of them proved elusive.

The data was scraped using Python's scrapy package. The first crawl returned only a little over 8,000 listings. Upon further examination, scrapy's response log indicated that only roughly 160 pages were scraped at 50 results per page. Tweaking settings and making several adjustments to the code led to only marginal improvement; a second scrape produced only 180 pages.
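
To put the shortfall in perspective, the figures quoted above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the variable names are illustrative, and "over 1 million" is rounded down to exactly 1,000,000.

```python
import math

# Figures quoted in the text; "over 1 million" is rounded down here.
total_listings = 1_000_000
results_per_page = 100

# Pages needed to cover every completed listing at 100 results per page.
expected_pages = math.ceil(total_listings / results_per_page)

# What the first crawl returned, according to scrapy's response log.
pages_scraped = 160
results_per_scraped_page = 50
listings_scraped = pages_scraped * results_per_scraped_page

# Share of the completed listings that the first crawl captured.
coverage = listings_scraped / total_listings

print(expected_pages, listings_scraped, f"{coverage:.1%}")
```

Under these rounded figures, the first crawl covers well under 1% of the completed listings, which is why the sample is treated as roughly two days of sales rather than several weeks.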

Thus, the first takeaway for improvement is readily apparent: either devise a way to force eBay's servers to return the 1M+ results, or run the crawler every night, preferably over at least a 30-day interval, merging each run so that listings are not duplicated. Nevertheless, even with only 2 days' worth of sold listings, we can start asking questions and envisioning how the answers can be deduced. Total sales over the 2-day period were $625,642.
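
A nightly merge of this kind could look roughly like the sketch below. It is not the code used for this project: it assumes each scraped listing is a dictionary carrying a unique identifier under a hypothetical `item_id` key, and the records shown are placeholders rather than scraped data.

```python
def merge_crawls(existing, new_batch, key="item_id"):
    """Merge a new crawl into the running dataset, keeping one row per listing.

    `key` names the field that uniquely identifies a listing; "item_id" is a
    hypothetical field name chosen for this sketch.
    """
    merged = {row[key]: row for row in existing}
    for row in new_batch:
        # If a listing reappears in a later crawl, the later copy wins.
        merged[row[key]] = row
    return list(merged.values())


# Placeholder records, not real listings.
night_1 = [{"item_id": "A1", "price": 12.50}, {"item_id": "B2", "price": 40.00}]
night_2 = [{"item_id": "B2", "price": 40.00}, {"item_id": "C3", "price": 7.99}]

combined = merge_crawls(night_1, night_2)
print(len(combined))
```

Keying on a listing identifier rather than on the title matters here, because many sellers reuse near-identical titles for different items.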

The second major challenge was the user-populated 'Item Specifics' box. There are upwards of 22 unique fields that the seller can populate in this box, but as nearly every field is optional, the information varied widely from listing to listing. 

 

Analysis

Key to the analysis was the 'brand' field. Luckily, blank/omitted listings comprised only a little under 2.5% of all sales. More challenging was the breadth of spelling variations provided for multiple brands. Certainly, future improvements to the project would implement increasingly complex regular expressions to correct/anticipate the user-provided data. Nevertheless, with some rigorous cleaning, the impact by brand was captured well enough to show immediate results.

Two obvious "brands" stand out: Marvel and Star Wars. These are not in fact brands but franchises/intellectual property (incidentally, both belonging to Disney). This illustrates that the user-populated data can be "lazy"; the user populates what is foremost and easy to identify. Correcting for this is highly complex, and mainly dependent on whether the seller provided the brand name in either the auction title or the description. So, for the scope of the project, I left these "brands" intact.

The illustrations below were made from a combination of the seaborn, WordCloud, and plotly packages (unfortunately the interactive nature of plotly's graphs is lost when translating to a blog post).
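
As an illustration of the kind of regular-expression cleaning described above, the sketch below maps a few spelling variants onto canonical brand names. The patterns, the `normalize_brand` helper, and the "Unknown" label for blank entries are assumptions made for this example; they are not the rules actually applied to the scraped data.

```python
import re

# Each pattern tolerates a few common misspellings or spacing variants.
# These patterns are illustrative assumptions, not the project's rules.
BRAND_PATTERNS = [
    (re.compile(r"^\s*has+bro\b", re.IGNORECASE), "Hasbro"),
    (re.compile(r"^\s*mat+el+\b", re.IGNORECASE), "Mattel"),
    (re.compile(r"^\s*hot\s*toys\b", re.IGNORECASE), "Hot Toys"),
    (re.compile(r"^\s*n\.?e\.?c\.?a\b", re.IGNORECASE), "NECA"),
]


def normalize_brand(raw):
    """Return a canonical brand name for a seller-provided 'brand' value."""
    if raw is None or not raw.strip():
        return "Unknown"  # blank or omitted field
    for pattern, canonical in BRAND_PATTERNS:
        if pattern.search(raw):
            return canonical
    return raw.strip()  # leave unrecognized brands as entered


for value in ["hasbro ", "Matel", "hottoys", "N.E.C.A.", "", "Kenner"]:
    print(repr(value), "->", normalize_brand(value))
```

An ordered list of patterns keeps the rules easy to extend as new variants turn up in later crawls.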

Findings

A cursory look at sales by brand shows a very clear trend: toy brand Hasbro is a powerhouse.

Hasbro's sales volume through 48 hours' worth of data is more than the next 3 largest brands combined, with shoppers purchasing over $150k worth of new and used toys. Of note is the defunct Kenner in 4th place, with roughly $30k, implying that collectors are driving those particular figures. This is a bit clearer when viewing sales by brand broken down by condition.

 

Together, these 20 brands constitute the large majority of the 2-day sales data. The collector segment is well represented, with used purchases driving sales in LJN, Mego, and Kenner, all either defunct or absorbed.

An inspection of the top 5 selling sub-categories shows interesting results.

Had there been a larger overall sample of data across at least 6 months, we could pose a hypothesis test with the H0 that toy sales are independent of current trends in film, television, streaming media, and other platforms. However, with such a large disparity in sales, we can infer that licensing and IP in the form of strongly supported media franchises do exceptionally well. I should note that while 'brand' is user-provided and is thus inconsistent, the sub-categories shown are mandatory fields, and so we can trust these segmentations with full confidence.

Notes

The only confusion would be how much overlap there is between "Comic Book Heroes" and "TV, Movies & Video Games." That is currently beyond the scope of the project, but the question is interesting. 

It is also noteworthy that the third-best-selling category, "Transformers & Robots," is given distinction from "Military & Adventure," a point that will be revisited shortly.

I shift to a slightly more "bidder/buyer-centric" view here. This faceted plot (from plotly) shows the average selling price, broken out by "buy-it-now" and "auction" format, classified by new/used condition. The top 5 categories span the columns, while brands populate the rows. Notice that NECA is included while the brand "Marvel" is excluded; I did this to make this plot strictly brand (i.e., manufacturer/producer)-based rather than franchise/IP-based. As such, the presence of "Unbranded" represents knock-off and unlicensed toys.
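
The aggregation behind a view like this is a grouped average. The sketch below shows one way to compute it with the standard library only; the field names (`brand`, `format`, `condition`, `price`) and the sample rows are hypothetical placeholders, not values from the scraped data.

```python
from collections import defaultdict
from statistics import mean


def average_price(listings, keys=("brand", "format", "condition")):
    """Average selling price for each combination of the grouping keys."""
    groups = defaultdict(list)
    for row in listings:
        groups[tuple(row[k] for k in keys)].append(row["price"])
    return {group: mean(prices) for group, prices in groups.items()}


# Placeholder rows with made-up prices, for illustration only.
sample = [
    {"brand": "Hasbro", "format": "auction", "condition": "used", "price": 20.0},
    {"brand": "Hasbro", "format": "auction", "condition": "used", "price": 30.0},
    {"brand": "Mattel", "format": "buy-it-now", "condition": "new", "price": 110.0},
]

averages = average_price(sample)
for group, price in sorted(averages.items()):
    print(group, round(price, 2))
```

Adding the category to `keys` would produce one average per cell of the facet grid described above.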

Some takeaways from the above: collectors drive the highest prices, with Mattel's proprietary IP, "Masters of the Universe," commanding BIN prices in excess of $100 per item on average within the Military & Adventure category. The "Transformers & Robots" category is where auction buyers have the strongest presence for the Hasbro brand in terms of average price. This implies sellers are uncertain of the value of their goods and opt for price discovery through the auction process, with bidders meeting their asks.

Bid dispersion for the top 5 categories across all brands is strongest in the TV, Movies & Video Games (TVMVG) category, but the Transformers & Robots category exhibits the most bids at the 75th percentile (slightly under 25 bids).

Competition

When we shift to looking at the top 5 categories with only the top brands, competition is less frequent but is highly centered in the Comic Book Heroes category. Again, this suggests sellers are uncertain of the value of their goods, but when viewed in conjunction with the average selling price above, toys in the Comic Book Heroes category are relatively inexpensive and mostly for new items.

An outlier from the top brands is Hot Toys, which required its own plot.

 

Focusing strictly on the high-end collectors' market, buyers are more than willing to pay on average $200 and up per item.

I examined the top brands a bit more, curious to see how the brands were faring in the top 5 categories. Again, the graph was particularly illustrative for Hasbro.

This graph was done in plotly, so unfortunately some details are lost with the static image. The vertical bars within the color segments signal demarcations between used and new sales within each category. Still, it's apparent how heavily concentrated every brand is in the TVMVG category, though again with the exception of Hasbro. 

Hasbro

Hasbro has strong diversification away from the comic book/movie-related franchises, largely due to their own proprietary IPs: GI Joe and Transformers. A WordCloud pull (shown above) from all auction titles across all brands illustrates just how strong Hasbro and its IPs/licenses are, with "Star Wars" being an extremely common string in auction titles, along with "Marvel" and Hasbro's name itself.

Restricting the WordCloud to only listings with Hasbro indicated as the brand yields similar results, with "Spider-Man" and "Optimus Prime" even showing up. Again, the takeaway is clear: Hasbro is the "best diversified" of the top brands, with a seemingly unbeatable combination of top licenses (Star Wars, Marvel) and proprietary IP (Transformers, GI Joe).
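
A word cloud is essentially a picture of token frequencies, so the same signal can be inspected numerically. The sketch below counts tokens across listing titles using only the standard library; the `title_token_counts` helper, the stopword list, and the sample titles are assumptions made for illustration and are not taken from the scraped data.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be longer.
STOPWORDS = frozenset({"the", "of", "and", "a", "in", "with"})


def title_token_counts(titles, stopwords=STOPWORDS):
    """Count lowercase tokens across titles, keeping hyphenated names whole."""
    counts = Counter()
    for title in titles:
        tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts


# Made-up titles, for illustration only.
titles = [
    "Hasbro Star Wars Black Series",
    "Star Wars Vintage Kenner",
    "Hasbro Marvel Legends Spider-Man",
]

counts = title_token_counts(titles)
print(counts.most_common(3))
```

Filtering `titles` to a single brand before counting mirrors the Hasbro-only restriction described above.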

Mattel

Mattel is the next most well-rounded brand after Hasbro, with strong support from collectors for their proprietary IP, 'Masters of the Universe', as well as contemporary DC and sports/WWE. One very significant factor to consider: eBay does not include Barbie in its Action Figure category, instead dedicating an entire section under "Dolls" to Barbie figures.

Hot Toys

Hot Toys licenses movie and TV-show-related properties to produce high-end goods. Their market, as shown by the high average ending prices, is quite niche. License/IP-heavy, they operate almost exclusively within comic book and movie-related categories, with strong support from the Star Wars license.

NECA

NECA has the same strategy as Hot Toys with IP-heavy licensing, but at the opposite end of the pricing spectrum: the average closing price for their goods is in the $50 range vs. Hot Toys' high 200s to low 500s per sale. Thus they cater to a niche market that is alienated by Hot Toys' high price point, eschewing the crushing weight of Hasbro and Mattel with their comic-book franchise licenses. Unopened figure listings sell particularly well.

Further Work and Closing Summary

The data presented is not yet indicative of any overarching conclusion, due to the extremely small sampling period: essentially only 2 full days of sales. However, once repeated sampling periods are taken, much more comprehensive analysis can follow, such as predictive pricing, correlation analysis, and hypothesis testing.

A strength of eBay data over that of Amazon/Walmart is the ability to gauge immediate consumer interest in a given brand/IP on a real-time basis; you cannot tell when someone buys a toy on Amazon/Walmart. If a scraping package could be put together to incorporate all 3 websites, I imagine the trends and insights would be very interesting indeed.

A more robust cleaning methodology would contribute to better results; however, as they stand now, they are directionally correct and are certainly within 'ball-park' range. A text-matching algorithm could be used to extract the 'franchise' from the listing title, since the franchise field is frequently omitted in the user-submitted details.

As mentioned earlier, Marvel and Star Wars were frequently populated in the 'brand' field, despite neither being a dedicated toy brand/maker. This suggests that, for long-standing IPs with media/film support, there is a customer segment that is brand-agnostic and more franchise-aware: they do not care which brand holds the license to make the franchise, only that the franchise continues to be made available for toy purchase. Strong sales of 'unbranded'/knock-off figures support this. However, for the brand, the franchise is clearly of high importance.

Case in point: Mattel has allowed their DC license to expire, and analysts postulate they will attempt to wrest control of the Star Wars and Marvel IPs from Hasbro...
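
One simple form such text matching could take is a keyword lookup against the listing title, sketched below. The keyword table and the `extract_franchise` helper are hypothetical and deliberately small; a production version would need a much larger vocabulary and probably fuzzy matching for misspellings.

```python
import re

# Hypothetical keyword table built from franchises named in this post.
FRANCHISE_KEYWORDS = {
    "Star Wars": ["star wars"],
    "Marvel": ["marvel", "iron man", "thanos", "spider-man"],
    "Transformers": ["transformers", "optimus prime"],
    "GI Joe": ["gi joe", "g.i. joe"],
    "Masters of the Universe": ["masters of the universe"],
}


def extract_franchise(title):
    """Return the first franchise whose keyword appears as a whole phrase."""
    text = title.lower()
    for franchise, keywords in FRANCHISE_KEYWORDS.items():
        for keyword in keywords:
            # Lookarounds stop a keyword from matching inside a longer word.
            pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
            if re.search(pattern, text):
                return franchise
    return None  # no known franchise found in the title


print(extract_franchise("Hasbro Star Wars figure, unopened"))
print(extract_franchise("Vintage Optimus Prime"))
print(extract_franchise("Generic robot toy"))
```

Titles that mention more than one franchise resolve to whichever appears first in the table, which is a simplification worth revisiting with real data.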

https://www.bloomberg.com/news/articles/2018-12-24/mattel-shares-drop-as-dc-comics-gives-boys-toys-to-spin-master

About Author

Emanuel Pizana

An insight-driven data product with the proper context and intuition can really create bridges between Data Science and Business. A former finance professional with over 10 years of experience, I've spent time working with both Finance and Business...
