What Toys Can Tell Us: Insight and Discussion
The skills I demonstrated here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
eBay is second only to Amazon in terms of e-commerce sales volume in North America, surpassing Apple and Walmart.
While 'electronics' is the largest category in terms of sales, the 'toys' category is uniquely positioned to give insight into current consumer trends, historical appetite, and, ultimately, the strength of a brand. This information has implications for both the individual and the institution.
Propagation of names such as 'Iron Man' and 'Thanos' has facilitated a transition from obscure references to near-household status. While quantifying this transition is beyond the scope of this project, it has led to the main question: Is there a way to track this propagation in a way that is neatly encapsulated by a consumer product? There is, and the answer is toys. Thus, let's begin by asking additional questions: How much does a franchise matter to a brand? What are the spending habits of the toy shopper?
In our exploration, we will specifically take a look at the Action Figure Category.
Approach and Challenges
In theory, the questions are straightforward, but in practice, retrieving sold listings proved to be a challenge. On any given day, the completed items in eBay's Action Figure category number over 1 million (reflecting several weeks' worth of data, and an ideal beginning sample size). At 100 results per page, this implies over 10,000 pages of completed listings, but retrieving all of them proved elusive.
The data was scraped using Python's Scrapy package. The first crawl returned only a little over 8,000 listings. Upon further examination, Scrapy's response log indicated that only roughly 160 pages were scraped at 50 results per page. Tweaking settings and making several adjustments to the code led to only marginal improvement; a second scrape produced only 180 pages.
Thus, the first takeaway for improvement is readily apparent: either devise a way to force eBay's servers to return the 1M+ results, or run the crawler every night, preferably over at least a 30-day interval, with each iteration merged appropriately to avoid duplicating listings. Nevertheless, even with only 2 days' worth of sold listings, we can start asking questions and envisioning how the answers can be deduced. Total sales over the 2-day period were $625,642.
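The nightly merge described above could be sketched roughly as follows. This is a minimal illustration, not the code used for this project: it assumes each scraped listing is a dict carrying a unique listing identifier under a hypothetical key named item_id, and the sample records are made up.

```python
from typing import Dict, Iterable, List


def merge_crawls(crawls: Iterable[List[dict]], key: str = "item_id") -> List[dict]:
    """Merge several crawl outputs, keeping one record per listing ID."""
    merged: Dict[str, dict] = {}
    for crawl in crawls:
        for listing in crawl:
            # A later crawl overwrites an earlier copy of the same listing,
            # so each listing is counted once no matter how often it is seen.
            merged[listing[key]] = listing
    return list(merged.values())


# Made-up records for illustration only; not taken from the scraped data.
night_1 = [{"item_id": "A1", "price": 12.50}, {"item_id": "B2", "price": 40.00}]
night_2 = [{"item_id": "B2", "price": 40.00}, {"item_id": "C3", "price": 7.99}]

combined = merge_crawls([night_1, night_2])
total_sales = sum(row["price"] for row in combined)
```

Keying on the listing ID rather than on the title or price means a listing that shows up in two consecutive nightly crawls contributes to the sales total only once.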
The second major challenge was the user-populated 'Item Specifics' box. There are upwards of 22 unique fields that the seller can populate in this box, but as nearly every field is optional, the information varied widely from listing to listing.
Key to the analysis was the 'brand' field. Luckily, listings with a blank or omitted brand comprised only a little under 2.5% of all sales. More challenging was the breadth of spelling variations provided for multiple brands. Certainly, future improvements to the project would implement increasingly complex regular expressions to correct and anticipate the user-provided data. Nevertheless, with some rigorous cleaning, the impact by brand was captured well enough to view immediate results.
Two obvious "brands" stand out: Marvel and Star Wars. These are not in fact brands but franchises/intellectual property (incidentally, both belonging to Disney). This illustrates that the user-populated data can be "lazy"; the user populates what is foremost and easy to identify. Correcting for this is highly complex, and mainly dependent on whether the seller provided the brand name in either the auction title or the description. So, for the scope of the project, I left these "brands" intact.
The illustrations below were made from a combination of the seaborn, WordCloud, and plotly packages (unfortunately the interactive nature of plotly's graphs is lost when translating to a blog post).
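As a rough idea of what regex-based brand cleaning might look like, here is a small sketch. The patterns and the "Unknown" label are my own assumptions for illustration; a real pattern list would have to come from inspecting the spelling variations actually present in the data.

```python
import re

# Hypothetical patterns for a handful of brands; each maps messy spellings
# to one canonical name. Not the cleaning rules used in this project.
BRAND_PATTERNS = [
    (re.compile(r"^has+bro\b", re.IGNORECASE), "Hasbro"),
    (re.compile(r"^mat+el+\b", re.IGNORECASE), "Mattel"),
    (re.compile(r"^hot\s*toys?\b", re.IGNORECASE), "Hot Toys"),
    (re.compile(r"^n\.?e\.?c\.?a\b", re.IGNORECASE), "NECA"),
]


def normalize_brand(raw):
    """Return a canonical brand name, or 'Unknown' for blank/omitted values."""
    if raw is None or not raw.strip():
        return "Unknown"
    # Collapse runs of whitespace and trim the ends before matching.
    cleaned = re.sub(r"\s+", " ", raw.strip())
    for pattern, canonical in BRAND_PATTERNS:
        if pattern.search(cleaned):
            return canonical
    # No pattern matched: keep the seller's value, only whitespace-cleaned.
    return cleaned
```

For example, normalize_brand("HASSBRO Inc.") and normalize_brand(" hasbro ") both map to "Hasbro", while an unrecognized value such as "Kenner" passes through unchanged.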
A cursory look at sales by brand shows a very clear trend: toy brand Hasbro is a powerhouse:
Hasbro's sales volume through 48 hours’ worth of data is more than that of the next 3 largest brands combined, with shoppers purchasing over $150k worth of new and used toys. Of note is the defunct Kenner in 4th place, with roughly $30k, implying that collectors are driving those particular figures. This is a bit clearer when viewing sales by brand broken down by condition.
Together, these 20 constitute the heavy majority of the 2 day sales data. The collector segment is well-represented with used purchases driving sales in LJN, Mego, and Kenner, all either defunct or absorbed.
An inspection of the top 5 selling sub-categories shows interesting results.
Had there been a larger overall sample of data across at least 6 months, we could pose a hypothesis test with the H0 that toy sales are independent of current trends in film, television, streaming media, and other platforms. However, with such a large disparity in sales, we can infer that licensing and IP in the form of strongly supported media franchises do exceptionally well. I should note that while 'brand' is user-provided and is thus inconsistent, the sub-categories shown are mandatory fields, and so we can trust these segmentations with full confidence.
The only confusion would be how much overlap there is between "Comic Book Heroes" and "TV, Movies & Video Games." That is currently beyond the scope of the project, but the question is interesting.
It is also noteworthy that the third best-selling category, "Transformers & Robots," is given distinction from "Military & Adventure," a point that will be revisited shortly.
I shift to a slightly more "bidder/buyer-centric" view here. This facet (from plotly) shows the average selling price, broken out by "buy-it-now" and "auction" format, classified by new/used condition. The top 5 categories span the columns, while brands populate the rows. Notice that NECA is included while the brand "Marvel" is excluded; I did this to make this plot strictly brand (i.e., manufacturer/producer)-based rather than franchise/IP-based. As such, the presence of "Unbranded" represents knock-off and unlicensed toys.
Some takeaways from the above: collectors drive the highest prices, with Mattel's proprietary IP, "Masters of the Universe," commanding BIN prices in excess of $100 per item on average within the Military & Adventure category. The "Transformers & Robots" category reveals where buyers have the strongest presence in the Hasbro brand in terms of average price. This implies sellers are uncertain of the value of their goods, and elect for discovery through auction processes, with bidders also meeting their asks.
Bid dispersion for the top 5 categories across all brands is strongest in the TVMVG category, but the Transformers & Robots category exhibits the most bids at the 75th percentile (slightly under 25 bids).
When we shift to looking at the top 5 categories with only the top brands, competition is less frequent but is highly centered in the Comic Book Heroes category. Again, this suggests sellers are uncertain of the value of their goods, but when viewed in conjunction with the average selling price above, toys in the Comic Book Heroes category are relatively inexpensive and mostly for new items.
An outlier from the top brands is Hot Toys, which required its own plot.
Focusing strictly on the high-end collectors' market, buyers are more than willing to pay on average $200 and up per item.
I examined the top brands a bit more, curious to see how the brands were faring in the top 5 categories. Again, the graph was particularly illustrative for Hasbro:
This graph was done in plotly, so unfortunately some details are lost with the static image. The vertical bars within the color segments signal demarcations between used and new sales within each category. Still, it's apparent how heavily concentrated every brand is in the TVMVG category, though again with the exception of Hasbro.
Hasbro has strong diversification away from the comic book/movie-related franchises, largely due to its own proprietary IPs: GI Joe and Transformers. A WordCloud pull (shown above) from all auction titles across all brands illustrates just how strong Hasbro and its IPs/licenses are, with "Star Wars" being an extremely common string in auction titles, along with "Marvel" and Hasbro's name itself.
Restricting the WordCloud to only listings with Hasbro indicated as the brand yields similar results, with "Spider-Man" and "Optimus Prime" even showing up. Again, the takeaway is clear: Hasbro is the "best diversified" of the top brands, with a seemingly unbeatable combination of top licenses (Star Wars, Marvel) and proprietary IP (Transformers, GI Joe).
Mattel is the next most ‘well-rounded’ after Hasbro, with strong support from collectors for its proprietary IP, ‘Masters of the Universe’, as well as contemporary DC and sports/WWE. One very significant factor to consider: eBay does not include Barbie in its Action Figure Category, instead dedicating an entire section under "Dolls" for Barbie figures.
Hot Toys licenses movie and tv-show related properties to produce high-end goods. Their market, as shown by the high average ending prices, is quite niche. License/IP heavy, they operate almost exclusively within comic book and movie-related categories, with strong support from the Star Wars license.
NECA has the same strategy as Hot Toys with IP-heavy licensing, but at the opposite end of the pricing spectrum: the average closing price for its goods is in the $50 range vs. Hot Toys’ high 200s to low 500s per sale. Thus it caters to a niche market that is alienated by Hot Toys' high price point, while avoiding direct competition with the comic-book franchise licenses held by Hasbro and Mattel. Unopened figure listings sell particularly well.
Further Work and Closing Summary
The data presented cannot support any overarching conclusion, due to the extremely small sampling period: essentially only 2 full days of sales. However, once repeated sampling periods are taken, much more comprehensive analysis can follow, such as predictive pricing, correlation analysis, and hypothesis testing.
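To give a flavor of the hypothesis testing mentioned above, here is a minimal sketch of a chi-square test of independence, one common choice for testing whether two categorical variables are independent. It is only an illustration of the mechanics: the 2x2 layout is my own assumption about how the question could be framed, and the counts are placeholders, not results from the scraped data.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected) ** 2 / expected
    return statistic


# Placeholder counts, NOT from the scraped data.
# Rows: franchise has a current media release (yes / no).
# Columns: listing sold above / below the median price.
observed = [[30, 10], [20, 40]]

stat = chi_square_statistic(observed)

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841
reject_h0 = stat > CRITICAL_VALUE_DF1_ALPHA_05
```

With real data, a statistic above the critical value would be evidence against the H0 that sales are independent of current media trends. The placeholder table here only demonstrates the calculation.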
A strength of eBay data over that of Amazon/Walmart is the ability to gauge immediate consumer interest in a given brand/IP on a real-time basis; you cannot tell when someone buys a toy on Amazon/Walmart. If a scraping package could be put together to incorporate all 3 websites, I imagine the trends and insights would be very interesting indeed.
A more robust cleaning methodology would contribute to better results; however, as they stand now, they are directionally correct and are certainly within ‘ball-park’ range. A text matching algorithm could be used to extract the ‘franchise’ from the listing title, since the franchise field is frequently omitted in the user-submitted details.
As mentioned earlier, Marvel and Star Wars were frequently populated in the ‘brand’ field, despite neither being a dedicated toy brand/maker. This suggests that, for long-standing IPs with media/film support, there is a customer segment that is brand-agnostic and more franchise-aware: they do not care which brand holds the license to make the franchise, only that the franchise continues to be made available for toy purchase. Strong sales of ‘unbranded’/knock-off figures support this. However, for the brand, the franchise is clearly of high importance.
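One simple form such text matching could take is a keyword lookup against the listing title. The sketch below is an assumption of mine rather than an implemented part of the project: the keyword map is hypothetical and deliberately tiny, using only franchise and character names that appear in this post.

```python
import re

# Hypothetical keyword map: franchise -> phrases that identify it in a title.
FRANCHISE_KEYWORDS = {
    "Star Wars": ["star wars"],
    "Marvel": ["marvel", "iron man", "thanos", "spider-man", "spiderman"],
    "Transformers": ["transformers", "optimus prime"],
    "GI Joe": ["gi joe", "g.i. joe"],
    "Masters of the Universe": ["masters of the universe", "motu"],
}


def extract_franchise(title):
    """Return the first franchise whose keyword appears in the title, else None."""
    # Lowercase and collapse whitespace so "Star  Wars" still matches.
    normalized = re.sub(r"\s+", " ", title.lower())
    for franchise, keywords in FRANCHISE_KEYWORDS.items():
        for keyword in keywords:
            # Require the keyword to stand alone, not sit inside a longer word.
            pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
            if re.search(pattern, normalized):
                return franchise
    return None
```

For instance, extract_franchise("Vintage Kenner Star  Wars Boba Fett") returns "Star Wars", and a title with no known keyword returns None. A title that mentions two franchises resolves to whichever comes first in the map, which a fuller version would need to handle more carefully.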
Case in point, Mattel has allowed their DC license to expire, and analysts postulate they will attempt to wrest control of the Star Wars and Marvel IPs from Hasbro…