The Pains of Growing an e-Commerce Business: A Case Study on Etsy

Posted on Nov 21, 2016

Etsy, A Snapshot

If you are looking for that special gift for a loved one, something handmade or vintage, you might end up on Etsy. It's no surprise, then, that this once-small e-commerce site for hobbyists grew from 650,000 members in 2008 to 5 million in 2010 and 54 million in 2014. Its founder, Rob Kalin, learned to build websites almost by accident, as a way to pay his rent. He enjoyed making things himself, and that inspired him to build a site where artists like him could sell their work.

Data Gathering Through Web Scraping: Baby Carriers Category

To get a good sense of the kinds of customers and sellers on Etsy, I picked a specific product category to analyze: baby carriers. Using Scrapy, a Python web-scraping framework, I collected the following fields from the site:

  1. Product Name
  2. Product Price
  3. Product Views
  4. Seller Name
  5. Seller Rating
  6. Seller Location
  7. Seller Items

This yielded 28,143 observations with 7 features available for analysis. After pre-processing the data, I performed graphical and numerical exploratory data analysis in R. What follows are my initial findings.
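The raw scraped strings have to become numbers before any exploratory analysis is possible. The post does not describe its pre-processing, so what follows is only a minimal Python sketch that rests on assumed input formats. It assumes each listing arrives as a dict of raw strings with hypothetical keys such as "price" and "views", holding values like "$34.00" or "1,234 views". The Listing class and the to_number and clean functions are illustrative names, not the author's code.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    """Hypothetical typed record for the seven scraped fields (names are assumptions)."""
    product_name: str
    product_price: Optional[float]
    product_views: Optional[int]
    seller_name: str
    seller_rating: Optional[float]
    seller_location: str
    seller_items: Optional[int]


def to_number(raw: str) -> Optional[float]:
    """Extract the first number, ignoring currency symbols, thousands commas and words."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw or "")
    return float(match.group().replace(",", "")) if match else None


def clean(raw: dict) -> Listing:
    """Convert a dict of raw scraped strings (assumed keys) into a typed Listing."""
    views = to_number(raw.get("views", ""))
    items = to_number(raw.get("items", ""))
    return Listing(
        product_name=raw.get("name", "").strip(),
        product_price=to_number(raw.get("price", "")),
        product_views=int(views) if views is not None else None,
        seller_name=raw.get("seller", "").strip(),
        seller_rating=to_number(raw.get("rating", "")),
        seller_location=raw.get("location", "").strip(),
        seller_items=int(items) if items is not None else None,
    )
```

The functions return None for unparseable values rather than 0. That keeps a missing price or view count distinguishable from a genuine zero when rows are later dropped or imputed.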

Etsy Visitors Drawn to Lower-Priced Products

Interestingly, in the baby carrier category the most-viewed products were those priced $25-$50. I did not expect this from a marketplace known for handmade and vintage goods.
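A comparison like this amounts to grouping listings into price bands and aggregating views within each band. The sketch below is a minimal Python illustration. It assumes band edges of $25, $50 and $100, though only the $25 and $50 boundaries appear in the post. The author's exact binning and choice of statistic are not stated, so the sketch reports both total and median views. The function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import median

# Assumed band edges in USD; only $25 and $50 come from the post.
EDGES = [25, 50, 100]
LABELS = ["$25 and below", "$25-$50", "$50-$100", "over $100"]


def price_band(price: float) -> str:
    """Map a price to a band; bisect_left puts an exact edge (e.g. 25.0) in the lower band."""
    return LABELS[bisect_left(EDGES, price)]


def views_by_band(listings):
    """listings: iterable of (price, views) pairs. Returns {band: (total views, median views)}."""
    grouped = defaultdict(list)
    for price, views in listings:
        grouped[price_band(price)].append(views)
    return {band: (sum(v), median(v)) for band, v in grouped.items()}
```

Comparing medians alongside totals keeps a band from looking popular simply because it contains more listings.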


High Volume of Lower-Cost Product Inventory

Baby carrier inventory leans toward the lower end of the price range. Items priced $25 and below make up the bulk of the listings, followed by the $25-$50 range.
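One way to tabulate inventory by price range is to put each listing into a fixed-width $25 bin and count. The post does not say how the author binned prices, so this Python version is a minimal sketch. It assumes right-closed bins, so a $25.00 item counts as "$25 and below", matching the ranges named above. The function names and the default width are hypothetical.

```python
import math
from collections import Counter


def bin_label(price: float, width: int = 25) -> str:
    """Assign a price to a right-closed bin of the given width, e.g. (25, 50] -> "$25-$50"."""
    upper = max(1, math.ceil(price / width)) * width
    lower = upper - width
    return f"${upper} and below" if lower == 0 else f"${lower}-${upper}"


def inventory_by_bin(prices, width: int = 25):
    """Count listings per bin; return (label, count, share of all listings) rows, largest first."""
    counts = Counter(bin_label(p, width) for p in prices)
    total = sum(counts.values())
    return [(label, n, n / total) for label, n in counts.most_common()]
```

Equal-width bins make the shape of the inventory easy to read. Because bins are only created where listings exist, a sparse high-price tail shows up as a few small rows rather than many empty ones.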


Predictive Model To Forecast Product Demand: Linear Regression

I wanted to build a model to predict product demand from sales volume and views. However, Etsy publishes only total sales per shop, with no per-product breakdown. As a substitute, I used product views as a proxy for demand.

The linear regression diagnostics can be summarized as follows:

  • there was no significant correlation among the variables
  • the samples did not appear to be drawn from a normal distribution
  • the input variables were not independent of each other
  • the scatter plot showed no linear relationship
  • therefore, linear regression is not suitable for predicting views from feedback, price, and item count

Conclusions

  • the data shows interest in more affordable baby carriers
  • the proliferation of "handmade" baby carriers priced under $50 raises the question: are they truly handmade?
  • based on the scraped data, linear regression is not suitable for predicting product views in Etsy's baby carrier category
  • growing success can lead a business to move away from its original brand identity and values; whether that is for better or worse remains to be investigated against agreed KPIs
  • Etsy seems to have shifted from a niche to a mass-market pattern

Next Steps

  • Scrape reviews and perform text analysis
  • Product name analysis
  • Shop Inventory analysis
  • Shop location analysis
  • Product pricing recommendation for sellers
  • Shiny app

About Author

Chris Valle

Chris is a Digital Strategy Manager and Marketer who, for 10 years, has been combining her data-driven insights and customer-centric marketing strategies to grow her clients' business. Her forte is monetizing digital and mobile channels to drive international...