Scraping Carousell.ph

Posted on Nov 6, 2019

The Philippines is a beautiful country. I may have been born in the US, but this is still my country. I grew up there and it's still my home. Its beaches are vast, its valleys pristine, and its mountain ranges so beautiful they'll make you cry. Sadly, the same can't be said about Manila.

The heart of the city is full of the urban poor. There's a great class divide, and the majority of the population lives in poverty and hopelessness. This is just the standard for third world countries. Buying things at full price just isn't an option.

So what is the alternative?

Carousell.ph is the premier buy-and-sell website across Southeast Asia, with branches in Malaysia, Singapore, Taiwan, Hong Kong, and even Australia. For my scraping project, I decided to scrape this site so I could compare prices and resale values for various items listed on it.

My original hypothesis was that the site is popular because of its collection of high-quality, affordable items. So I took a deep dive into the website.

The code for this project can be found on my GitHub account right HERE.

I started by writing a Python script that scrapes the pages of each category (40 in all). I used Selenium because there was a "load more" button at the bottom of each category page. Selenium is a powerful, albeit slower, scraping tool that can click buttons as needed while scraping.
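
A minimal sketch of that "load more" loop might look like the following. The function names, the selectors passed in, and the click limit are all assumptions for illustration; this is not the actual code from the project. The Selenium import sits inside scrape_category so the clicking helper can be tried without a browser.

```python
import time

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"


def click_until_exhausted(driver, button_selector, max_clicks=50, pause=1.0):
    """Click the 'load more' button until it disappears or max_clicks is hit.

    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = driver.find_elements(CSS, button_selector)
        if not buttons:
            break  # no button left, so every listing is already on the page
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # give the page time to append the new listings
    return clicks


def scrape_category(url, button_selector, card_selector):
    """Open one category page, expand it fully, and return each card's text."""
    from selenium import webdriver  # imported here; requires a local Chrome

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        click_until_exhausted(driver, button_selector)
        cards = driver.find_elements(CSS, card_selector)
        return [card.text for card in cards]
    finally:
        driver.quit()
```

The fixed pause is the simplest way to wait for new listings; an explicit wait on the number of cards would be more robust but also longer.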

 

My first step in exploratory data analysis was to compute the mean price per category across the thousands of listings I was able to scrape. From there I noticed obvious outlier categories: Real Estate, Cars, and even Antiques. Since including them made the graph less readable, I took them out and then filtered down to a smaller group of categories I could compare.
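
The averaging and filtering step can be sketched roughly as follows. The (category, price) tuple layout and the function names are assumptions, and the numbers in the usage note are made-up placeholders rather than scraped values.

```python
from collections import defaultdict
from statistics import mean


def mean_price_by_category(listings):
    """Group (category, price) pairs and return the mean price per category."""
    prices = defaultdict(list)
    for category, price in listings:
        prices[category].append(price)
    return {category: mean(values) for category, values in prices.items()}


def drop_categories(means, excluded):
    """Return the per-category means without the excluded outlier categories."""
    return {c: m for c, m in means.items() if c not in excluded}
```

For example, with placeholder listings [("Cars", 500000), ("Cars", 700000), ("Clothes", 300), ("Clothes", 500)], dropping {"Cars"} leaves only the Clothes mean, which keeps the remaining bars on a readable scale.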

This gives a better look at the categories. While motorbikes are more expensive, the surprising runners-up included cars and business services, as well as the assistive category, which includes wheelchairs, canes, medical equipment, and the like. Other categories hold generally more affordable items such as clothes, health and beauty products, sports attire, and video games, among others.

 

I then looked at three categories chosen at random: health and beauty, car parts, and photography. What's interesting to note is that some of the highest sellers seem to be groups or companies. This can be seen in usernames such as "facebeauty.shop" and "snycustoms". I think it is safe to assume that these users are business resellers rather than individuals.
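
One way to rank sellers within a category is sketched below. It assumes "highest sellers" means the users with the most listings, and that each listing is a (category, seller) pair; both are assumptions, and the usernames in the usage note are placeholders.

```python
from collections import Counter


def top_sellers(listings, category, n=5):
    """Return the n sellers with the most listings in the given category."""
    counts = Counter(seller for cat, seller in listings if cat == category)
    return counts.most_common(n)
```

Calling top_sellers(listings, "Photography", n=3) returns a list of (seller, listing_count) pairs, most active first, which makes shop-style accounts easy to spot.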

 

As expected, newer cars are generally more expensive than used cars, although used cars make up the majority of the listings (and include even the most expensive one). I used a scatter plot against the item index just to get a general feel for the lot and how the cars are priced, with red dots for brand new vehicles and blue dots for secondhand and used ones.
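
A plot along those lines could be produced with something like the sketch below. It assumes each car is a (price, is_new) pair and uses matplotlib, which the post does not name, so treat both as assumptions.

```python
def scatter_points(items):
    """Turn (price, is_new) pairs into (index, price, color) triples.

    Brand new items are colored red and used items blue.
    """
    return [
        (index, price, "red" if is_new else "blue")
        for index, (price, is_new) in enumerate(items)
    ]


def plot_prices(items):
    """Draw the index-versus-price scatter plot for a non-empty item list."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    xs, ys, colors = zip(*scatter_points(items))
    plt.scatter(xs, ys, c=colors)
    plt.xlabel("Item index")
    plt.ylabel("Price")
    plt.show()
```

Plotting against the index carries no meaning on the x-axis; it only spreads the points out so the price range of each group is visible.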

 

I then focused my analysis on mobile phones in particular. I felt this would be more robust, as the cellphone market in the Philippines is pretty big. Everyone from every demographic has a cellphone. It doesn't matter if you're class A, B, or even C. In any case, the prices seemed appropriate overall, but there were certain trends that I found interesting.

One 256 GB iPhone 11 cost $1,400 on Carousell, while getting one straight from the Mac store would cost only $1,200. I felt this was interesting to note, as most people would assume you'd get a better deal from a buy-and-sell website.
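
To put a number on that gap, the markup over retail can be computed as below. The function name is an assumption; the two prices are the ones quoted above.

```python
def markup(resale_price, retail_price):
    """Return the resale markup as a fraction of the retail price."""
    return (resale_price - retail_price) / retail_price


# iPhone 11 example from the text: 1,400 on Carousell versus 1,200 retail.
iphone_markup = markup(1400, 1200)
print(f"{iphone_markup:.1%}")
```

That works out to a markup of one sixth, or roughly 17% above the store price.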

An older model like the iPhone X was appropriately priced, I think because it has been out for a while and most people already have at least a general idea of how much it costs. Finally, I found resale values inflated again when it came to Samsung phones.

Samsung Galaxy Note10+ and Note10+ 5G phones were overpriced by a few hundred dollars as well.

While the general trend is that used items are cheaper and new items are more expensive, that's not necessarily true, especially with higher-end items such as latest-model mobile phones and luxury vehicles, which still cost a pretty penny even when used. There are resellers who possibly get early access to products only to resell them at higher prices (as seen with the iPhone example). What used to be a simple person-to-person market has changed, as discussed in the categorical analysis, with various small groups creating profiles for business transactions.

Future studies would benefit from point-of-sale and timing data upon deal completion, as well as more precise item categorization in place of user-entered product titles.

About Author

Ira Villar

Ira is currently a Data Science Fellow at the NYC Data Science Academy. He has nearly a decade of experience in film directing and production. This gives him a unique insight and perspective when it comes to data...
