Scraping Carousell.ph

Ira Villar
Posted on Nov 6, 2019

The Philippines is a beautiful country. I may have been born in the US, but this is still my country. I grew up there and it's still my home. Its beaches are vast, valleys pristine, and mountain ranges so beautiful it'll make you cry. Sadly, the same thing can't be said about Manila.

The heart of the city is full of the urban poor. There's a great class divide, and a majority of the population lives in poverty and hopelessness. This is just the standard for third world countries. Buying things at full price just isn't an option.

So what is one option to take?

Carousell.ph is the premier buy-and-sell website across Southeast Asia. It has branches in Malaysia, Singapore, Taiwan, Hong Kong, and even Australia. For my scraping project, I decided to scrape this site so I could compare prices and resale values for various items listed on it.

My original hypothesis was that the site is popular because of its collection of high-quality and affordable fare. So I took a deep dive into the website.

The code for this project can be found on my GitHub account right HERE.

I started by writing a Python script that scrapes the listing pages of each category (40 in all). I used Selenium in particular because there was a "load more" button at the bottom of each category page. Selenium is a browser automation tool: it is slower than a plain HTTP scraper, but it can click buttons as needed while scraping.
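For illustration, the click-until-done loop could be sketched roughly as follows. This is not the project's actual script: the function names, the CSS selectors passed in, and the pause length are all assumptions made for the example, and the real page structure would need to be inspected first.

```python
import time

# Selenium's By.CSS_SELECTOR constant is the plain string "css selector".
# It is spelled out here so the click loop can be exercised without a browser.
CSS_SELECTOR = "css selector"


def click_load_more(driver, button_css, max_clicks=50, pause=1.0):
    """Click the "load more" button until it disappears or max_clicks is hit.

    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = driver.find_elements(CSS_SELECTOR, button_css)
        if not buttons:
            break  # no button left: every listing has been loaded
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # give the page time to append new listings
    return clicks


def scrape_category(url, button_css, item_css):
    """Open one category page, expand it fully, and return the listing texts."""
    from selenium import webdriver  # imported lazily; needs Selenium and Chrome

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        click_load_more(driver, button_css)
        items = driver.find_elements(CSS_SELECTOR, item_css)
        return [item.text for item in items]
    finally:
        driver.quit()  # always close the browser, even on errors
```

The `max_clicks` cap is there so a page that never stops offering more results cannot keep the scraper running forever.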

 

My first step in exploratory data analysis was to compute the mean price, by category, of the thousands of results I was able to scrape. From there I noticed obvious outlier categories: Real Estate, Cars, and even Antiques. Since their inclusion made the graph less readable, I took them out and then filtered down to a smaller group of categories I could compare.
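As a rough sketch of this step, the per-category mean and the outlier exclusion could look something like the code below. The sample rows are made up purely for illustration, and the function names are hypothetical; the actual analysis may well have used a data-frame library instead.

```python
from collections import defaultdict
from statistics import mean


def mean_price_by_category(listings):
    """Return {category: mean price} for an iterable of (category, price) pairs."""
    prices = defaultdict(list)
    for category, price in listings:
        prices[category].append(price)
    return {category: mean(values) for category, values in prices.items()}


def drop_categories(means, excluded):
    """Remove the categories that would flatten the rest of the chart."""
    return {c: m for c, m in means.items() if c not in excluded}


# Made-up sample rows, for illustration only.
listings = [
    ("Real Estate", 5_000_000), ("Real Estate", 3_000_000),
    ("Video Games", 1_500), ("Video Games", 2_500),
    ("Clothes", 300), ("Clothes", 500),
]
means = mean_price_by_category(listings)
readable = drop_categories(means, excluded={"Real Estate", "Cars", "Antiques"})
print(readable)
```

Excluding whole categories by name is the bluntest option; it matches what is described above, but a log-scaled axis would be another way to keep every category visible.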

The filtered graph gives a better look at the categories. While motorbikes are more expensive than the rest, surprising runners-up included cars and business services, as well as the assistive category, which includes wheelchairs, canes, medical equipment, and such. Other categories include generally more affordable items such as clothes, health and beauty products, sports attire, and video games, among others.

 

I then looked at three categories chosen at random: health and beauty, car parts, and photography. What's interesting to note is that some of the top sellers seem to be groups or companies. This can be seen with usernames such as "facebeauty.shop" and "snycustoms". I think it's safe to assume that these users are business resellers rather than individual users.
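One way to surface such accounts is to count listings per username within a category. The sketch below assumes that the sellers of interest are the ones with the most listings, which is only one possible reading; the function name, the `some_user` account, and the counts are all invented for the example.

```python
from collections import Counter


def top_sellers(listings, category, n=3):
    """Return the n usernames with the most listings in one category."""
    counts = Counter(seller for cat, seller in listings if cat == category)
    return counts.most_common(n)


# Made-up (category, username) rows, for illustration only.
listings = (
    [("Health & Beauty", "facebeauty.shop")] * 3
    + [("Health & Beauty", "some_user")]
    + [("Car Parts", "snycustoms")] * 2
)
print(top_sellers(listings, "Health & Beauty", n=1))
```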

 

As expected, new cars are generally more expensive than used cars, although used cars make up the majority of the listings (and even include the most expensive one). I plotted price against each item's index in a scatter plot just to get a general feel for the lot and how the cars are priced, with red dots being brand-new vehicles and blue dots being secondhand and used vehicles.
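A plot of that kind could be put together along these lines. This is a sketch under assumptions: the `(price, condition)` row shape, the `"new"` and `"used"` labels, and both function names are hypothetical, and matplotlib is assumed as the plotting library.

```python
def scatter_series(cars):
    """Split (price, condition) rows into x, y, and color lists for plotting."""
    colors = {"new": "red", "used": "blue"}
    xs = list(range(len(cars)))                      # item index on the x-axis
    ys = [price for price, _ in cars]                # price on the y-axis
    cs = [colors[condition] for _, condition in cars]
    return xs, ys, cs


def plot_cars(cars):
    """Draw the scatter plot; call this only where matplotlib is installed."""
    import matplotlib.pyplot as plt  # imported lazily

    xs, ys, cs = scatter_series(cars)
    plt.scatter(xs, ys, c=cs)
    plt.xlabel("Item index")
    plt.ylabel("Price")
    plt.show()
```

Because the x-axis is just the row index, the plot shows the spread of prices and the mix of new versus used listings, not any trend over time.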

 

I then focused my analysis on mobile phones in particular. I felt this would be more robust, as the cellphone market in the Philippines is pretty big. Everyone from every demographic has a cellphone. It doesn't matter if you're class A, B, or even C. In any case, the prices seemed appropriate, but there were certain trends that I found interesting.

One 256 GB iPhone 11 cost $1,400 on Carousell, while getting one straight from the Mac store would cost only $1,200. I felt this was interesting to note, as most people would assume you'd get a better deal from a buy-and-sell website.
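To put that gap in relative terms, the markup can be computed directly from the two prices quoted above. The helper name below is just for illustration.

```python
def markup_percent(resale_price, retail_price):
    """Return how far the resale price sits above retail, in percent."""
    return (resale_price - retail_price) / retail_price * 100


# iPhone 11 (256 GB): 1,400 dollars on Carousell vs. 1,200 dollars at retail.
print(round(markup_percent(1400, 1200), 1))  # 16.7
```

So the Carousell listing was priced roughly 17% above the store price.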

An older model like the iPhone X was appropriately priced, I think because it's been out for a while and most people would already have at least a general idea of how much it costs. Finally, I found the resale value inflated again when it came to Samsung phones.

Samsung Galaxy Note10+ and Note10+ 5G phones were overpriced by a few hundred dollars as well.

While the general trend is that used items are cheaper and new items are more expensive, this isn't necessarily true, especially for higher-end items such as latest-model mobile phones and luxury vehicles, which still cost a pretty penny even when used. There are resellers who possibly get early access to products only to resell them at higher prices (as seen with the iPhone 11 example). What used to be a simple person-to-person market has changed, as discussed in the categorical analysis, with various small groups creating profiles for business-related transactions.

Future studies would benefit from point of sale and time data upon deal completion as well as more precise item assignments as opposed to user input for product titles. 

About Author

Ira Villar

Ira is currently a Data Science Fellow at the NYC Data Science Academy. He has nearly a decade of experience in film directing and production. This gives him a unique insight and perspective when it comes to data...