Data Analysis Home Selling: Scraping Carousell.ph
The skills I demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.
The Philippines is a beautiful country. I may have been born in the US, but this is still my country. I grew up there and it's still my home. Its beaches are vast, valleys pristine, and mountain ranges so beautiful it'll make you cry. Sadly, the same thing can't be said about Manila.
The heart of the city is full of the urban poor. There's a great class divide, and a majority of the population lives in poverty and hopelessness. This is just the standard for third-world countries. Buying things at full price just isn't an option.
So what is one option to take?
Carousell.ph is the premier buy-and-sell website across Southeast Asia. It has branches in Malaysia, Singapore, Taiwan, Hong Kong, and even Australia. For my scraping project, I decided to scrape this site so I could compare listed prices and resale values for various items on the page.
Hypothesis
My original hypothesis is that the site is popular due to its collection of high-quality and affordable fare. So I took a deep dive into the website.
The code for this project can be found on my GitHub account here.
I started by writing a Python script that scrapes the listing pages for each category (40 in all). I used Selenium in particular because there was a "load more" button at the bottom of each category page. Selenium is a powerful, albeit slower, browser automation tool that can click buttons as needed while scraping.
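The "load more" loop can be sketched roughly as follows. This is a minimal illustration and not the project's actual script: the function name, the CSS selector, and the click limit are all assumptions. It relies on Selenium's find_elements, which returns an empty list when nothing matches, so the loop stops once the button disappears.

```python
import time

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
BY_CSS_SELECTOR = "css selector"


def click_until_exhausted(driver, button_selector, max_clicks=50, pause_seconds=1.0):
    """Click a "load more" button until it disappears or max_clicks is reached.

    `driver` is expected to be a Selenium WebDriver (or anything exposing
    find_elements(by, value)). Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        # find_elements returns [] instead of raising when nothing matches.
        buttons = driver.find_elements(BY_CSS_SELECTOR, button_selector)
        if not buttons:
            break
        buttons[0].click()
        clicks += 1
        # Give the page time to append the next batch of listings.
        time.sleep(pause_seconds)
    return clicks


# Hypothetical usage with a real browser (requires selenium and a browser driver):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get(category_url)                            # category_url is a placeholder
# click_until_exhausted(driver, "button.load-more")   # selector is a guess
```

Capping the number of clicks keeps a run from hanging on a category that never stops loading, at the cost of possibly missing older listings.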
My first step in exploratory data analysis was to compute the mean price per category across the thousands of results I was able to scrape from the page. From there I noticed obvious outlier categories: Real Estate, Cars, and even Antiques. Since their inclusion made the graph less readable, I took them out and then filtered down to a smaller group of categories I could compare.
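A plain-Python sketch of that step might look like the following. The record layout (dicts with "category" and "price" keys) and the sample rows are made up for illustration. The original analysis may well have used a different tool or structure.

```python
from collections import defaultdict
from statistics import mean


def mean_price_by_category(listings):
    """Group listing prices by category and return the mean of each group."""
    prices = defaultdict(list)
    for item in listings:
        prices[item["category"]].append(item["price"])
    return {category: mean(values) for category, values in prices.items()}


def drop_categories(averages, excluded):
    """Return a copy of the averages without the excluded categories."""
    return {cat: avg for cat, avg in averages.items() if cat not in excluded}


# Illustrative rows only; these are not scraped values.
sample = [
    {"category": "Real Estate", "price": 5_000_000},
    {"category": "Video Games", "price": 1_500},
    {"category": "Video Games", "price": 2_500},
    {"category": "Clothes", "price": 400},
]

averages = mean_price_by_category(sample)
# Remove the outlier categories named in the text so the chart stays readable.
readable = drop_categories(averages, {"Real Estate", "Cars", "Antiques"})
```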
A look at categories
This gives a better look at the categories. While motorbikes are the most expensive of this group, surprising runners-up included cars and business services, as well as the assistive category, which includes wheelchairs, canes, medical equipment, and such. The other categories hold generally more affordable items such as clothes, health and beauty products, sports attire, and video games, among others.
Looking at three random categories, I chose health and beauty as well as car parts and photography. What's interesting to note is that some of the highest sellers seem to be groups or companies. This can be seen with usernames such as "facebeauty.shop" and "snycustoms". I think it would be safe to assume that these users are business resellers rather than individual users.
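One way to surface such accounts is to count listings per seller within a category. The sketch below assumes "highest sellers" means the accounts with the most listings, and that each record carries "seller" and "category" keys. Both are assumptions, and the counts in the sample are invented.

```python
from collections import Counter


def top_sellers(listings, category, n=3):
    """Return the n sellers with the most listings in the given category."""
    counts = Counter(
        item["seller"] for item in listings if item["category"] == category
    )
    return counts.most_common(n)


# Invented rows; only the usernames come from the write-up.
sample_listings = [
    {"seller": "facebeauty.shop", "category": "Health & Beauty"},
    {"seller": "facebeauty.shop", "category": "Health & Beauty"},
    {"seller": "some_individual", "category": "Health & Beauty"},
    {"seller": "snycustoms", "category": "Car Parts"},
]

beauty_leaders = top_sellers(sample_listings, "Health & Beauty", n=1)
```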
As expected, new cars are generally more expensive than used cars, although used cars make up the majority of the listings (and include the most expensive one). I drew a scatter plot of price against item index just to get a general feel for the lot and how the cars are priced, with red dots for brand-new vehicles and blue dots for secondhand and used vehicles.
Analysis
I focused on analysis for mobile phones in particular. I felt this would be more robust as the cellphone market in the Philippines is pretty big. Everyone from every demographic has a cellphone. It doesn't matter if you're class A, B, or even C. In any case, the prices and values seemed to be appropriate but there were certain trends that I found interesting.
One 256 GB iPhone 11 cost $1,400 on Carousell, while getting one straight from the Mac store would only cost $1,200. I felt this was interesting to note, as most people would assume you'd get a better deal from a buy-and-sell website.
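To put a number on that gap, the markup can be expressed as a percentage of the retail price. The helper below is only an illustration (the function name is mine), using the two prices quoted above.

```python
def markup_percent(resale_price, retail_price):
    """Percentage by which the resale price exceeds the retail price."""
    if retail_price <= 0:
        raise ValueError("retail_price must be positive")
    return (resale_price - retail_price) / retail_price * 100


# Prices from the iPhone 11 example: $1,400 resale vs. $1,200 retail.
iphone_markup = markup_percent(1400, 1200)
print(f"{iphone_markup:.1f}%")  # prints 16.7%
```

A negative result would mean the resale listing is cheaper than retail, which is what most buyers expect from a secondhand marketplace.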
An older model like the iPhone X was appropriately priced, I think because it has been out for a while and most people already have at least a general idea of how much it costs. Finally, I found resale prices inflated again when it came to Samsung phones.
Samsung
Samsung Galaxy Note10+ and Note10+ 5G phones were overpriced by a few hundred dollars as well.
While the general trend is that used items are cheaper and new items are more expensive, it's not necessarily true, especially with higher-end items such as latest-model mobile phones and luxury vehicles that still cost a pretty penny even if used. There are resellers who possibly get early access to products only to resell them at higher prices (as seen with the iPhone examples). What used to be a simple person-to-person market has changed, as discussed in the categorical analysis, with various small groups creating profiles for business-related transactions.
Future studies would benefit from point of sale and time data upon deal completion as well as more precise item assignments as opposed to user input for product titles.