Analysis of Used Car Listings on

Posted on May 18, 2020

Project Summary:

Cars can be a big-ticket purchase, but they are not investments. A car’s value drops the moment it is driven off the dealership lot and keeps falling from there. Many variables factor into a car’s resale value; however, mileage and age stand out above all others. To examine the magnitude of these two factors’ effect on a car’s resale value and rate of depreciation, I used Selenium to collect data from used car listings. To filter out high-end and sports car makes, a price maximum of $60,000 was set, and the model year was limited to 2000–2020. Listings in major US cities were scraped, and the data collected from each included: Title of listing (Year and Make), Mileage, Exterior Color, Interior Color, Transmission, and Drivetrain.
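The price and model-year limits described above can be sketched in a few lines of Python. This is only an illustration, not the scraper used for the project: the record layout, the `parse_title` and `keep_listing` helpers, and the assumption that every title starts with a four-digit year followed by a one-word make are all hypothetical.

```python
import re

# Assumed title layout: "<year> <make> <rest>", e.g. "2017 Toyota Camry SE".
# Two-word makes such as "Land Rover" would need extra handling.
TITLE_PATTERN = re.compile(r"^\s*(\d{4})\s+(\S+)")

MAX_PRICE = 60_000               # price ceiling stated in the post
MIN_YEAR, MAX_YEAR = 2000, 2020  # model-year window stated in the post


def parse_title(title):
    """Return (year, make) from a listing title, or None if it does not match."""
    match = TITLE_PATTERN.match(title)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def keep_listing(listing):
    """Apply the price and model-year limits to one scraped record (a dict)."""
    parsed = parse_title(listing["title"])
    if parsed is None:
        return False
    year, _make = parsed
    return listing["price"] <= MAX_PRICE and MIN_YEAR <= year <= MAX_YEAR


# Made-up records for illustration only.
sample = [
    {"title": "2017 Toyota Camry SE", "price": 18500},
    {"title": "2019 Porsche 911", "price": 98000},  # over the price ceiling
    {"title": "1998 Honda Civic", "price": 2500},   # older than model year 2000
]
kept = [row for row in sample if keep_listing(row)]
```

Keeping the limits as named constants makes it easy to rerun the analysis with a different price ceiling or year window.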

By examining the used car market, we can derive information on the habits of the first owner of a car and the perceptions and sensitivities of the secondhand buyer. The data is further broken down by car make, categorized into “Luxury Brand” and “Non-Luxury Brand”, as well as by the listing’s city, to identify differences between them. Findings from the analysis could help car dealerships refine their marketing techniques and increase sales.


Initial Findings:

Looking at the data as a whole, the most common age is around 3 years and the most common mileage is around 40,000 miles. A multiple regression of Price on Age and Mileage shows that, on average, each additional year lowers a car’s price by roughly $590 and every five thousand miles driven lowers it by roughly $645.

Price = 29,303 - 590.4 × Age - 645.3 × Mileage (in units of 5,000 miles)
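To make the regression concrete, here is a minimal sketch of an ordinary least-squares fit with two predictors, written with the standard library only. It is not the code used for the project (a library such as statsmodels or scikit-learn would be the usual choice), and the function names and the sample data are assumptions: the listings below are synthetic, generated without noise from the equation above, so the fit simply recovers the reported coefficients.

```python
def fit_two_predictor_ols(ages, mileage_units, prices):
    """Least-squares fit of price = b0 + b1*age + b2*mileage_units.

    Solves the 3x3 normal equations (X'X) b = X'y by Gaussian elimination.
    """
    rows = [(1.0, a, m) for a, m in zip(ages, mileage_units)]
    # Build the augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, prices))]
        for i in range(3)
    ]
    # Forward elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, 3):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, 4):
                aug[row][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        known = sum(aug[i][j] * coeffs[j] for j in range(i + 1, 3))
        coeffs[i] = (aug[i][3] - known) / aug[i][i]
    return coeffs  # [intercept, per-year slope, per-5,000-mile slope]


def predicted_price(age_years, miles):
    """Evaluate the equation reported in the post."""
    return 29_303 - 590.4 * age_years - 645.3 * (miles / 5_000)


# Synthetic, noise-free listings (illustration only, not the scraped data).
ages = [1, 2, 3, 5, 8, 10]
miles = [12_000, 30_000, 40_000, 45_000, 110_000, 90_000]
units = [m / 5_000 for m in miles]
prices = [predicted_price(a, m) for a, m in zip(ages, miles)]

intercept, per_year, per_5k = fit_two_predictor_ols(ages, units, prices)
typical = predicted_price(3, 40_000)  # a 3-year-old car with 40,000 miles
```

Under the reported equation, the 3-year-old, 40,000-mile car works out to 29,303 - 1,771.2 - 5,162.4 = $22,369.40.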

After categorizing each listing’s car make as either “Luxury Brand” or “Non-Luxury Brand”, we see that the used car market is about 34% Luxury Brands and 66% Non-Luxury Brands. Splitting the data by city shows that these proportions vary from city to city, with high cost of living cities (New York City and San Francisco) having a greater proportion of Luxury Brands.
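The categorization step can be sketched as follows. The post does not say which makes were counted as luxury, so the `LUXURY_MAKES` set, the helper names, and the sample rows are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical luxury list; the actual list used in the analysis is not given.
LUXURY_MAKES = {"BMW", "Lexus", "Audi"}


def brand_category(make):
    """Label a make as 'Luxury Brand' or 'Non-Luxury Brand'."""
    return "Luxury Brand" if make in LUXURY_MAKES else "Non-Luxury Brand"


def luxury_share_by_city(listings):
    """listings: iterable of (city, make). Returns city -> share of luxury listings."""
    counts = defaultdict(Counter)
    for city, make in listings:
        counts[city][brand_category(make)] += 1
    return {
        city: tally["Luxury Brand"] / sum(tally.values())
        for city, tally in counts.items()
    }


# Made-up rows for illustration only.
sample = [
    ("New York City", "BMW"),
    ("New York City", "Toyota"),
    ("Houston", "Ford"),
    ("Houston", "Honda"),
    ("Houston", "Lexus"),
    ("Houston", "Kia"),
]
shares = luxury_share_by_city(sample)
```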


Differences Between Luxury and Non-Luxury

While the average sale price of a used Luxury Brand car is $10,000 more than that of a Non-Luxury Brand car, the average age and mileage of the two categories are similar. This suggests that new car buyers tend to behave similarly regardless of the type of vehicle they purchase. For car dealerships, this implies that the optimal time to market to a new car owner is just over 4 years after their purchase, regardless of whether they own a Luxury Brand or not.

However, the depreciation rate does differ between car types. A Luxury Brand car loses $1,027 in value for each year that passes, while a Non-Luxury Brand car loses only $676. For every five thousand miles driven, a Luxury Brand car loses $716, compared to $438 for a Non-Luxury Brand car. Car dealerships can relay this information to their salespeople to help them tailor their pitch on why a potential buyer should trade in their used car now for a new one.
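As a worked example of what these rates imply, the sketch below adds up the age and mileage losses for each category. It assumes the two effects are simply additive, as in a linear regression; the `estimated_loss` helper and the 4-year, 40,000-mile scenario are chosen for illustration.

```python
# Per-year and per-5,000-mile losses in dollars, as reported in the post.
DEPRECIATION = {
    "Luxury Brand": {"per_year": 1027, "per_5k_miles": 716},
    "Non-Luxury Brand": {"per_year": 676, "per_5k_miles": 438},
}


def estimated_loss(category, years, miles):
    """Total estimated loss in dollars, assuming age and mileage effects add up."""
    rates = DEPRECIATION[category]
    return rates["per_year"] * years + rates["per_5k_miles"] * miles / 5_000


# Illustrative scenario: a car sold after 4 years and 40,000 miles.
luxury_loss = estimated_loss("Luxury Brand", 4, 40_000)          # 4,108 + 5,728
non_luxury_loss = estimated_loss("Non-Luxury Brand", 4, 40_000)  # 2,704 + 3,504
gap = luxury_loss - non_luxury_loss
```

In this scenario the Luxury Brand car loses about $9,836 and the Non-Luxury Brand car about $6,208, a gap of roughly $3,628.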


Differences Across Major Cities

When the data is grouped by the city in which each listing is located, we see that the average mileage and age of a used car listing vary from city to city. Car owners in Seattle tend to wait longer before selling their vehicle, while owners in Miami and San Francisco sell earlier. The data also shows that a vehicle’s mileage increases the longer the owner waits to sell. Houston is an exception to this pattern, which suggests that the average owner there drives more each year than owners in other cities (this is definitely an area for more exploration).
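One way to follow up on the Houston observation is to compare miles driven per year of age across cities. The sketch below is an assumption about how that could be done, not part of the original analysis; the `miles_per_year_by_city` helper and the sample numbers are made up.

```python
from collections import defaultdict


def miles_per_year_by_city(listings):
    """listings: iterable of (city, age_years, mileage).

    Returns city -> total miles divided by total years of age.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # city -> [miles, years]
    for city, age, mileage in listings:
        if age <= 0:
            continue  # skip current-year cars to avoid dividing by zero
        totals[city][0] += mileage
        totals[city][1] += age
    return {city: miles / years for city, (miles, years) in totals.items()}


# Made-up rows for illustration only.
sample = [
    ("Houston", 3, 45_000),
    ("Houston", 5, 75_000),
    ("Seattle", 4, 40_000),
    ("Seattle", 6, 60_000),
]
rates = miles_per_year_by_city(sample)
```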

After running a multiple linear regression on each city’s subset of the data, we can see that buyers in different major cities have different levels of sensitivity to age and mileage, as shown by the differences between the coefficients. A car’s value in Boston decreases by $939 every year, while in San Francisco it decreases by only $160. However, every five thousand miles driven lowers a car’s value by $833 in San Francisco but by only $601 in Boston. The coefficients also suggest that vehicles in San Francisco depreciate more slowly overall than anywhere else. Factors responsible for these differences may include weather conditions, road structures, and the economic status of the residents.
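Because the two cities trade off age sensitivity against mileage sensitivity, it helps to combine the coefficients into a single yearly figure. The sketch below does this under the assumption that the age and mileage effects add linearly; the helper names and the 10,000-miles-per-year scenario are illustrative.

```python
# (loss per year, loss per 5,000 miles) in dollars, as reported in the post.
CITY_RATES = {
    "Boston": (939, 601),
    "San Francisco": (160, 833),
}


def yearly_loss(city, miles_per_year):
    """Estimated loss per year of ownership for a given annual mileage."""
    per_year, per_5k = CITY_RATES[city]
    return per_year + per_5k * miles_per_year / 5_000


def break_even_miles(city_a, city_b):
    """Annual mileage at which both cities show the same yearly loss.

    Undefined (division by zero) if both cities have the same mileage rate.
    """
    year_a, mile_a = CITY_RATES[city_a]
    year_b, mile_b = CITY_RATES[city_b]
    return 5_000 * (year_a - year_b) / (mile_b - mile_a)


boston = yearly_loss("Boston", 10_000)                     # 939 + 2 * 601
san_francisco = yearly_loss("San Francisco", 10_000)       # 160 + 2 * 833
crossover = break_even_miles("Boston", "San Francisco")    # 5,000 * 779 / 232
```

At 10,000 miles per year the estimate is about $2,141 per year in Boston versus $1,826 in San Francisco. Under these coefficients the San Francisco car only loses value faster once annual mileage exceeds roughly 16,800 miles.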
