Data Analysis of Used Car Listings on

Posted on May 18, 2020

Project Summary:

Cars can be a big-ticket purchase, but they are not investments. Data shows that a car’s value drops the moment it is driven off the dealership lot and keeps falling from there. Many variables factor into a car’s resale value; however, mileage and age stand out above all others.

To examine the magnitude of these two factors’ effect on a car’s resale value and rate of depreciation, I used Selenium to collect data from used car listings. To filter out high-end and sports car makes, I set a price maximum of $60,000 and limited the vehicle year to 2000 to 2020. I scraped listings in major US cities and collected the following from each: Title of listing (Year and Make), Mileage, Exterior Color, Interior Color, Transmission, and Drivetrain.
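The cleaning step implied above can be sketched as follows. This is not the author's code: the field names, the raw text formats ("$18,500", "42,130 mi.") and the assumption that a title starts with a four-digit year followed by a one-word make are all hypothetical. Only the $60,000 ceiling and the 2000 to 2020 window come from the post.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_PRICE = 60_000         # price ceiling stated in the post
YEAR_RANGE = (2000, 2020)  # model-year window stated in the post


@dataclass
class Listing:
    year: int
    make: str
    price: float
    mileage: int


def parse_title(title: str) -> Optional[Tuple[int, str]]:
    """Split a title such as '2017 Honda Civic LX' into (year, make).

    Assumes the title starts with a four-digit year and a one-word make;
    two-word makes (e.g. 'Land Rover') would need a lookup table instead.
    """
    match = re.match(r"\s*(\d{4})\s+(\S+)", title)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def build_listing(title: str, price_text: str, mileage_text: str) -> Optional[Listing]:
    """Turn raw scraped strings into a typed record, or None if the title is unusable."""
    parsed = parse_title(title)
    if parsed is None:
        return None
    year, make = parsed
    price = float(re.sub(r"[^\d.]", "", price_text))  # drop '$' and thousands separators
    mileage = int(re.sub(r"\D", "", mileage_text))    # keep digits only
    return Listing(year, make, price, mileage)


def keep(listing: Listing) -> bool:
    """Apply the price and model-year filters described in the post."""
    low, high = YEAR_RANGE
    return listing.price <= MAX_PRICE and low <= listing.year <= high


example = build_listing("2017 Honda Civic LX", "$18,500", "42,130 mi.")
print(example, keep(example))
```

Parsing and filtering are kept apart from the browser automation so they can be checked without launching Selenium.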

By examining the used car market, we can derive information on the habits of the first owner of a car and the perceptions and sensitivities of the secondhand buyer. The data is further broken down by car make, categorized into “Luxury Brand” and “Non-Luxury Brand”, and by the listing’s city, to identify differences between groups. Findings from the analysis could help car dealerships refine their marketing techniques and increase sales.


Initial Data Findings:

Looking at the data as a whole, the modes of age and mileage are around 3 years and 40,000 miles, respectively. A multiple regression of Price on Age and Mileage shows that, on average, each additional year lowers a car’s price by roughly $590 and every five thousand miles driven lowers it by roughly $645.

Price = 29,303 - 590.4 × Age - 645.3 × Mileage, where Age is in years and Mileage is in units of 5,000 miles
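To make the fitted equation concrete, here is a minimal sketch of a two-predictor least-squares fit plus a prediction helper. The post does not say which tool produced the regression, so the solver below (normal equations in plain Python) is only one possible approach. The sample rows are synthetic and noise-free, generated from the published equation itself, so recovering the coefficients only checks the method and says nothing about the real data.

```python
def fit_two_predictor_ols(ages, mileage_units, prices):
    """Ordinary least squares for price = b0 + b1*age + b2*mileage_units.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    and returns [intercept, age_coefficient, mileage_coefficient].
    """
    rows = [(1.0, a, m) for a, m in zip(ages, mileage_units)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, prices)) for i in range(3)]
    aug = [xtx[i] + [xty[i]] for i in range(3)]  # augmented 3x4 matrix
    for col in range(3):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def predict_price(age_years, miles):
    """Evaluate the equation published in the post."""
    return 29_303 - 590.4 * age_years - 645.3 * (miles / 5_000)


# Synthetic, noise-free rows (NOT the scraped data).
ages = [1, 2, 3, 4, 5, 6, 8, 10]
miles = [10_000, 25_000, 40_000, 35_000, 60_000, 50_000, 100_000, 90_000]
mileage_units = [m / 5_000 for m in miles]
prices = [predict_price(a, m) for a, m in zip(ages, miles)]

coefficients = fit_two_predictor_ols(ages, mileage_units, prices)
print([round(c, 1) for c in coefficients])
print(round(predict_price(3, 40_000), 1))
```

For a car at the modal age and mileage (3 years, 40,000 miles), the published equation gives 29,303 - 3 × 590.4 - 8 × 645.3 = $22,369.40.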

After categorizing each listing’s car make as either “Luxury Brand” or “Non-Luxury Brand”, we see the used car market is about 34% Luxury Brands and 66% Non-Luxury Brands. Splitting the data by city shows that these proportions vary from city to city, with high cost of living cities (New York City and San Francisco) having a greater proportion of Luxury Brands.
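The categorization and per-city split can be sketched as below. The set of luxury makes is hypothetical, since the post does not list which makes it counted as luxury, and the sample rows are made up purely to exercise the function.

```python
from collections import Counter, defaultdict

# Hypothetical brand list; the post does not specify its luxury makes.
LUXURY_MAKES = {"Audi", "BMW", "Lexus", "Mercedes-Benz"}


def category(make):
    """Label a make with one of the two categories used in the post."""
    return "Luxury Brand" if make in LUXURY_MAKES else "Non-Luxury Brand"


def luxury_share_by_city(listings):
    """Return {city: fraction of listings that are Luxury Brand}.

    `listings` is an iterable of (city, make) pairs.
    """
    counts = defaultdict(Counter)
    for city, make in listings:
        counts[city][category(make)] += 1
    return {
        city: tally["Luxury Brand"] / sum(tally.values())
        for city, tally in counts.items()
    }


# Made-up rows for illustration only.
sample = [
    ("New York City", "BMW"),
    ("New York City", "Honda"),
    ("Houston", "Ford"),
    ("Houston", "Toyota"),
    ("Houston", "Lexus"),
    ("Houston", "Chevrolet"),
]
shares = luxury_share_by_city(sample)
print(shares)
```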


Data on Differences Between Luxury and Non-Luxury

While the average sale price of a used Luxury Brand car is $10,000 more than that of a Non-Luxury Brand car, the average age and mileage of the two categories are similar. This suggests that new car consumers tend to behave similarly regardless of the vehicle type they purchase. For car dealerships, this implies that the optimal time to market to a new car owner is just over 4 years after their purchase, regardless of whether they own a Luxury Brand or not.

However, the depreciation rate does differ between car types. A Luxury Brand car loses $1,027 in value for each year that passes, while a Non-Luxury Brand car loses only $676. For every five thousand miles driven, a Luxury Brand car loses $716, compared to $438 for a Non-Luxury Brand car. Car dealerships can relay this information to their salespeople to help them tailor their pitch on why a potential buyer should trade in their used car now for a new one.
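As a worked example of what these coefficients imply, the sketch below adds up the expected loss for a given ownership period. It assumes the age and mileage effects are additive, as in the linear model; the function name and the 3-year, 15,000-mile scenario are illustrative choices, not figures from the post.

```python
# Per-category coefficients quoted in the post (dollars).
RATES = {
    "Luxury Brand": {"per_year": 1027, "per_5k_miles": 716},
    "Non-Luxury Brand": {"per_year": 676, "per_5k_miles": 438},
}


def expected_loss(category, years, miles):
    """Linear estimate of value lost after `years` of age and `miles` driven."""
    rate = RATES[category]
    return rate["per_year"] * years + rate["per_5k_miles"] * miles / 5_000


luxury = expected_loss("Luxury Brand", years=3, miles=15_000)
non_luxury = expected_loss("Non-Luxury Brand", years=3, miles=15_000)
print(luxury, non_luxury, luxury - non_luxury)
```

Over 3 years and 15,000 miles, the Luxury Brand estimate is 3 × 1,027 + 3 × 716 = $5,229 and the Non-Luxury Brand estimate is 3 × 676 + 3 × 438 = $3,342, a gap of $1,887.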


Data on Differences Across Major Cities

When the data is grouped by the city each listing is located in, the average mileage and age of a used car listing vary from city to city. Car owners in Seattle tend to wait longer before selling their vehicle, while owners in Miami and San Francisco sell earlier. The data also shows that a vehicle’s mileage increases the longer the owner waits to sell. Houston is an exception, however, which suggests that the average owner there drives more each year than owners in other cities (this is definitely an area for more exploration).

After running a multiple linear regression on each city’s subset of the data, we can see that residents of different major cities have different levels of sensitivity to age and mileage, as shown by the differences between coefficients. A car’s value in Boston decreases by $939 every year, while it decreases by only $160 in San Francisco. However, every five thousand miles driven depreciates a car by $833 in San Francisco but only $601 in Boston. Furthermore, vehicles in San Francisco seem to depreciate more slowly overall than anywhere else. Factors responsible for these differences may include weather conditions, road structures, and the economic status of the residents.
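One way to read the Boston and San Francisco coefficients together is to express both as a loss per year of ownership at a given annual mileage. The sketch below does that and solves for the annual mileage at which the two cities would match. It uses only the four coefficients quoted above and assumes the linear model holds; the 12,000 miles per year figure is an illustrative assumption.

```python
# (loss per year of age, loss per 5,000 miles) in dollars, as quoted in the post.
CITY_RATES = {
    "Boston": (939, 601),
    "San Francisco": (160, 833),
}


def annual_loss(city, miles_per_year):
    """Value lost in one year of ownership at the given annual mileage."""
    per_year, per_5k_miles = CITY_RATES[city]
    return per_year + per_5k_miles * miles_per_year / 5_000


def break_even_miles(city_a, city_b):
    """Annual mileage at which both cities lose the same amount per year."""
    year_a, miles_a = CITY_RATES[city_a]
    year_b, miles_b = CITY_RATES[city_b]
    return 5_000 * (year_a - year_b) / (miles_b - miles_a)


boston = annual_loss("Boston", 12_000)
san_francisco = annual_loss("San Francisco", 12_000)
threshold = break_even_miles("Boston", "San Francisco")
print(round(boston, 1), round(san_francisco, 1), round(threshold))
```

At 12,000 miles per year, the estimates are about $2,381 for Boston and $2,159 for San Francisco. Under these coefficients, San Francisco loses less per year only below roughly 16,800 miles per year; above that, its higher mileage penalty outweighs its lower age penalty.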
