Data Analysis on Car Pricing and Its Factors

Posted on Aug 16, 2021
The skills I demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.

People view the expected resale value of a car as a key factor in the decision to purchase a new vehicle. Leasing rates are likewise driven by the expected value of the vehicle at the end of the lease term. Manufacturers are aware of this and design vehicles and options with expected resale value as a key influencing factor.

To optimize their car design strategy and the go-to-market approach for options offered at extra cost, manufacturers need data to understand in more detail how their vehicles' resale value compares to that of competitors and how it changes over time for new model years.

To support this strategy review, this project compares the resale value of vehicles in 2021 and the changes in value between 2015 and 2021.


The analysis is based on two distinct datasets. One is a comprehensive overview of used car sales from 2015 auction data; the other is a Craigslist web scrape of used car data for the entire US from 2021. Each dataset includes roughly 500k line items.

To enable the comparison, both datasets were filtered to a set of attributes available in both sources. They are listed in Diagram 1.


Diagram 1: Car feature selection

The data excludes further details impacting value, such as engine size, trim, interior, and technology options due to data limitations in either dataset. This leads to a potential bias in our dataset when comparing both years. For example, we can compare the mean price for an Audi A3, 10-15k miles driven, 2-3 years old, excellent condition, automatic transmission, in white, in Texas. That could bring to light a difference in the mean price between the years, though we cannot be certain if it is due to an actual price increase or if our sample from 2021 by chance includes a higher share of cars with a large 2-liter engine.
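The segment in the example above can be expressed as a grouping key. The following is a minimal sketch, not the code used in the project: the field names (`make`, `model`, `model_year`, `odometer`, `condition`, `transmission`, `color`, `state`, `price`) and the band widths (5,000 miles, 2 years) are assumptions chosen to mirror the Audi A3 example.

```python
from statistics import mean


def segment_key(listing, sale_year):
    """Bucket one listing into a comparable segment.

    Field names and band widths are illustrative assumptions.
    """
    age = sale_year - listing["model_year"]
    return (
        listing["make"],
        listing["model"],
        (listing["odometer"] // 5000) * 5000,  # lower bound of the 5k-mile band
        (age // 2) * 2,                        # lower bound of the 2-year age band
        listing["condition"],
        listing["transmission"],
        listing["color"],
        listing["state"],
    )


def mean_price_by_segment(listings, sale_year):
    """Return {segment key: mean price} for one year's listings."""
    prices = {}
    for listing in listings:
        prices.setdefault(segment_key(listing, sale_year), []).append(listing["price"])
    return {key: mean(values) for key, values in prices.items()}
```

Because engine size and trim are not part of the key, two cars that land in the same segment can still differ in ways that affect price, which is exactly the bias described above.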


Data Results

All resulting comparisons and conclusions need to be individually drawn for each automotive OEM, by comparing their performance vs. that of direct competitors. In order to enable this comparison, all data has been structured into an interactive Shiny app.

In this blog post, I will walk through the results both in a generalized form and by reviewing the results for carmaker BMW as an example.


2021 review

We will first look at an overview of the landscape, and compare OEMs along various metrics for used car sales in 2021 only.

The first view shows the mean price of cars per manufacturer by vehicle age.

Graph 1: Average vehicle price depending on vehicle age per manufacturer

Depreciation over up to 20 years of vehicle age varies considerably between manufacturers. There is no clear trend visible for either premium vs. mass-produced cars, or by the geography of the manufacturer.

Graph 2: Average depreciation 12-20-year-old cars vs 0-2-year-old cars per manufacturer

Comparing BMW to its direct competitors Audi and Mercedes, we observe the same resulting price after 20 years, though BMW has a slightly lower average price than Mercedes in the 0-2 year bracket. When comparing the 0-2 year and 12-20 year age brackets directly in Graph 2, BMW depreciates slightly less than these competitors. However, compared to brands like Toyota, its depreciation is still high.
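One way to compute a bracket-to-bracket depreciation figure like the one in Graph 2 is sketched below. The formula (one minus the ratio of mean prices) and the input layout of `(make, age_years, price)` tuples are assumptions; the post does not state the exact definition used in the dashboard.

```python
from statistics import mean


def depreciation_by_make(listings):
    """Depreciation from the 0-2 year bracket to the 12-20 year bracket.

    listings: iterable of (make, age_years, price) tuples (assumed layout).
    Returns {make: 1 - mean_old_price / mean_new_price}.
    """
    new_prices, old_prices = {}, {}
    for make, age, price in listings:
        if 0 <= age <= 2:
            new_prices.setdefault(make, []).append(price)
        elif 12 <= age <= 20:
            old_prices.setdefault(make, []).append(price)
    # Only makes with listings in both brackets can be compared.
    return {
        make: 1 - mean(old_prices[make]) / mean(new_prices[make])
        for make in new_prices
        if make in old_prices
    }
```

A value of 0.75 would mean that 12-20-year-old cars of that make sell, on average, for a quarter of the price of 0-2-year-old ones.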

In addition, we can compare manufacturers on a range of additional metrics.

Graph 3: Price development by miles driven and by color per manufacturer

The number of miles driven seems to have a roughly equal effect across manufacturers. BMW is decently positioned here. Comparing by color shows that cars painted green are of particularly low value. All OEMs should carefully review whether to discontinue this color in future releases, especially given typical paint-job constraints on the total number of available colors per car.


2015 to 2021 Data Comparison

In order to compare car pricing in 2015 and 2021, we need to compare cars on a detailed level. We will use the same breakdown already pointed out in Diagram 1, and compare equal cars to one another.

Due to the detailed level of comparison, a lot of the available cars from both datasets do not have direct counterparts in the other dataset. Table 1 illustrates how many of the total ~500k used car sales in 2021 can be compared to similar cars in 2015, depending on the level of detail we compare them at.

Table 1: Impact of higher accuracy for comparison on total population size
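The trade-off behind Table 1 can be reproduced by counting how many 2021 listings have at least one 2015 counterpart as more attributes are added to the match. This is a minimal sketch under assumed inputs: listings are dictionaries, and the attribute names passed in `levels` are placeholders for the attributes in Diagram 1.

```python
def matched_counts(listings_2021, listings_2015, levels):
    """Count 2021 listings with a 2015 counterpart at each level of detail.

    levels: list of tuples of field names, ordered from coarse to detailed.
    Returns {fields: number of 2021 listings whose values for those
    fields also occur in at least one 2015 listing}.
    """
    counts = {}
    for fields in levels:
        keys_2015 = {tuple(item[f] for f in fields) for item in listings_2015}
        counts[fields] = sum(
            tuple(item[f] for f in fields) in keys_2015 for item in listings_2021
        )
    return counts
```

Each additional attribute can only keep or shrink the matched population, which is why the most detailed comparison covers far fewer cars than the full dataset.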

In order to remove as much bias as possible from the comparison, we will use the most detailed level and compare a total of 15k cars from 2021.

When averaging out the price delta for similar cars as defined above between 2015 and 2021 by manufacturer, we can see that prices across manufacturers have risen significantly, well above expected inflation levels.

Graph 4: Average price development of equal cars between 2015 and 2021 per manufacturer

There is no clear trend visible for either premium vs. mass-produced cars, or by the geography of the manufacturer. However, in general, manufacturers of larger vehicles and pick-up trucks are mostly positioned at the upper end of the scale. BMW, in particular, has profited less than competitors from the price increases.
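The per-manufacturer averaging behind Graph 4 could look roughly like the sketch below. It assumes the mean price per comparable segment has already been computed for each year, that the manufacturer is the first element of each segment key, and that the delta is a relative change averaged without weighting by segment size. None of these choices is confirmed by the post.

```python
from statistics import mean


def price_change_by_make(segment_means_2015, segment_means_2021):
    """Average relative price change per manufacturer over matched segments.

    Both arguments map a segment key (a tuple whose first element is the
    make) to the mean price of that segment in the given year.
    """
    changes = {}
    for key, price_2015 in segment_means_2015.items():
        price_2021 = segment_means_2021.get(key)
        if price_2021 is None or price_2015 <= 0:
            continue  # no counterpart in 2021, or unusable base price
        changes.setdefault(key[0], []).append(price_2021 / price_2015 - 1)
    return {make: mean(values) for make, values in changes.items()}
```

A result of 0.15 for a make would correspond to a 15% average increase across its matched segments, before any adjustment for inflation.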

For the next step, we will be looking in more detail into the drivers of these price increases.

There are only limited trends across manufacturers visible for age and miles driven. However, this data needs to be reviewed individually by manufacturers as a basis for their product strategy review.

Graph 5: Average price development of equal cars between 2015 and 2021 by age, odometer and model

When looking in detail at what influences BMW, we can observe that vehicles in poorer condition and bigger cars seem to drive most of the overall value increase.



Overall, used car prices have increased substantially across all brands and manufacturers over the past 6 years. Increases differ significantly by manufacturer, indicating that some manufacturers' product strategies have been superior to others. While no clear trend is visible for all manufacturers across detailed metrics, manufacturers can use the provided dashboard to benchmark their pricing against direct competitors as a basis for their strategic product roadmap reviews.

When reviewing results specifically for BMW, we can see that BMW profited less than competitors from the industry-wide price increases over the past 6 years. BMW may want to consider price adjustments, especially for new large cars, since the used value went up so much for these models. In addition, BMW should focus the development of new cars on the longevity of functionality rather than look and feel.


Next steps

In order to remove further bias from the data and enable even more granular analyses, the data should be broken down by additional metrics, such as car trim, engine size, and additional selected options.

Results should be qualitatively compared to new vehicle release strategies and car quality reviews to identify additional reasons for price discrepancies.


Further information:

Shiny app:


Author LinkedIn:



2021 used car sales data: dataset scraped from Craigslist

2015 used car sales data: dataset of used car auctions from Kaggle

About Author

Moritz Becker

Strategy Consultant, with a passion for creating impact from data-driven business insights. Originally from Germany, I have been working in the US as an Engagement Manager in Strategy Consulting for over 3 years. My projects at work focus...