Data Analysis on Car Pricing and Its Factors
The skills I demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
People view the expected resale value of their car as a key factor in the decision to purchase a new vehicle. Car leasing rates are similarly set with the value of the vehicle at the end of the lease term as a key driver of the leasing cost. Manufacturers realize this and design vehicles and options with the expected resale value as a key influencing factor.
To optimize car design strategy and the go-to-market approach for options offered at extra cost, manufacturers need to use data to understand in more detail how their vehicles' resale value compares to competitors and how it changes over time for new vehicle model years.
To support this strategy review, this project compares the resale value of vehicles in 2021 and the changes in value between 2015 and 2021.
Methodology
The project analysis is based on two distinct datasets. One is a comprehensive overview of used car sales from 2015 auction data; the other is a Craigslist web scrape of used car data for the entire US from 2021. Each dataset includes roughly 500k line items.
To make the two datasets comparable, both were filtered down to a set of attributes available from both sources. These are listed in Diagram 1.
Diagram 1: Car feature selection
The data excludes further details impacting value, such as engine size, trim, interior, and technology options due to data limitations in either dataset. This leads to a potential bias in our dataset when comparing both years. For example, we can compare the mean price for an Audi A3, 10-15k miles driven, 2-3 years old, excellent condition, automatic transmission, in white, in Texas. That could bring to light a difference in the mean price between the years, though we cannot be certain if it is due to an actual price increase or if our sample from 2021 by chance includes a higher share of cars with a large 2-liter engine.
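The segment comparison described above can be sketched in a few lines of Python. This is a minimal illustration and not the code behind the Shiny app: the field names, the 5k-mile bands, and the age band edges are assumptions chosen to mirror the Audi A3 example.

```python
import bisect
from collections import defaultdict
from statistics import mean

MILE_STEP = 5_000                  # assumed band width, e.g. "10-15k"
AGE_EDGES = [0, 2, 4, 8, 12, 21]   # assumed age band edges in years


def mileage_bucket(odometer):
    """Return the 5k-mile band a car falls into, e.g. 12_300 -> '10-15k'."""
    low = (odometer // MILE_STEP) * 5
    return f"{low}-{low + 5}k"


def age_bucket(age_years):
    """Return the age band for a car; ages past the last edge use the last band."""
    i = bisect.bisect_right(AGE_EDGES, age_years) - 1
    i = min(max(i, 0), len(AGE_EDGES) - 2)
    return f"{AGE_EDGES[i]}-{AGE_EDGES[i + 1]}y"


def segment_key(car):
    """Build the tuple of attributes that defines 'similar' cars."""
    return (
        car["make"], car["model"],
        mileage_bucket(car["odometer"]), age_bucket(car["age"]),
        car["condition"], car["transmission"], car["color"], car["state"],
    )


def mean_price_by_segment(cars):
    """Group cars by segment key and return the mean price per segment."""
    prices = defaultdict(list)
    for car in cars:
        prices[segment_key(car)].append(car["price"])
    return {key: mean(values) for key, values in prices.items()}
```

Because engine size and trim are not part of the key, two cars in the same segment can still differ in ways that affect price, which is exactly the bias described above.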
Data Results
All resulting comparisons and conclusions need to be individually drawn for each automotive OEM, by comparing their performance vs. that of direct competitors. In order to enable this comparison, all data has been structured into an interactive Shiny app.
In this blog post, I will walk through the results both in generalized form and for the carmaker BMW as an example.
2021 review
We will first look at an overview of the landscape, and compare OEMs along various metrics for used car sales in 2021 only.
The first view shows the mean value of cars per manufacturer by vehicle age.
Graph 1: Average vehicle price depending on vehicle age per manufacturer
The depreciation of cars over up to 20 years of vehicle age varies considerably between manufacturers. There is no clear trend visible for either premium vs. mass-produced cars, or by the geography of the manufacturer.
Graph 2: Average depreciation 12-20-year-old cars vs 0-2-year-old cars per manufacturer
Looking at BMW and comparing it to its direct competitors Audi and Mercedes, we observe the same resulting price after 20 years, though BMW has a slightly lower average price than Mercedes at 0-2 years. When comparing the 0-2-year and 12-20-year age groups directly in Graph 2, BMW slightly outperforms its competitors. However, compared to brands like Toyota, the depreciation is still high.
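One way to express the comparison in Graph 2 is as the share of value lost between the two age groups. The sketch below assumes that definition (1 minus the ratio of mean prices) and hypothetical `price` and `age` fields; the exact formula used for the graph may differ.

```python
from statistics import mean


def depreciation(cars, young=(0, 2), old=(12, 20)):
    """Share of mean price lost between the young and old age bands.

    Returns None if either band has no cars or the young band averages zero.
    """
    def band_mean(low, high):
        prices = [c["price"] for c in cars if low <= c["age"] <= high]
        return mean(prices) if prices else None

    new_price = band_mean(*young)
    old_price = band_mean(*old)
    if not new_price or old_price is None:
        return None
    return 1 - old_price / new_price
```

Running this per manufacturer gives one number per brand, where a lower value means the brand holds its value better.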
In addition, we can compare manufacturers on a range of additional metrics.
Graph 3: Price development by miles driven and by color per manufacturer
The number of miles driven seems to have a roughly equal effect across manufacturers, indicating that BMW is decently positioned here. Comparing by color shows that cars painted green are of particularly low value. All OEMs should carefully review whether to discontinue this color in future releases, especially given typical paint-job constraints on the total number of available colors per car.
2015 to 2021 Data Comparison
To compare car pricing in 2015 and 2021, we need to compare cars at a detailed level. We will use the same breakdown shown in Diagram 1 and compare equivalent cars to one another.
Due to the detailed level of comparison, a lot of the available cars from both datasets do not have direct counterparts in the other dataset. Table 1 illustrates how many of the total ~500k used car sales in 2021 can be compared to similar cars in 2015, depending on the level of detail we compare them at.
Table 1: Impact of higher accuracy for comparison on total population size
To remove as much bias as possible from the comparison, we will use the most detailed level and compare a total of 15k cars from 2021.
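The trade-off between detail and sample size can be illustrated with a small sketch that counts, for each level of detail, how many 2021 cars have at least one counterpart in 2015. The level names and attribute groupings below are assumptions for illustration and are not the exact rows of Table 1.

```python
# Hypothetical levels of detail, from coarse to fine.
LEVELS = {
    "make_model": ("make", "model"),
    "plus_age_miles": ("make", "model", "age_band", "miles_band"),
    "all_attributes": ("make", "model", "age_band", "miles_band",
                       "condition", "transmission", "color", "state"),
}


def matchable_counts(cars_2021, cars_2015, levels=LEVELS):
    """Count 2021 cars whose key also appears in 2015, per level of detail."""
    counts = {}
    for name, fields in levels.items():
        keys_2015 = {tuple(car[f] for f in fields) for car in cars_2015}
        counts[name] = sum(
            tuple(car[f] for f in fields) in keys_2015 for car in cars_2021
        )
    return counts
```

Each added attribute can only shrink the matched population, which is why the most detailed level leaves a small fraction of the original 2021 records.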
When averaging out the price delta for similar cars as defined above between 2015 and 2021 by manufacturer, we can see that prices across manufacturers have risen significantly, well above expected inflation levels.
Graph 4: Average price development of equal cars between 2015 and 2021 per manufacturer
There is no clear trend visible for either premium vs. mass-produced cars, or by the geography of the manufacturer. However, in general, manufacturers of larger vehicles and pick-up trucks are mostly positioned at the upper end of the scale. BMW, in particular, has profited less than competitors from the price increases.
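The per-manufacturer averages can be sketched as follows: compute the mean price per segment in each year, take the relative change for segments present in both years, and average those changes by manufacturer. Field names are hypothetical, the average is unweighted across segments (an assumption), and the inflation helper takes an annual rate that you would need to supply yourself.

```python
from collections import defaultdict
from statistics import mean

# Assumed segment definition; the first field must be the manufacturer.
FIELDS = ("make", "model", "age_band", "miles_band")


def segment_means(cars, fields=FIELDS):
    """Mean price per segment, keyed by the tuple of segment attributes."""
    groups = defaultdict(list)
    for car in cars:
        groups[tuple(car[f] for f in fields)].append(car["price"])
    return {key: mean(prices) for key, prices in groups.items()}


def avg_change_by_make(cars_2015, cars_2021, fields=FIELDS):
    """Average relative price change per manufacturer over shared segments."""
    means_2015 = segment_means(cars_2015, fields)
    means_2021 = segment_means(cars_2021, fields)
    changes = defaultdict(list)
    for key in means_2015.keys() & means_2021.keys():
        changes[key[0]].append(means_2021[key] / means_2015[key] - 1)
    return {make: mean(values) for make, values in changes.items()}


def cumulative_inflation(annual_rate, years):
    """Compound an assumed constant annual inflation rate over several years."""
    return (1 + annual_rate) ** years - 1
```

Comparing each manufacturer's result against `cumulative_inflation(rate, 6)` for a chosen rate shows how far the increase exceeds general price growth. Weighting segments by the number of cars instead of averaging them equally would be a reasonable alternative.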
For the next step, we will be looking in more detail into the drivers of these price increases.
There are only limited trends across manufacturers visible for age and miles driven. However, this data needs to be reviewed individually by manufacturers as a basis for their product strategy review.
Graph 5: Average price development of equal cars between 2015 and 2021 by age, odometer and model
When looking in detail at what influences BMW, we can observe that vehicles in poorer condition and bigger cars seem to drive most of the overall value increase.
Conclusion
Overall, used car prices have increased substantially across all brands and manufacturers over the past 6 years. Increases differ significantly by manufacturer, indicating that some manufacturers' product strategies have been superior to others. While no clear trend is visible for all manufacturers across detailed metrics, manufacturers can use the provided dashboard to benchmark their pricing with direct competitors as a basis for their strategic product roadmap reviews.
When reviewing results specifically for BMW we can see that BMW profited less than competitors from the industry-wide price increases over the past 6 years. BMW may want to consider price adjustments, especially for new large cars, since the used value went up so much for these models. In addition, BMW should focus the development of new cars on the longevity of functionality rather than look and feel.
Next steps
In order to remove further bias from the data and enable even more granular analyses, the data should be broken down by additional metrics, such as car trim, engine size, and additional selected options.
Results should be qualitatively compared to new vehicle release strategies and car quality reviews to identify additional reasons for price discrepancies.
Further information:
Shiny app: https://moritzbecker.shinyapps.io/used-car-pricing/
GitHub: https://github.com/Kneck12/used-car-pricing
Author LinkedIn: https://www.linkedin.com/in/moritz-becker2/
Sources
2021 used car sales data: dataset scraped from Craigslist
2015 used car sales data: dataset of used car auctions from Kaggle