How Much is My Used Car Worth?

Anthony Parrillo
Posted on May 21, 2018

Maybe you've tried to sell a car, only to find that you couldn't get nearly as much money as you thought! Perhaps you have tried to buy a used car, only to find that it cost much more than you could have imagined! Or you purchased a used car for a reasonable price only to find that the car had unexpected problems that were invisible on the surface!

In this web scraping project, I set out to see which factors affect the resale price of a used car, and by how much.

To do this, I web scraped Carfax: https://www.carfax.com/.

Carfax is a website that is well known for checking the history and status of used cars, which helps protect buyers from being sold a car with problems they are unaware of.

Using Selenium, I extracted the URL of each car's webpage with the following search filters applied:

- Sold within a 50-mile radius of New York City

- Under $15,000

Once all the URLs were collected, I extracted the detailed information for each car using Scrapy. Here is a sample webpage:

The prices collected are based on dealer selling prices, not the current market value.
Due to time constraints, I collected data for 6,747 used cars. The data was stored in a CSV file:

I ran several analyses of the data that was collected. First, a scatter plot of price vs. year:
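The points in this scatter plot are jittered: each year value is nudged slightly left or right so that the dots do not stack up in vertical lines. Here is a minimal sketch of that idea. The `jitter` helper, its `width` parameter, and the sample years are my own illustrative assumptions, not the exact code behind the plot.

```python
import random


def jitter(values, width=0.3, seed=0):
    """Shift each value by a random offset drawn from [-width, +width]."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [v + rng.uniform(-width, width) for v in values]


# Hypothetical model years; the real ones would come from the scraped CSV.
years = [2012, 2012, 2012, 2015, 2015, 2016]
jittered_years = jitter(years)
```

The jittered values would be used only for the x-axis positions of the dots; the underlying year data stays unchanged.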

Boxplot of Price vs. Year:

Bar chart of Price vs. Year:

I was curious why 2015 did not follow the trend of prices decreasing as the age of the car increases. I noticed something interesting when looking at the number of listings per year.

Because many cars are leased for 3 years and then returned, the number of 3-year-old cars being sold is significantly higher! This could be one reason why the sale price for 2015 cars was higher than for other years.
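The per-year comparison described above comes down to counting listings and averaging prices for each model year. A rough sketch of that step follows. The `summarize_by_year` function and the sample rows are hypothetical, made up for illustration and not taken from the scraped data.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_year(rows):
    """Return {year: (listing_count, mean_price)} from (year, price) pairs."""
    prices = defaultdict(list)
    for year, price in rows:
        prices[year].append(price)
    return {year: (len(p), mean(p)) for year, p in sorted(prices.items())}


# Made-up rows for illustration only, not values from the scraped data.
rows = [(2014, 9000), (2015, 12000), (2015, 13000), (2015, 11000), (2016, 12500)]
summary = summarize_by_year(rows)
```

Looking at the count and the mean side by side makes it easier to spot a year where an unusually large number of listings coincides with an unusual average price.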

Another large factor in the resale price of a car is the mileage. The following is a histogram of the number of used cars sold by mileage:

The density plot:

The values for Mean and Median:
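For reference, these two summary values can be computed with Python's standard library alone. The mileage figures below are invented for illustration and are not the values from the scraped data.

```python
from statistics import mean, median

# Hypothetical odometer readings in miles, not the scraped values.
mileages = [30000, 45000, 52000, 61000, 150000]

mean_mileage = mean(mileages)
median_mileage = median(mileages)
```

A few very high-mileage cars pull the mean above the median, so comparing the two gives a quick sense of how skewed the mileage distribution is.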

A hex chart of price vs. mileage:

A density plot of price vs. mileage:

When comparing by make, note that the under-$15,000 filter could skew the data, for luxury brands in particular. With that caveat, the following shows a general view of the resale prices categorized by make:

The following is price categorized by the body style:

For those who are green, here is a breakdown of the resale price by energy source. (Note: there was only one result for "Alternative" energy source, so the data may not accurately reflect the category.)

Breakdown by Transmission:

I compared the prices of automatic and manual transmission cars with a two-sample t-test to see if the means were statistically different:

The means were statistically different! A car with an automatic transmission sells for approximately $1300 more than a car with a manual transmission.
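For readers who want to try a comparison like this, here is a minimal standard-library sketch of a two-sample t-test. It rests on several assumptions of mine: it uses Welch's unequal-variance variant (the test above may have been run differently), it approximates the p-value with a normal distribution, which is only reasonable for large samples, and the `welch_t_test` name and the prices are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Return (mean difference, t statistic, approximate two-sided p-value)."""
    diff = mean(a) - mean(b)
    # Standard error of the difference, allowing unequal variances (Welch).
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = diff / se
    # Normal approximation to the t distribution; fine only for large samples.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return diff, t, p


# Made-up prices for illustration only, not values from the scraped data.
automatic = [11200, 12500, 9800, 13100, 10900, 12000]
manual = [9900, 10400, 8700, 11800, 9500, 10100]
diff, t, p = welch_t_test(automatic, manual)
```

With only six cars per group, as in this toy input, the normal approximation is rough; with thousands of listings it is much closer to the exact t-based p-value.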

Many people are concerned with the title status of a used car. Here is the breakdown by title status:

An interesting comparison was between resale values of cars that had accidents, and cars that did not:

It is clear that No Accidents Reported had a significant impact on the resale price:

Ultimately, the two-sample t-test showed that the means of these two categories were different. The mean difference was about $1150.

Finally, based on the data, here is a breakdown of the models with the top resale value:

 

Conclusions:

1. An automatic transmission resells at about $1300 more than a manual transmission.

2. A used car without accidents reported resells at about $1150 more than a car with accidents reported.

3. Sedans resell better than other body types of used cars. (By observation, t-test not performed.)

 

These results can be useful not only for those who are interested in purchasing a used car, but also for those who are:

- Considering purchasing a new car.
- Deciding whether to buy or lease a new car.
- Deciding whether to sell a currently owned used car or keep it.

About Author

Anthony Parrillo


A passionate, intuitive problem solver using critical thinking and creative strategies with data to find meaningful insights to deliver practical, profitable results.

Leave a Comment

Anthony Parrillo June 7, 2018
Thank you for your comment and question Bernardo. Nice graph also! If the purpose of the graph is to display exact precision of year vs. price, then you are correct. My purpose in creating this "jitter" plot was to allow the viewer to more naturally see the overall upward trend between year and price. A jitter plot takes plots for the years (ordinal data) and moves them slightly to the left or right so that all the dots are not lined up vertically. Another purpose in using the jitter plot was to represent the data in a way different from the following graphs which show the data aggregated by year, which results in a more discrete visualization (e.g. boxplot, bargraph).
Bernardo Lares June 7, 2018
Thanks for sharing... quite interesting. But why, in the scatter plot of price vs. year, do you have points everywhere instead of only at the years? Shouldn't you get something like https://imgur.com/a/cM8EiWN ?
