Popularity and Price Data Evaluation of Used Cars
The skills demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
CarMax is the largest used-car retailer in the United States, with stores across the country. Thousands of cars can be found at each store, and buyers and sellers trade them every day. This study is meant to help anyone who plans to buy a used car and wants a realistic price range based on the car's mileage and model year. The price of a car with the same characteristics can vary drastically from location to location. The project can also be used to see which cars are most available at a location and which features help customers decide on a car.
Data Collection
The data was collected from the CarMax website using Scrapy and Selenium. So far, only the state of Virginia has been covered. All the sedans at each Virginia location are scraped and stored as CSV files. Scrapy was used to loop over each location and each car at that location. Selenium was used to perform the clicks: to open the used cars at each location, to filter by type (sedan), and to restrict the results to cars within a 25-mile radius. The clicks are listed below.
1. Click "Used cars at this location" for each of the 10 stores in Virginia.
2. Click on Type and then Sedan to show only sedans at the location.
3. Filter by distance and choose 25 miles, so that only cars at the store are listed.
Scrapy filters out any duplicate links to the same car.
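Scrapy drops requests it has already seen by default, so no extra code is needed in the spider. The idea behind that filtering can be sketched in plain Python as below. This is only an illustration, not Scrapy's internals: the URLs are made up, and the `normalize` rule (ignoring the query string and fragment) is an assumption about what counts as "the same car page".

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Drop the query string and fragment so the same car page compares equal.

    Assumption for this sketch: two links point at the same car when their
    scheme, host, and path match.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path.rstrip("/"), "", ""))


def unique_links(links):
    """Return links in first-seen order, skipping pages already seen."""
    seen = set()
    result = []
    for link in links:
        key = normalize(link)
        if key not in seen:
            seen.add(key)
            result.append(link)
    return result


# Hypothetical URLs, for illustration only.
links = [
    "https://example.com/car/101",
    "https://example.com/car/102",
    "https://example.com/car/101?ref=store",
]
print(unique_links(links))
```

Keeping the first occurrence and preserving order means each car is requested once, in the order it was discovered.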
Analysis
a. Most available used car in Virginia
The data shows that Honda is the most available used sedan at CarMax across Virginia. Nissan is the 4th most available make and Toyota the 7th. Model years 2012-2014 are also the most common on CarMax lots. One possible reason is that the warranty on these cars has run out, by mileage or by age.
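A ranking like this comes down to counting rows per make and per model year in the scraped CSV. The sketch below shows one way to do that with the standard library. The column names `make` and `year` are assumptions about the CSV layout, and the rows are made-up sample data, not the scraped results.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for a scraped CSV file; column names are assumed.
SAMPLE = """make,year
Honda,2013
Honda,2014
Nissan,2012
Honda,2012
Toyota,2015
Nissan,2013
"""


def count_by(rows, column):
    """Count how many listings share each value of the given column."""
    return Counter(row[column] for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
makes = count_by(rows, "make")
years = count_by(rows, "year")

# most_common() sorts from most to least available.
print(makes.most_common())
print(years.most_common())
```

With real data, `io.StringIO(SAMPLE)` would be replaced by an open file handle for each location's CSV.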
b. Most popular features which attract customers
The word cloud displays popular features that may attract customers beyond price and mileage. As it shows, "Auxiliary Audio Input" and "Cruise Control" are among the most prominent features, along with leather seats and a rear view camera.
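A word cloud sizes each term by how often it occurs, so the underlying step is a frequency count over the feature lists. The sketch below counts whole feature names, so that "Cruise Control" stays one term instead of being split into words. It assumes each listing stores its features as a single `|`-separated string, which is a hypothetical layout, and the listings are invented.

```python
from collections import Counter


def feature_frequencies(listings, sep="|"):
    """Count in how many listings each feature appears."""
    counts = Counter()
    for features in listings:
        # A set ensures a feature repeated within one listing counts once.
        counts.update({f.strip() for f in features.split(sep) if f.strip()})
    return counts


# Invented listings, for illustration only.
listings = [
    "Cruise Control|Auxiliary Audio Input|Leather Seats",
    "Cruise Control|Rear View Camera",
    "Auxiliary Audio Input|Cruise Control",
]
freq = feature_frequencies(listings)
print(freq.most_common(2))
```

A frequency mapping like `freq` is the kind of input a word cloud renderer can draw from directly.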
c. Price range of sedans in Virginia
Most used sedans at CarMax across Virginia are priced between $10K and $20K.
d. Price range for used sedan based on the year
As seen in the plot, the price range is wider for 2013 and 2016 used sedans. This may be due to more luxury cars from those years being available in the market. As luxury cars have only 4 years of warranty, 2013 cars may be traded in more often than others, resulting in a wider spread in price.
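The spread described here can be quantified per model year with the minimum, maximum, and interquartile range, which are the quantities a box plot draws. The following is a minimal sketch under stated assumptions: the input is a list of `(year, price)` pairs, the numbers are invented, and each year needs at least two listings for the quartiles to be defined.

```python
from collections import defaultdict
from statistics import quantiles


def price_spread_by_year(cars):
    """Summarize the price spread for each model year."""
    by_year = defaultdict(list)
    for year, price in cars:
        by_year[year].append(price)

    spread = {}
    for year, prices in sorted(by_year.items()):
        # Quartiles of the observed prices (needs at least two listings).
        q1, _median, q3 = quantiles(prices, n=4, method="inclusive")
        spread[year] = {
            "min": min(prices),
            "max": max(prices),
            "iqr": q3 - q1,
        }
    return spread


# Invented prices: one expensive 2013 car widens that year's range.
cars = [
    (2013, 12000), (2013, 14000), (2013, 16000), (2013, 30000),
    (2014, 13000), (2014, 14000), (2014, 15000),
]
spread = price_spread_by_year(cars)
for year, stats in spread.items():
    print(year, stats)
```

In the sample, a single high-priced listing is enough to stretch the 2013 range, which is the effect the luxury-car explanation relies on.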
e. Heatmap to display price range
The heat map displays the average price of each brand of car at each location. The year of the car, which can significantly affect the price, is not taken into account here. Even so, we can see that the average price of brands such as Audi varies across locations, for example between Gaithersburg and the other locations. In contrast, brands such as Mazda have roughly the same average price across all locations.
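Each cell of such a heat map is the mean price for one brand at one location. The sketch below builds that grid from `(make, location, price)` tuples. The store names and prices are placeholders, not values from the scraped data, and, as in the text, model year is ignored.

```python
from collections import defaultdict
from statistics import mean


def average_price_grid(cars):
    """Map make -> location -> mean price (the cells of a heat map)."""
    cells = defaultdict(lambda: defaultdict(list))
    for make, location, price in cars:
        cells[make][location].append(price)
    return {
        make: {loc: mean(prices) for loc, prices in by_loc.items()}
        for make, by_loc in cells.items()
    }


# Placeholder listings, for illustration only.
cars = [
    ("Audi", "Store A", 30000),
    ("Audi", "Store A", 34000),
    ("Audi", "Store B", 22000),
    ("Mazda", "Store A", 15000),
    ("Mazda", "Store B", 15500),
]
grid = average_price_grid(cars)
for make, by_loc in grid.items():
    print(make, by_loc)
```

Grouping on model year as well as make and location would be one way to control for the year effect noted above.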
f. Scatter plot for price vs mileage based on year of car
