Car Sales Report R Shiny App
Introduction
Every road carries a wide variety of car styles. Cars aren't just a means of transportation; they are also status symbols and expressions of their owners' lifestyles. Both reason and emotion drive the choice of purchase, which is why understanding who buys what kind of car, and why, is so crucial. With the power of data, car companies and marketers are in a better position to launch a best-selling model rather than regret a missed opportunity. To make the relevant data easier to access, I built an R Shiny dashboard that brings the marketing insights surrounding car purchases to light. It's based on a car sales report covering January 2022 to December 2023. My goal was to explore how gender, income, and vehicle characteristics affect the types of cars people buy, and how these patterns could inform marketing strategies for car companies and salespeople.
Data
The data comes from a Kaggle dataset called 'Car Sales Report', which covers sales from 30 different car companies. The dataset has 16 columns and 23,906 rows of observations. All rows were used, but the columns were narrowed down to company, model, body style, gender, annual income of the buyer, price of the car, transmission, color, and dealer region. One thing to note is that 79% of the buyers were men. With women making up only 21% of purchases, there were roughly 3.8 male buyers for every female buyer.
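As a quick check of that ratio, here is a minimal Python sketch (the app itself is written in R) that computes each gender's share and the male-to-female ratio from a list of buyer genders. The labels "Male" and "Female" are assumed to match the dataset's values, and the example uses the rounded 79/21 split rather than the raw rows.

```python
from collections import Counter

def gender_shares(genders):
    """Return each gender's share of purchases and the male-to-female ratio."""
    counts = Counter(genders)
    total = sum(counts.values())
    shares = {gender: n / total for gender, n in counts.items()}
    # Assumes both "Male" and "Female" labels are present in the data
    ratio = counts["Male"] / counts["Female"]
    return shares, ratio

# Rounded percentages from the text: 79 male buyers per 21 female buyers
shares, ratio = gender_shares(["Male"] * 79 + ["Female"] * 21)
print(round(ratio, 2))  # 3.76
```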
Question
This app uses dynamic filtering and comparisons to uncover buyer patterns. The question it explores is: are people in particular demographic groups more likely to buy a specific type of car or feature?
Analysis
Plotting each pair of variables across companies shows which relationships exist. The pairwise plots cover, for each company, female buyers, male buyers, automatic and manual cars, black, red, and white cars, and the average car price. Most pairs show positive linear relationships, likely because companies that sell more cars overall also sell more of each category. The exception is average price: at first glance, the plots involving it show little relationship, although a closer look may reveal a more nuanced pattern.
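To make the structure of these plots concrete, here is a minimal Python sketch (not the app's R code) that collapses sales records into one row per company, holding the counts and average price listed above. The record keys ('company', 'gender', 'transmission', 'color', 'price') and the label spellings are assumptions; each pair of columns in the result would correspond to one panel of the pairwise plots.

```python
from collections import defaultdict

COUNTED = ("female", "male", "automatic", "manual", "black", "red", "white")
COLORS = ("black", "red", "white")

def company_summary(sales):
    """Aggregate one row per company: category counts plus average price.

    Each sale is a dict with hypothetical keys 'company', 'gender',
    'transmission', 'color', and 'price'.
    """
    rows = defaultdict(lambda: dict.fromkeys(COUNTED, 0))
    price_sum = defaultdict(float)
    n_sales = defaultdict(int)
    for s in sales:
        row = rows[s["company"]]
        row[s["gender"].lower()] += 1        # e.g. "Female" -> "female"
        row[s["transmission"].lower()] += 1  # e.g. "Manual" -> "manual"
        color = s["color"].lower()
        if color in COLORS:                  # only the three plotted colors
            row[color] += 1
        price_sum[s["company"]] += s["price"]
        n_sales[s["company"]] += 1
    for company, row in rows.items():
        row["avg_price"] = price_sum[company] / n_sales[company]
    return dict(rows)

# Made-up sales records for illustration only
sales = [
    {"company": "Chevrolet", "gender": "Male", "transmission": "Manual",
     "color": "White", "price": 22000},
    {"company": "Chevrolet", "gender": "Female", "transmission": "Automatic",
     "color": "Red", "price": 26000},
    {"company": "Audi", "gender": "Male", "transmission": "Automatic",
     "color": "Black", "price": 40000},
]
summary = company_summary(sales)
print(summary["Chevrolet"]["avg_price"])  # 24000.0
```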
Company Data
The image below shows an exploration of Chevrolet cars.
Here we analyze, for each company, the spread of price for each model, the price of each body style, the gender of the buyers, and the number of cars of each color sold by model. The first interesting plot is the Cleveland dot plot for gender: the Prizm has noticeably more male and female buyers than any other Chevrolet model. The box plots of price by model then show that the Prizm is relatively inexpensive. Its median price is lower than that of most other models, so half of all Prizm sales were at or below a price that sits under the typical price of most other models. This suggests that affordability may help explain the Prizm's popularity, although the plot alone cannot show why buyers chose it. Lastly, the Prizm sells a fairly even number of each color, whereas most other models lean heavily towards one or two colors.
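The box plot comparison rests on per-model quartiles. A minimal Python sketch, assuming hypothetical 'model' and 'price' keys, computes each model's quartiles with statistics.quantiles (method="inclusive" uses the same linear interpolation as R's default quantile type 7) and then the share of other models whose median price is higher. The prices in the example are made up.

```python
from collections import defaultdict
from statistics import quantiles

def price_quartiles_by_model(sales):
    """Return (Q1, median, Q3) of price for each model with at least two sales."""
    prices = defaultdict(list)
    for s in sales:
        prices[s["model"]].append(s["price"])
    return {
        model: tuple(quantiles(p, n=4, method="inclusive"))
        for model, p in prices.items()
        if len(p) >= 2  # quantiles() needs at least two data points
    }

def share_with_higher_median(quartiles, model):
    """Fraction of the other models whose median price exceeds this model's."""
    target = quartiles[model][1]
    others = [q[1] for m, q in quartiles.items() if m != model]
    return sum(median > target for median in others) / len(others)

# Made-up prices for illustration only
sales = [
    {"model": "Prizm", "price": 18000},
    {"model": "Prizm", "price": 20000},
    {"model": "Prizm", "price": 22000},
    {"model": "Malibu", "price": 25000},
    {"model": "Malibu", "price": 30000},
    {"model": "Impala", "price": 21000},
    {"model": "Impala", "price": 35000},
]
quartiles = price_quartiles_by_model(sales)
print(quartiles["Prizm"])  # (19000.0, 20000.0, 21000.0)
print(share_with_higher_median(quartiles, "Prizm"))  # 1.0
```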
Average Annual Consumer Income vs Average Company Car Price
The image below shows the average annual income of the consumer and average price of the car by company.
The scatter plot shows a weak positive linear relationship between consumers' average annual income and the average car price for each company. The r-value of 0.22 is low: since r² ≈ 0.05, company average income accounts for only about 5% of the variation in average car price. Still, it indicates some relationship rather than none. Because the plot is dynamic, we can focus on specific companies. Here, Audi, BMW, and Buick have been deselected, so their points no longer appear on the plot.
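The r-value above is a Pearson correlation computed over one point per company. A minimal Python sketch, assuming hypothetical 'company', 'income', and 'price' keys and made-up values, first averages income and price per company and then computes r directly from its definition.

```python
from collections import defaultdict
from math import sqrt

def company_averages(sales):
    """Average buyer income and average car price per company, in sorted order."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [income_sum, price_sum, count]
    for s in sales:
        acc = sums[s["company"]]
        acc[0] += s["income"]
        acc[1] += s["price"]
        acc[2] += 1
    companies = sorted(sums)
    incomes = [sums[c][0] / sums[c][2] for c in companies]
    prices = [sums[c][1] / sums[c][2] for c in companies]
    return companies, incomes, prices

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

# Made-up records for illustration only
sales = [
    {"company": "A", "income": 50000, "price": 20000},
    {"company": "A", "income": 70000, "price": 24000},
    {"company": "B", "income": 90000, "price": 30000},
    {"company": "C", "income": 120000, "price": 28000},
]
companies, incomes, prices = company_averages(sales)
r = pearson_r(incomes, prices)
print(companies)  # ['A', 'B', 'C']
```

On Python 3.10 or later, statistics.correlation(incomes, prices) computes the same coefficient. Deselecting companies in the app corresponds to filtering the records before this step.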
Annual Consumer Income vs Car Price by Model
In the first image below, the company Chevrolet and model Prizm are featured.
In the second image below, the company Hyundai and model Sonata are featured.
In this dynamic density plot, the Hardtop and Hatchback body styles and the Aurora, Middletown, and Scottsdale dealer regions were deselected. In the first image, the annual income density plot peaks around $800,000, which means that buyers of the Chevrolet Prizm SUV in this filtered view were concentrated at annual incomes near that amount. The price density plot peaks at approximately $22,000, so sales were concentrated around that price. In the second image, the income peaks for the Hyundai Sonata fall around $100,000, far lower than for the Prizm. However, the price peaks again occur around $22,000, just like the Prizm. So even though Prizm buyers were making much more money, they did not spend any more on their car than Sonata buyers did.
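The density curves can be approximated with a Gaussian kernel density estimate. The minimal Python sketch below, which is not the app's R implementation, evaluates the estimate on a grid and returns the location of its tallest peak. The default bandwidth rule (1.06·σ·n^(-1/5)) is an assumption; R's density() defaults to the related bw.nrd0 rule, so curve shapes and peak positions may differ slightly.

```python
from math import exp, pi, sqrt
from statistics import stdev

def gaussian_kde(data, bandwidth=None):
    """Return a function that estimates the density of `data` at a point x."""
    n = len(data)
    if bandwidth is None:
        # Normal-reference rule of thumb; R's density() defaults to bw.nrd0 instead
        bandwidth = 1.06 * stdev(data) * n ** (-1 / 5)
    scale = 1 / (n * bandwidth * sqrt(2 * pi))

    def density(x):
        # Sum of Gaussian bumps centered on each observation
        return scale * sum(exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

def density_peak(data, bandwidth=None, grid_points=512):
    """Location of the tallest peak of the estimated density, searched on a grid."""
    lo, hi = min(data), max(data)
    if lo == hi:
        return lo
    f = gaussian_kde(data, bandwidth)
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]
    return max(grid, key=f)

# Made-up car prices: most cluster near $22,000, one outlier at $40,000
prices = [21000, 21500, 22000, 22000, 22500, 23000, 40000]
peak = density_peak(prices)
print(round(peak))
```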
Buyer Preferences
In the image below, the company Chevrolet and model Prizm were chosen.
The Sankey plot shows how many buyers chose each transmission type and color, broken down by dealer region and gender. We can see that 85 men bought this car. The app's filters reveal further detail. The second image narrows this dynamic Sankey plot to the Hardtop body style and the Pasco dealer region. Here, 23 men and 11 women bought this car in Pasco, 34 buyers in total. Of the 22 people who bought a manual car, 15 were male and 7 were female. Of the 12 people who bought an automatic car, 8 were male and 4 were female.
Looking at colors reveals not only that white is the most popular color for the model overall, but also how color choice differs by gender and how each color splits between automatic and manual transmissions. For example, 15 people bought white cars, 5 of which were automatic (3 males, 2 females) and 10 of which were manual (7 males, 3 females). Also, 10 people bought red cars, 5 of which were automatic (4 males, 1 female) and 5 of which were manual (2 males, 3 females). Lastly, 9 people bought black cars, 2 of which were automatic (1 male, 1 female) and 7 of which were manual (6 males, 1 female).
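The numbers in a Sankey plot are link weights: the count of buyers flowing from one stage to the next. A minimal Python sketch, assuming a stage order of dealer region, gender, transmission, then color (the app's actual order and data structures may differ), tallies those weights with collections.Counter. The records are rebuilt from the Pasco Hardtop counts reported above, which also confirms that they add up to 34 buyers.

```python
from collections import Counter

def sankey_links(records, stages):
    """Count buyers flowing between consecutive stages of a Sankey diagram.

    Keys are ((stage, value), (next_stage, next_value)) so that labels from
    different stages can never collide.
    """
    links = Counter()
    for r in records:
        for a, b in zip(stages, stages[1:]):
            links[((a, r[a]), (b, r[b]))] += 1
    return links

# Counts reported for the Chevrolet Prizm, Hardtop body style, Pasco region
reported = {
    ("Automatic", "White"): {"Male": 3, "Female": 2},
    ("Manual", "White"): {"Male": 7, "Female": 3},
    ("Automatic", "Red"): {"Male": 4, "Female": 1},
    ("Manual", "Red"): {"Male": 2, "Female": 3},
    ("Automatic", "Black"): {"Male": 1, "Female": 1},
    ("Manual", "Black"): {"Male": 6, "Female": 1},
}
records = [
    {"region": "Pasco", "gender": g, "transmission": t, "color": c}
    for (t, c), by_gender in reported.items()
    for g, k in by_gender.items()
    for _ in range(k)
]
links = sankey_links(records, ["region", "gender", "transmission", "color"])
print(len(records))  # 34
print(links[(("region", "Pasco"), ("gender", "Male"))])  # 23
print(links[(("gender", "Male"), ("transmission", "Manual"))])  # 15
```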
Totals
The images below show the total revenue and total sales for each dealer region.
These horizontal bar plots show the total for each category of a chosen variable: dealer region, annual income, transmission, gender, company, color, or body style. Here, dealer region was chosen. The top three regions rank in the same order by revenue (the amount of money brought in) and by number of sales, but the order of the bottom four changes. So while Pasco made more sales than Aurora and Greenville, its sales brought in less revenue, meaning the cars it sold were less expensive on average.
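The reasoning about Pasco follows from average price = revenue / number of sales: a region can rank higher on sales but lower on revenue only if its average sale price is lower. A minimal Python sketch, using hypothetical 'region' and 'price' keys with made-up regions and prices (not the dataset's figures), computes both totals and compares their rankings.

```python
from collections import defaultdict

def region_totals(sales):
    """Total revenue, number of sales, and average price per dealer region."""
    revenue = defaultdict(float)
    count = defaultdict(int)
    for s in sales:
        revenue[s["region"]] += s["price"]
        count[s["region"]] += 1
    average = {region: revenue[region] / count[region] for region in count}
    return dict(revenue), dict(count), average

def ranking(totals):
    """Region names ordered from largest to smallest total."""
    return sorted(totals, key=totals.get, reverse=True)

# Made-up sales: Region A sells more cars, Region B sells pricier ones
sales = (
    [{"region": "Region A", "price": 20000}] * 3
    + [{"region": "Region B", "price": 40000}] * 2
)
revenue, count, average = region_totals(sales)
print(ranking(count))    # ['Region A', 'Region B']
print(ranking(revenue))  # ['Region B', 'Region A']
print(average)           # {'Region A': 20000.0, 'Region B': 40000.0}
```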
Conclusions
- Company Analysis: Here you can analyze buyer gender, price by body style and model, and color for each company.
- Model Analysis: This is where you can compare car price with consumer income, both overall and by model, body style, and dealer region.
- Features Analysis: Here you can follow a Sankey plot for each model to see which transmission and color consumers prefer based on dealer region and gender. You can also narrow it down by body style.
- Total Revenue Analysis: This is where you can analyze how much money each category of a variable (dealer region, gender, company, annual income, color, transmission, or body style) brings in.
- Total Sales Analysis: Here you can analyze how many sales each category of a variable (dealer region, gender, company, annual income, color, transmission, or body style) accounts for.
Future Works
In the future, I would like to build a predictor that estimates which model someone is likely to purchase based on their gender, annual income, and car preferences. I would also like to create a page that compares companies side by side (combining the concepts I have under the Data slides).