Road to Victory: 2024 Presidential Election
The 2024 Presidential Election is coming up this year, so I thought it would be interesting to explore this topic for my Python EDA (Exploratory Data Analysis) project.
Running a political campaign might seem very different from running a business, but the two have a lot in common. Like any business, a political campaign has limited resources to deploy within a limited time frame. As a result, deciding which states to prioritize is crucial for campaign managers.
I approached this overarching question through two lenses. First, I explored how the electoral votes are distributed among states and which states seem to hold more "deciding power" during the election. Second, I took a deep dive into the historical election results to see how the political landscape has changed over the past few decades, which included identifying the swing states and the battleground states. Given limited resources, it makes sense for candidates to prioritize states that hold a lot of "deciding power" and present a chance to win.
For this project, I primarily focused on the Democratic Party and the Republican Party as they are the two major parties that dominate the presidential election. Last, but not least, I was interested in finding out if there is any correlation between voter demographics and the party they voted for during an election. We will get to that part later.
The dataset I used for the first two parts of the analysis is the U.S. President 1976–2020 election results published by the MIT Election Data and Science Lab. While this dataset is very clean and organized to begin with, it has very few features. Consequently, the biggest challenge for me was to come up with new features that would help with the analysis. I also added several state-level data points to the dataset, including the number of Electoral votes, the 2022 citizen and total population, and the 2022 GDP in USD.
For the third part of the analysis, I used the 2016, 2018, 2020 and 2022 Validated Voter Demographics Table published by Pew Research Center. Note that Pew published only the summary table, so I do not have access to the underlying data in which voter demographics are linked to voter behavior during an election. This limited my ability to run any logistic regression models to predict voter behavior in this project. However, patterns can still be identified just by analyzing the summary table. In this part of the analysis, I looked into how features such as gender, age, race, education, religion and community type related to the party voters chose during an election.
Exploratory Data Analysis - Part I
For this part of the analysis we are going to focus on Electoral votes. The President and Vice President are not directly elected by the popular vote but through the Electoral College. Each state is assigned a certain number of Electoral votes based on its representation in Congress (the total number of senators and representatives).
Most states use a winner-takes-all system, where the candidate who wins the popular vote in a state receives all of its Electoral votes. However, Nebraska and Maine use the "congressional district method", where they allocate two Electoral votes to the state's popular vote winner, and the remaining Electoral votes to the popular vote winner in each congressional district.
Let's take a look at this bar chart that shows the number of Electoral votes in each state. California, Texas, Florida and New York are the top states in terms of the number of Electoral votes and "deciding power". These four states happen to be the four most populous states as well, so is there a relationship between the population and the number of Electoral votes?
As shown in the scatter plot below, a Pearson correlation coefficient of 0.999 and a p-value far below 5% indicate a strong correlation between population and Electoral votes. When we think about the deciding power each state holds during the election, what else can we look at besides the number of Electoral votes?
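The correlation check described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the notebook code behind the chart: the `pearson_r` helper is hypothetical, the six populations are rounded, approximate figures chosen for the example, and the p-value is left out because it would need a statistics library.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Assumption: rounded, approximate populations in millions (illustrative only)
# paired with the 2024 Electoral vote counts for six states.
population_m = {"CA": 39.0, "TX": 30.0, "FL": 22.2, "NY": 19.7, "WY": 0.58, "VT": 0.65}
electoral_votes = {"CA": 54, "TX": 40, "FL": 30, "NY": 28, "WY": 3, "VT": 3}

states = sorted(population_m)
r = pearson_r(
    [population_m[s] for s in states],
    [electoral_votes[s] for s in states],
)
print(f"Pearson r = {r:.3f}")
```

With only six states the result is not comparable to the figure quoted for the full dataset, but it shows the same near-linear relationship.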
Here I divided the number of Electoral votes by the citizen population in millions to get the number of Electoral votes per million citizens in each state. Plotting this metric in another bar chart shows that Wyoming, Washington DC, Alaska and Vermont hold a lot of “deciding power” by this measure. One million citizens in Wyoming carry about three times as much “deciding power” as one million citizens in New York. By using different metrics, we get a very different answer to the question “which states hold more deciding power during the election?”
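The per-million metric is a single division, sketched below. The function name `ev_per_million` and the two example states are hypothetical and only illustrate how a small state can outrank a large one on this measure.

```python
def ev_per_million(electoral_votes, citizen_population):
    """Electoral votes per one million citizens."""
    return electoral_votes / (citizen_population / 1_000_000)

# Hypothetical inputs: (Electoral votes, citizen population).
example_states = {
    "Small state": (3, 500_000),
    "Large state": (30, 15_000_000),
}

metric = {
    name: ev_per_million(votes, citizens)
    for name, (votes, citizens) in example_states.items()
}
# Rank states from most to least Electoral votes per million citizens.
ranked = sorted(metric, key=metric.get, reverse=True)

print(metric)
print(ranked)
```

In this made-up case the small state has ten times fewer Electoral votes in total, yet three times as many per million citizens.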
This brings us back to our main research question: Which states should we focus on during the election campaign: states with larger numbers of Electoral votes or states with more Electoral votes per million citizens? The answer is: it depends! We need to take more factors into account.
Before proceeding to the second part of my analysis, there’s another question to explore: Should we consider other factors such as state GDP per capita when distributing Electoral votes? Currently there seems to be no strong correlation between the state GDP per capita and the number of Electoral votes.
Now let’s dig a little deeper into this hypothetical question. In the bubble chart below, the y-axis shows the number of Electoral votes per million citizens in each state, and the x-axis shows the log of 2022 GDP per capita in USD for each state. The bubble size represents the total population of each state.
You can see from the bubble chart that states such as Wyoming, Alaska and Vermont that generate lower GDP per capita seem to have greater “deciding power” during the election. The small states were given additional power to prevent politicians from only focusing on issues that affect larger states. Do you think the current setup is reasonable?
Exploratory Data Analysis - Part II
Change in Political Landscape
Now let’s travel back in history and look at how the political landscape has changed since 1976.
Here is the animated electoral college map from 1976 to 2020. I used the color red to represent the Republican party and the color blue to represent the Democratic party.
Now let’s focus on the election results on the national level first.
In this chart, I plotted the Democratic votes and the Republican votes aggregated at the national level, along with the difference between the two, for each election year.
Across the past 12 elections, the outcome was an even split: Republican candidates won 6 times, and Democratic candidates won 6 times. However, measured by the popular vote, Republican candidates won only 4 times, while Democratic candidates won 8 times. The winner-takes-all approach seems to benefit the Republican party more than the Democratic party.
The total number of votes cast has also been gradually increasing over the years, which could be driven by population growth and by higher voter turnout.
In the lollipop chart below, I plotted the margin of victory by election. This data point is calculated by taking the difference between the two parties' votes aggregated at the national level, dividing it by the total votes, and then taking the absolute value.
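The margin-of-victory calculation can be written as a small helper. This is a sketch under stated assumptions: the function name, the election labels and the vote totals are all made up for illustration, and "total votes" is taken to mean every vote cast, including votes for other parties.

```python
def margin_of_victory(dem_votes, rep_votes, total_votes):
    """Absolute two-party vote gap as a share of all votes cast."""
    return abs(dem_votes - rep_votes) / total_votes

# Hypothetical national totals (not real election results).
national_totals = {
    "Election A": {"dem": 5_000, "rep": 4_900, "total": 10_000},
    "Election B": {"dem": 4_700, "rep": 5_100, "total": 10_000},
}

margins = {
    label: margin_of_victory(t["dem"], t["rep"], t["total"])
    for label, t in national_totals.items()
}
average_margin = sum(margins.values()) / len(margins)

for label, margin in margins.items():
    print(f"{label}: {margin:.1%}")
print(f"Average: {average_margin:.1%}")
```

Because of the absolute value, the metric says how close an election was but not which party won it.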
Four observations emerge from this plot:
- On average, the winning party has a 6% lead over the other party.
- Two of the largest leads came in the 1980 and 1984 elections, when Ronald Reagan defeated first Jimmy Carter and then Walter Mondale.
- The 2000 election was essentially decided by the Supreme Court of the United States.
- The margin of victory has been below the average in every election after 2008, which raises the question of whether our country is becoming more divided.
Now, let's take a closer look at the state-level election results. This heat map plots election year on the x-axis and the 50 states on the y-axis. The scale on the right-hand side shows the state-level vote difference in percentage between the Republican party and the Democratic party. A positive number means the Republican candidate won that state in that year. Conversely, a negative number means the Democratic candidate won that state in that year. Sticking to our color code, red indicates a Republican lead, and blue indicates a Democratic lead. The bigger the lead is, the darker the color gets.
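The signed margin behind the heat map can be sketched as follows. The rows are hypothetical placeholders, the helper name is made up, and dividing by the state's total votes is an assumption about how the percentage was computed.

```python
def signed_margin_pct(rep_votes, dem_votes, total_votes):
    """Republican minus Democratic votes as a percentage of all votes.

    Positive values mean a Republican lead, negative a Democratic lead.
    """
    return 100 * (rep_votes - dem_votes) / total_votes

# Hypothetical rows: (state, year, Republican votes, Democratic votes, total votes).
rows = [
    ("State A", 2016, 550, 450, 1_000),
    ("State A", 2020, 480, 520, 1_000),
    ("State B", 2016, 300, 680, 1_000),
    ("State B", 2020, 310, 670, 1_000),
]

# Build a state -> year -> margin table, the shape a heat map expects.
heatmap = {}
for state, year, rep, dem, total in rows:
    heatmap.setdefault(state, {})[year] = signed_margin_pct(rep, dem, total)

for state, by_year in heatmap.items():
    print(state, by_year)
```

In this toy table, State A changes sign between the two years, which is exactly the kind of flip the heat map makes visible as a change from red to blue.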
I admit that it is difficult to spot any patterns from the heat map above. What if we group the states by their geographic regions?
From the heat map below, which focuses on southern states, we can observe some key turning points:
- Delaware turned blue during the 1992 election and remained a blue state after that.
- The 1992 and 1996 elections were the last two in which Arkansas voted blue. Bill Clinton is the last Democratic candidate to have won the state of Arkansas.
- Most of the southern states have stayed solidly red during and after the 2000 election.
In the heat map below, which focuses on Midwest states, we can see that Illinois turned blue during the 1992 election and hasn't changed its party alignment since.
The heat map below, which focuses on mountain states, shows that New Mexico turned blue for Bill Clinton in 1992 and 1996 and narrowly stayed blue for Al Gore in 2000 before flipping to George W. Bush in 2004. It flipped again for Barack Obama in 2008 and has remained a blue state ever since.
Now let’s look at the northeast region and Pacific region together. You can see that all the Blue states that we know today turned blue before or during the 1992 election, which marked a significant shift in the political landscape.
California joined Washington and Oregon in the blue column, and the Northeast mostly solidified as Democratic states. The “blueing” of the east coast and west coast is likely the result of a combination of 1) the changing demographics where the voter base became more diverse and 2) the rightward tack of the Republican party that started in the 80s during the Reagan administration.
Now we have a general idea of which states tend to vote Democratic and which states tend to vote Republican. The next thing we need to look into is how “sticky” their preference is. To assess that, I calculated another feature that tells us how many times a state flipped its party preference from 2000 to 2020. Note that I didn’t include elections prior to 2000 in this part of the analysis because I think they are less relevant to the present.
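One way to compute the flip count is to compare each election's winner with the previous one. The sketch below makes an assumption the text does not settle: the 2000 result is the starting point and is not compared with 1996. The function name is hypothetical; the Florida winners are the actual 2000 to 2020 results.

```python
def count_flips(winners):
    """Number of times the winning party changes between consecutive elections."""
    return sum(1 for prev, curr in zip(winners, winners[1:]) if prev != curr)

# Winners in 2000, 2004, 2008, 2012, 2016 and 2020 ("R" or "D").
florida = ["R", "R", "D", "D", "R", "R"]
texas = ["R", "R", "R", "R", "R", "R"]

print(count_flips(florida))  # 2: flipped to D in 2008 and back to R in 2016
print(count_flips(texas))    # 0: never changed party
```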
Here are our top 10 swing states that flipped at least two times in the past six elections, and we will take another look at those states in the final assessment.
This feature tells us how many times a state flipped its preference, but it doesn’t tell us how competitive the race was. To address that, I defined a tight race as an election where the absolute value of the percentage difference between Republican votes and Democratic votes is less than 5%. Then, I counted how many tight races each state had in the six presidential elections between 2000 and 2020.
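The tight-race count follows directly from that definition. In this sketch the function name and the six margins are hypothetical; margins are signed percentages, and only their absolute value matters.

```python
def count_tight_races(margins_pct, threshold=5.0):
    """Count elections whose absolute margin is below the threshold (in %)."""
    return sum(1 for margin in margins_pct if abs(margin) < threshold)

# Hypothetical signed margins for one state across six elections
# (positive = Republican lead, negative = Democratic lead).
example_margins = [0.5, 5.2, -2.8, -0.9, 1.2, 3.4]

tight = count_tight_races(example_margins)
print(f"{tight} of {len(example_margins)} races were decided by under 5%")
```

The comparison is strict, so a margin of exactly 5% would not count as tight under this reading of the definition.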
Here are the top nine states that had at least three tight races in the past six elections. Using Florida as an example, five out of six times, its vote difference was under 5%. I think we can all agree that Florida is a battleground state. We will take another look at those states in the final assessment as well.
Also, let’s not forget about states that hold a lot of "deciding power" measured by the number of Electoral votes they carry and the number of Electoral votes per million citizens. We will also take another look at those states in the final assessment.
Now let’s recap what we have gone over so far. We want to know which states should be prioritized during the election campaign this year, and we believe that states that hold a lot of "deciding power" and present a chance to win should be the ones to focus on. To quantify that, we came up with four features, and we are going to select the top states under each feature for the final assessment.
To narrow down the list of our focus states, we will take a closer look at the 27 states identified by the features we calculated earlier. Each of them has at least one of the following three characteristics:
- The state is among those with the most "deciding power" measured by the number of Electoral votes or the number of Electoral votes per million citizens.
- The state flipped at least two times between 2000 and 2020.
- The state had at least three tight races from 2000 to 2020.
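The selection step can be sketched as a set union, on the assumption that a state qualifies when it meets any one of the criteria rather than all of them. The function name, state names and counts below are hypothetical placeholders.

```python
def select_candidates(top_by_votes, top_by_votes_per_million,
                      flip_counts, tight_counts,
                      min_flips=2, min_tight=3):
    """Union of the states picked out by each of the four features."""
    swing = {s for s, n in flip_counts.items() if n >= min_flips}
    tight = {s for s, n in tight_counts.items() if n >= min_tight}
    return set(top_by_votes) | set(top_by_votes_per_million) | swing | tight

# Hypothetical inputs.
top_by_votes = ["State A", "State B"]
top_by_votes_per_million = ["State C"]
flip_counts = {"State A": 0, "State D": 2, "State E": 1}
tight_counts = {"State B": 1, "State D": 4, "State F": 3}

candidates = select_candidates(
    top_by_votes, top_by_votes_per_million, flip_counts, tight_counts
)
print(sorted(candidates))
```

A state that qualifies on several features, like State D here, still appears only once in the result.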
Here I plotted these 27 states in the area charts and divided them into 5 groups based on the common characteristics shared within each group.
Our first group is what I call the "solid blue states." The value plotted on the y-axis represents the state-level voting difference in percentage between the Republican party and the Democratic party. Given that we are looking at the blue states, these values are in the negative zone. So what common characteristics do these states share? There are two: 1) States in this group are loyal to the Democratic party, as they haven’t changed their party preference in the past six presidential elections. 2) The Democratic party maintains a double-digit lead over the Republican party.
Given how reliably Democratic those states are, it makes sense not to deploy significant resources to campaign in them. The Democrats already have them, and the Republicans don’t have a real chance of winning them over.
Before we move on, I also want to point out one more thing. Note that the Democratic lead seems to have been decreasing in the state of Hawaii over the past four consecutive elections. The same trend can be observed in the state of New York over the past three consecutive elections. I wonder what happened there?
Our second group is what I call the "solid red states." States in this group haven’t changed their loyalty to the Republican party in the past six presidential elections, and the Republican party maintains a double-digit lead over the Democratic party in all those states. Just as it doesn’t make sense to deploy resources to decidedly Democratic states, it doesn’t make sense for either party to try to make an impact on states that are always going to vote Republican. It’s also interesting to note that the Republican lead has been decreasing in Alaska over the last six consecutive elections. I wonder what happened there?
Similar to the second group, states in the third group below haven't changed their loyalty to the Republican party for the past six elections. However, the Republican lead has been declining, and it eventually dipped into the single digit zone in the most recent elections.
I would recommend focusing on this group in the election campaign this year. They might present risks to the Republican party, but they also represent opportunities for the Democratic party.
Let's look at the fourth group, which I call the battleground states. States in this group had tight races in both of the last two elections. Recall that earlier I defined a tight race as an election where the absolute value of the percentage difference between the two parties' votes is less than 5%. I would recommend focusing on this group in the election campaign this year.
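The four groups described so far could be encoded as simple rules on each state's six signed margins. This is only one possible encoding and not the procedure used to build the charts: the function name, the rule order and the example margins are assumptions, and the third rule checks only that the latest Republican lead is in single digits, not that it declined steadily.

```python
def classify_state(margins_pct):
    """Assign a group from signed margins, oldest first (positive = Republican lead)."""
    if all(abs(m) < 5 for m in margins_pct[-2:]):
        return "battleground"          # tight races in the last two elections
    if all(m <= -10 for m in margins_pct):
        return "solid blue"            # double-digit Democratic lead every time
    if all(m >= 10 for m in margins_pct):
        return "solid red"             # double-digit Republican lead every time
    if all(m > 0 for m in margins_pct) and margins_pct[-1] < 10:
        return "red, narrowing lead"   # always Republican, now single digits
    return "other"

# Hypothetical margins for five made-up states.
examples = {
    "State A": [-20, -18, -25, -22, -21, -23],
    "State B": [25, 28, 21, 20, 22, 19],
    "State C": [21, 23, 12, 16, 9, 5.6],
    "State D": [3, 5, -3, -1, 1.2, 3.4],
    "State E": [15, 20, -1, 10, 19, 16],
}

groups = {state: classify_state(m) for state, m in examples.items()}
for state, group in groups.items():
    print(f"{state}: {group}")
```

States that fit none of the four rules fall into "other", which corresponds to the remaining states discussed individually.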
Last, but not least, let's take a look at the states that haven't been covered in the first four groups.
Indiana has been a loyal Republican base since 2000. Republicans won the state by a double-digit margin in five out of six elections. The exception was the 2008 election, when Obama flipped the state. If the Democratic party ever had another candidate who could appeal to the voters there, how safe would this state be for the Republican party?
Let’s look at Minnesota. It has been a loyal base for the Democratic party, though the Democratic lead has only been in the single digits, which indicates a “not so blue” status. It would be wise not to consider Minnesota a safe state for the Democratic party in the election year. Similar caution applies to New Hampshire, Iowa and Ohio, which flipped several times in the past six elections. Note that Clinton almost lost New Hampshire in the 2016 election. She won the state by a mere 0.4% margin, which translated to 2,736 votes. That’s why I would recommend keeping an eye on this group.
I would also recommend prioritizing Arizona, Florida, Georgia, Michigan, Nevada, North Carolina, Pennsylvania, Texas and Wisconsin during the 2024 Presidential Election campaign.
Observations and Future Work
The first two parts of the analysis looked into “how” the political landscape has changed in some states. Future work can focus more on “why” certain changes happened in those states. One way to approach this question is to look into how the voter base changed over time.
Exploratory Data Analysis - Part III
Now let’s talk about voter demographics.
Let’s first look at gender. Please note that I used orange to represent Republican candidates and blue to represent Democratic candidates. The charts show that more than half of female voters voted Democratic, and more than half of male voters voted Republican.
Now let’s look at the age factor. From the plot, approximately two thirds of the voters aged 18 to 29 and voters aged 30 to 49 voted Democrat in the past four elections.
The pattern is reversed when we look at voters aged 50 and older. In that category, more than half of voters voted Republican.
Let’s look at the education factor. From the plot, more than half of the voters who received at least a 4-year college education voted Democratic. The pattern is reversed when we look at voters who received some college education or less.
Now let’s look at race. More than half of White voters voted Republican, and more than 90% of Black voters voted Democratic.
Around two thirds of the Hispanic voters voted Democrat, as did an average of around 70% of the Asian voters.
Here is something interesting to notice. Recall that when we looked at race alone, White voters seemed to be the only group in which more than half voted Republican. If we combine the race factor and the education factor to further split this group into voters who received “4-year college and plus” vs. voters who received “some college and less,” we can observe a difference between the two subgroups. More than half of the voters in the subgroup "White, non-Hispanic college grad +" voted Democratic, and the pattern is reversed in the other subgroup, "White, non-Hispanic some college or less".
Does this mean that the education factor might be a better predictor than the race factor? Or is the combination of race and education a better or more significant predictor?
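A comparison like this can be made directly on a summary table by checking which party, if any, took more than half of each subgroup's votes. The sketch below uses placeholder subgroup names and percentages, not the Pew figures, and the helper name is hypothetical.

```python
def majority_party(shares):
    """Return the party with more than half of the group's votes, else None."""
    for party, pct in shares.items():
        if pct > 50:
            return party
    return None

# Placeholder percentages for illustration only (NOT the Pew Research data).
summary = {
    "Subgroup 1": {"Democratic": 55, "Republican": 43},
    "Subgroup 2": {"Democratic": 35, "Republican": 63},
    "Subgroup 3": {"Democratic": 49, "Republican": 49},
}

leaning = {group: majority_party(shares) for group, shares in summary.items()}
for group, party in leaning.items():
    print(f"{group}: {party or 'no majority'}")
```

A check like this only describes the summary table. It cannot show whether education or race is the stronger predictor, which would require the voter-level data.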
Observations and Future Work
Using gender as an example, we see that more than half of the female voters chose to vote Democratic. From there, can we infer that women are more likely to vote for the Democratic candidate? We probably can’t answer this question without further quantitative analysis.
From these plots, we can see that the pattern seems to be consistent from election to election within each category. That indicates the possibility of building a regression model that predicts voter behavior based on voter demographics if the dataset is accessible.
Note that we only looked at the people who voted during the past four elections in this part of the analysis. What about people who did not turn out to vote? Would the patterns we have observed so far be the same? Why didn’t they vote? Should you spend resources and energy on changing people’s minds, or should you focus on getting people to vote if you knew they would support your candidate? Those questions are worth exploring.
Credits & Links
The presentation template I used was created by Slidesgo and includes icons by Flaticon and infographics and images by Freepik.