Post-Election Reflection

November 23, 2020

Now that the outcome of this year’s much-anticipated election is settled and the nation has a new president-elect, we look back at how our model performed.

Recap of Model

To briefly recap our model, we created a weighted ensemble using three linear regression models based on polling data, COVID-19 deaths, and demographic data, as follows:
Model Equations
Then, we assigned the following weights to the models:

Model Equations (Full details of how we constructed our model can be found here.)
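As a rough illustration of how a weighted ensemble combines its components, the Python sketch below takes each model’s predicted Trump two-party vote share for one state and returns their weighted average. The model names, predictions, and weights are hypothetical placeholders, not the values we actually used. Setting the demographic and COVID-19 weights to 0 reduces the ensemble to the polls-only model we examine later.

```python
from typing import Mapping


def ensemble_prediction(predictions: Mapping[str, float],
                        weights: Mapping[str, float]) -> float:
    """Weighted average of component-model predictions for a single state.

    predictions: model name -> predicted Trump two-party vote share (%)
    weights: model name -> weight (assumed non-negative, summing to 1)
    """
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1, got {total}")
    # Each component contributes its prediction scaled by its weight.
    return sum(weights[name] * predictions[name] for name in weights)


# Hypothetical component predictions for one state (not real model output).
state_preds = {"polls": 47.0, "covid": 45.0, "demographics": 51.0}

# Hypothetical weights chosen only for illustration.
weights = {"polls": 0.5, "covid": 0.25, "demographics": 0.25}
print(ensemble_prediction(state_preds, weights))  # 47.5

# Zeroing the COVID-19 and demographic weights leaves the polls-only model.
polls_only = {"polls": 1.0, "covid": 0.0, "demographics": 0.0}
print(ensemble_prediction(state_preds, polls_only))  # 47.0
```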

Our model produced the following electoral vote count predictions:

Below, we compare our model’s predicted winner in each state with the actual outcome, with blue representing a Biden win and red a Trump win:

Prediction vs. Outcome Maps

The final electoral vote count was 306 for Biden and 232 for Trump.

Thus, our predicted electoral vote count was off by 8 electoral votes, and our model called the wrong winner in four states: Arizona, Georgia, Florida, and Iowa.
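As a quick arithmetic check on the 8-vote figure, the sketch below starts from Biden’s certified total of 306 electoral votes and reverses each of the four missed calls. It assumes these four states were the only disagreements between our map and the actual result, so the implied prediction of 314 is a reconstruction rather than a number copied from our original output.

```python
# Electoral votes (2020 apportionment) for the four states our model missed.
MISSED_STATES = {"Arizona": 11, "Georgia": 16, "Florida": 29, "Iowa": 6}

# Actual 2020 winners in those states.
ACTUAL_WINNER = {"Arizona": "Biden", "Georgia": "Biden",
                 "Florida": "Trump", "Iowa": "Trump"}

BIDEN_ACTUAL_EV = 306  # certified 2020 result


def implied_biden_prediction(actual_ev: int) -> int:
    """Undo each missed call to recover the Biden total our map implies."""
    ev = actual_ev
    for state, votes in MISSED_STATES.items():
        if ACTUAL_WINNER[state] == "Biden":
            ev -= votes  # we had given this Biden state to Trump
        else:
            ev += votes  # we had given this Trump state to Biden
    return ev


predicted = implied_biden_prediction(BIDEN_ACTUAL_EV)
print(predicted)                    # 314
print(predicted - BIDEN_ACTUAL_EV)  # 8
```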

Accuracy of Model

Using the vote counts last updated on November 17, we can calculate our model’s root mean square error (RMSE), which measures how far our predicted state-level vote shares fall from the actual results. Our model’s RMSE is 4.35 percentage points.
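For reference, RMSE is the square root of the average squared gap between predicted and actual values, so large misses count disproportionately and errors in opposite directions do not cancel. The Python sketch below computes it alongside the mean signed error (actual minus predicted) on made-up vote shares; it illustrates the formula and is not the code we ran.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error, in the same units as the inputs (percentage points)."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def mean_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean signed error (actual - predicted); positive means we underpredicted."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a - p for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical Trump two-party vote shares (%) for four states.
predicted = [48.0, 52.0, 45.0, 60.0]
actual = [51.0, 48.0, 45.0, 60.0]
print(rmse(predicted, actual))        # 2.5
print(mean_error(predicted, actual))  # -0.25
```

Here the +3 and -4 errors nearly cancel in the mean (-0.25) but both contribute fully to the RMSE (2.5), which is why a small mean difference can coexist with a much larger RMSE.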

Below is the distribution of Trump’s actual vote share minus our model’s predicted vote share:

Actual - Predicted

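The errors appear fairly balanced between overprediction and underprediction, with a mean difference of 0.62 percentage points; on average, Trump’s actual vote share was slightly higher than we predicted.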

We can also segment these errors by the candidate who won each state. In states Biden won, the average difference between Trump’s actual and predicted two-party vote share is -2.49 percentage points, whereas in states Trump won it is 3.72. Below, we present the absolute difference between actual and predicted vote share to better compare the two distributions:
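The segmented comparison can be sketched the same way: group states by their actual winner, then average the signed difference (actual minus predicted Trump share) and its absolute value within each group. The states and vote shares below are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (state, actual winner, actual Trump share %, predicted Trump share %).
rows = [
    ("State A", "Biden", 40.0, 44.0),
    ("State B", "Biden", 45.0, 47.0),
    ("State C", "Biden", 49.0, 46.0),
    ("State D", "Trump", 60.0, 54.0),
    ("State E", "Trump", 55.0, 53.0),
]


def errors_by_winner(rows):
    """Mean signed and mean absolute error (actual - predicted) per actual winner."""
    diffs = defaultdict(list)
    for _state, winner, actual, predicted in rows:
        diffs[winner].append(actual - predicted)
    return {
        winner: {"mean": mean(d), "mean_abs": mean(abs(x) for x in d)}
        for winner, d in diffs.items()
    }


print(errors_by_winner(rows))
# {'Biden': {'mean': -1.0, 'mean_abs': 3.0}, 'Trump': {'mean': 4.0, 'mean_abs': 4.0}}
```

In the hypothetical Biden group, errors of opposite sign partly cancel in the signed mean (-1.0) but not in the absolute mean (3.0), which is why the absolute version gives a fairer comparison of error size between groups.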

Actual - Predicted by Party

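We see that our model exhibits somewhat larger errors in states Trump won. Combined with the overall mean difference of 0.62 percentage points, this suggests that on balance our model overpredicted Biden’s vote share and underpredicted Trump’s, and that it was more accurate in states Biden won. The opposite signs in the two groups also indicate that the model underestimated the winner’s share on both sides: it overpredicted Trump where Biden won and underpredicted him where Trump won.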

Inaccuracies

What can explain the inaccuracies in our model, particularly its general tendency to overpredict Biden’s vote share and underpredict Trump’s? We consider some possible explanations:

What if we had just used polling data, without adjusting for demographics and COVID-19 deaths?

Let’s look at how accurate our model would have been had we used only the simple linear regression model of polling support, which amounts to setting the weights of the demographic and COVID-19 models to 0.

The overall RMSE improves to 3.56; however, this model predicts 269 electoral votes for Trump, producing a larger electoral vote count error of 45. The chart below shows the difference between actual and predicted vote share for Trump under this polls-only model:
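Electoral vote totals like these come from converting state-level vote share predictions into state winners. A minimal sketch of that step, assuming pure winner-take-all allocation in every state (it ignores Maine’s and Nebraska’s congressional-district votes) and using hypothetical states and numbers:

```python
# Hypothetical predictions: state -> (predicted Trump two-party share %, electoral votes).
predictions = {
    "State A": (52.1, 10),
    "State B": (48.7, 16),
    "State C": (50.4, 29),
    "State D": (44.0, 55),
}


def tally_electoral_votes(predictions):
    """Allocate each state's electoral votes winner-take-all and total them."""
    totals = {"Trump": 0, "Biden": 0}
    for share, votes in predictions.values():
        # Trump carries a state when his predicted two-party share exceeds 50%;
        # an exact 50% is assigned to Biden here, an arbitrary tie-break.
        winner = "Trump" if share > 50.0 else "Biden"
        totals[winner] += votes
    return totals


print(tally_electoral_votes(predictions))  # {'Trump': 39, 'Biden': 71}
```

Because each state is all-or-nothing, a model can have a lower RMSE yet a worse electoral count: small share errors in a few close, high-value states can flip many electoral votes at once.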

Actual - Predicted

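So although the COVID-19 and demographic models may have added noise to our state-level predictions (our full model’s RMSE of 4.35 versus 3.56 for polls alone), they substantially improved our electoral vote prediction, suggesting they added some predictive value overall.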

Other Potential Tests

If the data were available, we could also run the following tests to check whether the hypotheses about our model’s inaccuracies outlined above hold:

In the Future

Given the growing urban-rural divide in partisanship, one change we may consider in future iterations of our model is a variable capturing whether voters live in urban or rural areas. If this divide continues to deepen, it may become a strong predictor of voter behavior and could yield a more accurate model.

In addition, a key lesson from this election is that overall voting patterns remained remarkably stable from 2016, even in a year as extraordinary as this one. The map below[1] shows the county-level change in voting patterns from 2016 to 2020:

County Swing

With the exception of the southern Texas border, most areas shifted only slightly from 2016; yet the modest (pale blue) shift toward Democrats in many places was enough to push Biden over the edge.

As Hopkins (2017) finds, geographic polarization has produced races in which candidates need to concentrate their efforts on only a handful of battleground states, and even that set of battlegrounds is shrinking. If this trend continues, future models may need to place even greater weight on historical voting patterns to account for the stability of the electorate.


[1] Modified from code provided by Prof. Enos.