Final Prediction

November 1, 2020

Using everything we have learned and explored these past seven weeks, we present our final prediction for the 2020 U.S. presidential election.

Prediction Model

To predict the election outcome, we use a weighted ensemble of linear regression models, fitted to the following sets of data:

In other words, our model treats the incumbent party's vote share as a function of popular support, deaths due to COVID-19, and demographic changes.

For each state, we run three different linear regression models as follows:
Model Equations
where

We then construct our final ensemble by taking a weighted sum of the predictions from the three models above, using the following weights (for an explanation of the choice of weights, see below):

Model Equations
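To make the weighting step concrete, here is a minimal Python sketch of how three per-state predictions could be combined into one. The function names, the single-predictor fit, and the example weights (0.8, 0.1, 0.1) are illustrative assumptions only; they are not the equations or weights from the figure above.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, x):
    """Evaluate a fitted line at a new predictor value."""
    intercept, slope = model
    return intercept + slope * x


def ensemble(predictions, weights):
    """Weighted sum of the individual model predictions; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical predictions from the poll, COVID, and demographic models
# for one state, combined with placeholder weights.
combined = ensemble([50.0, 40.0, 60.0], [0.8, 0.1, 0.1])
```

Requiring the weights to sum to 1 keeps the combined value on the same scale as the individual vote-share predictions.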

Here is our final prediction for the 2020 election, with blue representing a Biden win and red a Trump win:

Final Prediction Map

Under this model, Biden is predicted to win with 314 electoral votes to Trump's 224.
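Turning state-level vote shares into an electoral vote count is a simple tally. The sketch below assumes winner-take-all in every state (it ignores the district-level allocation in Maine and Nebraska) and uses made-up state names and numbers; `tally_electoral_votes` is a hypothetical helper, not code from our model.

```python
def tally_electoral_votes(incumbent_share, electoral_votes):
    """Return (incumbent_ev, challenger_ev) under winner-take-all.

    incumbent_share: state -> predicted incumbent vote share (percent)
    electoral_votes: state -> number of electoral votes
    """
    incumbent_ev = 0
    challenger_ev = 0
    for state, share in incumbent_share.items():
        if share > 50:
            incumbent_ev += electoral_votes[state]
        else:
            # A share of exactly 50 is counted for the challenger here;
            # this tie-break is arbitrary.
            challenger_ev += electoral_votes[state]
    return incumbent_ev, challenger_ev


# Made-up shares and vote counts for three placeholder states.
shares = {"State A": 47.9, "State B": 52.3, "State C": 49.6}
votes = {"State A": 20, "State B": 11, "State C": 6}
incumbent_ev, challenger_ev = tally_electoral_votes(shares, votes)
```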

The figure below shows the 95% confidence interval for each state’s prediction.

Final Prediction Intervals

A few notable states predicted to flip from red in 2016 to blue this year are Florida, Iowa, Michigan, Pennsylvania, and Wisconsin. Battleground states predicted to stay red are Arizona, North Carolina, Ohio, and Texas. Nevertheless, for all of these states, the predicted vote share is within one or two percentage points of 50%, and the span of their confidence intervals indicates that the win could ultimately go to either side.
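One way to check whether a state "could go to either side" is to see whether its interval contains 50%. The sketch below uses a normal approximation (prediction plus or minus 1.96 standard errors), which is an assumption made for illustration and not necessarily how the intervals in the figure were computed. The function names and numbers are hypothetical.

```python
def normal_interval(prediction, std_error, z=1.96):
    """Approximate 95% interval: prediction +/- z * standard error."""
    margin = z * std_error
    return prediction - margin, prediction + margin


def could_go_either_way(interval, threshold=50.0):
    """True when the interval straddles the winning threshold."""
    low, high = interval
    return low < threshold < high


# Hypothetical state: predicted incumbent share of 51% with a
# standard error of 1.5 percentage points.
close_state = normal_interval(51.0, 1.5)
safe_state = normal_interval(60.0, 2.0)
```

A state whose point prediction sits one point above 50% can easily have an interval that dips below it, which is why the map alone overstates how certain the close states are.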

Why This Model

Let’s consider why we choose to include each of the variables in our model:

We can look at the magnitude and direction of the effects of each of the above variables on the outcome in a given state. Running each of the three linear regression models on each state’s data produces the following distribution of coefficients:

Final Prediction Model Coefficients

Each of the coefficients can be interpreted as follows:

One measure of how well the models predict past election results is their root mean squared error (RMSE). The figure below shows the distribution of the weighted RMSEs across states:

Final Prediction RMSE

Most RMSEs fall between 3 and 6, with a mean of 4.3. This suggests that the model performs relatively consistently across states.
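For readers unfamiliar with RMSE, here is a short sketch of the calculation. The `weighted_rmse` helper assumes that a state's weighted RMSE is the ensemble-weighted sum of the three models' individual RMSEs; that definition is an assumption made for illustration, and all numbers are made up.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual values."""
    n = len(predicted)
    squared_errors = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_errors / n)


def weighted_rmse(model_rmses, weights):
    """Combine per-model RMSEs using the ensemble weights (assumed definition)."""
    return sum(w * r for w, r in zip(weights, model_rmses))


# Made-up example: predictions that miss two past results by 2 points each.
example_rmse = rmse([52.0, 48.0], [50.0, 50.0])

# Made-up per-model RMSEs combined with placeholder weights.
example_weighted = weighted_rmse([2.0, 4.0, 6.0], [0.5, 0.25, 0.25])
```

Because errors are squared before averaging, a single badly missed election raises the RMSE more than several small misses do.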

What’s Not In This Model

What about all of the other factors we’ve looked at over these past few weeks? It is important to address what is not included in our model. We claim that the following factors ultimately either have little effect on this year’s election or cancel each other out.

Alternate Outcomes

It is important to acknowledge that the weights selected for our ensemble model, although based on reasoning, may seem somewhat arbitrary. We can conduct a sensitivity test, that is, look at how much the final results change as we adjust the weights. We increase the weight of the polling model in steps of 0.1 until it reaches 1 (which yields a prediction based solely on the polling model), allocating the remaining weight equally between the COVID and demographic models, as before.

Sensitivity Analysis

We can see that the predicted electoral vote count for the incumbent party levels off at 269, short of the 270 needed to win; in other words, our model predicts a Biden win at every weight tested. In fact, raising the weight of the polling model from 0.9 to 1.0 did not change the final electoral vote count. Although there are other ways to vary the weights, this gives us an approximate sensitivity analysis for our model.
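The weight sweep described in this section can be sketched in a few lines of Python. The starting weight, the state names, and every number below are placeholders, and `sweep_poll_weight` is a hypothetical helper, not the code behind the figure. It steps in integer tenths to avoid floating-point drift in the loop counter.

```python
def sweep_poll_weight(state_preds, electoral_votes, start_tenths=5):
    """Re-tally incumbent electoral votes as the poll weight rises to 1.0.

    state_preds: state -> (poll, covid, demographic) predicted incumbent shares
    electoral_votes: state -> number of electoral votes
    Returns a list of (poll_weight, incumbent_electoral_votes) pairs.
    """
    results = []
    for tenths in range(start_tenths, 11):
        w_poll = tenths / 10
        # Split whatever weight remains equally between the other two models.
        w_other = (1 - w_poll) / 2
        incumbent_ev = 0
        for state, (poll, covid, demog) in state_preds.items():
            share = w_poll * poll + w_other * covid + w_other * demog
            if share > 50:
                incumbent_ev += electoral_votes[state]
        results.append((w_poll, incumbent_ev))
    return results


# Two made-up states where the three models disagree.
preds = {"State A": (48.0, 56.0, 56.0), "State B": (53.0, 45.0, 45.0)}
votes = {"State A": 10, "State B": 20}
sweep = sweep_poll_weight(preds, votes)
```

Plotting the second element of each pair against the first would give a curve of the same kind as the one in the figure, with flat stretches wherever a change in weight flips no state.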

Final Notes

Predicting election outcomes is difficult in a “normal” year, let alone in this year of many unknowns. With unprecedented numbers of people voting early and new levels of mail-in ballots, it is quite possible that factors like these will affect the speed at which a winner can be determined, or even the election outcome. Only time will tell… let the countdown begin!