Usability Analysis of the Palm Beach Ballot Controversy

Paul Resnick

Associate Professor

University of Michigan School of Information

This is version v3.1 last modified 11/13/2000 06:57 PM EST

This site is intended to document some of the controversy surrounding the Palm Beach County, FL ballots for the 2000 U.S. Presidential election, from the perspective of the field of HCI, or human-computer interaction, which studies the usability of all kinds of systems. The distinct contribution this document makes, compared to the many other analyses out there, is to consider explicit models of voter behavior, and hence to provide an underlying theory for the statistical models. If extra Buchanan votes were coming from Bush, then we would also expect to see extra votes for McReynolds coming from Gore, according to the only plausible psychological models in which Bush voters vote for Buchanan. We do in fact see some extra votes for McReynolds, but their quantity puts an upper bound on the possible prevalence of the error type that would lead Bush voters to vote for Buchanan. I encourage readers to send me additional information or analysis that might be helpful to include here, or links to other related analyses.

Table of Contents

Related Analyses
Models of Ballot-Punching Behavior
Evidence: Which Model(s) Did Voters Use
Remedies for This Election
Remedies for Future Elections

Related Work

More sophisticated statisticians than I have been hard at work on this. There are now so many sites that I've lost track, but see Jonathan O'Keefe's list. The analyses that have most influenced my approach are listed below. They are also cited directly in the sections where I use them.

The Ballot, Some Models of What Voters Might Have Thought and How they Would Have Behaved

Here's an AP Photograph of the presidential portion of the ballot in Palm Beach County, FL. 

Voters are supposed to punch the hole in the middle of the ballot corresponding to one President/Vice-President ticket.

From an HCI perspective, the first thing to point out is that the voter must establish a correspondence between candidate tickets and holes in the middle. Contrast this with a potential selection method where voters would directly manipulate the names. The need to establish a correspondence in the mind of the voter creates the potential for voters to establish an incorrect correspondence. 

There are a number of ways a voter might establish a correspondence between candidate tickets and holes in the middle, detailed below. The first of these would cause all votes to be counted as intended; others would lead to particular patterns of invalid ballots and/or incorrectly counted ballots. Each of these models of how voters established a correspondence leads to predictions about patterns of double-punched ballots and/or transfers of votes to other tickets than those intended. These are evaluated in the following section.

Model 1: Visually follow the arrows

In the box for each ticket, near the center, there is an arrow. For example, for the Libertarian ticket there is an arrow at the right that seems to point pretty clearly to a particular punch hole. When commentators argue that "it's easy; just follow the arrows", this is the method that they have in mind. This method would establish a correct correspondence, so that a vote intended for any candidate would actually be counted for that candidate.

Model 2: Count based on the printed numbers

In the box for each ticket, right next to the arrow, there is a number. For example, for the Republican ticket the number is 3. It's not obvious where to start counting the holes, but if the top hole is counted as 1, this would cause a voter who intended to vote for the Republican ticket to actually vote for the Democratic ticket, a voter who intended to vote Democrat to have their vote counted for the Libertarian party ticket, and so on. [I am grateful to Charles Henkel for pointing out that if this method were popular, the net effect would probably be to transfer votes from the Republican ticket to the Democratic ticket. The way to detect it, however, is to see whether the Libertarian vote count was surprisingly high, as discussed in the section on evidence below.]

Model 3: Count based on ordinal position

A voter might count the candidate tickets and infer the ordinal position of their ticket, then count off the holes down the center. One possibility would be for voters to count the position only in the left column, so that Republican would be 1, Democrat 2, Libertarian 3, and so on. This correspondence would cause a voter intending to vote Republican to do so correctly, but a voter intending to vote Democrat to have the vote actually be counted for Reform.

One of the interesting things about Method 3 is that while it seems fairly natural for the first few tickets in the left column, it seems unlikely that a voter would construct a cognitive map based on ordinal position that included all the candidate tickets. Consideration of the right-hand column would probably cause a voter to reevaluate the tentative correspondence he or she had established for the left-hand side.
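Models 1 through 3 can be sketched as simple mappings from the intended ticket to the ticket whose hole actually gets punched. The following is a minimal Python sketch, not part of the original analysis. The hole order covers only the first five tickets and is inferred from the descriptions above; the printed numbers other than the Republican 3 are an assumption (consecutive numbering down the holes), and all names are illustrative.

```python
# Top-to-bottom order of the first five punch holes, as implied by the
# model descriptions (assumption: the remaining tickets are omitted).
HOLES = ["Republican", "Reform", "Democratic", "Socialist", "Libertarian"]

# Printed number beside each arrow. Only Republican = 3 is stated in the
# text; the rest assume consecutive numbering down the holes.
PRINTED_NUMBER = {ticket: i + 3 for i, ticket in enumerate(HOLES)}

# Order of tickets in the left-hand column, from the Model 3 description.
LEFT_COLUMN = ["Republican", "Democratic", "Libertarian"]


def model1(intended):
    """Follow the arrow: the vote is counted as intended."""
    return intended


def model2(intended):
    """Count holes from the top (top hole = 1) up to the printed number."""
    position = PRINTED_NUMBER[intended]
    if position > len(HOLES):
        return None  # falls outside the part of the ballot sketched here
    return HOLES[position - 1]


def model3(intended):
    """Count holes using the ticket's ordinal position in the left column."""
    position = LEFT_COLUMN.index(intended) + 1
    return HOLES[position - 1]


for ticket in LEFT_COLUMN:
    print(ticket, "->", model1(ticket), model2(ticket), model3(ticket))
```

Under these assumptions, Model 2 sends an intended Republican vote to the Democratic hole and an intended Democratic vote to the Libertarian hole, while Model 3 leaves the Republican vote correct and sends the Democratic vote to Reform, matching the descriptions above.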

Model 4a: Linear Visual Search

Kevin Fox suggests that voters may have grouped each candidate's name with the line above it, then followed that line over to the center and punched the nearest hole. (See the Linear Visual Search hypothesis on his page for a picture). Using this mapping, a Republican vote would be counted correctly, a Reform vote would be counted for the Republicans, a Democrat vote would be counted for the Reform ticket, and a Socialist vote would be counted for the Democrats.

Model 4b: Alternative Linear Visual Search

Kevin Fox argues that grouping names with the lines above the tickets is more plausible than grouping them with the lines below. Still, an alternative hypothesis would be grouping with the line below. Under that mapping, a Republican vote would be counted for the Reform ticket, Reform for Democrats, and Democrats for Socialists. This method is of special interest, because if it were used by a large number of voters, there should be a surprisingly large Reform vote. Thus, it offers a different explanation than Method 3 above, one which suggests that some of the Reform vote could actually have been from Republican voters. If this method were popular, we should also see a surprisingly large Socialist vote.

Model 5: Gestalt Grouping

Kevin Fox suggests another method, visual Gestalt grouping, whereby voters would group together visually the number, the arrow, the hole to punch, and the long line from the opposite side of the page. The long line would then be associated with the ticket below it. This would cause a Democratic vote to be counted for Reform and Reform for Democrat. Since there's no number and arrow opposite the Republican ticket, Kevin Fox argues that Republican voters would not make this incorrect Gestalt grouping at all, and would find some other way to make the association with their candidate.

Model 5b: Alternative Gestalt Grouping

Again, it seems possible that with visual gestalt, the line would be associated with the ticket above rather than below the line. With this method, a Republican voter would mark Reform; a Reform voter would mark Democrat; a Democrat voter would mark Socialist.

Model 6: Punch for President and VP

At least one voter claims that he looked only at the left column of candidates, saw that there were two holes in the center for each ticket, and assumed that he was supposed to punch one for the President and one for the Vice-President. A Republican voter who did this would punch Republican and Reform. A Democrat who did this would punch Democrat and either Reform or Socialist, depending on whether they picked the extra hole on top or on the bottom.

Physical Problems punching holes

In addition to cognitive errors in establishing a correspondence between the voter's preferred ticket and the correct holes to punch, some voters may have had trouble punching the holes they intended to punch. [I am grateful to Andrew Hobgood for pointing out this possibility, and the "fat-pen" method in particular.]

Model 7: The shaky hand

A voter with an unsteady hand might punch an adjacent hole by accident. This would cause some Republican voters to punch Reform, and some Democratic voters to punch either Reform or Socialist (probably in equal numbers). If many voters did this, we should see a distribution of "unexpected ballots" based on these proportions. If a voter didn't punch very far, he or she might not realize the hole was punched, and then try again to punch the correct hole. This should be reflected in the distribution of double-punches.

Model 8: The fat pen

My memory of using punch ballots in previous elections (my polling place doesn't use them now) is that it's not possible to simultaneously punch more than one, because there's a solid barrier in between. If it is possible, however, some voters might have done that, and it should be reflected in the distribution of double punched ballots.

Model 9: The misaligned ballot

It is possible that the ballot instructions could be misaligned with the punch holes. Presumably the numbers next to the arrows are meant to alert people to such a misalignment, with the punch holes being numbered in a way that matches the numbers next to the arrows on the instructions. The most likely misalignment would presumably be an off-by-one, either up or down. The effects of such a misalignment would be nearly the same as those for model 7, the shaky hand. One possible exception is that the shaky hand may have been unable to punch hole 2 with a correctly aligned ballot, while a misaligned ballot could lead Republican voters to punch hole number 2. 

Evaluation: Were Voters Really Confused and Did that Confusion Lead to Vote Counting that Did Not Match Voter Intentions?

How could one tell if voters were in fact confused and cast ballots that did not reflect their intentions? There's no way to know for sure, but there are several possible indicators:

  1. During voting, were there a lot of questions asked of election officials that reflected voters making mental correspondences with method 2 or 3?
  2. Did voters punch more than one hole, potentially indicating that they used Method 6 or that they used one method of establishing the correspondence and then decided that another correspondence was the right one, punching once for each?
  3. After voting, did some voters realize that they had used an incorrect correspondence and express concern to friends, election officials, or reporters?
  4. Did any candidate ticket that would have benefited from incorrect voter correspondences of methods 2 or 3 receive a surprisingly large number of votes?
  5. In controlled experiments with analogous conditions, what kinds of confusions and errors occur?

I discuss each of these indicators below, together with the evidence I've been able to assemble so far on whether they happened or not.

Questions While Voting

I have not seen any reports yet about whether election officials got a lot of questions from voters about the correspondences. Professor Karen Drabenstott points out that the etiquette of polling places might keep most voters from asking questions, and privacy requirements preclude election officials from circulating among voters to offer assistance. Hence, voter questions might not be noticed as much inside polling places as outside. Apparently, there was some concern by election officials in advance of the election, and according to Congressman Wexler, a memo was sent to poll workers telling them to remind voters to vote for one candidate and follow the numbered hole (as reported in the Sun-Sentinel newspaper). I also heard one report that county election officials issued an advisory to voters in the middle of the voting day, once it became clear that there were problems, but I have not yet found the reference for that.

Disqualified ballots

Ballots with more than one hole punch are automatically disqualified. Some 19,120 ballots had more than one candidate punched and hence were not counted (based on reporting in the Sun-Sentinel newspaper). This amounted to more than 4% of the total number of ballots that were counted: 461,988 (according to official Palm Beach County results). One possible explanation for double punching is that a voter used one method to establish a correspondence, realized they made a mistake, and then punched another. 
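The "more than 4%" figure can be checked with one line of arithmetic. The short Python sketch below uses only the two counts quoted in the paragraph above; the variable names are illustrative.

```python
disqualified = 19_120  # ballots with more than one presidential punch
counted = 461_988      # ballot total from the official county results

# Disqualified ballots as a share of the quoted ballot total.
share = disqualified / counted
print(f"{share:.2%}")
```

The ratio comes out a little above 4%, consistent with the statement above.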

Another possible explanation is that voters used Method 6 above, punching once for President and once for Vice-President. The Method 6 hypothesis seems likely to account for most of the disqualified ballots, because there was apparently a similar percentage of double-punched ballots in the 1996 election (Republican commentators have made this claim, but I haven't yet been able to track down the hard numbers; citations from readers would be appreciated). We should also expect to see similar percentages of double-punched ballots in other counties if Method 6 were the primary explanation for the double punches (again, pointers to data from readers would be appreciated).

There are other potential explanations for the large number of disqualifications, including the possibility that voters changed their minds about who to vote for while they were voting, that they didn't realize they could only vote for one, or that they just had no clue about the whole process. There is no way to tell for sure whether the cause of the double punching was confusion about how to establish a correspondence or one of these other explanations. However, there are ways to make some inferences.

Perhaps most instructive would be to examine the nature of the double-punches.  I have heard some (Democratic) commentators argue as if all 19,120 ballots were double-punched as Democrat and Reform tickets. I presume that this is not the case, and that the number 19,120 includes ballots that had other double-punches as well.  In order to make inferences about the prevalence of the methods above, it would be extremely useful to get the raw data about the actual double-punches. I hope that this data will emerge as the controversy continues, and that someone will alert me when it does, so that I can link it in here and complete the analyses that it would enable. If they almost all reflect the same confusion about the correspondence (e.g., if the two punched holes were for Democrat and Reform, which could result from a voter intending to vote Democrat and switching between correspondence 1 and 3 above) rather than being more uniformly distributed, that would provide further evidence for a confusion about correspondences, and in fact evidence for one particular confusion. 
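If the raw double-punch data did become available, the first step would be a simple tally of which combinations of tickets were punched together. The Python sketch below shows one way that tally might look. The ballot records in it are made up purely for illustration and are not real data; the function and variable names are likewise hypothetical.

```python
from collections import Counter

# HYPOTHETICAL records: each entry is the set of tickets punched on one
# disqualified ballot. These values are invented for illustration only.
double_punches = [
    {"Democratic", "Reform"},
    {"Democratic", "Reform"},
    {"Republican", "Reform"},
    {"Democratic", "Socialist"},
]


def tally_pairs(ballots):
    """Count how often each combination of punched tickets occurs."""
    return Counter(frozenset(ballot) for ballot in ballots)


pair_counts = tally_pairs(double_punches)
for pair, n in pair_counts.most_common():
    print(sorted(pair), n)
```

A tally concentrated on one or two combinations would point toward a specific confusion, while a flat distribution would point toward the less systematic explanations.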

News flash: Bob Spence reports that Palm Beach election officials unofficially reported the double-punch numbers for a small sample on 11/11. The initial data seem to support a far higher prevalence of the Method 6 error than Method 3, but I will await further confirmation of this data before making strong conclusions.

Voter complaints

Congressman Wexler reports (RealAudio) that his office received hundreds of complaints from voters on the day of the election, even before it became apparent that the Florida race would decide the election. Many of these voters' explanations of their confusion point to their using correspondence 3 above. But other complaints are consistent with other models above. I have not yet seen any complaints that are consistent with model 2 above, so that probably wasn't a problem, and the statistical analysis below confirms that.

Surprising Votes for Minor Party Candidates

In the initial count, there were 3,407 votes (.79% of the total) for Buchanan and the Reform Party ticket. These voters may, of course, have intended to vote for the Reform ticket. Some have suggested that this was unlikely, however, because the area votes heavily Democratic and many of its Jewish voters would never vote for Buchanan. Even statewide, where he might be expected to be more popular, Buchanan's percentage was much smaller (.29%). Several of the related analyses cited above show that Palm Beach County was indeed a statistical outlier in its results. Patrick Buchanan himself said, "I don't want to take any votes that don't belong to me," adding that he had not campaigned in Palm Beach and that the majority of those votes probably belonged to Gore (Sun-Sentinel report). In one of the Congressional races in the area, however, the Reform Party candidate did receive 2,651 votes (2.09% of the total in that race), which could suggest that Palm Beach County did have an unusually large number of Reform Party voters. It would be useful to include party registration figures as variables in the regression models that are being used to determine whether the Buchanan vote was surprisingly large or not (thanks to Alan Davis for pointing this out), but I am not aware that this data is available yet.

Extra Buchanan votes, even assuming they were by mistake, could have come from Republican voters rather than Democrats, under Methods 4b or 5b, or Error 7. Under Model 2, some of the Democrat punches might even have come unintentionally from Republican voters.

In order to distinguish among some of these alternative models for errors, it is helpful to analyze the expected votes for the minor party candidates, based on voting in other counties, and then look at variation across precincts within Palm Beach County. I largely follow the lead of Professor Henry Brady's analysis, in which he estimates the likely frequencies of two kinds of errors (Bush voters voting for Buchanan, and Gore voters doing the same), but I use the more theoretically grounded error models described above. So far, I have restricted my analysis to the decision errors of models 2 through 5b, and not the physical errors of models 7-9. We can ignore the effects of errors made by voters intending to vote for minor party candidates, since with small error rates these will amount to very small numbers. The errors group into three classes, based on their effects:

  1. Model 2 errors: Bush votes go to Gore, and Gore votes go to Browne.
  2. Model 3, 4a, and 5 errors: Gore votes go to Buchanan, while Bush votes are counted correctly.
  3. Model 4b and 5b errors: Bush votes go to Buchanan, and Gore votes go to McReynolds.

It is possible that some of these errors would afflict voters for one candidate more often than voters for another candidate. It is not obvious, however, that this should be so, and for the purposes of this analysis, I assume that each type of error occurs with the same frequency across voters for each candidate. For example, if model 2 errors occur with frequency f2, then a fraction f2 of Bush votes would go to Gore and a fraction f2 of Gore votes would go to Browne.

Suppose that the first class of errors occurs with frequency f2, the second with frequency f3, and the third with frequency f4.

We can dispense with f2 first. Model 2 errors would cause extra votes for Browne, but Burt Monroe's analysis suggests that Browne actually received fewer votes in Palm Beach County than might be expected from a regression model of the rest of the state. This suggests that the frequency f2 of Model 2 errors was quite low, and for the purposes of the rest of this analysis, I shall assume f2=0.

Now consider f3 and f4. To estimate these percentages, I compare the expected and actual votes for Buchanan and McReynolds.

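Using Monroe's statistical models, I get the following two equations (results taken from Monroe's analysis, version 3.1). In the notation below, O denotes an observed vote count and T a true (intended) vote count; the suffixes GB, AG, PB, and Mc stand for Bush, Gore, Buchanan, and McReynolds: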

Buchanan: ln(OPB/OGB)  = -2.29 - .259ln(votes)

McReynolds: ln(OMc/OGB) = -4.01 - .449ln(votes)

Treating ln(votes) as a stand-in for county population (rather than precinct population), ln(votes) becomes a constant within the county rather than a variable. So the prediction for the true Buchanan vote, TPB, is 542. The prediction for the true McReynolds vote, TMc, is 8.
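The two regression equations can be turned into vote predictions by exponentiating and multiplying by the Bush count. The Python sketch below illustrates that step only. Its inputs are assumptions: the Bush total is not given in this document (the value used is a commonly reported county figure), and it is not clear which total Monroe's "votes" variable refers to, so the ballot total quoted earlier is used as a stand-in. The function name is hypothetical.

```python
import math


def predicted_votes(intercept, slope, total_votes, bush_votes):
    """Predicted count from ln(candidate / Bush) = intercept + slope * ln(votes)."""
    log_ratio = intercept + slope * math.log(total_votes)
    return bush_votes * math.exp(log_ratio)


TOTAL_VOTES = 461_988  # ASSUMPTION: ballot total quoted earlier, used for "votes"
BUSH_VOTES = 152_846   # ASSUMPTION: Bush count, not stated in this document

tpb = predicted_votes(-2.29, -0.259, TOTAL_VOTES, BUSH_VOTES)  # Buchanan
tmc = predicted_votes(-4.01, -0.449, TOTAL_VOTES, BUSH_VOTES)  # McReynolds
print(round(tpb), round(tmc))
```

With these assumed inputs the results land in the neighborhood of the 542 and 8 quoted above; exact agreement would depend on the precise inputs and unrounded coefficients used in Monroe's analysis.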

With error percentages f3 and f4, we would expect to observe the following (the derivation is similar to that in Brady and is abbreviated here):

OGB = TGB(1-f4)

OPB = TPB(1-f3-f4) + TAG(f3) + TGB(f4)

OAG = TAG(1-f3-f4)

OMc = TMc(1-f3-f4) + TAG(f4)

Since f3 + f4 will add up to just a little over 1%, and TMc and TPB are small, it is not important to get an exact value for the factor (1-f3-f4) that multiplies them. Approximating that factor as .99 simplifies the equations:

OGB = TGB(1-f4)

OPB = TPB(.99) + TAG(f3) + TGB(f4)

OAG = TAG(1-f3-f4)

OMc = TMc(.99) + TAG(f4)

Looking at the county as a whole, we have the following values:

OPB = 3407; TPB = 542 (predicted from the model); OMc=302; TMc = 8 (predicted from the model).

The values of f3 and f4 that satisfy the equations above are:

f3 = 0.993%

f4 = 0.108%

With these estimates of f3 and f4, the total "incorrect votes" would be as follows:

Bush would be undercounted by 165 votes;

Gore would be undercounted by 2994 votes;

Buchanan would be overcounted by 2865 votes;

McReynolds would be overcounted by 294 votes.

The last two, of course, are determined by the original regression models. The model provides an inference about how many of those extra votes came from each candidate. The power of the psychological models is that they add an extra constraint to the purely statistical models: if the extra Buchanan votes were coming from Bush, then we would also expect to see extra votes for McReynolds. We do in fact see some extra votes for McReynolds, but their quantity puts an upper bound on the possible prevalence of the error type that would lead Bush voters to vote for Buchanan. Most of the extra Buchanan votes, then, must have come from Gore.
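The estimates of f3 and f4 can be approximately reproduced by solving the simplified equations numerically. The Python sketch below uses a plain fixed-point iteration, which is my choice of method and not necessarily how the figures above were obtained. The observed Bush and Gore totals are not stated in this document, so the values used are assumptions (commonly reported county figures); the function name is hypothetical.

```python
def solve_error_rates(opb, omc, tpb, tmc, ogb, oag, iterations=50):
    """Solve the simplified equations for (f3, f4) by fixed-point iteration."""
    f3 = f4 = 0.0
    for _ in range(iterations):
        tag = oag / (1 - f3 - f4)  # true Gore votes, from OAG = TAG(1-f3-f4)
        tgb = ogb / (1 - f4)       # true Bush votes, from OGB = TGB(1-f4)
        f4 = (omc - 0.99 * tmc) / tag               # from the OMc equation
        f3 = (opb - 0.99 * tpb - tgb * f4) / tag    # from the OPB equation
    return f3, f4


# ASSUMPTION: observed Bush and Gore totals, not given in this document.
OGB, OAG = 152_846, 268_945

f3, f4 = solve_error_rates(opb=3407, omc=302, tpb=542, tmc=8, ogb=OGB, oag=OAG)
print(f"f3 = {f3:.3%}, f4 = {f4:.3%}")
```

With these assumed totals the iteration settles within a few steps on values close to the f3 and f4 reported above; small differences in the last digit can arise from rounding and from the exact vote totals used.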

(An earlier version of this document included a precinct-level analysis, but I am temporarily withdrawing that analysis in order to verify its correctness.)

Experimental Study

In an experimental study conducted in Canada, using a different scenario of voting for Prime Minister of Canada, 7% of a sample of 119 people at a shopping mall made model 3 errors, and no one made any other errors. Among university students, no one made any errors at all, though many students reported confusion.

The Testing and Approval Process

"I'll never use facing pages like that (again)," LePore said Wednesday morning, when the impact became clear. "I was trying to make the ballot more readable for our elderly voters in Palm Beach County. I was trying to do a good thing." (Sun-Sentinel report)

The two-column ballot was designed by the Palm Beach Supervisor of Elections, Theresa LePore, a Democrat, and was approved by the other two people on the canvassing board. The two-column ballot was unusual. In prior Florida elections, there were not so many candidates, and hence two columns were not necessary. Other Florida counties used either smaller type and one column, or two separate pages.

No mention has been made of whether there was any user testing of the ballots prior to the election. Sample ballots were, however, mailed out to voters in advance of the election. [Thanks to Allison Groff for mentioning this; I do not yet have a citation verifying this claim.] This would have familiarized voters with the locations of the candidates, though not necessarily with the hole punch correspondence.

Remedies for This Election

How to resolve the current Presidential election is beyond the scope of this analysis.

It would be nice, of course, to be able to go back to each of the voters who voted for Buchanan or who punched two holes and ask them who they really intended to vote for. Ballots are secret, however, and there is no way to tell at this point which ballot belonged to which voter. It would probably be possible to tell, approximately but not exactly, which candidates the double-punched ballots were intended for. If there were similar percentages of disqualified ballots in other counties, however, it would seem unfair to do this only in one county. If the percentage in Palm Beach County is unusually high, then it might be reasonable.

Republican analysts appear to be arguing that the election was "ex-ante" fair (while some irregularities are unavoidable, before election day there was no way to know where they would pop up or what they would be, and hence no reason to expect that they would favor one candidate over another) and that's the best we can hope for from an election. Democratic analysts are pushing for an "ex-post" fair election, where every voter who went to the polls should have their vote counted toward the candidates of their choice. I suspect that there's a whole body of literature in political philosophy about ex-ante versus ex-post fairness, so I'll just leave this one for others to wrangle over for now.

Possible Future Improvements: Roles for HCI Principles and HCI Practitioners

One obvious suggestion for future improvements is user testing of all new ballot designs. Subjects could go through a mock voting process, then be debriefed orally about who they intended to vote for. Any pattern of mismatches would then be detected. Note, however, that somewhat large subject samples might be necessary to detect problems. It appears that only about 1 in 20 of the Palm Beach voters were disqualified or cast their ballots unintentionally, though others may have been initially confused but figured it out. This suggests that bringing in half a dozen voters as subjects to test a ballot may not be sufficient to detect the nature and severity of problems.
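The sample-size point can be made concrete with a little arithmetic. The Python sketch below assumes that each test subject independently makes an error with the same probability, and takes the 1-in-20 rate above as that probability; both are simplifying assumptions, and the function names are illustrative.

```python
import math


def detection_probability(error_rate, n_subjects):
    """Chance that at least one of n independent subjects makes the error."""
    return 1 - (1 - error_rate) ** n_subjects


def subjects_needed(error_rate, target=0.95):
    """Smallest n for which the detection probability reaches the target."""
    return math.ceil(math.log(1 - target) / math.log(1 - error_rate))


print(round(detection_probability(1 / 20, 6), 3))  # 0.265
print(subjects_needed(1 / 20))                     # 59
```

Under these assumptions, six subjects would reveal even a single error only about a quarter of the time, and roughly sixty would be needed to see at least one error with 95% probability. Seeing one error is still far short of characterizing its nature and severity.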

A number of other HCI evaluation techniques could also be used that involve expert evaluators instead of or in addition to a sample of voters. These techniques include scenario walk-throughs and development of cognitive models of the voter's voting process. For example, the alternative methods described in the first section of how a user might establish a correspondence between tickets and holes to punch reflect (very) informal cognitive models.

Perhaps the most important principle that could be applied is that of direct manipulation. It really would make more sense for voters to directly punch or mark the candidate's name rather than a hole some distance away that corresponds to that name. The hole punches are an artifact of our vote counting machines. New techniques with scanners and OCR (or touch-screens) should enable a more direct-manipulation interface for voting.

Finally, there should be no need for disqualified ballots. Each voter should get immediate feedback from the voting machine if they have voted improperly, and should be given a chance to correct their ballot, or fill out a new one. In computer interfaces, there has been a general trend towards more and more immediate feedback to users about errors in data entry (even web forms now have embedded JavaScript code to validate data entry). In many cases, computer interfaces have gone one step further to simply not allow invalid data entry in the first place, by having all data entry through menus and buttons. For example, in a computer interface, after one candidate was selected, the other candidates would be "grayed out" so that it would not be possible to vote again.
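Both ideas in the paragraph above, immediate feedback and preventing invalid entry, are easy to sketch in code. The Python fragment below is only an illustration of the principle, not a description of any real voting system; the function names and messages are hypothetical.

```python
def validate_selection(selected, max_choices=1):
    """Return an error message for an invalid selection, or None if it is valid."""
    if len(selected) == 0:
        return "No ticket selected."
    if len(selected) > max_choices:
        return f"{len(selected)} tickets selected; choose only {max_choices}."
    return None


def selectable(ticket, selected, max_choices=1):
    """True if the ticket may still be chosen; otherwise it would be grayed out."""
    return ticket in selected or len(selected) < max_choices


# Immediate feedback: a double punch is reported instead of silently discarded.
print(validate_selection({"Democratic", "Reform"}))

# Prevention: once one ticket is chosen, the others are no longer selectable.
print(selectable("Reform", {"Democratic"}))
```

The first function corresponds to checking a completed ballot and giving the voter a chance to correct it; the second corresponds to an interface that never allows the second punch in the first place.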

In any case, it seems pretty clear that HCI professionals should be involved in these decisions in the future, rather than leaving it to the intuitions, however well-intentioned, of election officials. Some standardization would probably help, as otherwise there would be too many local ballots to evaluate each election. Perhaps universities that provide HCI training to students could band together to offer a free evaluation service to election ballot designers in the future.

It's probably also important to realize that there will never be perfect user interfaces. Some people will always make mistakes and their ballots will not be counted for the candidate they would have liked. It's important that a democracy minimize these irregularities, and it's also important that the irregularities be randomly distributed, rather than leading to a biased outcome in favor of one candidate or another. It may be helpful to think of an actual election as having some margin of error or confidence interval, just as we think of opinion polls as having these margins. Here the actual election is a single sample drawn from a hypothetical distribution of things that could have happened on election day. The confidence intervals are much tighter for actual elections than for random sample opinion polls.

But occasionally elections will end in a statistical dead heat (almost certainly this Florida ballot would qualify, and probably the national popular vote count as well, so that eliminating the electoral college would not alter the situation we're in). We need to be prepared as a democracy for how to handle cases where an election simply does not name a clear winner. I have no idea how we should handle it, but we need some way other than both sides jockeying to claim a victory based on the balloting when that balloting was inconclusive. Another election doesn't seem especially useful in such a case, since unless opinions change, it too is quite likely to come out inconclusive.