Response to Green-Armytage, Tideman, and Cosman

Why Statistical Evaluation of Voting Rules underestimates Approval Voting

Sep 20, 2021

Thomas Reasoner and Neal McBurnett contributed to this post.

When advocates of IRV (Instant Runoff Voting, a.k.a. single-winner Ranked Choice Voting or Hare) critique Approval Voting, their arguments often go something like this: Approval Voting may be great when there are low stakes and everyone votes honestly, but it is extremely easy to game, so it should never be used in important governmental elections. They point to IRV as being more difficult to manipulate, and therefore superior.

The most sophisticated critics point to the findings of James Green-Armytage, T. Nicolaus Tideman, and Rafael Cosman (hereafter GTC) in papers such as Statistical Evaluation of Voting Rules (see also this article and Tideman’s book, Collective Decisions and Voting). GTC look at “feelings thermometer” polling data and construct three-candidate elections from it in which the “feelings thermometer” data are interpreted as the sincere utilities of the voters. They look at IRV, Plurality, Borda Count, Approval Voting, Score Voting, and some Condorcet methods, among others.

GTC evaluate the voting methods on two metrics: “Utilitarian efficiency” and “Resistance to strategy.” “Utilitarian efficiency” (E) is the fraction of elections in which the voting method yields the winner with the highest aggregate utility across all voters. “Resistance to strategy” (R) is the fraction of elections in which no group of voters can get a better outcome (in their own eyes) by voting differently, with all other votes held constant. Higher is better for both E and R.
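As a rough illustration of how E is counted, the sketch below computes it for Plurality over two tiny made-up elections. The utility numbers, the function names, and the choice of Plurality as the example rule are assumptions for illustration only; this is not GTC's data or code.

```python
from typing import Callable, List, Sequence

# One election = one row of candidate utilities per voter (hypothetical data).
Election = List[Sequence[float]]

def utilitarian_winner(election: Election) -> int:
    """Index of the candidate with the highest total utility (ties: lowest index)."""
    n = len(election[0])
    totals = [sum(voter[c] for voter in election) for c in range(n)]
    return max(range(n), key=lambda c: totals[c])

def plurality_winner(election: Election) -> int:
    """Each voter votes for their highest-utility candidate (ties: lowest index)."""
    n = len(election[0])
    votes = [0] * n
    for voter in election:
        votes[max(range(n), key=lambda c: voter[c])] += 1
    return max(range(n), key=lambda c: votes[c])

def utilitarian_efficiency(elections: List[Election],
                           rule: Callable[[Election], int]) -> float:
    """Fraction of elections in which `rule` picks the utilitarian winner."""
    hits = sum(rule(e) == utilitarian_winner(e) for e in elections)
    return hits / len(elections)

ELECTIONS = [
    # Totals are A=20, B=27, C=10; Plurality elects A, so this is a miss.
    [(10, 9, 0), (10, 9, 0), (0, 9, 10)],
    # Totals are A=20, B=14, C=10; Plurality elects A, so this is a hit.
    [(10, 5, 0), (10, 4, 0), (0, 5, 10)],
]

efficiency = utilitarian_efficiency(ELECTIONS, plurality_winner)
print(efficiency)  # 0.5
```

R is much harder to compute, since it requires searching over every group of voters and every way they might change their ballots, so it is not sketched here.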

Note that their R metric does not necessarily model realistic voter behavior. It instead asks whether there is some group who, if given perfect knowledge of how everyone else was voting, could change their votes in response to this knowledge, while the remainder of the electorate remains utterly ignorant of their preferences and strategies. A more precise name for R might be “post hoc cooperation resistance.” What R does do is give an upper bound for how often strategic voting can sway elections; it catches every case in which strategic voting can determine the outcome of an election, at the cost of “catching” cases in which it would be utterly preposterous for strategic voting to affect elections given a realistic electorate. R can exonerate, but it cannot convict.

GTC’s metrics miss some advantages of cardinal methods

GTC find that Approval Voting does better on their E metric than any ordinal (ranked) voting method except Borda Count, and Score Voting (“normalized range”) does best of all. Plurality is predictably terrible, and IRV does worse than any Condorcet method. On GTC’s R metric, however, IRV and Condorcet-IRV do best, while Approval, Score, and Borda Count do worse than even Plurality (Borda Count does the worst of these three). However, several methodological choices make their findings more reliable for ordinal (ranked) methods than for cardinal methods (Score and Approval).

With ordinal methods, how someone votes is completely insensitive to the intensity of their preferences; if my utilities for candidates A, B, and C are 10, 9, and 0 while yours are 10, 1, and 0, our ballots will look exactly the same. Under Approval Voting, however, I’ll most likely vote for both A and B whereas you’ll probably vote for only A. Since Approval and Score factor in the intensity of voter preferences, it is reasonable to expect that they would be better at avoiding extremely bad outcomes relative to mildly bad outcomes than ordinal methods. This advantage goes unnoticed by E and R; both of them only look at how often an unfortunate event occurs while ignoring the question of exactly how bad it is.
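To make the 10, 9, 0 versus 10, 1, 0 example concrete, here is a minimal sketch that builds both kinds of ballot from the same utilities. The function names are hypothetical, and the approval rule used (approve every candidate at least halfway between your worst and best) is just one plausible assumption about how an honest voter might decide.

```python
from typing import Dict, List, Set

def ranked_ballot(utilities: Dict[str, float]) -> List[str]:
    """Candidates ordered from most to least preferred."""
    return sorted(utilities, key=utilities.get, reverse=True)

def approval_ballot(utilities: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Approve candidates whose normalized utility is at least `threshold`.

    The 0.5 default is an illustrative assumption. The voter is assumed
    not to be indifferent among all candidates (otherwise hi == lo).
    """
    lo, hi = min(utilities.values()), max(utilities.values())
    return {c for c, u in utilities.items() if (u - lo) / (hi - lo) >= threshold}

mine = {"A": 10, "B": 9, "C": 0}
yours = {"A": 10, "B": 1, "C": 0}

# Identical ranked ballots: the intensity of preference is invisible.
print(ranked_ballot(mine), ranked_ballot(yours))

# Different approval ballots: I approve A and B, you approve only A.
print(sorted(approval_ballot(mine)), sorted(approval_ballot(yours)))
```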

(With regards to E, GTC note “we also collected results using the other main measure in the literature (the average ratio of the winning candidate’s sum of utilities to the maximal value of the sum of utilities), but we find that the difference between the results with this measure and those with our measure (the frequency with which the candidate maximizing the sum of utilities is chosen) is not great enough to justify including both here.” Indeed, a moderate increase in Approval and Score’s utilitarian efficiency would be unlikely to change one’s qualitative view of the results since they do exceptionally well in this regard anyway.)
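For readers who want to see how the two measures differ, the following sketch computes both on a pair of made-up results. The utility totals, the winners, and the function names are assumptions chosen for illustration; they say nothing about the size of the difference in GTC's actual data.

```python
from typing import List, Sequence, Tuple

# Each result is (total utility per candidate, index of the elected candidate).
# Totals are assumed to be non-negative so that the ratio is meaningful.
Result = Tuple[Sequence[float], int]

def frequency_measure(results: List[Result]) -> float:
    """Fraction of elections in which the utility-maximizing candidate wins."""
    return sum(totals[w] == max(totals) for totals, w in results) / len(results)

def ratio_measure(results: List[Result]) -> float:
    """Average of (winner's total utility) / (best available total utility)."""
    return sum(totals[w] / max(totals) for totals, w in results) / len(results)

RESULTS = [
    ([20, 27, 10], 0),  # the winner falls short of the best total (20 vs. 27)
    ([20, 14, 10], 0),  # the winner has the best total
]

print(frequency_measure(RESULTS))  # 0.5
print(ratio_measure(RESULTS))      # (20/27 + 1) / 2, roughly 0.87
```

The frequency measure treats the first election as a plain miss, while the ratio measure gives partial credit according to how close the winner came.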

GTC’s strategic assumptions

A greater issue is the choice of voter strategy in Approval Voting. With ordinal voting methods, there is exactly one honest way to vote, so it is entirely reasonable to use it as a baseline of voter behavior. With Approval Voting, there are two honest ways to vote in a three-candidate election: Either you vote for only your first choice, or you vote for both your first and second choices. Furthermore, it is always strategically optimal to vote in one of these honest ways; it never helps to vote for your least favorite or against your favorite. Insofar as people dislike strategic voting because they dislike dishonesty, there is absolutely nothing wrong with Approval Voting; all strategic voting under ordinal voting methods is necessarily dishonest, but all strategic voting under Approval Voting is honest.

The question under Approval Voting (with three candidates) is, “Should I vote for my second choice?” The rigorous answer: If I think my second choice is x times more likely to get into a tie with my favorite than with my least favorite, then it is optimal for me to vote for her if and only if her utility is at least x/(1+x) (using a normalized utility scale where my favorite has 1 utility and my least favorite has 0 utility). For example, if I think a tie between my first and second choices is half as likely as a tie between my second and last choices (x = ½), I should vote for my second choice if she’s at least a third as good as my first choice. The less rigorous version: If my favorite is a frontrunner and my last choice isn’t, I should probably bullet vote, and if my last choice is a frontrunner and my first choice isn’t, then I should probably vote for my second choice.
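The threshold rule is easy to check numerically. One way to see where x/(1+x) comes from (a sketch of the reasoning, not something spelled out above): approving a second choice with utility u gains u when it breaks a tie against my last choice and costs 1 − u when it spoils a tie my favorite would otherwise have won, so it pays off exactly when u is at least x/(1+x). The function names below are hypothetical, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction
from typing import Union

Number = Union[Fraction, float, int]

def approval_threshold(x: Number) -> Number:
    """Minimum normalized utility at which approving the second choice is optimal.

    x is the likelihood of a (first, second) tie relative to a (second, last) tie.
    """
    return x / (1 + x)

def should_approve_second(u_second: Number, x: Number) -> bool:
    """True if a second choice with normalized utility u_second should be approved."""
    return u_second >= approval_threshold(x)

# The example from the text: x = 1/2 gives a threshold of 1/3.
print(approval_threshold(Fraction(1, 2)))  # 1/3

# Equal tie likelihoods (x = 1) give the halfway point between favorite and least favorite.
print(approval_threshold(Fraction(1)))     # 1/2

# When my favorite and second choice are the frontrunners (large x), the bar is high.
print(should_approve_second(Fraction(3, 5), Fraction(10)))  # False
```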

If there is a coalition that could change the election by changing strategies, it would have to be either by dropping the second vote to get their favorite elected or by adding a second vote to get their second-favorite elected. In the former case, that would only be possible if their two favorites were front-runners, in which case few voters would even start with the “vote top-2” strategy; and in the latter case, that would only be possible if their favorite was not a front-runner, in which case few voters would bullet vote. The exception, of course, is when all three candidates are similarly viable — we’ll get to this in a moment.

GTC assume that voters will only vote for a second choice candidate whose utility is closer to that of their first choice than to their last choice or is exactly in the middle. Is this ever a reasonable assumption? Yes. Specifically, this is strategically optimal when their first and last choices are equally likely to win — most realistically, when voters have absolutely no information regarding each candidate’s chances of winning. Contrast this with the assumption behind GTC’s R metric — that a group of voters has perfect information about how everyone else is voting. These assumptions are incompatible — when GTC’s assumption of voter behavior is valid there is no “horse race” information available so their R metric is irrelevant, and when their R metric is relevant there is excellent “horse race” information available so their model of voter behavior is unrealistic. The case where all three candidates poll equally well is effectively the same as the “no information” case; either way, strategic voters lack the information that some candidates are more viable than others and therefore are unable to strategize effectively.
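The tension between the two assumptions can be seen in a toy election. In the sketch below, every voter starts from the midpoint rule just described, and then one group is allowed to revise its ballots after seeing the tally. The electorate, the utilities, and the function names are all made up for illustration and are not drawn from GTC's data.

```python
from collections import Counter
from typing import Dict, List, Set

Utilities = Dict[str, float]

def midpoint_ballot(u: Utilities) -> Set[str]:
    """Approve the second choice only if it is at least halfway between first and last."""
    first, second, last = sorted(u, key=u.get, reverse=True)
    midpoint = (u[first] + u[last]) / 2
    return {first, second} if u[second] >= midpoint else {first}

def bullet_ballot(u: Utilities) -> Set[str]:
    """Approve only the favorite."""
    return {max(u, key=u.get)}

def approval_winner(ballots: List[Set[str]]) -> str:
    """Candidate with the most approvals (this example has no ties)."""
    tally = Counter(c for ballot in ballots for c in ballot)
    return max(tally, key=tally.get)

# Hypothetical electorate in which A and B are the two frontrunners.
a_voters = [{"A": 10, "B": 6, "C": 0}] * 4
b_voters = [{"B": 10, "A": 2, "C": 0}] * 3
c_voters = [{"C": 10, "B": 3, "A": 0}] * 2
everyone = a_voters + b_voters + c_voters

# Baseline: everyone uses the midpoint rule. Approvals: A=4, B=7, C=2.
baseline = approval_winner([midpoint_ballot(u) for u in everyone])

# Post hoc deviation: the A voters drop B. Approvals: A=4, B=3, C=2.
deviated = approval_winner(
    [bullet_ballot(u) for u in a_voters]
    + [midpoint_ballot(u) for u in b_voters + c_voters]
)

print(baseline, deviated)  # B A
```

This election would count against Approval Voting on R, yet the deviation exists only because the A voters began with a ballot that ignores the fact that A and B were the frontrunners.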

Where GTC’s simulations show that many coalitions would have post hoc used a different strategy, they demonstrate that the simulations are based on unrealistic assumptions of voter strategy. This isn’t a defect of the paper; it’s one of the core findings. Still, it is a less significant finding than the corresponding one for Borda Count and many Condorcet methods. Learning that voters won’t reliably follow the rule “vote for your second choice if and only if that candidate’s perceived quality is at least as close to your first choice’s as to your last choice’s” matters far less than learning that we should expect dishonest voting under those methods. It would have been nice to see results for Approval Voting given more realistic baselines of voter behavior.

The fact that GTC’s baseline strategic assumptions are unrealistic is not the entirety of their findings regarding Approval Voting. While R is irrelevant in the no information case, E is still important — and again, Approval Voting outperforms everything but Score and Borda. Unfortunately, this is about as far as we can go; in the high information case voters would behave differently, so GTC’s findings of high utilitarian efficiency need not hold up. (Presumably, Approval Voting’s E would fall to that of the Condorcet methods since Approval elects Condorcet winners when voters are strategic and perfectly informed.)

Another difference between GTC’s strategic resistance findings for Approval Voting and for ordinal methods involves the likelihood that a strategic coalition would form. Under ordinal methods, the most passionate of voters can work to change an election post hoc. But under Approval Voting, any group that would vote differently to sway the outcome would need to be a coalition of the indifferent. Suppose your utilities for candidates A, B, and C are 10, 9, and 0. Under Borda or IRV you could deviate from sincerely voting A > B > C and attempt a compromising strategy of B > A > C, which you would very much like to do if that would prevent C from winning. Under Approval Voting no such deviation is possible since (under GTC’s assumptions) you’re voting for B (in addition to A) anyway. While it would be possible to switch to bullet voting for A so that she defeats B, this is far less appealing (and therefore less likely to occur in practice) since it would result in a gain of 1 utility rather than 9.
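The arithmetic behind the 9-versus-1 comparison is simple enough to write out. This is only a restatement of the example above, using a hypothetical helper name.

```python
from typing import Dict

def gain(utilities: Dict[str, float], old_winner: str, new_winner: str) -> float:
    """Utility a voter gains when the winner changes from old_winner to new_winner."""
    return utilities[new_winner] - utilities[old_winner]

utilities = {"A": 10, "B": 9, "C": 0}

# Ordinal methods: compromising (ranking B first) so that B wins instead of C.
compromise_gain = gain(utilities, old_winner="C", new_winner="B")

# Approval Voting: dropping the vote for B so that A wins instead of B.
bullet_gain = gain(utilities, old_winner="B", new_winner="A")

print(compromise_gain, bullet_gain)  # 9 1
```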

Conclusions

These issues with GTC’s methodology have little bearing on their comparisons between ordinal voting methods. This paper does provide strong grounds for arguing (for instance) that IRV offers fewer incentives for dishonest voting than Coombs, Borda Count, or Minimax, at the cost of electing a suboptimal candidate the most frequently when voters are honest. That is because, in the context of ordinal methods, R can be interpreted as “What fraction of the time is there no incentive whatsoever for any voter to vote dishonestly?”, which people may consider to be intrinsically important when evaluating different voting methods. This interpretation does not carry over to cardinal methods (Approval and Score) where there are multiple honest ways to fill out a ballot and it is never strategically advantageous to show greater support for B than for A if you like A more.

In conclusion, while GTC provide a good analysis of various ordinal voting methods, their methodological choices consistently lead to an underestimation of Approval Voting.

Having E and R only look at the frequency of adverse outcomes while ignoring their intensity may not favor any ordinal method over any other, but it does favor ordinal methods over cardinal methods, since the latter take the strength of voter preferences into account (in the case of Approval Voting this happens indirectly through choice of strategy).
Having R not look at the strengths of the opinions of the voters who might alter outcomes with post hoc strategic voting also serves to underestimate Approval Voting, since in Approval Voting, unlike ordinal methods, only the voters who care the least have the opportunity to switch to such strategies.
GTC use a model of voter behavior under Approval which is incompatible with the perfect information assumption that underlies their R metric. Approval Voting’s “resistance to strategy” would be higher if another honest strategy, one which gave some consideration to candidate viability, were used instead.

GTC’s conclusions about how often there’s an incentive to vote dishonestly under various ordinal voting methods and how often any voting method elects an optimal winner in a zero-information setting are sound. I like this study quite a lot, but it should never be used to assert that Approval Voting is gameable.

Written by Marcus Ogren
