FairVote’s comparison of single-winner voting methods

What they get right, what they get wrong, and what they leave out

Apr 5, 2024

FairVote, the most prominent organization supporting Ranked Choice Voting (RCV — which I’ll call Instant Runoff Voting or IRV since RCV is an umbrella term) has an article comparing single-winner voting methods. Here’s their comparison table:

We’ll go through FairVote’s analysis line by line. But first, here’s the TL;DR:

How I’d evaluate voting methods based on FairVote’s criteria (plus one more)

Well-tested in government elections

On this point FairVote is objectively correct: IRV (RCV), Plurality, and Plurality Top 2 (they call it two-round runoff, but I’ll write Plurality Top 2 to distinguish it from St. Louis’ two-round system that uses Approval Voting in the first round) have received far more use in governmental elections than the other voting methods on this chart. They should have acknowledged the uses of single-winner Approval Voting in St. Louis and Fargo, but that’s not my main disagreement with FairVote here.

Instead, my big problem with this row is the use of colors. As we all know, green = good and red = bad. But there are big advantages to adopting a less-used voting method! If you care more about national politics than local politics, a local adoption of STAR or Condorcet will provide more useful information about which voting methods should be adopted nationwide than another local adoption of a voting method like IRV which FairVote describes as a “known quantity”. And this is an advantage that STAR and Condorcet have because they’re less used. The colors on the chart imply that IRV is flat-out better in this regard — and this just isn’t the right way to interpret differences in how much voting methods have been used.

Another consideration: being well-tested is only nice insofar as the outcomes of those tests have been unproblematic.

IRV’s track record is far from clean. In the 2021 NYC mayoral race, about 135,000 test ballots were accidentally included in the tabulation. When a 2022 Alameda County election was tabulated improperly, it took so long to catch the problem that the wrong winner got certified. Both problems were attributed to human error, but attributing failures to human error is not the way to ensure that complex and important systems are reliable. Humans are by nature erratic and error-prone. Safety in the aviation industry has been achieved by not blaming “human error” and instead looking beyond the person who made a mistake to find the risk factors in the system that allowed a single slip to have disastrous repercussions. In the case of these elections, the most obvious risk factor is the use of an unnecessarily complicated voting method which has more things that can go wrong and makes it difficult to catch problems when they occur.

(A final point that I’ll confine to a parenthetical: FairVote’s claim that “Condorcet methods, score, and STAR voting have never been used in a public election for government office” is stretching the truth. STAR Voting has been used by the Independent Party of Oregon to determine their presidential and secretary of state nominations; I guess FairVote doesn’t consider partisan primaries to be “public” elections.)

Resistance to strategic voting

Here, FairVote considers IRV to be better than everything else. Here’s what they have to say about IRV and strategic voting:

RCV is most resistant to strategic manipulation and immune to the most common strategies: bullet-voting and burying. It is immune to bullet-voting because it satisfies a criterion known as later-no-harm, which means that ranking an additional choice on the ballot doesn’t hurt the chances that an earlier choice will be elected. RCV is vulnerable to compromising in rare circumstances, according to James Green-Armytage’s statistical analysis.
Because of its non-monotonic nature, RCV could be vulnerable to the push-over strategy in certain cases, but that strategy is risky and difficult to pull off in a political election because it requires denying support to a voter’s preferred candidate. Indeed, there is no evidence of voters employing a push-over strategy in real-world elections. As such, strategic voting is not a concern in jurisdictions and among voters that use RCV.

I agree with many of the main points here. Neither bullet voting nor burial (see the FairVote article for the definitions) makes sense under IRV, and I wouldn’t advise voters to try using a pushover strategy very often. That said, I do have significant disagreements.

First, FairVote frames strategic voting as being purely negative with phrases like “vulnerable to compromising”. Typically, we say that something is vulnerable to a thing if that thing can harm it, but strategic voting usually doesn’t harm IRV. Instead, strategic voting ameliorates many of IRV’s weaknesses. Strategic voting is usually socially beneficial.

Second, FairVote links exactly one study to justify their claims: the one by Green-Armytage. I have written about this methodology previously:

It should be noted their R metric [for resistance to strategy] does not necessarily model realistic choices for voter behavior. It instead asks whether there is some group, who, if given perfect knowledge of how everyone else was voting, could change their votes in response to this knowledge, while the remainder of the electorate remains utterly ignorant to their preferences and strategies. A more precise name for R might be “post hoc cooperation resistance.” What R does do is give an upper bound for how often strategic voting can sway elections; it catches every case in which strategic voting can determine the outcome of an election at the cost of “catching” cases in which it would be utterly preposterous for strategic voting to affect elections given a realistic electorate.

According to this metric, IRV does do exceptionally well. But the question of how often it’s mathematically possible for a coalition of voters with perfect internal coordination and perfect knowledge of how everyone is voting to change the outcome of an election is very far removed from the real world. It’s still an interesting question and a rigorous study — I don’t blame FairVote for citing it. Instead, FairVote’s mistake is letting this study be the final word.

There are other approaches to investigating strategic voting that account for real-world considerations such as coordination being non-trivial and people having a less-than-perfect knowledge of how everyone else is voting. Eggers and Nowacki (2024) account for the lack of voter omniscience by treating individual voters as having uncertainty over how everyone else is voting and asking how often there is an insincere ballot that outperforms the honest ballot for them, in expectation. Here’s how they describe their results:

We find that, when beliefs are precise and other voters are expected to vote sincerely, more voters would benefit from voting strategically in IRV than in plurality (contrary to what advocates suggest). The anticipated benefit for these voters is small, however, and for the average voter the benefit of taking strategy into account is many times larger in plurality than IRV — especially when beliefs are imprecise and/or voters expect other voters to behave strategically.

This is a considerably more nuanced view than one gets just by skimming the Green-Armytage study. The results of these two studies may seem contradictory; how could Green-Armytage have found that it’s possible for strategic voting to be effective more often under Plurality than IRV, but Eggers and Nowacki have found that it’s more often optimal to vote insincerely under IRV? The answer lies in considering voter uncertainty: Strategic voting usually won’t end up mattering under IRV. But the question facing a strategic voter is, “Conditional on my strategy mattering, does a particular insincere ballot serve me better than a sincere ballot?” And the answer is often that an insincere ballot is more effective in expectation.

(It’s also worth noting that Plurality is terrible when it comes to strategic voting, as is obvious to anyone who has ever liked a third-party candidate more than a major-party candidate in a US election.)

Eggers and Nowacki have my favorite methodology of anyone who has studied strategic voting. Their approach has its disadvantages; I think it assumes much greater competence at strategic voting than is realistic, but the assumptions underlying their methodology are a lot more reasonable than Green-Armytage’s. It’s still a terrific study. The unfortunate thing is that Eggers and Nowacki only considered IRV and Plurality — and we’re interested in a lot of other methods.

My own approach to strategic voting has been to take some strategies that sometimes involve casting an insincere ballot and see how they compare to always voting sincerely, from the perspective of the voter casting the ballot. This avoids counterintuitive strategies that I don’t expect many voters to think of (like pushover strategies in IRV). (The disadvantage of this approach is that it also has to ignore counterintuitive strategies that I didn’t think of; my approach, unlike Eggers and Nowacki’s, can’t rule out the possibility that there’s some incredible strategy that never occurred to me.)

Here’s what I’ve found when evaluating the effectiveness of dishonest strategies in different voting methods:

Here, FB stands for favorite betrayal (what FairVote calls a compromising strategy) and PO stands for pushover. The +25% for favorite betrayal under Plurality means that, on average, voting in a manner that sometimes involves voting for someone other than the voter’s sincere favorite does 25% more to steer outcomes in the desired direction than always voting sincerely; casting four ballots while being willing to vote for someone other than your favorite is as effective (in expectation) as casting five sincere ballots. My findings agree with FairVote on a key point: IRV rewards dishonest voting far less than Plurality. However, voting dishonestly seems significantly more effective under IRV than it is under Approval, Approval Top 2, or STAR.
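To spell out the arithmetic behind “four ballots are worth five”, here’s a minimal sketch in Python. The function name is my own label for illustration, and the only figure it uses is the +25% quoted above.

```python
from fractions import Fraction

def sincere_ballot_equivalent(num_ballots, strategy_gain):
    """Expected influence of strategic ballots, measured in sincere ballots.

    strategy_gain is the relative boost from the strategy,
    e.g. +25% is Fraction(25, 100).
    """
    return num_ballots * (1 + strategy_gain)

# Four ballots cast with a favorite-betrayal strategy under Plurality (+25%)
print(sincere_ballot_equivalent(4, Fraction(25, 100)))  # 5
```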

The reason I don’t include Condorcet methods is that strategic voting under a Condorcet method is vastly more difficult than under any of the other methods I consider, so devising strategies for Condorcet methods that can plausibly be effective is also quite difficult. Additionally, unlike all the other methods we consider, strategic voting under Condorcet relies on the coordination of large numbers of voters: if too few voters coordinate, strategic voting is more likely to backfire than to succeed. Realistically, Condorcet does more to make strategic voting ineffective than any other voting method we’re considering.

Let’s turn to FairVote’s claims about other voting methods:

In contrast, strategic voting in plurality methods is quite common, as supporters of minor candidates often strategically “compromise” to vote for a front-runner.
Two-round runoff reduces much of the incentive to compromise, but not entirely, especially in crowded fields.
Approval and score voting are highly vulnerable to bullet-voting, compromising, and burying strategies.
STAR voting partially mitigates the bullet-voting incentives inherent to approval and score voting, but it is still somewhat vulnerable to the tactic. Additionally, STAR voting is vulnerable to burying, in which voters attempt to ensure a perceived strong competitor does not advance to the final round.
Condorcet voting methods are vulnerable to burying and other strategies.

FairVote is entirely correct about Plurality and Plurality Top 2. It’s the other voting methods where their analysis goes wrong.

A basic question: what does it even mean for a voting method to be “vulnerable to bullet voting”? Here’s their definition of bullet voting:

Bullet voting: insincerely expressing a preference for only a single candidate to increase that candidate’s chance of victory. This strategy applies to any degree of insincere preference truncation, such as expressing a preference for two candidates when one sincerely prefers three.

Let’s consider this more concretely. Suppose we’re using Approval Voting and there are three candidates. I think Alice is great, but Bob and Carol are merely good. Obviously it’s better for me to only vote for Alice than to vote for all three of them. But is this, by FairVote’s standards, “bullet voting”? What does it even mean to “sincerely prefer three candidates”? Suppose there’s a fourth candidate, Dave, who I dislike and I know has no chance of winning. Is it now “bullet voting” to only vote for Alice? Doing so doesn’t give me an unfair advantage, it just means I’m not making my ballot irrelevant by voting for every single candidate who has a chance of winning. Claiming that Approval Voting is “highly vulnerable” to me expressing my preferences in such a manner seems absurd.

FairVote goes wrong by insisting on an unclear double standard for sincerity. Under a ranked voting method like IRV there is only one sincere way to vote if you rank all the candidates: you rank your sincere first choice first, your sincere second choice second, etc. A ballot is insincere if and only if you prefer some candidate A to some other candidate B but you rank B ahead of A on your ballot. This is the standard they should (but don’t) use for methods like Approval Voting, where a ballot would be insincere if you prefer A to B but vote for B and not for A.
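The single standard I’m proposing is simple enough to write down as code. This is a minimal sketch, assuming the voter has strict preferences with no ties; the function names are mine, and the candidates are the ones from the Alice, Bob, Carol, and Dave example.

```python
def ranked_ballot_is_sincere(preference_order, ballot_ranking):
    """Insincere iff the ballot ranks some B ahead of some A that the
    voter actually prefers to B. Only candidates that appear on the
    ballot are compared (an assumption of this sketch)."""
    true_pos = {c: i for i, c in enumerate(preference_order)}
    for i, higher in enumerate(ballot_ranking):
        for lower in ballot_ranking[i + 1:]:
            if true_pos[lower] < true_pos[higher]:
                return False
    return True

def approval_ballot_is_sincere(preference_order, approved):
    """Insincere iff the voter approves some B but not some A that
    they prefer to B."""
    approved = set(approved)
    if not approved:
        return True
    true_pos = {c: i for i, c in enumerate(preference_order)}
    worst_approved = max(true_pos[c] for c in approved)
    # Everyone preferred to the least-liked approved candidate must be approved too.
    return all(c in approved for c in preference_order[:worst_approved])

prefs = ["Alice", "Bob", "Carol", "Dave"]  # most preferred first

print(approval_ballot_is_sincere(prefs, {"Alice"}))                  # True
print(approval_ballot_is_sincere(prefs, {"Alice", "Bob", "Carol"}))  # True
print(approval_ballot_is_sincere(prefs, {"Bob"}))                    # False
```

By this standard, voting only for Alice and voting for everyone but Dave are both sincere; which one I choose depends on where I draw the line, not on misrepresenting my preferences.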

Let’s consider a more tangible question: Are voters better off if they only vote for their favorite candidate no matter what, or if they take their more nuanced preferences into account and often support multiple candidates? For all of these voting methods in which voters are allowed to support multiple candidates, this kind of bullet voting is self-defeating:

From Wolk et al. (2023) (I’m a coauthor). Positive numbers mean a strategy is incentivized, negative numbers mean a strategy is disincentivized. Bullet voting is extremely ineffective, and none of the dishonest strategies were effective under Smith//Minimax, the Condorcet method we tested.

These data also contradict FairVote’s claim that burial is effective under STAR Voting and Condorcet methods.

FairVote’s allegations regarding compromise and burial strategies have the same flaw as their allegations regarding bullet voting: for cardinal methods (Approval, Approval Top 2, and STAR) it’s difficult to even define when a voting method is “vulnerable” to one of them. (FairVote might also think of the example with Alice, Bob, Carol, and Dave as an example of burial being effective, for instance.)

Okay, so is there any argument to be made that strategic voting is problematic in Approval, Approval Top 2, Score, or STAR Voting? My answer to this question is yes. The problem isn’t that these methods are vulnerable to strategic voting, it’s that thinking about strategy is useful for voters, such that voters who consider candidate viability can wield more influence than those who don’t. It’s a matter of strategic straightforwardness: Ideally, voters would only need to take what they think about the candidates into account when voting, not what everyone else thinks. This, in my view, is the greatest weakness of Approval Voting: voters who don’t know which candidates are viable risk throwing all their influence away by voting for none (or all) of the viable candidates.

All things considered, FairVote gets strategic voting backward. It’s not about methods being vulnerable to one strategy or another, it’s about the relative importance, from a voter’s perspective, of voting strategically. (It’s also worth noting that some voting methods really are vulnerable to strategic voting — but these are methods that hardly anyone advocates.) Here, Approval, Score, and Plurality are the worst, Condorcet methods are by far the best, and IRV, Approval Top 2, and STAR are in the middle. (See strategic straightforwardness for my justifications for these claims.)

Resistance to “spoilers”

Here’s how FairVote describes this criterion:

How well does the method prevent a minor candidate from causing a similar front-runner candidate to lose due to vote-splitting? Voting methods are resistant to “spoilers” if adding or removing candidates who are similar to front-runner candidates does not change the winner. Our spoiler analysis is closely related to the Independence of Irrelevant Alternatives criterion from Arrow’s Theorem and the Independence of Clones criterion.

And here’s what they have to say about IRV in this context:

RCV is highly resistant to spoilers because it satisfies both the Independence of Irrelevant Alternatives and Independence of Clones criteria. In practice, RCV prevents spoilers because voters who vote for a minor candidate have the opportunity to mark a similar front-runner candidate as a backup choice.

FairVote’s claim that IRV satisfies Independence of Irrelevant Alternatives is false. In the 2022 special election for US House in Alaska (that was conducted using IRV), Nick Begich would have defeated Mary Peltola head-to-head — but he lost the election to Peltola because an “irrelevant” candidate, Sarah Palin, was also in the race. IRV does a lot better in the presence of potential “spoiler” candidates than Plurality, but it’s not “highly resistant”.
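Here’s a minimal sketch of this kind of failure in Python. The ballot counts are hypothetical numbers I made up to show the pattern; they are not the Alaska results, and the candidate names are placeholders. The tabulator ignores tie-breaking, which a real implementation would need.

```python
def irv_winner(ballots):
    """ballots: list of (count, ranking) pairs, ranking listed best first."""
    remaining = {c for _, ranking in ballots for c in ranking}
    while True:
        tallies = {c: 0 for c in remaining}
        for count, ranking in ballots:
            # Each ballot counts for its highest-ranked remaining candidate.
            top = next((c for c in ranking if c in remaining), None)
            if top is not None:
                tallies[top] += count
        total = sum(tallies.values())
        leader = max(tallies, key=tallies.get)
        if 2 * tallies[leader] > total or len(remaining) == 1:
            return leader
        # Eliminate the candidate with the fewest votes (ties not handled).
        remaining.remove(min(tallies, key=tallies.get))

def without(ballots, dropped):
    """The same ballots with one candidate removed from the race."""
    return [(n, tuple(c for c in r if c != dropped)) for n, r in ballots]

# Hypothetical electorate of 100 voters
ballots = [
    (40, ("Left", "Center", "Right")),
    (12, ("Center", "Left", "Right")),
    (13, ("Center", "Right", "Left")),
    (35, ("Right", "Center", "Left")),
]

print(irv_winner(ballots))                    # Left
print(irv_winner(without(ballots, "Right")))  # Center
```

Right loses either way, yet Right’s presence on the ballot decides whether Left or Center wins. That is exactly what Independence of Irrelevant Alternatives forbids.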

In truth, no voting method is completely immune to spoilers. There are two reasons for this. First, Arrow’s Theorem says that no reasonable ranked voting method can satisfy Independence of Irrelevant Alternatives; there will always be a possible situation in which A beats B, but if C joins the race then B will be elected. (While Arrow’s theorem doesn’t apply to cardinal voting methods like Approval, strategic considerations mean that spoilers are possible there anyway.) Second, there will also be some voters who only vote for a single candidate regardless of the strategic incentives at play; 32% of people vote for a single candidate in the median IRV election despite the lack of a strategic incentive.

(Another possibility with spoilers under IRV that FairVote doesn’t mention: many jurisdictions only allow voters to rank 3–5 candidates. In such places, voters may not have enough space to rank all the candidates, so spoilers may function similarly to Plurality by causing exhausted ballots.)

Here’s what they have to say about other voting methods.

Plurality voting is highly vulnerable to spoiler candidates.
Two-Round runoff is resistant to many but not all spoilers. For example, a spoiler effect could occur between the third-place candidate and a lower-place finisher with a similar platform, preventing either candidate from earning a place in the runoff.
Both approval voting and score voting are more resistant to spoilers than plurality voting because voters can give the front-runner they like best the top score to prevent them from being “spoiled.” However, the expectation that voters will behave in this fashion depends on three assumptions, which are not always true.
First, voters need to know who the front-runners are, so they require access to accurate polling data in advance of the runoff. Second, there must only be two clear frontrunners; otherwise the question of how best to vote to avoid spoilers is further complicated. Third, voters must be comfortable insincerely giving a front-runner the same score as their actual favorite. If any of these assumptions are not true, the spoiler effect remains.
STAR voting is more resistant to spoilers than plurality voting, approval, or score voting but is still vulnerable to spoilers due to its susceptibility to strategic voting in the form of “burying”.

I broadly agree with FairVote about how all of these methods stack up against one another in this regard, and they are entirely correct about Plurality and Plurality Top 2. However, their explanations of other methods tend to be wrong and/or confusing.

Let’s consider what a “spoiled” election can look like under each voting method:

Plurality: Two or more candidates split the vote, causing a very different candidate to win. This can happen even if the majority faction is a supermajority and even if “spoiler” candidates have almost no support.
Plurality Top 2: Three or more candidates split the vote such that every candidate who makes it to the runoff is from a minority faction. This has been a major concern in California.
IRV: There is a broadly popular candidate who would beat everyone else head-to-head. A divisive candidate peels off most of his first-choice support, causing him to be eliminated before the final round. The divisive candidate loses in the final round on account of being divisive. This is what happened in Alaska. (Plurality Top 2 performs basically the same as IRV in this scenario.)
Approval: On the level of an individual voter there are two ways for the addition of a candidate C to serve as a spoiler, such that instead of voting for A but not B the voter provides equal support to A and B. First, the voter might strongly prefer C to both A and B, deeming it more valuable to give C an edge over A than to prevent B from winning, and only vote for C. Second, a voter may think that C is terrible and must be stopped at all costs, causing them to vote for B as well as A.
At the level of an entire electorate, the two kinds of voters will partially cancel one another out, and this cancellation is a reason why Approval is more spoiler-resistant than Plurality even in the absence of strategic voting. A second reason is that, on the level of a single voter, a spoiler just needs to be slightly preferred to one’s favorite candidate to take a vote away under Plurality. Under Approval, the voter needs to have an extreme opinion of the spoiler candidate (relative to the others) for it to take away a vote. (For a strategy-centric way of looking at spoilers in some Approval elections, see the chicken dilemma.) Score is similar.
Approval Top 2: The possibility of spoilers is about the same under Approval Top 2 as under Approval Voting without a runoff except for two differences. First, there must be at least three viable candidates (potentially including the spoiler). If there were only two viable candidates, these candidates would be guaranteed to reach the runoff, making a spoiler effect impossible. Second, there can be a situation akin to spoilers in IRV if the candidate with the second-most votes in the first round would win the runoff. In this case, if an additional candidate entered the race and got enough votes to make it to the runoff but lost the runoff, that candidate would be a spoiler. This is substantially less likely than under IRV, however. The number of votes two candidates receive in an Approval race is a better proxy for how they’d fare head-to-head than the number of votes they receive in a Plurality race. Each individual round of IRV tabulation is like a Plurality race, so IRV is much more likely to eliminate a broadly popular candidate early in tabulation.
STAR: STAR is similar to Approval Top 2, but with two differences that make it less susceptible to spoilers and one that makes it more susceptible. The difference that makes it more susceptible is that it’s sometimes worthwhile to risk abstaining in a runoff in order to have a stronger voice in the scoring phase or another possible runoff. Suppose you’re planning to vote A:5, B:4, C:3, D:1, E:0. You think B is very nearly as good as A. Another candidate, F, joins the race, and you think F is somewhere between B and C. Now it’s reasonable for you to vote A:5, B:5, F:4, C:3, D:1, E:0. F’s entrance into the race means you’re forgoing a voice in a possible A vs. B runoff. You don’t actually care about this very much (you’re only voting like this because you consider the difference between A and B to be minor), but it’s still a way for F to be a spoiler that wouldn’t happen in Approval Top 2.
STAR’s first advantage over Approval Top 2 is that it requires less substantial changes to your ballot to distinguish between candidates. Partly reusing a previous example, maybe I’d vote Alice: 5, Bob: 1, Carol: 0 with three candidates under STAR and only vote for Alice under Approval Top 2, but I’d vote Alice: 5, Bob: 4, Carol: 3, Dave: 0 with four candidates under STAR and vote for everyone but Dave under Approval Top 2. With STAR, Dave’s entrance into the race still means that my ballot differentiates less between the first three candidates, but I’m still indicating more of a difference between them than I am under Approval Top 2. The second advantage of STAR is that, just as votes under Approval are a better proxy for success in a runoff than votes under Plurality, scores are a better proxy than votes under either.
Condorcet: FairVote rightly gives Condorcet top marks here, so I won’t bother with an example (which would have to be significantly more complicated than those for other methods).

Two takeaways: First, some methods require more convoluted circumstances than others for a candidate to play a “spoiler” role (at least if we ignore the fact that many people will vote for only a single candidate regardless of strategic incentives). Second, some voting methods are susceptible to being spoiled in more severe ways. Under Plurality and Plurality Top 2, a candidate who’s liked by 80% of voters can lose to someone who’s liked by 20% of voters if enough other candidates split the vote. With Approval, Approval Top 2, Score, and STAR, spoilers only tend to happen when the new and the old winners are relatively similar. IRV lies between these extremes; it’s immune to the most egregious forms of vote splitting that can occur under Plurality, but spoilers can easily determine which party wins a seat, as happened in Alaska.

Majority Cohesion

Here, FairVote considers the majority criterion and the mutual majority criterion. A voting method passes the majority criterion if, whenever a candidate is the first choice of an outright majority of voters, that candidate is guaranteed to be elected. A voting method passes the mutual majority criterion if, whenever there is a set of candidates such that an outright majority of voters prefers every single candidate in that set to every candidate outside that set, it is guaranteed that the winner will be from this set.

FairVote’s comments regarding which methods do and do not pass these criteria are mostly correct. (They don’t acknowledge that RCV doesn’t pass the mutual majority criterion when voters are limited in how many candidates they’re allowed to rank, but this is a bit of a nitpick.) However, they provide no arguments regarding why they are important (their statement, “For democracy to flourish, voting methods must elect candidates preferred by a majority of voters” is a bald assertion for which they offer no support). In fact, it’s not obvious that passing these criteria is desirable. Suppose a divisive candidate is loved by 51% of voters and hated by 49%, and a unifying candidate is well-respected by everyone. A voting method that passes either criterion must elect the divisive candidate.
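To put numbers on the divisive-versus-unifying scenario, here’s a minimal sketch in Python. The 0 to 5 scale and the specific scores (5 for “loved”, 0 for “hated”, 4 for “well-respected”) are my own assumptions for illustration.

```python
def majority_favorite(voters):
    """Return the candidate who is the first choice of more than half
    of all voters, or None if there is no such candidate."""
    total = sum(n for n, _ in voters)
    firsts = {}
    for n, scores in voters:
        favorite = max(scores, key=scores.get)
        firsts[favorite] = firsts.get(favorite, 0) + n
    for cand, votes in firsts.items():
        if 2 * votes > total:
            return cand
    return None

def average_scores(voters):
    """Average score of each candidate across the whole electorate."""
    total = sum(n for n, _ in voters)
    cands = voters[0][1].keys()
    return {c: sum(n * s[c] for n, s in voters) / total for c in cands}

# Hypothetical electorate: (number of voters, their scores)
voters = [
    (51, {"Divisive": 5, "Unifier": 4}),
    (49, {"Divisive": 0, "Unifier": 4}),
]

print(majority_favorite(voters))  # Divisive
print(average_scores(voters))     # {'Divisive': 2.55, 'Unifier': 4.0}
```

Any method that passes the majority criterion has to pick Divisive here, even though Unifier has the much higher average score.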

Moreover, the scenario described by the mutual majority criterion is the sort of thing that only occurs in thought experiments and landslide elections. While a substantial majority of Democrats prefer both Joe Biden and Bernie Sanders to Donald Trump (for example), there are still some with the preferences Biden > Trump > Sanders or Sanders > Trump > Biden. If there were enough Democratic voters that an outright majority of voters had Trump as their last choice, an election would be an outright blowout once the voters with “weird” preferences were taken into account. (Okay, it’s not a blowout for every voting method; Trump could still win under Plurality in such a three-candidate election.)

In short, FairVote’s analysis on this point provides a more-or-less accurate answer to a question of dubious relevance.

Condorcet Efficiency

How often does the method elect “beats-all” candidates, those who would win head-to-head against every other candidate in the race, when such a candidate exists? Methods that always elect the “beats-all” winner when one exists meet the Condorcet Criterion.

FairVote acknowledges that IRV doesn’t pass the Condorcet criterion, and then rightly pivots to the question of how often IRV elects the Condorcet winner, noting that ballot data from IRV elections suggests it outperforms Plurality Top 2 (as is also what you’d expect based on common sense).

However, FairVote does not consider the question of how often other voting methods elect the Condorcet winner. Instead, FairVote focuses on two questions when assessing other voting methods:

If there is a candidate who would beat everyone else head-to-head (a “Condorcet winner”), is the voting method guaranteed to elect that candidate?
Is it mathematically possible for the voting method to elect a candidate who would lose to everyone else head-to-head, regardless of how unlikely such a scenario might be?
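Both questions come down to counting head-to-head matchups. Here’s a minimal sketch in Python of how a Condorcet winner and a Condorcet loser can be identified from ranked ballots. It assumes every ballot ranks every candidate, and the ballot counts are hypothetical.

```python
def beats(ballots, a, b):
    """True if more voters rank a above b than rank b above a."""
    a_over_b = sum(n for n, r in ballots if r.index(a) < r.index(b))
    b_over_a = sum(n for n, r in ballots if r.index(b) < r.index(a))
    return a_over_b > b_over_a

def condorcet_winner(ballots):
    """The candidate who beats every other candidate head-to-head, if any."""
    cands = ballots[0][1]
    for c in cands:
        if all(beats(ballots, c, o) for o in cands if o != c):
            return c
    return None

def condorcet_loser(ballots):
    """The candidate who loses to every other candidate head-to-head, if any."""
    cands = ballots[0][1]
    for c in cands:
        if all(beats(ballots, o, c) for o in cands if o != c):
            return c
    return None

# Hypothetical electorate: (number of voters, ranking listed best first)
ballots = [
    (40, ("A", "B", "C")),
    (25, ("B", "A", "C")),
    (35, ("C", "B", "A")),
]
print(condorcet_winner(ballots))  # B
print(condorcet_loser(ballots))   # C

# A rock-paper-scissors cycle has neither.
cycle = [(1, ("A", "B", "C")), (1, ("B", "C", "A")), (1, ("C", "A", "B"))]
print(condorcet_winner(cycle))    # None
```

A pass/fail criterion only asks whether a failure is possible for some set of ballots; it says nothing about how often such ballots occur.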

By looking at IRV and Plurality Top 2’s Condorcet failures quantitatively (which is the right way to analyze the question) and taking a pass/fail approach for other voting methods (which isn’t), FairVote is applying a double standard. Fortunately, Richard Darlington has run simulations to compare the rates at which many of these voting methods fail to elect Condorcet winners:

Here, CC = Condorcet criterion and Hare = IRV = RCV. “Disagreements with CC” shows how many trials (out of 100,000) featured a failure of the Condorcet criterion for each voting method. We’re not going to worry about the other columns.

Darlington doesn’t consider Score or Approval Top 2 (which would necessarily do better than Approval), but we can see that all of these voting methods outperform IRV in this regard, with Approval being the weakest of them.

An alert reader may have noticed something puzzling. FairVote notes that IRV has elected the Condorcet winner in 99.6% of real-world elections for which we have sufficient data. In Darlington’s simulations, IRV elects the Condorcet winner less than 55% of the time (100k-45k≈55k).

This discrepancy can be explained by Darlington’s modeling assumptions. The elections he models are far more competitive than most IRV elections. Each has ten candidates, and these aren’t like real-world ten-candidate elections in which half the candidates receive less than 1% first-choice support. Not all of Darlington’s candidates are viable, but none of them are nobodies whom almost everyone ignores.

Do these unrealistic modeling assumptions mean that Darlington’s results understate the effectiveness of IRV? If you’re wondering in what percent of elections IRV fails to elect the Condorcet winner, the answer is clearly yes. If you’re wondering how different voting methods compare in how frequently they fail to elect the Condorcet winner, the answer, I think, is mostly no. Having more competitive elections means that every voting method (except Condorcet methods) will elect the Condorcet winner less frequently. I do think it’s plausible that Darlington’s findings overstate how well Approval Voting (without a runoff) does in comparison to IRV since Approval Voting is among the worst voting methods at electing a Condorcet winner in less contested elections in which there are exactly two candidates who have significant support. But I mostly expect Darlington’s findings to generalize pretty well to more realistic models, and the finding that IRV has some of the worst Condorcet efficiency in highly contested elections is interesting in its own right.

Interlude: Evaluating Winners

The last three criteria — resistance to spoilers, majority cohesion, and Condorcet efficiency — all appear to be aimed at assessing how well voting methods do at electing the most representative winners possible. But there’s a more productive approach to answering this question than evaluating methods on a few pass/fail criteria and armchair philosophizing about spoilers: we can consider it directly with computer simulations.

Here’s how Jameson Quinn describes Voter Satisfaction Efficiency:

Voter Satisfaction Efficiency (VSE) is a way of measuring the outcome quality a voting method will give. It relies on making various assumptions about what kind of voters and candidates are likely to occur, then running large numbers of elections that are simulated using those assumptions, and measuring how satisfied the average simulated voter is by the outcome in each election.
VSE is expressed as a percentage. A voting method which could read voters minds and always pick the candidate that would lead to the highest average happiness would have a VSE of 100%. A method which picked a candidate completely at random would have a VSE of 0%.
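As a rough illustration of the normalization Quinn describes, here is a minimal Python sketch. The voter model (independent uniform random utilities), the electorate sizes, and the function names are my own assumptions chosen for brevity; the actual VSE simulations use far richer voter models and strategies.

```python
import random

def vse(elections, pick_winner):
    """Voter Satisfaction Efficiency of a method over a set of simulated elections.

    elections: list of utility tables, where utilities[v][c] is voter v's
    utility for candidate c. pick_winner maps a utility table to a winner index.
    """
    chosen = best = rand = 0.0
    for utilities in elections:
        n_cands = len(utilities[0])
        avg = [sum(v[c] for v in utilities) / len(utilities) for c in range(n_cands)]
        chosen += avg[pick_winner(utilities)]  # average happiness with the actual winner
        best += max(avg)                       # the mind-reading ideal (100%)
        rand += sum(avg) / n_cands             # expected value of a random pick (0%)
    return (chosen - rand) / (best - rand)

def oracle(utilities):
    """The hypothetical mind-reading method: pick the highest-total-utility candidate."""
    n_cands = len(utilities[0])
    totals = [sum(v[c] for v in utilities) for c in range(n_cands)]
    return totals.index(max(totals))

def plurality(utilities):
    """Honest Plurality: each voter votes for their single favorite candidate."""
    votes = [0] * len(utilities[0])
    for voter in utilities:
        votes[voter.index(max(voter))] += 1
    return votes.index(max(votes))

# Assumption: 200 toy elections, 25 voters, 4 candidates, uniform random utilities.
rng = random.Random(0)
elections = [
    [[rng.random() for _ in range(4)] for _ in range(25)]
    for _ in range(200)
]

print("oracle VSE:", vse(elections, oracle))
print("plurality VSE:", vse(elections, plurality))
```

The mind-reading method scores 100% by construction, and any real method lands somewhere below it. The interesting question is how far below, which depends heavily on the voter model.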

Results:

From Wolk et al. (2023). The “viability-aware” strategy for Smith/Minimax, the Condorcet method tested, isn’t actually beneficial for the voter using it, so ignore the yellow circle on that line.

These simulations show STAR Voting and Condorcet methods at the top of the pack, with IRV only ahead of Plurality Top 2 and Plurality. (There are, by the way, a lot of other such simulations out there; here’s a list of them.)

Using VSE to evaluate how good the winners are under a voting method has several advantages over FairVote’s approach of identifying a few considerations that seem related and considering them without the benefit of numbers.

VSE is quantitative. It answers the question of how much better one voting method performs than another, and lets us see that the difference between Approval and IRV is smaller than the difference between IRV and Plurality.
VSE isn’t susceptible to cherry-picking. FairVote considered resistance to spoilers, majority cohesion, and Condorcet efficiency. They did not consider susceptibility to the center squeeze or the equality criterion — measures on which IRV performs much worse than the alternatives. VSE naturally takes all such factors that influence winner quality into account without the need for a human to add them in one at a time.
How important is Condorcet efficiency compared to majority cohesion? FairVote doesn’t even try to answer questions like this, even though there’s no reason to assume they’re equally important. With VSE, the answers to such questions are baked in: the importance of each consideration is proportional to how much doing well on that consideration leads to voters being satisfied with outcomes.
If my beliefs about voting methods are wrong, VSE-style simulations have a better chance of changing my mind. As described above, FairVote’s approach leaves so much room for pre-existing biases that the conclusion is essentially predetermined. But simulations can surprise you. When Jameson Quinn first ran the VSE simulations, he didn’t have STAR and Condorcet methods in mind as being the strongest. Instead, he thought methods like Majority Judgment were ideal. His simulations contradicted his initial beliefs, so he changed his mind.

Simplicity of Tabulation

I respect FairVote for including this and acknowledging that this is one of IRV’s weaknesses. There are tradeoffs in selecting a voting method, and every voting method has its weaknesses, or at least areas where it’s mediocre rather than good. FairVote did right to mention this straightforwardly rather than omit it on account of IRV doing poorly here.

I basically agree with FairVote’s analysis here, though I wish they’d mentioned that many Condorcet methods are batch summable while IRV isn’t. On the other hand, IRV has an advantage over Condorcet in that it’s easier to tabulate by hand.
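To show what batch summability means in practice, here is a minimal Python sketch of the pairwise matrix that many Condorcet methods tabulate from. The precinct names and ballots are hypothetical, and the code assumes every ballot ranks every candidate. Each precinct can report a small matrix of head-to-head counts, and the matrices simply add.

```python
from itertools import permutations

def pairwise_matrix(ballots, candidates):
    """matrix[(a, b)] = number of ballots that rank a above b."""
    matrix = {pair: 0 for pair in permutations(candidates, 2)}
    for ballot in ballots:
        for i, a in enumerate(ballot):
            for b in ballot[i + 1:]:
                matrix[(a, b)] += 1
    return matrix

def add_matrices(m1, m2):
    """Combine two precinct-level matrices by adding matching entries."""
    return {pair: m1[pair] + m2[pair] for pair in m1}

candidates = ["A", "B", "C"]

# Hypothetical ballots from two precincts.
precinct_1 = [("A", "B", "C")] * 3 + [("B", "C", "A")] * 2
precinct_2 = [("C", "B", "A")] * 4 + [("A", "C", "B")] * 1

summed = add_matrices(
    pairwise_matrix(precinct_1, candidates),
    pairwise_matrix(precinct_2, candidates),
)
central = pairwise_matrix(precinct_1 + precinct_2, candidates)

print(summed == central)
```

IRV has no comparably compact summary, because which candidate gets eliminated in each round depends on the totals across all precincts. Precincts would have to report either the ballots themselves or a count for every distinct ranking that appears.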

Descriptive Representation

How well does the voting method promote the election of candidates who represent the electorate, in terms of gender, race, ethnicity, political identity, and other factors?
RCV has demonstrably improved representation for women and people of color. Research shows that RCV leads to more women and candidates of color on the ballot and in office. Additionally, candidates of color tend to do well earning second- and third-choice votes during RCV elections that go to multiple rounds of tabulation, and RCV removes the “win penalty” that could otherwise occur when multiple candidates appealing to the same constituency compete against one another.

I’ve written about the research on RCV and diverse representation previously:

In American cities that have adopted Instant Runoff Voting (single-winner Ranked Choice Voting), women and minorities have greater success than in cities that use Plurality, and their representation also tends to improve in a city after that city adopts IRV. But these observations tell us virtually nothing about how effective IRV is at causing more diverse representation. The cities that have adopted IRV have tended to be exceptionally liberal and diversity-valuing, so it’s no surprise that female and minority representation is greater in those places. And it’s not only places where IRV has been adopted that have seen more diverse representation — the phenomenon exists nationwide. In short, such trends tell us next to nothing about the efficacy of IRV; it takes a more sophisticated approach to determine causation.

To my knowledge, there have been exactly two studies on IRV and the election of female or minority candidates that could distinguish between causation and correlation. I analyzed the first of these in the article I quoted above. The second of these, which came out after I wrote the article, is the most comprehensive study to date on the subject:

While previous studies of RCV in the United States have focused on one or two cities, this study leverages nearly the full range of cities that have used RCV for at least one election since 2004. This includes 43 cities and hundreds of elections involving different offices. While some cities (e.g., San Francisco and Berkeley) have used RCV for many election cycles, others only ran one trial election using RCV or repealed it after several elections (e.g., Aspen, Colo.).

Here’s what they found:

Under IRV, non-white people seem less likely to run for mayor (as compared to white people), but more likely to run for city council. Women are less likely to run for mayor, and there doesn’t appear to be any effect with women running for city council. On balance, IRV appears ineffective at getting more diverse candidates to run. FairVote’s claim that “Research shows that RCV leads to more women and candidates of color on the ballot and in office” ignores the highest-quality research available.

What about other methods? FairVote writes:

Approval, score, STAR, and Condorcet methods are untested in practice. No evidence shows these methods would improve the diversity of our elected representatives.

I agree with FairVote’s characterization here. We have no empirical evidence to say whether these methods will do better or worse than IRV.

Compatibility with fair multi-winner elections

Does the method have an accepted version or analog method for multi-winner elections that ensures fair representation? Single-winner methods that have an analogous multi-winner method allow single-winner and multi-winner offices to appear on the same ballot in an intuitive and coherent way for the voter.

FairVote correctly notes the advantages of Single Transferable Vote (STV) over semi-proportional and non-proportional methods based on Plurality. They aren’t enthusiastic about the research on other proportional voting methods:

While some advocates have proposed proportional analogs to Condorcet, approval, score, and STAR voting, they have seen scant or non-existent use and little study or advocacy.

STV has been used in a great many governmental elections, such as for electing Australia’s Senate. By contrast, many of these other proportional methods haven’t been used at all. Sequential Proportional Approval Voting was used in Sweden in the early 20th century, but I don’t know of a single study on its use. There is a great deal of empirical research on STV that is unavailable for other methods.

However, this criterion circles back to the first one: how much a voting method has been used. FairVote treats the fact that RCV has been used more, and therefore studied more empirically, than other methods as the biggest argument for supporting RCV over Approval, STAR, or Condorcet. Basically, FairVote is saying that people should use RCV because people use RCV.

Is this being fair to FairVote? If I’m going to take a medication, I’ll be more confident that it will make me feel better instead of worse if it’s gone through rigorous testing that involved a lot of other people using it. Couldn’t we say that voting methods are similar?

No. Most proposed drugs don’t make it through the many phases of clinical trials, so knowing that a drug has been studied without it being rejected gives us a lot of information about it. By contrast, I can think of hardly any voting methods that have been tried in the real world and rejected in some analog of clinical trials. Bucklin Voting seems like the best example here, but I’m not at all confident that the reason Bucklin got repealed was that it performed poorly. The second-best example might be STV itself. Jack Santucci, the leading researcher on the history of STV, writes:

Many may not know that STV was widespread during the Progressive and New Deal periods. Reformers had formed a belief that “normal” PR could not win. So, they hooked up with the movement for nonpartisan elections, won STV in 22 cities, and then saw it repealed in all but one.

Voting methods aren’t like drugs, and there’s nothing akin to clinical trials that filter out the bad ones. The fact that voting method A has been subject to more empirical research than voting method B tells us almost nothing about whether A or B is better.

What about theoretical research? There is very little that compares STV to other proportional voting methods. There is some (for instance, by the Metric Geometry and Gerrymandering Group) that compares multi-winner districts with STV to elections with single-winner districts, but there’s no evidence that the choice to use STV instead of another proportional voting method significantly affects their findings. Aside from studies that focus on a single arbitrarily-chosen proportional voting method, I don’t think there is actually more theoretical research on STV than on other proportional voting methods. There has been a lot of analysis of Approval-based proportional voting methods in terms of pass/fail criteria. (Granted, I don’t think very highly of the pass/fail approach.) Keith Edmonds has done simulations on Score-based proportional voting methods.

I’ll conclude with some of my own research on the electoral incentives that voting methods present to candidates.

This chart compares STV to a Condorcet-based proportional voting method, STV-Minimax (and also to some other voting methods I won’t talk about). It shows that ordinary STV incentivizes candidates to care almost exclusively about voters who are relatively supportive of them, but that STV-Minimax incentivizes candidates to care a significant amount about everyone. Theoretical research comparing single-winner voting methods has shown that alternative voting methods can significantly outperform IRV. While there is nowhere near as much that compares multi-winner voting methods, this study suggests that other proportional voting methods can likewise outperform STV.