Sunday, November 12, 2006

Would a recount have made a difference?

A couple of days ago George Allen conceded the Virginia Senatorial race.

It was the right move. Here's a quote from his speech (emphasis mine):

"A lot of folks have been asking about the recount. Let me tell you about the recount.

I've said the people of Virginia, the owners of the government, have spoken. They've spoken in a closely divided voice. We have two 49s, but one has 49.55 and the other has 49.25, after at least so far in the canvasses. I'm aware this contest is so close that I have the legal right to ask for a recount at the taxpayers' expense. I also recognize that a recount could drag on all the way until Christmas.

It is with deep respect for the people of Virginia and to bind factions together for a positive purpose that I do not wish to cause more rancor by protracted litigation which would, in my judgment, not alter the results."

I would agree that it wouldn't have altered the results. In fact, when I first conceived of this post, I had envisioned it as a "why Allen should concede" post--little did I know how quickly he would do just that. To understand why, we need to review a little statistics theory.

Last Monday, Dalton Conley wrote a piece in the New York Times entitled The Deciding Vote. In it he explains a fundamental property of "statistical dead-heat" elections.

The rub in these cases is that we could count and recount, we could examine every ballot four times over and we’d get — you guessed it — four different results. That’s the nature of large numbers — there is inherent measurement error. We’d like to think that there is a “true” answer out there, even if that answer is decided by a single vote. We so desire the certainty of thinking that there is an objective truth in elections and that a fair process will reveal it.

But even in an absolutely clean recount, there is not always a sure answer. Ever count out a large jar of pennies? And then do it again? And then have a friend do it? Do you always converge on a single number? Or do you usually just average the various results you come to? If you are like me, you probably settle on an average. The underlying notion is that each election, like those recounts of the penny jar, is more like a poll of some underlying voting population.

What this means is that the vote count in an election is not "the" true count, but rather a poll with a very large sample size, and it can be treated as such. He goes on to offer a criterion for declaring a winner which, if not met, should trigger a run-off election.

In an era of small town halls and direct democracy it might have made sense to rely on a literalist interpretation of “majority rule.” After all, every vote could really be accounted for. But in situations where millions of votes are cast, and especially where some may be suspect, what we need is a more robust sense of winning. So from the world of statistics, I am here to offer one: To win, candidates must exceed their rivals with more than 99 percent statistical certainty — a typical standard in scientific research. What does this mean in actuality? In terms of a two-candidate race in which each has attained around 50 percent of the vote, a 1 percent margin of error would be represented by 1.29 divided by the square root of the number of votes cast.
If this sounds like gobbledygook to you, let me try to clarify it by throwing some Greek letters at you. I couldn't find any of my old statistics texts, but the Wikipedia article is actually quite good, so I will draw from it. (For some even better statistics primers, check out Zeno and Echidne.) Let's start with some definitions (according to Wiki):

The margin of error expresses the amount of the random variation underlying a survey's results. This can be thought of as a measure of the variation one would see in reported percentages if the same poll were taken multiple times. The margin of error is just a specific 99% confidence interval, which is 2.58 standard errors on either side of the estimate.

Standard error = \sqrt{\frac{p(1-p)}{n}}, where p is the proportion (in the case of an election, the vote share; for a dead-heat race, p ≈ 0.5), and n is the sample size (total number of voters).

What does this mean? Since we are looking at a ballot count as a poll, we can use the margin of error to be the random variation we would get from multiple recounts. (The word random is important here. None of these formulas hold if the variation is due to malfeasance).

I won't try to explain where the standard error formula comes from, but I'll try to give some perspective. We can break it into two parts: the numerator and the denominator. The numerator p(1-p) has a maximum at p = 0.5 (since 0 < p < 1). This means that the further you get from 50%, the smaller the standard error will be; the standard error in a blow-out will therefore be smaller than that from a tie. Since the standard error is inversely proportional to the square root of the denominator, the standard error shrinks as n (the number of voters) grows. So the more voters you have, the smaller the error. One consequence of this is that you eventually reach a point where the standard error is small enough that increasing the sample size gains you very little. (Check out Zeno's excellent post on sample size.)
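Both behaviors are easy to see by just plugging numbers into the formula. Here's a quick sketch in Python (my own illustration, not anything from Conley's article):

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of a proportion p estimated from a sample of size n."""
    return sqrt(p * (1 - p) / n)

n = 1_000_000

# The numerator p*(1-p) peaks at p = 0.5, so a dead heat has the
# largest standard error; a blow-out has a smaller one.
print(standard_error(0.5, n))   # dead heat
print(standard_error(0.7, n))   # 70/30 blow-out: smaller

# Quadrupling the sample size only halves the standard error, so past
# a point more voters buy very little extra precision.
print(standard_error(0.5, n) / standard_error(0.5, 4 * n))  # ratio of 2
```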

Again, I'll leave it to the reader to look up how the confidence interval formula is derived--it's a bit beyond the scope of this post. What it means is that since the margin of error is the expected variation from sample to sample, we can express it as a multiple of standard errors around the result. The higher the confidence level, the more standard errors go into the margin of error. Another way of looking at it: if you want to be 99% confident that a recount will fall into a certain interval around your result, that interval will need to be wider than if you only wanted to be 68% confident. According to Wiki (again, I'll let you look up the derivation if you wish):

Plus or minus 1 standard error is a 68 % confidence interval, plus or minus 2 standard errors is approximately a 95 % confidence interval, and a 99 % confidence interval is 2.58 standard errors on either side of the estimate.
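Those conversions between standard errors and confidence levels come straight from the normal distribution, and you can check them yourself. Here's a sketch using Python's standard-library `statistics.NormalDist` (my choice of tool, nothing from the post):

```python
from statistics import NormalDist

z = NormalDist()  # the standard normal distribution

def coverage(k):
    """Probability that a normally distributed estimate lands within k standard errors."""
    return z.cdf(k) - z.cdf(-k)

print(round(coverage(1.0), 4))    # ~0.6827 -> the "68%" interval
print(round(coverage(2.0), 4))    # ~0.9545 -> "approximately 95%"
print(round(coverage(2.58), 4))   # ~0.9901 -> the "99%" interval
```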


Margin of error (99%) = 2.58 × \sqrt{\frac{0.5(1-0.5)}{n}} = \frac{1.29}{\sqrt{n}}
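As a sanity check on the algebra (plugging p = 0.5 into the formulas above; again, my own sketch, not code from the article):

```python
from math import sqrt

def margin_of_error_99(n):
    """99% margin of error for a dead-heat (p = 0.5) two-way race of n votes."""
    return 2.58 * sqrt(0.5 * (1 - 0.5) / n)

# The factor 2.58 * sqrt(0.25) collapses to 1.29, matching Conley's 1.29/sqrt(n).
n = 1_000_000
assert abs(margin_of_error_99(n) - 1.29 / sqrt(n)) < 1e-12
print(margin_of_error_99(n))  # 0.00129, i.e. about 0.13% for a million votes
```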

Which is the formula Dalton mentioned in his article. Anyway, I hope my condensed explanation at least helps a little to explain what those numbers mean.

Now, on to the Virginia race. The total votes cast: n = 2,338,111. (For simplicity, I'll be ignoring the Independent candidate Parker and rounding to p = 0.5, so as to use the above formula.) The margin of error is therefore 0.08%, which comes out to 1,972.5 votes. That means we can be 99% sure that a recount of Allen's votes would fall within +/- 1,972.5 votes of the original count. The actual vote difference between Allen and Webb was 7,231 votes--well outside the margin of error. 7,231 votes corresponds to 9.5 standard errors. Allen could have spent the rest of his life recounting the votes without expecting the results to change. He was absolutely right to concede.
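For anyone who wants to reproduce those figures, here's the whole computation in a few lines of Python (the vote totals are from the post; the code itself is just my illustration):

```python
from math import sqrt

n = 2_338_111    # total votes cast (ignoring Parker, per the simplification above)
margin = 7231    # the Allen-Webb vote difference

se = sqrt(0.5 * (1 - 0.5) / n)   # standard error, as a fraction of the vote
se_votes = se * n                # one standard error, in votes (~764.5)
moe_votes = 2.58 * se_votes      # 99% margin of error, in votes

print(round(2.58 * se * 100, 2))    # margin of error in percent, ~0.08
print(round(moe_votes, 1))          # ~1972.5 votes
print(round(margin / se_votes, 1))  # the lead measured in standard errors, ~9.5
```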
