Sunday, November 7, 2004

Election Analysis Errors

In an analysis by Mark Newman (currently down), he presents some
interesting pictures, including
some nice cartograms. His final graph presents a remarkable
observation: There are two Americas, but not the two we think. There
are a small but sizable number of counties (roughly 400) that went
extremely strongly for Kerry and the rest of the counties which were
less polarized to varying degrees. It's a striking result. It's not
true though.

Upon thinking about this histogram, I couldn't imagine how this could
happen. Was this some sort of odd gerrymandering? Almost no
statistics on this scale ever show the kind of dramatic "edge". It's
not impossible, just unlikely. I skimmed through the USA Today data
and couldn't find any of the 400 "super-Kerry" counties. So, since I
couldn't find the raw data anywhere, I did what Dr. Newman presumably
did: I collated the data from the USA
Today site. Now, I don't know what tools he used, but I wrote a
couple of Perl scripts to reformat the data for analysis. I ran the data
through my scripts and I got a very similar result to Dr. Newman's,
though not quite the same: 362 counties voted over 98% for Kerry.
Wow. So, I decided to look at some of them, since I hadn't been able
to find one before. One of the counties was that of "Cape, NJ". I
went to the USA Today site to check out this county, only to find it
doesn't exist. But there is a "Cape May, NJ". Aha, there are 362
counties in the US with spaces in the names! Further, the gap between
my 362 and his roughly 400 is probably due to another data analysis
bug, caused by the fact that the numbers have commas in them.
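
To make the two bugs concrete, here is a minimal sketch in Python (not the Perl I actually used, and certainly not Dr. Newman's code). Everything in it is an assumption for illustration: the one-line "name bush kerry" layout, the made-up vote counts, and the helper names lenient_int, naive_parse, careful_parse and kerry_share. The lenient conversion imitates tools that silently turn "May" into 0 and "26,000" into 26 instead of complaining.

```python
import re


def lenient_int(text):
    """Imitate a forgiving numeric conversion: leading digits only, else 0."""
    match = re.match(r"\d+", text)
    return int(match.group()) if match else 0


def naive_parse(line):
    """Buggy: assumes the county name is a single whitespace-free token."""
    fields = line.split()
    # For "Cape May 26,000 19,000" this reads "May" as the Bush count
    # and "26,000" as the Kerry count.
    return fields[0], lenient_int(fields[1]), lenient_int(fields[2])


def careful_parse(line):
    """Treat the last two tokens as vote counts and the rest as the name."""
    *name_parts, bush, kerry = line.split()
    return (
        " ".join(name_parts),
        int(bush.replace(",", "")),
        int(kerry.replace(",", "")),
    )


def kerry_share(bush, kerry):
    """Kerry's fraction of the two-party vote (0.0 if nobody voted)."""
    total = bush + kerry
    return kerry / total if total else 0.0


# Hypothetical record; the vote counts are invented for this example.
sample = "Cape May 26,000 19,000"

name, bush, kerry = naive_parse(sample)
naive_result = (name, bush, kerry, kerry_share(bush, kerry))

name, bush, kerry = careful_parse(sample)
careful_result = (name, bush, kerry, kerry_share(bush, kerry))
```

With the naive parser the record becomes a county called "Cape" with 0 votes for Bush and 26 for Kerry, a 100% "super-Kerry" county. The careful parser keeps the name "Cape May" and gives Kerry about 42% of the invented totals. A comma-only bug behaves differently: the name survives but every count above 999 is truncated, which could plausibly account for the handful of extra counties.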

In truth, there are no counties which voted more than 93% for Kerry,
never mind 400 which voted over 98% for Kerry, however I might wish
that to be true. There's only one voting region that went over 98%
for Bush: Glenwood Pit., ME. Maine reports by town, not by county.
There are two people there. They voted for Bush.

I'm sure this was an unintentional error (in fact, according to a post
on MeFi, he had removed the histogram before the site went down), but
I was surprised at the extent to which it wasn't questioned. Outrageous
results shouldn't be uniformly dismissed, because it's often the
outrageous that's important, but at the same time they should be
greeted with a greater degree of skepticism.
