Unlocking Those “Pesky” Polls

I make my living as a certified Six Sigma Black Belt, so statistics are an integral part of my life. Knowing what I know, I am dumbfounded that poll results are as reliable as they appear to be – at least up to this point.

They are more accurate the closer they are taken to an election, which is logical, but this could also reflect an abandonment of political bias by pollsters at the last moment! In any event, by most ways of thinking, polls should have gone by the wayside by now (as obsolete as 8-tracks) at least in their current telephone format. Here are some of the things that make polls problematic:

Non-sampling Errors: These are errors that are not associated with the methods used to obtain a representative sample of the population. Some of the more common and egregious errors:

1) Using voting-age adults or registered voters rather than likely voters (LV). Using anything other than likely voters is highly unreliable. The next question is how you know they are likely to vote. Using historical voting data is one thing, but depending on the respondent to tell you is asking to be lied to!

2) *Introducing bias due to low response rate.* This refers to selected survey participants who are not available or refuse to answer. Having a high percentage of these in your sample can skew the results. It has become much more common of late (because of the perceived bias in the media) for respondents to refuse to participate or even to lie outright. Also, a growing share of the population screens calls with answering machines or relies on cell phones (which are largely exempt from solicitation) rather than landlines.

Such instances will be magnified if there is a direct correlation with a voting segment. For example, say most Republicans refuse to cooperate in a survey–the results would be compromised. Therefore it is important to know the non-response rate for a poll and what was done about it. Be wary when it is not reported in survey results. Typically, it is not given in national polling media stories. If the non-response rate is more than 15%, here is what should be done:
• A non-response bias analysis is performed according to the National Center for Education Statistics, Standard 4-4.
• This usually involves a follow-up survey to a random sample of the non-respondents using incentives to respond.
• The response rate to this follow-up survey must be 70%.

I would be very surprised if any national polling organization followed this methodology given their short turnaround time.
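To see how lopsided non-response can distort a result, here is a minimal sketch. The function name and all of the numbers are hypothetical, chosen only for illustration: it assumes a two-candidate race in which each candidate's supporters answer the phone at a different rate.

```python
def observed_share(true_share_a, response_rate_a, response_rate_b):
    """Share of respondents backing candidate A when supporters of A and B
    respond to the survey at different rates (two-candidate race assumed)."""
    responders_a = true_share_a * response_rate_a
    responders_b = (1 - true_share_a) * response_rate_b
    return responders_a / (responders_a + responders_b)


# Hypothetical example: the electorate is split 50/50, but A's supporters
# answer 10% of calls while B's supporters answer only 8%.
share_a = observed_share(0.50, 0.10, 0.08)
print(f"Candidate A appears to have {share_a * 100:.1f}% support")
```

In this made-up case a tied race shows up as a lead of several points for candidate A, and no increase in sample size corrects it, because the distortion comes from who answers rather than from how many are asked.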

3) *Using Stratified Sampling.* This is sampling done in an attempt to improve the trustworthiness of the poll by identifying homogeneous groups. This is done routinely by polling groups to segment Republicans and Democrats. The issue arises when the percentage breakdown of these groups does not reflect the breakdown within the population. For instance, some pollsters use an inflated number of Democrats over Republicans in their surveys this go-around because it is expected that Democratic turnout will be in excess of that in 2004. This assumption (as well as the percentage pulled out of the air) is completely subjective and can dangerously bias your results, but it is still done by some pollsters.
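A rough sketch of why the assumed party breakdown matters so much. Everything here is hypothetical: the within-party support figures, the two party mixes, and the function name are illustrative assumptions, not data from any actual poll.

```python
def weighted_topline(support_by_party, party_mix):
    """Overall support for a candidate, given support within each party
    and the assumed share of the electorate each party makes up."""
    assert abs(sum(party_mix.values()) - 1.0) < 1e-9, "party mix must sum to 1"
    return sum(support_by_party[party] * weight
               for party, weight in party_mix.items())


# Hypothetical support for the Democratic candidate within each group
support = {"Dem": 0.90, "Rep": 0.08, "Ind": 0.48}

# Two assumed electorates: an even split, and one tilted toward Democrats
even_mix = {"Dem": 0.37, "Rep": 0.37, "Ind": 0.26}
dem_heavy_mix = {"Dem": 0.40, "Rep": 0.34, "Ind": 0.26}

for label, mix in (("even", even_mix), ("Dem-heavy", dem_heavy_mix)):
    print(f"{label} mix: {weighted_topline(support, mix) * 100:.1f}%")
```

With identical survey answers, moving three points of the assumed electorate from one party to the other shifts the headline number by more than two points, which is why the turnout assumption deserves as much scrutiny as the responses themselves.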

4) Applying the margin of error to the percentage difference between the candidates. The reported margin of error applies to each candidate’s share, not to the gap between them. For example, if two candidates had survey results of 50% to 46% with a 3% margin of error, the closest the race could be would normally be 47-49, and the widest would be 53-43. That’s a gap anywhere from -2 to 10, versus the 1 to 7 intuitively assumed (do the math). If, as most do, you apply the margin directly to the gap, then the 3% should be roughly doubled, to about 6%, for an accurate reading.
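The margin of error on the gap can also be worked out directly. The sketch below uses the standard formula for the difference between two shares measured in the same sample; the sample size of 1,067 is an assumption (it is the size that gives roughly a 3-point margin on a single share), and the function name is illustrative.

```python
import math


def lead_margin_of_error(p1, p2, n, z=1.96):
    """Margin of error for the lead (p1 - p2) when both shares come from the
    same simple random sample of size n; z=1.96 is 95% confidence."""
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)


n = 1067  # assumed sample size, about a 3-point margin on each share
lead = 0.50 - 0.46
moe = lead_margin_of_error(0.50, 0.46, n)

print(f"lead: {lead * 100:.0f} points, plus or minus {moe * 100:.1f}")
print(f"plausible range: {(lead - moe) * 100:.1f} to {(lead + moe) * 100:.1f}")
```

Under these assumptions the margin on the lead comes out at just under 6 points, close to twice the headline figure, which matches the -2 to 10 range worked out by hand above.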

Sampling Error: This is the difference between sample results and the actual makeup of the target population. Errors in this group that arise by chance can be quantified with a confidence level and a margin of error. A 95% confidence level is the industry standard, and the matching margin of error is calculated by statistical formulae. Margin of error can be reduced by increasing sample size – up to a point; then, diminishing returns set in and produce only marginal improvement.

For example, about 1,000 survey participants will yield a margin of error of roughly 3% in most cases. Population size does not factor in that much, but the sample has to be more than doubled, to roughly 2,400, to bring the margin of error down to 2%. The object is to be able to say you have 95% confidence that your survey results are accurate within plus or minus 3 percentage points. This is the accepted industry standard.
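The relationship between sample size and margin of error can be checked with the usual textbook formula for a single proportion. This is a minimal sketch that assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst case of a 50/50 split; the function names are illustrative.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level


def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for one proportion from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


def sample_size_for(moe, p=0.5, z=Z_95):
    """Smallest sample size whose margin of error does not exceed moe."""
    return math.ceil(z ** 2 * p * (1 - p) / moe ** 2)


for n in (800, 1000, 2000):
    print(f"n={n}: plus or minus {margin_of_error(n) * 100:.1f} points")

print("respondents needed for a 3-point margin:", sample_size_for(0.03))
```

Because the margin shrinks with the square root of the sample size, cutting it from 3 points to 2 takes more than twice as many respondents, which is the diminishing return described above.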

On the other hand, sampling errors associated with the methods used to obtain a representative sample of the population are usually due to poorly designed surveys:

1) *The survey was not random.* This simply means that not every member of the population had an equal chance of being chosen. If your sample is not random, then your results will not mirror the population results you are after.

2) *The poll did not represent the whole population.* For example, a telephone poll that does not use random dialing will leave out unlisted phone numbers.

Given all of the above, it is no surprise that major polling organizations within the past week have differed from one another in the Presidential race by a spread of over 10 points! How some have managed to come close to the mark in recent primaries is one of those mysteries of science.

However, with the overabundance of factors entering into this election, including the aforementioned, I will be surprised if the final election totals are anywhere near what the pollsters project. If they are, I will suspect divination!