Lies, Damned Lies, and Statistics (39): Availability Bias

This is actually only about one type of availability bias: if a certain percentage of your friends are computer programmers or have red hair, you may conclude that the same percentage of the total population are computer programmers or have red hair. You’re not working with a random and representative sample – perhaps you like computer programmers or you are attracted to people with red hair – so you make do with the sample that you have, the one that is immediately available, and you extrapolate on the basis of that.

Most of the time you’re wrong to do so – as in the examples above. In some cases, however, it may be a useful shortcut that allows you to avoid the hard work of establishing a random and representative sample and gathering information from it. If you use a sample that’s not strictly random but also not biased by your own strong preferences such as friendship or attraction, it may give reasonably adequate information on the total population. If you have a reasonably large number of friends and if you couldn’t care less about their hair color, then it may be OK to use your friends as a proxy for a random sample and extrapolate the rates of each hair color to the total population.
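
To make the point concrete, here is a minimal sketch in Python. All numbers are invented for illustration: a hypothetical population with a 2% red-hair rate, a genuinely random sample, and a “friends” sample in which redheads are assumed to be roughly three times more likely to end up among your friends.

```python
import random

# Minimal sketch: how a non-random "friends" sample can distort an
# extrapolation. All numbers are invented for illustration.
random.seed(0)

POPULATION_SIZE = 100_000
TRUE_RED_HAIR_RATE = 0.02  # hypothetical true rate in the population

# 1 = red hair, 0 = any other hair color
population = [1 if random.random() < TRUE_RED_HAIR_RATE else 0
              for _ in range(POPULATION_SIZE)]

# A random sample: every person has the same chance of being picked.
random_sample = random.sample(population, 200)

# A "friends" sample: assume redheads are about three times more likely
# to end up among your friends (0.9 vs 0.3 acceptance probability).
friends_sample = []
while len(friends_sample) < 200:
    person = random.choice(population)
    keep_probability = 0.9 if person == 1 else 0.3
    if random.random() < keep_probability:
        friends_sample.append(person)

print("true rate:              ", TRUE_RED_HAIR_RATE)
print("random-sample estimate: ", sum(random_sample) / len(random_sample))
print("friends-sample estimate:", sum(friends_sample) / len(friends_sample))
```

Under these made-up assumptions the friends-based estimate comes out around 6%, roughly three times the true rate – exactly the kind of over-extrapolation described above.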

The problem is the following: because the use of available samples is sometimes OK, we are perhaps fooled into thinking that they are OK even when they’re not. And then we come up with arguments like:

  • Smoking can’t be all that bad. I know a lot of smokers who have lived long and healthy lives.
  • It’s better to avoid groups of young black men at night, because I know a number of people who have been attacked by young black men (and I’ll forget that I’ll hardly ever hear of people not having been attacked).
  • Cats must have a special ability to fall from great heights and survive, because I’ve seen a lot of press reports about such events (and I forget that I’ll rarely read a report about a cat falling and dying).
  • Violent criminals should be locked up for life because I’m always reading newspaper articles about re-offenders (again, very unlikely that I’ll read anything about non-re-offenders).

As is clear from some of the examples above, availability bias can sometimes have consequences for human rights: it can foster racial bias, it can lead to “tough on crime” policies, etc.

More posts in this series are here.

Lies, Damned Lies, and Statistics (31): Common Problems in Opinion Polls

Opinion polls or surveys are very useful tools in human rights measurement. We can use them to measure public opinion on certain human rights violations, such as torture or gender discrimination. High levels of public approval of such rights violations may make them more common and more difficult to stop. And surveys can measure what governments don’t want to measure. Since we can’t trust oppressive governments to give accurate data on their own human rights record, surveys may fill in the blanks. Although even that won’t work if the government is so utterly totalitarian that it doesn’t allow private or international polling of its citizens, or if it has scared its citizens to such an extent that they won’t participate honestly in anonymous surveys.

But apart from physical access and respondent honesty in the most dictatorial regimes, polling in general is vulnerable to mistakes and fraud (fraud being a conscious mistake). Here’s an overview of the issues that can mess up public opinion surveys, inadvertently or not.

Wording effect

There’s the well-known problem of question wording, which I’ve discussed in detail before. Pollsters should avoid leading questions, questions that are put in such a way that they pressure people to give a certain answer, questions that are confusing or easily misinterpreted, wordy questions, questions using jargon, abbreviations or difficult terms, double or triple questions, etc. Also quite common are “silly questions”, questions that don’t have meaningful or clear answers: for example, “Is the Catholic Church a force for good in the world?” What on earth can you answer to that? It depends on which elements of the church you’re talking about, and which circumstances, country or even historical period you’re asking about. The answer is most likely “yes and no”, and hence useless.

The importance of wording is illustrated by the often substantial effects of small modifications in survey questions. Even the replacement of a single word by another, related word, can radically change survey results.

Of course, it’s often claimed that biased poll questions corrupt the average survey responses, but that the overall results of the survey can still be used to learn about time trends and differences between groups. As long as you make the same mistake consistently, you may still find something useful. That’s true, but it’s no reason not to take care with wording. The same trends and differences can be seen in survey results that have been produced with correctly worded questions.

Order effect or contamination effect

Answers to questions depend on the order in which they’re asked, and especially on the questions that preceded them. Here’s an example:

Fox News yesterday came out with a poll that suggested that just 33 percent of registered voters favor the Democrats’ health care reform package, versus 55 percent opposed. … The Fox News numbers on health care, however, have consistently been worse for Democrats than those shown by other pollsters. (source)

The problem is not the framing of the question. This was the question: “Based on what you know about the health care reform legislation being considered right now, do you favor or oppose the plan?” Nothing wrong with that.

So how can Fox News ask a seemingly unbiased question of a seemingly unbiased sample and come up with what seems to be a biased result? The answer may have to do with the questions Fox asks before the question on health care. … the health care questions weren’t asked separately. Instead, they were questions #27-35 of their larger, national poll. … And what were some of those questions? Here are a few: … Do you think President Obama apologizes too much to the rest of the world for past U.S. policies? Do you think the Obama administration is proposing more government spending than American taxpayers can afford, or not? Do you think the size of the national debt is so large it is hurting the future of the country? … These questions run the gamut from slightly leading to full-frontal Republican talking points. … A respondent who hears these questions, particularly the series of questions on the national debt, is going to be primed to react somewhat unfavorably to the mention of another big Democratic spending program like health care. And evidently, an unusually high number of them do. … when you ask biased questions first, they are infectious, potentially poisoning everything that comes below. (source)

If you want to avoid this mistake – if we can call it that (since in this case it’s quite likely to have been a “conscious mistake” aka fraud) – randomizing the question order for each respondent might help.
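
As a rough illustration, here is a minimal sketch of what per-respondent randomization could look like in a scripted survey. The question texts are placeholders, not the actual wording of the Fox News poll or any other real poll.

```python
import random

# Minimal sketch of per-respondent question randomization. The question
# texts are placeholders, not the actual wording of any real poll.
QUESTIONS = [
    "Do you approve of the government's current spending levels?",
    "Do you think the national debt hurts the country's future?",
    "Do you favor or oppose the health care reform plan?",
]

def questionnaire_for_one_respondent():
    """Return the questions in a fresh random order, so that no single
    question is always preceded by the same potentially 'priming' items."""
    order = QUESTIONS[:]       # copy; leave the master list untouched
    random.shuffle(order)
    return order

for respondent in range(3):
    print(respondent, questionnaire_for_one_respondent())
```

Randomizing doesn’t remove the priming effect for any individual respondent, but it spreads it evenly across questions, so no single item systematically benefits or suffers from what came before it.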

Similar to the order effect is the effect created by follow-up questions. It’s well-known that follow-up questions of the type “but what if…” or “would you change your mind if …” change the answers to the initial questions.

Bradley effect

The Bradley effect is a theory proposed to explain observed discrepancies between voter opinion polls and election outcomes in some U.S. government elections where a white candidate and a non-white candidate run against each other.

Contrary to the wording and order effects, this isn’t an effect created – intentionally or not – by the pollster, but by the respondents. The theory proposes that some voters tend to tell pollsters that they are undecided or likely to vote for a black candidate, and yet, on election day, vote for the white opponent. It was named after Los Angeles Mayor Tom Bradley, an African-American who lost the 1982 California governor’s race despite being ahead in voter polls going into the elections.

The probable cause of this effect is the phenomenon of social desirability bias. Some white respondents may give a certain answer for fear that, by stating their true preference, they will open themselves to criticism of racial motivation. They may feel under pressure to provide a politically correct answer. The existence of the effect is, however, disputed. (Some say the election of Obama disproves the effect, thereby making another statistical mistake).

Fatigue effect

Another effect created by the respondents rather than the pollsters is the fatigue effect. As respondents grow increasingly tired over the course of long interviews, the accuracy of their responses could decrease. They may be able to find shortcuts to shorten the interview; they may figure out a pattern (for example that only positive or only negative answers trigger follow-up questions). Or they may just give up halfway, causing incompletion bias.

However, this effect isn’t entirely due to respondents. Survey design can be at fault as well: there may be repetitive questioning (sometimes deliberately for control purposes), the survey may be too long or longer than initially promised, or the pollster may want to make his life easier and group different polls into one (which is what seems to have happened in the Fox poll mentioned above, creating an order effect – but that’s the charitable view of course). Fatigue effect may also be caused by a pollster interviewing people who don’t care much about the topic.

Sampling effect

Ideally, the sample of people who are to be interviewed for a survey should be a fully random subset of the entire population. That means that every person in the population should have an equal chance of being included in the sample, and that there shouldn’t be self-selection (a typical flaw in many if not all internet surveys of the “Polldaddy” variety) or self-deselection. Self-selection reduces the randomness of the sample, which can be seen from the fact that it leads to polarized results. The size of the sample is also important: samples that are too small typically produce unreliable results.
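
To see why self-selection polarizes results, here is a minimal sketch with invented numbers: opinions run from −2 (strongly oppose) to +2 (strongly favor), and people with strong opinions are assumed to be far more likely to bother responding to an online poll.

```python
import random

# Minimal sketch of self-selection in an internet poll. Opinions run from
# -2 (strongly oppose) to +2 (strongly favor); all numbers are invented.
random.seed(1)

population = [random.choice([-2, -1, 0, 1, 2]) for _ in range(100_000)]

# Assumption: the stronger the opinion, the more likely someone is to
# bother responding to a "Polldaddy"-style online poll.
RESPONSE_PROBABILITY = {0: 0.02, 1: 0.10, 2: 0.40}

self_selected = [o for o in population
                 if random.random() < RESPONSE_PROBABILITY[abs(o)]]
random_sample = random.sample(population, len(self_selected))

def share_of_extreme_opinions(sample):
    return sum(1 for o in sample if abs(o) == 2) / len(sample)

print("extreme opinions in population:   ", share_of_extreme_opinions(population))
print("extreme opinions in random sample:", share_of_extreme_opinions(random_sample))
print("extreme opinions in self-selected:", share_of_extreme_opinions(self_selected))
```

With these assumptions, extreme opinions make up about 40% of the population but roughly three quarters of the self-selected respondents: the poll looks far more polarized than the public it claims to describe.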

Even the determination of the total population from which the sample is taken can lead to biased results. And yes, that has to be determined… For example, do we include inmates, illegal immigrants etc. in the population? See here for some examples of the consequences of such choices.

House effect

A house effect occurs when a particular pollster’s surveys systematically tend to lean toward one party’s candidates; Rasmussen is known for that.

I probably forgot an effect or two. Fill in the blanks if you care. Go here for other posts in this series.

Lies, Damned Lies, and Statistics (28): Push Polls

Push polls are used in election campaigns, not to gather information about public opinion, but to modify public opinion in favor of a certain candidate, or – more commonly – against a certain candidate. They are called “push” polls because they intend to “push” the people polled towards a certain point of view.

Push polls are not cases of “lying with statistics” as we usually understand them, but it’s appropriate to talk about them since they are very similar to a “lying technique” that I discussed many times, namely leading questions (see here for example). The difference here is that leading questions aren’t used to manipulate poll results, but to manipulate people.

The push poll isn’t really a poll at all, since the purpose isn’t information gathering, which is why many people don’t like the term and label it oxymoronic. A better term would indeed be “advocacy telephone campaign”. A push poll is more like a gossip campaign, a propaganda effort or telemarketing. Push polls are very similar to political attack ads, in the sense that they intend to smear candidates, often with little basis in fact. Compared to political ads, push polls have the “advantage” that they don’t seem to emanate from the campaign offices of one of the candidates (push polls are typically conducted by bogus polling agencies). Hence it’s more difficult for the recipients of a push poll to classify the “information” it contains as political propaganda, and they are therefore more likely to believe it – which is of course the reason push polls are used. Also, the fact that push polls are presented as “polls” rather than campaign messages makes it more likely that people listen, and as they listen more, they internalize the messages better than in the case of outright campaigning (which they often dismiss as propaganda).

Push polls usually, but not necessarily, contain lies or false rumors. They may also be limited to misleading or leading questions. For example, a push poll may ask people: “Do you think that the widespread and persistent rumors about Obama’s Muslim faith, based on his own statements, connections and acquaintances, are true?” Some push polls may even contain some true but unpleasant facts about a candidate, and then hammer on these facts in order to change the opinions of the people being “polled”.

Infamous examples include the push poll used by Bush against McCain in the Republican primaries of 2000 (insinuating that McCain had an illegitimate black child), and the one used by McCain (a fast learner!) against Obama in 2008 (alleging that Obama had ties with the PLO).

One way to distinguish legitimate polls from push polls is the sample size. The former are usually content with relatively small sample sizes (but not too small), whereas the latter typically want to “reach” as many people as possible. Push polls won’t include demographic questions about the people being polled (gender, age, etc.) since there is no intention to aggregate results, let alone aggregate by type of respondent. Another way to identify push polls is the selection of the target population: normal polls try to reach a random subset of the population; push polls are often targeted at certain types of voters, namely those likely to be swayed by negative campaigning about a certain candidate. Push polls also tend to be quite short compared to regular polls, since the purpose is to reach a maximum number of people.

Lies, Damned Lies, and Statistics (9): Too Small Sample Sizes in Surveys

So many things can go wrong in the design and execution of opinion surveys. And opinion surveys are a common tool in data gathering in the field of human rights.

As it’s often impossible (and undesirable) to question a whole population, statisticians usually select a sample from the population and ask their questions only to the people in this sample. They assume that the answers given by the people in the sample are representative of the opinions of the entire population. But that’s only the case if the sample is a fully random subset of the population – that means that every person in the population should have an equal chance of being chosen – and if the sample hasn’t been distorted by other factors such as self-selection by respondents (a common thing in internet polls) or personal bias by the statistician who selects the sample.

A sample that is too small is also not representative of the entire population. For example, if we ask 100 people whether they approve or disapprove of discrimination against homosexuals, and 55 of them say they approve, we might assume that about 55% of the entire population approves. Now it could possibly be that only 45% of the total population approve, but that we just happened, by chance, to interview an unusually large percentage of people who approve. For example, this may have happened because, by chance and without being aware of it, we selected the people in our sample in such a way that there are more religious conservatives in our sample than there are in society, relatively speaking.

This is the problem of sample size: the smaller the sample, the greater the influence of luck on the results we get. Asking the opinion of 100 people, and taking this as representative of millions of citizens, is like throwing a coin 10 times and assuming – after getting 3 heads and 7 tails – that the probability of throwing heads is 30%. We all know that it’s not 30 but 50%. And we know this because we know that when we increase the “sample size” – i.e. when we throw more than 10 times, say a thousand times – we will get heads and tails approximately half of the time each. Likewise, in our example of the survey on homosexuality: increasing the sample size reduces the chance that religious conservatives (or other groups) are disproportionately represented in the sample.
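
The coin-flip analogy is easy to check with a short simulation – a sketch, not a rigorous treatment. For each “sample size” it computes five independent estimates of the probability of heads; the estimates from small samples visibly jump around, while those from large samples settle near 50%.

```python
import random

# Minimal sketch of the coin-flip analogy: the smaller the "sample",
# the more the estimated probability of heads jumps around.
random.seed(2)

def estimated_heads_rate(n_flips):
    return sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips

for n_flips in (10, 100, 1_000, 100_000):
    estimates = [round(estimated_heads_rate(n_flips), 3) for _ in range(5)]
    print(f"{n_flips:>6} flips:", estimates)
```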

When analyzing survey results, the first thing to look at is the sample size, as well as the level of confidence (usually 95%) that the results are within a certain margin of error (usually ± 5%). A high level of confidence that the results are correct within a small margin of error indicates that the sample was sufficiently large; whether it was also truly random has to be judged separately.
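
For a simple random sample, the usual back-of-the-envelope link between sample size and margin of error can be made explicit. The sketch below uses the standard normal-approximation formula for a proportion (z ≈ 1.96 for roughly 95% confidence); it shows that a sample of 100 gives roughly a ±10% margin, and that about 385 respondents are needed for ±5%.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from a
    simple random sample, at the confidence level implied by z
    (z = 1.96 corresponds to roughly 95% confidence)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

def required_sample_size(margin, proportion=0.5, z=1.96):
    """Smallest sample size whose margin of error is at most `margin`."""
    return math.ceil((z / margin) ** 2 * proportion * (1 - proportion))

print(round(margin_of_error(100), 3))   # ~0.098, i.e. roughly +/- 10%
print(required_sample_size(0.05))       # 385 respondents for +/- 5% at 95%
```

Using proportion = 0.5 is the conservative choice: it maximizes the margin of error, so the resulting sample size is safe whatever the true approval rate turns out to be.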