Lies, Damned Lies, and Statistics (45): Anonymity in Surveys Changes Survey Results

Whether interviewees are given anonymity or not makes a big difference in survey results:

When people are assured of anonymity, it turns out, a lot more of them will acknowledge that they have had same-sex experiences and that they don’t entirely identify as heterosexual. But it also turns out that when people are assured of anonymity, they will show significantly higher rates of anti-gay sentiment. These results suggest that recent surveys have been understating, at least to some degree, two different things: the current level of same-sex activity and the current level of opposition to gay rights. (source)

Anonymity can result in data that are closer to the truth, so it’s tempting to require it, especially in the case of surveys that may suffer from social desirability bias (surveys asking about opinions that people are reluctant to divulge because these opinions are socially unacceptable – the Bradley effect is one example). However, anonymity can also create problems. For example, it may make it difficult to avoid questioning the same people more than once.

Go here for other posts in this series.

Lies, Damned Lies, and Statistics (40): The Composition Effect

Take the evolution of the median wage in the US over the last decades. The trend is nearly flat and one would therefore naturally assume that there have been hardly any income gains for the average US citizen. However, some have argued that this conclusion is wrong because it ignores the composition effect. In this example, the composition of the labor force has obviously changed over the last decades, and has changed dramatically. More women and immigrants have entered the workforce and those tend to be lower income groups, especially at the moment of entry. When they enter the labor force, their incomes go up, obviously, but they bring the average and the median down. When, at the same time, the wages of white men go up, the aggregate effect may be close to zero. And yet, paradoxically, all groups have progressed. The conclusion that the average citizen did not progress would only hold if the composition of the population whose wages are compared over time had not changed.
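
To make the mechanism concrete, here is a toy calculation in Python. The numbers are invented purely for illustration (they are not actual US wage data): every group's wage rises by 20%, yet the overall median falls, simply because the lower-paid group of new entrants has grown.

```python
from statistics import median

# Hypothetical workforce at time 1: mostly incumbents on higher wages (thousands of dollars).
wages_t1 = [40] * 80 + [20] * 20           # 80 incumbents at 40k, 20 new entrants at 20k
# Hypothetical workforce at time 2: both groups gained 20%,
# but the lower-paid group of entrants has grown a lot.
wages_t2 = [48] * 80 + [24] * 120

print(median(wages_t1))   # 40.0
print(median(wages_t2))   # 24.0 -> the overall median falls even though both groups gained 20%
```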

Now, it seems to be the case that in this particular example there is really no large composition effect (see here). However, this effect is always a possibility and one should at least consider it and possibly rule it out before drawing hasty conclusions from historical time series. If you don’t do this, or don’t even try, then you may be “lying with statistics”.

More posts in this series are here.

Lies, Damned Lies, and Statistics (39): Availability Bias

This is actually only about one type of availability bias: if a certain percentage of your friends are computer programmers or have red hair, you may conclude that the same percentage of a total population are computer programmers or have red hair. You’re not working with a random and representative sample – perhaps you like computer programmers or you are attracted to people with red hair – so you make do with the sample that you have, the one that is immediately available, and you extrapolate on the basis of that.

Most of the time you’re wrong to do so – as in the examples above. In some cases, however, it may be a useful shortcut that allows you to avoid the hard work of establishing a random and representative sample and gathering information from it. If you use a sample that’s not strictly random but also not biased by your own strong preferences such as friendship or attraction, it may give reasonably adequate information on the total population. If you have a reasonably large number of friends and if you couldn’t care less about their hair color, then it may be OK to use your friends as a proxy of a random sample and extrapolate the rates of each hair color to the total population.
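
A rough simulation of that point, with made-up numbers: if the trait you care about (hair color) has nothing to do with how you picked your friends, a friends-sample gives a passable estimate; if it does (you befriend programmers), the estimate is badly off.

```python
import random

random.seed(1)

# Hypothetical population: 2% are programmers, 2% have red hair, independently.
population = [{"programmer": random.random() < 0.02,
               "red_hair": random.random() < 0.02} for _ in range(100_000)]

# A "friends" sample biased towards programmers: programmers are 20x more
# likely than others to end up among your friends.
friends = [p for p in population
           if random.random() < (0.2 if p["programmer"] else 0.01)]

def share(people, key):
    return sum(p[key] for p in people) / len(people)

print(round(share(friends, "programmer"), 3))  # far above the true 2%
print(round(share(friends, "red_hair"), 3))    # close to the true 2%, since friendship ignored hair
```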

The problem is the following: because the use of available samples is sometimes OK, we are perhaps fooled into thinking that they are OK even when they’re not. And then we come up with arguments like:

  • Smoking can’t be all that bad. I know a lot of smokers who have lived long and healthy lives.
  • It’s better to avoid groups of young black men at night, because I know a number of people who have been attacked by young black men (and I’ll forget that I’ll hardly ever hear of people not having been attacked).
  • Cats must have a special ability to fall from great heights and survive, because I’ve seen a lot of press reports about such events (and I forget that I’ll rarely read a report about a cat falling and dying).
  • Violent criminals should be locked up for life because I’m always reading newspaper articles about re-offenders (again, very unlikely that I’ll read anything about non-re-offenders).

As is clear from some of the examples above, availability bias can sometimes have consequences for human rights: it can foster racial bias, it can lead to “tough on crime” policies, etc.

More posts in this series are here.

Lies, Damned Lies, and Statistics (38): The Base-Rate Fallacy

When judging whether people engage in discrimination it’s important to make the right comparisons. Take the example of an American company X where 98 percent of employees are white and only 2 percent are black. If you compare to (“if your base is”) the entire US population – of which about 13 percent are African American – then you’ll conclude that company X is motivated by racism in its employment decisions.

However, in cases such as these, it's probably better to use another base rate, namely the pool of applicants rather than the total population. If only 0.1 percent of job applications were from blacks, then a workforce that is 2 percent black actually suggests that company X has favored black applicants.
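
A back-of-the-envelope calculation makes the point. The absolute numbers below are assumed for illustration (say 10,000 applications and a workforce of 100); only the 0.1% and 2% figures come from the example:

```python
# Hypothetical numbers, purely for illustration.
applicants_black, applicants_other = 10, 9_990   # 0.1% of 10,000 applications
hired_black, hired_other = 2, 98                 # workforce of 100: 2% black

print(hired_black / applicants_black)    # 0.2    -> 20% of black applicants were hired
print(hired_other / applicants_other)    # ~0.0098 -> about 1% of other applicants were hired
# Judged against the applicant pool (the right base rate), black applicants
# were hired at a much higher rate, even though they are only 2% of staff.
```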

The accusation of racism betrays a failure to point to the real causes of discrimination. It’s a failure to go back far enough and to think hard enough. The fact that only 0.1 percent of applicants were black – instead of the expected 13 percent – may still be due to racism, but not racism in company X. Blacks may suffer from low quality education, which results in a skill deficit among blacks, which in turn leads to a low application rate for certain jobs.

The opposite error is also quite common: people point to the number of blacks in prison, compare this to the total number of blacks, and conclude that blacks must be more attracted to crime. However, they should probably compare incarceration rates to arrest rates (blacks are arrested at higher rates because of racial profiling). And they should take into account jury bias as well.

More about racism. More posts in this series.

Lies, Damned Lies, and Statistics (37): When Surveyed, People Express Opinions They Don’t Hold

It’s been a while since the last post in this series, so here’s a recap of its purpose. This blog promotes the quantitative approach to human rights: we need to complement the traditional approaches – anecdotal, journalistic, legal, judicial etc. – with one that focuses on data, country rankings, international comparisons, catastrophe measurement, indexes etc.

Because this statistical approach is important, it’s also important to engage with measurement problems, and there are quite a few in the case of human rights. After all, you can’t measure respect for human rights like you can measure the weight or size of an object. There are huge obstacles to overcome in human rights measurement. On top of the measurement difficulties that are specific to the area of human rights, this area suffers from some of the general problems in statistics. Hence, there’s a blog series here about problems and abuse in statistics in general.

Take for example polling or surveying. A lot, but not all, information on human rights violations comes from surveys and opinion polls, and it’s therefore of the utmost importance to describe what can go wrong when designing, implementing and using surveys and polls. (Previous posts about problems in polling and surveying are here, here, here, here and here).

One interesting problem is the following:

Simply because the surveyor is asking the question, respondents believe that they should have an opinion about it. For example, researchers have shown that large minorities would respond to questions about obscure or even fictitious issues, such as providing opinions on countries that don’t exist. (source, source)

Of course, when people express opinions they don't have, we risk drawing the wrong conclusions from surveys. We also risk that a future survey asking the same questions comes up with totally different results. Confusion guaranteed. After all, if we make up our opinions on the spot when someone asks us, and those aren't really our opinions but rather unreflective reactions we give out of a sense of obligation, it's unlikely that we will express the same opinion in the future.

Another reason for this effect is probably our reluctance to come across as ignorant: rather than selecting the “I don’t know/no opinion” answer, we just pick one of the other possible answers. Again a cause of distortions.

Lies, Damned Lies, and Statistics (33): The Omitted Variable Bias, Ctd.

I discussed the so-called Omitted Variable Bias before on this blog (here and here). So I suppose I can mention this other example: guess what the correlation is, at the country level, between per capita smoking rates and life expectancy? High smoking rates equal low life expectancy, right? And vice versa?

Actually, and surprisingly, the correlation goes the other way: the higher the smoking rate – the more people smoke in a certain country – the longer the citizens of that country live, on average.

Why is that the case? Smoking is unhealthy and should therefore make life shorter, on average. However, people in rich countries smoke more; in poor countries they can't afford it. And people in rich countries live longer. But they obviously don't live longer because they smoke more; they live longer because they have the good luck to live in a rich country, which tends to be a country with better healthcare and the like. If they smoked less, they would live even longer.

Why is this important? Not because I’m particularly interested in smoking rates. It’s important because it shows how easily we are fooled by simple correlations, how we imagine what correlations should be like, and how we can’t see beyond the two elements of a correlation when we’re confronted with one that goes against our intuitions. We usually assume that, in a correlation, one element should cause the other. And apart from the common mistake of switching the direction of the causation, we often forget that there can be a third element causing the two elements in the correlation (in this example, the prosperity of a country causing both high smoking rates and high life expectancy), rather than one element in the correlation causing the other.
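
A small simulation, with invented numbers, shows how a third variable can flip the sign of a correlation: within each group of countries more smoking slightly lowers life expectancy, yet across all countries the correlation between smoking and life expectancy comes out positive, because rich countries both smoke more and live longer.

```python
import random

random.seed(0)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

smoking, life_exp = [], []
for _ in range(200):
    rich = random.random() < 0.5
    # Invented relationships: rich countries smoke more AND live longer,
    # while within any country more smoking slightly lowers life expectancy.
    s = random.gauss(20 if rich else 8, 3)
    e = (78 if rich else 60) - 0.3 * s + random.gauss(0, 2)
    smoking.append(s)
    life_exp.append(e)

print(round(pearson(smoking, life_exp), 2))   # positive, despite smoking's negative direct effect
```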

More posts in this series are here.

Lies, Damned Lies, and Statistics (32): The Questioner Matters

I've discussed the role of framing before: the way in which you ask questions in surveys influences the answers you get and therefore modifies the survey results. (See here and here for instance). It happens quite often that polling organizations or media inadvertently or even deliberately frame questions in a way that will lead people to answer in a particular fashion. In fact, you can frame questions in such a way that you get almost any answer you want.

However, the questioner may matter just as much as the question.

Consider this fascinating new study, based on surveys in Morocco, which found that the gender of the interviewer and how that interviewer was dressed had a big impact on how respondents answered questions about their views on social policy. …

[T]his paper asks whether and how two observable interviewer characteristics, gender and gendered religious dress (hijab), affect survey responses to gender and non-gender-related questions. [T]he study finds strong evidence of interviewer response effects for both gender-related items, as well as those related to support for democracy and personal religiosity … Interviewer gender and dress affected responses to survey questions pertaining to gender, including support for women in politics and the role of Shari’a in family law, and the effects sometimes depended on the gender of the respondent. For support for gender equality in the public sphere, both male and female respondents reported less progressive attitudes to female interviewers wearing hijab than to other interviewer groups. For support for international standards of gender equality in family law, male respondents reported more liberal views to female interviewers who do not wear hijab, while female respondents reported more liberal views to female interviewers, irrespective of dress. (source, source)

Other data indicate that the effect occurs in the U.S. as well. This is potentially a bigger problem than the framing effect since questions are usually public and can be verified by users of the survey results, whereas the nature of the questioner is not known to the users.

There’s an overview of some other effects here. More on the headscarf is here. More posts in this series are here.

Lies, Damned Lies, and Statistics (31): Common Problems in Opinion Polls

Opinion polls or surveys are very useful tools in human rights measurement. We can use them to measure public opinion on certain human rights violations, such as torture or gender discrimination. High levels of public approval of such rights violations may make them more common and more difficult to stop. And surveys can measure what governments don’t want to measure. Since we can’t trust oppressive governments to give accurate data on their own human rights record, surveys may fill in the blanks. Although even that won’t work if the government is so utterly totalitarian that it doesn’t allow private or international polling of its citizens, or if it has scared its citizens to such an extent that they won’t participate honestly in anonymous surveys.

But apart from physical access and respondent honesty in the most dictatorial regimes, polling in general is vulnerable to mistakes and fraud (fraud being a conscious mistake). Here’s an overview of the issues that can mess up public opinion surveys, inadvertently or not.

Wording effect

There’s the well-known problem of question wording, which I’ve discussed in detail before. Pollsters should avoid leading questions, questions that are put in such a way that they pressure people to give a certain answer, questions that are confusing or easily misinterpreted, wordy questions, questions using jargon, abbreviations or difficult terms, double or triple questions etc. Also quite common are “silly questions”, questions that don’t have meaningful or clear answers: for example “is the catholic church a force for good in the world?” What on earth can you answer to that? Depends on what elements of the church you’re talking about, what circumstances, country or even historical period you’re asking about. The answer is most likely “yes and no”, and hence useless.

The importance of wording is illustrated by the often substantial effects of small modifications in survey questions. Even the replacement of a single word by another, related word, can radically change survey results.

Of course, it is often claimed that biased poll questions merely corrupt the average survey responses, and that the overall results can still be used to learn about time trends and differences between groups. As long as you make the same mistake consistently, you may still find something useful. That's true, but it's no reason not to take care with wording. The same trends and differences can be seen in survey results that have been produced with correctly worded questions.

Order effect or contamination effect

Answers to questions depend on the order they’re asked in, and especially on the questions that preceded. Here’s an example:

Fox News yesterday came out with a poll that suggested that just 33 percent of registered voters favor the Democrats’ health care reform package, versus 55 percent opposed. … The Fox News numbers on health care, however, have consistently been worse for Democrats than those shown by other pollsters. (source)

The problem is not the framing of the question. This was the question: “Based on what you know about the health care reform legislation being considered right now, do you favor or oppose the plan?” Nothing wrong with that.

So how can Fox News ask a seemingly unbiased question of a seemingly unbiased sample and come up with what seems to be a biased result? The answer may have to do with the questions Fox asks before the question on health care. … the health care questions weren’t asked separately. Instead, they were questions #27-35 of their larger, national poll. … And what were some of those questions? Here are a few: … Do you think President Obama apologizes too much to the rest of the world for past U.S. policies? Do you think the Obama administration is proposing more government spending than American taxpayers can afford, or not? Do you think the size of the national debt is so large it is hurting the future of the country? … These questions run the gamut from slightly leading to full-frontal Republican talking points. … A respondent who hears these questions, particularly the series of questions on the national debt, is going to be primed to react somewhat unfavorably to the mention of another big Democratic spending program like health care. And evidently, an unusually high number of them do. … when you ask biased questions first, they are infectious, potentially poisoning everything that comes below. (source)

If you want to avoid this mistake – if we can call it that (since in this case it’s quite likely to have been a “conscious mistake” aka fraud) – randomizing the question order for each respondent might help.
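
Here is a minimal sketch of that fix, assuming the questionnaire is stored as a simple list of question texts (the questions below are invented): each respondent gets an independently shuffled order, so no single question is systematically primed by the same predecessors.

```python
import random

questionnaire = [
    "Do you favor or oppose the health care reform plan?",
    "Do you think the national debt is hurting the future of the country?",
    "Do you approve of the president's foreign policy?",
    # ... more questions
]

def order_for(respondent_id):
    """Return the questionnaire in an independently shuffled order per respondent."""
    rng = random.Random(respondent_id)   # seeded per respondent, so the order is reproducible
    questions = questionnaire.copy()
    rng.shuffle(questions)
    return questions

print(order_for(1))
print(order_for(2))   # a different order, so priming effects average out across respondents
```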

Similar to the order effect is the effect created by follow-up questions. It’s well-known that follow-up questions of the type “but what if…” or “would you change your mind if …” change the answers to the initial questions.

Bradley effect

The Bradley effect is a theory proposed to explain observed discrepancies between voter opinion polls and election outcomes in some U.S. government elections where a white candidate and a non-white candidate run against each other.

Contrary to the wording and order effects, this isn’t an effect created – intentionally or not – by the pollster, but by the respondents. The theory proposes that some voters tend to tell pollsters that they are undecided or likely to vote for a black candidate, and yet, on election day, vote for the white opponent. It was named after Los Angeles Mayor Tom Bradley, an African-American who lost the 1982 California governor’s race despite being ahead in voter polls going into the elections.

The probable cause of this effect is the phenomenon of social desirability bias. Some white respondents may give a certain answer for fear that, by stating their true preference, they will open themselves to criticism of racial motivation. They may feel under pressure to provide a politically correct answer. The existence of the effect is, however, disputed. (Some say the election of Obama disproves the effect, thereby making another statistical mistake).

Fatigue effect

Another effect created by the respondents rather than the pollsters is the fatigue effect. As respondents grow increasingly tired over the course of long interviews, the accuracy of their responses could decrease. They may be able to find shortcuts to shorten the interview; they may figure out a pattern (for example that only positive or only negative answers trigger follow-up questions). Or they may just give up halfway, causing incompletion bias.

However, this effect isn’t entirely due to respondents. Survey design can be at fault as well: there may be repetitive questioning (sometimes deliberately for control purposes), the survey may be too long or longer than initially promised, or the pollster may want to make his life easier and group different polls into one (which is what seems to have happened in the Fox poll mentioned above, creating an order effect – but that’s the charitable view of course). Fatigue effect may also be caused by a pollster interviewing people who don’t care much about the topic.

Sampling effect

Ideally, the sample of people who are to be interviewed for a survey should represent a fully random subset of the entire population: every person in the population should have an equal chance of being included in the sample. That means there shouldn't be self-selection (a typical flaw in many if not all internet surveys of the "Polldaddy" variety) or self-deselection. Both reduce the randomness of the sample; self-selection, for example, tends to produce polarized results. The size of the sample also matters: samples that are too small typically produce unreliable results.

Even the determination of the total population from which the sample is taken, can lead to biased results. And yes, that has to be determined… For example, do we include inmates, illegal immigrants etc. in the population? See here for some examples of the consequences of such choices.

House effect

A house effect occurs when a particular pollster's surveys systematically lean toward one party's candidates; Rasmussen is known for this.

I probably forgot an effect or two. Fill in the blanks if you care. Go here for other posts in this series.

Lies, Damned Lies, and Statistics (30): Failing to Correct for Inflation

Inflation often accounts for a significant part of the growth in any time series measured in dollars (or another currency) – in other words, of an increase over time in data expressed in dollars. So when you compare data for the current year, month or whatever with the same data for some period in the past, you may just be seeing inflation rather than actual growth. By adjusting for inflation, you uncover the real growth. You may even discover that growth hides decline. Here's an innocuous example of the consequences of failing to adjust data for inflation:

Over the last month, newspapers and film Web sites have proclaimed Avatar the highest-grossing film in American history. … Moviegoers in [the U.S.] have now spent about $700 million on tickets to Avatar. … No. 2 on the all-time list is Titanic, which brought in about $600 million. Avatar surpassed Titanic in late January. The problem with these numbers is that they aren’t adjusted for inflation. … When you adjust movie grosses for inflation, as Box Office Mojo does, you see that “Gone With the Wind” remains the top-grossing movie of all time, with $1.5 billion in box-office sales (using today’s dollars). (source)

This won’t do much damage. The problems start when unadjusted data are being used to push a political point or legislation. For example, one can claim that it isn’t a good idea to raise gasoline taxes because gasoline prices are already very high compared to the old days, but this claim loses much of its strength when you adjust the prices for inflation and it turns out that they are actually rather average, historically.

Of course, you can make mistakes while trying to adjust for inflation, and there are several techniques available, no two of which will produce exactly the same numbers. But almost any adjustment, especially for comparisons over long periods of time, is better than no adjustment at all.
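
The adjustment itself is simple arithmetic. A minimal sketch, using a made-up price index (a real calculation would use CPI figures from a statistical agency) and treating the Titanic gross as if it were all earned in 1997, purely to illustrate the step: divide each nominal amount by its year's index and multiply by the index of the reference year.

```python
# Hypothetical consumer price index, 1997 = 100 (not real CPI data).
cpi = {1997: 100.0, 2009: 134.0}

def to_real(amount, year, base_year=2009):
    """Convert a nominal amount into base_year currency units."""
    return amount * cpi[base_year] / cpi[year]

titanic_1997_gross = 600   # million nominal dollars, as in the example above
print(round(to_real(titanic_1997_gross, 1997)))   # ~804 million in 2009 dollars
```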

There’s a cool inflation adjusting tool here (only for U.S. data I’m afraid).

Lies, Damned Lies, and Statistics (29): How (Not) to Frame Survey Questions, Ctd.

Here’s a nice example of the way in which small modifications in survey questions can radically change survey results:

Our survey asked the following familiar question concerning the “right to die”: “When a person has a disease that cannot be cured and is living in severe pain, do you think doctors should or should not be allowed by law to assist the patient to commit suicide if the patient requests it?”

57 percent said “doctors should be allowed,” and 42 percent said “doctors should not be allowed.” As Joshua Green and Matthew Jarvis explore in their chapter in our book, the response patterns to euthanasia questions will often differ based on framing. Framing that refers to “severe pain” and “physicians” will often lead to higher support for ending the patient’s life, while including the word “suicide” will dramatically lower support. (source)

Similarly, seniors are willing to pay considerably more for “medications” than for “drugs” or “medicine” (source). Yet another example involves the use of “Wall Street”: there’s greater public support for banking reform when the issue is more specifically framed as regulating “Wall Street banks”.

What's the cause of this sensitivity? Difficult to tell. Cognitive bias probably plays a role, as does the psychology of associations ("suicide" brings up images of blood and pain, whereas "physicians" brings up images of control; similarly, "homosexual" evokes sleazy bars, while "gay" evokes art and design types). Maybe also a reluctance to offend the person asking the question. Anyway, the conclusion is that pollsters should be very careful when framing questions. One tactic could be to use as many different words and synonyms as possible in order to avoid a bias created by one particular word.

Lies, Damned Lies, and Statistics (28): Push Polls

Push polls are used in election campaigns, not to gather information about public opinion, but to modify public opinion in favor of a certain candidate, or – more commonly – against a certain candidate. They are called “push” polls because they intend to “push” the people polled towards a certain point of view.

Push polls are not cases of “lying with statistics” as we usually understand them, but it’s appropriate to talk about them since they are very similar to a “lying technique” that I discussed many times, namely leading questions (see here for example). The difference here is that leading questions aren’t used to manipulate poll results, but to manipulate people.

The push poll isn't really a poll at all, since the purpose isn't information gathering. Which is why many people don't like the term and label it oxymoronic. A better term would indeed be advocacy telephone campaign. A push poll is more like a gossip campaign, a propaganda effort or telemarketing. Push polls are very similar to political attack ads, in the sense that they intend to smear candidates, often with little basis in fact. Compared to political ads, push polls have the "advantage" that they don't seem to emanate from the campaign offices of one of the candidates (push polls are typically conducted by bogus polling agencies). It's therefore more difficult for the recipients of a push poll to classify the "information" it contains as political propaganda, and they are more likely to believe it. Which is of course the reason push polls are used. Also, the fact that they are presented as "polls" rather than campaign messages makes it more likely that people listen, and as they listen, they internalize the messages better than in the case of outright campaigning (which they often dismiss as propaganda).

Push polls usually, but not necessarily, contain lies or false rumors. They may also be limited to misleading or leading questions. For example, a push poll may ask people: “Do you think that the widespread and persistent rumors about Obama’s Muslim faith, based on his own statements, connections and acquaintances, are true?”. Some push polls may even contain some true but unpleasant facts about a candidate, and then hammer on these facts in order to change the opinions of the people being “polled”.

Infamous examples of push polls include the one used by Bush against McCain in the Republican primaries of 2000 (insinuating that McCain had an illegitimate black child), and the one used by McCain (fast learner!) against Obama in 2008 (alleging that Obama had ties with the PLO).

One way to distinguish legitimate polls from push polls is the sample size. The former are usually content with relatively small sample sizes (but not too small), whereas the latter typically want to “reach” as many people as possible. Push polls won’t include demographic questions about the people being polled (gender, age, etc.) since there is no intention to aggregate results, let alone aggregate by type of respondent. Another way to identify push polls is the selection of the target population: normal polls try to reach a random subset of the population; push polls are often targeted at certain types of voters, namely those likely to be swayed by negative campaigning about a certain candidate. Push polls also tend to be quite short compared to regular polls, since the purpose is to reach a maximum number of people.

Lies, Damned Lies, and Statistics (26): Objects in Statistics May Appear Bigger Than They Are, Ctd.

I've mentioned in a previous post how some numbers or stats can make a problem appear much bigger than it really is (the case in the previous post was about the number of suicides in a particular company). The error – or fraud, depending on the motivation – lies in the absence of a comparison with a "normal" number (in the previous post, people failed to compare the suicide rate in the company with the national suicide rate, which made them leap to conclusions about "company stress", "hyper-capitalism", "worker exploitation" etc.).

The error is, in other words, an absence of context and of distance from the "fait divers". I've now come across a similar example, cited by Aleks Jakulin here. As you know, one of the favorite controversies (some would say nontroversies) of the American right wing is the fate of the prisoners at Guantanamo. President Obama has vowed to close the prison, and either release those who cannot be charged or transfer them to prisons on the mainland. Many conservatives fear that releasing them would endanger America (some even believe that locking them away in supermax prisons on the mainland is a risk not worth taking). Even those who can't be charged with a crime, they say, may be a threat in the future. I won't deal with the perverse nature of this kind of reasoning, except to say that it would justify arbitrary and indefinite detention of large groups of "risky" people.

What I want to deal with here is one of the “facts” that conservatives cite in order to substantiate their fears: recidivism by former Guantanamo detainees.

Pentagon officials have not released updated statistics on recidivism, but the unclassified report from April says 74 individuals, or 14 percent of former detainees, have turned to or are suspected of having turned to terrorism activity since their release.

Of the more than 530 detainees released from the prison between 2002 and last spring, 27 were confirmed to have engaged in terrorist activities and 47 were suspected of participating in a terrorist act, according to Pentagon statistics cited in the spring report. (source)

These and other stats are ostentatiously displayed and repeated by partisan mouthpieces as a means to scare the s*** out of us, and to keep possibly innocent people in jail. The problem is that the levels of recidivism cited above are way below normal levels of recidivism:

[In the] general population, … about 65% of prisoners are expected to be rearrested within 3 years. The numbers seem lower in recent years, about 58%. More at Wikipedia. (source)

Lies, Damned Lies, and Statistics (24): Mistakes in the Direction of Causation

Suppose you find a correlation between two phenomena. And you’re tempted to conclude that there’s a causal relation as well. The problem is that this causal relation – if it exists at all – can go either way. It’s a common mistake – or a case of fraud, as it happens – to choose one direction of causation and forget that the real causal link can go the other way, or both ways at the same time.

An example. We often think that people who play violent video games are more likely to show violent behavior because they are incited by the games to copy the violence in real life. But can it not be that people who are more prone to violence are more fond of violent video games? We choose a direction of causation that fits with our pre-existing beliefs.

Another widely shared belief is that uninformed and uneducated voters will destroy democracy, or at least diminish its value (see here and here). No one seems to ask the question whether it’s not a diminished form of democracy that renders citizens apathetic and uninformed. Maybe a full or deep democracy can encourage citizens to participate and become more knowledgeable through participation.

A classic example is the correlation between education levels and GDP. Do countries with higher education levels experience more economic growth because of the education levels of their citizens? Or is it that richer countries can afford to spend more on education and hence have better educated citizens? Probably both.

Lies, Damned Lies, and Statistics (23): The Omitted Variable Bias, Ctd.

You see a correlation between two variables; for example, clever people wear fancy clothes. Then you assume that one variable must cause the other – in our case, that higher intellect also gives people a better sense of aesthetic taste, or that good taste in clothing somehow makes people smarter. In fact, you may be overlooking a third variable which explains the other two, as well as their correlation. In our case: clever people earn more money, which makes it easier to buy your clothes in shops that help you with your aesthetics.

Here’s an example from Nate Silver’s blog:

Gallup has some interesting data out on the percentage of Americans who pay a lot of attention to political news. Although the share of Americans following politics has increased substantially among partisans of all sides, it is considerably higher among Republicans than among Democrats:

News tends to be consumed by people who are older and wealthier, which is more characteristic of Republicans than Democrats. People don't read more or less news because they are Republicans or Democrats. The omitted variable here is age, and the data should be corrected for it in order to properly compare the two populations.

And here's another one from Matthew Yglesias' blog:

It’s true that surveys indicate that gay marriage is wildly popular among DC whites and moderately unpopular among DC blacks, but I think it’s a bit misleading to really see this as a “racial divide”. Nobody would be surprised to learn about a community where college educated people had substantially more left-wing views on gay rights than did working class people. And it just happens to be the case that there are hardly any working class white people living in DC. Meanwhile, with a 34-48 pro-con split it’s hardly as if black Washington stands uniformly in opposition—there’s a division of views reflecting the diverse nature of the city’s black population.
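
A sketch of the general fix, with invented numbers loosely modelled on the Gallup example: compare the groups within each level of the omitted variable (here age), or reweight one group to the other's age profile, instead of comparing raw overall percentages. Below, a gap appears in the raw numbers even though the two parties are identical within each age group.

```python
# Hypothetical data: share of people following political news closely, by age group,
# plus each party's age composition. All numbers are invented for illustration.
follows_news = {"young": 0.25, "old": 0.55}           # identical within age group for both parties
age_mix = {
    "Republicans": {"young": 0.30, "old": 0.70},      # older on average
    "Democrats":   {"young": 0.60, "old": 0.40},
}

for party, mix in age_mix.items():
    overall = sum(mix[age] * follows_news[age] for age in mix)
    print(party, round(overall, 2))
# Republicans 0.46, Democrats 0.37 -> the raw gap is entirely a composition (age) effect.
```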

Lies, Damned Lies, and Statistics (22): Objects in Statistics May Appear Bigger Than They Are

From a news report some weeks ago:

French Finance Minister Christine Lagarde Thursday voiced her support for France Telecom’s chief executive, who is coming under increased pressure from French unions and opposition politicians over a recent spate of suicides at the company.

Ms. Lagarde summoned France Telecom CEO Didier Lombard to a meeting after the telecommunications company confirmed earlier this week that one of its employees had committed suicide. It was the 24th suicide at the company in 18 months.

In a statement released after the meeting, Ms. Lagarde said she had “full confidence” that Mr. Lombard could get the company through “this difficult and painful moment.”

The French state, which owns a 27% stake in France Telecom, has been keeping a close eye on the company, following complaints by unions that a continuing restructuring plan at the company is putting workers under undue stress.

The suicide rate among the company’s 100,000 employees is in line with France’s national average. Still, unions say that the relocation of staff to different branches of the company around France has added pressure onto employees and their families.

On Tuesday, a spokesman for France’s opposition Socialist Party called for France Telecom’s top management to take responsibility for the suicides and step down. Several hundred France Telecom workers also took to the streets to protest against working conditions.

In the statement released after Thursday’s meeting, France’s Finance Ministry said Mr. Lombard had set up an emergency hotline aimed at providing help to depressed workers. The company has also increased the number of psychologists available to staffers, according to the statement. (source)

More on the problems caused by averages is here.

Lies, Damned Lies, and Statistics (21): Misleading Averages

Did you hear the joke about the statistician who put her head in the oven and her feet in the refrigerator? She said, "On average, I feel just fine." That's the same message as in the more widely known joke about statisticians drowning in a pond with an average depth of 3ft. And then there's this one: did you know that the great majority of people have more than the average number of legs? It's obvious, really: among the 57 million people in Britain, there are probably 5,000 people who have only one leg. Therefore, the average number of legs is just below 2, and most people – those with the usual two – are above the average. In this case, the median would be a better measure of the typical value than the mean.

But seriously now, averages can be very misleading, also in statistical work in the field of human rights. Take income data, for example. Income as such isn't a human rights issue, but poverty is. When we look at income data, we may see that average income is rising. However, this may be due to extreme increases in the top 1% of incomes. If you exclude the income gains of the top 1% of the population, the large majority of people may not be experiencing rising incomes at all – possibly even the opposite. And rising average income – even excluding extremes at the top – is perfectly compatible with rising poverty for certain parts of the population.

Averages are often skewed by outliers. That is why it’s necessary to remove outliers and calculate the averages without them. That will give you a better picture of the characteristics of the general population (the “real” average income evolution in my example). A simple way to neutralize outliers is to look at the median – the middle value of a series of values – rather than the average (or the mean).

An average (or a median for that matter) also doesn't say anything about the extremes – or, in stat-speak, about the variability or dispersion of the population. A high average income can hide extremely low and extremely high incomes for certain parts of the population. Suppose, for example, that you compare income levels across countries using average income. Country A may have a lower average income than country B, but also less poverty than country B, because the dispersion of incomes in country A is much smaller than in country B. The average in B is the result of adding together extremely low incomes (i.e. poverty) and extremely high incomes, whereas the average in A comes from incomes that are much more equal. From the point of view of poverty, average income is misleading because it identifies country A as the poorer of the two, whereas in reality there are more poor people in country B. So when looking at averages, it's always good to look at the standard deviation as well – a measure of the dispersion around the mean.
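
A small numerical illustration with invented incomes: the mean is pulled up by a single very high income, the median is not, and the standard deviation warns you how spread out the values are.

```python
from statistics import mean, median, pstdev

incomes = [18, 20, 22, 24, 25, 27, 30, 32, 35, 900]   # thousands; one outlier at the top

print(mean(incomes))     # 113.3 -> looks like a prosperous population
print(median(incomes))   # 26.0  -> a more honest "typical" income
print(pstdev(incomes))   # ~262  -> the huge dispersion warns you the mean is misleading
```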

More posts in this series.

Lies, Damned Lies, and Statistics (19): Fun With Percentages

A certain company discovered that 40% of all sick days were taken on a Friday or a Monday. They immediately clamped down on sick leave before they realized their mistake. Forty percent represents two days out of a five day working week and is therefore a normal spread. Nothing to do with lazy employees wishing to extend their weekends. They are just as sick on any other day.

A more serious example, now, more relevant also to human rights:

The stunning statistic that 70% of black babies are born out of wedlock is driven, to be sure, by the fact that many poor black women have a lot of children. But it turns out it is also driven by the fact that married black women have fewer children than married white women. (source)

The fact that married black women have fewer children than married white women obviously inflates the percentage of black babies born out of wedlock. If married black women had just as many children as married white women, the proportion or percentage of black babies out of wedlock would drop mechanically. But why do they have fewer children? It seems it’s a matter of being able to afford children.

It’s well known that the black middle class has a lot less in the way of assets than whites of similar income levels – hardly surprising, given the legacy of generations of discrimination and poverty. But that also means that things that a lot of white middle class people take for granted – like help with a down-payment on a house when you have your first kid – are less available. Middle class black parents have less in the way of a parental safety net than their white equivalents, so they’re less likely to have a second kid. (source)

The 70%, when compared to the national average which is about 40%, may seem high, but it’s artificially inflated by the relatively low number of black babies in wedlock. So before you go out yelling (see here for example) that all the poverty and educational problems of African-Americans are caused by the fact that too many of their children are born and raised out of wedlock, and presumably by single parents (although the latter doesn’t follow from the former), and that it’s better to promote “traditional marriage” instead of affirmative action, welfare etc., you may want to dig a bit deeper first. If you do, you’ll paint a more nuanced picture than the one about dysfunctional black families and irresponsible black fathers.
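
A toy calculation shows the denominator effect at work. The birth counts below are invented purely to reproduce the 40% and 70% shares: the out-of-wedlock share rises sharply without a single extra out-of-wedlock birth, just because married mothers have fewer children.

```python
# Invented birth counts, only to show the arithmetic of the share.
out_of_wedlock = 70
in_wedlock_many = 105   # many births to married mothers
in_wedlock_few = 30     # fewer births to married mothers, same out-of-wedlock count

def share(out, married):
    return out / (out + married)

print(round(share(out_of_wedlock, in_wedlock_many), 2))  # 0.4
print(round(share(out_of_wedlock, in_wedlock_few), 2))   # 0.7
```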

Nevertheless, while the percentages may not be as high as they seem at first glance, it remains true that black babies still make up a disproportionate share of kids born out of wedlock. And if “born out of wedlock” means “single parents” (usually mothers) then this can be a problem. Although many single parents do a great job raising their children (and often a better job than many “normal” families), it can be tough and the risks of ending up in poverty are much higher. And yet, even this is not enough to justify sermons about irresponsible black fathers. Maybe the misguided war on drugs, racial profiling and incarceration statistics have something to do with it.

Lies, Damned Lies, and Statistics (18): Comparing Apples and Oranges

Before the introduction of tin helmets during the First World War, soldiers only had cloth hats to wear. The strange thing was that after the introduction of tin hats, the number of injuries to the head increased dramatically. Needless to say, this was counter-intuitive. The new helmets were designed precisely to avoid or limit such injuries.

Of course, people were comparing apples with oranges, namely statistics on head injuries before and after the introduction of the new helmets. What they should have done, and effectively did after they realized their mistake, was to include in the statistics not only the injuries but also the fatalities. After the introduction of the new helmets, the number of fatalities dropped dramatically while the number of injuries went up: the tin helmet was saving soldiers' lives, but the soldiers were still getting injured.

Lies, Damned Lies, and Statistics (17): The Correlation-Causation Problem and Omitted Variable Bias

Suppose we see from Department of Defense data that male U.S. soldiers are more likely to be killed in action than female soldiers. Or, more precisely, the percentage of male soldiers killed in action is larger than the percentage of female soldiers. So there is a correlation between the gender of soldiers and the likelihood of being killed in action.

One could – and one often does – conclude from such a finding that there is causation of some kind: the gender of soldiers increases the chance of being killed in action. Again, more precisely: one might conclude that some aspect of gender – e.g. a male propensity for risk taking – leads to higher mortality.

However, it’s here that the Omitted Variable Bias pops up. The real cause of the discrepancy between male and female combat mortality may not be gender or a gender related thing, but a third element, an “omitted variable” which doesn’t show up in the correlation. In our fictional example, it may be the type of deployment: it may be that male soldiers are more commonly deployed in dangerous combat operations, whereas female soldiers may be more active in support operations away from the front-line.

OK, time for a real example. It has to do with home-schooling. In the U.S., many parents decide to keep their children away from school and teach them at home. For different reasons: ideological ones, reasons that have to do with their children's special needs etc. The reasons are not important here. What is important is that many people think that home-schooled children are somehow less well educated (parents, after all, aren't trained teachers). Proponents of home-schooling, however, point to a study that found that these children score above average on tests. But this is a correlation, not necessarily a causal link. It doesn't prove that home-schooling is superior to traditional schooling. Parents who teach their children at home are, by definition, heavily involved in their children's education. The children of such parents do above average in normal schooling as well. The omitted variable here is parental involvement. It's not the fact that the children are schooled at home that explains their above-average scores; it's the type of parents. Instead of comparing home-schooled children to all other children, one should compare them to children from similar families in the traditional system.

Greg Mankiw believes he has found another example of Omitted Variable Bias in the data plotting test scores for U.S. students against their family income:

Kids from higher income families get higher average SAT scores. Of course! But so what? This fact tells us nothing about the causal impact of income on test scores. … This graph is a good example of omitted variable bias … The key omitted variable here is parents’ IQ. Smart parents make more money and pass those good genes on to their offspring. Suppose we were to graph average SAT scores by the number of bathrooms a student has in his or her family home. That curve would also likely slope upward. (After all, people with more money buy larger homes with more bathrooms.) But it would be a mistake to conclude that installing an extra toilet raises your kids’ SAT scores. … It would be interesting to see the above graph reproduced for adopted children only. I bet that the curve would be a lot flatter. Greg Mankiw (source)

The implication is that adopted children, who usually don't receive their genes from their new families, would have roughly equal test scores whether they are adopted by rich or poor families – and, in turn, that the wealth of the family in which you are raised doesn't influence your education level, test scores or intelligence.

However, in his typical hurry to discard all possible negative effects of poverty, Mankiw may have gone a bit too fast. While it's not impossible that the correlation is fully explained by differences in parental IQ, other evidence points elsewhere. I'm always suspicious of theories that take one cause, exclude every other type of explanation and end up with a fully deterministic system, especially if the one cause that is selected is DNA. Life is more complex than that. Regarding this particular matter, education levels are to some extent determined by parental income: university enrollment is determined both by test scores and by parental income, even to the extent that people from high-income families with average test scores are slightly more likely to enroll in university than people from poor families with high test scores.

In trying to avoid the Omitted Variable Bias, Mankiw in fact fell into another type of bias, one which we could call the Singular Variable Bias: assuming that a phenomenon has a singular cause.

Lies, Damned Lies, and Statistics (16): Measuring Public Opinion in Dictatorships

Measuring human rights requires a certain level of respect for human rights (freedom to travel, freedom to speak, to interview etc.). Trying to measure human rights in situations characterized by the absence of freedom is quite difficult, and can even lead to unexpected results: the absence of (access to) good data may give the impression that things aren’t as bad as they really are. Conversely, when a measurement shows a deteriorating situation, the cause of this may simply be better access to better data. And this better access to better data may be the result of more openness in society. Deteriorating measurements may therefore signal an actual improvement. I gave an example of this dynamic here (it’s an example of statistics on violence against women).

Measuring public opinion in authoritarian countries is always difficult, but if you ask the public if they love or hate their government, it’s likely that you’ll have higher rates of “love” in the more authoritarian countries. After all, in those countries it can be pretty dangerous to tell someone in the street that you hate your government. They choose to lie and say that they approve. That’s the safest answer but probably in many cases not the real one. I don’t believe for a second that the percentage of people approving of their government is 19 times higher in Azerbaijan than in Ukraine, when Ukraine is in fact much more liberal than Azerbaijan.

In the words of Robert Coalson:

The Gallup chart is actually an index of fear. What it reflects is not so much attitudes toward the government as a willingness to openly express one’s attitudes toward the government. As one member of RFE/RL’s Azerbaijan Service told me, “If someone walked up to me in Baku and asked me what I thought about the government, I’d say it was great too”.

Lies, Damned Lies, and Statistics (12): Generalization

An example from Greg Mankiw’s blog:

Should we [the U.S.] envy European healthcare? Gary Becker says the answer is no:

“A recent excellent unpublished study by Samuel Preston and Jessica Ho of the University of Pennsylvania compare mortality rates for breast and prostate cancer. These are two of the most common and deadly forms of cancer – in the United States prostate cancer is the second leading cause of male cancer deaths, and breast cancer is the leading cause of female cancer deaths. These forms of cancer also appear to be less sensitive to known attributes of diet and other kinds of non-medical behavior than are lung cancer and many other cancers. [Health effects of diet and behavior should be excluded when comparing the quality of healthcare across countries. FS]

These authors show that the fraction of men receiving a PSA test, which is a test developed about 25 years ago to detect the presence of prostate cancer, is far higher in the US than in Sweden, France, and other countries that are usually said to have better health delivery systems. Similarly, the fraction of women receiving a mammogram, a test developed about 30 years ago to detect breast cancer, is also much higher in the US. The US also more aggressively treats both these (and other) cancers with surgery, radiation, and chemotherapy than do other countries.

Preston and Ho show that this more aggressive detection and treatment were apparently effective in producing a better bottom line since death rates from breast and prostate cancer declined during the past 20 [years] by much more in the US than in 15 comparison countries of Europe and Japan.” (source)

Even if all this is true, how on earth can you assume that a healthcare system is better because it is more successful in treating two (2!) diseases?

Another example: the website of the National Alert Registry for sexual offenders used to post a few “quick facts”. One of them said:

“The chance that your child will become a victim of a sexual offender is 1 in 3 for girls… Source: The National Center for Victims of Crime“.

Someone took the trouble of actually checking this source, and found that it said:

Twenty-nine percent [i.e. approx. 1 in 3] of female rape victims in America were younger than eleven when they were raped.

One in three rape victims is a young girl, but you can’t generalize from that by saying that one in three young girls will be the victim of rape. Perhaps they will be, but you can’t know that from these data. Like you can’t conclude from the way the U.S. deals with two diseases that it “shouldn’t envy European healthcare”. Perhaps it shouldn’t, but more general data on life expectancy says it should.
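
The sex-offender "quick fact" confuses two different conditional probabilities: the share of victims who are young girls, and the share of young girls who become victims. A toy calculation with entirely invented numbers shows how far apart those can be.

```python
# Invented numbers, only to illustrate the direction of the conditioning.
girls_under_11 = 1_000_000
victims_total = 3_000          # hypothetical number of victims in some population
victims_under_11 = 1_000       # roughly 1 in 3 of the victims, as in the quote

p_young_given_victim = victims_under_11 / victims_total
p_victim_given_young = victims_under_11 / girls_under_11

print(p_young_given_victim)   # 0.333... -> "1 in 3 victims is a young girl"
print(p_victim_given_young)   # 0.001    -> very far from "1 in 3 young girls is a victim"
```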

These are two examples of induction or inductive reasoning, sometimes called inductive logic, a reasoning which formulates laws based on limited observations of recurring phenomenal patterns. Induction is employed, for example, in using specific propositions such as:

This door is made of wood.

to infer general propositions such as:

All doors are made of wood. (source)

More posts in this series.

Lies, Damned Lies, and Statistics (11): Polarized Statistics as a Result of Self-Selection

One of the most important things in the design of an opinion survey – and opinion surveys are a common tool in data gathering in the field of human rights – is the definition of the sample of people who will be interviewed. We can only assume that the answers given by the people in the sample are representative of the opinions of the entire population if the sample is a fully random subset of the population – that means that every person in the population should have an equal chance of being part of the survey group.

Unfortunately, many surveys depend on self-selection – people get to decide themselves if they cooperate – and self-selection distorts the randomness of the sample:

Those individuals who are highly motivated to respond, typically individuals who have strong opinions, are overrepresented, and individuals that are indifferent or apathetic are less likely to respond. This often leads to a polarization of responses with extreme perspectives being given a disproportionate weight in the summary. (source)

Self-selection is almost always a problem in online surveys (of the PollDaddy variety), phone-in surveys for television or radio shows, and so-called “red-button” surveys in which people vote with the remote control of their television set. However, it can also occur in more traditional types of surveys. When you survey the population of a brutal dictatorial state (if you get the chance) and ask the people about their freedoms and rights, many will deselect themselves: they will refuse to cooperate with the survey for fear of the consequences.
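
A quick simulation of the polarization effect described above, with invented response probabilities: people with extreme views respond far more often than the indifferent middle, so the published distribution looks much more polarized than the population really is.

```python
import random

random.seed(42)

# Hypothetical population opinions on a 1-5 scale, mostly moderate.
population = random.choices([1, 2, 3, 4, 5], weights=[10, 20, 40, 20, 10], k=100_000)

# Probability of answering the survey depends on how strongly people feel (self-selection).
respond_prob = {1: 0.6, 2: 0.2, 3: 0.05, 4: 0.2, 5: 0.6}
respondents = [o for o in population if random.random() < respond_prob[o]]

def dist(opinions):
    return {v: round(opinions.count(v) / len(opinions), 2) for v in range(1, 6)}

print(dist(population))    # the true distribution: a big moderate middle
print(dist(respondents))   # the survey: the extremes 1 and 5 heavily overrepresented
```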

When we limit ourselves to the effects of self-selection (or self-deselection) in democratic states, we may find that this has something to do with the often ugly and stupid “us-and-them” character of much of contemporary politics. There seems to be less and less room for middle ground, compromise or nuance.

Lies, Damned Lies, and Statistics (10): How (Not) to Frame Survey Questions

I’ve mentioned before that information on human rights depends heavily on opinion surveys. Unfortunately, surveys can be wrong and misleading for so many different reasons that we have to be very careful when designing surveys and when using and interpreting survey data. One reason I haven’t mentioned before is the framing of the questions.

Even very small differences in framing can produce widely divergent answers. And there is a wide variety of problems linked to the framing of questions:

  • Questions can be leading questions, questions that suggest the answer. For example: “It’s wrong to discriminate against people of another race, isn’t it?” Or: “Don’t you agree that discrimination is wrong?”
  • Questions can be put in such a way that they put pressure on people to give a certain answer. For example: “Most reasonable people think racism is wrong. Are you one of them?” This is also a leading question of course, but it’s more than simply “leading”.
  • Questions can be confusing or easily misinterpreted. Such questions often include a negative, or, worse, a double negative. For example: “Do you agree that it isn’t wrong to discriminate under no circumstances?” Needless to say, your survey results will be contaminated with answers that are the opposite of what they should have been.
  • Questions can be wordy. For example: “What do you think about discrimination (a term that refers to treatment taken toward or against a person of a certain group that is based on class or category rather than individual merit) as a type of behavior that promotes a certain group at the expense of another?” This is obviously a subtype of the confusing variety.
  • Questions can also be confusing because they use jargon, abbreviations or difficult terms. For example: “Do you believe that UNESCO and ECOSOC should administer peer-to-peer expertise regarding discrimination in an ad hoc or a systemic way?”
  • Questions can in fact be double or even triple questions, but there is only one answer required and allowed. Hence people who may have opposing answers to the two or three sub-questions will find it difficult to provide a clear answer. For example: “Do you agree that racism is a problem and that the government should do something about it?”
  • Open questions should be avoided in a survey. For example: “What do you think about discrimination?” Such questions do not yield answers that can be quantified and aggregated.
  • You also shouldn’t ask questions that exclude some possible answers, or offer a multiple-choice list that omits plausible responses. For example: “How much did the government improve its anti-discrimination efforts relative to last year? Somewhat? Average? A lot?” Notice that this framing doesn’t allow people to respond that the effort has not improved or has worsened. Another example: failure to include “don’t know” as a possible answer.

Here’s a real-life example:

In one of the most infamous examples of flawed polling, a 1992 poll conducted by the Roper organization for the American Jewish Committee found that 1 in 5 Americans doubted that the Holocaust occurred. How could 22 percent of Americans report being Holocaust deniers? The answer became clear when the original question was re-examined: “Does it seem possible or does it seem impossible to you that the Nazi extermination of the Jews never happened?” This awkwardly-phrased question contains a confusing double-negative which led many to report the opposite of what they believed. Embarrassed Roper officials apologized, and later polls, asking clear, unambiguous questions, found that only about 2 percent of Americans doubt the Holocaust. (source)

Lies, Damned Lies, and Statistics (9): Too Small Sample Sizes in Surveys

So many things can go wrong in the design and execution of opinion surveys. And opinion surveys are a common tool in data gathering in the field of human rights.

As it’s often impossible (and undesirable) to question a whole population, statisticians usually select a sample from the population and ask their questions only to the people in this sample. They assume that the answers given by the people in the sample are representative of the opinions of the entire population. But that’s only the case if the sample is a fully random subset of the population – that means that every person in the population should have an equal chance of being chosen – and if the sample hasn’t been distorted by other factors such as self-selection by respondents (a common thing in internet polls) or personal bias by the statistician who selects the sample.

A sample that is too small is also not representative of the entire population. For example, if we ask 100 people whether they approve or disapprove of discrimination against homosexuals, and 55 of them say they approve, we might assume that about 55% of the entire population approves. But it could possibly be that only 45% of the total population approve, and that we just happened, by chance, to interview an unusually large percentage of people who approve. This may have happened because, by chance and without being aware of it, we selected the people in our sample in such a way that there are more religious conservatives in it than there are in society, relatively speaking.

This is the problem of sample size: the smaller the sample, the greater the influence of luck on the results we get. Asking the opinion of 100 people, and taking this as representative of millions of citizens, is like throwing a coin 10 times and assuming – after having 3 heads and 7 tails – that the probability of throwing heads is 30%. We all know that it’s not 30 but 50%. And we know this because we know that when we increase the “sample size” – i.e. when we throw more than 10 times, say a thousand times – we will have heads and tails approximately half of the time. Likewise, if we take our example of the survey on homosexuality: increasing the sample size reduces the chance that religious conservatives (or other groups) are disproportionately represented in the sample.
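A quick simulation makes the role of luck concrete. Assume, purely hypothetically, that the true approval rate in the population is 45%: with samples of 100 people the estimates jump around a lot, while samples of 1,000 stay much closer to the truth. A minimal sketch in Python:

```python
import random

random.seed(1)
TRUE_RATE = 0.45  # hypothetical "true" share of the population that approves

def survey(sample_size):
    """Simulate one survey: each respondent approves with probability TRUE_RATE."""
    approvals = sum(random.random() < TRUE_RATE for _ in range(sample_size))
    return approvals / sample_size

for n in (100, 1_000):
    estimates = [survey(n) for _ in range(10)]
    print(f"n={n:5}:", " ".join(f"{e:.0%}" for e in estimates))
# With n=100 the estimates easily stray several points from 45%;
# with n=1,000 they cluster much more tightly around it.
```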

When analyzing survey results, the first thing to look at is the sample size, as well as the level of confidence (usually 95%) that the results are within a certain margin of error (usually plus or minus 5%). A high level of confidence combined with a small margin of error indicates that the sample was sufficiently large, although it says nothing about whether the sample was random.
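For a simple random sample, the 95% margin of error can be approximated with the standard formula sketched below; it also shows roughly how large a sample is needed for the commonly quoted plus or minus 5%. This assumes a worst-case estimated proportion of 50%:

```python
import math

Z_95 = 1.96  # z-value corresponding to a 95% confidence level

def margin_of_error(p, n):
    """Approximate 95% margin of error for an estimated proportion p from a sample of size n."""
    return Z_95 * math.sqrt(p * (1 - p) / n)

print(f"n=100:  ±{margin_of_error(0.5, 100):.1%}")    # roughly ±9.8%
print(f"n=385:  ±{margin_of_error(0.5, 385):.1%}")    # roughly ±5.0%
print(f"n=1000: ±{margin_of_error(0.5, 1000):.1%}")   # roughly ±3.1%
```

In other words, a sample of about 385 randomly chosen respondents is what it takes to get within plus or minus 5 percentage points at 95% confidence; 100 respondents leave you with a margin of error of almost 10 points.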

Lies, Damned Lies, and Statistics (7): “Drowning” Data

Suppose we want to know how many forced disappearances there are in Chechnya. Assuming we have good data this isn’t hard to do. The number of disappearances that have been registered, by the government or some NGO, is x on a total Chechen population of y, giving z%. The Russian government may decide that the better measurement is for Russia as a whole. Given that there are almost no forced disappearances in other parts of Russia, the z% goes down dramatically, perhaps close to or even below the level of other comparable countries.

Good points for Russia! But that doesn’t mean that the situation in Chechnya is OK. The data for Chechnya are simply “drowned” in those of Russia as a whole, giving the impression that “overall” Russia isn’t doing all that badly. This, however, is misleading. The proper unit of measurement should be limited to the area where the problem occurs. The important thing here isn’t a comparison of Russia with other countries; it’s an evaluation of a local problem.
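The dilution is easy to see with a couple of invented numbers: a rate that looks alarming at the regional level almost vanishes once it is computed over the whole country. A minimal sketch, with all figures hypothetical:

```python
# Hypothetical figures, chosen only to illustrate the "drowning" effect.
chechnya_population = 1_300_000
chechnya_disappearances = 2_000

russia_population = 144_000_000
russia_disappearances = 2_100   # almost all of them occur in Chechnya

regional_rate = chechnya_disappearances / chechnya_population
national_rate = russia_disappearances / russia_population

print(f"Rate in Chechnya:        {regional_rate * 100_000:.0f} per 100,000")  # ~154 per 100,000
print(f"Rate in Russia overall:  {national_rate * 100_000:.1f} per 100,000")  # ~1.5 per 100,000
```

The same underlying problem produces a rate of roughly 154 per 100,000 when measured where it occurs, and roughly 1.5 per 100,000 when measured at the national level.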

Something similar happens to the evaluation of the Indian economy:

Madhya Pradesh, for example, is comparable in population and incidence of poverty to the war-torn Democratic Republic of Congo. But the misery of the DRC is much better known than the misery of Madhya Pradesh, because sub-national regions do not appear on “poorest country” lists. If Madhya Pradesh were to seek independence from India, its dire situation would become more visible immediately. …

But because it’s home to 1.1 billion people, India is more able than most to conceal the bad news behind the good, making its impressive growth rates the lead story rather than the fact that it is home to more of the world’s poor than any other country. …

A 10-year-old living in the slums of Calcutta, raising her 5-year-old brother on garbage and scraps, and dealing with tapeworms and the threat of cholera, suffers neither more nor less than a 10-year-old living in the same conditions in the slums of Lilongwe, the capital of Malawi. But because the Indian girl lives in an “emerging economy,” slated to battle it out with China for the position of global economic superpower, and her counterpart in Lilongwe lives in a country with few resources and a bleak future, the Indian child’s predicament is perceived with relatively less urgency. (source)

Lies, Damned Lies, and Statistics (6): Statistical Bias in the Design and Execution of Surveys

Statisticians can – wittingly or unwittingly – introduce bias in their work. Take the case of surveys for instance. Two important steps in the design of a survey are the definition of the population and the selection of the sample. As it’s often impossible (and undesirable) to question a whole population, statisticians usually select a sample from the population and ask their questions only to the people in this sample. They assume that the answers given by the people in the sample are representative of the opinions of the entire population.

Bias can be introduced

  • at the moment of the definition of the population
  • at the moment of the selection of the sample
  • at the moment of the execution of the survey (as well as at other moments of the statistician’s work, which I won’t mention here).

Population

Let’s take a fictional example of a survey. Suppose statisticians want to measure public opinion regarding the level of respect for human rights in the country called Dystopia.

First, they set about defining their “population”, i.e. the group of people whose “public opinion” they want to measure. “That’s easy”, you think. So do they, unfortunately. It’s the people living in this country, of course, or is it?

Not quite. Suppose the level of rights protection in Dystopia is very low, as you might expect. That means that many people have probably fled the country. Including only the current residents in the survey population will then overestimate the level of rights protection. And there is another point: dead people can’t talk. We can assume that many victims of rights violations have died as a result of those violations. Not including these dead people in the survey will also artificially push up the level of rights protection. (I’ll explain in a moment how it could be possible at all to include dead people in a survey; bear with me.)

Hence, doing a survey and then assuming that the people who answered it are representative of the whole population means discarding the opinions of refugees and dead people. If those opinions were included, the results would be different and more correct. In the case of dead people it’s obviously impossible to include their opinions, but perhaps it would be advisable to make a statistical correction for them, as sketched below. After all, we know their answers: people who died because of rights violations in their country presumably wouldn’t have had a good opinion of their political regime.
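One crude way to make that kind of correction is to re-add the excluded groups to the denominator and impute their presumed answers. The sketch below is purely illustrative: the numbers are invented, and it rests on the strong (and debatable) assumption that refugees and deceased victims would all have disapproved of the regime.

```python
# Hypothetical survey of residents of Dystopia (invented numbers).
residents_surveyed = 1_000
residents_approving = 600           # 60% of surveyed residents say rights are respected

# Groups excluded from the survey, scaled to the same sample size.
refugees = 200                      # fled the country
deceased_victims = 100              # died as a result of rights violations

naive_approval = residents_approving / residents_surveyed

# Crude correction: assume (debatably) that all excluded people would have disapproved.
corrected_approval = residents_approving / (residents_surveyed + refugees + deceased_victims)

print(f"Approval among surveyed residents: {naive_approval:.0%}")     # 60%
print(f"Approval after crude correction:   {corrected_approval:.0%}") # ~46%
```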

Sample

And then there are the problems linked to the definition of the sample. An unbiased sample should be a fully random subset of the entire, correctly defined population (needless to say, if the population is defined incorrectly, as in the example above, the sample is by definition also biased, even if no sampling mistakes have been made). That means that every person in the population should have an equal chance of being chosen, and that there shouldn’t be self-selection (a typical flaw in many if not all internet surveys of the “Polldaddy” variety) or self-deselection. The latter is very likely in my Dystopia example. People who are too afraid to talk won’t talk. The harsher the rights violations, the more people will fail to cooperate. So you get the perverse effect that very cruel regimes may score better on human rights surveys than modestly cruel regimes. The latter are cruel, but not cruel enough to scare the hell out of people.

The classic example of a sampling error comes from a poll on the 1948 presidential election in the U.S.

On Election night, the Chicago Tribune printed the headline DEWEY DEFEATS TRUMAN, which turned out to be mistaken. In the morning the grinning President-Elect, Harry S. Truman, was photographed holding a newspaper bearing this headline. The reason the Tribune was mistaken is that their editor trusted the results of a phone survey. Survey research was then in its infancy, and few academics realized that a sample of telephone users was not representative of the general population. Telephones were not yet widespread, and those who had them tended to be prosperous and have stable addresses. (source)
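The mechanism behind the Tribune’s error is easy to reproduce: if the trait you are sampling on (owning a telephone, in 1948) is correlated with the opinion you are measuring, even a large sample gives the wrong answer. A small Python simulation with invented numbers:

```python
import random

random.seed(2)

# Invented electorate: Truman leads overall, but phone owners (a prosperous minority)
# lean heavily towards Dewey.
def voter():
    has_phone = random.random() < 0.35
    dewey_probability = 0.65 if has_phone else 0.39
    supports_dewey = random.random() < dewey_probability
    return has_phone, supports_dewey

electorate = [voter() for _ in range(200_000)]

def dewey_share(voters):
    return sum(supports_dewey for _, supports_dewey in voters) / len(voters)

phone_owners = [v for v in electorate if v[0]]

print(f"Dewey's support in the whole electorate: {dewey_share(electorate):.0%}")   # ~48%
print(f"Dewey's support among phone owners:      {dewey_share(phone_owners):.0%}") # ~65%
```

A phone-based sample of any size would have called the election for Dewey, because the sample frame itself, not the number of respondents, was the problem.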

Execution

Another reason why bias may occur in the sampling is the way the survey is executed. If the government of Dystopia allows statisticians to operate on its territory at all, it will probably not allow them to operate freely, or circumstances may not permit them to do so. The people doing the interviews are not allowed to, or do not dare to, travel around the country. Hence they themselves deselect entire groups from the survey, distorting the randomness of the sample. Again, the more repressive the regime, the more this happens, with possible perverse effects: the people who can be interviewed are perhaps only those living in urban areas, close to the residence of the statisticians, and those living there may have a relatively large stake in the regime, which makes them paint a rosy image of it.