Lies, Damned Lies, and Statistics (45): Anonymity in Surveys Changes Survey Results

Whether interviewees are given anonymity or not makes a big difference in survey results:

When people are assured of anonymity, it turns out, a lot more of them will acknowledge that they have had same-sex experiences and that they don’t entirely identify as heterosexual. But it also turns out that when people are assured of anonymity, they will show significantly higher rates of anti-gay sentiment. These results suggest that recent surveys have been understating, at least to some degree, two different things: the current level of same-sex activity and the current level of opposition to gay rights. (source)

Anonymity can result in data that are closer to the truth, so it’s tempting to require it, especially for surveys that may suffer from social desirability bias – surveys asking about opinions that people are reluctant to divulge because those opinions are socially unacceptable (the Bradley effect is one example). However, anonymity can also create problems. For example, it can make it difficult to avoid questioning the same people more than once.
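
To make the mechanism concrete, here’s a minimal simulation of social desirability bias. The numbers are assumptions chosen purely for illustration, not survey findings: 10 percent of respondents hold a sensitive opinion, and without anonymity only 40 percent of that group will admit to it.

```python
import random

random.seed(0)

# Assumed, purely illustrative parameters (not survey findings):
TRUE_RATE = 0.10                 # share of people holding the sensitive opinion
HONESTY_IF_NOT_ANONYMOUS = 0.40  # share of holders who admit it when named

population = [random.random() < TRUE_RATE for _ in range(100_000)]

# With anonymity, everyone answers truthfully.
anonymous_yes = sum(population)

# Without anonymity, only a fraction of holders admit the opinion.
named_yes = sum(holds and random.random() < HONESTY_IF_NOT_ANONYMOUS
                for holds in population)

print(anonymous_yes / len(population))  # ~0.10: close to the true rate
print(named_yes / len(population))      # ~0.04: a large understatement
```

The size of the understatement depends entirely on the assumed honesty rate; the point is only that the non-anonymous figure is biased downward in a systematic way.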

Go here for other posts in this series.

Lies, Damned Lies, and Statistics (40): The Composition Effect

Take the evolution of the median wage in the US over the last few decades. The trend is nearly flat, so one would naturally assume that the average US citizen has seen hardly any income gains. However, some have argued that this conclusion is wrong because it ignores the composition effect. In this example, the composition of the labor force has changed dramatically over those decades: more women and immigrants have entered the workforce, and those groups tend to have lower incomes, especially at the moment of entry. Entering the labor force raises their incomes, obviously, but it also pulls the average and the median down. If, at the same time, the wages of white men go up, the aggregate effect may be close to zero. And yet, paradoxically, all groups have progressed. The conclusion that the average citizen did not progress would only hold if the composition of the population whose wages are compared over time had not changed.
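
Here’s the arithmetic of that paradox in a minimal sketch. All the wage figures are made up for illustration: incumbents get a raise, new entrants join at a lower (but, for them, improved) wage, and the overall median still falls.

```python
import statistics

# Hypothetical wages, chosen only to illustrate the composition effect.
# Period 1: the workforce consists of 100 incumbents earning 50,000.
period1 = [50_000] * 100

# Period 2: incumbents got a 10% raise, and 100 new entrants joined
# at 35,000 -- more than they earned before entering the workforce.
period2 = [55_000] * 100 + [35_000] * 100

print(statistics.median(period1))  # 50000.0
print(statistics.median(period2))  # 45000.0: the overall median *fell*,
# even though incumbents earn more and the new entrants improved their lot too
```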

Now, in this particular example it seems that there is really no large composition effect (see here). However, the effect is always a possibility, and one should at least consider it, and if possible rule it out, before drawing hasty conclusions from historical time series. If you don’t do this, or don’t even try, then you may be “lying with statistics”.

More posts in this series are here.

Lies, Damned Lies, and Statistics (39): Availability Bias

This is actually only about one type of availability bias: if a certain percentage of your friends are computer programmers or have red hair, you may conclude that the same percentage of the total population are computer programmers or have red hair. You’re not working with a random, representative sample – perhaps you like computer programmers, or you’re attracted to people with red hair – so you make do with the sample you have, the one that is immediately available, and you extrapolate on the basis of it.

Most of the time you’re wrong to do so, as in the examples above. In some cases, however, it may be a useful shortcut that spares you the hard work of establishing a random, representative sample and gathering information from it. If you use a sample that’s not strictly random but also not biased by your own strong preferences, such as friendship or attraction, it may give reasonably adequate information about the total population. If you have a reasonably large number of friends and you couldn’t care less about their hair color, then it may be OK to use your friends as a proxy for a random sample and extrapolate the rates of each hair color to the total population.
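
A small simulation shows how much the bias matters. The numbers are assumptions made for illustration: 2 percent of a hypothetical population has red hair, and in the biased scenario you are five times as likely to befriend a redhead.

```python
import random

random.seed(0)

# Hypothetical population: 2% red-haired.
population = ["red"] * 200 + ["other"] * 9_800

# A genuinely random sample estimates the true rate well.
random_sample = random.sample(population, 500)
print(random_sample.count("red") / 500)   # ~0.02

# A "friends" sample in which redheads are 5x as likely to be chosen
# -- a stand-in for friendship or attraction -- overshoots badly.
weights = [5 if p == "red" else 1 for p in population]
friend_sample = random.choices(population, weights=weights, k=500)
print(friend_sample.count("red") / 500)   # ~0.09, more than four times too high
```

When the selection weight is the same for everyone – the harmless case described above, where you couldn’t care less about hair color – the friend sample behaves like a random sample.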

The problem is the following: because the use of available samples is sometimes OK, we are perhaps fooled into thinking that they are OK even when they’re not. And then we come up with arguments like:

  • Smoking can’t be all that bad. I know a lot of smokers who have lived long and healthy lives.
  • It’s better to avoid groups of young black men at night, because I know a number of people who have been attacked by young black men (and I’ll forget that I’ll hardly ever hear of people not having been attacked).
  • Cats must have a special ability to fall from great heights and survive, because I’ve seen a lot of press reports about such events (and I forget that I’ll rarely read a report about a cat falling and dying).
  • Violent criminals should be locked up for life because I’m always reading newspaper articles about re-offenders (again, very unlikely that I’ll read anything about non-re-offenders).

As is clear from some of the examples above, availability bias can sometimes have consequences for human rights: it can foster racial bias, it can lead to “tough on crime” policies, etc.

More posts in this series are here.

Lies, Damned Lies, and Statistics (38): The Base-Rate Fallacy

When judging whether people engage in discrimination, it’s important to make the right comparisons. Take the example of an American company X where 98 percent of employees are white and only 2 percent are black. If you compare to (“if your base is”) the entire US population – of which about 13 percent are African American – then you may conclude that company X is motivated by racism in its employment decisions.

However, in cases such as these, it’s probably better to use another base rate, namely the number of applicants rather than the total population. If only 0.1 percent of job applications were from blacks, then a workforce that is 2 percent black actually shows that company X has favored black applicants.
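
Some concrete, made-up numbers make the point unmistakable: with 10,000 applicants of whom 0.1 percent are black, and 100 hires of whom 2 percent are black, black applicants are hired at twenty times the overall rate.

```python
# Hypothetical numbers, chosen only to illustrate the base-rate point.
applicants = 10_000
black_applicants = int(applicants * 0.001)  # 10 applications from blacks
hires = 100
black_hires = int(hires * 0.02)             # 2 black employees among 100 hires

print(black_hires / black_applicants)  # 0.2  -> 20% of black applicants hired
print(hires / applicants)              # 0.01 -> 1% of all applicants hired
# Against the right base rate (applicants, not the general population),
# company X favors black applicants rather than discriminating against them.
```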

The accusation of racism against company X betrays a failure to point to the real causes of discrimination. It’s a failure to go back far enough and to think hard enough. The fact that only 0.1 percent of applicants were black – instead of the expected 13 percent – may still be due to racism, but not racism in company X. Blacks may suffer from low-quality education, which results in a skill deficit, which in turn leads to a low application rate for certain jobs.

The opposite error is also quite common: people point to the number of blacks in prison, compare it to the total number of blacks, and conclude that blacks must be more prone to crime. However, they should probably compare incarceration rates to arrest rates (blacks are arrested at higher rates because of racial profiling), and they should take jury bias into account as well.

More about racism. More posts in this series.

Lies, Damned Lies, and Statistics (37): When Surveyed, People Express Opinions They Don’t Hold

It’s been a while since the last post in this series, so here’s a recap of its purpose. This blog promotes the quantitative approach to human rights: we need to complement the traditional approaches – anecdotal, journalistic, legal, judicial etc. – with one that focuses on data, country rankings, international comparisons, catastrophe measurement, indexes etc.

Because this statistical approach is important, it’s also important to engage with measurement problems, and there are quite a few in the case of human rights. After all, you can’t measure respect for human rights like you can measure the weight or size of an object. There are huge obstacles to overcome in human rights measurement. On top of the measurement difficulties that are specific to the area of human rights, this area suffers from some of the general problems in statistics. Hence, there’s a blog series here about problems and abuse in statistics in general.

Take for example polling or surveying. A lot of the information on human rights violations – though not all of it – comes from surveys and opinion polls, and it’s therefore of the utmost importance to describe what can go wrong when designing, implementing and using surveys and polls. (Previous posts about problems in polling and surveying are here, here, here, here and here.)

One interesting problem is the following:

Simply because the surveyor is asking the question, respondents believe that they should have an opinion about it. For example, researchers have shown that large minorities would respond to questions about obscure or even fictitious issues, such as providing opinions on countries that don’t exist. (source, source)

Of course, when people express opinions they don’t have, we risk drawing the wrong conclusions from surveys. We also risk that a future survey asking the same questions will come up with totally different results. Confusion guaranteed. After all, if we make up our opinions on the spot when someone asks us – if they aren’t really our opinions but unreflective reactions given out of a sense of obligation – it’s unlikely that we will express the same opinion in the future.

Another reason for this effect is probably our reluctance to come across as ignorant: rather than selecting the “I don’t know/no opinion” answer, we just pick one of the other possible answers. Again, a cause of distortions.
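
A toy simulation of such “non-attitudes” shows how badly they damage the repeatability of a survey. The parameters are assumptions: 70 percent of respondents hold a stable opinion, and the rest answer at random each time they are asked.

```python
import random

random.seed(0)

def answer(has_opinion, stable_answer):
    # Respondents without a real opinion improvise a new answer each time.
    return stable_answer if has_opinion else random.choice(["agree", "disagree"])

# Assumed: 70% hold a stable opinion, 30% have none.
respondents = [(random.random() < 0.7, random.choice(["agree", "disagree"]))
               for _ in range(10_000)]

wave1 = [answer(h, s) for h, s in respondents]
wave2 = [answer(h, s) for h, s in respondents]  # same people, asked again later

consistent = sum(a == b for a, b in zip(wave1, wave2)) / len(respondents)
print(consistent)  # ~0.85: the 30% of "non-attitudes" alone cost 15 points of agreement
```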

Lies, Damned Lies, and Statistics (33): The Omitted Variable Bias, Ctd.

I discussed the so-called omitted variable bias before on this blog (here and here), so I suppose I can mention this other example: guess what the correlation is, on a country level, between per capita smoking rates and life expectancy. High smoking rates go together with low life expectancy, right? And vice versa?

Actually, and surprisingly, the correlation goes the other way: the higher the smoking rate – the more people smoke in a certain country – the longer the citizens of that country live, on average.

Why is that the case? Smoking is unhealthy and should therefore make life shorter, on average. However, people in rich countries smoke more; in poor countries they can’t afford it. And people in rich countries live longer. But they obviously don’t live longer because they smoke more; they live longer because they have the good luck to live in a rich country, which tends to be a country with better healthcare and the like. If they smoked less, they would live even longer.
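
You can reproduce the paradox with a toy model in which national wealth is the omitted variable. The coefficients below are made up: wealth raises both the smoking rate and life expectancy, while smoking itself lowers life expectancy. (statistics.correlation requires Python 3.10 or later.)

```python
import random
import statistics

random.seed(0)

smoking, life_exp = [], []
for _ in range(200):  # 200 hypothetical countries
    wealth = random.uniform(0, 1)              # the omitted variable
    s = 10 + 30 * wealth + random.gauss(0, 3)  # richer -> more smoking
    # Smoking's own effect on life expectancy is negative (-0.2 per unit),
    # but wealth's positive effect (+25) dominates at the country level.
    l = 55 + 25 * wealth - 0.2 * s + random.gauss(0, 2)
    smoking.append(s)
    life_exp.append(l)

# Positive, even though smoking is harmful within every country.
print(statistics.correlation(smoking, life_exp))
```

Controlling for wealth – for example by regressing life expectancy on both smoking and wealth – would recover the negative effect of smoking.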

Why is this important? Not because I’m particularly interested in smoking rates. It’s important because it shows how easily we are fooled by simple correlations, how we imagine what correlations should look like, and how we can’t see beyond the two elements of a correlation when we’re confronted with one that goes against our intuitions. We usually assume that, in a correlation, one element causes the other. And apart from the common mistake of reversing the direction of causation, we often forget that a third element can be causing both elements of the correlation (in this example, the prosperity of a country causes both high smoking rates and high life expectancy), rather than one element causing the other.

More posts in this series are here.

Lies, Damned Lies, and Statistics (32): The Questioner Matters

I’ve discussed the role of framing before: the way in which you ask questions in surveys influences the answers you get and therefore modifies the survey results (see here and here, for instance). It happens quite often that polling organizations or media inadvertently or even deliberately frame questions in a way that will lead people to answer them in a particular fashion. In fact, you can frame questions in such a way that you get almost any answer you want.

However, the questioner may matter just as much as the question.

Consider this fascinating new study, based on surveys in Morocco, which found that the gender of the interviewer and how that interviewer was dressed had a big impact on how respondents answered questions about their views on social policy. …

[T]his paper asks whether and how two observable interviewer characteristics, gender and gendered religious dress (hijab), affect survey responses to gender and non-gender-related questions. [T]he study finds strong evidence of interviewer response effects for both gender-related items, as well as those related to support for democracy and personal religiosity … Interviewer gender and dress affected responses to survey questions pertaining to gender, including support for women in politics and the role of Shari’a in family law, and the effects sometimes depended on the gender of the respondent. For support for gender equality in the public sphere, both male and female respondents reported less progressive attitudes to female interviewers wearing hijab than to other interviewer groups. For support for international standards of gender equality in family law, male respondents reported more liberal views to female interviewers who do not wear hijab, while female respondents reported more liberal views to female [interviewers], irrespective of dress. (source, source)

Other data indicate that the effect occurs in the U.S. as well. This is potentially a bigger problem than the framing effect: the questions asked are usually public and can be verified by users of the survey results, whereas the characteristics of the questioner are usually not known to those users.

There’s an overview of some other effects here. More on the headscarf is here. More posts in this series are here.