Measuring Human Rights (32): Assessing Advocacy and Policy by Way of Counterfactual Thinking

Human rights measurement is ultimately about levels of respect for human rights, but it can also be useful to try to measure the impact of human rights advocacy and policy on these levels. Both advocacy and policy (the difference being that the former is non-governmental) aim at improving levels of respect for human rights. Obviously, those levels don’t depend solely on advocacy or policy, but it’s reasonable to assume that they are to some extent dependent on those types of action. It’s hard – although not impossible – to imagine that millions of people and dozens of governments and international institutions would engage in pointless activity.

The question is then: to what extent exactly? How much do advocacy and policy help? The problem in answering this question is that we won’t necessarily learn a lot by simply looking at the levels and how they evolve. Not only is there the difficulty of comparing different possible causes; a flat trend line – or even a declining trend line – may cover up how much more awful things would have been without advocacy and policy. Levels of respect may very well stay as they are or even worsen while advocacy and policy are relatively successful because the levels without advocacy and policy would have been even lower.

Of course, it’s very hard to quantify this. If there’s improvement, you can at least try to sort out the relative contribution of different causes. If things don’t improve or even worsen, then the only way to measure the effect of advocacy and policy is the use of counterfactual thinking. And that’s a problem. How bad (or good?) would things have been without advocacy and policy? We can’t redo a part of a country’s history to test what would have happened with other choices. We can speculate about the answer to “what if” questions but since we can’t experiment we’re left with a lot of uncertainty. What if Hitler had won the war? Or had been admitted to art school? Fun questions to try and answer, but the answers won’t tell us much about the real world, unfortunately. If they did, we would know what to do.

More posts in this series are here.

Measuring Human Rights (30): Distortions Caused by the Exclusion of Prisoners

I’ve already cited one example of human rights measurement gone wrong because of the exclusion of the prison inmate population: violent crime rates seem to go down in many countries, but a lot of the decrease only happens because surveys and databases exclude the crimes that take place inside of prisons. Crime may not have gone down at all; perhaps a lot of it has just been moved to the prisons.

I’ll now add a few other examples of distortions in human rights measurement caused by the exclusion of the prisoner population. The cases I’ll cite result in distortions because the exclusion of the prison population is the exclusion of a non-representative sample of the total population. For example, it’s well-known that African-Americans make up a disproportionate share of the inmate population in the U.S. Becky Pettit, a University of Washington sociologist, argues in her book “Invisible Men” that we shouldn’t take for granted some of the indicators of black progress in the U.S.:

For example, without adjusting for prisoners, the high-school completion gap between white and black men has fallen by more than 50% since 1980 … After adjusting … the gap has barely closed and has been constant since the late 1980s. (source)

We see similar results when counting, or rather recounting, voter turnout numbers, employment rates etc.
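
To make the distortion concrete, here is a minimal sketch in Python with purely hypothetical numbers (the figures are illustrative, not real survey data): a household survey that misses the incarcerated reports a higher employment rate than a count that includes them.

    # Hypothetical numbers: how excluding inmates flatters an employment rate.
    employed = 4_500_000             # employed men in the group of interest
    surveyed_population = 6_000_000  # men reached by a household survey (inmates not included)
    incarcerated = 600_000           # men in prison or jail, invisible to the survey

    rate_excluding_inmates = employed / surveyed_population
    rate_including_inmates = employed / (surveyed_population + incarcerated)

    print(f"{rate_excluding_inmates:.1%}")  # 75.0%
    print(f"{rate_including_inmates:.1%}")  # 68.2%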

It should be rather easy to include prisoners in most of these measurements – certainly compared to the homeless, illegal immigrants and citizens of dictatorships. The fact that we almost systematically exclude them is testimony to our attitude towards prisoners: they are excluded from society, and they literally don’t count.

More posts in this series are here.

Measuring Human Rights (29): When More Means Less, and Vice Versa, Ctd.

Take the example of rape measurement: better statistical and reporting methods used by the police, combined with less social stigma and other factors result in statistics showing a rising number of rapes, but this increase is due to the measurement methods and other effects, not to what happened in real life. The actual number of rapes may have gone down.

This is a general problem in human rights measurement: more often means less, and vice versa. The nature of the thing we’re trying to measure – human rights violations – means that the more there is, the more difficult it is to measure; and the more difficult, the more likely that we wrongly conclude that there is less. (See here). When levels of rights violations approach totalitarianism, people won’t report, won’t dare to speak, or won’t be able to speak. It’s not social stigma or shame that prevents them from speaking, as in the case of rape, but fear. Furthermore, totalitarian governments won’t allow monitoring, and will have managed to some extent to indoctrinate their citizens. Finally, the state of the economy won’t allow for easy transport and communication, given the correlation between economic underdevelopment and authoritarian government.

Conversely, higher levels of respect for human rights will yield statistics showing more rights violations, because a certain level of respect for human rights makes monitoring easier.

More on measuring human rights.

Measuring Poverty (15): A Common Misconception About Relative Poverty

Yesterday, I had a short email exchange with Tim Harford, in which I reacted to one of his claims in this article, more specifically the claim that the use of a relative notion of poverty in poverty measurement implies that poverty will always be with us:

Eurostat, the European Union’s statistics agency, … defines the poverty line as 60 per cent of each nation’s median income. (The median income is the income of the person in the middle of the income distribution.)

This has an unfortunate consequence: poverty is permanent. If everyone in Europe woke up tomorrow to find themselves twice as rich, European poverty rates would not budge. That is indefensible. Such “poverty” lines measure inequality, not poverty.

This argument against relative poverty is as common as it is mistaken. Here’s my email to Tim:

I read your article on poverty measurement a moment ago, and I wanted to object. You say that using a relative poverty measurement of income below 60% of median income makes poverty “permanent”. It does not. True, someone with an income of 61% of the median does not suddenly become poor because the median person receives a pay rise. But it’s also true that it’s perfectly doable – mathematically if not in reality – to raise every single poor person’s income above 60% of the median without changing the median. Poverty would only be permanent if one used 60% of the average as the threshold, but no one proposes such a foolish thing, fortunately.
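
A minimal sketch of the point, with made-up incomes: the two people below the 60%-of-median line are lifted above it, the median stays where it was, and measured poverty disappears.

    from statistics import median

    def poverty_rate(incomes, share=0.6):
        line = share * median(incomes)
        return sum(1 for income in incomes if income < line) / len(incomes)

    before = [20, 25, 100, 110, 120]  # median = 100, poverty line = 60, two people poor
    after  = [70, 75, 100, 110, 120]  # the two poor incomes raised; median unchanged

    print(poverty_rate(before))  # 0.4
    print(poverty_rate(after))   # 0.0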

In fairness to Tim, his article does list some advantages of relative poverty and he qualified his views in our email correspondence.

More posts in this series are here.

Capital Punishment (42): The Stupidity of Deterrent Statistics, Ctd.

The so-called deterrent effect is one of the main arguments in favor of capital punishment. I’ve argued many times before that the data we have don’t support the existence of this effect. Some of the data even suggest the possibility that instead of a deterrent effect, capital punishment has a brutalization effect (because it sends out the normative message that violent retaliation is the normal response to ill-treatment and that the sanctity of life is a naive moral ideal).

The following quote nicely summarizes the difficulty of proving the deterrent effect:

I would like to know how a statistical study, no matter how sophisticated, can possibly tell us the subjective motives for acts that were never taken and, moreover, how it can do so with the specificity of telling us approximately how many people did not do what they otherwise would have done under different circumstances. Where are these people? And, more importantly, how would we recognize one if we happened across him or her? (source)

Of course, people who want to disprove the deterrent effect also face this difficulty, but I assume we can agree that the burden of proof is on those who want to use the effect as an argument in favor of capital punishment. And that turns out to be a very heavy burden in this case.

Anyway, even if deterrence could be proven and even if we could establish with some certainty that every execution saves n lives – as some have argued, oblivious of the difficulties pointed out in the quote above – then we would still have good reasons to reject capital punishment.

More here.

Measuring Human Rights (27): Measuring Crime

A number of crimes are also human rights violations, so crime rates can tell us something about the degree of respect for human rights. Unfortunately, as in most cases of rights measurement, crime measurement is difficult. I won’t discuss the usual difficulties here – underreporting by victims or relatives, lack of evidence, corrupt or inefficient police departments etc. Instead, I want to mention one particularly interesting problem that is seldom mentioned but possibly fatal for crime rate statistics: most reductions in crime rates are not really reductions, especially not those that come about as a result of tougher law enforcement and higher incarceration rates. When we imprison criminals, rather than bringing crime rates down, we just move the crime from society towards the prisons:

the figures that suggest that violence has been disappearing in the United States contain a blind spot so large that to cite them uncritically, as the major papers do, is to collude in an epic con. Uncounted in the official tallies are the hundreds of thousands of crimes that take place in the country’s prison system, a vast and growing residential network whose forsaken tenants increasingly bear the brunt of America’s propensity for anger and violence.

Crime has not fallen in the United States—it’s been shifted. Just as Wall Street connived with regulators to transfer financial risk from spendthrift banks to careless home buyers, so have federal, state, and local legislatures succeeded in rerouting criminal risk away from urban centers and concentrating it in a proliferating web of hyperhells. (source, source)

And there’s no way to correct for this and adjust overall crime rate statistics, because quality statistics on crime rates inside prisons are even harder to get than statistics on “normal” crime rates – given the quasi-lawlessness of prison life.

More on prison violence here and here.

Measuring Human Rights (26): Measuring Murder

Murder should be easy to measure. Unlike many other crimes or rights violations, the evidence is clear and painstakingly recorded: there is a body, at least in most cases; police seldom fail to notice a murder; and relatives or friends of the victim rarely fail to report the crime. So even if we are not always able to find and punish murderers, we should at least know how many murders there are.

And yet, even this most obvious of crimes can be hard to measure. In poorer countries, police departments may not have the means necessary to record homicides correctly and completely. Families may be wary of reporting homicides for fear of corrupt police officers entering their homes and using the occasion to extort bribes. Civil wars make it difficult to collect any data, including crime data. During wartime, homicides may not be distinguishable from casualties of the war.

And there’s more. Police departments in violent places may be under pressure to bring down crime stats and may manipulate the data as a result: moving some dubious murder cases to categories such as “accidents”, “manslaughter”, “suicide” etc.

Homicides usually take place in cities, hence the temptation to rank cities according to homicide rates. But cities differ in the way they determine their borders: suburbs may be included or not, or partially, and this affects homicide rates since suburbs tend to be less violent. Some cities have more visitors than other cities (more commuters, tourists, business trips) and visitors are usually not counted as “population” while they may also be at risk of murder.

In addition, some ideologies may cause distortions in the data. Does abortion count as murder? Honor killings? Euthanasia and  assisted suicide? Laws and opinions about all this vary between jurisdictions and introduce biases in country comparisons.

And, finally, countries with lower murder rates may not be less violent; they may just have better emergency healthcare systems allowing them to save potential murder victims.

So, if even the most obvious of human rights violations is difficult to measure, you can guess the quality of other indicators.

Measuring Poverty (14): Measuring Income Inequality

Income inequality may or may not be the best definition of poverty, but it’s certainly one that is often used. In many European countries, you’re counted as poor when your income is below 50% or so of the median income. Maybe this is the wrong way to measure poverty, but if you use absolute measures of poverty (such as a basic income level, minimum consumption etc.) you’ll also face some problems. So it’s worthwhile to examine some of the usual methods for measuring income inequality and see how they hold up, while at the same time bracketing the discussion about poverty as either absolute deprivation or unequal distribution.

Methods for measuring income inequality

The Gini coefficient is the most widely used. It’s based on the share of a population’s total income that is cumulatively earned by the bottom x% of the population; a value of 0 expresses perfect equality, where everyone has an equal share of income, and a value of 1 expresses maximal inequality, where only one person has all the income. A low Gini coefficient therefore indicates a more equal distribution. (The complete formula is here).
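
As a rough illustration, here is a small Python sketch of the Gini coefficient, using one common discrete formula for a sorted sample (a simplification; real-world estimates add corrections for grouped or weighted data):

    def gini(incomes):
        """Gini coefficient of a list of incomes, via G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n."""
        xs = sorted(incomes)
        n, total = len(xs), sum(xs)
        weighted_sum = sum(rank * x for rank, x in enumerate(xs, start=1))
        return 2 * weighted_sum / (n * total) - (n + 1) / n

    print(gini([10, 10, 10, 10]))  # 0.0  - perfect equality
    print(gini([0, 0, 0, 100]))    # 0.75 - one person has everything (tends to 1 as n grows)
    print(gini([10, 20, 30, 40]))  # 0.25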

A disadvantage of the Gini measure is that it doesn’t capture where in the distribution the inequality occurs: is a society unequal because the top 1% has astronomically high incomes, because the poor are very poor, because there is practically no middle class, or because of some other reason?

Other measures are

  • the ratio of the incomes of the top 10% (best paid) to the bottom 10% (worst paid)
  • the proportion of a population with income less than 50% of the median income
  • a population may be split into segments, e.g. quintiles or deciles, and each segment’s income share is then compared to each other segment’s (for example, the top 10% of the population – “top” in income terms – has x % of total income)
  • some other measures are here.

These different measures can give contradictory results: two societies with the same Gini score can have different ratios of top-to-bottom, top-to-middle or middle-to-bottom incomes (see an example here). Hence, no single measure tells the whole story about inequality in a society.
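
A toy example of such a contradiction, with two hypothetical four-person societies (the top and bottom person stand in for the top and bottom decile): both have the same Gini score, yet the top-to-bottom ratio and the top earner's income share differ.

    def gini(incomes):
        xs = sorted(incomes)
        n, total = len(xs), sum(xs)
        return 2 * sum(rank * x for rank, x in enumerate(xs, 1)) / (n * total) - (n + 1) / n

    a = [1, 2, 3, 4]  # incomes spread out evenly
    b = [2, 2, 2, 6]  # a flat bottom and one high earner

    print(gini(a), gini(b))                  # 0.25 0.25 - identical Gini scores
    print(max(a) / min(a), max(b) / min(b))  # 4.0 3.0   - different top-to-bottom ratios
    print(max(a) / sum(a), max(b) / sum(b))  # 0.4 0.5   - different top income shares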

What is income?

The focus of all these measurement systems is income, but we should first decide what to count as income. Income doesn’t have to be cash or currency. A farmer in a poor country who grows his own products has non-cash income. Perhaps public services such as healthcare or education should count as income. And how about tax reductions, tax refunds, government benefits such as unemployment insurance, food stamps and various vouchers?

All those forms of non-cash or non-labor income are important when measuring income inequality because the poor profit disproportionately from those non-cash or non-labor related forms of income. Hence, including them in total income can make a large difference in income inequality numbers. (Higher income groups may have less or different tax refunds and their education may represent a smaller portion of their total income – the returns of their education may of course be higher, but those returns are typically cash based in the sense that they lead to higher labor compensation).

We should also decide whether we want to use income before or after taxation; that depends on whether we want to measure the effectiveness of redistribution or simply gross inequalities. And what about capital gains, imputed rents from home ownership, inheritance etc.? In general, how should wealth be included in income? Or shouldn’t it be?

How do we measure income?

Once we’ve solved the difficult problem of defining income, we’re still left with the practical problem of measuring it. Most cash income is captured in tax return data, but not all, and not equally well in all countries. Sometimes, you’ll need to use consumption data as a proxy for income data, or surveys about living standards. “Informal” income typically does not show up in tax data, but does in consumption data.

Another problem with measures of inequality is that they may be contaminated by notions of fairness. Some deliberately design their measurement system in such a way that inequalities look bigger than they actually are. For example, they use pre-tax inequalities because those are often larger than post-tax inequalities – a lot of tax systems are redistributive towards the poor (e.g. progressive taxation systems). Or they focus on income inequality when consumption inequality may have diminished. Others may mistakenly deduce evaluations of fairness or injustice from the simple fact of an income distribution and forget that measures of income inequality are silent about who is on which side of the divide. If person A in a two person economy has twice the income of person B, then the measurement of inequality would be absolutely the same when B switches places with A. Measures of income inequality say nothing about who deserves what, about how income has been acquired, about whether some occupations should yield higher compensation (for example because we want the right incentives), or about how income should ideally be distributed.

And then there is the opposite mistake: assuming that income inequality is always necessary and just because it’s the automatic result of the fact that people have different levels of human capital and productive abilities. This is a mistake because it ignores a number of facts: no one has ever been able to prove that some abilities or occupations deserve higher wages from a moral point of view, and a lot of inequality is the result not of different abilities or efforts but of differences in luck and connections. Hence, fairness remains a legitimate concern. Contrary to the “left-wing mistake”, the “right-wing mistake” will not distort the measurement of inequality: if you believe inequality is not a problem you hardly have a reason for measuring it, let alone distort the measurement.

What I want to stress is how difficult it is to measure income inequality and how many mistakes we can make. This doesn’t mean that the numbers are rubbish. We should just be careful when drawing sweeping conclusions, that’s all.

Something more about the causes of income inequality, rather than the measurement of it, is here.

Measuring Democracy (8): A Multidimensional Measurement

Any attempt to measure the degree of democracy in a country should take into account the fact that democracy is something multidimensional. It won’t suffice to measure elections, not even the different aspects of elections such as frequency, participation, fairness, transparency etc. It takes more than fair and inclusive elections to have a democracy. Of course, the theoretical ideal of democracy is a controversial notion, so we won’t be able to agree on all the necessary dimensions or elements of a true democracy. Still, you can’t escape this problem if you want to build a measurement system: measuring something means deciding which parts of it are worth measuring.

You would also do best to take a maximalist approach: leaving out too many characteristics would allow many or even all countries to qualify as fully democratic and would make it impossible to differentiate between the different levels or the different quality of democracy across countries. A measurement system is useful precisely because it offers distinctions and detailed rankings and because it makes it possible to determine the distance to an ideal, whatever the nature of the ideal. Obviously, a maximalist approach is by definition more controversial than a minimal one. Everyone agrees that you can’t have a democracy without elections (or, better, without voting more generally). Whether strong free speech rights and an independent judiciary are necessary is less clear. And the same is true for other potential attributes of democracy.

Once you’ve determined what you believe are the necessary attributes, you can start to measure the extent to which they are present in different countries. Hence, your measurement will look like a set of sliding scales: all the markers on the right side in the case of a non-existent ideal democracy, and all the markers on the left side in the unfortunately very real case of a total absence of democracy.

(The aggregation of these scales into a total country score is another matter that I’ve discussed elsewhere).

Some candidates of attributes are:

  • Does a country include more or less people in the right to have a democratic say? How high is the voting age? Are criminals excluded from the vote, even after they have served their sentence? Are immigrants without citizenship excluded? Are there conditions attached to the right to vote (such as property, education, gender etc.)?
  • Does a country include more or less topics in the right to a democratic say? Are voters not allowed to have a say about the affairs of the military, or about policies that have an impact on the rights of minorities? Does the judiciary have a right to judicial review of democratically approved laws?
  • Does a country include more or less positions in the right to a democratic say? Can voters elect the president, judges, prosecutors, mayors, etc., or only parliamentarians? Can they elect local office holders? Does a country have a federalist structure with important powers at the local or state level?
  • Does a country impose qualified majorities for certain topics or positions? Do voters have to approve certain measures with a two-thirds supermajority?
  • Does a country provide more or less ways to express a democratic say? Can voters only elect officials or can they also vote on issues in referenda?
  • Does a country impose more or less restrictions on the formation of a democratic say? Are free speech rights and assembly and association rights respected?
  • Does a country accept more or less imbalances of power in the formation of a democratic say? Are there campaign financing rules?
  • Does a country show more or less respect for the expression of a democratic say? How much corruption is there? Is the judiciary independent?

A “more” score on any of these attributes will push up the total “democracy score” for a country. Or so it seems, were it not that all these complications in the measurement system are still not enough. We need to go further and add additional dimensions. For example, one can argue that we shouldn’t define democracy solely on the basis of the right to a democratic say, not even if we render this right as complex as we did above. A democracy should, ideally, also be a stable form of government, and allowing people to decide about the fundamental rights of minorities is an expression of the right to a democratic say, but it is not in the long-term interest of democracy. Those minorities will ultimately rebel against this tyranny of the majority and cause havoc for everyone.

More posts in this series are here.

Measuring Human Rights (24): Measuring Racism, Ctd.

Measuring racism is a problem, as I’ve argued before. Asking people if they’re racist won’t work because they don’t answer this question correctly, and understandably so. This is due to the social desirability bias. Surveys may minimize this bias if they approach the subject indirectly. For example, rather than simply asking people if they are racist or if they believe blacks are inferior, surveys could ask some of the following questions:

  • Do you believe God has created the races separately?
  • What do you believe are the reasons for higher incarceration rates/lower IQ scores/… among blacks?
  • Etc.

Still, no guarantee that bias won’t falsify the results. Maybe it’s better to dump the survey method altogether and go for something even more indirect. For example, you can measure

  • racism in employment decisions, such as numbers of callbacks received by applicants with black sounding names
  • racism in criminal justice, for example the degree to which decisions authored by black federal lower-court judges are overturned more often than those authored by similar white judges, or differences in crime rates by race of the perpetrator, or jury behavior
  • racial profiling
  • residential racial segregation
  • racist consumer behavior, e.g. reluctance to buy something from a black seller
  • the numbers of interracial marriages
  • the numbers and membership of hate groups
  • the number of hate crimes
  • etc.

A disadvantage of many of these indirect measurements is that they don’t necessarily reflect the beliefs of the whole population. You can’t just extrapolate the rates you find in these measurements. It’s not because some judges and police officers are racist that the same rate of the total population is racist. Not all people who live in predominantly white neighborhoods do so because they don’t want to live in mixed neighborhoods. Different crime rates by race can be an indicator of racist law enforcement, but can also hide other causes, such as different poverty rates by race (which can themselves be indicators of racism). Higher numbers of hate crimes or hate groups may represent a radicalization of an increasingly small minority. And so on.

Another alternative measurement system is the Implicit Association Test. This is a psychological test that measures implicit attitudes and beliefs that people are either unwilling or unable to report.

Because the IAT requires that users make a series of rapid judgments, researchers believe that IAT scores may reflect attitudes which people are unwilling to reveal publicly. (source)

Participants in an IAT are asked to rapidly decide which words are associated. For example, is “female” or “male” associated with “family” and “career” respectively? This way, you can measure the strength of association between mental constructs such as “female” or “male” on the one hand and attributes such as “family” or “career” on the other. And this allows you to detect prejudice. The same is true for racism. You can read here or here how an IAT is usually performed.

Yet another measurement system uses evidence from Google search data, such as in this example. The advantage of this system is that it avoids the social desirability bias, since Google searches are done alone and online and without prior knowledge of the fact that the search results will be used to measure racism. Hence, people searching on Google are more likely to express social taboos. In this respect, the measurement system is similar to the IAT. Another advantage of the Google method, compared to traditional surveys, is that the Google sample is very large and more or less evenly distributed across all areas of a country. This allows for some fine grained geographical breakdown of racial animus.

More specifically, the purpose of the Google method is to analyze trends in searches that include words like “nigger” or “niggers” (not “nigga” because that’s slang in some Black communities, and not necessarily a disparaging term). In order to avoid searches for the term “nigger” by people who may not be racially motivated – such as researchers (Google can’t tell the difference) – you could refine the method and analyze only searches for phrases like “why are niggers lazy”, “Obama+nigger”, “niggers/blacks+apes” etc. If you find that those searches are more common in some locations than others, or that they become more common in some locations, then you can try to correlate those findings with other, existing indicators of racism such as those cited above, or with historic indicators such as prevalence of slavery or lynchings.
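
The final correlation step could look something like the sketch below (all numbers are made up for illustration; in practice you would use per-state search frequencies and a measured indicator such as hate-crime rates):

    import numpy as np

    # Hypothetical per-region rates: racially charged searches and a proxy indicator of racism.
    search_rate = np.array([1.2, 0.4, 2.3, 0.9, 1.8, 0.3])
    hate_crime_rate = np.array([3.1, 1.0, 4.8, 2.2, 3.9, 0.8])

    correlation = np.corrcoef(search_rate, hate_crime_rate)[0, 1]
    print(f"Pearson correlation: {correlation:.2f}")  # high for these invented numbers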

More posts in this series are here.

Measuring Democracy (7): Some Technical Difficulties

Suppose you want to construct a democracy index measuring the level or lack of democracy in different countries in the world. The normal thing to do is to select some supposedly essential characteristics or attributes of democracy and try to measure the level or presence of those. So, for example, you may select free speech, elections, judicial independence and a number of other characteristics. Some of those are perhaps already measured and you can simply take those measurements. For others, you may have to set up your own measurement (e.g. a survey, analysis of newspapers or official documents etc.), or use a proxy.

In any case, you’ll end up with different datasets on different attributes of democracy, and you’ll have to bring those datasets together somehow in order to make your overall index, your single country-level democracy score. The problem is that the datasets contain different kinds of scales which cannot as such be aggregated into a global index. The scales and the values in the scales have to be normalized, i.e. translated into a common metric.

normalized value = raw value/maximum raw value

First, however, you have to rescale some existing scales so that they start at 0 – in other words, so that the lowest score is 0 (instead of starting at 1 for example, or at -10 such as the Polity IV scale). This way, all scales will have a normalized range from 0 to 1; 0 being the negation or total absence of the attribute; 1 being the complete and perfect protection or presence of the attribute.
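
A minimal sketch of this rescaling-plus-normalization step (the function is illustrative; the Polity IV example uses the -10 to +10 range mentioned above):

    def normalize(raw, scale_min, scale_max):
        """Shift a raw score so the scale starts at 0, then divide by the new maximum."""
        return (raw - scale_min) / (scale_max - scale_min)

    print(normalize(6, -10, 10))    # 0.8 - a Polity IV score of +6 on the -10..+10 scale
    print(normalize(-10, -10, 10))  # 0.0 - the bottom of the scale
    print(normalize(3, 1, 7))       # 0.333... - a score of 3 on a scale running from 1 to 7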

What about weighting the different attributes? Some may be more important for a democracy than others. However, introducing weights in this way inevitably means introducing value judgments. While value judgments can’t be avoided (they’ll pop up at the moment of the selection of the attributes as well, for example), they can be minimized. If you choose not to use weighting, you consider all attributes to be equally important, which is a view that can be defended given the often interdependent nature of the attributes of democracy (an independent judiciary for example will likely not survive without a free press).

Once the different data sources are translated into normalized scales and, if necessary, weighted appropriately, they have to be aggregated in order to calculate the global index of quality of democracy. One possible aggregation rule would be this:

global index = source 1 * source 2 * ... * source n.

So a simple multiplication. But that would mean that a value of 0 for one attribute results in labeling the country as a whole as having 0 democratic quality. This is counter-intuitive, even with the assumption of equal importance of all attributes. Hence, a better aggregation rule is the geometric or arithmetic mean (or perhaps the median).

However, there’s also a problem with averages: low scores on one attribute can be compensated by high scores on another. So very different democracies can have the same score. Also, within one country, a high score on suffrage rights but 0 on actual participation would give a medium democracy score, whereas in reality we wouldn’t want to call this country democratic at all (the score should be 0 or close to 0). Perhaps we can’t avoid weights after all.
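
A small sketch of the aggregation rules for one hypothetical country, scored on four attributes already normalized to the 0-1 range, shows both problems: multiplication is wiped out by a single zero (and, strictly speaking, so is the geometric mean), while the arithmetic mean lets the strong attributes mask that zero.

    from statistics import mean

    scores = [0.9, 0.8, 0.7, 0.0]  # hypothetical: strong on three attributes, zero on participation

    product = 1.0
    for s in scores:
        product *= s
    geometric_mean = product ** (1 / len(scores))

    print(product)                  # 0.0 - one zero attribute zeroes the whole index
    print(geometric_mean)           # 0.0 - the geometric mean inherits the same problem
    print(round(mean(scores), 2))   # 0.6 - the arithmetic mean yields a "medium" democracy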

More posts in this series are here.

Lies, Damned Lies, and Statistics (37): When Surveyed, People Express Opinions They Don’t Hold

It’s been a while since the last post in this series, so here’s a recap of its purpose. This blog promotes the quantitative approach to human rights: we need to complement the traditional approaches – anecdotal, journalistic, legal, judicial etc. – with one that focuses on data, country rankings, international comparisons, catastrophe measurement, indexes etc.

Because this statistical approach is important, it’s also important to engage with measurement problems, and there are quite a few in the case of human rights. After all, you can’t measure respect for human rights like you can measure the weight or size of an object. There are huge obstacles to overcome in human rights measurement. On top of the measurement difficulties that are specific to the area of human rights, this area suffers from some of the general problems in statistics. Hence, there’s a blog series here about problems and abuse in statistics in general.

Take for example polling or surveying. A lot, but not all, information on human rights violations comes from surveys and opinion polls, and it’s therefore of the utmost importance to describe what can go wrong when designing, implementing and using surveys and polls. (Previous posts about problems in polling and surveying are here, here, here, here and here).

One interesting problem is the following:

Simply because the surveyor is asking the question, respondents believe that they should have an opinion about it. For example, researchers have shown that large minorities would respond to questions about obscure or even fictitious issues, such as providing opinions on countries that don’t exist. (source, source)

Of course, when people express opinions they don’t have, we risk drawing the wrong conclusions from surveys. We also risk that a future survey asking the same questions comes up with totally different results. Confusion guaranteed. After all, if we make up our opinions when someone asks us, and those aren’t really our opinions but rather unreflected reactions we give because of a sense of obligation, it’s unlikely that we will express the same opinion in the future.

Another reason for this effect is probably our reluctance to come across as ignorant: rather than selecting the “I don’t know/no opinion” answer, we just pick one of the other possible answers. Again a cause of distortions.

Measuring Human Rights (23): When “Worse” Doesn’t Necessarily Mean “Worse”, Ctd.

Just because nobody complains does not mean all parachutes are perfect. – Benny Hill

A nice illustration of this piece of wisdom:

Using state-level variation in the timing of political reforms, we find that an increase in female representation in local government induces a large and significant rise in documented crimes against women in India. Our evidence suggests that this increase is good news, driven primarily by greater reporting rather than greater incidence of such crimes. (source)

The cited “increase in female representation in local government” resulted from a constitutional amendment requiring Indian states to have women in one-third of local government council positions.

Since then, documented crimes against women have risen by 44 percent, rapes per capita by 23 percent, and kidnapping of women by 13 percent. (source)

This uptick is probably not retaliatory – male “revenge” for female empowerment – but rather the result of the fact that more women in office has led to more crime reporting. Worse is therefore not worse. A timely reminder of the difficulties measuring human rights violations. Measurements often depend on reporting, and reporting can be influenced, for good and for bad. Also, a good lesson about the danger of taking figures at face value.

Similar cases are here and here. More posts in this series are here.

Measuring Human Rights (22): When Can You Call Something a “Famine”?

With yet another famine in the Horn of Africa, perhaps it’s a good time for a few words about famine measurement.

People have a right to adequate nourishment and to be free from chronic hunger (see article 25 of the Universal Declaration). Starvation is an extreme form of violation of this right (and is obviously also a violation of the right to life). So we obviously want to know the existence and extent of cases of starvation. There are individual cases of starvation – an elderly person who has lost her mobility and social network may starve abandoned in her flat – but most cases involve large scale famines. Let’s focus on the latter.

The problem is that death by famine or starvation is difficult to identify. People suffering from extreme malnutrition often don’t die of hunger but of diseases provoked by malnutrition, such as pneumonia or diarrhea. Since those are diseases that can have other causes besides malnutrition, it’s often difficult to count the number of people who have died from malnutrition. Their body weight may tell us something, but you can’t go about weighing corpses on a large scale.

Hence it’s difficult to determine whether or not a famine has occurred or is occurring. When does widespread suffering of hunger become a famine? Not every food crisis or widespread occurrence of malnutrition leads to famine-type starvation. A famine is obviously characterized by mortality caused by malnutrition. So we must look at mortality rates, but given the difficulty of establishing whether deaths are caused by malnutrition or other factors, how do we decide that a certain mortality rate is caused by malnutrition and is therefore the symptom of a famine? It’s difficult.

And yet, it’s common to find newspaper reports about “an outbreak of famine” in this or that part of the world. Ideally, we only want to declare a famine when a famine is actually occurring or about to occur. False alarms are not only silly but they create indifference. Fortunately, people seem to have overcome some of the difficulties and have agreed on a non-arbitrary way to determine that there is a famine going on:

  • when overall mortality rates in a region are extremely high, or high compared to the baseline – which may itself be high already, perhaps because of a war (a mortality rate of at least two people per 10,000 per day is usually considered part of the evidence of famine conditions)
  • when this is combined with survey indicators about low food availability and malnutrition (a rate of malnutrition – ratio of weight to height – among children age six months to five years above an average of 30% is the usual measure here)
  • when there is anecdotal evidence (perhaps also from surveys)
  • and when there are proxy measures such as below average rainfall

then you can build a useful measurement and a more or less scientific way of ascertaining that a food crisis has passed the famine threshold.
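
A toy sketch of such a threshold test, treating the indicators above as jointly required (a simplification of how agencies actually weigh the evidence; the function and flags are illustrative, while the two numeric cut-offs are the ones cited above):

    def famine_conditions(deaths_per_10000_per_day, child_malnutrition_rate,
                          anecdotal_evidence, rainfall_below_average):
        """True when the indicators jointly cross the famine threshold."""
        return (deaths_per_10000_per_day >= 2.0
                and child_malnutrition_rate > 0.30
                and anecdotal_evidence
                and rainfall_below_average)

    print(famine_conditions(2.4, 0.38, True, True))  # True  - famine threshold passed
    print(famine_conditions(1.1, 0.38, True, True))  # False - severe crisis, but mortality below the cut-off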

None of this should be understood as implying that food crises which don’t reach the famine threshold are unimportant and don’t deserve attention or assistance. It only means that it’s a good thing to distinguish real famines from lesser crises and to avoid crying wolf.

One problem with the measurement system presented above is that it’s no help in preventing a famine. It’s difficult to turn it into a probability index rather than a threshold index. It tells you when a famine has occurred or is ongoing, not when there’s a risk of famine. When mortality rates are high, you’re already late, perhaps too late.

Measuring Human Rights (20): What is More Important, the Number or Percentage of People Suffering Human Rights Violations?

Take just one human right, the right not to suffer poverty: if we want to measure progress for this human right, we get something like the following fact:

[N]ever in the world have there been so many paupers as in the present times. But the reason of this is that there have never been so many people around. Indeed never in the history of the world has the percentage of poor people been so low. (source)

So, is this good news or bad news? If it’s more important to reduce the share of the world population suffering a particular type of rights violation, then this is good news. On the other hand, there are now more people – in absolute, not in relative numbers – suffering from poverty. If we take individuals and the distinctions between persons seriously, we should conclude that this is bad news and we’re doing worse than before.
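
The tension is easy to show with invented round numbers: the share of the population in poverty can fall while the absolute number of poor people rises, simply because the population grows faster than poverty declines.

    population_then, poverty_share_then = 3_000_000_000, 0.50
    population_now,  poverty_share_now  = 7_000_000_000, 0.25

    poor_then = population_then * poverty_share_then  # 1.5 billion
    poor_now  = population_now * poverty_share_now    # 1.75 billion

    print(poverty_share_now < poverty_share_then)  # True - good news in relative terms
    print(poor_now > poor_then)                    # True - bad news in absolute terms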

Thomas Pogge has argued for the latter view. Take another example: killing a given number of people doesn’t become less troubling if the world’s population increases. If we would discover that the real number of the world’s population at the time of the Holocaust was twice as large as previously assumed, that wouldn’t diminish the importance of the Holocaust. What matters is the absolute number of people suffering.

On the other hand, if we see that policies and interventions lead to a significant lowering of the proportion of people in poverty – or suffering from any other type of rights violation – between times t and t+n, then we would welcome that, and we would certainly want to know it. The fact that the denominator – total world population – has increased in the mean time, is probably something that has happened independently of those policies. In the specific case of poverty, a growing population can even make a decrease in relative numbers of people suffering from poverty all the more admirable. After all, many still believe (erroneously) in the Malthusian trap theory, which states that population growth necessarily leads to increases in poverty in absolute numbers.

More posts in this series are here.

Measuring Human Rights (18): Guerrilla Polling in Dictatorships

Measuring respect for human rights is most important in societies where respect is a rare commodity. The problem is that it’s not only most important in such societies, but also most difficult. You need a certain level of freedom to measure respect for human rights. And regimes that violate rights also have the means to cover up those violations. I’ve called that the catch 22 of rights measurement. One problem is public opinion: a lot of human rights measurement depends on public opinion polls, but such polls are notoriously unreliable in repressive regimes, for obvious reasons: the public in those countries is either misinformed, indoctrinated or afraid to speak out, or all of the above.

Hence, good quality human rights measurement requires some creative polling. Political scientists Angela Hawken and Matt Leighty have come up with a new strategy, called guerrilla polling. Here’s an example:

Kim Eun Ho is a former police officer from North Korea who defected to the South in 2008. … With the aid of a friend and a smuggled cell phone, he is circumventing North Korea’s leadership to solicit opinions from its citizens.

Kim conducts a nightly public-opinion poll of North Korean residents, the first poll of its kind and illegal in North Korea. Here’s how it works: Kim calls his friend in North Korea on a smuggled cell phone. The friend then uses a North Korean land line to call a subject and presses the cell phone against the handset of the landline phone, allowing Kim to conduct a brief interview.

If the interviewee were discovered by the police, they would almost certainly be punished — perhaps severely. To circumvent the North Korean police, Kim has tailored his questions so that they take about 90 seconds to answer. He tapped phones himself as a North Korean police officer, and he estimates that it takes about two to three minutes for the police to trace a call. (source)

More posts about human rights measurement are here.

Measuring Human Rights (17): Human Rights and Progress

We’re all aware of the horrors of recent history. The 20th century doesn’t get a good press. And yet, most of us still think that humanity is, on average, much better off today  than it was some centuries or millennia ago. The holocaust, Rwanda, Hiroshima, AIDS, terrorism etc. don’t seem to have discouraged the idea of human progress in popular imagination. Those have been disasters of biblical proportions, and yet they are seen as temporary lapses, regrettable but exceptional incidents that did not jeopardize the overall positive evolution of mankind. Some go even further and call these events instances of “progressive violence”: disasters so awful that they bring about progress. Hitler was necessary in order to finally make Germany democratic. The Holocaust was necessary to give the Jews their homeland and the world the Universal Declaration. Evil has to become so extreme that it finally convinces humanity that evil should be abolished.

While that is obviously ludicrous, it’s true that there has been progress:

  • we did practically abolish slavery
  • torture seems to be much less common and much more widely condemned, despite the recent uptick
  • poverty is on the retreat
  • equality has come within reach for non-whites, women and minorities of different kinds
  • there’s a real reduction in violence over the centuries
  • war is much less common and much less bloody
  • more and more countries are democracies and freedom is much more widespread
  • there’s more free speech because censorship is much more difficult now thanks to the internet
  • health and labor conditions have improved for large segments of humanity, resulting in booming life expectancy
  • etc.

So, for a number of human rights, things seem to be progressing quite a lot. Of course, there are some areas of regress: the war on terror, gendercide, islamism etc. Still, those things don’t seem to be weighty enough to discourage the idea of progress, which is still quite popular. On the other hand, some human rights violations were caused by elements of human progress. The Holocaust, for example, would have been unimaginable outside of our modern industrial society. Hiroshima and Mutually Assured Destruction are other examples. Both nazism and communism are “progressive” philosophies in the sense that they believe that they are working for a better society.

Whatever the philosophical merits of the general idea of progress, progress in the field of respect for human rights boils down to a problem of measurement. How do we measure the level of respect for the whole set of human rights? It’s difficult enough to measure respect for the present time, let alone for previous periods in human history for which data are incomplete or even totally absent. Hence, general talk about progress in the field of human rights is probably impossible. More specific measurements of parts of the system of human rights are more likely to succeed, but only for relatively recent time frames.

Measuring Democracy (6): Three Waves of Democratization According to Polity IV and Google Ngrams

Following Samuel Huntington, many political scientists believe that there have been three waves of democratization in recent history. The first wave of democracy began in the early 19th century when suffrage was gradually extended to disenfranchised groups of citizens. At its peak, however, there were only about 20 democracies in the world during this first wave. After WWI, with the rise of fascism and communism, the wave started to ebb, and this ebb lasted until the end of WWII. The second wave began following the Allied victory in World War II. This wave culminated in the 1960s with around 30 democracies in the world. The third wave started in the 1970s and really took off in the late 1980s, with the democratization of Latin America and the fall of the Berlin Wall. Today there are some 60 democracies in the world.

Maybe recent events in the Maghreb and the Middle East are the start of a fourth wave, now focused on Arab countries.

Those numbers I cited above come from one of the two major democracy indexes, namely Polity IV. Polity IV gives countries a score ranging from -10 to +10; the numbers above are of countries achieving the rather ambitious score of +8 or higher (in other interpretations of the Polity IV score, +6 is already a democracy). Freedom House, the other index, usually gives a higher number of democracies, but is only available for the most recent decades. I don’t want to discuss the relative merits of either measurement system in the current post. Let’s just assume, arguendo, that Polity IV is a good measure (Freedom House probably measures something a bit different). In the graph below, the green line represents the Polity IV score (number of countries with a score of +8 or more):

[Graph: the green line shows the number of democracies according to Polity IV (countries scoring +8 or higher); the blue and red lines show the Google Ngram frequencies of “democracy” and “democratic”.]

The three waves are clearly visible in the green line. Although some have expressed doubts about the quality of Huntington’s work and the reality of the three waves (see here for instance), there does seem to be at least some truth in the metaphor.

I’ve also included in the graph above the results of a search in Google’s Ngram tool. I searched for “democracy” (blue line) and “democratic” (red line) (democratic without a capital D because I don’t want results including mentions of the Democratic Party). As you may know, this tool allows you to calculate the frequency of keywords in the millions of books available in Google’s book collection. Such frequencies can be thought of as approximations of the general use and popularity of a word at a certain time. One can assume that when there’s a wave of democratization there’s also an uptick in the frequency of the use of words such as “democracy”.

I find it interesting that both the first and the third wave of democratization are reflected in a rising popularity of the words “democracy” and “democratic”, but not the second wave. When the number of democracies was at its lowest point in the 30s and 40s, talk about democracy was most common, more common even than today. And the interest in democracy decreased steadily from the 50s until the 80s, while the number of democracies rose during those decades.

More posts in this series here.

Measuring Human Rights (15): Measuring Segregation Using the Dissimilarity Index

If people tend to live, work, eat or go to school together with other members of their group – race, gender etc. – then we shouldn’t automatically assume that this is caused by discrimination, forced separation, restrictions on movement or choice of residence, or other kinds of human rights violations. It can be their free choice. However, if it’s not, then we usually call it segregation and we believe it’s a moral wrong that should be corrected. People have a right to live where they want, go to school where they want, and move freely about (with some restrictions necessary to protect the property rights and the freedom of association of others). If they are prohibited from doing so, either by law (e.g. Jim Crow) or by social pressure (e.g. discrimination by landlords or employers), then government policy and legislation should step in in order to better protect people’s rights. Forced desegregation is then an option, and this can take various forms, such as anti-discrimination legislation in employment and rent, forced integration of schools, busing, zoning laws, subsidized housing etc.

There’s also some room for intervention when segregation is not the result of conscious, unconscious, legal or social discrimination. For example, poor people tend to be segregated in poor districts, not because other people make it impossible for them to live elsewhere but because their poverty condemns them to certain residential areas. The same is true for schooling. In order to avoid poverty traps or membership poverty, it’s better to do something about that as well.

In all such cases, the solution should not necessarily be found in physical desegregation, i.e. forcibly moving people about. Perhaps the underlying causes of segregation, rather than segregation itself, should be tackled. For example, rather than moving poor children to better schools or poor families to better, subsidized housing, perhaps we should focus on their poverty directly.

However, before deciding what to do about segregation, we have to know its extent. Is it a big problem, or a minor one? How does it evolve? Is it getting better? How segregated are residential areas, schools, workplaces etc.? And to what extent is this segregation involuntary? The latter question is a hard one, but the others can be answered. There are several methods for measuring different kinds of segregation. The most popular measure of residential segregation is undoubtedly the so-called index of dissimilarity. If you have a city, for example, that is divided into N districts (or sections, census tracts or whatever), the dissimilarity index measures the percentage of a group’s population that would have to change districts for each district to have the same percentage of that group as the whole city.

The dissimilarity index is not perfect, mainly because it depends on the sometimes arbitrary way in which cities are divided into districts or sections. This means that modifying city partitions can influence levels of “segregation”, which is not something we want. Take this extreme example: you can show the same city twice, with two different partitions, situation A and situation B. No one has moved residency between situations A and B, but the district boundaries have been altered radically. In situation A, with the districts drawn in a certain way, there is no segregation (dissimilarity index of 0). But in situation B, with the districts drawn differently, there is complete segregation (index = 1), although no one has physically moved. That’s why other, complementary measures are probably necessary for correct information about levels of segregation. Some of those measures are proposed here and here.
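
Here is a minimal sketch of both the index itself and the partition problem: the same eight households (a 2x4 grid, one group per row, a layout that is of course hypothetical) are districted once by column and once by row, and the measured segregation jumps from 0 to 1 without anyone moving.

    def dissimilarity(districts):
        """districts: list of (group_a_count, group_b_count) tuples, one per district."""
        total_a = sum(a for a, _ in districts)
        total_b = sum(b for _, b in districts)
        return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in districts)

    # Partition A: four "column" districts, each with one household from each group.
    columns = [(1, 1), (1, 1), (1, 1), (1, 1)]
    # Partition B: two "row" districts, one entirely group A, one entirely group B.
    rows = [(4, 0), (0, 4)]

    print(dissimilarity(columns))  # 0.0 - no measured segregation
    print(dissimilarity(rows))     # 1.0 - complete measured segregation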

Measuring Human Rights (14): Numbers of Illegal Immigrants

Calculating a reliable number for a segment of the population that generally wants to hide from officials is very difficult, but it’s politically very important to know more or less how many illegal immigrants there are, and whether their number is increasing or decreasing. There’s a whole lot of populist rhetoric floating around, especially regarding jobs and crime, and passions are often inflamed. Knowing how many illegal immigrants there are – more or less – allows us to quantify the real effects on employment and crime, and to deflate some of the rhetoric.

Immigration is a human rights issue in several respects. Immigration is often a way for people to escape human rights violations (such as poverty or persecution). And upon arrival, immigrants – especially illegal immigrants – often face other human rights violations (invasion of privacy, searches, labor exploitation etc.). The native population may also fear – rightly or wrongly – that the presence of large groups of immigrants will lower their standard of living or threaten their physical security. Illegal immigrants especially are often accused of pulling down wages and labor conditions and of creating native unemployment. If we want to disprove such accusations, we need data on the numbers of immigrants.

So how do we count the number of illegal immigrants? Obviously there’s nothing in census data. The Census Bureau doesn’t ask people about their immigration status, in part because such questions may drive down overall response rates. Maybe in some cases the census data of other countries can help. Other countries may ask their residents how many family members have gone abroad to find a job.

Another possible source is the number of births included in hospital data. If you assume a certain number of births per resident, and compare that to the total number of births, you may be able to deduce the number of births among illegal immigrants (disparagingly called “anchor babies”), which in turn may give you an idea about the total number of illegal immigrants.
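
A back-of-the-envelope sketch of that deduction, with every number invented for illustration (it also assumes, heroically, that illegal immigrants have the same birth rate as legal residents):

    total_births = 65_000        # births recorded in hospital data for a region
    legal_residents = 4_000_000  # known resident population
    births_per_resident = 0.015  # assumed annual birth rate among legal residents

    expected_births = legal_residents * births_per_resident  # 60,000
    excess_births = total_births - expected_births           # 5,000, attributed to illegal immigrants

    estimated_illegal_immigrants = excess_births / births_per_resident
    print(round(estimated_illegal_immigrants))  # about 333,000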

Fluctuations in the amounts of remittances – money sent back home by immigrants – may also indicate trends in illegal immigration, although remittances are of course sent by both legal and illegal immigrants. Furthermore, a drop in remittances doesn’t necessarily mean that immigrants are leaving. It might just be a temporary drop following an economic recession, and immigrants may decide to sweat it out (possibly supported by reverse remittances for the duration of the recession). Conversely, an increase in remittances may simply reflect technological improvements in international payment systems.

Perhaps a better indicator is the number of apprehensions by border-patrol units. However, fluctuations in these numbers may not be due to fluctuations in immigration. Better or worse performance by border-patrol officers or tighter border security may be the real reasons.

So, it’s really not easy to count illegal immigrants, and that means that all rhetoric about illegal immigration – both positive and negative – should be taken with a grain of salt.

More posts in this series are here.

Measuring Human Rights (13): When More Means Less and Vice Versa

Human rights violations can make it difficult to measure human rights violations, and can distort international comparisons of levels of respect for human rights. Country A, which is generally open and accessible and on average respects basic rights such as speech, movement and the press fairly well, may be more in the spotlight of human rights groups than country B, which is borderline totalitarian. And not just more in the spotlight: attempts to quantify or measure respect for human rights may in fact yield a score that is worse for A than for B, or at least a score that isn’t much better. The reason is of course the openness of A:

  • Human rights groups, researchers and statisticians can move and speak relatively freely in A.
  • The citizens of A aren’t scared shitless by their government and will speak to outsiders.
  • Country A may even have fostered a culture of public discourse, to some extent. Perhaps its citizens are also better educated and better able to analyze political conditions.
  • As Tocqueville has famously argued, the more a society liberates itself from inequalities, the harder it becomes to bear the remaining inequalities. Conversely, people in country B may not know better or may have adapted their ambitions to the rule of oppression. So, citizens of A may have better access to human rights groups to voice their complaints, aren’t afraid to do so, can do so because they are relatively well educated, and will do so because their circumstances seem more outrageous to them even if they really aren’t. Another reason to overestimate rights violations in A and underestimate them in B.
  • The government administration of A may also be more developed, which often means better data on living conditions. And better data allow for better human rights measurement. Data in country B may be secret or non-existent.

I called all this the Catch-22 of human rights measurement: in order to measure whether countries respect human rights, you already need respect for human rights. Investigators or monitors must have some freedom to monitor, to engage in fact-finding, to enter countries and move around, to investigate “in situ”, to denounce etc., and victims should have the freedom to speak out and to organize themselves in pressure groups. So we assume what we want to establish. (A side effect of this is that authoritarian leaders themselves may be unaware of the extent of suffering among their citizens.)

You can see the same problem in the common complaints that countries such as the U.S. and Israel get a raw deal from human rights groups:

[W]hy would the watchdogs neglect authoritarians? We asked both Human Rights Watch and Amnesty, and received similar replies. In some cases, staffers said, access to human rights victims in authoritarian countries was impossible, since the country’s borders were sealed or the repression was too harsh (think North Korea or Uzbekistan). In other instances, neglected countries were simply too small, poor, or unnewsworthy to inspire much media interest. With few journalists urgently demanding information about Niger, it made little sense to invest substantial reporting and advocacy resources there. … The watchdogs can and do seek to stimulate demand for information on the forgotten crises, but this is an expensive and high risk endeavor. (source)

So there may also be a problem with supply and demand in the media: human rights groups want to influence public opinion, but can only do so with the help of the media. If the media neglect certain countries or problems because they are deemed “unnewsworthy”, then human rights groups have little incentive to monitor those countries or problems. They know that whatever they report will fall on deaf ears anyway. So they focus instead on the issues and countries that are easier to channel through the media.

Both the Catch-22 problem and the problems caused by media supply and demand can be empirically tested by comparing the intensity of attention given by human rights monitoring organizations to certain countries or problems with the intensity of human rights violations there (the latter data are assumed to be available, which is a big assumption, but one could use very general measures such as these). It seems that both effects are present, but only to a limited extent:

[W]e subjected the 1986-2000 Amnesty [International] data to a barrage of statistical tests. (Since Human Rights Watch’s early archival procedures seemed spotty, we did not include their data in our models.) Amnesty’s coverage, we found, was driven by multiple factors, but contrary to the dark rumors swirling through the blogosphere, we discovered no master variable at work. Most importantly, we found that the level of actual violations mattered. Statistically speaking, Amnesty reported more heavily on countries with greater levels of abuse. Size also mattered, but not as expected. Although population didn’t impact reporting much, bigger economies did receive more coverage, either because they carried more weight in global politics and economic affairs, or because their abundant social infrastructure produced more accounts of abuse. Finally, we found that countries already covered by the media also received more Amnesty attention. (source)
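
For readers who want to see what such a test looks like in practice, here is a rough sketch, not the authors’ actual model, of a regression of coverage on abuse levels plus the controls mentioned in the quote. All data below are synthetic; the code only shows the mechanics, not any real finding:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 150                                    # hypothetical country-years
abuse = rng.uniform(0, 10, n)              # some abuse-intensity score
log_gdp = rng.normal(24, 2, n)             # log of GDP (economy size)
media = rng.poisson(5, n)                  # prior press mentions

# Synthetic coverage counts built from the three predictors plus noise.
coverage = 2 * abuse + 0.5 * log_gdp + 0.8 * media + rng.normal(0, 3, n)

# Ordinary least squares: does coverage track abuse once economy size and
# prior media attention are controlled for?
X = np.column_stack([np.ones(n), abuse, log_gdp, media])
coefs, *_ = np.linalg.lstsq(X, coverage, rcond=None)
print(dict(zip(["intercept", "abuse", "log_gdp", "media"], coefs.round(2))))
```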

More posts in this series are here.

Measuring Human Rights (12): Measuring Public Opinion on Torture

Measuring the number and gravity of cases of actual torture is extremely difficult, for obvious reasons. It takes place in secret, and the people subjected to torture often remain in prison long afterwards, or don’t survive it. Either way, they can’t tell us.

That’s why people try to find other ways to measure torture. Asking the public when and under which circumstances they think torture is acceptable may give an approximation of the likelihood of torture, at least as long as we assume that in democratic countries governments will only engage in torture if there’s some level of public support for it. This approach won’t work in dictatorships, obviously, since public opinion in a dictatorship is often completely irrelevant.

However, measuring public opinion on torture has proven to be very difficult and misleading:

Many journalists and politicians believe that during the Bush administration, a majority of Americans supported torture if they were assured that it would prevent a terrorist attack. … But this view was a misperception … we show here that a majority of Americans were opposed to torture throughout the Bush presidency…even when respondents were asked about an imminent terrorist attack, even when enhanced interrogation techniques were not called torture, and even when Americans were assured that torture would work to get crucial information. Opposition to torture remained stable and consistent during the entire Bush presidency.

Gronke et al. attribute confusion of beliefs [among many journalists] to the so-called false consensus effect studied by cognitive psychologists, in which people tend to assume that others agree with them. For example: The 30% who say that torture can “sometimes” be justified believe that 62% of Americans do as well. (source)

Lies, Damned Lies, and Statistics (31): Common Problems in Opinion Polls

Opinion polls or surveys are very useful tools in human rights measurement. We can use them to measure public opinion on certain human rights violations, such as torture or gender discrimination. High levels of public approval of such rights violations may make them more common and more difficult to stop. And surveys can measure what governments don’t want to measure. Since we can’t trust oppressive governments to give accurate data on their own human rights record, surveys may fill in the blanks. Although even that won’t work if the government is so utterly totalitarian that it doesn’t allow private or international polling of its citizens, or if it has scared its citizens to such an extent that they won’t participate honestly in anonymous surveys.

But apart from physical access and respondent honesty in the most dictatorial regimes, polling in general is vulnerable to mistakes and fraud (fraud being a conscious mistake). Here’s an overview of the issues that can mess up public opinion surveys, inadvertently or not.

Wording effect

There’s the well-known problem of question wording, which I’ve discussed in detail before. Pollsters should avoid leading questions, questions that are put in such a way that they pressure people to give a certain answer, questions that are confusing or easily misinterpreted, wordy questions, questions using jargon, abbreviations or difficult terms, double or triple questions etc. Also quite common are “silly questions”, questions that don’t have meaningful or clear answers: for example, “is the Catholic Church a force for good in the world?” What on earth can you answer to that? It depends on which elements of the church you’re talking about, and which circumstances, country or even historical period you’re asking about. The answer is most likely “yes and no”, and hence useless.

The importance of wording is illustrated by the often substantial effects of small modifications in survey questions. Even the replacement of a single word by another, related word, can radically change survey results.

It’s often claimed that biased poll questions corrupt the average survey responses, but that the overall results can still be used to learn about time trends and differences between groups. As long as you make the same mistake consistently, you may still find something useful. That’s true, but it’s no reason not to take care with wording: the same trends and differences can be seen in survey results produced with correctly worded questions.
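
A toy illustration of that claim: a question bias that shifts every wave of a survey by the same amount corrupts the reported level but leaves the trend intact. All the numbers are invented:

```python
true_support = [40, 42, 45, 47]      # true % support over four survey waves
bias = -8                            # a leading question depresses every wave equally
measured = [x + bias for x in true_support]

true_trend = true_support[-1] - true_support[0]    # +7 points
measured_trend = measured[-1] - measured[0]        # still +7 points
print(measured, true_trend, measured_trend)
```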

Order effect or contamination effect

Answers to questions depend on the order they’re asked in, and especially on the questions that preceded. Here’s an example:

Fox News yesterday came out with a poll that suggested that just 33 percent of registered voters favor the Democrats’ health care reform package, versus 55 percent opposed. … The Fox News numbers on health care, however, have consistently been worse for Democrats than those shown by other pollsters. (source)

The problem is not the framing of the question. This was the question: “Based on what you know about the health care reform legislation being considered right now, do you favor or oppose the plan?” Nothing wrong with that.

So how can Fox News ask a seemingly unbiased question of a seemingly unbiased sample and come up with what seems to be a biased result? The answer may have to do with the questions Fox asks before the question on health care. … the health care questions weren’t asked separately. Instead, they were questions #27-35 of their larger, national poll. … And what were some of those questions? Here are a few: … Do you think President Obama apologizes too much to the rest of the world for past U.S. policies? Do you think the Obama administration is proposing more government spending than American taxpayers can afford, or not? Do you think the size of the national debt is so large it is hurting the future of the country? … These questions run the gamut slightly leading to full-frontal Republican talking points. … A respondent who hears these questions, particularly the series of questions on the national debt, is going to be primed to react somewhat unfavorably to the mention of another big Democratic spending program like health care. And evidently, an unusually high number of them do. … when you ask biased questions first, they are infectious, potentially poisoning everything that comes below. (source)

If you want to avoid this mistake – if we can call it that (since in this case it’s quite likely to have been a “conscious mistake” aka fraud) – randomizing the question order for each respondent might help.
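
Here is a minimal sketch of what per-respondent randomization could look like; the question texts are abbreviated stand-ins, not the actual Fox News wording:

```python
import random

QUESTIONS = [
    "Do you favor or oppose the health care reform plan?",
    "Is the national debt hurting the future of the country?",
    "Is the government proposing more spending than taxpayers can afford?",
]

def questionnaire_for(respondent_id: int) -> list[str]:
    """Return the questions in a fresh random order for each respondent,
    so that priming from earlier questions averages out across the sample."""
    order = QUESTIONS[:]                          # copy; don't mutate the master list
    random.Random(respondent_id).shuffle(order)   # reproducible order per respondent
    return order

print(questionnaire_for(1))
print(questionnaire_for(2))
```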

Similar to the order effect is the effect created by follow-up questions. It’s well-known that follow-up questions of the type “but what if…” or “would you change your mind if …” change the answers to the initial questions.

Bradley effect

The Bradley effect is a theory proposed to explain observed discrepancies between voter opinion polls and election outcomes in some U.S. government elections where a white candidate and a non-white candidate run against each other.

Contrary to the wording and order effects, this isn’t an effect created – intentionally or not – by the pollster, but by the respondents. The theory proposes that some voters tend to tell pollsters that they are undecided or likely to vote for a black candidate, and yet, on election day, vote for the white opponent. It was named after Los Angeles Mayor Tom Bradley, an African-American who lost the 1982 California governor’s race despite being ahead in voter polls going into the elections.

The probable cause of this effect is the phenomenon of social desirability bias: some white respondents may give a certain answer for fear that, by stating their true preference, they will open themselves to accusations of racial motivation. They may feel under pressure to give the politically correct answer. The existence of the effect is, however, disputed. (Some say the election of Obama disproves the effect, thereby making another statistical mistake: a single outcome can’t refute a claim about an average tendency across many elections.)

Fatigue effect

Another effect created by the respondents rather than the pollsters is the fatigue effect. As respondents grow increasingly tired over the course of long interviews, the accuracy of their responses may decrease. They may look for shortcuts to shorten the interview, for example once they figure out that only positive or only negative answers trigger follow-up questions. Or they may just give up halfway, causing incompletion bias.

However, this effect isn’t entirely due to respondents. Survey design can be at fault as well: there may be repetitive questioning (sometimes deliberately for control purposes), the survey may be too long or longer than initially promised, or the pollster may want to make his life easier and group different polls into one (which is what seems to have happened in the Fox poll mentioned above, creating an order effect – but that’s the charitable view of course). Fatigue effect may also be caused by a pollster interviewing people who don’t care much about the topic.

Sampling effect

Ideally, the sample of people who are to be interviewed for a survey should be a fully random subset of the entire population. That means that every person in the population should have an equal chance of being included in the sample, and that there shouldn’t be self-selection (a typical flaw in many if not all internet surveys of the “Polldaddy” variety) or self-deselection. Both reduce the randomness of the sample, which can be seen from the fact that self-selection tends to produce polarized results. The size of the sample is also important: samples that are too small produce unreliable results, because the margin of error grows as the sample shrinks.
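
As a rough guide, the textbook margin of error for a simple random sample shows how precision degrades with small samples; note that it says nothing about self-selection bias, which no sample size can fix:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an estimated proportion p
    from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.5, n), 3))
# The printed margins roughly halve each time the sample size quadruples.
```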

Even the determination of the total population from which the sample is taken can lead to biased results. And yes, that has to be determined… For example, do we include inmates, illegal immigrants etc. in the population? See here for some examples of the consequences of such choices.

House effect

A house effect occurs when there are systematic differences in the way that a particular pollster’s surveys tend to lean toward one or the other party’s candidates; Rasmussen is known for that.

I probably forgot an effect or two. Fill in the blanks if you care. Go here for other posts in this series.

Measuring Poverty (7): Different Types of Poverty

I already mentioned the obvious but consequential fact that poverty measurement depends on the choice of the type of poverty you want to measure. Definitional issues are always important, but when it comes to poverty the choice of a definition determines who will receive government benefits and who won’t. For example, in the U.S. you’re poor when your income is below a certain poverty line. If that’s the case, you’re eligible for certain benefits. So poverty is a function of income.

1. Insufficient income

Usually, and not only in the U.S., poverty is indeed understood as insufficient income (preferably post-tax and post-benefits). Measuring poverty in this case means three steps (a minimal sketch in code follows the list):

  1. determining a sufficient level of income (sufficient for a decent human life); this threshold is usually called a “poverty line” (the share of people below it is the “poverty rate”)
  2. measuring actual income
  3. counting the number of people who have less income than the sufficient level.
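
A minimal sketch of those three steps, with an invented poverty line and invented incomes (real measurements obviously use survey data and household-size adjustments):

```python
poverty_line = 15_000                                        # step 1: the "sufficient" income level
incomes = [9_000, 14_500, 15_000, 22_000, 48_000, 7_200]     # step 2: measured incomes

poor = [y for y in incomes if y < poverty_line]              # step 3: count people below the line
headcount_ratio = len(poor) / len(incomes)
print(len(poor), headcount_ratio)                            # 3 people poor, headcount ratio 0.5
```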

There are some problems with this measurement system, or rather with this choice of type of poverty. Actual income levels are notoriously difficult to measure. People have a lot of informal income which they will not disclose to survey takers. Likewise, there is tax evasion, income in kind (market based or from government benefits, e.g. social housing), and material or immaterial support from local social networks. None of this is included correctly, if at all, in income measurement, leading to an overestimate of poverty. Another disadvantage of income-based measurements: they neglect people’s ability to borrow or to draw on savings in periods of lower income. Again, this overestimates poverty (although one could say that it just estimates it a bit too early, since borrowing and eating up savings can lead to future poverty).

2. Insufficient consumption

Because of these problems, some countries define poverty, not by income levels, but by consumption levels. Measuring poverty in this case means

  1. determining a sufficient level of consumption (sufficient for a decent human life)
  2. measuring actual consumption
  3. counting the number of people who consume less than the sufficient level.

However, this measurement isn’t without problems either. As is the case for income, actual consumption levels are difficult to measure. How much do people actually consume? And what does it mean “to consume”? Is it calorie intake? Financial expenses? Or something else? Consumption levels are also deceiving: people tend to smooth their consumption over time, even more so than their income. If they face a financial crisis because of unemployment, bad health, drought etc., they will sell some of their assets (their house for instance) or take out a loan. If you determine whether someone is poor on the basis of consumption levels, you won’t count people dealing with such a crisis as poor, because they continue to consume at the same levels. However, because of the loans or the sale of assets, they are likely to face poverty in the future. They may also shift their diet toward low-quality food, taking in the same amount of calories but risking their health and hence their future income. Similarly, they may be forced by the crisis to delay health expenditures in order to smooth consumption, with the same long-term results.

And even if you manage somehow to measure consumption, you’re still faced with the problem of the threshold of sufficient consumption: that’s hard to determine as well. Consumption needs differ from person to person, depending on age, gender, occupation, climate etc.

3. Direct physical measures of real consumption

Rather than trying to measure total income or consumption, you can choose to measure consumption of certain specific physical items, and combine that with some easy-to-measure elements of standard of living, such as child mortality or education levels. It’s possible to argue that poverty isn’t an insufficient level of overall income or consumption, but rather the absence of certain specific consumption items. People are poor if they don’t have a bicycle or a car, a solid floor, a phone etc. Or when their children die, can’t go to school or are undernourished. These items or indicators are relatively easy to measure (for example, there’s the Demographic and Health Survey). While they may not tell us a lot about relative living standards in developed countries (where few children die from preventable diseases, for instance), they do provide useful poverty indicators in developing countries.

The OECD has done a lot of good work on this. They call it “measuring material deprivation”. The assumption is the same: there are certain consumer goods and certain elements of living standard that are universally considered important elements of a decent life. The OECD tries to measure ownership of these goods or the occurrence of these elements, and when people report several types of deprivation at the same time, they are considered to be poor.
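
A minimal sketch of a deprivation count in the spirit of that approach; the item list and the cut-off of two simultaneous deprivations are invented for illustration, not the OECD’s actual list:

```python
ITEMS = [
    "cannot afford adequate heating",
    "cannot afford a protein meal every second day",
    "no solid floor",
    "no telephone",
    "child not in school",
]

def is_materially_deprived(household: dict[str, bool], cutoff: int = 2) -> bool:
    """A household counts as materially deprived if it reports
    at least `cutoff` of the listed deprivations."""
    return sum(household.get(item, False) for item in ITEMS) >= cutoff

household = {"no telephone": True, "no solid floor": True}
print(is_materially_deprived(household))   # True: two deprivations reported
```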

Note that, unlike income and overall levels of consumption, we’re not talking about monetary measures here. Sometimes all that has to be measured is a “yes” or a “no”, which of course makes things easier.

Unfortunately, not easy enough. This type of poverty measurement has its own drawbacks. Measures of material deprivation often fail to distinguish between real deprivation and the results of personal choices and lifestyles. Some people can’t have a decent life without a car or a solid floor; others voluntarily choose not to have those goods. It’s likely that only the former are “poor”. Furthermore, since these measurements are often based on surveys, there are some survey related problems. The really poor may be systematically excluded from the survey because we can’t find them (e.g. the homeless). These surveys measure self-reported poverty, and self-reported poverty can be affected by low aspirations or habit. People may also be ashamed about their poverty and hence not report it correctly.

Conclusion

There isn’t a perfect system for poverty measurement. And that has a lot to do with the fact that poverty is an inherently vague concept. It really shouldn’t be a surprise that people choose different definitions and types, and hence different measurement methods that all provide different data. There’s no “correct” definition of poverty, and hence no correct poverty measure.

More posts in this series on the difficulties of poverty measurement are here.

Measuring Poverty (6): The Poverty Line in the U.S.

The poverty line in the U.S. (and hence the official poverty rate) is based on a system pioneered by Mollie Orshansky in 1963. In the 1960s, the average US family spent one third of its income on food. The poverty line was calculated by valuing an “emergency food” budget for a family, and then multiplying that number by 3. (Some more data here).

This results in a specific dollar amount that varies by family size but is the same across the U.S. (the amounts are adjusted for inflation annually). To determine who is poor, actual family income is then compared to these amounts: if your income is below the threshold for your family size, you’re counted as poor.
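
In code, the Orshansky logic amounts to something like the sketch below. The food budgets, the inflation factor and the family income are all invented; the official thresholds are different numbers:

```python
FOOD_MULTIPLIER = 3             # food was roughly 1/3 of a 1960s family budget

emergency_food_budget = {       # hypothetical annual "emergency food" cost by family size
    1: 2_500,
    2: 3_900,
    4: 6_400,
}

def poverty_threshold(family_size: int, cumulative_inflation: float) -> float:
    """Food budget times three, uprated by cumulative inflation since the base year."""
    return emergency_food_budget[family_size] * FOOD_MULTIPLIER * (1 + cumulative_inflation)

threshold = poverty_threshold(family_size=4, cumulative_inflation=0.20)
family_income = 21_000
print(threshold, family_income < threshold)   # 23040.0 True -> counted as poor
```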

Amazingly, this system hasn’t changed much since the 1960s, yet it suffers from a series of measurement problems, resulting in either an over- or underestimation of the number of families living in poverty. The problems lie both in the calculation of the poverty thresholds and in the calculation of the income that is subsequently compared to those thresholds:

  • Obviously, the system should take regional differences in the cost of living, especially in housing, into account. It doesn’t.
  • Spending patterns have changed: a family today spends relatively less on food and more on housing, health care, child care etc., yet the poverty line is still the cost of emergency food times 3. So the question is: should the system take today’s spending patterns into account? We would have to know which it is: 1) either the increased spending on non-food items has occurred because people can now afford to spend more on such items, or 2) the increased spending on non-food items has occurred because these items got disproportionately more expensive (housing for instance) or because there wasn’t really any need to buy those items in the old days. Only if 2) is the case should that have an influence on the poverty line. And I think that to some extent it is the case. Child care, for instance, has become a necessity. In the 1960s, many mothers didn’t go out to work. Now they do, and therefore they have to pay for child care. Those payments should be deducted from income when measuring disposable income and comparing it to the poverty line. The same is true for cars or phones: today you can’t really have a job without them, so they’re no longer luxuries. A society would show very little ambition if it continued to designate the poor as those who have to wash by hand, read by candlelight, and shit in a hole in the floor. In fact, what I’m advocating here is some kind of relative concept of poverty. I’ll come back to that later. All I can tell you now is that this isn’t without complications either.
  • The current poverty measurement doesn’t take into account disproportionate price rises (it merely adjusts for general inflation) and changing needs. An obvious improvement of the U.S. measurement system would be to adjust for exceptional price evolutions (such as for housing) and also to revisit the definitions of basic needs and luxuries. Hence, a better poverty measurement should subtract from income some work-related expenses, child care expenses, and perhaps also some health expenses to the extent that these have become disproportionately more expensive. But that’s not easy:

There is considerable disagreement on the best way to incorporate medical care in a measure of poverty, even though medical costs have great implications for poverty rates. But costs differ greatly depending upon personal health, preferences, and age, and family costs may be very different from year to year, making it hard to determine what exactly should be counted. Subtracting out-of-pocket costs from income is one imperfect approach, but if someone’s expenses are low because they are denied care, then they would usually be considered worse off, not better off. (source)

  • Another problem: the current poverty measure doesn’t take all welfare benefits into account. Income from cash welfare programs counts, but the value of non-cash benefits such as food stamps, school lunches and public housing doesn’t (because such benefits weren’t very common in the 1960s). Those benefits successfully raise the standard of living of poverty-stricken individuals. There’s a bit of circular reasoning going on here, because the poverty rate is used, among other things, to decide who gets benefits, so benefits should not be included. But if you want to know how many people are actually poor, you should count benefits as well, because benefits lift many out of poverty.
  • The poverty measure doesn’t include some forms of income from savings or from property such as housing.
  • The poverty measure doesn’t take taxes into account, largely because they didn’t affect the poor very much in the 1960s. Income is counted before subtracting payroll, income, and other taxes, overstating income for some families. On the other hand, the federal Earned Income Tax Credit isn’t counted either, underestimating income for other families.
  • And there’s also a problem counting the effects of cohabitation and co-residency: ignoring shared households overestimates expenses, and therefore overestimates poverty.

Because the poverty measurement disregards non-cash benefits and certain tax credits, it fails to serve its purpose. Poverty measurement is done in order to measure progress and to look at the effects of anti-poverty policies. Two of those policies – non-cash benefits and certain tax credits – aren’t counted, even though they reduce poverty. So we have a poverty statistic that can’t measure the impact of anti-poverty policy… That’s like measuring road safety without looking at the number of accidents avoided by government investment in safety. Since the 1970s, the U.S. government implemented a number of policies that increased spending for the poor, but the effects of this spending were invisible in the poverty statistics.

This had a perverse effect: certain politicians now found it easy to claim that spending on the poor was ineffective and a waste of money. It’s no coincidence that trickle-down economics became so popular in the 1980s. The poverty measurement, rather than helping the government become more effective in its struggle against poverty, has led to policies that reduced benefits. Of course, I’m not saying that poverty reduction is just a matter of government benefits, or that benefits can’t have adverse effects. Read more here.

Fortunately, the US Census Bureau has taken these criticisms to heart and has been working on an alternative measure that counts food stamps and other government support as income, while also accounting for child-care costs, geographic differences etc. First results show that the number of poor is higher according to the new measure (it adds about 3 million people). For some reason, I think the old system still has some life in it.

Some details of the new measurement:

when you account for the Earned Income Tax Credit the poverty rate goes down by two points. Accounting for SNAP (food stamps) lowers the poverty rate about 1.5 points. … when you account for the rise in Medical Out of Pocket costs, the poverty rate goes up by more than three points. (source)
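
In terms of family resources, the logic of such a supplemental-style measure looks roughly like this; every dollar amount and the threshold below are invented, purely to show which items get added and subtracted:

```python
cash_income           = 18_000
snap_benefits         =  2_400   # counted as income under the new measure, not the old one
eitc_refund           =  1_800   # likewise
child_care_costs      =  3_000   # subtracted as a necessary work-related expense
medical_out_of_pocket =  2_500   # subtracted as well

old_resources = cash_income
new_resources = (cash_income + snap_benefits + eitc_refund
                 - child_care_costs - medical_out_of_pocket)

threshold = 17_500               # hypothetical poverty threshold for this family
print(old_resources < threshold, new_resources < threshold)
# False True: not poor under the old measure, poor under the new one
```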

More posts about problems with poverty measurement are here.