Measuring Human Rights (26): Measuring Murder

Murder should be easy to measure. Unlike many other crimes or rights violations, the evidence is clear and painstakingly recorded: there is a body, at least in most cases; police seldom fail to notice a murder; and relatives or friends of the victim rarely fail to report the crime. So even if we are not always able to find and punish murderers, we should at least know how many murders there are.

And yet, even this most obvious of crimes can be hard to measure. In poorer countries, police departments may not have the means necessary to record homicides correctly and completely. Families may be wary of reporting homicides for fear of corrupt police officers entering their homes and using the occasion to extort bribes. Civil wars make it difficult to collect any data, including crime data. During wartime, homicides may not be distinguishable from casualties of the war.

And there’s more. Police departments in violent places may be under pressure to bring down crime stats and may manipulate the data as a result: moving some dubious murder cases to categories such as “accidents”, “manslaughter”, “suicide” etc.

Homicides usually take place in cities, hence the temptation to rank cities according to homicide rates. But cities differ in the way they determine their borders: suburbs may be included or not, or partially, and this affects homicide rates since suburbs tend to be less violent. Some cities have more visitors than other cities (more commuters, tourists, business trips) and visitors are usually not counted as “population” while they may also be at risk of murder.
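A small sketch can show how much the choice of city borders matters. The figures below are invented purely for illustration; the only thing taken as given is the usual convention of expressing homicide rates per 100,000 inhabitants.

```python
def homicide_rate(homicides, population):
    """Homicides per 100,000 inhabitants."""
    return homicides * 100_000 / population

# Hypothetical city: every number here is made up for illustration.
core = {"homicides": 90, "population": 600_000}
suburbs = {"homicides": 10, "population": 400_000}

# Rate when the city is defined as the core only.
rate_core_only = homicide_rate(core["homicides"], core["population"])

# Rate when the (less violent) suburbs are counted as part of the city.
rate_with_suburbs = homicide_rate(
    core["homicides"] + suburbs["homicides"],
    core["population"] + suburbs["population"],
)

print(rate_core_only, rate_with_suburbs)
```

The same city drops in a ranking simply because its border was drawn more widely. Adding uncounted visitors to the denominator would lower the rate in the same way.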

In addition, some ideologies may cause distortions in the data. Does abortion count as murder? Honor killings? Euthanasia and assisted suicide? Laws and opinions about all this vary between jurisdictions and introduce biases in country comparisons.

And, finally, countries with lower murder rates may not be less violent; they may just have better emergency healthcare systems allowing them to save potential murder victims.

So, if even the most obvious of human rights violations is difficult to measure, you can guess the quality of other indicators.

Measuring Human Rights (25): Measuring Hunger

First, and for those in doubt: hunger is a human rights violation (see article 25 of the Universal Declaration). Second, before we discuss ways to measure this violation, we have to know what it is that we want to measure. It’s surprisingly difficult to define hunger.

Definition of hunger

The word “hunger” in this context does not refer to the subjective sensation that we have when lunch is late. We’re talking here about a chronic lack of food or a sudden and catastrophic lack of food (as in the case of a famine). We measure a lack of food by measuring dietary energy deficiency, which in turn is computed based on average daily calorie intake. The FAO estimates that the average minimum energy requirement per person is 1800 kcal per day. The global average per capita daily calorie intake is currently about 2800 kcal. This average obviously masks extreme differences between the obese and the chronically undernourished.

The FAO minimum energy requirement per person of 1800 kcal is also an average. The minimum calorie need depends on many things: age, climate, health, height, occupation etc.
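As a rough sketch of the threshold logic, the snippet below compares a daily intake with a requirement. The 1800 kcal default comes from the text; the function names and the individual requirement of 2400 kcal are assumptions made only for illustration.

```python
FAO_MIN_KCAL = 1800  # average minimum energy requirement cited in the text

def energy_deficit(intake_kcal, requirement_kcal=FAO_MIN_KCAL):
    """Daily shortfall in kcal (0 if intake meets the requirement)."""
    return max(0, requirement_kcal - intake_kcal)

def is_hungry(intake_kcal, requirement_kcal=FAO_MIN_KCAL):
    """True if intake falls below the requirement."""
    return intake_kcal < requirement_kcal

# Judged against the average threshold, 2000 kcal is enough ...
print(is_hungry(2000))
# ... but not for someone whose (hypothetical) personal requirement is 2400 kcal.
print(is_hungry(2000, requirement_kcal=2400))
```

The second call shows why a single average threshold can misclassify people whose needs differ from the average.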

Usually, the concept of “hunger” as it is defined here is different from “malnutrition”. Hunger is a lack of food defined as a lack of calorie intake. Malnutrition is a lack of quality food, of micronutrients such as vitamins and minerals, and of a diverse diet. Hence, people may have access to sufficient quantities of food and still be malnourished.

Hunger and famine are also different concepts. Hunger is a chronic and creeping lack of food, while a famine results from the sudden collapse of food stocks. A famine implies widespread starvation during a limited period. It can’t go on forever because it must stop when everyone has died or when food supplies are restored. Chronic hunger on the other hand can go on forever because it doesn’t imply widespread starvation. Of course, people do die of chronic hunger, and on a global level hunger kills more people than famines do. But whereas in the case of famine people die of starvation, the victims of chronic hunger usually don’t starve to death. When we say that hunger kills someone every 3.6 seconds we usually mean that this person dies from an infectious disease brought on by hunger. Hunger increases people’s vulnerability to diseases which are otherwise nonfatal (e.g. diarrhea, pneumonia etc.). In fact, most hunger related deaths do not occur during famines. Chronic hunger is much more deadly – it’s just not as noticeable as a famine. When and where famines occur, they are more deadly and catastrophic. But they occur, thank God, only exceptionally. Hunger on the other hand is a permanent fixture of the lives of millions and ubiquitous in many countries.

Measurement of hunger

Given this definition, how do we go about measuring the extent of chronic hunger? (The measurement of famine is a separate problem, discussed here). There are different possible methods:

  • So-called food intake surveys (FIS) estimate dietary intake and try to relate this to energy needs determined by physical activity. Calorie intake below a minimum level means hunger. The problem here is that minimum calorie intake thresholds are somewhat arbitrary and do not always take people’s different calorie requirements into account. Even for a single individual, this threshold can vary over time (depending on the climate, the individual’s age, occupation and health etc.). Moreover, when trying to measure calorie intake, you’re faced with the problem of hunger due to imperfect absorption: the fact that someone in a sample buys and consumes x calories does not mean that he or she actually absorbs those calories. The widespread incidence of diarrhea and other health problems often means that only a fraction of the calories eaten is absorbed by the body.
  • In order to bypass this, some propose a measurement method based on revealed preferences. The greater the share of calories people receive from the cheapest foods available to them, the hungrier they are; and, conversely, the more they buy expensive sources of calories, the less hungry they are. Their choice of foods reveals whether they have enough calories. This method therefore eliminates the threshold and absorption problems.

Our approach derives from the fact that when a person is below their nutrition threshold, there is a large utility penalty due to the physical discomfort associated with the body’s physiological and biochemical reaction to insufficient nutrition. At this stage, the marginal utility of calories is extremely high, so a utility-maximizing consumer will largely choose foods that are the cheapest available source of calories, typically a staple like cassava, rice or wheat. However, once they have passed subsistence, the marginal utility of calories declines significantly and they will begin to substitute towards foods that are more expensive sources of calories but that have higher levels of non-nutritional attributes such as taste. Thus, though any individual’s actual subsistence threshold is unobservable, their choice to switch away from the cheapest source of calories reveals that their marginal utility of calories is low and that they have surpassed subsistence. Accordingly, the percent of calories consumed from the staple food source, or the staple calorie share (SCS), can be used as an indicator for nutritional sufficiency. (source, source)

  • Still another method consists of measuring hunger’s physical effects on growth and thinness. Instead of measuring calorie intake, hunger or revealed preferences, you measure people’s height, their stunted growth and their body mass index. However, this is very approximate since height and weight may be determined by lots of factors, many of them unrelated to hunger.
  • And finally there are subjective approaches. The WFP does surveys asking people how often they ate in the last week and what they ate, how often they skip meals, how far they are away from markets, if their hunger is temporary or chronic etc. Gallup does something similar.
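The staple calorie share mentioned under the revealed-preference method is simple to compute once a food diary is available. The following is only a sketch: the food diaries, the choice of rice as the staple and the function name are hypothetical, and the estimation procedure used in the quoted study is not reproduced here.

```python
def staple_calorie_share(calories_by_food, staples):
    """Share of total calories that comes from the staple food(s)."""
    total = sum(calories_by_food.values())
    if total == 0:
        raise ValueError("no calories recorded")
    from_staples = sum(
        kcal for food, kcal in calories_by_food.items() if food in staples
    )
    return from_staples / total

# Hypothetical daily food diaries (kcal per food item).
household_a = {"rice": 1500, "vegetables": 200, "oil": 100}
household_b = {"rice": 1000, "meat": 500, "vegetables": 300, "dairy": 200}

scs_a = staple_calorie_share(household_a, staples={"rice"})
scs_b = staple_calorie_share(household_b, staples={"rice"})
print(round(scs_a, 2), round(scs_b, 2))
```

Under the revealed-preference reasoning, household A, with the higher share, is the more likely of the two to be below its subsistence threshold, even though no threshold was ever specified.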

More on hunger here. And more posts in this series here.

Measuring Democracy (8): A Multidimensional Measurement

Any attempt to measure the degree of democracy in a country should take into account the fact that democracy is something multidimensional. It won’t suffice to measure elections, not even the different aspects of elections such as frequency, participation, fairness, transparency etc. It takes more than fair and inclusive elections to have a democracy. Of course, the theoretical ideal of democracy is a controversial notion, so we won’t be able to agree on all the necessary dimensions or elements of a true democracy. Still, you can’t escape this problem if you want to build a measurement system: measuring something means deciding which parts of it are worth measuring.

You would also do best to take a maximalist approach: leaving out too many characteristics would allow many or even all countries to qualify as fully democratic and would make it impossible to differentiate between the different levels or the different quality of democracy across countries. A measurement system is useful precisely because it offers distinctions and detailed rankings and because it makes it possible to determine the distance to an ideal, whatever the nature of the ideal. Obviously, a maximalist approach is by definition more controversial than a minimal one. Everyone agrees that you can’t have a democracy without elections (or, better, without voting more generally). Whether strong free speech rights and an independent judiciary are necessary is less clear. And the same is true for other potential attributes of democracy.

Once you’ve determined what you believe are necessary attributes you can start to measure the extent to which they are present in different countries. Hence, your measurement will look like a set of sliding scales. With all the markers on the right side in the case of a non-existing ideal democracy, and all the markers on the left side in the unfortunately very real case of total absence of democracy.

(The aggregation of these scales into a total country score is another matter that I’ve discussed elsewhere).

Some candidate attributes are:

  • Does a country include more or less people in the right to have a democratic say? How high is the voting age? Are criminals excluded from the vote, even after they have served their sentence? Are immigrants without citizenship excluded? Are there conditions attached to the right to vote (such as property, education, gender etc.)?
  • Does a country include more or less topics in the right to a democratic say? Are voters not allowed to have a say about the affairs of the military, or about policies that have an impact on the rights of minorities? Does the judiciary have a right to judicial review of democratically approved laws?
  • Does a country include more or less positions in the right to a democratic say? Can voters elect the president, judges, prosecutors, mayors, etc., or only parliamentarians? Can they elect local office holders? Does a country have a federalist structure with important powers at the local or state level?
  • Does a country impose qualified majorities for certain topics or positions? Do voters have to approve certain measures with a two-thirds supermajority?
  • Does a country provide more or less ways to express a democratic say? Can voters only elect officials or can they also vote on issues in referenda?
  • Does a country impose more or less restrictions on the formation of a democratic say? Are free speech rights and assembly and association rights respected?
  • Does a country accept more or less imbalances of power in the formation of a democratic say? Are there campaign financing rules?
  • Does a country show more or less respect for the expression of a democratic say? How much corruption is there? Is the judiciary independent?

A “more” score on any of these attributes will push up the total “democracy score” for a country. At least it seems so, if not for the conclusion that all these complications in the measurement system are still not enough. We need to go further and add additional dimensions. For example, one can argue that we shouldn’t define democracy solely on the basis of the right to a democratic say, not even if we render this right as complex as we did above. A democracy should, ideally, also be a stable form of government, and allowing people to decide about the fundamental rights of minorities is an expression of the right to a democratic say but it is not in the long term interest of democracy. Those minorities will ultimately rebel against this tyranny of the majority and cause havoc for everyone.

Measuring Human Rights (24): Measuring Racism, Ctd.

Measuring racism is a problem, as I’ve argued before. Asking people if they’re racist won’t work because they don’t answer this question correctly, and understandably so. This is due to the social desirability bias. Surveys may minimize this bias if they approach the subject indirectly. For example, rather than simply asking people if they are racist or if they believe blacks are inferior, surveys could ask some of the following questions:

  • Do you believe God has created the races separately?
  • What do you believe are the reasons for higher incarceration rates/lower IQ scores/… among blacks?
  • Etc.

Still, no guarantee that bias won’t falsify the results. Maybe it’s better to dump the survey method altogether and go for something even more indirect. For example, you can measure

  • racism in employment decisions, such as numbers of callbacks received by applicants with black sounding names
  • racism in criminal justice, for example the degree to which decisions by black federal lower-court judges are overturned more often than those authored by similar white judges, or differences in crime rates by race of the perpetrator, or jury behavior
  • racial profiling
  • residential racial segregation
  • racist consumer behavior, e.g. reluctance to buy something from a black seller
  • the numbers of interracial marriages
  • the numbers and membership of hate groups
  • the number of hate crimes
  • etc.

A disadvantage of many of these indirect measurements is that they don’t necessarily reflect the beliefs of the whole population. You can’t just extrapolate the rates you find in these measurements. The fact that some judges and police officers are racist does not mean that the same share of the total population is racist. Not all people who live in predominantly white neighborhoods do so because they don’t want to live in mixed neighborhoods. Different crime rates by race can be an indicator of racist law enforcement, but can also hide other causes, such as different poverty rates by race (which can themselves be indicators of racism). Higher numbers of hate crimes or hate groups may represent a radicalization of an increasingly small minority. And so on.

Another alternative measurement system is the Implicit Association Test. This is a psychological test that measures implicit attitudes and beliefs that people are either unwilling or unable to report.

Because the IAT requires that users make a series of rapid judgments, researchers believe that IAT scores may reflect attitudes which people are unwilling to reveal publicly. (source)

Participants in an IAT are asked to rapidly decide which words are associated. For example, is “female” or “male” associated with “family” and “career” respectively? This way, you can measure the strength of association between mental constructs such as “female” or “male” on the one hand and attributes such as “family” or “career” on the other. And this allows you to detect prejudice. The same is true for racism. You can read here or here how an IAT is usually performed.

Yet another measurement system uses evidence from Google search data, such as in this example. The advantage of this system is that it avoids the social desirability bias, since Google searches are done alone and online and without prior knowledge of the fact that the search results will be used to measure racism. Hence, people searching on Google are more likely to express social taboos. In this respect, the measurement system is similar to the IAT. Another advantage of the Google method, compared to traditional surveys, is that the Google sample is very large and more or less evenly distributed across all areas of a country. This allows for some fine grained geographical breakdown of racial animus.

More specifically, the purpose of the Google method is to analyze trends in searches that include words like “nigger” or “niggers” (not “nigga” because that’s slang in some Black communities, and not necessarily a disparaging term). In order to avoid searches for the term “nigger” by people who may not be racially motivated – such as researchers (Google can’t tell the difference) – you could refine the method and analyze only searches for phrases like “why are niggers lazy”, “Obama+nigger”, “niggers/blacks+apes” etc. If you find that those searches are more common in some locations than others, or that they become more common in some locations, then you can try to correlate those findings with other, existing indicators of racism such as those cited above, or with historic indicators such as prevalence of slavery or lynchings.

Measuring Democracy (7): Some Technical Difficulties

Suppose you want to construct a democracy index measuring the level or lack of democracy in different countries in the world. The normal thing to do is to select some supposedly essential characteristics or attributes of democracy and try to measure the level or presence of those. So, for example, you may select free speech, elections, judicial independence and a number of other characteristics. Some of those are perhaps already measured and you can simply take those measurements. For others, you may have to set up your own measurement (e.g. a survey, analysis of newspapers or official documents etc.), or use a proxy.

In any case, you’ll end up with different datasets on different attributes of democracy, and you’ll have to bring those datasets together somehow in order to make your overall index, your single country-level democracy score. The problem is that the datasets contain different kinds of scales which cannot as such be aggregated into a global index. The scales and the values in the scales have to be normalized, i.e. translated into a common metric.

normalized value = raw value/maximum raw value

First, however, you have to rescale some existing scales so that they start at 0 – in other words, so that the lowest score is 0 (instead of starting at 1 for example, or at -10 such as the Polity IV scale). This way, all scales will have a normalized range from 0 to 1; 0 being the negation or total absence of the attribute; 1 being the complete and perfect protection or presence of the attribute.
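The two steps, shifting a scale so that it starts at 0 and then dividing by its maximum, can be combined in one small function. This is a minimal sketch: the function name and the 1-to-7 example scale are assumptions, and the Polity IV range is taken to run from -10 to +10.

```python
def normalize(raw, scale_min, scale_max):
    """Map a raw score onto a 0-to-1 scale.

    Step 1: shift the scale so that its lowest possible value is 0.
    Step 2: divide by the maximum of the shifted scale.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    if not scale_min <= raw <= scale_max:
        raise ValueError("raw value lies outside the scale")
    shifted = raw - scale_min
    return shifted / (scale_max - scale_min)

# A scale running from -10 to +10 (the Polity IV range assumed here).
print(normalize(-10, -10, 10), normalize(0, -10, 10), normalize(10, -10, 10))

# A hypothetical 1-to-7 rating scale.
print(normalize(4, 1, 7))
```

Scores from both scales now sit on the same 0-to-1 metric and can be compared or aggregated.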

What about weighting the different attributes? Some may be more important for a democracy than others. However, introducing weights in this way inevitably means introducing value judgments. While value judgments can’t be avoided (they’ll pop up at the moment of the selection of the attributes as well, for example), they can be minimized. If you choose not to use weighting, you consider all attributes to be equally important, which is a view that can be defended given the often interdependent nature of the attributes of democracy (an independent judiciary for example will likely not survive without a free press).

Once the different data sources are translated into normalized scales and, if necessary, weighted appropriately, they have to be aggregated in order to calculate the global index of quality of democracy. One possible aggregation rule would be this:

global index = source 1 * source 2 * ... * source n.

So a simple multiplication. But that would mean that a value of 0 for one attribute results in labeling the country as a whole as having 0 democratic quality. This is counter-intuitive, even with the assumption of equal importance of all attributes. Hence, a better aggregation rule is the arithmetic mean (or perhaps the median). The geometric mean does not help here: like the product, it is 0 as soon as one attribute is 0.

However, there’s also a problem with averages: low scores on one attribute can be compensated by high scores on another. So very different democracies can have the same score. Also, within one country, a high score on suffrage rights but 0 on actual participation would give a medium democracy score, whereas in reality we wouldn’t want to call this country democratic at all (the score should be 0 or close to 0). Perhaps we can’t avoid weights after all.
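The behavior of the different aggregation rules is easy to compare on a few made-up score profiles. The sketch below assumes unweighted, already normalized scores between 0 and 1; the profiles are hypothetical and the function names are chosen for illustration only.

```python
import math
from statistics import mean

def product_index(scores):
    """Plain multiplication of all attribute scores."""
    return math.prod(scores)

def arithmetic_index(scores):
    """Unweighted arithmetic mean of the attribute scores."""
    return mean(scores)

def geometric_index(scores):
    """Geometric mean: the n-th root of the product of n scores."""
    return math.prod(scores) ** (1 / len(scores))

# Hypothetical country: strong on two attributes, 0 on a third.
one_zero = [0.9, 0.9, 0.0]
print(product_index(one_zero), arithmetic_index(one_zero), geometric_index(one_zero))

# Two very different hypothetical profiles with the same arithmetic mean.
uniform = [0.5, 0.5, 0.5]
lopsided = [1.0, 0.5, 0.0]
print(arithmetic_index(uniform), arithmetic_index(lopsided))
```

A single 0 drags the product and the geometric mean to 0, while the arithmetic mean lets the two high scores compensate for it (about 0.6 here). The last line shows the compensation problem directly: a uniformly mediocre country and a lopsided one receive the same score.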

Measuring Human Rights (20): What is More Important, the Number or Percentage of People Suffering Human Rights Violations?

Take just one human right, the right not to suffer poverty: if we want to measure progress for this human right, we get something like the following fact:

[N]ever in the world have there been so many paupers as in the present times. But the reason of this is that there have never been so many people around. Indeed never in the history of the world has been the percentage of poor people been so low. (source)

So, is this good news or bad news? If it’s more important to reduce the share of the world population suffering a particular type of rights violation, then this is good news. On the other hand, there are now more people – in absolute, not in relative numbers – suffering from poverty. If we take individuals and the distinctions between persons seriously, we should conclude that this is bad news and we’re doing worse than before.

Thomas Pogge has argued for the latter view. Take another example: killing a given number of people doesn’t become less troubling if the world’s population increases. If we discovered that the real number of the world’s population at the time of the Holocaust was twice as large as previously assumed, that wouldn’t diminish the importance of the Holocaust. What matters is the absolute number of people suffering.

On the other hand, if we see that policies and interventions lead to a significant lowering of the proportion of people in poverty – or suffering from any other type of rights violation – between times t and t+n, then we would welcome that, and we would certainly want to know it. The fact that the denominator – total world population – has increased in the meantime, is probably something that has happened independently of those policies. In the specific case of poverty, a growing population can even make a decrease in relative numbers of people suffering from poverty all the more admirable. After all, many still believe (erroneously) in the Malthusian trap theory, which states that population growth necessarily leads to increases in poverty in absolute numbers.
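The tension between the two readings is plain arithmetic, as the following sketch shows. The population and poverty figures are invented (think of them as millions) and serve only to show that the count and the share can move in opposite directions between t and t+n.

```python
def poverty_summary(poor, population):
    """Return both the absolute count and the share of poor people."""
    return {"count": poor, "share": poor / population}

# Hypothetical figures, chosen only so that the two measures diverge.
at_t = poverty_summary(poor=400, population=1_000)
at_t_plus_n = poverty_summary(poor=600, population=2_000)

count_change = at_t_plus_n["count"] - at_t["count"]
share_change = at_t_plus_n["share"] - at_t["share"]

print(count_change)            # more poor people in absolute terms
print(round(share_change, 2))  # but a smaller share of the population
```

Which of the two numbers counts as "progress" is exactly the normative question discussed above; the arithmetic cannot settle it.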

Measuring Human Rights (16): The Right to Healthcare

(There’s a more theoretical post here about the reasons why we should call health care a human right. But even if you think those are bad reasons, you may find the following useful).

The right to health care is one of the most difficult rights to measure. You can either try to measure people’s health directly and assume that good health means good health care, or you can measure the provision of health care and assume that there will be good health with a good health care system. Doing the latter means, for example:

  • measuring the number of health workers per capita for countries
  • measuring the quality of hospitals
  • measuring health care spending by governments
  • measuring the availability and affordability of health care
  • measuring the availability and affordability of health insurance
  • etc.

Doing the former means:

  • measuring life expectancy
  • measuring infant mortality
  • measuring maternal mortality
  • measuring calorie intake
  • measuring the incidence of certain diseases
  • measuring the survival rates for certain diseases
  • etc.

Needless to say, every single one of these measurements is fraught with problems, although some more so than others. Even if you’re able to have a pretty good measurement for a single indicator for a single country, it may be difficult to compare the measurement across countries. For example, health insurance is organized in so many different ways that it may be impossible to compare the level of insurance across different countries.

But let’s focus on another measure. Life expectancy is often used as a proxy for health. And indeed, when people live longer, on average, we can reasonably assume that they are healthier and that their health care system is better. It’s also something that is relatively easy to measure, compared to other indicators, since even developing countries usually have reasonably good data based on birth and death certificates. And yet, I say “relatively” because there are some conceptual and definitional problems:

  • Exceptional events such as a natural disaster or a war can drag down life expectancy numbers, but those events need not influence health in general or the quality of health care.
  • Wealthy countries may have more deaths from car accidents than poorer countries, simply because they have more cars. This will pull their relative life expectancy down somewhat, given that younger people are more likely to die in car accidents. And if you use life expectancy to measure health you’ll get a smaller health gap compared to poorer countries than is the case in reality (at least if life expectancy is not corrected for this and if it’s not supplemented with other health indicators).
  • How are miscarriages counted? If they are counted as child mortality, they drag down life expectancy rates compared to countries where they are not counted.
  • What about countries that have more homicides? Or suicides? Although the latter should arguably count since suicides are often caused by bad mental health. If a country’s life expectancy rate is pulled down by high suicide rates, life expectancy rates are still a good indicator of health and of the quality of health care, assuming that health care can reduce suicide rates and remove, to some extent, the underlying health causes of suicide. However, homicides are different: a country with a very good health care system, a very high level of health and a high murder rate can have its health rating pulled down artificially when only life expectancy is used to measure health.
  • Differences in diet and other types of risky behavior should also be excluded when comparing health and life expectancy across countries. It’s well known, for instance, that obesity is more of a problem in the U.S. than in many countries that are otherwise comparable to it. Obesity drags down life expectancy and reduces the average level of health, so life expectancy rates which are not corrected for obesity rates are still a good measure for health, but they are not a good measure for the quality of the U.S. health care system. If you want to use life expectancy rates to compare the quality of health care systems you’ll have to correct for obesity rates and perhaps for other types of risky behavior such as smoking or the absence of exercise. Maybe the U.S. health care system, even though it “produces” somewhat lower life expectancy rates than in comparable countries, is actually better than in other countries, yet still not good enough to offset the detrimental effects of high average obesity.

Hence, uncorrected life expectancy rates may not be such a good indicator of national health and of the quality of a national health care system. If we return to the case of the U.S., some of this may explain the strange fact that this country spends a lot more on health and yet has somewhat lower life expectancy rates than comparable countries.

Or maybe this discrepancy is caused by a combination of some misuse and waste at the spending side – more spending on health doesn’t necessarily result in better health – and some problems or peculiarities with the measurement of life expectancy. Let’s focus on the latter. As stated above, some cultural elements of American society, such as obesity, pull down life expectancy and worsen health outcomes. But there are other peculiarities that also pull down life expectancy, and that have nothing to do with health. I’m thinking of course of the relatively high levels of violence in the U.S. Death by assault is 5 to 10 times higher in the U.S. than in comparable countries (although those numbers tend to go down with the passing of time). This affects younger people more than older people, and when more young people die, life expectancy rates drop more sharply than when more old people die.

However, even if you correct U.S. life expectancy rates for this, the rates don’t move up a lot (see here). The reason is that the numbers of deaths caused by homicide pale in comparison to other causes. Obesity levels, for instance, are a more important cause. But correcting life expectancy rates for obesity levels doesn’t seem appropriate, because we want to measure health. If you leave out all reasons for bad health from life expectancy statistics, your life expectancy rates go up, but your average health doesn’t. Obesity isn’t the same as homicide. Correcting life expectancy statistics for non-health related deaths such as homicide makes them a better indicator of health. Removing deaths from obesity doesn’t. If you have life expectancy rates without obesity, they may be a fairer judgment of the health care system but not a fairer judgment of health: a health care system in a country with a lot of obesity may be equally good as the one in another country and yet result in lower life expectancy. The former country does not necessarily have lower life expectancy because of its underperforming health care system – we assumed it’s of the same quality as elsewhere – but because of its culture of obesity.

However, if you really want to judge health care systems, you could argue that countries plagued by obesity should have a better quality system than other countries. They need a better quality system to fight the consequences of obesity and achieve similar life expectancy rates as other countries that don’t need to spend so much to fight obesity. So, life expectancy is then reinstituted as a good measure of health.

Measuring Human Rights (15): Measuring Segregation Using the Dissimilarity Index

If people tend to live, work, eat or go to school together with other members of their group – race, gender etc. – then we shouldn’t automatically assume that this is caused by discrimination, forced separation, restrictions on movement or choice of residence, or other kinds of human rights violations. It can be their free choice. However, if it’s not, then we usually call it segregation and we believe it’s a moral wrong that should be corrected. People have a right to live where they want, go to school where they want, and move freely about (with some restrictions necessary to protect the property rights and the freedom of association of others). If they are prohibited from doing so, either by law (e.g. Jim Crow) or by social pressure (e.g. discrimination by landlords or employers), then government policy and legislation should step in in order to better protect people’s rights. Forced desegregation is then an option, and this can take various forms, such as anti-discrimination legislation in employment and rent, forced integration of schools, busing, zoning laws, subsidized housing etc.

There’s also some room for intervention when segregation is not the result of conscious, unconscious, legal or social discrimination. For example, poor people tend to be segregated in poor districts, not because other people make it impossible for them to live elsewhere but because their poverty condemns them to certain residential areas. The same is true for schooling. In order to avoid poverty traps or membership poverty, it’s better to do something about that as well.

In all such cases, the solution should not necessarily be found in physical desegregation, i.e. forcibly moving people about. Perhaps the underlying causes of segregation, rather than segregation itself, should be tackled. For example, rather than moving poor children to better schools or poor families to better, subsidized housing, perhaps we should focus on their poverty directly.

However, before deciding what to do about segregation, we have to know its extent. Is it a big problem, or a minor one? How does it evolve? Is it getting better? How segregated are residential areas, schools, workplaces etc.? And to what extent is this segregation involuntary? The latter question is a hard one, but the others can be answered. There are several methods for measuring different kinds of segregation. The most popular measure of residential segregation is undoubtedly the so-called index of dissimilarity. If you have a city, for example, that is divided into N districts (or sections, census tracts or whatever), the dissimilarity index measures the percentage of a group’s population that would have to change districts for each district to have the same percentage of that group as the whole city.
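To make the definition concrete, here is a minimal sketch of the calculation, using the common two-group form of the index: half the sum, over all districts, of the absolute difference between each group's share of its own citywide total. The function name and the district counts are made up for illustration.

```python
def dissimilarity_index(group_a, group_b):
    """Index of dissimilarity between two groups across districts.

    group_a[i] and group_b[i] are the head counts of each group in district i.
    Returns a value between 0 (no segregation) and 1 (complete segregation).
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups need one count per district")
    total_a = sum(group_a)
    total_b = sum(group_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs at least one member")
    # Compare each district's share of group A with its share of group B.
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


# Hypothetical city with four districts (invented counts).
minority = [100, 300, 50, 50]
majority = [400, 100, 250, 250]
print(round(dissimilarity_index(minority, majority), 3))  # 0.5
```

In this invented city, half of the minority group would have to change districts to match the citywide distribution.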

The dissimilarity index is not perfect, mainly because it depends on the sometimes arbitrary way in which cities are divided into districts or sections. This means that modifying city partitions can influence levels of “segregation”, which is not something we want. Take this extreme example. You can show the same city twice, with two different partitions, situations A and B. No one has moved residency between situations A and B, but the district boundaries have been altered radically. In situation A, with the districts drawn in a certain way, there is no segregation (dissimilarity index of 0). But in situation B, with the districts drawn differently, there is complete segregation (index = 1), although no one has physically moved. That’s why other, complementary measures are probably necessary for correct information about levels of segregation. Some of those measures are proposed here and here.
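The extreme example can be reproduced with a toy city. This is only a sketch: the four blocks, their populations and the two partitions are invented, and the index is computed with the usual two-group formula (half the summed absolute differences in district shares).

```python
# Hypothetical city of four blocks: block name -> (group X residents, group Y residents).
BLOCKS = {
    "north-west": (100, 0),
    "north-east": (100, 0),
    "south-west": (0, 100),
    "south-east": (0, 100),
}

# Two ways of drawing two districts over the same four blocks.
PARTITION_A = [["north-west", "south-west"], ["north-east", "south-east"]]
PARTITION_B = [["north-west", "north-east"], ["south-west", "south-east"]]


def dissimilarity_for(partition, blocks=BLOCKS):
    """Dissimilarity index of groups X and Y for a given district partition."""
    x_counts = [sum(blocks[name][0] for name in district) for district in partition]
    y_counts = [sum(blocks[name][1] for name in district) for district in partition]
    x_total, y_total = sum(x_counts), sum(y_counts)
    return 0.5 * sum(
        abs(x / x_total - y / y_total) for x, y in zip(x_counts, y_counts)
    )


print(dissimilarity_for(PARTITION_A))  # 0.0
print(dissimilarity_for(PARTITION_B))  # 1.0
```

Nobody moves between the two calls; only the district boundaries change, and the index jumps from 0 to 1.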

Measuring Human Rights (14): Numbers of Illegal Immigrants

Calculating a reliable number for a segment of the population that generally wants to hide from officials is very difficult, but it’s politically very important to know more or less how many illegal immigrants there are, and whether their number is increasing or decreasing. There’s a whole lot of populist rhetoric floating around, especially regarding jobs and crime, and passions are often inflamed. Knowing how many illegal immigrants there are – more or less – allows us to quantify the real effects on employment and crime, and to deflate some of the rhetoric.

Immigration is a human rights issue in several respects. Immigration is often a way for people to escape human rights violations (such as poverty or persecution). And upon arrival, immigrants – especially illegal immigrants – often face other human rights violations (invasion of privacy, searches, labor exploitation etc.). The native population may also fear – rightly or wrongly – that the presence of large groups of immigrants will lower their standard of living or threaten their physical security. Illegal immigrants especially are often accused of pulling down wages and labor conditions and of creating native unemployment. If we want to disprove such accusations, we need data on the numbers of immigrants.

So how do we count the number of illegal immigrants? Obviously there’s nothing in census data. The Census Bureau doesn’t ask people about their immigration status, in part because such questions may drive down overall response rates. Maybe in some cases the census data of other countries can help. Other countries may ask their residents how many family members have gone abroad to find a job.

Another possible source is the number of births included in hospital data. If you assume a certain number of births per resident, and compare that to the total number of births, you may be able to deduce the number of births among illegal immigrants (disparagingly called “anchor babies”), which in turn may give you an idea about the total number of illegal immigrants.
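The arithmetic behind this residual approach might look like the sketch below. Every number is invented, and the method needs one extra assumption that isn't spelled out above: a birth rate for the uncounted group itself. The function and parameter names are hypothetical.

```python
def estimate_uncounted_population(total_births, counted_residents,
                                  births_per_resident, births_per_uncounted):
    """Rough residual estimate of a population missing from official counts.

    births_per_resident and births_per_uncounted are assumed annual birth
    rates per person; both are guesses that drive the result.
    """
    expected_births = counted_residents * births_per_resident
    # Births that the counted population does not account for.
    excess_births = max(total_births - expected_births, 0.0)
    return excess_births / births_per_uncounted


# Purely illustrative inputs, not real data.
estimate = estimate_uncounted_population(
    total_births=14_000,
    counted_residents=1_000_000,
    births_per_resident=0.013,
    births_per_uncounted=0.020,
)
print(round(estimate))
```

Small changes in either assumed birth rate move the estimate a lot, which is one reason this source can only give a rough idea.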

Fluctuations in the amounts of remittances – money sent back home by immigrants – may also indicate trends in illegal immigration, although remittances are of course sent by both legal and illegal immigrants. Furthermore, a drop in remittances doesn’t necessarily mean that immigrants are leaving. It might just be a temporary drop following an economic recession, with immigrants deciding to sweat it out (possibly supported by reverse remittances for the time of the recession). Conversely, an increase in remittances may simply reflect technological improvements in international payment systems.

Perhaps a better indicator is the number of apprehensions by border-patrol units. However, fluctuations in this number may not be due to fluctuations in immigration. Better or worse performance by border-patrol officers or tighter border security may be the real reasons.

So, it’s really not easy to count illegal immigrants, and that means that all rhetoric about illegal immigration – both positive and negative – should be taken with a grain of salt.

More posts in this series are here.

Measuring Human Rights (13): When More Means Less and Vice Versa

Human rights violations can make it difficult to measure human rights violations, and can distort international comparisons of the levels of respect for human rights. Country A, which is generally open and accessible and on average respects basic rights such as speech, movement and press fairly well, may be more in the spotlight of human rights groups than country B which is borderline totalitarian. And not just more in the spotlight: attempts to quantify or measure respect for human rights may in fact yield a score that is worse for A than for B, or at least a score that isn’t much better for A than for B. The reason is of course the openness of A:

  • Human rights groups, researchers and statisticians can move and speak relatively freely in A.
  • The citizens of A aren’t scared shitless by their government and will speak to outsiders.
  • Country A may even have fostered a culture of public discourse, to some extent. Perhaps its citizens are also better educated and better able to analyze political conditions.
  • As Tocqueville has famously argued, the more a society liberates itself from inequalities, the harder it becomes to bear the remaining inequalities. Conversely, people in country B may not know better or may have adapted their ambitions to the rule of oppression. So, citizens of A may have better access to human rights groups to voice their complaints, aren’t afraid to do so, can do so because they are relatively well educated, and will do so because their circumstances seem more outrageous to them even if they really aren’t. Another reason to overestimate rights violations in A and underestimate them in B.
  • The government administration of A may also be more developed, which often means better data on living conditions. And better data allow for better human rights measurement. Data in country B may be secret or non-existent.

I called all this the catch 22 of human rights measurement: in order to measure whether countries respect human rights, you already need respect for human rights. Investigators or monitors must have some freedom to verify, to engage in fact finding, to enter countries and move around, to investigate “in situ”, to denounce etc., and victims should have the freedom to speak out and to organize themselves in pressure groups. So we assume what we want to establish. (A side-effect of this is that authoritarian leaders may also be unaware of the extent of suffering among their citizens.)

You can see the same problem in the common complaints that countries such as the U.S. and Israel get a raw deal from human rights groups:

[W]hy would the watchdogs neglect authoritarians? We asked both Human Rights Watch and Amnesty, and received similar replies. In some cases, staffers said, access to human rights victims in authoritarian countries was impossible, since the country’s borders were sealed or the repression was too harsh (think North Korea or Uzbekistan). In other instances, neglected countries were simply too small, poor, or unnewsworthy to inspire much media interest. With few journalists urgently demanding information about Niger, it made little sense to invest substantial reporting and advocacy resources there. … The watchdogs can and do seek to stimulate demand for information on the forgotten crises, but this is an expensive and high risk endeavor. (source)

So there may also be a problem with the supply and demand curve in media: human rights groups want to influence public opinion, but can only do so with the help of the media. If the media neglect certain countries or problems because they are deemed “unnewsworthy”, then human rights groups will not have an incentive to monitor those countries or problems. They know that what they will be able to tell will fall on deaf ears anyway. So better focus on the things and the countries which will be easier to channel through the media.

Both the catch 22 problem and the problems caused by media supply and demand can be empirically tested by comparing the intensity of attention given by human rights monitoring organizations to certain countries/problems to the intensity of human rights violations (the latter data are assumed to be available, which is a big assumption, but one could use very general measures such as these). It seems that both effects are present, but only weakly:

[W]e subjected the 1986-2000 Amnesty [International] data to a barrage of statistical tests. (Since Human Rights Watch’s early archival procedures seemed spotty, we did not include their data in our models.) Amnesty’s coverage, we found, was driven by multiple factors, but contrary to the dark rumors swirling through the blogosphere, we discovered no master variable at work. Most importantly, we found that the level of actual violations mattered. Statistically speaking, Amnesty reported more heavily on countries with greater levels of abuse. Size also mattered, but not as expected. Although population didn’t impact reporting much, bigger economies did receive more coverage, either because they carried more weight in global politics and economic affairs, or because their abundant social infrastructure produced more accounts of abuse. Finally, we found that countries already covered by the media also received more Amnesty attention. (source)

More posts in this series are here.

Measuring Poverty (12): The Experimental Method

The so-called experimental method of poverty measurement is akin to the subjective approach. Rather than measuring poverty on the basis of objective economic numbers about income or consumption, the experimental method uses people’s subjective evaluation of living standards and living conditions. But contrary to the usual subjective approach, its aim is not to ask people directly about what poverty means to them, about what they think is a reasonable minimum level of income or consumption or a maximum tolerable level of deprivation in certain specific areas (food, health, education etc.). Instead, it uses experiments to try to gather this information.

For example, you can set up a group of 20 people from widely different social backgrounds and some of them may suffer from different types of deprivation, or from no deprivation at all. The group receives a sum of money and has to decide how to spend it on poverty alleviation (within their test group or outside of the group). The decision as to who will receive which amount of funding targeted at which type of deprivation has to be made after deliberation and possibly even unanimously.

The advantage of this experimental approach, compared to simply asking individual survey respondents, is that you get a deliberated choice: people will think together about what poverty means, about which types of deprivation are most important and about the best way to intervene. It’s assumed that such a deliberated choice is better than an individual choice.

More posts in this series are here.

Measuring Poverty (11): The Subjective Approach

Usually, we measure poverty on the basis of objective numbers about income or consumption. Income or consumption levels are put on a continuum from lowest to highest and somewhere along the continuum we put a threshold that indicates the difference between poor and non-poor. For example, the Indian government uses a consumption threshold of 2,400 calories a day in rural areas and 2,100 in urban areas. The World Bank uses an income threshold of one dollar a day (corrected for purchasing power).

There are numerous disadvantages to these objective approaches. One is the inevitably arbitrary positioning of the threshold. One dollar a day, even after correction for purchasing power, means different things to different people in different areas, circumstances, groups etc. Calorie intake also means different things to different people, depending on people’s way of life etc. Moreover, income levels are notoriously difficult to measure (poor people in particular have a lot of informal income, e.g. “income” coming from all sorts of assistance from relatives etc.). Consumption as well is a difficult measure: it doesn’t necessarily have to mean just calorie intake for example. Poverty can mean a lack of non-food consumption. And if you focus on calorie levels after all, you’ll miss the issue of the quality of the food.

The third most common approach to poverty measurement also suffers from some disadvantages. This approach, also called the multidimensional approach, tries to assess to what extent people suffer from a series of different types of deprivation: do they have access to water, to electricity, are they literate, malnourished etc. Rather than purely quantitative, these measurements can be qualitative: a binary yes/no is often enough. Unfortunately, this measurement system also has some drawbacks: it fails to distinguish between deprivation and choice; there’s necessarily a level of arbitrariness in the determination of the “basic needs” or forms of deprivation that are measured; and these needs are often overly general, obscuring some very specific needs for some people in some areas or groups.

That’s why people have been searching for alternative measures of poverty. One such alternative is the use of surveys that ask people about poverty. You could ask people what they believe is “the smallest amount of money a family needs each week to get along in this community”, “what is the level of income below which families can’t make ends meet” etc. That would remove some of the arbitrariness of the cutoff line between poor and non-poor, and put that decision in the hands of the people rather than the scientists.

Or you could also present people with evocative descriptions of different family situations, of types of families according to their level of income or consumption or according to the type of deprivation. People would then have to decide for every family situation what they believe the standard of living is and which situation can be described as “poverty”. That would specify what poverty means to people. And what it means to people is much more important than what it means to researchers and statisticians.

A disadvantage of this subjective approach is the well-known effect that people’s income levels affect their judgments about income adequacy. In short, relatively rich people overestimate the level of income inadequacy. A solution to this problem could be to ask only poor respondents about poverty, on the reasonable assumption that poor people are the best experts on poverty. But that’s circular reasoning: you already think you know what poverty is before you start asking about it. Since you focus only on the poor, you’ve already decided what poverty is.

An advantage of the subjective approach is that researchers don’t have to list basic needs or types of deprivation in order to assess what poverty is; people tell you what poverty is. There’s also no need for researchers to specify regionally or socially undifferentiated and general cutoff levels of income or consumption below which people are considered to be poor.

Measuring Human Rights (12): Measuring Public Opinion on Torture

Measuring the number and gravity of cases of actual torture is extremely difficult, for obvious reasons. It takes place in secret, and the people subjected to torture are often in prison long afterwards, or don’t survive it. Either way, they can’t tell us.

That’s why people try to find other ways to measure torture. Asking the public when and under which circumstances they think torture is acceptable may give an approximation of the likelihood of torture, at least as long as we assume that in democratic countries governments will only engage in torture if there’s some level of public support for it. This approach won’t work in dictatorships, obviously, since public opinion in a dictatorship is often completely irrelevant.

However, measuring public opinion on torture has proven to be very difficult and misleading:

Many journalists and politicians believe that during the Bush administration, a majority of Americans supported torture if they were assured that it would prevent a terrorist attack. … But this view was a misperception … we show here that a majority of Americans were opposed to torture throughout the Bush presidency…even when respondents were asked about an imminent terrorist attack, even when enhanced interrogation techniques were not called torture, and even when Americans were assured that torture would work to get crucial information. Opposition to torture remained stable and consistent during the entire Bush presidency.

Gronke et al. attribute confusion of beliefs [among many journalists] to the so-called false consensus effect studied by cognitive psychologists, in which people tend to assume that others agree with them. For example: The 30% who say that torture can “sometimes” be justified believe that 62% of Americans do as well. (source)

Measuring Poverty (9): Absolute and Relative Poverty Lines

There are many ways you can measure how many people in a country are poor. Quite common is the use of a so-called poverty line. First you decide what you mean by poverty – for instance an income that’s insufficient to buy life’s necessities, or an income that’s less than half the average income etc. Then you calculate your poverty line – for instance the amount of income someone needs in order to buy necessities, or the income that’s half the average income, or the income of the person who has the tenth lowest income if the population was one hundred etc. And then you just select the people who are under this poverty line.

I intentionally chose these examples to make a point about absolute and relative poverty. In the U.S., people mostly use an absolute poverty line, whereas in Europe relative poverty lines are used as well. As is clear from the examples above, an absolute poverty line is a threshold, usually expressed in terms of income that is sufficient for basic needs, that is fixed over time in real terms. In other words, it’s adjusted for inflation only and doesn’t move with economic growth, average income, changes in living standards or needs.

A relative poverty line, on the other hand, varies with income growth or economic growth, usually 1-to-1 since it’s commonly expressed as a fixed percentage of average or median income. (It obviously can have an elasticity of less than 1 since you may want to avoid a disproportionate impact on the poverty line of very high and very volatile incomes. I’ve never heard of an elasticity of more than 1).
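The difference between the two kinds of line is easy to see with a small calculation. What follows is a rough sketch under stated assumptions: the ten incomes and the fixed line of 12,000 are invented, the relative line is set at 50% of median income, and "growth" is a uniform 50% real increase for everyone.

```python
from statistics import median


def headcount_rate(incomes, line):
    """Share of people whose income is strictly below the poverty line."""
    return sum(1 for income in incomes if income < line) / len(incomes)


def relative_line(incomes, fraction=0.5):
    """Relative poverty line as a fixed fraction of median income."""
    return fraction * median(incomes)


ABSOLUTE_LINE = 12_000  # hypothetical line, fixed in real terms

incomes_then = [8_000, 10_000, 15_000, 22_000, 30_000,
                40_000, 45_000, 50_000, 60_000, 120_000]
incomes_now = [income * 1.5 for income in incomes_then]  # uniform real growth

for label, incomes in (("then", incomes_then), ("now", incomes_now)):
    print(label,
          "absolute:", headcount_rate(incomes, ABSOLUTE_LINE),
          "relative:", headcount_rate(incomes, relative_line(incomes)))
```

With these made-up numbers, growth wipes out absolute poverty (0.2 to 0.0) while relative poverty stays at 0.3, because the relative line rises 1-to-1 with the median.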

Both absolute and relative poverty lines can be criticized. Does an absolute poverty line make sense when we know that expectations change, that basic needs change (in contemporary Western societies, not having a car, a phone or a bank account can lead to poverty), and that the things that you need to fully participate in society are a lot different now than they once were? We know that people’s well-being does not only depend on the avoidance of absolute deprivation but also on comparisons with others. The average standard of living defines people’s expectations and when they are unable to reach the average, they feel excluded, powerless and resentful. Can people who fail to realize their own expectations, who lose their self-esteem, and who feel excluded and marginalized be called “poor”? Probably yes. They are, in a sense, deprived. It all depends on which definition of poverty we can agree on.

It seems that people do think about poverty in this relative sense. If you compare the (rarely used) relative poverty line of 50% of median income in the U.S. with the so-called subjective poverty lines that result from regular Gallup polls asking Americans “how much they would need to get along”, you’ll see that the lines correspond quite well.

So if relative poverty corresponds to common sense, it seems to be a good measure. On the other hand, a relative poverty line means moving the goal posts for all eternity. We’ll never vanquish relative poverty since this type of poverty just moves as incomes rise. It’s even the case that relative poverty can increase as absolute poverty decreases, namely when there’s strong economic growth (i.e. strong average income growth) combined with widening income inequality (something we’ve seen for example in the U.S. during the last decades). (Technically, if you use the median earner as the benchmark, relative poverty can disappear if all earners who are below the median earner move towards the median and earn just $1 or so less than the median. But in practice I don’t see that happening).
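That relative poverty can rise while absolute poverty falls may sound paradoxical, so here is a small numerical sketch. All incomes are invented: in the second period the bottom four incomes have grown by 20% and the top six have doubled. The absolute line is a hypothetical 10,000 and the relative line is 50% of the median.

```python
from statistics import median


def poverty_rate(incomes, line):
    """Share of people with an income strictly below the line."""
    return sum(1 for income in incomes if income < line) / len(incomes)


ABSOLUTE_LINE = 10_000  # hypothetical, fixed in real terms

# Invented income distributions for the same ten people.
before = [9_000, 11_000, 14_000, 20_000, 30_000,
          40_000, 50_000, 60_000, 80_000, 100_000]
after = [10_800, 13_200, 16_800, 24_000, 60_000,
         80_000, 100_000, 120_000, 160_000, 200_000]

for label, incomes in (("before", before), ("after", after)):
    relative_line = 0.5 * median(incomes)
    print(label,
          "absolute:", poverty_rate(incomes, ABSOLUTE_LINE),
          "relative:", poverty_rate(incomes, relative_line))
```

Everyone is better off in the second period, so absolute poverty drops from 0.1 to 0.0. But the median has pulled away from the bottom, so relative poverty rises from 0.3 to 0.4.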

Lies, Damned Lies, and Statistics (31): Common Problems in Opinion Polls

Opinion polls or surveys are very useful tools in human rights measurement. We can use them to measure public opinion on certain human rights violations, such as torture or gender discrimination. High levels of public approval of such rights violations may make them more common and more difficult to stop. And surveys can measure what governments don’t want to measure. Since we can’t trust oppressive governments to give accurate data on their own human rights record, surveys may fill in the blanks. Although even that won’t work if the government is so utterly totalitarian that it doesn’t allow private or international polling of its citizens, or if it has scared its citizens to such an extent that they won’t participate honestly in anonymous surveys.

But apart from physical access and respondent honesty in the most dictatorial regimes, polling in general is vulnerable to mistakes and fraud (fraud being a conscious mistake). Here’s an overview of the issues that can mess up public opinion surveys, inadvertently or not.

Wording effect

There’s the well-known problem of question wording, which I’ve discussed in detail before. Pollsters should avoid leading questions, questions that are put in such a way that they pressure people to give a certain answer, questions that are confusing or easily misinterpreted, wordy questions, questions using jargon, abbreviations or difficult terms, double or triple questions etc. Also quite common are “silly questions”, questions that don’t have meaningful or clear answers: for example “is the Catholic Church a force for good in the world?” What on earth can you answer to that? Depends on what elements of the church you’re talking about, what circumstances, country or even historical period you’re asking about. The answer is most likely “yes and no”, and hence useless.

The importance of wording is illustrated by the often substantial effects of small modifications in survey questions. Even the replacement of a single word by another, related word, can radically change survey results.

Of course, it is often claimed that biased poll questions corrupt the average survey responses, but that the overall results of the survey can still be used to learn about time trends and differences between groups. As long as you make a mistake consistently, you may still find something useful. That’s true, but no reason not to take care of wording. The same trends and differences can be seen in survey results that have been produced with correctly worded questions.

Order effect or contamination effect

Answers to questions depend on the order they’re asked in, and especially on the questions that preceded. Here’s an example:

Fox News yesterday came out with a poll that suggested that just 33 percent of registered voters favor the Democrats’ health care reform package, versus 55 percent opposed. … The Fox News numbers on health care, however, have consistently been worse for Democrats than those shown by other pollsters. (source)

The problem is not the framing of the question. This was the question: “Based on what you know about the health care reform legislation being considered right now, do you favor or oppose the plan?” Nothing wrong with that.

So how can Fox News ask a seemingly unbiased question of a seemingly unbiased sample and come up with what seems to be a biased result? The answer may have to do with the questions Fox asks before the question on health care. … the health care questions weren’t asked separately. Instead, they were questions #27-35 of their larger, national poll. … And what were some of those questions? Here are a few: … Do you think President Obama apologizes too much to the rest of the world for past U.S. policies? Do you think the Obama administration is proposing more government spending than American taxpayers can afford, or not? Do you think the size of the national debt is so large it is hurting the future of the country? … These questions run the gamut from slightly leading to full-frontal Republican talking points. … A respondent who hears these questions, particularly the series of questions on the national debt, is going to be primed to react somewhat unfavorably to the mention of another big Democratic spending program like health care. And evidently, an unusually high number of them do. … when you ask biased questions first, they are infectious, potentially poisoning everything that comes below. (source)

If you want to avoid this mistake – if we can call it that (since in this case it’s quite likely to have been a “conscious mistake” aka fraud) – randomizing the question order for each respondent might help.
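Randomizing the order per respondent is simple to do in practice. The sketch below is one possible way, not a description of how any pollster actually works: the question labels are placeholders, and seeding the generator with a respondent ID is my own assumption, made so that each respondent's order can be reconstructed later when analyzing the answers.

```python
import random

# Placeholder question labels, not taken from any real poll.
QUESTIONS = [
    "Q1: national debt",
    "Q2: government spending",
    "Q3: foreign policy",
    "Q4: health care reform",
]


def questionnaire_for(respondent_id, questions=QUESTIONS, seed="poll-2009"):
    """Return the questions in a random order that is stable per respondent."""
    # A dedicated generator keeps the shuffle reproducible and independent
    # of any other use of the random module.
    rng = random.Random(f"{seed}-{respondent_id}")
    order = list(questions)  # copy, so the master list stays untouched
    rng.shuffle(order)
    return order


for respondent_id in (1, 2, 3):
    print(respondent_id, questionnaire_for(respondent_id))
```

Across many respondents, any priming caused by a particular preceding question is then spread roughly evenly over all the other questions instead of always hitting the same one.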

Similar to the order effect is the effect created by follow-up questions. It’s well-known that follow-up questions of the type “but what if…” or “would you change your mind if …” change the answers to the initial questions.

Bradley effect

The Bradley effect is a theory proposed to explain observed discrepancies between voter opinion polls and election outcomes in some U.S. government elections where a white candidate and a non-white candidate run against each other.

Contrary to the wording and order effects, this isn’t an effect created – intentionally or not – by the pollster, but by the respondents. The theory proposes that some voters tend to tell pollsters that they are undecided or likely to vote for a black candidate, and yet, on election day, vote for the white opponent. It was named after Los Angeles Mayor Tom Bradley, an African-American who lost the 1982 California governor’s race despite being ahead in voter polls going into the elections.

The probable cause of this effect is the phenomenon of social desirability bias. Some white respondents may give a certain answer for fear that, by stating their true preference, they will open themselves to criticism of racial motivation. They may feel under pressure to provide a politically correct answer. The existence of the effect is, however, disputed. (Some say the election of Obama disproves the effect, thereby making another statistical mistake).

Fatigue effect

Another effect created by the respondents rather than the pollsters is the fatigue effect. As respondents grow increasingly tired over the course of long interviews, the accuracy of their responses could decrease. They may be able to find shortcuts to shorten the interview; they may figure out a pattern (for example that only positive or only negative answers trigger follow-up questions). Or they may just give up halfway, causing incompletion bias.

However, this effect isn’t entirely due to respondents. Survey design can be at fault as well: there may be repetitive questioning (sometimes deliberately for control purposes), the survey may be too long or longer than initially promised, or the pollster may want to make his life easier and group different polls into one (which is what seems to have happened in the Fox poll mentioned above, creating an order effect – but that’s the charitable view of course). Fatigue effect may also be caused by a pollster interviewing people who don’t care much about the topic.

Sampling effect

Ideally, the sample of people who are to be interviewed for a survey should represent a fully random subset of the entire population. That means that every person in the population should have an equal chance of being included in the sample. It also means that there shouldn’t be self-selection (a typical flaw in many if not all internet surveys of the “Polldaddy” variety) or self-deselection. Self-selection reduces the randomness of the sample, which can be seen from the fact that it leads to polarized results. The size of the sample is also important. Samples that are too small typically produce unreliable results, because the margin of error grows as the sample shrinks.
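The effect of sample size can be made concrete with the textbook margin of error for a proportion. This is a sketch under strong assumptions: a truly random sample, a 95% confidence level (z = 1.96), and the worst case of a 50/50 split. It says nothing about self-selection, which no sample size can repair.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


for n in (100, 400, 1_000, 2_500):
    print(n, f"+/- {100 * margin_of_error(n):.1f} percentage points")
```

Going from 100 to 400 respondents halves the margin of error (from about 9.8 to about 4.9 points); quadrupling the sample again is needed to halve it once more.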

Even the determination of the total population from which the sample is taken can lead to biased results. And yes, that has to be determined… For example, do we include inmates, illegal immigrants etc. in the population? See here for some examples of the consequences of such choices.

House effect

A house effect occurs when there are systematic differences in the way that a particular pollster’s surveys tend to lean toward one or the other party’s candidates; Rasmussen is known for that.

I probably forgot an effect or two. Fill in the blanks if you care. Go here for other posts in this series.

Measuring Poverty (8): Deep Poverty

Most systems for measuring poverty use a so-called poverty line or poverty threshold (that’s the case in the U.S. for instance, or in the UN’s Millennium Development Goals). That’s a level of income (or consumption etc.) which is considered to be the minimum that’s necessary for a decent human life and for the satisfaction of basic needs. These systems are also called “headcount” measures of poverty: they simply count how many people fall below the fixed point that determines the difference between poverty and non-poverty.

You can see the problem coming: according to these systems, you’re either poor or you ain’t. They just tell us how many people are poor, not how poor they actually are. This is a big problem in developed countries that use such poverty measurement systems. The poverty lines in those countries are rather high in dollar terms. For example, the thresholds in the U.S. are, as of 2008:

  • One adult: $11,200 annual income, not including the EITC or non-cash benefits (Food Stamps, Medicaid, housing assistance, employer health-insurance contributions, etc.), and including taxes
  • Two adults: $14,400
  • One adult, two kids: $17,300
  • Two adults, two kids: $21,800.

By “rather high” I don’t mean to say that the people under those poverty lines aren’t really poor and that the U.S. measurement system is too generous (if anything, it’s the contrary). What I want to say is that in developed countries, people need a substantial income in order to escape poverty. If you want a job, you’ll probably need a car, a phone, internet connection, child care etc. If you want a place to live, you’ll need to spend a huge amount of money on a house, and so on. Poverty lines in developed countries are therefore not so low that being poor means being on the brink of starvation. They are set at such a level that being poor means being unable to afford a job, quality housing, healthcare and education.

Given the fact that poverty rates are rather high, there’s a lot of space below them. Hence, you have different kinds of poor people: there are those who have a job and an income, albeit a rather low one, but who struggle to survive because of their expenses; and there are those who just live on the street. You have people who are poor for some years and people who are poor their entire lives.

All these people are equally poor in the measurement system we’re discussing here. This system doesn’t provide data on the distance from the poverty line, or, in other words, on the depth of poverty. In the worst case, people who are already poor according to the system could become much poorer, without any change in the headcount of poverty. If the 13% or so of Americans who are currently under the poverty line all became homeless beggars, you wouldn’t see a change in U.S. poverty statistics.

In order to solve this problem, people have come up with the concept of the poverty gap (incidence * depth of poverty): the mean distance separating the population from the poverty line (with the non-poor being given a distance of zero), expressed as a percentage of the poverty line (see also here). Unfortunately, this hasn’t become a very popular number. It’s probably too complicated. A clear and simple poverty line is much more appealing, even though it’s deficient.
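To make the difference between the headcount and the poverty gap concrete, here is a minimal Python sketch. The function names, the poverty line of 10,000 and the four incomes are made up for illustration; they are not official figures.

```python
def headcount_ratio(incomes, poverty_line):
    """Share of people whose income is below the poverty line."""
    poor = sum(1 for income in incomes if income < poverty_line)
    return poor / len(incomes)


def poverty_gap_index(incomes, poverty_line):
    """Mean shortfall from the poverty line, as a fraction of the line.

    The non-poor are given a shortfall of zero.
    """
    shortfalls = [max(poverty_line - income, 0) for income in incomes]
    return sum(shortfalls) / (len(incomes) * poverty_line)


line = 10_000                            # hypothetical poverty line
before = [9_000, 9_500, 20_000, 30_000]  # two people just under the line
after = [1_000, 2_000, 20_000, 30_000]   # the same two people, now much poorer

print(headcount_ratio(before, line), headcount_ratio(after, line))
print(poverty_gap_index(before, line), poverty_gap_index(after, line))
```

With these made-up numbers the headcount is the same in both cases (half of the population), while the poverty gap index rises sharply once the two poor people fall further below the line.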

More posts on the problems of poverty measurement are here.

Measuring Poverty (7): Different Types of Poverty

I already mentioned the obvious but consequential fact that poverty measurement depends on the choice of the type of poverty you want to measure. Definitional issues are always important, but when it comes to poverty the choice of a definition determines who will receive government benefits and who won’t. For example, in the U.S. you’re poor when your income is below a certain poverty line. If that’s the case, you’re eligible for certain benefits. So poverty is a function of income.

1. Insufficient income

Usually, and not only in the U.S., poverty is indeed understood as insufficient income (preferably post-tax and post-benefits). Measuring poverty in this case means

  1. determining a sufficient level of income (sufficient for a decent human life); this is usually called a “poverty line” or “poverty rate”
  2. measuring actual income
  3. counting the number of people who have less income than the sufficient level.

There are some problems with this measurement system or this choice of type of poverty. Actual income levels are notoriously difficult to measure. People have a lot of informal income which they will not disclose to people doing a survey. Likewise, there is tax evasion and income in kind (market based or from government benefits, e.g. social housing), and material or immaterial support by local social networks. None of this is included correctly if at all in income measurement, leading to an overestimate of poverty. Another disadvantage of income based measurements: they neglect people’s ability to borrow or to draw from savings in periods of lower income. Again, this overestimates poverty (although one could say that it just estimates it a bit too early, since borrowing and eating up savings can lead to future poverty).

2. Insufficient consumption

Because of these problems, some countries define poverty, not by income levels, but by consumption levels. Measuring poverty in this case means

  1. determining a sufficient level of consumption (sufficient for a decent human life)
  2. measuring actual consumption
  3. counting the number of people who consume less than the sufficient level.

However, this measurement isn’t without problems either. As is the case for income levels, actual consumption levels are difficult to measure. How much do people actually consume? And what does it mean “to consume”? Is it calorie intake? Is it financial expenses? Or something else perhaps? Consumption levels are also deceiving: people tend to smooth their consumption over time, even more so than their income. If they face a financial crisis because of unemployment, bad health, drought etc. they will sell some of their assets (their house for instance) or take a loan. If you determine whether someone is poor on the basis of consumption levels, you won’t consider people dealing with a crisis as being poor because they continue to consume at the same levels. However, because of loans or the sale of assets, they are likely to face poverty in the future. They may also shift their diet to low-quality food, taking in the same amount of calories but risking their health and hence their future income. Similarly, they may be forced by their crisis situation to delay health expenditures in order to smooth consumption, with the same long-term results.

And even if you manage somehow to measure consumption, you’re still faced with the problem of the threshold of sufficient consumption: that’s hard to determine as well. Consumption needs differ from person to person, depending on age, gender, occupation, climate etc.

3. Direct physical measures of real consumption

Rather than trying to measure total income or consumption, you can choose to measure consumption of certain specific physical items, and combine that with some easy to measure elements of standard of living, such as child mortality or education levels. It’s possible to argue that poverty isn’t an insufficient level of overall income or consumption, but instead the absence of certain specific consumption articles. People are poor if they don’t have a bicycle or a car, a solid floor, a phone etc. Or when their children die, can’t go to school or are undernourished. These items or indicators are relatively easy to measure (for example, there’s the Demographic and Health Survey). While they may not tell us a lot about relative living standards in developed countries (where few children die from preventable diseases for instance), they do provide poverty indicators in developing countries.

The OECD has done a lot of good work on this. They call it “measuring material deprivation”. It’s the same assumption: there are certain consumer goods and certain elements of living standard that are universally considered important elements of a decent life. The OECD tries to measure ownership of these goods or occurrence of these elements, and when people report several types of deprivation at the same time, they are considered to be poor.
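A deprivation count of this kind can be sketched in a few lines of Python. The list of items, the threshold of two deprivations and the choice to treat a missing answer as a deprivation are all assumptions made for the example; this is not the OECD’s actual method.

```python
# Hypothetical checklist; a real survey would define its own items.
ITEMS = ["solid_floor", "phone", "bicycle_or_car", "children_in_school"]


def count_deprivations(answers, items=ITEMS):
    """Count the items for which the household answered "no" (False).

    A missing answer is treated as a deprivation (an assumption).
    """
    return sum(1 for item in items if not answers.get(item, False))


def is_materially_deprived(answers, threshold=2, items=ITEMS):
    """Poor if the household reports at least `threshold` deprivations."""
    return count_deprivations(answers, items) >= threshold


household_a = {"solid_floor": True, "phone": True,
               "bicycle_or_car": False, "children_in_school": True}
household_b = {"solid_floor": False, "phone": False,
               "bicycle_or_car": False, "children_in_school": True}

print(is_materially_deprived(household_a))  # one deprivation
print(is_materially_deprived(household_b))  # three deprivations
```

Household A lacks only one item and stays under the assumed threshold; household B lacks three and is counted as poor.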

Take note that we’re not talking about monetary measures here, contrary to income and overall levels of consumption. Sometimes, all that has to be measured is a “yes” or a “no”. Which of course makes it easier.

Unfortunately, not easy enough. This type of poverty measurement has its own drawbacks. Measures of material deprivation often fail to distinguish between real deprivation and the results of personal choices and lifestyles. Some people can’t have a decent life without a car or a solid floor; others voluntarily choose not to have those goods. It’s likely that only the former are “poor”. Furthermore, since these measurements are often based on surveys, there are some survey related problems. The really poor may be systematically excluded from the survey because we can’t find them (e.g. the homeless). These surveys measure self-reported poverty, and self-reported poverty can be affected by low aspirations or habit. People may also be ashamed about their poverty and hence not report it correctly.

Conclusion

There isn’t a perfect system for poverty measurement. And that has a lot to do with the fact that poverty is an inherently vague concept. It really shouldn’t be a surprise that people choose different definitions and types, and hence different measurement methods that all provide different data. There’s no “correct” definition of poverty, and hence no correct poverty measure.

More posts in this series on the difficulties of poverty measurement.

Measuring Equality of Opportunity

People on opposite sides of political debates often agree on very little, but they do agree on the importance of equality of opportunity. There is almost universal agreement that people should have at least a starting position that guarantees an equal chance of success in whatever life projects one chooses, for those willing to invest an equal amount of effort. More specifically, equality of opportunity is often defined as an equal likelihood of success for all at age 18 (in order to factor in possible inequalities of opportunity determined by education).

Equality of opportunity is by definition an impossible goal. The lottery of birth means more than being unable to choose to be born in a wealthy family with caring parents who can finance your education and motivate you to achieve your goals. It also means that you can’t choose which talents and genes you are born with. Genetic differences are no more a matter of choice than the character and means of your parents. And genetic differences affect people’s talents, skills and maybe even their capacity to invest effort. So, as long as we can’t redistribute beneficial genes or disable harmful ones, and as long as we don’t want to intervene in people’s families and redistribute children, we can’t remove the impact of genes and parents.

However, we can do something. Equality of opportunity may be impossible but there is less or more inequality of opportunity. Our concern should be to provide as much equality of opportunity as possible, and to expand opportunity for those who are relatively less privileged. This means removing things that hold some people back (e.g. discrimination, unemployment, bad schools etc.), and – more positively – helping people to cultivate their capabilities and expand their choices.

How do we measure whether these interventions are successful? It seems very difficult to measure equality of opportunity. All we can do is measure some of the elements of opportunity:

  • We can measure unequal income and infer unequal opportunity from this. People with low income obviously have fewer opportunities than other people. However, not all opportunities can be bought and maybe low income isn’t the result of a disadvantaged upbringing or bad schools, but of bad choices, or even conscious choices. Can we say that a child of a millionaire who chose to be a hermit suffered from unequal opportunity? I don’t think so.
  • We can measure unequal education and skills (educational attainment or degrees, IQ tests etc.). However, someone who comes from a very privileged family but with low or alternative aspirations may score low on educational attainment or even IQ.
  • We can deduce unequal opportunity by the absence of opportunity enhancing government policies and legislation. The Civil Rights Act was self-evidently a boost for the opportunities of African-Americans.
  • We can measure social mobility and assume, correctly I think, that very low levels of mobility indicate inequality of opportunity.
  • Etc.

Whatever actions we take to enhance opportunity, it will probably always be relatively unclear what the net outcome will be on overall equality of opportunity. Of course, that doesn’t mean we shouldn’t do anything. And when we do something, we should also distinguish clearly between things we can do and things we can’t do, or things we feel are immoral (e.g. genetic redistribution or child redistribution). We know that parental attitudes, genetics, talent, appearance, networks and luck have a huge impact on individuals’ chances of success, but those are things we can’t do anything about, either because it’s impossible or because it’s immoral. But we can teach people skills and perseverance, to a certain extent. We can help the unlucky, for example with unemployment benefits. We can regulate firms’ employment policies so as to counteract the “old boys networks” or racism in employment decisions. We can impose an inheritance tax in order to limit the effects of the lottery of family. Etc.

Measuring Poverty (6): The Poverty Line in the U.S.

The poverty rate or poverty line in the U.S. is based on a system pioneered by Mollie Orshansky in 1963. In the 1960s, the average US family spent one third of its income on food. The poverty line was calculated by valuing an “emergency food” budget for a family, and then multiplying that number by 3. (Some more data here).

This results in a specific dollar amount that varies by family size but is the same across the U.S. (the amounts are adjusted for inflation annually). To determine who is poor, actual family income is then compared to these amounts. Obviously, if you’re under, you’re poor.
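The arithmetic behind this system is simple enough to write down. Here is a minimal sketch in Python; the food budgets per family size are hypothetical numbers chosen for the example, not the actual historical budgets.

```python
FOOD_MULTIPLIER = 3  # food was about one third of income, so line = food budget * 3


def poverty_line(annual_food_budget):
    """Poverty line as the emergency food budget times three."""
    return annual_food_budget * FOOD_MULTIPLIER


def is_poor(family_income, annual_food_budget):
    """A family is poor if its income is under the line for its size."""
    return family_income < poverty_line(annual_food_budget)


# Hypothetical emergency food budgets by family size (not real figures).
food_budget_by_size = {1: 3_000, 2: 4_500, 4: 7_000}

for size, budget in food_budget_by_size.items():
    print(size, poverty_line(budget))

print(is_poor(20_000, food_budget_by_size[4]))
```

Note that nothing in this calculation depends on where the family lives or on what it spends on housing, health care or child care, which is where the problems discussed next come from.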

Amazingly, this system hasn’t changed a lot since the 1960s, yet it suffers from a series of measurement problems, resulting in either an over- or underestimation of the number of families living in poverty. The problems are situated both in the calculation of the poverty rates and in the calculation of the income that is subsequently compared to the rates:

  • Obviously, the system should take regional differences in the cost of living, especially in housing, into account. It doesn’t.
  • As already apparent from the image above, a family today spends relatively less on food and more on housing, health care and child care etc. yet the poverty line is still dollars for emergency food times 3. So the question is: should the system take today’s spending patterns into account? We would have to know which it is: 1) Either the increased spending on non-food items has occurred because people can now afford to spend more on such items. 2) Or the increased spending on non-food items has occurred because these items got disproportionately more expensive (housing for instance) or because there wasn’t really any need to buy those items in the old days. Only if 2) is the case should that have an influence on the poverty line. And I think that to some extent it is the case. Child care for instance has become a necessity. In the 1960s, many mothers didn’t go out and work. Now they do, and therefore they have to pay for child care. Those payments should be deducted from income when measuring disposable income and comparing it to the poverty line. The same is true for cars or phones. Today you can’t really have a job without them so they’re no longer luxuries. A society would show very little ambition if it continued to designate the poor as those who have to wash by hand, read with candlelight, and shit in a hole in the floor. In fact, what I’m advocating here is some kind of relative concept of poverty. I’ll come back to that later. All I can tell you now is that this isn’t without complications either.
  • The current poverty measurement doesn’t take into account disproportionate price rises (it merely adjusts for general inflation) and changing needs. An obvious improvement of the U.S. measurement system would be to adjust for exceptional price evolutions (such as for housing) and also to revisit the definitions of basic needs and luxuries. Hence, a better poverty measurement should subtract from income some work-related expenses, child care expenses, and perhaps also some health expenses to the extent that these have become disproportionately more expensive. But that’s not easy:

There is considerable disagreement on the best way to incorporate medical care in a measure of poverty, even though medical costs have great implications for poverty rates. But costs differ greatly depending upon personal health, preferences, and age, and family costs may be very different from year to year, making it hard to determine what exactly should be counted. Subtracting out-of-pocket costs from income is one imperfect approach, but if someone’s expenses are low because they are denied care, then they would usually be considered worse off, not better off. (source)

  • Another problem: the current poverty rate doesn’t take all welfare benefits into account. Income from cash welfare programs counts, but the value of non-cash benefits such as food stamps, school lunches and public housing doesn’t (because such benefits weren’t very common in the 1960s). Those benefits successfully raise the standard of living for poverty-stricken individuals. There’s a bit of circular reasoning going on here, because the poverty rate is used, among other things, to decide who gets benefits, so benefits should not be included. But if you want to know how many people are actually poor, you should consider benefits as well because benefits lift many out of poverty.
  • The poverty measure doesn’t include some forms of interest on savings or property such as housing.
  • The poverty measure doesn’t take taxes into account, largely because they didn’t affect the poor very much in the 1960s. Income is counted before subtracting payroll, income, and other taxes, overstating income for some families. On the other hand, the federal Earned Income Tax Credit isn’t counted either, underestimating income for other families.
  • And there’s also a problem counting the effects of cohabitation and co-residency, overestimating poverty because overestimating expenses.

Because the poverty measurement disregards non-cash benefits and certain tax credits, it fails to serve its purpose. Poverty measurement is done in order to measure progress and to look at the effects of anti-poverty policies. Two of those policies – non-cash benefits and certain tax credits – aren’t counted, even though they reduce poverty. So we have a poverty statistic that can’t measure the impact of anti-poverty policy… That’s like measuring road safety without looking at the number of accidents avoided by government investment in safety. Since the 1970s, the U.S. government implemented a number of policies that increased spending for the poor, but the effects of this spending were invisible in the poverty statistics.

This had a perverse effect: certain politicians now found it easy to claim that spending on the poor was ineffective and a waste of money. It’s no coincidence that trickle down economics became so popular in the 1980s. The poverty measurement, rather than helping the government become more effective in its struggle against poverty, has led to policies that reduced benefits. Of course, I’m not saying that poverty reduction is just a matter of government benefits, or that benefits can’t have adverse effects. Read more here.

Fortunately, the US Census Bureau has taken these criticisms to heart and has been working on an alternative measure that counts food stamps and other government support as income, while also accounting for child-care costs, geographic differences etc. First results show that the number of poor is higher according to the new measurement system (it adds about 3 million people). For some reason, I think the old system still has some life in it.
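To see why the two approaches can classify the same family differently, consider this rough Python sketch. It is not the Census Bureau’s actual formula: the function names and the family figures are invented, and for simplicity both measures are compared to the same line ($21,800, the 2008 threshold for two adults and two kids mentioned in an earlier post), whereas a real alternative measure would also adjust the line itself.

```python
POVERTY_LINE = 21_800  # two adults, two kids (2008 threshold quoted earlier)


def official_income(cash_income_before_tax):
    """Old system: pre-tax cash income only."""
    return cash_income_before_tax


def adjusted_resources(cash_income_before_tax, taxes, tax_credits,
                       noncash_benefits, child_care, medical_out_of_pocket):
    """Alternative: add credits and non-cash benefits, subtract taxes and
    necessary expenses (an illustrative simplification)."""
    return (cash_income_before_tax - taxes + tax_credits + noncash_benefits
            - child_care - medical_out_of_pocket)


# Two hypothetical families.
family_a = dict(cash_income_before_tax=20_000, taxes=1_500, tax_credits=4_000,
                noncash_benefits=3_000, child_care=0, medical_out_of_pocket=500)
family_b = dict(cash_income_before_tax=24_000, taxes=2_000, tax_credits=0,
                noncash_benefits=0, child_care=3_000, medical_out_of_pocket=2_000)

for name, fam in (("A", family_a), ("B", family_b)):
    old = official_income(fam["cash_income_before_tax"]) < POVERTY_LINE
    new = adjusted_resources(**fam) < POVERTY_LINE
    print(name, "poor (old):", old, "poor (new):", new)
```

Family A is poor under the old system but not once credits and food stamps are counted; family B is the reverse, because taxes, child care and medical costs eat up its income. Whether the total number of poor goes up or down depends on which of the two groups is larger.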

Some details of the new measurement:

when you account for the Earned Income Tax Credit the poverty rate goes down by two points. Accounting for SNAP (food stamps) lowers the poverty rate about 1.5 points. … when you account for the rise in Medical Out of Pocket costs, the poverty rate goes up by more than three points. (source)

More posts about problems with poverty measurement are here.

Measuring Poverty (4): The Problem of the Definition of Poverty

Before you can start to measure poverty, you first have to decide what you actually want to measure. What is poverty? That’s not just a philosophical problem because depending on the definition of poverty you use, your measurements will be radically different (even with an identical definition, measurements will be different because of different measurement methods).

Among people who measure poverty, roughly 6 different definitions of poverty are used:

  • insufficient income
  • insufficient consumption spending
  • insufficient calorie intake
  • food consumption spending above a certain share of total spending
  • certain health indicators such as stunting, malnutrition, infant mortality rates or life expectancy
  • certain education indicators such as illiteracy.

None of these definitions is ideal, although the first and second on the list are the most widely used. A few words about the advantages and disadvantages of each.

Income

Advantages:

In developed countries, income is a common definition because it’s easy to measure. Most people in developed countries earn a salary or get their income from sources that are easy to estimate (interest payments, the value of houses, stock market returns etc.). They don’t depend for their income on the climate, crop yields etc. Moreover, developed countries have good tax data which can be used to calculate incomes.

Disadvantages:

In developing countries, however, income data tend to be underestimated because it’s difficult to value the income of farmers and shepherds. Farmers’ incomes fluctuate heavily with climate conditions, crop yields etc. If you ask them one day what their income is, there’s no guarantee that this is a good estimate of their yearly income.

Another disadvantage is that people are generally reluctant to disclose their full income. Some income may have been hidden from the tax administration or may have been earned from illegal activity such as corruption, smuggling, drug trade, prostitution, theft etc. For this reason, using income to estimate poverty means overestimating it.

And, finally, some income may be difficult to calculate (e.g. rising value of livestock).

Consumption

Advantages:

The main advantage of using consumption rather than income to measure poverty is that consumption is much more stable over the year and over a lifetime (see above). Hence, if you ask people about the level of their consumption, they can just tell you about their current situation, without having to go back in time or to predict the future – which they would have to do if you asked them about income. Their current consumption is likely to be representative of their long term consumption, which isn’t the case for income. This is even more true in the case of farmers who depend on the weather for their income and hence have a more volatile income. If you know that farmers are often relatively poor, then this issue is all the more salient for poverty measurement.

Another advantage of using consumption is that people aren’t as reticent to talk about it as they are about certain parts of their income. It also appears that people tend to remember their spending better than their income.

Disadvantages:

If you want to measure how much people consume, you have to include durable goods and housing. And consumption of those goods is difficult to measure because it’s difficult to value them. For example, if a household owns a house, you have to estimate what it would cost to rent that particular house and add this to the total consumption of that household, at least if you want to compare their consumption to the consumption of the household next door who has to rent its house. And you can’t make poverty statistics if you don’t make such comparisons. Then you have to do the same for cars etc.
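The house example can be written out as a small Python sketch. The function name and the amounts are hypothetical; the point is only to show what goes wrong when the owner’s housing consumption is left out.

```python
def total_consumption(spending_on_goods, owns_home, rent_paid=0, imputed_rent=0):
    """Annual consumption including housing.

    Renters: spending on goods plus the rent they actually pay.
    Owners: spending on goods plus the estimated rent for their own house.
    """
    housing = imputed_rent if owns_home else rent_paid
    return spending_on_goods + housing


# Two hypothetical neighbours living in comparable houses.
owner = total_consumption(8_000, owns_home=True, imputed_rent=4_000)
renter = total_consumption(8_000, owns_home=False, rent_paid=4_000)
owner_without_imputation = total_consumption(8_000, owns_home=True)

print(owner, renter, owner_without_imputation)
```

Without the imputed rent, the owner looks a third poorer than the renter next door even though both live in the same kind of house. The hard part, of course, is the estimate of the rent itself, which this sketch simply takes as given.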

Another difficulty in measuring consumption is that in developing countries households consume a lot of what they themselves produce on the family farm. This, too, is often difficult to value correctly.

And finally, different people have different consumption needs, depending on their age, health, work etc. It’s not clear to me how these different needs are taken into account when consumption is measured and used as an indicator of poverty.

Other definitions

Calorie intake: the problem with this is that different people need different amounts of calories (depending on their type of work, their age, health etc.), and that it isn’t very easy to measure how many calories people actually consume.

Food spending as a fraction of total spending: if you say people who spend more than x % of their total spending on food are considered poor, you still have to factor in relative food prices.

Stunting as an indicator of malnutrition and hence of poverty: stunting (height for age) is a notoriously difficult thing to measure.

Other issues

Some aspects of life tend to be excluded from poverty measurement, even though they have a huge impact on people’s wellbeing. The amount of leisure time people have is perhaps a good indicator of poverty, in certain circumstances (excluding CEOs and US Presidents), but it’s hardly ever counted in poverty measurements.

Another thing: people may have comparable incomes or even consumption patterns, but they may face very different social or environmental conditions: an annual income of $500 may be adequate for people living in a rural environment with a temperate climate where housing is cheap, heating isn’t necessary and subsistence farming is relatively easy. But the same income can mean deep poverty for a family living in a crowded city on the edge of a desert. The presence or absence of public goods such as quality schools, roads, running water and electricity also makes a lot of difference, but poverty measurement usually doesn’t take these goods into account.

Measuring Poverty (2): Some Problems With Poverty Measurement

The struggle against poverty is a worthy social goal, and the absence of poverty is a human right. But poverty is also an obstacle to other social goals, particularly the full realization of other human rights. A necessary instrument in poverty reduction is data: how many people suffer from poverty? Without an answer to that question it’s very difficult to assess the success of poverty reduction policies (such as development aid).

And that’s where the problems start. There’s some uncertainty in the data. The data may not reflect accurately the real number of people living in poverty. There are definition issues – what is poverty? – that may reduce the accuracy of the data or the comparability between different measurements of poverty (or between different measurements over time), and there are issues related to the measurements themselves. I’ll focus on the latter for the moment.

Poverty is often measured by way of surveys. These surveys, however, can be biased because of

  1. sampling errors: underrepresentation of the very rich and the very poor in the sample (more on sample errors here), and
  2. reporting errors: failure of the very rich and the very poor to report accurately.

The rich are less likely than middle-income people to respond to surveys because they are less accessible (their houses for instance are less accessible). In addition, when they respond, they may tend to underreport a larger fraction of their wealth as they have more incentives to hide (for tax reasons for example).

The very poor may also be inaccessible, but for other reasons. They may be hard to interview when they don’t have a fixed address or an official identification. In poor countries, they may be hard to find because they live in remote areas with inadequate transportation access. And again, when they report, it may be difficult to estimate their “wealth” because their assets are often in kind rather than in currency.

Because we can have underreporting of the two extremes of the wealth distribution, we believe that income distribution is more egalitarian than it really is. Hence we underestimate income inequality and relative poverty.

But apart from relative poverty we also underestimate absolute poverty since we’re often unable to include the very poor in the reporting for the reasons given above. By “cutting off” the people at the poor end of the distribution, it seems like most people are middle class and society largely egalitarian.
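A toy example in Python shows the effect of “cutting off” both ends of the distribution. The ten incomes, the poverty line and the assumption that the survey misses exactly the two poorest and the two richest people are all invented for illustration.

```python
def poverty_rate(incomes, poverty_line):
    """Share of people with an income below the poverty line."""
    return sum(1 for income in incomes if income < poverty_line) / len(incomes)


def spread(incomes):
    """Crude inequality measure: highest income divided by lowest income."""
    return max(incomes) / min(incomes)


LINE = 1_000  # hypothetical poverty line

# The real population, sorted from poorest to richest (made-up numbers).
population = [200, 300, 500, 2_000, 3_000, 4_000, 5_000, 6_000, 50_000, 90_000]

# The survey fails to reach the two poorest and the two richest people.
survey = population[2:-2]

print(poverty_rate(population, LINE), poverty_rate(survey, LINE))
print(spread(population), spread(survey))
```

In the survey sample both the poverty rate and the gap between top and bottom are much smaller than in the real population, so society looks more middle class and more egalitarian than it is.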

However, absolute poverty can also be overestimated: if the poor respond, we may fail to accurately assess their “wealth” given that much of it is in kind. And it’s unlikely that these two errors – underestimation and overestimation – cancel each other out.

These and other problems of poverty measurement make it difficult to claim that we “know” more or less precisely how many poor people there are, but if we make the same errors consistently we may be able to guess, not the levels of poverty, but at least the trends: is poverty going up or down?

Measuring Human Rights (9): When “Worse” Doesn’t Necessarily Mean “Worse”

I discussed in this older post some of the problems related to the measurement of human rights violations, and to the assessment of progress or deterioration. One of the problems I mentioned is caused by improvements in measurement methods. Such improvements can in fact result in a statistic showing increasing numbers of rights violations, whereas in reality the numbers may not be increasing, and perhaps even decreasing. Better measurement means that you now compare current data that are more complete and better measured, with older numbers of rights violations that were simply incomplete.

The example I gave was about rape statistics: better statistical and reporting methods used by the police, combined with less social stigma etc. result in statistics showing a rising number of rapes, but this increase was due to the measurement methods (and other effects), not to what happened in real life.
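The mechanism is easy to illustrate with invented numbers. In this Python sketch, the number of recorded cases is simply the real number of cases multiplied by the share that ends up in the statistics; both the case counts and the reporting rates are hypothetical.

```python
def recorded_cases(actual_cases, reporting_rate):
    """Cases that show up in the statistics, given the share that gets recorded."""
    return round(actual_cases * reporting_rate)


# Hypothetical: the real number of cases falls by 10%,
# while the share of cases that gets recorded doubles.
earlier = recorded_cases(actual_cases=1_000, reporting_rate=0.2)
later = recorded_cases(actual_cases=900, reporting_rate=0.4)

print(earlier, later)
```

The statistic almost doubles while the real number of violations goes down.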

I have now come across another example. Collateral damage – or the unintentional killing of civilians during wars – seems to be higher now than a century ago (source). This may also be the result of better monitoring hiding a totally different trend. We all know that civilian deaths are much less acceptable now than they used to be, and that journalism and war reporting are probably much better (given better communication technology). Hence, people may now believe that it’s more important to count civilian deaths, and have better means to do so. As a result, the numbers of civilian deaths showing up in statistics will rise compared to older periods, but perhaps the real numbers don’t rise at all.

Of course, the increase of collateral damage may be the result of something other than better measurement: perhaps the lower level of acceptability of civilian deaths forces the army to classify some of those deaths as unintentional, even if they’re not (and then we have worse rather than better measurement). Or perhaps the relatively recent development of precision-guided munitions has made the use of munitions more widespread so that there are more victims: more bombs, even if they are more precise, can make more victims than fewer but less precise bombs. Or perhaps the current form of warfare, with guerrilla troops hiding among populations, does indeed produce more civilian deaths.

Still, I think my point stands: better measurement of human rights violations can give the wrong impression. Things may look as if they’re getting worse, but they’re not.

Lies, Damned Lies, and Statistics (18): Comparing Apples and Oranges

Before the introduction of tin helmets during the First World War, soldiers only had cloth hats to wear. The strange thing was that after the introduction of tin hats, the number of injuries to the head increased dramatically. Needless to say, this was counter-intuitive. The new helmets were designed precisely to avoid or limit such injuries.

Of course, people were comparing apples with oranges, namely statistics on head injuries before and after the introduction of the new helmets. In fact, what they should have done, and effectively did after they realized their mistake, was to include in the statistics, not only the injuries, but also the fatalities. After the introduction of the new helmets, the number of fatalities dropped dramatically, but the number of injuries went up because the tin helmet was saving soldiers’ lives, but the soldiers were still injured.
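A few lines of Python with invented numbers make the point. The sketch assumes, for simplicity, that the number of soldiers hit in the head is the same before and after, and that everyone who survives a hit is counted as injured; the fatality rates are hypothetical.

```python
def head_casualties(soldiers_hit, fatality_rate):
    """Split soldiers hit in the head into fatalities and (surviving) injured."""
    fatalities = round(soldiers_hit * fatality_rate)
    injuries = soldiers_hit - fatalities
    return {"fatalities": fatalities, "injuries": injuries}


# Hypothetical fatality rates with cloth hats and with tin helmets.
cloth_hat = head_casualties(soldiers_hit=1_000, fatality_rate=0.8)
tin_helmet = head_casualties(soldiers_hit=1_000, fatality_rate=0.3)

print(cloth_hat)
print(tin_helmet)
```

Looking at injuries alone, the helmet seems to make things worse. Looking at injuries and fatalities together, it’s clear that many soldiers simply moved from the “killed” column to the “injured” column.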

Lies, Damned Lies, and Statistics (16): Measuring Public Opinion in Dictatorships

Measuring human rights requires a certain level of respect for human rights (freedom to travel, freedom to speak, to interview etc.). Trying to measure human rights in situations characterized by the absence of freedom is quite difficult, and can even lead to unexpected results: the absence of (access to) good data may give the impression that things aren’t as bad as they really are. Conversely, when a measurement shows a deteriorating situation, the cause of this may simply be better access to better data. And this better access to better data may be the result of more openness in society. Deteriorating measurements may therefore signal an actual improvement. I gave an example of this dynamic here (it’s an example of statistics on violence against women).

Measuring public opinion in authoritarian countries is always difficult, but if you ask the public if they love or hate their government, it’s likely that you’ll have higher rates of “love” in the more authoritarian countries. After all, in those countries it can be pretty dangerous to tell someone in the street that you hate your government. So people choose to lie and say that they approve. That’s the safest answer but probably in many cases not the real one. I don’t believe for a second that the percentage of people approving of their government is 19 times higher in Azerbaijan than in Ukraine, when Ukraine is in fact much more liberal than Azerbaijan.

In the words of Robert Coalson:

The Gallup chart is actually an index of fear. What it reflects is not so much attitudes toward the government as a willingness to openly express one’s attitudes toward the government. As one member of RFE/RL’s Azerbaijan Service told me, “If someone walked up to me in Baku and asked me what I thought about the government, I’d say it was great too”.

Measuring Human Rights (8): Measurement of the Fairness of Trials and of Expert Witnesses

An important part of the system of human rights are the rules intended to offer those accused of crimes a fair trial in court. We try to treat everyone, even suspected criminals, with fairness, and we have two principal reasons for this:

  • We only want to punish real criminals. A fair trial is one in which everything is done to avoid punishing the wrong persons. We want to avoid miscarriages of justice.
  • We also want to use court proceedings only to punish criminals and deter crime, not for political or personal reasons, as is often the case in dictatorships.

Most of these rules are included in, for example, articles 9, 10, 14 and 15 of the International Covenant on Civil and Political Rights, article 10 of the Universal Declaration, article 6 of the European Convention of Human Rights, and the Sixth Amendment to the United States Constitution.

Respect for many of these rules can be measured statistically. I’ll mention only one here: the rule regarding the intervention of expert witnesses for the defense or the prosecution. Here’s an example of the way in which this aspect of a fair trial can be measured:

In the late 1990s, Harris County, Texas, medical examiner [and forensic specialist] Patricia Moore was repeatedly reprimanded by her superiors for pro-prosecution bias. … In 2004, a statistical analysis showed Moore diagnosed shaken baby syndrome (already a controversial diagnosis) in infant deaths at a rate several times higher than the national average. … One woman convicted of killing her own child because of Moore’s testimony was freed in 2005 after serving six years in prison. Another woman was cleared in 2004 after being accused because of Moore’s autopsy results. In 2001, babysitter Trenda Kemmerer was sentenced to 55 years in prison after being convicted of shaking a baby to death based largely on Moore’s testimony. The prosecutor in that case told the Houston Chronicle in 2004 that she had “no concerns” about Moore’s work. Even though Moore’s diagnosis in that case has since been revised to “undetermined,” and Moore was again reprimanded for her lack of objectivity in the case, Kemmerer remains in prison. (source)

Measuring Poverty (1): Measuring Poverty in India

The government of India uses a consumption-based method to measure poverty: given that an average adult male has to eat food representing approximately 2000-2500 calories per day in order to sustain the human body, how much would it cost to buy these calories? Those whose income is lower than this cost are poor.

Actually, the Indian government uses the thresholds of 2,400 calories a day in rural areas and 2,100 in urban areas. (City dwellers are thought to exert less energy, so they should need to consume less. See here).
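The logic of such a calorie-based poverty line can be sketched in a few lines. The two thresholds come from the text above; the price per calorie and the function names are assumptions made up for the illustration, not figures or methods used by the Indian government.

```python
# Calorie thresholds are taken from the text; the price is a made-up assumption.
CALORIE_THRESHOLD = {"rural": 2400, "urban": 2100}
HYPOTHETICAL_PRICE_PER_KCAL = 0.01  # rupees per calorie, illustrative only

def daily_poverty_line(area):
    """Cost of buying the threshold number of calories for one day."""
    return CALORIE_THRESHOLD[area] * HYPOTHETICAL_PRICE_PER_KCAL

def is_poor(daily_income, area):
    """A person counts as poor if income is below the cost of the calories."""
    return daily_income < daily_poverty_line(area)

for area in ("rural", "urban"):
    print(area, "poverty line:", round(daily_poverty_line(area), 2), "rupees per day")
```

Under these assumed prices, the same daily income can fall below the rural line and above the urban one, simply because the calorie thresholds differ.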

Of course, this measure, like all measures, isn’t perfect. A person may be able to afford to buy food that contains 2,400 calories, but the quality or nutritional value of this food (in terms of vitamins etc.) may be so low that we can hardly exclude this person from the population of the poor. He or she may be able to buy 2,400 calories, but not enough nutritional value to lead a decent life.

However, I wonder whether India’s poverty measurements include only consumption of food. Poverty is more than just a nutritional issue. People may be able to buy enough food of sufficient nutritional quality, but may be left without resources for shelter, healthcare, education etc.

Lies, Damned Lies, and Statistics (11): Polarized Statistics as a Result of Self-Selection

One of the most important things in the design of an opinion survey – and opinion surveys are a common tool in data gathering in the field of human rights – is the definition of the sample of people who will be interviewed. We can only assume that the answers given by the people in the sample are representative of the opinions of the entire population if the sample is a fully random subset of the population – that means that every person in the population should have an equal chance of being part of the survey group.

Unfortunately, many surveys depend on self-selection – people get to decide themselves if they cooperate – and self-selection distorts the randomness of the sample:

Those individuals who are highly motivated to respond, typically individuals who have strong opinions, are overrepresented, and individuals that are indifferent or apathetic are less likely to respond. This often leads to a polarization of responses with extreme perspectives being given a disproportionate weight in the summary. (source)
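A rough sketch of how this polarization arises, under invented assumptions: opinions are placed on a 1 to 5 scale, and people at the extremes (1 and 5) are assumed to respond far more often than people in the middle. Both the population counts and the response rates are hypothetical.

```python
# Hypothetical population: opinions on a 1-5 scale (1 and 5 are the extremes).
population = {1: 100, 2: 200, 3: 400, 4: 200, 5: 100}

# Assumed response rates: people with strong opinions answer far more often.
response_rate = {1: 0.5, 2: 0.1, 3: 0.05, 4: 0.1, 5: 0.5}

def extreme_share(counts):
    """Share of people holding one of the two extreme opinions (1 or 5)."""
    return (counts[1] + counts[5]) / sum(counts.values())

# Expected number of respondents per opinion under self-selection.
respondents = {op: population[op] * response_rate[op] for op in population}

print(f"extremes in population:   {extreme_share(population):.1%}")
print(f"extremes among responses: {extreme_share(respondents):.1%}")
```

With these made-up numbers, a fifth of the population holds an extreme view, but well over half of the expected responses come from that fifth.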

Self-selection is almost always a problem in online surveys (of the PollDaddy variety), phone-in surveys for television or radio shows, and so-called “red-button” surveys in which people vote with the remote control of their television set. However, it can also occur in more traditional types of surveys. When you survey the population of a brutal dictatorial state (if you get the chance) and ask the people about their freedoms and rights, many will deselect themselves: they will refuse to cooperate with the survey for fear of the consequences.

When we limit ourselves to the effects of self-selection (or self-deselection) in democratic states, we may find that this has something to do with the often ugly and stupid “us-and-them” character of much of contemporary politics. There seems to be less and less room for middle ground, compromise or nuance.

Lies, Damned Lies, and Statistics (10): How (Not) to Frame Survey Questions

I’ve mentioned before that information on human rights depends heavily on opinion surveys. Unfortunately, surveys can be wrong and misleading for so many different reasons that we have to be very careful when designing surveys and when using and interpreting survey data. One reason I haven’t mentioned before is the framing of the questions.

Even very small differences in framing can produce widely divergent answers. And there is a wide variety of problems linked to the framing of questions:

  • Questions can be leading questions, questions that suggest the answer. For example: “It’s wrong to discriminate against people of another race, isn’t it?” Or: “Don’t you agree that discrimination is wrong?”
  • Questions can be put in such a way that they put pressure on people to give a certain answer. For example: “Most reasonable people think racism is wrong. Are you one of them?” This is also a leading question of course, but it’s more than simply “leading”.
  • Questions can be confusing or easily misinterpreted. Such questions often include a negative, or, worse, a double negative. For example: “Do you agree that it isn’t wrong to discriminate under no circumstances?” Needless to say, your survey results will be infected by answers that are the opposite of what they should have been.
  • Questions can be wordy. For example: “What do you think about discrimination (a term that refers to treatment taken toward or against a person of a certain group that is based on class or category rather than individual merit) as a type of behavior that promotes a certain group at the expense of another?” This is obviously a subtype of the confusing-variety.
  • Questions can also be confusing because they use jargon, abbreviations or difficult terms. For example: “Do you believe that UNESCO and ECOSOC should administer peer-to-peer expertise regarding discrimination in an ad hoc or a systemic way?”
  • Questions can in fact be double or even triple questions, but there is only one answer required and allowed. Hence people who may have opposing answers to the two or three sub-questions will find it difficult to provide a clear answer. For example: “Do you agree that racism is a problem and that the government should do something about it?”
  • Open questions should be avoided in a survey. For example: “What do you think about discrimination?” Such questions do not yield answers that can be quantified and aggregated.
  • You also shouldn’t ask questions that exclude some possible answers, and neither should you provide a multiple-choice set of answers that doesn’t include some possible answers. For example: “How much did the government improve its anti-discrimination efforts relative to last year? Somewhat? Average? A lot?” Notice that such a framing of the question doesn’t allow people to respond that the effort had not improved or had worsened. Another example: failure to include “don’t know” as a possible answer.

Here’s a real-life example:

In one of the most infamous examples of flawed polling, a 1992 poll conducted by the Roper organization for the American Jewish Committee found that 1 in 5 Americans doubted that the Holocaust occurred. How could 22 percent of Americans report being Holocaust deniers? The answer became clear when the original question was re-examined: “Does it seem possible or does it seem impossible to you that the Nazi extermination of the Jews never happened?” This awkwardly-phrased question contains a confusing double-negative which led many to report the opposite of what they believed. Embarrassed Roper officials apologized, and later polls, asking clear, unambiguous questions, found that only about 2 percent of Americans doubt the Holocaust. (source)

Measuring Human Rights (7): Don’t Let Governments Make it Easy on Themselves

In many cases, the task of measuring respect for human rights in a country falls on the government of that country. It’s obvious that this isn’t a good idea in dictatorships: governments there will not present correct statistics on their own misbehavior. But if not the government, who else? Dictatorships aren’t known for their thriving and free civil societies, or for granting access to outside monitors. As a result, human rights protection can’t be measured.

The problem of depending on governments for human rights measurement, however, isn’t limited to dictatorships. I also gave examples of democratic governments not doing a good job in this respect. Governments, including democratic ones, tend to choose indicators they already have. For example, they count the number of people benefiting from government food programs (they have numbers for that) and neglect private food programs for which information isn’t readily available. In this case, but in many other cases as well, governments choose indicators which are easy to measure, rather than indicators which measure what needs to be measured but which require a lot of effort and money.

Human rights measurement also fails to measure what needs to be measured when the people whose rights we want to measure don’t have a say on which indicators are best. And that happens a lot, even in democracies. Citizen participation is a messy thing and governments tend to want to avoid it, but the result may be that we’re measuring the wrong thing. For example, we think we are measuring poverty when we count the number of internet connections for disadvantaged groups, but these groups may consider the lack of cable TV or public transportation a much more serious deprivation. The reason we’re not measuring what we think we are measuring, or what we really need to measure, is not – as in the previous case – complacency, lack of budgets etc. The reason is a lack of consultation. Because there hasn’t been consultation, the definition of “poverty” used by those measuring human rights is completely different from the one used by those whose rights are to be measured. And, as a result, the indicators that have been chosen aren’t the correct ones, or they don’t show the whole picture. Many indicators chosen by governments are also too specific, measuring only part of the human right (e.g. free meals for the elderly instead of poverty levels for the elderly).

However, even if the indicators that are chosen are the correct ones – i.e. indicators that measure what needs to be measured, completely and not partially – it’s still the case that human rights measurement is extremely difficult, not only conceptually, but also and primarily on the level of execution. Not only are there many indicators to measure, but the data sources are scarce and often unreliable, even in developed countries. For example, let’s assume that we want to measure the human right not to suffer poverty, and that we agree that the best and only indicator to measure respect for this right is the level of income.* So we cleared up the conceptual difficulties. The problem now is data sources. Do you use tax data (taxable income)? We all know that there is tax fraud. Low income declared in tax returns may not reflect real poverty. Tax returns also don’t include welfare benefits etc.

Even if you manage to produce neat tables and graphs you always have to stop and think about the messy ways in which they have been produced, about the flaws and lack of completeness of the chosen indicators themselves, and about the problems encountered while gathering the data. Human rights measurement will always be a difficult thing to do, even under the best circumstances.

* This isn’t obvious. Other indicators could be level of consumption, income inequality etc. But let’s assume, for the sake of simplicity, that level of income is the best and only indicator for this right.

Lies, Damned Lies, and Statistics (7): “Drowning” Data

Suppose we want to know how many forced disappearances there are in Chechnya. Assuming we have good data this isn’t hard to do. The number of disappearances that have been registered, by the government or some NGO, is x out of a total Chechen population of y, giving z%. The Russian government may decide that the better measurement is for Russia as a whole. Given that there are almost no forced disappearances in other parts of Russia, the z% goes down dramatically, perhaps close to or even below the level of other comparable countries.

Good points for Russia! But that doesn’t mean that the situation in Chechnya is OK. The data for Chechnya are simply “drowned” in those of Russia, giving the impression that “overall”, Russia isn’t doing all that badly. This, however, is misleading. The proper unit of measurement should be limited to the area where the problem occurs. The important thing here isn’t a comparison of Russia with other countries; it’s an evaluation of a local problem.
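The arithmetic of “drowning” is easy to sketch. All numbers below are hypothetical placeholders (they are not real statistics for Chechnya, Russia or anywhere else), and the region names are made up; the sketch only shows how a local rate shrinks once it is computed over a much larger population.

```python
# All figures are hypothetical placeholders, not real statistics.
regions = {
    "region_with_problem": {"cases": 500, "population": 1_000_000},
    "rest_of_country": {"cases": 100, "population": 139_000_000},
}

def rate_per_100k(cases, population):
    """Number of registered cases per 100,000 inhabitants."""
    return cases / population * 100_000

# Rate measured where the problem occurs (x out of y in the text).
local = rate_per_100k(**regions["region_with_problem"])

# Rate measured over the whole country: the local cases are "drowned".
national = rate_per_100k(
    sum(r["cases"] for r in regions.values()),
    sum(r["population"] for r in regions.values()),
)

print(f"local rate:    {local:.2f} per 100,000")
print(f"national rate: {national:.2f} per 100,000")
```

In this invented case the national rate is more than a hundred times lower than the local one, although nothing has changed for the people living in the affected region.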

Something similar happens to the evaluation of the Indian economy:

Madhya Pradesh, for example, is comparable in population and incidence of poverty to the war-torn Democratic Republic of Congo. But the misery of the DRC is much better known than the misery of Madhya Pradesh, because sub-national regions do not appear on “poorest country” lists. If Madhya Pradesh were to seek independence from India, its dire situation would become more visible immediately. …

But because it’s home to 1.1 billion people, India is more able than most to conceal the bad news behind the good, making its impressive growth rates the lead story rather than the fact that it is home to more of the world’s poor than any other country. …

A 10-year-old living in the slums of Calcutta, raising her 5-year-old brother on garbage and scraps, and dealing with tapeworms and the threat of cholera, suffers neither more nor less than a 10-year-old living in the same conditions in the slums of Lilongwe, the capital of Malawi. But because the Indian girl lives in an “emerging economy,” slated to battle it out with China for the position of global economic superpower, and her counterpart in Lilongwe lives in a country with few resources and a bleak future, the Indian child’s predicament is perceived with relatively less urgency. (source)

Measuring Human Rights (6): Don’t Make Governments Do It

In the case of dictatorial governments or other governments that are widely implicated in the violation of the rights of their citizens, it’s obvious that the task of measuring respect for human rights should be – where possible – carried out by independent non-governmental organizations, possibly even international or foreign ones (if local ones are not allowed to operate). Counting on the criminal to report on his crimes isn’t a good idea. Of course, sometimes there’s no other way. It’s often impossible to estimate census data, for example, or data on mortality, healthcare providers etc. without using official government information.

All this is rather trivial. The more interesting point, I hope, is that the same is true, to some extent, of governments that generally have a positive attitude towards human rights. Obviously, the human rights performance of these governments also has to be measured, because there are rights violations everywhere, and a positive attitude doesn’t guarantee positive results. However, even in such cases, it’s not always wise to trust governments with the task of measuring their own performance in the field of human rights. An example from a paper by Marilyn Strathern (source, gated):

In 1993, new regulations [required] local authorities in the UK … to publish indicators of output, no fewer than 152 of them, covering a variety of issues of local concern. The idea was … to make councils’ performance transparent and thus give them an incentive to improve their services. As a result, however,… even though elderly people might want a deep freeze and microwave rather than food delivered by home helps, the number of home helps [was] the indicator for helping the elderly with their meals and an authority could only improve its recognised performance of help by providing the elderly with the very service they wanted less of, namely, more home helps.

Even benevolent governments can make crucial mistakes like these. This example isn’t even a measurement error; it’s measuring the wrong thing. And the mistake wasn’t caused by the government’s will to manipulate, but by a genuine misunderstanding of what the measurement should be all about.

I think the general point I’m trying to make is that human rights measurement should take place in a free market of competing measurements – and shouldn’t be a (government) monopoly. Measurement errors are more likely to be identified if there is a possibility to compare competing measurements of the same thing.

Measuring Democracy (3): But What Kind of Democracy?

Those who want to measure whether countries are democratic or not, or want to measure to what degree countries are democratic, necessarily have to answer the question “what is democracy?”. You can’t start to measure democracy until you have answered this question, as in general you can’t start to measure anything until you have decided what it is you want to measure.

Two approaches to measuring democracy

As the concept of democracy is highly contestable – almost everyone has a different view on what it means to call a country a democracy, or to call it more or less democratic than another – it’s not surprising to see that most of the research projects that have attempted to measure democracy – such as Polity IV, Freedom House etc. – have chosen a different definition of democracy, and are, therefore, actually measuring something different. I don’t intend to give an overview of the differences between all these measures here (this is a decent attempt). What I want to do here is highlight the pros and cons of two extremely different approaches: the minimalist and the maximalist one. The former could, for example, view democracy as no more than a system of regular elections, and measure simply the presence or absence of elections in different countries. The latter, on the other hand, could include in its definition of democracy stuff like rights protections, freedom of the press, division of powers etc., and measure the presence or absence of all of these things, and aggregate the different scores in order to decide whether a country is democratic or not, and to what extent.

When measuring the democratic nature of different countries (and of course comparing them), should we use a minimalist or maximalist definition of democracy? Here are some pros and cons of either approach.

Differentiation

A minimalist definition makes it very difficult to differentiate between countries. It would make it possible to distinguish democracies (minimally defined) from non-democracies, but it wouldn’t allow us to measure the degree of democracy of a given country. I believe an ordinal scale with different ranks for different levels of quality of democracy in different countries (ranging from extremely poor quality, i.e. non-democracies, to perfect democracies) is more interesting than a binary scale limited to democracy/non-democracy. The use of a maximalist definition of democracy would make it possible to rank all types of regimes on such an ordinal scale. A maximalist definition of democracy would include a relatively large number of necessary attributes of democracy, and the combination of presence/absence/partial development of each attribute would almost make it possible to give each country a unique rank in the ordinal scale. Such a wide-ranging differentiation is an advantage for progress analysis. A binary scale does not give any information on the quality of democracy. Hence, it would be better to speak of measuring democratization rather than measuring democracy. And democratization not only in the sense of a transition from authoritarian to democratic governance, but also in the sense of progress towards a deepening of democratic rule.

A minimalist definition of democracy necessarily focuses on just a few attributes of democracy. As a result, it is impossible to differentiate between degrees of “democraticness” of different countries. Moreover, the chosen attributes may not be typical of or exclusive to democracy (such as good governance or citizen influence), and may not include some necessary attributes. For example, Polity IV, perhaps the most widely used measure of democracy, does not sufficiently incorporate actual citizen participation, as opposed to the mere right of citizens to participate. I think it’s fair to say that a country that gives its citizens the right to vote but doesn’t actually have many citizens voting, can hardly be called a democracy.
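The difference between the two approaches can be illustrated with a toy calculation. Everything in it is an assumption made for the sake of the example: the three countries, the four attributes, the 0/1/2 scoring and the unweighted sum are hypothetical, and do not reproduce the method of Polity IV, Freedom House or any other existing index.

```python
# Hypothetical attribute scores: 0 = absent, 1 = partially developed, 2 = present.
countries = {
    "A": {"elections": 2, "rights_protection": 2, "press_freedom": 2, "division_of_powers": 2},
    "B": {"elections": 2, "rights_protection": 1, "press_freedom": 0, "division_of_powers": 1},
    "C": {"elections": 0, "rights_protection": 0, "press_freedom": 0, "division_of_powers": 0},
}

def minimalist(attrs):
    """Binary measure: a democracy if and only if elections are held."""
    return attrs["elections"] > 0

def maximalist(attrs):
    """Unweighted sum of all attribute scores (one possible aggregation)."""
    return sum(attrs.values())

for name, attrs in countries.items():
    print(name, "minimalist:", minimalist(attrs), "maximalist score:", maximalist(attrs))
```

The binary measure puts countries A and B in the same box, while the aggregated score separates them. The price is that every choice in the aggregation (which attributes, which weights) is open to dispute.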

Acceptability of the measurement vs controversy

A disadvantage of maximalism is that the measurement will be more open to controversy. The more attributes of democracy are included in the measure, the higher the risk of disagreement on the model of democracy. As said above, people have different ideas about the number and type of necessary attributes of a democracy, even of an ideal democracy. If the only attribute of democracy retained in the analysis is regular elections, then there will be little controversy since few people would reject this attribute.

Balancing

So we have to balance meaning against acceptability: a measurement system that is maximalist offers a lot of information and the possibility to compare countries beyond the simple dichotomy of democracy/non-democracy, but it may be rejected by those who claim that this system is not measuring democracy as they understand the word. A minimalist system, on the other hand, will measure something that is useful for many people – no one will contest that elections are necessary for democracy, for instance – but will also reduce the utility of the measurement results because it doesn’t yield a lot of information about countries.