Measuring Human Rights (26): Measuring Murder

Murder should be easy to measure. Unlike many other crimes or rights violations, the evidence is clear and painstakingly recorded: there is a body, at least in most cases; police seldom fail to notice a murder; and relatives or friends of the victim rarely fail to report the crime. So even if we are not always able to find and punish murderers, we should at least know how many murders there are.

And yet, even this most obvious of crimes can be hard to measure. In poorer countries, police departments may lack the means to record homicides correctly and completely. Families may be wary of reporting homicides for fear that corrupt police officers will enter their homes and use the occasion to extort bribes. Civil wars make it difficult to collect any data, including crime data. During wartime, homicides may not be distinguishable from casualties of the war.

And there’s more. Police departments in violent places may be under pressure to bring down crime statistics and may manipulate the data as a result, moving dubious murder cases into categories such as “accidents”, “manslaughter” or “suicide”.

Homicides usually take place in cities, hence the temptation to rank cities according to homicide rates. But cities differ in the way they determine their borders: suburbs may be included, excluded or partially included, and this affects homicide rates since suburbs tend to be less violent. Some cities also attract more visitors than others (commuters, tourists, business travelers), and visitors are usually not counted in the “population” denominator even though they too may be at risk of murder.

In addition, some ideologies may cause distortions in the data. Does abortion count as murder? Honor killings? Euthanasia and assisted suicide? Laws and opinions about all this vary between jurisdictions and introduce biases into country comparisons.

And, finally, countries with lower murder rates may not be less violent; they may just have better emergency healthcare systems allowing them to save potential murder victims.

So, if even the most obvious of human rights violations is difficult to measure, you can guess the quality of other indicators.

Measuring Human Rights (24): Measuring Racism, Ctd.

Measuring racism is a problem, as I’ve argued before. Asking people if they’re racist won’t work because they don’t answer this question truthfully, and understandably so: this is the social desirability bias. Surveys may minimize this bias if they approach the subject indirectly. For example, rather than simply asking people if they are racist or if they believe blacks are inferior, surveys could ask some of the following questions:

  • Do you believe God has created the races separately?
  • What do you believe are the reasons for higher incarceration rates/lower IQ scores/… among blacks?
  • Etc.

Still, there’s no guarantee that bias won’t falsify the results. Maybe it’s better to dump the survey method altogether and go for something even more indirect. For example, you can measure

  • racism in employment decisions, such as the number of callbacks received by applicants with black-sounding names
  • racism in criminal justice, for example the degree to which decisions by black federal lower-court judges are overturned more often than comparable decisions by similar white judges, or differences in crime rates by race of the perpetrator, or jury behavior
  • racial profiling
  • residential racial segregation
  • racist consumer behavior, e.g. reluctance to buy something from a black seller
  • the numbers of interracial marriages
  • the numbers and membership of hate groups
  • the number of hate crimes
  • etc.

A disadvantage of many of these indirect measurements is that they don’t necessarily reflect the beliefs of the whole population. You can’t simply extrapolate the rates you find in these measurements: the fact that some judges and police officers are racist doesn’t mean the same share of the total population is. Not all people who live in predominantly white neighborhoods do so because they don’t want to live in mixed neighborhoods. Different crime rates by race can be an indicator of racist law enforcement, but can also hide other causes, such as different poverty rates by race (which can themselves be indicators of racism). Higher numbers of hate crimes or hate groups may represent the radicalization of an ever smaller minority. And so on.

Another alternative measurement system is the Implicit Association Test. This is a psychological test that measures implicit attitudes and beliefs that people are either unwilling or unable to report.

Because the IAT requires that users make a series of rapid judgments, researchers believe that IAT scores may reflect attitudes which people are unwilling to reveal publicly. (source)

Participants in an IAT are asked to rapidly decide which words are associated: for example, is “female” or “male” more readily associated with “family” or with “career”? This way, you can measure the strength of association between mental constructs such as “female” or “male” on the one hand and attributes such as “family” or “career” on the other. And this allows you to detect prejudice. The same is true for racism. You can read here or here how an IAT is usually performed.
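
For readers wondering how rapid judgments become a number: the score is derived from response latencies. Here’s a minimal sketch, with hypothetical latencies; the real scoring algorithm (Greenwald et al.’s D measure) is more elaborate, adding error penalties and latency trimming.

```python
# A simplified sketch of turning IAT reaction times into an effect score.
# All latencies are hypothetical; real scoring also penalizes errors and
# trims extreme latencies.
from statistics import mean, stdev

# Hypothetical response latencies (milliseconds) for one participant.
congruent = [650, 700, 620, 680, 710, 640]    # e.g. stereotype-consistent pairings
incongruent = [820, 900, 780, 860, 840, 880]  # e.g. stereotype-inconsistent pairings

pooled_sd = stdev(congruent + incongruent)    # SD over all trials of both blocks
d_score = (mean(incongruent) - mean(congruent)) / pooled_sd

print(f"IAT D-score: {d_score:.2f}")  # larger values = stronger implicit association
```

Slower responses on stereotype-inconsistent pairings push the score up; that is the quantitative core of the test.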

Yet another measurement system uses evidence from Google search data, such as in this example. The advantage of this system is that it avoids the social desirability bias, since Google searches are done alone, online and without prior knowledge that the search results will be used to measure racism. Hence, people searching on Google are more likely to express social taboos. In this respect, the measurement system is similar to the IAT. Another advantage of the Google method, compared to traditional surveys, is that the Google sample is very large and more or less evenly distributed across all areas of a country. This allows for a fine-grained geographical breakdown of racial animus.

More specifically, the purpose of the Google method is to analyze trends in searches that include words like “nigger” or “niggers” (not “nigga” because that’s slang in some Black communities, and not necessarily a disparaging term). In order to exclude searches for the term “nigger” by people who may not be racially motivated – such as researchers (Google can’t tell the difference) – you could refine the method and analyze only searches for phrases like “why are niggers lazy”, “Obama+nigger”, “niggers/blacks+apes” etc. If you find that those searches are more common in some locations than in others, or that they are becoming more common in some locations, then you can try to correlate those findings with other, existing indicators of racism such as those cited above, or with historic indicators such as the prevalence of slavery or lynchings.
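
The correlation step at the end might look something like the following sketch. All figures are invented, and the linked study works with properly normalized Google Trends data and regression controls rather than a bare correlation.

```python
# A hedged sketch of correlating per-region search rates with another
# racism indicator. All numbers are invented for illustration.
from statistics import correlation  # available since Python 3.10

# Hypothetical values per region: a racially charged search-rate index,
# and some existing indicator of racism (e.g. a survey-based score).
search_rate = [4.2, 7.8, 3.1, 6.5, 5.0, 8.9, 2.7]
other_indicator = [0.31, 0.52, 0.22, 0.47, 0.35, 0.61, 0.19]

r = correlation(search_rate, other_indicator)  # Pearson's r
print(f"Pearson correlation across regions: r = {r:.2f}")
```

A strong positive r across many regions would suggest the two measures are picking up the same underlying animus.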

More posts in this series are here.

Measuring Human Rights (20): What is More Important, the Number or Percentage of People Suffering Human Rights Violations?

Take just one human right, the right not to suffer poverty: if we want to measure progress for this human right, we get something like the following fact:

[N]ever in the world have there been so many paupers as in the present times. But the reason of this is that there have never been so many people around. Indeed never in the history of the world has the percentage of poor people been so low. (source)

So, is this good news or bad news? If it’s more important to reduce the share of the world population suffering a particular type of rights violation, then this is good news. On the other hand, there are now more people – in absolute, not in relative numbers – suffering from poverty. If we take individuals and the distinctions between persons seriously, we should conclude that this is bad news and we’re doing worse than before.
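
A toy calculation with made-up figures shows how both readings can be true at once:

```python
# Invented numbers: the share of poor people falls while their absolute
# number rises, because the total population has grown.
then_pop, then_share = 4_000_000_000, 0.40  # hypothetical past
now_pop, now_share = 7_000_000_000, 0.25    # hypothetical present

then_poor = then_pop * then_share           # 1.6 billion
now_poor = now_pop * now_share              # 1.75 billion

print(f"Share of poor: {then_share:.0%} -> {now_share:.0%}")
print(f"Number of poor: {then_poor:,.0f} -> {now_poor:,.0f}")
```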

Thomas Pogge has argued for the latter view. Take another example: killing a given number of people doesn’t become less troubling if the world’s population increases. If we discovered that the world’s population at the time of the Holocaust was twice as large as previously assumed, that wouldn’t diminish the gravity of the Holocaust. What matters is the absolute number of people suffering.

On the other hand, if we see that policies and interventions lead to a significant lowering of the proportion of people in poverty – or suffering from any other type of rights violation – between times t and t+n, then we would welcome that, and we would certainly want to know it. The fact that the denominator – total world population – has increased in the meantime is probably something that has happened independently of those policies. In the specific case of poverty, a growing population can even make a decrease in the relative number of people suffering from poverty all the more admirable. After all, many still believe (erroneously) in the Malthusian trap theory, which states that population growth necessarily leads to increases in poverty in absolute numbers.

More posts in this series are here.

Measuring Human Rights (17): Human Rights and Progress

We’re all aware of the horrors of recent history. The 20th century doesn’t get a good press. And yet, most of us still think that humanity is, on average, much better off today than it was some centuries or millennia ago. The Holocaust, Rwanda, Hiroshima, AIDS, terrorism etc. don’t seem to have discouraged the idea of human progress in the popular imagination. These have been disasters of biblical proportions, and yet they are seen as temporary lapses, regrettable but exceptional incidents that did not jeopardize the overall positive evolution of mankind. Some go even further and call these events instances of “progressive violence”: disasters so awful that they bring about progress. Hitler was necessary in order to finally make Germany democratic. The Holocaust was necessary to give the Jews their homeland and the world the Universal Declaration. Evil has to become so extreme that it finally convinces humanity that evil should be abolished.

While that is obviously ludicrous, it’s true that there has been progress:

  • we did practically abolish slavery
  • torture seems to be much less common and much more widely condemned, despite the recent uptick
  • poverty is on the retreat
  • equality has come within reach for non-whites, women and minorities of different kinds
  • there’s a real reduction in violence over the centuries
  • war is much less common and much less bloody
  • more and more countries are democracies and freedom is much more widespread
  • there’s more free speech because censorship is much more difficult now thanks to the internet
  • health and labor conditions have improved for large segments of humanity, resulting in booming life expectancy
  • etc.

So, for a number of human rights, things seem to be progressing quite a lot. Of course, there are some areas of regress: the war on terror, gendercide, Islamism etc. Still, those things don’t seem to be weighty enough to discourage the idea of progress, which is still quite popular. On the other hand, some human rights violations were caused by elements of human progress. The Holocaust, for example, would have been unimaginable outside of our modern industrial society. Hiroshima and Mutually Assured Destruction are other examples. Both Nazism and communism are “progressive” philosophies in the sense that they believe they are working towards a better society.

Whatever the philosophical merits of the general idea of progress, progress in the field of respect for human rights boils down to a problem of measurement. How do we measure the level of respect for the whole set of human rights? It’s difficult enough to measure respect for the present time, let alone for previous periods in human history for which data are incomplete or even totally absent. Hence, general talk about progress in the field of human rights is probably impossible. More specific measurements of parts of the system of human rights are more likely to succeed, but only for relatively recent time frames.

A Killer Argument Against the Quantitative Approach to Human Rights?

Joseph Stalin wasn’t a very nice man. Among his lesser sins was his disdain for statistics: “kill a man and it’s a tragedy, kill a million and it’s a statistic”. What he meant of course was not just that it’s a statistic, but also that it’s not very important. Who cares if Stalin or anyone else killed one million, 10 million or 6,321,012? People care about actual persons, not numbers. (Actually, it’s a misquote; he never really said it).

Regular readers of this blog immediately recognize this as a frontal attack on our main project, the quantitative approach to human rights. I believe that it’s very important to have statistics and other quantitative data on human rights violations if we want to measure progress on human rights. In other words, I do care about numbers. We need to know how many people die of hunger, how many live in poverty etc. so that we can assess the quality and impact of our policies.

Now, I have to admit that Stalin was on to something. Numbers don’t carry a lot of meaning and don’t engender empathy. Powerful anecdotes about the fate of individual persons, testimonies and other narratives about concrete cases make it more likely that people start to care. If you tell school children for example that an estimated 850,000 people died during the Rwandan genocide or that less than 20% of China’s citizens now live on less than $1 a day compared to 80% 30 years ago, they will probably register this information, but they will only really start to care about genocide or poverty when they read about the stories of individuals. If you focus on human rights violations as quantities you may end up viewing human beings as quantities as well, and then you lose the motivating power of the individual story. There’s no room for differences between cases if you focus on numbers, and there are no individual and motivating stories without differences between cases.

The same argument against the abstraction and lack of meaning in numbers can be used against human rights talk in general, and not just quantitative talk. Human rights talk, like number talk, is abstract, devoid of specific personal stories. It’s talk about a biological species and the rights that it has, not about persons. The lists of human rights in treaties and declarations are very general and abstract sentences separated from specific circumstances and people, as they have to be. Human rights make differences between people morally irrelevant, and they have to do so otherwise you end up with privileges instead of human rights. However, we may end up not with the desired equality of rights but with sameness and interchangeable specimens of a biological species. And then we lose the motivating power of very specific and personal stories about suffering and oppression.

The answer to this challenge against number talk and rights talk is obvious, however: one approach doesn’t exclude the other. Numbers and abstractions may not be very motivating but they can help to assess the success of people who are otherwise motivated. And some of us may be motivated by numbers after all.

Measuring Human Rights (15): Measuring Segregation Using the Dissimilarity Index

If people tend to live, work, eat or go to school together with other members of their group – race, gender etc. – then we shouldn’t automatically assume that this is caused by discrimination, forced separation, restrictions on movement or choice of residence, or other kinds of human rights violations. It can be their free choice. However, if it’s not, then we usually call it segregation and we believe it’s a moral wrong that should be corrected. People have a right to live where they want, go to school where they want, and move freely about (with some restrictions necessary to protect the property rights and the freedom of association of others). If they are prohibited from doing so, either by law (e.g. Jim Crow) or by social pressure (e.g. discrimination by landlords or employers), then government policy and legislation should step in to better protect people’s rights. Forced desegregation is then an option, and it can take various forms: anti-discrimination legislation in employment and rent, forced integration of schools, busing, zoning laws, subsidized housing etc.

There’s also some room for intervention when segregation is not the result of conscious, unconscious, legal or social discrimination. For example, poor people tend to be segregated in poor districts, not because other people make it impossible for them to live elsewhere but because their poverty condemns them to certain residential areas. The same is true for schooling. In order to avoid poverty traps or membership poverty, it’s better to do something about that as well.

In all such cases, the solution should not necessarily be found in physical desegregation, i.e. forcibly moving people about. Perhaps the underlying causes of segregation, rather than segregation itself, should be tackled. For example, rather than moving poor children to better schools or poor families to better, subsidized housing, perhaps we should focus on their poverty directly.

However, before deciding what to do about segregation, we have to know its extent. Is it a big problem, or a minor one? How does it evolve? Is it getting better? How segregated are residential areas, schools, workplaces etc.? And to what extent is this segregation involuntary? The latter question is a hard one, but the others can be answered. There are several methods for measuring different kinds of segregation. The most popular measure of residential segregation is undoubtedly the so-called index of dissimilarity. If a city is divided into N districts (or sections, census tracts or whatever), the dissimilarity index measures the percentage of a group’s population that would have to change districts for each district to have the same percentage of that group as the whole city.
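
In formula form, D = 1/2 * sum over districts of |a_i/A − b_i/B|, where a_i and b_i are the populations of the two groups in district i, and A and B their citywide totals. Here is a minimal sketch of the computation, with hypothetical district counts:

```python
# Index of dissimilarity: D = 1/2 * sum_i |a_i/A - b_i/B|, where a_i and b_i
# are the district populations of the two groups and A, B their city totals.
def dissimilarity_index(group_a: list[int], group_b: list[int]) -> float:
    """Share of either group that would have to change districts for every
    district to mirror the citywide composition (0 = evenly mixed,
    1 = complete segregation)."""
    total_a, total_b = sum(group_a), sum(group_b)
    return 0.5 * sum(abs(a / total_a - b / total_b)
                     for a, b in zip(group_a, group_b))

# Hypothetical four-district city (counts of two groups per district):
print(dissimilarity_index([100, 100, 100, 100], [50, 50, 50, 50]))  # 0.0
print(dissimilarity_index([200, 200, 0, 0], [0, 0, 100, 100]))      # 1.0
```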

The dissimilarity index is not perfect, mainly because it depends on the sometimes arbitrary way in which cities are divided into districts or sections, which means that modifying city partitions can influence levels of “segregation” – not something we want. Take this extreme example: the same city shown twice, with two different partitions, situation A and situation B. No one has moved residency between situations A and B, but the district boundaries have been altered radically. In situation A, with the districts drawn in a certain way, there is no segregation (dissimilarity index of 0). But in situation B, with the districts drawn differently, there is complete segregation (index = 1), although no one has physically moved. That’s why other, complementary measures are probably necessary for correct information about levels of segregation. Some of those measures are proposed here and here.
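
A self-contained toy example (with invented counts) makes the boundary problem concrete: the same eight city blocks, with nobody moving, yield D = 0 or D = 1 depending purely on how the blocks are grouped into districts.

```python
# Same residents, two different district maps, two very different
# "segregation" scores. All block counts are invented.
def dissimilarity(a_counts, b_counts):
    total_a, total_b = sum(a_counts), sum(b_counts)
    return 0.5 * sum(abs(a / total_a - b / total_b)
                     for a, b in zip(a_counts, b_counts))

# Eight blocks: group A lives in blocks 0-3, group B in blocks 4-7.
blocks_a = [50, 50, 50, 50, 0, 0, 0, 0]
blocks_b = [0, 0, 0, 0, 50, 50, 50, 50]

partitions = {
    "A (each district mixes an A-block and a B-block)": [(0, 4), (1, 5), (2, 6), (3, 7)],
    "B (each district pairs blocks of the same group)": [(0, 1), (2, 3), (4, 5), (6, 7)],
}

for name, districts in partitions.items():
    a = [blocks_a[i] + blocks_a[j] for i, j in districts]
    b = [blocks_b[i] + blocks_b[j] for i, j in districts]
    print(f"Partition {name}: D = {dissimilarity(a, b):.1f}")
# Partition A: D = 0.0 — partition B: D = 1.0, with nobody moving house.
```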

Measuring Human Rights (13): When More Means Less and Vice Versa

Human rights violations can make it difficult to measure human rights violations, and can distort international comparisons of the levels of respect for human rights. Country A, which is generally open and accessible and on average respects basic rights such as speech, movement and press fairly well, may be more in the spotlight of human rights groups than country B which is borderline totalitarian. And not just more in the spotlight: attempts to quantify or measure respect for human rights may in fact yield a score that is worse for A than for B, or at least a score that isn’t much better for A than for B. The reason is of course the openness of A:

  • Human rights groups, researchers and statisticians can move and speak relatively freely in A.
  • The citizens of A aren’t scared shitless by their government and will speak to outsiders.
  • Country A may even have fostered a culture of public discourse, to some extent. Perhaps its citizens are also better educated and better able to analyze political conditions.
  • As Tocqueville has famously argued, the more a society liberates itself from inequalities, the harder it becomes to bear the remaining inequalities. Conversely, people in country B may not know better or may have adapted their ambitions to the rule of oppression. So, citizens of A may have better access to human rights groups to voice their complaints, aren’t afraid to do so, can do so because they are relatively well educated, and will do so because their circumstances seem more outrageous to them even if they really aren’t. Another reason to overestimate rights violations in A and underestimate them in B.
  • The government administration of A may also be more developed, which often means better data on living conditions. And better data allow for better human rights measurement. Data in country B may be secret or non-existent.

I called all this the catch 22 of human rights measurement: in order to measure whether countries respect human rights, you already need respect for human rights. Investigators or monitors must have some freedom to control, to engage in fact finding, to enter countries and move around, to investigate “in situ”, to denounce etc., and victims should have the freedom to speak out and to organize themselves in pressure groups. So we assume what we want to establish. (A side-effect of this is that authoritarian leaders may also be unaware of the extent of suffering among their citizens).

You can see the same problem in the common complaints that countries such as the U.S. and Israel get a raw deal from human rights groups:

[W]hy would the watchdogs neglect authoritarians? We asked both Human Rights Watch and Amnesty, and received similar replies. In some cases, staffers said, access to human rights victims in authoritarian countries was impossible, since the country’s borders were sealed or the repression was too harsh (think North Korea or Uzbekistan). In other instances, neglected countries were simply too small, poor, or unnewsworthy to inspire much media interest. With few journalists urgently demanding information about Niger, it made little sense to invest substantial reporting and advocacy resources there. … The watchdogs can and do seek to stimulate demand for information on the forgotten crises, but this is an expensive and high risk endeavor. (source)

So there may also be a problem with supply and demand in the media: human rights groups want to influence public opinion, but can only do so with the help of the media. If the media neglect certain countries or problems because they are deemed “unnewsworthy”, then human rights groups have no incentive to monitor those countries or problems: whatever they find will fall on deaf ears anyway. So they focus instead on the issues and countries that are easier to channel through the media.

Both the catch 22 problem and the problems caused by media supply and demand can be empirically tested by comparing the intensity of attention given by human rights monitoring organizations to certain countries/problems with the intensity of human rights violations (the latter data are assumed to be available, which is a big assumption, but one could use very general measures such as these). It seems that both effects are present, but only weakly:

[W]e subjected the 1986-2000 Amnesty [International] data to a barrage of statistical tests. (Since Human Rights Watch’s early archival procedures seemed spotty, we did not include their data in our models.) Amnesty’s coverage, we found, was driven by multiple factors, but contrary to the dark rumors swirling through the blogosphere, we discovered no master variable at work. Most importantly, we found that the level of actual violations mattered. Statistically speaking, Amnesty reported more heavily on countries with greater levels of abuse. Size also mattered, but not as expected. Although population didn’t impact reporting much, bigger economies did receive more coverage, either because they carried more weight in global politics and economic affairs, or because their abundant social infrastructure produced more accounts of abuse. Finally, we found that countries already covered by the media also received more Amnesty attention. (source)

More posts in this series are here.

Religion and Human Rights (28): Is Religion Particularly Violent?

9/11 and other terrorist attacks apparently motivated by Islamic beliefs have led to increased hostility towards Islam, but also towards religion in general. Perhaps in an effort to avoid the charge of Islamophobia, many anti-jihadists have taken a new look at the violent history of other religions, particularly Christianity, and concluded that religion per se, because of the concomitant belief in the absolute truth of God’s words and rules, automatically leads to the violent imposition of this belief on unwilling fellow human beings, or – if that doesn’t work – the murderous elimination of persistent sinners. This has given rise to a movement called the new atheists. The charge of fanatical and violent absolutism inherent in religion is of course an old one, but it has been revitalized by 9/11 and the war on terror. I think it’s no coincidence that many of the new atheists are also anti-jihadists (take Christopher Hitchens for example).

There are many things wrong with the question in the title of this blogpost. (And – full disclosure – this isn’t part of a self-interested defense of religion, since I’m an agnostic). First of all, it glosses over the fact that there isn’t such a thing as “religion”. There are many religions, and perhaps it can be shown that some of them produce a disproportionate level of violence, but religion as such is a notoriously vague concept. Nobody seems to agree on what it is. Even the God-entity isn’t a required element of the definition of religion, except if you want to take the improbable position that Buddhism isn’t a religion. All sorts of things can reasonably be put in the container concept of “religion” – the Abrahamic religions as well as Wicca and Jediism. The claim that “religion is violent” implies that all or most religions are equally violent, which is demonstrably false.

That leaves the theoretical possibility that some religions are more violent than others. If that claim can be shown to be true, Islamophobia may perhaps be a justified opinion, but not the outright rejection of religion inherent in new atheism (which, of course, has other arguments against religion besides religion’s supposed violent character). However, how can it be shown empirically and statistically that a certain religion – say Islam – is relatively more violent than other religions? In order to do so you would need data showing that Islam today (or, for that matter, Christianity in the age of the crusades and the inquisition) is the prime or sole motive behind a series of violent attacks. But how do you know that a violent actor was motivated solely or primarily by his religious beliefs? Because he has a Muslim name? Speaks Arabic? Looks a certain way? Professes his religious motivation? All that is not enough to rule out that he was motivated by a combination of religious beliefs and political or economic grievances, for instance, or by something completely unconnected to religion, despite his statements to the contrary.

Now let’s assume, arguendo, that this isn’t a problem, and that it is relatively easy and feasible to identify a series of violent attacks that are indisputably motivated solely or primarily by certain religious beliefs. How can you go from such a series to a quantified comparison that says “the religion behind this series of attacks – say again Islam – is particularly violent”? That seems to be an unwarranted generalization based on a sample that is by definition very small (given the long history of most religions and the lack of data on motivations, especially for times that have long since passed). Also, it supposes a comparison with other causes of violence, for example other religions, other non-religious belief systems, character traits, economic circumstances etc. After all, the point of this hypothetical study is not to show that (a) religion can lead to bad things. That’s seldom disputed. Everything can lead to bad things, including fanatical atheism (and don’t tell me communism and fascism were “really” religions; the word “religion” is vague, but probably not as vague as that – which doesn’t mean that there aren’t any religious elements in those two world-views). The claim we’re discussing here is that (a) religion – because of its fanatical absolutism and trust in God’s truth – is particularly violent, i.e. more violent than other belief systems, and hence very dangerous and to be repudiated.

I think it’s useless, from a purely mathematical and scientific point of view, to engage in such a comparative quantification, given the obvious problems of identifying true motivations, especially for long periods of time in the past. There’s just no way that you can measure religious violence, compare it to “other violence”, and claim it is more (or less) violent. So the question in the title is a nonsensical one, I think, even if you limit it to one particular religion rather than to religion in general. That doesn’t mean it can’t be helpful to know the religious motives of certain particular acts of violence. It’s always good to know the motives of violence if you want to do something about it. What it means is that such knowledge is no reason to generalize on the violent nature of a religion, let alone religion as such. That would not only obscure other motives – which is never helpful – but it would also defy our powers of quantification.

Measuring Human Rights (9): When “Worse” Doesn’t Necessarily Mean “Worse”

I discussed in this older post some of the problems related to the measurement of human rights violations, and to the assessment of progress or deterioration. One of the problems I mentioned is caused by improvements in measurement methods. Such improvements can in fact result in a statistic showing increasing numbers of rights violations, whereas in reality the numbers may be stable or even decreasing: better measurement means that you are now comparing current data, which are more complete and better measured, with older numbers of rights violations that were simply incomplete.

The example I gave was about rape statistics: better statistical and reporting methods used by the police, combined with diminishing social stigma etc., result in statistics showing a rising number of rapes, but the increase is due to the measurement methods (and other reporting effects), not to what happened in real life.
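
A toy calculation with invented numbers shows how this works:

```python
# Invented figures: reported cases rise while true incidence falls, purely
# because a larger share of incidents gets reported.
true_incidents = {"then": 1000, "now": 800}   # hypothetical true counts
reporting_rate = {"then": 0.30, "now": 0.50}  # hypothetical share reported

for period in ("then", "now"):
    reported = true_incidents[period] * reporting_rate[period]
    print(f"{period}: {reported:.0f} cases in the statistics")
# then: 300 — now: 400. The statistic worsens while reality improves.
```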

I now came across another example. Collateral damage – or the unintentional killing of civilians during wars – seems to be higher now than a century ago (source). This may also be the result of better monitoring hiding a totally different trend. We all know that civilian deaths are much less acceptable now than they used to be, and that journalism and war reporting are probably much better (given better communication technology). Hence, people may now believe that it’s more important to count civilian deaths, and have better means to do so. As a result, the numbers of civilian deaths showing up in statistics will rise compared to older periods, but perhaps the real numbers don’t rise at all.

Of course, the increase in collateral damage may be the result of something other than better measurement: perhaps the lower acceptability of civilian deaths forces the army to classify some of those deaths as unintentional even when they’re not (and then we have worse rather than better measurement). Or perhaps the relatively recent development of precision-guided munitions has made their use more widespread, so that there are more victims: more bombs, even more precise ones, can make more victims than fewer, less precise bombs. Or perhaps the current form of warfare, with guerrilla troops hiding among populations, does indeed produce more civilian deaths.

Still, I think my point stands: better measurement of human rights violations can give the wrong impression. Things may look as if they’re getting worse, but they’re not.

Lies, Damned Lies, and Statistics (18): Comparing Apples and Oranges

Before the introduction of tin helmets during the First World War, soldiers only had cloth hats to wear. The strange thing was that after the introduction of tin hats, the number of injuries to the head increased dramatically. Needless to say, this was counter-intuitive. The new helmets were designed precisely to avoid or limit such injuries.

Of course, people were comparing apples with oranges, namely statistics on head injuries before and after the introduction of the new helmets. In fact, what they should have done, and actually did after they realized their mistake, was to include in the statistics not only the injuries but also the fatalities. After the introduction of the new helmets, the number of fatalities dropped dramatically, while the number of injuries went up: the tin helmet was saving soldiers’ lives, but the soldiers were still injured.
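
With invented numbers, the effect looks like this:

```python
# Hypothetical figures per 1000 head hits: helmets convert deaths into
# injuries, so injuries rise even as outcomes improve.
cloth = {"dead": 600, "injured": 400}  # invented: before tin helmets
tin = {"dead": 200, "injured": 800}    # invented: after tin helmets

for name, outcome in (("cloth hats", cloth), ("tin helmets", tin)):
    print(f"{name}: {outcome['dead']} dead, {outcome['injured']} injured")
# Injuries rose from 400 to 800, but deaths fell from 600 to 200.
```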

Measuring Human Rights (8): Measurement of the Fairness of Trials and of Expert Witnesses

An important part of the system of human rights is the set of rules intended to offer those accused of crimes a fair trial in court. We try to treat everyone, even suspected criminals, with fairness, and we have two principal reasons for this:

  • We only want to punish real criminals. A fair trial is one in which everything is done to avoid punishing the wrong persons. We want to avoid miscarriages of justice.
  • We also want to use court proceedings only to punish criminals and deter crime, not for political or personal reasons, as is often the case in dictatorships.

Most of these rules are included in, for example, articles 9, 10, 14 and 15 of the International Covenant on Civil and Political Rights, article 10 of the Universal Declaration, article 6 of the European Convention on Human Rights, and the Sixth Amendment to the United States Constitution.

Respect for many of these rules can be measured statistically. I’ll mention only one here: the rule regarding the intervention of expert witnesses for the defense or the prosecution. Here’s an example of the way in which this aspect of a fair trial can be measured:

In the late 1990s, Harris County, Texas, medical examiner [and forensic specialist] Patricia Moore was repeatedly reprimanded by her superiors for pro-prosecution bias. … In 2004, a statistical analysis showed Moore diagnosed shaken baby syndrome (already a controversial diagnosis) in infant deaths at a rate several times higher than the national average. … One woman convicted of killing her own child because of Moore’s testimony was freed in 2005 after serving six years in prison. Another woman was cleared in 2004 after being accused because of Moore’s autopsy results. In 2001, babysitter Trenda Kemmerer was sentenced to 55 years in prison after being convicted of shaking a baby to death based largely on Moore’s testimony. The prosecutor in that case told the Houston Chronicle in 2004 that she had “no concerns” about Moore’s work. Even though Moore’s diagnosis in that case has since been revised to “undetermined,” and Moore was again reprimanded for her lack of objectivity in the case, Kemmerer remains in prison. (source)
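
The statistical analysis mentioned in the quote was presumably more sophisticated, but the basic check is simple: is an examiner’s diagnosis rate higher than the national average by more than chance could explain? A hedged sketch, with all counts invented:

```python
# A binomial test of whether one examiner's diagnosis rate exceeds the
# national rate. The counts and the national rate below are invented.
from scipy.stats import binomtest

diagnoses = 30        # hypothetical: shaken-baby diagnoses by one examiner
cases = 120           # hypothetical: infant deaths she examined
national_rate = 0.08  # hypothetical national diagnosis rate

result = binomtest(diagnoses, cases, national_rate, alternative="greater")
print(f"Observed rate: {diagnoses / cases:.1%}, p = {result.pvalue:.2e}")
# A tiny p-value means such an excess is very unlikely to be chance alone.
```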

Lies, Damned Lies, and Statistics (11): Polarized Statistics as a Result of Self-Selection

One of the most important things in the design of an opinion survey – and opinion surveys are a common tool in data gathering in the field of human rights – is the definition of the sample of people who will be interviewed. We can only assume that the answers given by the people in the sample are representative of the opinions of the entire population if the sample is a fully random subset of the population – that means that every person in the population should have an equal chance of being part of the survey group.

Unfortunately, many surveys depend on self-selection – people get to decide themselves if they cooperate – and self-selection distorts the randomness of the sample:

Those individuals who are highly motivated to respond, typically individuals who have strong opinions, are overrepresented, and individuals that are indifferent or apathetic are less likely to respond. This often leads to a polarization of responses with extreme perspectives being given a disproportionate weight in the summary. (source)

Self-selection is almost always a problem in online surveys (of the PollDaddy variety), phone-in surveys for television or radio shows, and so-called “red-button” surveys in which people vote with the remote control of their television set. However, it can also occur in more traditional types of surveys. When you survey the population of a brutal dictatorial state (if you get the chance) and ask the people about their freedoms and rights, many will deselect themselves: they will refuse to cooperate with the survey for fear of the consequences.

When we limit ourselves to the effects of self-selection (or self-deselection) in democratic states, we may find that this has something to do with the often ugly and stupid “us-and-them” character of much of contemporary politics. There seems to be less and less room for middle ground, compromise or nuance.

Lies, Damned Lies, and Statistics (10): How (Not) to Frame Survey Questions

I’ve mentioned before that information on human rights depends heavily on opinion surveys. Unfortunately, surveys can be wrong and misleading for so many different reasons that we have to be very careful when designing surveys and when using and interpreting survey data. One reason I haven’t mentioned before is the framing of the questions.

Even very small differences in framing can produce widely divergent answers. And there is a wide variety of problems linked to the framing of questions:

  • Questions can be leading questions, i.e. questions that suggest the answer. For example: “It’s wrong to discriminate against people of another race, isn’t it?” Or: “Don’t you agree that discrimination is wrong?”
  • Questions can be put in such a way that they put pressure on people to give a certain answer. For example: “Most reasonable people think racism is wrong. Are you one of them?” This is also a leading question of course, but it’s more than simply “leading”.
  • Questions can be confusing or easily misinterpreted. Such questions often include a negative, or, worse, a double negative. For example: “Do you agree that it isn’t wrong to discriminate under no circumstances?” Needless to say, your survey results will be contaminated by answers that are the opposite of what respondents actually believe.
  • Questions can be wordy. For example: “What do you think about discrimination (a term that refers to treatment taken toward or against a person of a certain group that is based on class or category rather than individual merit) as a type of behavior that promotes a certain group at the expense of another?” This is obviously a subtype of the confusing variety.
  • Questions can also be confusing because they use jargon, abbreviations or difficult terms. For example: “Do you believe that UNESCO and ECOSOC should administer peer-to-peer expertise regarding discrimination in an ad hoc or a systemic way?”
  • Questions can in fact be double or even triple questions, but there is only one answer required and allowed. Hence people who may have opposing answers to the two or three sub-questions will find it difficult to provide a clear answer. For example: “Do you agree that racism is a problem and that the government should do something about it?”
  • Open questions should be avoided in a survey. For example: “What do you think about discrimination?” Such questions do not yield answers that can be quantified and aggregated.
  • You also shouldn’t ask questions that exclude some possible answers, or provide a multiple-choice set of answers that omits some possible answers. For example: “How much did the government improve its anti-discrimination efforts relative to last year? Somewhat? Average? A lot?” Notice that this framing doesn’t allow people to respond that the effort hasn’t improved or has worsened. Another example: failure to include “don’t know” as a possible answer.

Here’s a real-life example:

In one of the most infamous examples of flawed polling, a 1992 poll conducted by the Roper organization for the American Jewish Committee found that 1 in 5 Americans doubted that the Holocaust occurred. How could 22 percent of Americans report being Holocaust deniers? The answer became clear when the original question was re-examined: “Does it seem possible or does it seem impossible to you that the Nazi extermination of the Jews never happened?” This awkwardly-phrased question contains a confusing double-negative which led many to report the opposite of what they believed. Embarrassed Roper officials apologized, and later polls, asking clear, unambiguous questions, found that only about 2 percent of Americans doubt the Holocaust. (source)

Measuring Human Rights (7): Don’t Let Governments Make it Easy on Themselves

In many cases, the task of measuring respect for human rights in a country falls on the government of that country. It’s obvious that this isn’t a good idea in dictatorships: governments there will not present correct statistics on their own misbehavior. But if not the government, who else? Dictatorships aren’t known for their thriving and free civil societies, or for granting access to outside monitors. As a result, human rights protection can’t be measured.

The problem, however, of depending on governments for human rights measurement isn’t limited to dictatorships. I also gave examples of democratic governments not doing a good job in this respect. Governments, also democratic ones, tend to choose indicators they already have. For example, they count the number of people benefiting from government food programs (they have numbers for that), while neglecting private food programs for which information isn’t readily available. In this case, as in many others, governments choose indicators which are easy to measure, rather than indicators which measure what needs to be measured but which require a lot of effort and money.

Human rights measurement also fails to measure what needs to be measured when the people whose rights we want to measure don’t have a say on which indicators are best. And that happens a lot, even in democracies. Citizen participation is a messy thing and governments tend to want to avoid it, but the result may be that we’re measuring the wrong thing. For example, we think we are measuring poverty when we count the number of internet connections for disadvantaged groups, but these groups may consider the lack of cable TV or public transportation a much more serious deprivation. The reason we’re not measuring what we think we are measuring, or what we really need to measure, is not – as in the previous case – complacency, lack of budgets etc. The reason is a lack of consultation. Because there hasn’t been consultation, the definition of “poverty” used by those measuring human rights is completely different from the one used by those whose rights are to be measured. And, as a result, the indicators that have been chosen aren’t the correct ones, or they don’t show the whole picture. Many indicators chosen by governments are also too specific, measuring only part of the human right (e.g. free meals for the elderly instead of poverty levels for the elderly).

However, even if the indicators that are chosen are the correct ones – i.e. indicators that measure what needs to be measured, completely and not partially – it’s still the case that human rights measurement is extremely difficult, not only conceptually, but also and primarily on the level of execution. Not only are there many indicators to measure, but the data sources are scarce and often unreliable, even in developed countries. For example, let’s assume that we want to measure the human right not to suffer poverty, and that we agree that the best and only indicator to measure respect for this right is the level of income.* So we cleared up the conceptual difficulties. The problem now is data sources. Do you use tax data (taxable income)? We all know that there is tax fraud. Low income declared in tax returns may not reflect real poverty. Tax returns also don’t include welfare benefits etc.

Even if you manage to produce neat tables and graphs you always have to stop and think about the messy ways in which they have been produced, about the flaws and lack of completeness of the chosen indicators themselves, and about the problems encountered while gathering the data. Human rights measurement will always be a difficult thing to do, even under the best circumstances.

* This isn’t obvious. Other indicators could be level of consumption, income inequality etc. But let’s assume, for the sake of simplicity, that level of income is the best and only indicator for this right.

Lies, Damned Lies, and Statistics (6): Statistical Bias in the Design and Execution of Surveys

Statisticians can – wittingly or unwittingly – introduce bias in their work. Take the case of surveys for instance. Two important steps in the design of a survey are the definition of the population and the selection of the sample. As it’s often impossible (and undesirable) to question a whole population, statisticians usually select a sample from the population and ask their questions only to the people in this sample. They assume that the answers given by the people in the sample are representative of the opinions of the entire population.

Bias can be introduced

  • at the moment of the definition of the population
  • at the moment of the selection of the sample
  • at the moment of the execution of the survey (as well as at other moments of the statistician’s work, which I won’t mention here).

Population

Let’s take a fictional example of a survey. Suppose statisticians want to measure public opinion regarding the level of respect for human rights in the country called Dystopia.

First, they set about defining their “population”, i.e. the group of people whose “public opinion” they want to measure. “That’s easy”, you think. So do they, unfortunately. It’s the people living in this country, of course, or is it?

Not quite. Suppose the level of rights protection in Dystopia is very low, as you might expect. That means that many people have probably fled the country. Including only the residents of the country in the survey population will then overestimate the level of rights protection. And there is another point: dead people can’t talk. We can assume that many victims of rights violations have died as a result of those violations. Not including these dead people in the survey will also artificially push up the level of rights protection. (I’ll mention in a moment how it is at all possible to include dead people in a survey; bear with me).

Hence, doing a survey and then assuming that the people who answered it are representative of the whole population means discarding the opinions of refugees and dead people. If those opinions were included, the results would be different and more correct. In the case of dead people it’s obviously impossible to include their opinions, but perhaps it would be advisable to make a statistical correction for it. After all, we know their answers: people who died because of rights violations in their country presumably wouldn’t have a good opinion of their political regime.

Sample

And then there are the problems linked to the definition of the sample. An unbiased sample should represent a fully random subset of the entire and correctly defined population (needless to say, if the population is defined incorrectly, as in the example above, then the sample is by definition also biased even if no sampling mistakes have been made). Every person in the population should have an equal chance of being chosen, which means that there shouldn’t be self-selection (a typical flaw in many if not all internet surveys of the “Polldaddy” variety) or self-deselection. The latter is very likely in my Dystopia example: people who are too afraid to talk won’t talk. The harsher the rights violations, the more people will fail to cooperate. So you get the perverse effect that very cruel regimes may score better on human rights surveys than modestly cruel regimes. The latter are cruel, but not cruel enough to scare the hell out of people.
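
A small simulation makes the perverse effect visible. Everything here is invented: the approval rates, the response probabilities, and the assumption that critics respond less often the more fearful the climate is.

```python
# Monte Carlo sketch of self-deselection bias in a survey under repression.
import random

random.seed(0)

def surveyed_approval(true_approval: float, fear: float, n: int = 100_000) -> float:
    """Approval rate among the people who dare to respond."""
    responses = []
    for _ in range(n):
        approves = random.random() < true_approval
        # Supporters respond freely; critics respond less often as fear rises.
        respond_prob = 0.9 if approves else 0.9 * (1 - fear)
        if random.random() < respond_prob:
            responses.append(approves)
    return sum(responses) / len(responses)

print(f"Modestly cruel regime: {surveyed_approval(0.40, fear=0.3):.0%} approval in survey")
print(f"Very cruel regime:     {surveyed_approval(0.20, fear=0.9):.0%} approval in survey")
# The crueler regime comes out looking better, despite much lower true approval.
```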

The classic sampling error is from a poll on the 1948 Presidential election in the U.S.

On Election night, the Chicago Tribune printed the headline DEWEY DEFEATS TRUMAN, which turned out to be mistaken. In the morning the grinning President-Elect, Harry S. Truman, was photographed holding a newspaper bearing this headline. The reason the Tribune was mistaken is that their editor trusted the results of a phone survey. Survey research was then in its infancy, and few academics realized that a sample of telephone users was not representative of the general population. Telephones were not yet widespread, and those who had them tended to be prosperous and have stable addresses. (source)

Execution

Another reason why bias in the sampling may occur is the way in which the surveys are executed. If the government of Dystopia allows statisticians to operate on its territory, it will probably not allow them to operate freely, or circumstances may not permit them to operate freely. So the people doing the interviews are not allowed to, or don’t dare to, travel around the country. Hence they themselves deselect entire groups from the survey, distorting the randomness of the sample. Again, the more repressive the regime, the more this happens. With possible adverse effects. The people who can be interviewed are perhaps only those living in urban areas, close to the residence of the statisticians. And those living there may have a relatively large stake in the government, which makes them paint a rosy image of the regime.

Measuring Human Rights (6): Don’t Make Governments Do It

In the case of dictatorial governments or other governments that are widely implicated in the violation of the rights of their citizens, it’s obvious that the task of measuring respect for human rights should be – where possible – carried out by independent non-governmental organizations, possibly even international or foreign ones (if local ones are not allowed to operate). Counting on the criminal to report on his crimes isn’t a good idea. Of course, sometimes there’s no other way: it’s often impossible to compile census data, for example, or data on mortality, healthcare providers etc., without using official government information.

All this is rather trivial. The more interesting point, I hope, is that the same is true, to some extent, of governments that generally have a positive attitude towards human rights. Obviously, the human rights performance of these governments also has to be measured, because there are rights violations everywhere, and a positive attitude doesn’t guarantee positive results. However, even in such cases, it’s not always wise to trust governments with the task of measuring their own performance in the field of human rights. An example from a paper by Marilyn Strathern (source, gated):

In 1993, new regulations [required] local authorities in the UK … to publish indicators of output, no fewer than 152 of them, covering a variety of issues of local concern. The idea was … to make councils’ performance transparent and thus give them an incentive to improve their services. As a result, however,… even though elderly people might want a deep freeze and microwave rather than food delivered by home helps, the number of home helps [was] the indicator for helping the elderly with their meals and an authority could only improve its recognised performance of help by providing the elderly with the very service they wanted less of, namely, more home helps.

Even benevolent governments can make crucial mistakes like these. This example isn’t even a measurement error; it’s measuring the wrong thing. And the mistake wasn’t caused by the government’s will to manipulate, but by a genuine misunderstanding of what the measurement should be all about.

I think the general point I’m trying to make is that human rights measurement should take place in a free market of competing measurements – and shouldn’t be a (government) monopoly. Measurement errors are more likely to be identified if there is a possibility to compare competing measurements of the same thing.

Measuring Democracy (3): But What Kind of Democracy?

Those who want to measure whether countries are democratic or not, or want to measure to what degree countries are democratic, necessarily have to answer the question “what is democracy?”. You can’t start to measure democracy until you have answered this question, just as in general you can’t start to measure anything until you have decided what it is you want to measure.

Two approaches to measuring democracy

As the concept of democracy is highly contestable – almost everyone has a different view on what it means to call a country a democracy, or to call it more or less democratic than another – it’s not surprising to see that most of the research projects that have attempted to measure democracy – such as Polity IV, Freedom House etc. – have each chosen a different definition of democracy, and are, therefore, actually measuring different things. I don’t intend to give an overview of the differences between all these measures here (this is a decent attempt). What I want to do here is highlight the pros and cons of two extremely different approaches: the minimalist and the maximalist one. The former could, for example, view democracy as no more than a system of regular elections, and measure simply the presence or absence of elections in different countries. The latter, on the other hand, could include in its definition of democracy things like rights protections, freedom of the press, division of powers etc., and measure the presence or absence of all of these, aggregating the different scores in order to decide whether a country is democratic or not, and to what extent.
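
In code, the maximalist aggregation step might look like the sketch below. The attribute list, the scores and the unweighted averaging are all invented for illustration; real indices such as Polity IV or Freedom House use their own attribute lists and aggregation rules.

```python
# A toy maximalist democracy index: score several attributes per country
# in [0, 1] and aggregate them into an ordinal scale. All values invented.
ATTRIBUTES = ["elections", "press_freedom", "rights_protection", "division_of_powers"]

def democracy_score(scores: dict[str, float]) -> float:
    """Unweighted mean of the attribute scores."""
    return sum(scores[a] for a in ATTRIBUTES) / len(ATTRIBUTES)

countries = {  # hypothetical countries and scores
    "Country X": {"elections": 1.0, "press_freedom": 0.8,
                  "rights_protection": 0.9, "division_of_powers": 0.7},
    "Country Y": {"elections": 1.0, "press_freedom": 0.2,
                  "rights_protection": 0.3, "division_of_powers": 0.1},
}

# A minimalist, elections-only measure would call both countries democracies;
# the aggregate places them at very different points on an ordinal scale.
for name, scores in sorted(countries.items(), key=lambda kv: -democracy_score(kv[1])):
    print(f"{name}: {democracy_score(scores):.2f}")
```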

When measuring the democratic nature of different countries (and of course comparing them), should we use a minimalist or maximalist definition of democracy? Here are some pros and cons of either approach.

Differentiation

A minimalist definition makes it very difficult to differentiate between countries. It would make it possible to distinguish democracies (minimally defined) from non-democracies, but it wouldn’t allow us to measure the degree of democracy of a given country. I believe an ordinal scale with different ranks for different levels of quality of democracy in different countries (ranging from extremely poor quality, i.e. non-democracies, to perfect democracies) is more interesting than a binary scale limited to democracy/non-democracy. The use of a maximalist definition of democracy would make it possible to rank all types of regimes on such an ordinal scale. A maximalist definition would include a relatively large number of necessary attributes of democracy, and the combination of presence/absence/partial development of each attribute would almost make it possible to give each country a unique rank on the ordinal scale. Such wide-ranging differentiation is an advantage when analyzing progress. A binary scale does not give any information on the quality of democracy. Hence, it would be better to speak of measuring democratization rather than measuring democracy – democratization not only in the sense of a transition from authoritarian to democratic governance, but also in the sense of progress towards a deepening of democratic rule.

A minimalist definition of democracy necessarily focuses on just a few attributes of democracy. As a result, it is impossible to differentiate between degrees of “democraticness” of different countries. Moreover, the chosen attributes may not be typical of or exclusive to democracy (such as good governance or citizen influence), and may not include some necessary attributes. For example, Polity IV, perhaps the most widely used measure of democracy, does not sufficiently incorporate actual citizen participation, as opposed to the mere right of citizens to participate. I think it’s fair to say that a country that gives its citizens the right to vote but doesn’t actually have many citizens voting can hardly be called a democracy.

Acceptability of the measurement vs controversy

A disadvantage of maximalism is that the measurement will be more open to controversy. The more attributes of democracy are included in the measure, the higher the risk of disagreement on the model of democracy. As said above, people have different ideas about the number and type of necessary attributes of a democracy, even of an ideal democracy. If the only attribute of democracy retained in the analysis is regular elections, then there will be no controversy since few people would reject this attribute.

Balancing

So we have to balance meaning against acceptability: a measurement system that is maximalist offers a lot of information and the possibility to compare countries beyond the simple dichotomy of democracy/non-democracy, but it may be rejected by those who claim that this system is not measuring democracy as they understand the word. A minimalist system, on the other hand, will measure something that is useful for many people – no one will contest that elections are necessary for democracy, for instance – but will also reduce the utility of the measurement results because it doesn’t yield a lot of information about countries.