Does government 'juke the stats'? Faulty databases affect nearly every corner of public policy, from US crime statistics to VA logs. In short, we need better data.

The controversy in Ferguson has raised questions about the use of deadly force by police. But as USA Today recently reported, we don’t have good statistics about it. The FBI database on “justifiable homicides” depends on self-reported information from less than one percent of law enforcement agencies. Experts believe that the resulting numbers provide a sketchy picture.

This faulty database is just one example of a problem that affects nearly every corner of public policy. On issue after issue, many relevant statistics are either flawed or nonexistent.

Sometimes, the culprit is deliberate manipulation. Fans of the classic television series The Wire may recall episodes that revolved around “juking the stats.” As one character put it, there are “certain processes” by which police can reduce felony rates, such as reclassifying aggravated assaults or writing up robbery reports as unfounded. It doesn’t just happen on HBO. According to The Los Angeles Times, the LAPD misclassified nearly 1,200 violent crimes between 2012 and 2013, including hundreds of stabbings, beatings, and robberies.

Many other kinds of government organizations juke the stats. The most notorious recent example is the Department of Veterans Affairs, where bureaucrats twisted appointment data to hide the long (and occasionally lethal) delays confronting veterans who needed medical help.

Even when officials are not skewing the numbers on purpose, the statistics might still be questionable. That’s especially true if they involve illegal or controversial behavior. For instance, data on drug arrests only tell us about people who get caught, not about sellers and users who elude law enforcement. Surveys may provide some additional information, but hardly furnish the whole story. People who have committed drug felonies may be reluctant to share that information with survey researchers.

The Census Bureau provides a wealth of information that underlies many statistical studies. Although federal law sets fines for refusing to answer census questions or knowingly providing false information, the bureau does not actually punish anybody. Consequently, census undercounts are a perennial problem.

The bureau’s American Community Survey is the only source of local statistics for dozens of topics, such as educational attainment, housing, employment, commuting, language spoken at home, and ancestry. Though it has a very good response rate, it still suffers from some shortcomings. Among other things, it does not verify answers, so it depends on the memory, knowledge, and candor of respondents. When people know little about their ancestry, or give inaccurate guesses about their commuting times, misinformation creeps into the stats.

Some data simply do not exist. Autism researchers would like to know how much prevalence rates have changed over the years. How does the rate in 2014 compare with 1954 or 1974? We have no way of knowing. For decades after Leo Kanner first identified autism in 1943, there was little systematic effort to measure it. The prevalence estimates of the Centers for Disease Control and Prevention only date back to 2002, and changes in these figures might result from growing awareness on the part of parents and doctors, not from an underlying shift.

Alexis de Tocqueville wrote: “When statistics are not based on strictly accurate calculations, they mislead instead of guide. The mind easily lets itself be taken in by the false appearance of exactitude which statistics retain even in their mistakes, and confidently adopts errors clothed in the forms of mathematical truth. Let us abandon figures, then, and try to find our proofs elsewhere.”

Tocqueville got the last part wrong: We need more and better statistics, not fewer. But he was right that we should be wary of accepting data at face value, and be humble about what we know and don’t know.

Jack Pitney writes his Looking for Trouble blog exclusively for the Monitor.


