Data mining is a loaded phrase these days. Though it has myriad benefits, it’s the drawbacks – as in NSA spying and stalking by advertisers – that tend to get the most attention.
But having a glut of data that’s systematically collected by governments, research organizations, and corporations is a symptom of wealth and privilege.
Many developing countries have the opposite problem, and it’s making life difficult for national governments that in some cases have a murky view of even basic population statistics when trying to build roads or hospitals, or to decide how much money to allocate to provincial governments.
A dearth of data also makes it hard for international financial organizations such as the World Bank to determine how much aid to distribute to countries with the highest poverty levels, or for humanitarian organizations to figure out where their services are most desperately needed.
“There are serious decisions made based on bad data,” Joshua Blumenstock, a data scientist at the University of Washington in Seattle, told The Christian Science Monitor.
In some African countries, he says, gross domestic product estimates, which measure a nation’s economic activity, can be off by 50 percent.
In a paper published on Friday in the journal Science, Dr. Blumenstock and data science colleagues from his own university and from the University of California, Berkeley, propose a cheap and efficient method for tracking poverty and wealth in developing countries: mining anonymized data from mobile phones.
This might be the best source of reliable statistics on how people are faring in the world’s poorest countries, they write, where mobile phone use is widespread and where there’s no Facebook and Twitter to mine for trends. In some of these countries conducting a national census is prohibitively expensive, and there aren’t thousands of organizations constantly collecting and analyzing data about the population and the economy.
Researchers can glean massive amounts of information about the economic conditions of even the most remote regions of the world by analyzing phone data: how many people mobile phone users talk to every day (the more calls, the bigger the social network), whether they make calls from different locations every day (more locations means there’s money available for daily travel), and what time of day they make most of their calls (if most calls fall between 9 a.m. and 5 p.m., that might say something about the type of job someone has, or whether they have a job at all).
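To make those three signals concrete, here is a minimal sketch of how they might be computed from call records. The record layout, the user and tower IDs, and the `summarize` function are all hypothetical, invented for illustration; the article does not describe the researchers' actual data format or code, and this version pools all calls rather than counting per day.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical anonymized call records: (caller, callee, cell tower, ISO timestamp).
RECORDS = [
    ("u1", "u2", "t1", "2009-03-02T09:15:00"),
    ("u1", "u3", "t2", "2009-03-02T13:40:00"),
    ("u1", "u2", "t1", "2009-03-02T20:05:00"),
    ("u2", "u1", "t3", "2009-03-02T22:30:00"),
]

def summarize(records):
    """Return, per caller: distinct contacts, distinct locations, share of 9-to-5 calls."""
    contacts = defaultdict(set)
    towers = defaultdict(set)
    total = defaultdict(int)
    daytime = defaultdict(int)
    for caller, callee, tower, stamp in records:
        hour = datetime.fromisoformat(stamp).hour
        contacts[caller].add(callee)   # size of the social network
        towers[caller].add(tower)      # rough proxy for how much the caller travels
        total[caller] += 1
        if 9 <= hour < 17:             # call placed between 9 a.m. and 5 p.m.
            daytime[caller] += 1
    return {
        caller: {
            "contacts": len(contacts[caller]),
            "locations": len(towers[caller]),
            "daytime_share": daytime[caller] / total[caller],
        }
        for caller in total
    }

features = summarize(RECORDS)
print(features)
```

Each of these numbers is only a rough proxy, and a real analysis would weigh many such patterns together rather than read meaning into any single one.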
“Literally hundreds of thousands of patterns exist in these data,” Blumenstock says.
In a study conducted in 2009, the researchers used mobile phone data from 1.5 million people in Rwanda, provided by the country’s largest mobile phone network operator and stripped of names and addresses.
After analyzing four years’ worth of this data and identifying phone usage patterns that they thought might be significant, the researchers partnered with students from the Kigali Institute of Science and Technology, in the capital city of Kigali, who conducted phone surveys with 856 randomly selected mobile phone subscribers, paying them the equivalent of one US dollar to participate.
The students also got written consent from most people to view their phone records. When the researchers compared the phone data with the information people volunteered in the phone surveys and some household data collected by the National Institute of Statistics of Rwanda, they saw that the phone data accurately predicted the economic conditions of individuals or small groups, and that the predictions could be extrapolated nationally.
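The general idea of learning from a small surveyed group and then extrapolating can be sketched in a few lines. The article does not say which statistical model the researchers used, so this example swaps in plain one-variable least squares as a stand-in; the wealth scores, contact counts, and subscriber IDs are made-up numbers.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up data for surveyed subscribers: one phone feature (distinct contacts)
# paired with a wealth score taken from their survey answers.
surveyed_contacts = [2, 4, 6, 8]
surveyed_wealth = [1.0, 2.0, 3.0, 4.0]

intercept, slope = fit_line(surveyed_contacts, surveyed_wealth)

# Subscribers who were never surveyed: only their phone feature is known,
# so the fitted line supplies an estimated wealth score.
unsurveyed_contacts = {"u7": 5, "u8": 10}
predicted = {uid: intercept + slope * c for uid, c in unsurveyed_contacts.items()}
print(predicted)
```

The design point is that the expensive step, the survey, is needed only for the small group used to fit the model; everyone else is estimated from records the operator already holds.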
A national household survey can cost $1 million and take up to 18 months to conduct. By comparison, Dr. Blumenstock’s study cost $12,000 and took four weeks.
“This is more where I see the future,” he says. “Precise identification of where the need is by using these sort of highly disaggregated sources of data.”
But there are some kinks to work out before this method can be widely used. For one, there has to be a way to ensure that phone data is used by researchers and governments in “privacy-conscientious ways,” said Yves-Alexandre de Montjoye, a computer privacy researcher at Harvard University.
His research earlier this year showed that even with just a few pieces of anonymized data, he could quickly identify specific people.
Another challenge, according to Blumenstock, is applying this model in other countries, where people might not be as willing as Rwandans were to hand over their personal phone data.
In Afghanistan, for instance, where Blumenstock is working now, participation rates have been lower.
“There’s more inherent suspicion of people collecting data” in the country, he says.