"Big data" has become quite the buzzword this year, especially after reports surfaced that the National Security Agency (NSA) has collected and stored individuals' phone records and Internet browsing histories, opening the debate about what should (and shouldn’t) be done with all of that information. But the possibilities for these massive data sets carry far beyond the hunt for terrorists. Public health officials use the same kind of data for a similar goal: saving lives.
Researchers can now pull together huge amounts of information – think Google Trends, Twitter messages about flu symptoms, or frequency of visits to WebMD – with the aim of tracking diseases. Officials have sought such information for decades, but the increasing availability of big data makes the hunt faster and more accurate than ever before.
In the United States, researchers at Johns Hopkins University have been working on ways to track the flu by aggregating tweets. The program TwitterHose allows anyone to download about 1 percent of the tweets made in an hour, selected at random, giving researchers a nice cross section of Twitter users. Paid helpers sift through the tweets, flagging any that mention getting the flu or feeling flu-like symptoms. Researchers then use location data to figure out where individual Twitter users are reporting being sick. When the team matched its results against Centers for Disease Control and Prevention (CDC) statistics on flu outbreaks – which usually run about two weeks behind real time – it found that it could accurately predict the CDC reports well before they were released.
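The flagging and location steps described above can be illustrated with a rough sketch. This is not the Johns Hopkins team's method: the article says paid helpers flagged tweets by hand, whereas the code below swaps in a crude keyword filter. The sample tweets, the location labels, and the names `FLU_PATTERN`, `flag_flu_tweets`, and `count_by_location` are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample of tweets, each with a coarse location label attached.
tweets = [
    {"text": "Home sick with the flu again", "location": "city_1"},
    {"text": "Fever and chills all night", "location": "city_1"},
    {"text": "Great game last night!", "location": "city_2"},
    {"text": "I think I have a sore throat and fever", "location": "city_2"},
]

# Assumed keyword list; a stand-in for the human reviewers in the article.
FLU_PATTERN = re.compile(r"\b(flu|fever|chills|sore throat)\b", re.IGNORECASE)

def flag_flu_tweets(sample):
    """Keep only the tweets whose text mentions a flu-related keyword."""
    return [t for t in sample if FLU_PATTERN.search(t["text"])]

def count_by_location(flagged):
    """Count flagged tweets per location label."""
    return Counter(t["location"] for t in flagged)

flagged = flag_flu_tweets(tweets)
counts = count_by_location(flagged)
```

A keyword filter like this would also flag tweets about flu shots or flu news, which is one reason a human review step can be more reliable.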
But in locations where the population doesn’t have access to smartphone technology, or doesn’t use Twitter on a regular basis, researchers have had to be even more creative.
In Kenya, Caroline Buckee, a professor at Harvard University’s Center for Communicable Disease, worked with a team of researchers using cellphone data to track the spread of malaria. “In the past, we’ve had to rely on road networks or travel surveys to figure out how places are connected,” Ms. Buckee says. But the ubiquity of cellphones in Kenya has offered researchers an alternative method of tracking malaria. (According to a 2012 report by the Communications Commission of Kenya, there are approximately 30 million mobile subscribers in the country, or about 70 percent of the population.)
Buckee worked with her husband, cellphone-data expert Nathan Eagle, to devise a mathematical model based on cellphone call records. At the time, Mr. Eagle was already working with Kenyan cellphone providers, harvesting data to predict when a customer might try to change phone companies. The couple figured out that the disparate data sets that Eagle used to track consumer behavior could also be used to provide a new method to look at the spread of disease.
Every time an individual calls or sends a text message from a mobile phone, the cellular network logs the caller’s approximate location. In Buckee's study, individuals were assigned a specific area based on the location from which they placed most of their calls. Those same areas were assigned malaria risk ratings, based on how many cases of the disease were reported there.
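The assignment step described above, placing each person in the area where most of their calls originate and then looking up that area's malaria risk, might be sketched as follows. The call records, area labels, risk ratings, and the function name `assign_home_areas` are hypothetical; the article does not describe the study's actual data format or rating scale.

```python
from collections import Counter, defaultdict

# Hypothetical call records: one (subscriber_id, area) pair per call or text.
call_log = [
    ("u1", "area_A"), ("u1", "area_A"), ("u1", "area_B"),
    ("u2", "area_B"), ("u2", "area_B"),
    ("u3", "area_C"),
]

# Hypothetical malaria risk rating per area (illustrative labels only).
area_risk = {"area_A": "low", "area_B": "high", "area_C": "medium"}

def assign_home_areas(records):
    """Map each subscriber to the area where most of their calls were logged."""
    counts = defaultdict(Counter)
    for subscriber, area in records:
        counts[subscriber][area] += 1
    # most_common(1) returns a list holding the (area, count) pair
    # with the highest count for that subscriber.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

home_areas = assign_home_areas(call_log)
home_risk = {s: area_risk[a] for s, a in home_areas.items()}
```

Here subscriber `u1` is placed in `area_A` because two of that subscriber's three logged calls came from there.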
With cellphone data and malaria rates in hand, researchers devised a mathematical model to predict the probability of people being infected or becoming infected in each region. The cellphone records used in the study were from mid-2008 to mid-2009, so the team was able to confirm its calculations with recorded malaria statistics from the same period. (The number of cellphone users during the time of the survey was approximately 60 percent of the population, Buckee says.)
“As Kenya moves towards malaria elimination, one of the critical factors is trying to figure out” how the disease spreads, Buckee says. Public health workers have already determined how to combat malaria when the disease is located, but the cellphone data would allow health workers to target disease control programs more efficiently, she says.
But there are several problems with this kind of research. The companies that own the data have to agree to hand it over. And before they do, the data needs to be scrubbed of any personal information, says Mr. Eagle. Public health ethics do not allow for targeted tracking of individuals without their consent.
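The article does not say how the records were scrubbed. One common approach, shown here only as a sketch under that assumption, is to replace each phone number with a keyed hash and drop the other personal fields before the data leaves the operator. The field names, the placeholder key, and the functions `pseudonymize` and `scrub_record` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key; in this sketch it never leaves the operator.
SECRET_KEY = b"replace-with-a-random-secret"

def pseudonymize(phone_number: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for a phone number using HMAC-SHA256."""
    digest = hmac.new(key, phone_number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability

def scrub_record(record: dict) -> dict:
    """Keep only the pseudonym and the area; drop name and phone number."""
    return {
        "subscriber": pseudonymize(record["phone_number"]),
        "area": record["area"],
    }

# Placeholder record with made-up values.
raw = {"phone_number": "000-000-0001", "name": "Example Name", "area": "area_A"}
scrubbed = scrub_record(raw)
```

Because the same number always maps to the same pseudonym, researchers could still follow one subscriber's movements without seeing who that subscriber is. Pseudonyms alone do not guarantee anonymity, since detailed location traces can sometimes be matched back to a person.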
"The operators need to feel good about" giving researchers access to the their anonymized data sets, he says. And although any personal information that could connect individuals to their records would stay on the phone company's server, it is still a hassle for company's to anonymize this data and give it to public health workers. Plus, health researchers need to be trusted with not overreaching, even though richer, more specific information could make their work more effective.