Fishing for data
To harness the vast information flow generated each day, scientists are developing sophisticated software that can instantly mine streaming data, such as videos, without ever needing to archive it.
"You're getting a continuous flow of data, and you have a limited amount of time to analyze large numbers of data points quickly," he says. Systems also must be designed to use battery power sparingly and deal with communications links that can carry substantially less information than do fiber-optic cables or copper wire.
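One way to picture analyzing a continuous flow without archiving it is a one-pass summary that is updated as each data point arrives and then lets the point go. The sketch below uses Welford's running mean and variance method for this; it is an illustration only, not the software the team describes, and the `price_ticks` feed and its values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """One-pass mean and variance (Welford's method); stores no past data points."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; reported as 0.0 until at least two points have arrived.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def price_ticks():
    """Hypothetical stand-in for a live feed; yields one price at a time."""
    for price in [10.0, 10.5, 9.5, 10.0, 11.0]:
        yield price


stats = RunningStats()
for tick in price_ticks():
    stats.update(tick)  # the tick itself is not kept after this call
```

Memory use stays constant no matter how long the stream runs, which is the property that matters on a battery-powered handheld with a slow link.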
His team's stock-market monitor, which can run on a wireless Palm Pilot or another personal digital assistant, is testing software approaches to meeting those requirements, he says. The team has also developed a system for monitoring truck shipments that reports more about the condition of the vehicle and its cargo than the periodic position updates available via navigation satellites.
Others, such as Washington University's Indeck, are taking hardware approaches to boosting database search speed. Typically, he says, a database stored on a device such as a hard drive must travel from the drive to the computer's main memory and processor before a search can take place, which slows the search substantially. The interconnection, called a bus, is basically an "electronic water pipe," Indeck says, and has a fixed carrying capacity.
Indeck and colleagues have developed a hard drive that contains its own processing circuitry, so the only signals that must cross the bus are the initial query and the answers, not the entire contents of the database itself. By using this configuration, he says, searches that once might have taken days can be concluded in "many seconds." Overall, his team estimated that the approach can run searches 200 times faster than existing technologies.
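The saving comes from how little has to cross the bus. The toy model below counts the bytes moved in each layout; it is a sketch under made-up assumptions (a hypothetical database of 1,000 eight-byte records, 10 of which match) and says nothing about the actual hardware or the 200-fold figure.

```python
def search_on_host(records, predicate):
    """Conventional layout: every record crosses the bus before it is tested."""
    bytes_moved = sum(len(r) for r in records)
    matches = [r for r in records if predicate(r)]
    return matches, bytes_moved


def search_on_drive(records, predicate, query_size):
    """Drive-side filtering: only the query and the matching records cross the bus."""
    matches = [r for r in records if predicate(r)]
    bytes_moved = query_size + sum(len(r) for r in matches)
    return matches, bytes_moved


# Hypothetical database contents, chosen only to make the arithmetic easy to follow.
records = [b"ordinary"] * 990 + [b"flagged!"] * 10
query = b"flagged!"


def is_match(record):
    return record == query


host_matches, host_bytes = search_on_host(records, is_match)
drive_matches, drive_bytes = search_on_drive(records, is_match, len(query))
```

In this toy case the conventional search moves 8,000 bytes and the drive-side search moves 88, and the gap widens as the database grows while the number of matches stays small.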
These and other technologies are likely to be high on the shopping list for the federal government's Total Information Awareness project, spearheaded by the Defense Advanced Research Projects Agency (DARPA). The program, which some have dubbed "the mother of all data-mining projects," kicked off last year with the fiscal 2002 budget. The R&D program aims to "detect, classify, identify, and track terrorists so that we may understand their plans and act to prevent them from being executed," according to John Poindexter, the project's director.
Speaking at a meeting on the project last summer, Dr. Poindexter noted that much of the effort will focus on unifying and probing databases that carry information on financial transactions.
Maryland's Kargupta notes that researchers are working to preserve privacy by designing software that randomly masks the characteristics of individuals in a monitored group. The group's activities can then be monitored as a whole without revealing any one individual's identity. If the need arises, however, that safeguard can be lifted for any individual in the group.
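A common way to mask individuals while preserving the group picture is to add random, zero-mean noise to each person's value: any single masked value is unreliable, but the noise largely cancels out in the group average. The sketch below assumes that technique for illustration; the article does not say which masking method the researchers use, the data are invented, and lifting the safeguard for one individual is not modeled.

```python
import random
import statistics


def mask(values, noise_scale, rng):
    """Add independent zero-mean Gaussian noise to each individual's value."""
    return [v + rng.gauss(0.0, noise_scale) for v in values]


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical per-person values, e.g. monthly transaction totals.
true_values = [rng.uniform(0, 1000) for _ in range(10_000)]
masked_values = mask(true_values, noise_scale=500.0, rng=rng)

# The analyst sees only masked_values; the two means are compared here
# just to show how closely the group-level figure survives the masking.
true_mean = statistics.fmean(true_values)
masked_mean = statistics.fmean(masked_values)
```

With 10,000 people and a noise scale of 500, the noise in the average shrinks to roughly 500 divided by the square root of 10,000, or about 5, while each individual's value can be off by hundreds.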
Indeck adds that systems can be established that give users access to the gross output of a search or query, but not to the raw information from which the output was derived.
Researchers acknowledge that as data-mining technologies improve, the software they write will have to reflect existing privacy laws and be easy to adjust as legal rulings on privacy issues emerge.
But some privacy advocates doubt that those efforts will be sufficient to ensure that civil liberties will be maintained.
"The problem overall is that so much emphasis is being put on the data-mining aspect with little being said about controls," says Lee Tien, senior staff attorney with the Electronic Frontier Foundation in San Francisco, referring to DARPA's push. The project's effort to improve human-computer interactions and use technology to boost collaboration between federal agencies "is hard to argue with," he says, especially in light of the missed clues that might have heightened alerts in advance of last year's terrorist attacks on the World Trade Center and the Pentagon.
But the emphasis on surveillance, he says, raises questions of accountability, from the software engineers who design the programs to the people who would provide human checks on the automated results - a challenge common to many envisioned data-mining schemes.
"The big questions are how do you define privacy and how will you maintain it?" agrees Indeck. "We have to engage the privacy issue and embed it properly into our systems."



