Controlling the flood of genetic information

With the completion of a "rough draft" of the entire human genome in June, government and corporate scientists have assembled enough genetic code to fill 2,000 computer diskettes.

Now, the scientific footrace shifts from spelling out the seemingly endless string of "A's," "C's," "T's," and "G's" that make up our DNA, to actually understanding what it means.

That's where a new field of "bioinformatics" comes in.

Bioinformatics is a marriage of computer science and biology, in which powerful new supercomputers are being trained on our genome - as well as the genomes of experimental yeast, bacteria, fruit flies, and mice - in order to find out how genes make proteins, regulate cell and tissue function, and are linked to certain diseases.
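How does a string of DNA letters become a protein? In rough terms, the cell reads the sequence three letters at a time, and each three-letter "codon" specifies one building block of a protein. The toy Python sketch below - with a deliberately abbreviated codon table and a made-up sequence, nothing like a production bioinformatics pipeline - illustrates the idea:

```python
# Toy illustration of "reading" a gene: each three-letter codon maps to one
# amino acid of a protein.  Table deliberately abbreviated (the full genetic
# code has 64 codons); the example sequence is invented.

CODON_TABLE = {
    "ATG": "Met", "TGG": "Trp", "TTT": "Phe", "TTC": "Phe",
    "GGA": "Gly", "GGC": "Gly", "GCT": "Ala", "GCA": "Ala",
    "AAA": "Lys", "GAA": "Glu", "CAT": "His", "TGT": "Cys",
    "TAA": "STOP", "TAG": "STOP", "TGA": "STOP",
}

def translate(dna):
    """Translate a DNA coding sequence into amino acids, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):               # step through the codons
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "???")
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein

if __name__ == "__main__":
    # A made-up coding snippet: Met-Gly-Ala-Lys-Trp, then a stop codon.
    print(translate("ATGGGAGCTAAATGGTAA"))            # ['Met', 'Gly', 'Ala', 'Lys', 'Trp']
```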

The ultimate goal of bioinformatics is a clearer understanding of what makes us human; the more immediate payoff comes in new biomedical advances and drug treatments.

These computers are dealing with a flood of genetic information that doubles every 12 to 14 months. That flood keeps rising as automated gene sequencers at government and commercial labs spit out more information at an increasingly faster pace.

GenBank, the government-sponsored database of genetic data housed at the National Library of Medicine, now contains 9.5 billion base pairs of genetic code from various genomes.

Celera Genomics Corp., the firm that announced completion of the human genome project draft, currently has storage capacity of 50 terabytes (that's equivalent to the data stored on 80,000 compact discs).
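The figures are easy to check on the back of an envelope. Assuming roughly 650 megabytes per compact disc - a typical capacity for the era, and an assumption rather than a reported number - a few lines of Python reproduce the comparison and show what a 12-to-14-month doubling time does to GenBank's holdings:

```python
# Back-of-the-envelope checks on the storage and growth figures above.
# The 650 MB per compact disc is an assumed typical capacity.

MB_PER_TB = 1_000_000
CD_CAPACITY_MB = 650

celera_storage_tb = 50
print(f"50 TB is roughly {celera_storage_tb * MB_PER_TB / CD_CAPACITY_MB:,.0f} CDs")
# -> about 77,000, i.e. "80,000 compact discs"

# What a 12-to-14-month doubling time does to GenBank's 9.5 billion base pairs:
base_pairs = 9.5e9
for months_per_doubling in (12, 14):
    doublings = 5 * 12 / months_per_doubling          # doublings over five years
    print(f"Doubling every {months_per_doubling} months: "
          f"{base_pairs * 2 ** doublings:.1e} base pairs in five years")
```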

So what becomes of all this data?

Daniel Masys likens the task of interpreting genetic information to drinking from a fire hose.

"There's so much biological information that nobody is prepared to understand it," said Dr. Masys, director of biomedical informatics and professor of medicine at the University of California at San Diego.

Corporations are also jumping into bioinformatics.

Analysts at the New York investment bank Oscar Gruss & Son estimate that bioinformatics - the hardware, software, and analysis behind it - is already a $300 million industry, and will jump to a $2 billion business within five years.

Earlier this year, IBM announced plans to build a $100 million "Blue Gene" computer that could perform more than one quadrillion (a million billion) operations per second - 500 times more powerful than the world's fastest computers today and 2 million times more powerful than a desktop computer.
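Those ratios hang together arithmetically. A rough Python check - treating today's fastest supercomputers as a couple of teraflops and a desktop PC as a few hundred million operations per second, both ballpark assumptions rather than reported figures - shows what the comparisons imply:

```python
# Rough check of the implied speed ratios.  The supercomputer and desktop
# figures that fall out are ballpark values for circa-2000 hardware.

blue_gene_ops_per_sec = 1e15                              # one quadrillion ops/second
implied_fastest_today = blue_gene_ops_per_sec / 500       # ~2e12, a few teraflops
implied_desktop = blue_gene_ops_per_sec / 2e6             # ~5e8, hundreds of Mflops

print(f"Implied fastest supercomputer today: {implied_fastest_today:.0e} ops/sec")
print(f"Implied desktop PC:                  {implied_desktop:.0e} ops/sec")
```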

That kind of processing power will be thrown at unraveling how proteins fold into complex geometrical shapes that allow them to perform their biological functions.

However, not all efforts are on such a grand scale.

Researchers at the University of Idaho are building their own "super" computer using parts from 40 to 100 desktop PCs. The hardware will cost only about $44,000, but with the proper connections, the system will be sophisticated enough to run experiments on "jumping genes," bits of genetic material that migrate along the DNA double helix like microscopic hitchhikers.
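What might such an experiment look like in code? Here is a minimal Python sketch of a single jumping gene cutting itself out of a host sequence and reinserting at random - the sequences, element, and jump model are all invented for illustration, not the Idaho group's actual software:

```python
import random

def jump_transposon(genome, element, steps, seed=0):
    """Excise a transposable element and reinsert it at random positions,
    returning the host sequence after the final jump."""
    rng = random.Random(seed)
    for _ in range(steps):
        start = genome.find(element)                               # locate the element (assumed unique)
        genome = genome[:start] + genome[start + len(element):]    # cut it out
        target = rng.randrange(len(genome) + 1)                    # choose a landing site
        genome = genome[:target] + element + genome[target:]       # paste it back in: the "jump"
    return genome

if __name__ == "__main__":
    host = "ACGT" * 8 + "GGGCCC"          # 32 letters of host DNA plus a 6-letter element
    print(jump_transposon(host, "GGGCCC", steps=5))
```

The appeal of a homemade cluster is that thousands of variations of a simulation like this can run side by side, one per processor.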

James Foster, the computer scientist at the University of Idaho directing the project, said that devising better algorithms - the step-by-step procedures that programs follow - matters more to understanding biological data than simply building bigger computers.

"Nature can always defeat brute-force approaches," says James Limpan, director of the National Center for Biotechnology Information in Rockville, Md.

"We need the most clever ways of measuring things and not more CPU power," he says.

(c) Copyright 2000. The Christian Science Publishing Society
