After fitting together 16 billion separate fragments, scientists have finally managed to sequence the genome of the loblolly pine tree, the largest genome sequenced so far.
To obtain the DNA, the scientists first had to remove the embryo from the seed, says Indiana University's Keithanne Mockaitis, an author on the paper. What remains is haploid tissue, whose cells have just one set of chromosomes.
Using next-generation sequencing technology, researchers obtained billions of short sequences of bases. The challenge then was to sift through the data, identify the overlapping sequences, and assemble them, a computational puzzle called "genome assembly."
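To make the idea of matching overlaps concrete, here is a toy sketch in Python. It is not the software the loblolly team used: it assumes error-free reads from a single strand, and the function names `overlap` and `greedy_assemble` and the three example reads are invented for illustration. It repeatedly merges the two reads whose ends overlap the most.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the best match is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Toy greedy assembly: keep merging the pair with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left to merge; remaining pieces stay separate
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads, chosen so that each one overlaps the next.
print(greedy_assemble(["ATGGCGT", "GCGTACA", "ACAGGT"]))
```

Comparing every read against every other, as this sketch does, takes time that grows with the square of the number of reads, which hints at why billions of fragments are hard to handle.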
In the case of loblolly pine, the huge size of the genome made this process difficult.
The "challenge isn't just collecting all the sequence data. The problem is assembling that sequence into order," said David Neale, a professor of plant sciences at the University of California, Davis, who led the loblolly pine genome project.
"You have this big pile of tiny pieces and now you have to reassemble the book," said Steven Salzberg, professor of medicine and biostatistics at Johns Hopkins University, one of the directors of the loblolly genome assembly team, who was also an author on the papers.
As a solution, researchers developed software that eliminates repetitive base pairs from the original data, so that what remains can fit within the memory of a supercomputer.
Getting rid of the redundancies is important because it leaves the computer with 100 times less sequence data to deal with, say researchers.
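The article does not say how the team's software works internally, so the following Python sketch shows only one simple, assumed form of redundancy removal: discarding reads that are exact duplicates of, or fully contained in, a longer read. The function name `drop_redundant` and the example reads are hypothetical.

```python
def drop_redundant(reads):
    """Keep only reads that are not contained in a longer (or equal) read."""
    # Longest first, so each read is checked against everything that could contain it.
    unique = sorted(set(reads), key=lambda r: (-len(r), r))
    kept = []
    for read in unique:
        if not any(read in longer for longer in kept):
            kept.append(read)
    return kept


# Hypothetical data: one duplicate and two reads contained in the longest read.
reads = ["ATGGCGTACA", "GGCGT", "ATGGCGTACA", "TACAGG", "CGTAC"]
print(drop_redundant(reads))
```

In this toy case five reads shrink to two, and every base of the discarded reads is still represented in the ones that remain.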
The loblolly will serve as a good "reference" genome because "the size of the pieces of consecutive sequence that we assembled are orders of magnitude larger than what's been previously published," said Dr. Neale.
The tree is the source of most American paper products. It is also an important feedstock for biofuel.
"In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied," say researchers.