The field narrows for e-books
As Microsoft backs away from digitizing old texts, some worry that a single company could privatize world knowledge.
Google has partnered with more than two dozen libraries, including those at Harvard, Stanford, Oxford, and Princeton universities and the New York Public Library. The company uses what amounts to a VIP library card – taking books on loan, scanning them, and then returning them to the library unharmed, says Jon Orwant, engineering manager of Google Book Search. The digitization costs the libraries nothing.
In a separate deal with book publishers, Google scans new books with a less gentle approach. The spines are chopped off and the pages fed through an optical scanner.
Google won’t say how many books it has scanned so far, but it’s certainly in the millions. The company estimates there may be more than 100 million book titles in the world today.
So far, Google isn’t aggressively trying to make money off its book pages, though a few ads and links to buy hard copies from the publisher do appear. Keeping users inside Google’s online “universe” seems to be the company’s long-term motive.
Books published before 1923 have gone out of copyright and can be freely scanned, downloaded, or printed. Google obtains permission from publishers regarding how much of a new book it can display. Though only short “snippets” of these books usually can be viewed, the whole text is still searchable, helping readers decide if it contains information that is useful to them.
Another controversial aspect of Google’s stewardship involves the quality of the digitization. After books are scanned, a process called optical character recognition (OCR) converts each page into a digital file whose words can be read by a computer, which makes the text searchable.
Computer programs do a good job with OCR on new titles, but older books with yellowed pages, faded print, or graffiti can prove to be a problem. Google’s final product is “less than 100 percent” accurate, Mr. Orwant concedes.
“Google is doing a very, very poor job.... Their OCR is very inaccurate, the image quality is very poor,” says Lotfi Belkhir, CEO of Kirtas Technologies. The company, in Victor, N.Y., bills itself as the world’s leader in converting books into digital form. “You find cutoff text.... You find dirty text. You find incomplete pages.”
He predicts that much of what Google has digitized so far will need to be rescanned someday to bring it up to acceptable quality.
Mr. Belkhir is contacting libraries that had been working with Microsoft and says they are receptive to letting Kirtas pick up where Microsoft left off.
Google’s Orwant defends his project. “We certainly believe we’re doing the world a very good service,” he says. “We’re digitizing all this content. We’re making it as open as the laws allow.”
Google always gives a digital copy back to its partners, Orwant says. “We’re never the only people with a copy.” And because Google’s contracts with the libraries are nonexclusive, the libraries are free to work with others to scan their collections as well.
But that’s not enough for critics. “I don’t blame the company, but the question is ‘What do we as citizens want out of our information system?’ ” says Mr. Vaidhyanathan at the University of Virginia.
“If we assume that a healthy, diverse, and accessible body of information is essential to science, politics, creativity, literature,” he says, “then we really have to step back and say, ‘Do we really want to put this one company in the position of being the filter for the world’s information?’ ”