

reCaptcha: How to turn blather into books

Ten seconds of work has digitized libraries, whether the amateur translators know it or not.

By Patti Lane, Correspondent for The Christian Science Monitor / February 19, 2009

Von Ahn: The professor invented reCaptcha in 2007. Since then, its users have translated 5 billion words.

Gene J. Puskar/AP


Toronto

When you buy a concert ticket on Ticketmaster, post something for sale on Craigslist, or poke an old friend on Facebook, you may not know it, but you’re helping to put millions of books online in a vast free library.


To access these websites, you must decipher two squiggly words to prove that you’re not a computer program designed to spam the site. Once it knows you’re human, the website lets you continue.

Those two decoded words don’t disappear, however. In fact, your brain has deciphered words that had baffled the scanning software used for an enormous project to digitize every public domain book in the world.

“We can coordinate literally millions of people on the Internet to work together to do something that computers cannot do,” says Luis von Ahn, an assistant professor of computer science at Carnegie Mellon University in Pittsburgh.

Mr. von Ahn helped develop the first version of these security puzzles in 2000, stringing together random combinations of words and numbers, then distorting the text to make it impossible for automated spammers to decode.

Some 200 million of these words, dubbed “Captchas” for Completely Automated Public Turing test to tell Computers and Humans Apart, are typed every day by people around the world.

“At first, it made me feel good to look at the impact my research has had,” says von Ahn, who grew up in Guatemala.
Then he did the math: “It takes about 10 seconds to type each Captcha. I realized that humanity as a whole is wasting 500,000 hours every day typing Captchas.”

When von Ahn compared that to the 7 million hours it took to build the Empire State Building or the 20 million hours spent constructing the Panama Canal, he wondered, “Is there a way we can make good use of this time?”
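For readers who want to check the math, the short sketch below reruns the calculation using only the figures quoted above: 200 million Captchas a day at about 10 seconds each. The function and variable names are illustrative, not von Ahn's. It also estimates how many days of Captcha typing would equal the hours cited for the Empire State Building and the Panama Canal.

```python
# Figures quoted in the article (names here are illustrative)
CAPTCHAS_PER_DAY = 200_000_000   # "Some 200 million ... typed every day"
SECONDS_PER_CAPTCHA = 10         # "about 10 seconds to type each Captcha"
EMPIRE_STATE_HOURS = 7_000_000   # hours to build the Empire State Building
PANAMA_CANAL_HOURS = 20_000_000  # hours to construct the Panama Canal


def hours_per_day(captchas=CAPTCHAS_PER_DAY, seconds_each=SECONDS_PER_CAPTCHA):
    """Total human hours spent typing Captchas in a single day."""
    return captchas * seconds_each / 3600


def days_to_match(project_hours, daily_hours):
    """Days of Captcha typing needed to equal a project's total hours."""
    return project_hours / daily_hours


daily = hours_per_day()
print(f"Hours per day: {daily:,.0f}")
print(f"Empire State Building: {days_to_match(EMPIRE_STATE_HOURS, daily):.1f} days")
print(f"Panama Canal: {days_to_match(PANAMA_CANAL_HOURS, daily):.1f} days")
```

With these inputs the daily total comes to roughly 556,000 hours, so the quoted 500,000 is a conservative round number. At that rate, Captcha typing would match the Empire State Building's hours in about 13 days and the Panama Canal's in about 36; using the rounded 500,000 figure instead gives 14 and 40 days.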

In 2007, he came up with reCaptchas. Now, instead of frittering away their time typing random characters, Internet users spell actual words plucked from old books that computers have trouble reading.

The Open Content Alliance, a nonprofit group based in San Francisco, has enlisted about 150 libraries and research centers to digitize as many printed works as it legally can and post them online for anyone in the world to read.

“Everything on the Internet Archive [archive.org] is free to use and free to download,” says Gabe Juszel, coordinator for the project’s largest scanning center, which occupies a dim office at the University of Toronto. “We want to make sure a person in China has the same resources as a grad student here at U of T. After all, there are more Internet cafes than there are libraries in the world.”
