Speak up, dear. Your computer is listening
It's early Monday morning. You climb into your car, turn on the engine, and the first thing you hear is, "Good morning, dear." It's the voice of your wife. Only she's not in the car. It's the voice-enabled computer in your car that acts as your personal assistant - you've just programmed it with your wife's voice.
After returning the greeting - after all, one has to be polite to one's spouse, even when she's computer-generated - you ask for your daily appointments and any important future meetings that week. Your personal assistant says she'll be just a minute and soon returns with your entire day's appointments, along with last night's Major League Baseball scores.
Your first appointment is an important phone call to your distributor in Germany. You voice-activate your cellphone and tell it to turn on voice translation, so that when your distributor speaks in German, you hear a smooth, understandable English translation. When you start to talk, your German friend hears you speaking in German, translated on the fly, in a voice that she's already chosen to represent all English-language speakers.
Think it's all science fiction? Read on.
Welcome to the amazing world of voice recognition, one of technology's most intriguing, most promising, and yet most challenging new sectors.
The idea behind voice recognition has been around for years. Who isn't familiar with the image of Captain Picard ordering a cup of Earl Grey, hot? But it wasn't until 1997, when Dragon Systems introduced NaturallySpeaking, the first commercially available continuous-speech software (which means you don't have to pause between words), that voice-recognition technology became available to the public.
Last week, the field received a dramatic boost when Lernout and Hauspie (L&H) of Burlington, Mass., one of the leading voice-recognition companies, announced that it would acquire Dragon Systems. The move brings together some of the best scientists working on voice recognition, with an eye to getting the technology into as many different computers and Internet devices as possible.
"Frankly, we've been challenged because of a lack of manpower," says Bill Destefanis, senior director of project management for L&H. "Instead of competing with each other, we decided to work together because we feel we can get farther that way and faster."
So how realistic is the idea of using your voice to interact with a computer? Very realistic, if you listen to the engineers and scientists at L&H. That's why the company and its competitors (primarily Philips and IBM) see this emerging technology as one with an incredible "upside."
"We really do see speech being used in common human devices," says Mr. Destefanis. "Just take hands-free dialing. Six states have legislation that mandates cellular phones may only be used in cars if they are done so in a hands-free manner.
"Or take set-top TV boxes. They will be used to drive a fairly sophisticated computer on top of the television that will give people incredible access to information. If we don't use voice to control these set-top TV boxes, all that remains are those clunky remote controls. People will feel far more comfortable using their voice to program a computer than they will a remote control," he says.
This view is echoed by Ray Kurzweil, one of the preeminent thinkers in the field of voice recognition. In his latest book, "The Age of Spiritual Machines," Dr. Kurzweil lays out a future where speech will be the primary means of interacting with technology.
"In the next decade, we will see translating telephones that provide real-time speech translation from one language to another, intelligent computerized personal assistants that can converse and rapidly search and understand the world's knowledge bases, and a profusion of other machines with increasingly broad and flexible intelligence."
But the idea of using your voice as the primary means of interacting with your computer or any Internet device is not endorsed by everyone. For instance, Jeff Hawkins, who designed the original Palm Pilot, is against speech technology. Mr. Hawkins has told the news media that he doubts that users of personal digital assistants will want to use their voice as the primary method of entering data into a PDA.
Another caution comes from Scott Relf, vice president for marketing at Sprint PCS. Currently, Sprint offers a variety of phones that enable people to use simple voice commands to dial numbers already programmed into digital cellphones. But Mr. Relf doesn't see a world that's completely ready for voice recognition.
"The problem is that right now, the technology really doesn't work well in a small handset," he says. "Meanwhile, our focus groups have shown us that most people learn to type simple text messages in about a month, using the keying-in method. We asked a group of young people in the United Kingdom and Japan to text key in the message, 'I have had a brilliant time tonight.' The faster kids were able to do it in 15 seconds."
The need for speed
Voice-recognition software requires computers with fast processors and a lot of random-access memory. Susan Fulton, an expert on speech-recognition software, says that using speech recognition requires "an investment of both time and money." On her Web site, www.out-loud.com, she writes that the minimum specifications that various vendors list are far below optimal. (Most call for about 64 megabytes of memory and a 200-megahertz processor.) She recommends a computer with a 400-megahertz Pentium II processor and 128 megabytes of RAM.
The scientists at L&H, in fact, are the first to tell you that technology is currently the biggest obstacle to widespread acceptance of voice recognition as an everyday activity.
But they also point out that only two or three years ago, the kind of continuous-speech technology available today wasn't possible because machines weren't powerful enough. As processors grow faster and smaller and as memory gets more affordable, voice recognition will become increasingly accessible.
"Voice recognition touches every part of our lives, moving beyond the desktop," according to Klaus Schleicher, director of product management for L&H's PC Applications Division. "We see the future as one where people will be able to get information anywhere, and they will be using their voice as the interface to get that information."
*For more on voice recognition, read Tom Regan's column on the Monitor's Web site: www.csmonitor.com
(c) Copyright 2000. The Christian Science Publishing Society