Scientists make gains in long road toward a computer that `listens'
Boston — At a General Electric plant in Binghamton, N.Y., workers inspecting circuit boards write out repair tickets by talking. When a defect is found, they recite it to a voice-controlled computer, which prints out the notice -- a big time-saver over hand-logging data. This fall, the Air Force will test a voice-activated computer in an F-16 jet that will allow pilots to shout commands in the cockpit (``arm missiles!''), leaving them more freedom to fly.
Police in Northern Ireland are soon to use similar equipment to help them quickly check out suspicious cars without taking their eyes off the road. Reciting a license number into their radios, officers will trigger a search in a centralized computer crime bank. Back will come a prerecorded message about the vehicle.
Man has been trying to endow machines with the ability to understand spoken commands for 40 years. While simple forms of automatic speech recognition are now practical, the day of computers comprehending more than a fraction of human language is still years away.
``It's going to take more than the year 2001 before we have HAL,'' says Hy Murveit, a speech-recognition expert at the Stanford Research Institute, a California think tank, referring to the mythical machine that converses with people in the 1968 space-odyssey film.
Nevertheless, the rudimentary voice-control systems already available are gaining in use for everything from quality control in factories to baggage handling at airports. Moreover, a tight race is now shaping up among several competing technologies to push systems far beyond present limits -- including (perhaps) the development of ``talkwriters,'' fanciful machines popularized in science fiction that transcribe speech.
Behind the enduring drive to teach computers to listen is the desire to make machines easier to use. Computers capable of responding to spoken words, for instance, would free users from laboring over a keyboard. Information could also be processed much more quickly.
But speech recognition is one of the toughest tasks in computer research today. The reason has to do more with the richness of language than with the blockheadedness of machines. The ambiguities of speech are not easily represented in the simple on-off code of computerspeak. The average adult regularly uses as many as 10,000 words, so speech-recognition systems often require vast amounts of processing power, which can make them costly.
Many words sound alike (``team'' and ``teem''). Others can be pronounced in different ways (``Hahvahd'' vs. ``Harvard''). This is to say nothing of the different tonal qualities among speakers and the need to figure out broken sentences and filter out background noise (computers now have trouble differentiating a squeaky file cabinet from a voice).
Most current systems work by matching spoken words to patterns stored in the computer. In simplified terms, a spoken command is converted into an electrical signal that can be represented by ones and zeroes. These digitized sound waves are compared with ``templates,'' or patterns of sound, that have been prerecorded and make up the computer's vocabulary. When a sound wave is matched with the right template, the word is recognized and the machine responds.
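The template approach described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four-number ``sound patterns,'' the two-word vocabulary, the use of straight-line (Euclidean) distance as the measure of a match, and the cutoff value. Real systems of the kind the article describes are far more elaborate.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length digitized patterns."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(signal, templates, max_distance=0.5):
    """Return the word whose template is closest to the signal.

    Returns None when even the closest template is farther away than
    max_distance (a hypothetical cutoff, chosen only for this example).
    """
    best_word, best_dist = None, float("inf")
    for word, template in templates.items():
        d = distance(signal, template)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word if best_dist <= max_distance else None


# Hypothetical prerecorded templates: the machine's entire vocabulary.
TEMPLATES = {
    "store": [0.1, 0.8, 0.6, 0.2],
    "arm": [0.9, 0.4, 0.1, 0.0],
}

spoken = [0.2, 0.7, 0.6, 0.1]        # a slightly different utterance of "store"
print(recognize(spoken, TEMPLATES))  # closest template wins
```

The cutoff is what lets the sketch reject a sound that resembles nothing in the vocabulary instead of forcing it onto the nearest word.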
Simple enough -- only it isn't. Most systems still have limited vocabularies (typically fewer than 100 words), have to be programmed for each individual speaker, and usually require the person to pause between words. Even then, the systems often make major blunders. For these reasons, the technology hasn't turned out to be the commercial bonanza that many were boldly predicting just a few years ago.
Even with these limitations, however, voice-recognition systems are finding more use. In industry, this includes speeding product inspection and aiding in sorting goods. Some groups use computers to answer phones and respond to simple requests. Personal computers exist that obey a few verbal instructions (``store''), and videogames are emerging that can be played by shouted commands. ``The field is starting to move rapidly now,'' says Bill Spain, a voice-recognition specialist at Probe Research Inc., a New Jersey research firm.
In the lab, a new generation of nimbler machines is being shaped. For instance, Speech Systems, a small Tarzana, Calif., company, expects to come out with a system within the next year that will take dictation from someone without a need for pauses between words. This would produce only a rough draft, though. A final draft would still have to be edited on a keyboard.
Even bolder predictions are coming from Massachusetts-based inventor Raymond Kurzweil, founder of Kurzweil Applied Intelligence. His company hopes to unveil a dictation-taking word processor for office use by 1986 that will have a vocabulary of 10,000 to 15,000 words and cost about $25,000. The Voice Writer departs somewhat from the template approach, using artificial-intelligence techniques to try to do such things as recognize words in context. Still, it will require brief pauses between words, and each speaker will have to talk into it in advance, so the machine has a profile of his or her voice.
IBM, meanwhile, is taking a different tack. Using an unusual statistical technique to match verbal commands with those in its vocabulary, scientists recently produced an experimental system that can recognize 5,000 words and do it accurately 95 percent of the time -- the result of 12 years of research. At this point, though, the company has no plans to develop a commercial product.
Other company and university researchers are exploring different approaches to enlarge vocabularies and to produce systems that can understand speech spoken at a natural speed and by many different speakers. Progress is being made, but most scientists say it may be 20 years before people can communicate with machines as easily as they talk to each other. ``We are just barely scratching the surface now,'' says Dr. Raj Reddy, a speech-recognition expert at Carnegie-Mellon University.

Sidebar: How a computer recognizes speech
1. A person says a word into a microphone.
2. The microphone converts the sound into a continually varying electrical current.
3. The computer converts the current into a digital pattern that can be represented by a series of ones and zeroes.
4. The machine then compares the digitized sound wave with other patterns (templates) stored in its memory until it finds a template the digitized sound wave matches. When a match is achieved. . .
5. . . .the word pops up on the screen.
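Steps 2 and 3 of the sidebar, turning a continually varying current into ones and zeroes, might look roughly like the sketch below. It is only an illustration under stated assumptions: a pure sine tone stands in for the microphone's current, and the sampling rate, the 4-bit resolution, and the function names are all hypothetical choices, not details from any system mentioned here.

```python
import math


def sample_wave(freq_hz, rate_hz, n_samples):
    """Measure a sine tone (a stand-in for the microphone current)
    at n_samples evenly spaced instants."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz)
            for i in range(n_samples)]


def quantize(samples, bits=4):
    """Map each sample in [-1.0, 1.0] to a whole-number level
    between 0 and 2**bits - 1."""
    top = 2 ** bits - 1
    levels = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip anything out of range
        levels.append(round((s + 1.0) / 2.0 * top))
    return levels


def to_bits(levels, bits=4):
    """Write each level as a fixed-width string of ones and zeroes."""
    return "".join(format(v, f"0{bits}b") for v in levels)


wave = sample_wave(freq_hz=1, rate_hz=8, n_samples=8)  # one full cycle
levels = quantize(wave)
print(levels)
print(to_bits(levels))
```

More bits per sample give a finer picture of the sound but also more data to store and compare, which is one reason the matching step demands so much processing power.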