Scientists make gains in long road toward a computer that `listens'
At a General Electric plant in Binghamton, N.Y., workers inspecting circuit boards write out repair tickets by talking. When a defect is found, they recite it to a voice-controlled computer, which prints out the notice -- a big time-saver over hand-logging data. This fall, the Air Force will test a voice-activated computer in an F-16 jet that will allow pilots to shout commands in the cockpit (``arm missiles!''), leaving them more freedom to fly.
Police in Northern Ireland are soon to use similar equipment to help them quickly check out suspicious cars without taking their eyes off the road. Reciting a license number into their radios, officers will trigger a search in a centralized computer crime bank. Back will come a prerecorded message about the vehicle.
Man has been trying to endow machines with the ability to understand spoken commands for 40 years. While simple forms of automatic speech recognition are now practical, the day of computers comprehending more than a fraction of human language is still years away.
``It's going to take more than the year 2001 before we have HAL,'' says Hy Murveit, a speech-recognition expert at the Stanford Research Institute, a California think tank, referring to the fictional computer that converses with people in the 1968 film ``2001: A Space Odyssey.''
Nevertheless, the rudimentary voice-control systems already available are gaining in use for everything from quality control in factories to baggage handling at airports. Moreover, a tight race is now shaping up among several competing technologies to push systems far beyond present limits -- including (perhaps) the development of ``talkwriters,'' fanciful machines popularized in science fiction that transcribe speech.
Behind the enduring drive to teach computers to listen is the desire to make machines easier to use. Computers capable of responding to spoken words, for instance, would free users from laboring over a keyboard. Information could also be processed much more quickly.
But speech recognition is one of the toughest tasks in computer research today. The reason has to do more with the richness of language than with the blockheadedness of machines. The ambiguities of speech are not easily represented in the simple on-off code of computerspeak, and the vocabulary to be covered is large: the average adult regularly uses as many as 10,000 words. Thus, speech-recognition systems often require vast amounts of processing power, which can make them costly.
Many words sound alike (``team'' and ``teem''). Others can be pronounced in different ways (``Hahvahd'' vs. ``Harvard''). This is to say nothing of the different tonal qualities among speakers and the need to figure out broken sentences and filter out background noise (computers now have trouble differentiating a squeaky file cabinet from a voice).
Most current systems work by matching spoken words to patterns stored in the computer. In simplified terms, a spoken command is converted into an electrical signal that can be represented by ones and zeroes. These digitized sound waves are compared with ``templates,'' or patterns of sound, that have been prerecorded and make up the computer's vocabulary. When a sound wave is matched with the right template, the word is recognized and the machine responds.
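For readers who want to see the idea in concrete form, the template-matching approach described above can be sketched in a few lines of Python. This is a toy illustration, not any vendor's actual system. The templates are made-up number sequences standing in for digitized sound, and the comparison uses dynamic time warping, a common technique assumed here because it tolerates words spoken faster or slower than the stored pattern. The names `dtw_distance`, `recognize`, `TEMPLATES`, and `reject_above` are all hypothetical.

```python
import math

# Hypothetical "vocabulary": each word maps to a prerecorded template.
# Real systems store acoustic features; these short sequences are toy data.
TEMPLATES = {
    "arm": [0, 2, 5, 2, 0],
    "stop": [5, 5, 1, 0, 0],
}


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Lets one sequence stretch or compress in time relative to the other,
    so a slowly spoken word can still line up with its template.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i samples of a
    # with the first j samples of b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b holds
                cost[i][j - 1],      # b advances, a holds
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(signal, templates, reject_above=None):
    """Return the word whose template is closest to the signal.

    If reject_above is given and even the best match is farther than
    that threshold, return None (for example, for background noise).
    """
    best_word, best_dist = None, math.inf
    for word, template in templates.items():
        dist = dtw_distance(signal, template)
        if dist < best_dist:
            best_word, best_dist = word, dist
    if reject_above is not None and best_dist > reject_above:
        return None
    return best_word


# A drawn-out utterance of "arm" still matches the "arm" template.
print(recognize([0, 2, 2, 5, 5, 2, 0], TEMPLATES))
```

The rejection threshold hints at why a squeaky file cabinet causes trouble: the machine must decide not only which template is closest, but whether anything in its vocabulary is close enough to count as a word at all.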