Voice-Recognition Systems Boom
PHONES you can dial simply by repeating someone's name and typewriters that type what you say to them may be commonplace by the middle of the next decade, thanks to recent developments in computerized speech-recognition technology. Word processors that can listen to a person talk and produce a printed transcript have been on the market for almost a year. Although they are expensive and require the speaker to pause between each word, the machines have made dramatic differences in the lives of people with physical disabilities and promise to save time and money in clerically intensive fields such as commodities trading and medical reporting.
``I [can] dictate a letter ... and send it over a network to another computer and have it printed,'' says Frank Whitney, a computer programmer at the United States Department of Defense who is able to use only one finger. ``If I were doing it using my finger, I would be halfway through the first paragraph,'' in the same amount of time, he says.
``I have spoken to half-a-dozen handicapped folks, paraplegics, and others who are using these systems,'' says Christopher R. Seelbach, an analyst at Probe Research, a market-research firm that follows the voice-processing industry. ``For those who can afford the $10,000 to $15,000 for a system, it basically changes their lives.''
The concept is not new. Computers designed to recognize 50 to 100 spoken words have been around for nearly 15 years, says Janet M. Baker, president of Dragon Systems, a Boston-area company that sells voice-recognition equipment and software. ``The early systems didn't work very well,'' often making mistakes, and they were unable to tell the difference between background noise and speech, says Dr. Baker.
By the mid-1980s, however, the accuracy of these small-vocabulary systems had improved. Companies started using them for inventory and quality control.
``Three years ago, Xerox Corporation was able to conduct a cost-effective, 100 percent audit of 2.2 million parts in two months [using such a system],'' Baker says.
Small-vocabulary systems are speaker-dependent. They must be ``trained'' to recognize the user's voice in a 10-minute session, during which the computer flashes words on the screen and the user repeats them. Both Dragon and Kurzweil Applied Intelligence, another Boston-area firm, have recently developed large-vocabulary, speaker-independent systems that do not require training for each new user. Dragon sells a system that Baker says can recognize 30,000 spoken words. In March, Kurzweil plans to introduce a system for medical dictation that will recognize up to 10,000 words, says Vladimir Sejnoha, a research engineer with the company.
Although the specific recognition techniques employed by Dragon and Kurzweil are different, basic speech recognition involves converting sound picked up by a microphone into a series of acoustic frames or segments, each 1/100th of a second long. Each frame is analyzed and a set of mathematical constants representing tone and change in volume is extracted. The constants are in turn translated into phones, the smallest distinctive elements of spoken language. Silences between phones are used to signify breaks between the words. The phones are matched against a phonetic dictionary and then changed into standard English spelling.
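As a rough illustration of the basic steps just described, here is a minimal Python sketch. It is not the method used by Dragon or Kurzweil. The function names, the choice of zero crossings as a stand-in for tone, the phone labels, and the two-word dictionary are all assumptions made for this example. The step that turns the constants into phones is not described in enough detail to sketch, so the phone sequence below is written by hand.

```python
import math

FRAMES_PER_SECOND = 100  # each frame is 1/100th of a second long


def split_into_frames(samples, sample_rate):
    """Cut a list of audio samples into consecutive 1/100-second frames."""
    frame_len = sample_rate // FRAMES_PER_SECOND
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_features(frame, prev_energy=0.0):
    """Extract a few constants from one frame (illustrative choices only).

    zero_crossings is a crude stand-in for tone; energy_change stands in
    for change in volume relative to the previous frame.
    """
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return {"zero_crossings": crossings,
            "energy": energy,
            "energy_change": energy - prev_energy}


def words_from_phones(phones, dictionary):
    """Split a phone sequence at silences ("sil") and look up each group."""
    words, current = [], []
    for phone in phones + ["sil"]:  # trailing silence flushes the last word
        if phone == "sil":
            if current:
                words.append(dictionary.get(tuple(current), "<unknown>"))
                current = []
        else:
            current.append(phone)
    return words


# Hypothetical phonetic dictionary: phone sequences mapped to spellings.
PHONETIC_DICTIONARY = {
    ("k", "ae", "t"): "cat",
    ("d", "ao", "g"): "dog",
}

# A synthetic 440 Hz tone, 0.05 seconds at 8,000 samples per second.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(400)]
frames = split_into_frames(tone, SAMPLE_RATE)
features = frame_features(frames[0])

# Hand-written phone sequence with a pause between the two words.
spoken = ["k", "ae", "t", "sil", "d", "ao", "g"]
print(words_from_phones(spoken, PHONETIC_DICTIONARY))  # ['cat', 'dog']
```

The silence marker is what forces the speaker to pause between words in a scheme like this one: without it, the lookup has no way to tell where one word ends and the next begins.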
The actual systems are much more complicated, Baker stresses. ``Just doing a phone identification, and then doing a look-up on that, does not work.... You need to make use of many kinds of information simultaneously.'' For example, the software considers the context of the spoken word in the sentence to determine the probability of a match against words in the phonetic dictionary. Such techniques also help the system decide between homonyms like ``through'' and ``threw.''
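One simple way to picture the use of context is to score each candidate word by how often it follows the previous word. The Python sketch below does this with word-pair counts. It is an assumption-laden illustration, not a description of either company's software: the counts are invented, the phone key "TH R UW" is a hypothetical label, and real systems combine many more sources of information.

```python
# Invented counts of how often the second word follows the first.
PAIR_COUNTS = {
    ("walked", "through"): 40,
    ("walked", "threw"): 1,
    ("she", "threw"): 30,
    ("she", "through"): 2,
}

# Hypothetical phonetic dictionary entry: one sound, two spellings.
SOUND_ALIKES = {"TH R UW": ["through", "threw"]}


def pick_word(previous_word, phone_key):
    """Return the likelier spelling and its estimated probability."""
    candidates = SOUND_ALIKES[phone_key]

    def score(word):
        # Add one so that a word pair never seen before is not ruled out.
        return PAIR_COUNTS.get((previous_word, word), 0) + 1

    total = sum(score(word) for word in candidates)
    best = max(candidates, key=score)
    return best, score(best) / total


print(pick_word("walked", "TH R UW")[0])  # through
print(pick_word("she", "TH R UW")[0])     # threw
```

With no useful context, as when the previous word appears in none of the counts, both spellings score the same and the sketch simply returns the first one listed, which shows why sound alone cannot settle the choice.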