Baby Steps for Speech Recognition
MURRAY HILL, N.J. — HELLO and welcome to our visitor from The Christian Science Monitor.
Hal, is that you?
I am the system for real-time translation of unrestricted text into speech, developed in the Information Principles Laboratory at AT&T Bell Labs.... My inventors think my word pronunciation and intelligibility are the best around.
OK, maybe you're not Hal, the famous talking computer of "2001: A Space Odyssey." But you're a distant cousin. You tell me a children's nursery rhyme in Mandarin Chinese. You pronounce foreign names. You even get my name right (at least the American pronunciation), even though you've never seen it before.
Of course, that's the easy part. You'll say whatever Richard Sproat of Bell Labs types onto your screen. But there's more. You also understand what people say.
This is the AT&T Bell Laboratory's Flight Information System. How may I help you?
I would like to make a reservation, please.
Please specify your travel plans.
I want to go from New York to Chicago on Saturday morning.
United Airlines Flight 3 leaves New York-La Guardia 10 a.m., arrives in Chicago-O'Hare at 11:18 a.m.
How much is the fare?
The round-trip coach fare is $479.
I will take the flight.
Your reservation on United Airlines Flight 3 to Chicago-O'Hare on Saturday is confirmed. Thank you.
Bell Labs is one of the leading research groups in natural-language computers. But the leading edge of this technology is less advanced than it sounds. For decades, people have predicted the development of speech-proficient computers. The current systems are still crude. Bell Labs' airline-reservation system, for example, knows only 132 words. It is speaker-dependent, which means it can understand Bell Labs' David Roe but might not understand a foreign accent. SRI International, the research group in Menlo Park, Calif., is probably two years away from a commercial product that could understand Japanese people saying English words. But its vocabulary so far is under 500 words.
Even these systems are a great technological leap beyond the speech-recognition systems now on the market. IBM, for example, has just introduced Voice Type, which has a vocabulary of 5,000 words plus the capacity to learn 2,000 more. It is speaker-dependent and, unlike the Bell Labs and SRI systems, it doesn't understand continuous speech. You / have / to / talk / like / this to make sure it gets it right, leaving a quarter-second pause between words.
That's fine for people with a real incentive to talk to their computers. IBM is aiming Voice Type at disabled people, giving individuals who can't type new opportunities to earn a living as data-entry clerks and even programmers. Dragon Systems Inc. in Newton, Mass., offers a similar 30,000-word system, used by disabled people and by doctors and lawyers who write technical documents with repetitive paragraphs. They can tell the machine "legal paragraph one," and it will type out a preprogrammed set of sentences. They can say "Save," and the computer will store their document.
Getting to a 30,000-word system that understands continuous speech will take massive computing power: a machine at least 10 times more powerful than today's top-of-the-line desktop.
Most people don't realize how complex spoken communication is because they do it so naturally. They pick out words from a continuous string of sound. They pick up innumerable clues from the way things are said. No computer can do that today.
Scientists expect to make great strides in speech recognition in the next 10 years. Computers that understand speech will be much easier to use. But catching up with the Space Odyssey by 2001? Sorry, Hal. You're probably a pipe dream.