Hearing voices

Evolving science of vocal tones catches up to what baby knows

By Mary Wiltenburg, Staff writer of The Christian Science Monitor / February 13, 2003

They can say they want to live. But for Kim Cates, founder of a Boston-based suicide hotline, the question of whether her callers are a danger to themselves really comes down to how they say it.

"Because stuff comes out that they don't even know themselves," she says. When they're too upset to be lucid, their tone "gives them away."

Though the stakes are rarely this high, we all make such judgments about strangers based on their voices. Every conversation we have carries a subtext that would be invisible to someone reading its script: the uptilt to a question, the long sneer of sarcasm, or the quaver of uncertainty. Only 7 percent of the meaning of what people say comes across in the words they choose, says psychologist Albert Mehrabian, who has spent the past four decades researching communication. More than five times as important is what their voices convey.

The study of these vocal cues has lately come to the fore because of a growing commercial demand for speech-recognition software. Used for everything from taking basic dictation to unlocking sophisticated security systems, these programs rely on an understanding of what most of the linguists, engineers, computer scientists, and speech therapists who study it call "prosody": the intonation, stress, and rhythm that make up the music of a voice.

Even now, some computer systems well attuned to prosodic cues can speak and "listen" in an eerily "natural" way. Last year, Amtrak replaced a touch-tone phone system, which had driven callers crazy, with "Julie," a software package whose designers and users talk about as though "she" were human. "She's been very popular," says Amtrak spokeswoman Karina VanVeen. "Some people don't realize she's a computer 'til halfway into the call."

But even before computer research forced the issue, the rest of us were subliminally studying prosody. Most kids begin to learn this language - more vital than the vocabulary it underlies - as they're learning to talk, by mimicking the adults around them. From there, it runs its mostly unconscious course right into adulthood, every year adding increasingly complex layers of meaning to conversations with family, friends, and co-workers - and worlds of conjecture to phone conversations with strangers.

"We're always listening with a third ear," says professor of speech and hearing sciences Moya Andrews, "for anger, for the subtext of a conversation, for a sympathetic voice."

Before she went blind a quarter-century ago, Cheryl Linnear painted portraits. Even today, she says, when she meets a new person and hears their voice for the first time, she pictures them: "Do they have those lines on their forehead that mean they frown a lot, or those worry lines around their mouth? You learn a lot by looking at people."

But over the years, Ms. Linnear says, she's come to regard those visual cues as secondary to her understanding of people - and not just because she can't see them. If you're really listening, she explains, "it's like the voice gets inside the soul - it leads you there a lot more than if you just looked at someone."

Speech therapists insist that visual cues are for most people a major component of communication. (In his book "Silent Messages," Dr. Mehrabian finds that gesture and facial expression account for 55 percent of the meaning of speech, prosodic cues for 38.) But all agree that a sensitivity to vocal cues is critical to emotional understanding.

"Think, for example, about how many different ways you can say 'I love you,' " says Dr. Andrews. "You can say it scornfully, you can say it playfully. So many different ways, and it all depends on vocal behaviors that are not included in the text."

Don't use that tone with me

When the first computer speech-simulation programs came out in the 1950s and '60s, they lacked even a nod to prosodic subtleties. Every syllable had the same length, emphasis, and tone: The result was "that flat robotic monotone from early sci-fi movies," remembers Robert Ladd, professor of linguistics at the University of Edinburgh in Scotland.