First Look

Can scientists create an AI that sees, hears, and...understands?

IBM and MIT will collaborate on the Laboratory for Brain-inspired Multimedia Machine Comprehension – a bid to help AI interpret what it sees and hears, the same way human beings do.

Mary Knox Merrill/ The Christian Science Monitor

The Ray and Maria Stata Center for Computer, Information and Intelligence Sciences at MIT was designed by Frank O. Gehry. The Stata Center houses the Computer Science and Artificial Intelligence Laboratory (CSAIL), the Laboratory for Information and Decision Systems (LIDS) and the Department of Linguistics and Philosophy.

By Ben Rosen, Staff

September 20, 2016

It’s easy to take for granted the complex mental tasks human beings are constantly performing.

When we watch a baseball game, we can easily distinguish the pitcher from the mound he stands on, describe how he winds up before he flings the ball towards the plate, and even predict whether the next pitch will be a hanging curveball or a 100-mph fastball.

No machine can yet perform tasks like these, simple as they are for us.

IBM and the Massachusetts Institute of Technology in Cambridge hope to change that. The two organizations announced a partnership Tuesday to develop machines that see, hear, and interpret the way humans do. The IBM-MIT Laboratory for Brain-inspired Multimedia Machine Comprehension, or BM3C for short, is a multi-year collaboration, IBM said in a news release.

The “brain-inspired” laboratory is just one of a number of partnerships IBM is pursuing to advance artificial intelligence (AI), as the scientific community makes headway in improving machines’ abilities to think as human beings do and, in some cases, even outperform them. In June, researchers announced they had developed a way for a machine to predict whether two humans will greet each other with a handshake or a hug. And last year, computer scientists created a machine that’s better at creating predictive algorithms than two-thirds of its human competitors. But teaching a machine to see and hear like humans has so far been out of reach.

The problem is that human command of sights and sounds spans multiple cognitive disciplines, explains TechCrunch’s Devin Coldewey:

Say your camera is good enough to track objects minutely – what good is it if you don’t know how to separate objects from their background? Say you can do that – what good is it if you can’t identify the objects? Then you need to establish relationships between them, intuit physical rules … all stuff our brains are especially good at.

Because the human mind has already mastered this skill, researchers plan to model machine vision on virtual neural networks based on the real thing.

They expect that a machine-vision system could have big applications for industries such as healthcare, education, and entertainment.

Other MIT researchers have already made advances in computer prediction and comprehension. Researchers at the Computer Science and Artificial Intelligence Laboratory (CSAIL) taught a machine to predict how humans would greet each other. After the system was shown 600 hours of raw footage from YouTube videos and television shows, it correctly guessed how people would greet each other 43 percent of the time, as Eva Botwin-Kowacki wrote for The Christian Science Monitor in June. In the same experiment, humans guessed right 71 percent of the time.

Two other researchers at MIT also created a Data Science Machine that can find patterns and select which data points are relevant, all without the problem-solving help of humans, wrote Kelsey Warner for the Monitor in October 2015.

“But the win-loss record was not the most impressive takeaway from the competitions,” writes Ms. Warner. “While teams of humans sweat their predictive algorithms for months leading up to competition, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.”

The scientific community hasn’t yet combined machine efficiency with human understanding. But IBM is building a network of university research collaborations with the aim of achieving this goal, according to the press release.
