Can scientists create an AI that sees, hears, and... understands?

IBM and MIT will collaborate on the Laboratory for Brain-inspired Multimedia Machine Comprehension – a bid to help AI interpret what it sees and hears, the same way human beings do.

Mary Knox Merrill/ The Christian Science Monitor
The Ray and Maria Stata Center for Computer, Information and Intelligence Sciences at MIT was designed by Frank O. Gehry. The Stata Center houses the Computer Science and Artificial Intelligence Laboratory (CSAIL), the Laboratory for Information and Decision Systems (LIDS) and the Department of Linguistics and Philosophy.

It’s easy to take for granted the complex mental tasks human beings are constantly performing.

When we watch a baseball game, we can easily distinguish the pitcher from the mound he stands on, describe how he winds up before he flings the ball towards the plate, and even predict if the next pitch will be a hanging curveball or 100-mph fastball.

No machine yet exists that can perform such tasks, simple as they are for us.

IBM and the Massachusetts Institute of Technology in Cambridge hope to change that. The two organizations announced a partnership Tuesday aimed at teaching machines to see, hear, and interpret the world as humans do. The IBM-MIT Laboratory for Brain-inspired Multimedia Machine Comprehension, BM3C for short, is a multi-year collaboration, IBM said in a news release.

The “brain-inspired” laboratory is just one of a number of partnerships IBM is pursuing to advance artificial intelligence (AI), as the scientific community makes headway in improving machines’ abilities to think as human beings do, and, in some cases, even to outperform humans. In June, researchers announced they had developed a way for a machine to predict whether two humans will greet each other with a handshake or a hug. And last year, computer scientists created a machine that’s better at creating predictive algorithms than two-thirds of its human competitors. But teaching a machine to see and hear like humans has so far been out of reach.

The problem is human command of sights and sounds spans multiple cognitive disciplines, explains TechCrunch’s Devin Coldewey:

Say your camera is good enough to track objects minutely – what good is it if you don’t know how to separate objects from their background? Say you can do that – what good is it if you can’t identify the objects? Then you need to establish relationships between them, intuit physical rules … all stuff our brains are especially good at. 

Because the human mind has already mastered this skill, researchers plan to model machine vision on virtual neural networks based on the real thing.
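To make the neural-network idea concrete, here is a minimal, illustrative sketch of the basic building block such systems stack by the millions: an artificial "neuron" that takes weighted inputs and fires through a nonlinearity. The weights and inputs below are arbitrary placeholders for illustration, not a trained model; real brain-inspired vision systems learn their weights from vast amounts of data.

```python
import math

def sigmoid(x):
    """Squash a value into (0, 1), loosely mimicking a neuron's firing rate."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of inputs, then a nonlinearity."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)

def layer(inputs, weight_rows, biases):
    """A layer is simply many neurons reading the same inputs in parallel."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Three toy "pixel" intensities feed two neurons; deep networks stack
# many such layers so later layers can detect objects, edges, and shapes.
pixels = [0.2, 0.9, 0.4]
weights = [[0.5, -0.3, 0.8], [-0.6, 0.1, 0.4]]
biases = [0.0, 0.1]
activations = layer(pixels, weights, biases)
```

Stacking layers like this, and training the weights on labeled examples, is what lets such networks move from raw pixels toward the object separation and identification Mr. Coldewey describes.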

They expect that a machine-vision system could have big applications for industries such as healthcare, education, and entertainment.

Other MIT researchers have already made advances in computer prediction and comprehension. Researchers at the Computer Science and Artificial Intelligence Laboratory (CSAIL) taught a machine to predict how humans would greet each other. After being shown 600 hours of raw footage from YouTube videos and television shows, the system correctly guessed how people would greet each other 43 percent of the time, as Eva Botkin-Kowacki wrote for The Christian Science Monitor in June. In the same experiment, humans guessed right 71 percent of the time.

Two other researchers at MIT also created a Data Science Machine that can find patterns and select which data points are relevant, all without the problem-solving help of humans, wrote Kelsey Warner for the Monitor in October 2015.

“But the win-loss record was not the most impressive takeaway from the competitions,” writes Ms. Warner. “While teams of humans sweat their predictive algorithms for months leading up to competition, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.”

The scientific community hasn’t yet combined machine efficiency with human understanding. But IBM is creating a network of university research collaboration with the aim of achieving this goal, said the press release. 


