Most people have occasionally experienced an awkward hug/handshake combination, where one person goes in for a hug while the other extends a hand to shake. More often, though, humans can anticipate how to meet another person's greeting, thanks to years of experience with human interaction.
But can a machine develop the same kind of intuition?
Researchers at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (CSAIL) in Cambridge set out to train a computer to predict how people will greet each other. And their algorithm can do just that.
The system studied 600 hours of raw footage from YouTube videos and television shows like "The Office" and "Desperate Housewives." Then, when shown previously unseen footage, the algorithm accurately predicted how people would greet each other more than 43 percent of the time when the video was one second away from the greeting.
By contrast, human subjects correctly predicted the greeting 71 percent of the time.
"Humans automatically learn to anticipate actions through experience, which is what made us interested in trying to imbue computers with the same sort of common sense," Carl Vondrick, a CSAIL PhD student who is the first author on a paper to be presented this week at the International Conference on Computer Vision and Pattern Recognition, said in a press release. "We wanted to show that just by watching large amounts of video, computers can gain enough knowledge to consistently make predictions about their surroundings."
"There’s a lot of subtlety to understanding and forecasting human interactions," he said. "We hope to be able to work off of this example to be able to soon predict even more complex tasks."
The algorithm's accuracy still has to improve before it has any practical uses, Mr. Vondrick said. But with better accuracy, this technology could help robots interact with humans, improve predictive emergency response systems, or provide real-time social advice via Google Glass-like technologies.
For now, though, we might just have to settle for spoilers seconds ahead of moments in our favorite television shows.
"I’m excited to see how much better the algorithms get if we can feed them a lifetime’s worth of videos," Mr. Vondrick said. "We might see some significant improvements that would get us closer to using predictive-vision in real-world situations."