How Google's neural network will improve YouTube

Researchers from Google unveiled software that analyzes videos using a neural network, a machine learning technology that some argue could lead to true artificial intelligence.

Dado Ruvic/ Reuters file
A picture illustration shows a YouTube logo reflected in a person's eye, in the central Bosnian town of Zenica in June 2014. Google researchers last week unveiled software that automatically picks a high-quality thumbnail for a YouTube video using a machine learning technology called a deep neural network.

For many people who post videos on YouTube, an enticing thumbnail can make or break whether viewers decide to click on a video or scroll to the next one. But what if the video sharing site could pick the best image for each video automatically?

That was the question researchers from Google, which owns YouTube, attempted to answer recently by feeding thousands of high-quality images into a computer in order to train it to do what photographers would likely argue is a highly subjective task: select the best-quality image from each video on the site.

The company unveiled its automatic “thumbnailer” in a blog post last Thursday, explaining that the tool analyzes a user’s video at one frame per second, giving each frame a score. The software then selects the thumbnails with the highest quality scores and displays them.
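The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's code: the function names, the `top_k` parameter, and the `score_frame` callback (a stand-in for the neural network's quality score) are all hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def sample_one_per_second(num_frames: int, fps: float) -> List[int]:
    """Return the index of the first frame in each whole second of video."""
    duration_seconds = int(num_frames / fps)
    return [int(round(second * fps)) for second in range(duration_seconds)]


def pick_thumbnails(
    frames: Sequence[Frame],
    score_frame: Callable[[Frame], float],  # hypothetical stand-in for the network
    fps: float,
    top_k: int = 3,
) -> List[int]:
    """Score one frame per second and return the indices of the best top_k."""
    indices = sample_one_per_second(len(frames), fps)
    scored = [(score_frame(frames[i]), i) for i in indices]
    # Highest score first; Python's sort is stable, so ties keep video order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored[:top_k]]
```

In a real system the scoring function would be the trained network; here any function that maps a frame to a number will do, which keeps the sampling and ranking logic separate from the model.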

The technology builds on Google’s deep neural network of supercomputers that the company has been training to “think” and recognize images – such as identifying videos of cats on YouTube, even though the computer had no previous information about what a cat looked like.

Neural networks are one part of advances in so-called deep learning, a subset of machine learning that is designed to mimic higher-level thought and abstraction and may be one path toward developing true artificial intelligence, the Monitor reported in July.

But identifying a high-quality photo is an additional challenge, the researchers say.

“Unlike the task of identifying if a video contains your favorite animal, judging the visual quality of a video frame can be very subjective - people often have very different opinions and preferences when selecting frames as video thumbnails,” wrote Weilong Yang from Google’s Video Content Analysis team and Min-hsuan Tsai from the YouTube Creator team.

Part of the issue is that neural networks function in a different manner than the human brain, some researchers say. Like humans, neural networks are designed to learn “by example,” wrote Imperial College London researchers Christos Stergiou and Dimitrios Siganos, in a 2011 guide.

But to do this, they apply limited layers of computation to draw conclusions and perform specific tasks – such as pattern recognition – which differ from the “distributed, varied and compounded approach” used by human brains, Tufts University computer science professor Anselm Blumer told the Monitor in July.

This leads to an issue called “overfitting,” where “the network has memorized the training examples, but it has not learned to generalize to new situations,” a guide from the software developer Mathworks explains.
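A toy example can make the idea of overfitting concrete. The sketch below is not a neural network: as an assumption for illustration, a nearest-neighbor lookup plays the part of a model that has memorized its training examples, and a fixed threshold at 0.5 plays the part of a simpler rule that generalizes. The data points are invented, and two training labels are deliberately wrong to simulate noise.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, int]

# Invented data: the true label is 1 when x > 0.5.
# Two training labels (at x = 0.3 and x = 0.7) are flipped to simulate noise.
TRAIN: List[Example] = [
    (0.1, 0), (0.2, 0), (0.3, 1), (0.4, 0),
    (0.6, 1), (0.7, 0), (0.8, 1), (0.9, 1),
]
# New examples with correct labels, never seen during "training".
TEST: List[Example] = [
    (0.05, 0), (0.28, 0), (0.33, 0), (0.45, 0),
    (0.55, 1), (0.68, 1), (0.72, 1), (0.95, 1),
]


def memorizer(x: float) -> int:
    """Copy the label of the closest training example, noise included."""
    _, label = min(TRAIN, key=lambda example: abs(example[0] - x))
    return label


def simple_rule(x: float) -> int:
    """A simpler model: one threshold, which ignores the noisy labels."""
    return 1 if x > 0.5 else 0


def accuracy(model: Callable[[float], int], data: List[Example]) -> float:
    """Fraction of examples on which the model's prediction matches the label."""
    return sum(model(x) == label for x, label in data) / len(data)
```

Comparing `accuracy(memorizer, TRAIN)` with `accuracy(memorizer, TEST)` shows the pattern the Mathworks guide describes: a perfect score on the examples the model has seen, and a much lower one on new cases, while the simpler rule does the reverse.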

For Google’s network, this has led to some downright bizarre results when researchers fed images and video into the system, leading the computer to create new, artistic images of its own that often scarcely resembled the originals. In July, researchers unveiled a series of hallucinatory images created by the software, such as horses sprouting dogs' heads and brightly colored glowing temples.

“A network like that is harder to train, and it’s much easier for it to come to false conclusions,” Professor Blumer told the Monitor.

The goal for machine learning researchers focused on neural networks is to introduce additional layers of abstraction into the process, better mimicking human brains in understanding concepts – like photographic composition or what may define an e-mail as spam – and allowing the computer to “learn” how to apply them.

In the case of YouTube’s thumbnail software, this training process appears to be succeeding.

To ensure the computer could distinguish high-quality images from low-quality ones, the Google researchers uploaded custom thumbnails created by YouTube users, which tended to be well framed and in focus. They designated these as high quality and contrasted them with “low quality” images selected randomly from a sampling of videos.
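The labeling scheme described above might look something like the following sketch. The function name, the `negatives_per_video` and `seed` parameters, and the use of 1 and 0 as labels are assumptions made for illustration; the researchers' actual pipeline is not described in that detail.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")


def build_training_set(
    custom_thumbnails: Sequence[Image],
    video_frames: Sequence[Sequence[Image]],
    negatives_per_video: int = 1,  # hypothetical setting
    seed: int = 0,
) -> List[Tuple[Image, int]]:
    """Label user-made thumbnails 1 and randomly chosen frames 0."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    examples = [(thumbnail, 1) for thumbnail in custom_thumbnails]
    for frames in video_frames:
        count = min(negatives_per_video, len(frames))
        # Random frames stand in for "low quality" images.
        examples.extend((frame, 0) for frame in rng.sample(list(frames), count))
    rng.shuffle(examples)
    return examples
```

One consequence of this design is that the "low quality" label is only approximate: a randomly chosen frame may happen to be a good one, so the network has to learn from the overall tendency rather than from any single example.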

This allowed the computer to “learn” about nuances of framing and composition and to favor images that emphasized a central character in the video, such as a music video performer or a family pet, two examples the researchers showed in their blog post.

They also put the new images to a more subjective test, showing them to human subjects side by side with images from YouTube’s previous thumbnail software. People who looked at the two sets of images preferred the images selected by the neural network more than 65 percent of the time, the researchers wrote.
