For many people who post videos on YouTube, an enticing thumbnail can make or break whether viewers decide to click on a video or scroll to the next one. But what if the video sharing site could pick the best image for each video automatically?
That was the question researchers from Google, which owns YouTube, recently set out to answer. They fed thousands of high-quality images into a computer to train it for what photographers would likely argue is a highly subjective task: selecting the best image from each video on the site.
The company unveiled its automatic “thumbnailer” in a blog post last Thursday. The tool analyzes a user’s video at one frame per second and gives each sampled frame a quality score. The software then selects the frames with the highest scores and displays them as thumbnails.
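For readers curious about the mechanics, the sample-score-select loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sample_frame_indices` and `pick_thumbnails` are hypothetical, frames are stand-in objects rather than decoded video, and `score` is a placeholder for Google's quality model, which is a trained neural network whose details the blog post does not spell out.

```python
from typing import Callable, List, Sequence, Tuple


def sample_frame_indices(num_frames: int, fps: float, rate_hz: float = 1.0) -> List[int]:
    """Return indices of frames sampled at roughly `rate_hz` frames per second.

    With the default rate_hz=1.0 this keeps about one frame per second of video.
    """
    step = max(1, round(fps / rate_hz))  # frames to skip between samples
    return list(range(0, num_frames, step))


def pick_thumbnails(
    frames: Sequence[object],
    fps: float,
    score: Callable[[object], float],  # hypothetical stand-in for the quality model
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Score each sampled frame and return the k best as (index, score) pairs."""
    scored = [(i, score(frames[i])) for i in sample_frame_indices(len(frames), fps)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:k]


if __name__ == "__main__":
    # Toy "video": 3 seconds at 30 fps, where each frame is just a number
    # and the placeholder score is the number itself.
    toy_frames = [float(i) for i in range(90)]
    print(pick_thumbnails(toy_frames, fps=30.0, score=lambda f: f, k=2))
```

Sampling once per second rather than scoring every frame keeps the cost manageable for long videos, at the price of possibly skipping a good frame that falls between samples.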
The technology builds on the deep neural networks, running on large clusters of computers, that Google has been training to “think” and recognize images, such as identifying cats in YouTube videos even though the computer had no prior information about what a cat looked like.
Neural networks are one part of advances in so-called deep learning, a subset of machine learning that is designed to mimic higher-level thought and abstraction and may be one path toward developing true artificial intelligence, the Monitor reported in July.
But judging whether a photo is of high quality poses an additional challenge, the researchers say.
“Unlike the task of identifying if a video contains your favorite animal, judging the visual quality of a video frame can be very subjective - people often have very different opinions and preferences when selecting frames as video thumbnails,” wrote Weilong Yang from Google’s Video Content Analysis team and Min-hsuan Tsai from the YouTube Creator team.
Part of the issue is that neural networks function differently from the human brain, some researchers say. Like humans, neural networks are designed to learn “by example,” wrote Imperial College London researchers Christos Stergiou and Dimitrios Siganos in a 2011 guide.
But to do this, they apply a limited number of computational layers to draw conclusions and perform specific tasks, such as pattern recognition. That approach differs from the “distributed, varied and compounded approach” used by human brains, Tufts University computer science professor Anselm Blumer told the Monitor in July.
This can lead to a problem called “overfitting,” in which “the network has memorized the training examples, but it has not learned to generalize to new situations,” a guide from the software developer MathWorks explains.
For Google’s network, this has led to some downright bizarre results when researchers fed images and video into the system, prompting the computer to create new, artistic images of its own that often scarcely resembled the originals. In July, researchers unveiled a series of hallucinatory images created by the software, such as horses sprouting dogs’ heads and brightly colored, glowing temples.
“A network like that is harder to train, and it’s much easier for it to come to false conclusions,” Professor Blumer told the Monitor.
The goal for machine learning researchers focused on neural networks is to introduce additional layers of abstraction into the process, better mimicking human brains in understanding concepts – like photographic composition or what may define an e-mail as spam – and allowing the computer to “learn” how to apply them.
In the case of YouTube’s thumbnail software, this training process appears to be succeeding.
To ensure the computer could distinguish high-quality images from low-quality ones, the Google researchers fed it custom thumbnails created by YouTube users, which tended to be well framed and in focus. They labeled these as high quality and contrasted them with “low quality” images selected randomly from a sampling of videos.
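The labeling scheme described above, with user-made thumbnails as positive examples and randomly chosen frames as negatives, can be illustrated with a short Python sketch. The function name `build_training_set`, the one-negative-per-video default and the fixed random seed are all hypothetical choices for illustration; the researchers' actual data pipeline and sampling ratios are not described in the article.

```python
import random
from typing import List, Sequence, Tuple


def build_training_set(
    custom_thumbnails: Sequence[object],
    videos: Sequence[Sequence[object]],
    negatives_per_video: int = 1,  # hypothetical ratio, not from the source
    seed: int = 0,
) -> List[Tuple[object, int]]:
    """Pair images with labels: 1 = 'high quality', 0 = 'low quality'.

    Positives are user-uploaded custom thumbnails; negatives are frames
    picked at random from each video.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    examples: List[Tuple[object, int]] = [(img, 1) for img in custom_thumbnails]
    for frames in videos:
        if not frames:
            continue  # skip videos with no frames to sample from
        for _ in range(negatives_per_video):
            examples.append((rng.choice(frames), 0))
    rng.shuffle(examples)  # mix positives and negatives before training
    return examples


if __name__ == "__main__":
    data = build_training_set(["thumb_a", "thumb_b"], [["f1", "f2", "f3"], ["g1"]])
    print(data)
```

A classifier trained on pairs like these learns to separate the two groups, so its predicted probability of the "high quality" label can then serve as the per-frame score.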
This allowed the computer to “learn” about nuances of framing and composition, and to favor images that emphasized a central character in the video, such as a music video performer or a family pet, two examples the researchers showed in their blog post.
They also put the new images to a more subjective test, showing them to human subjects side by side with images chosen by YouTube’s previous thumbnail software. Viewers preferred the images selected by the neural network more than 65 percent of the time, the researchers wrote.