Google fights spam with artificial intelligence

Gmail announced that it had prevented 99.9 percent of spam from reaching inboxes, but will artificial intelligence help bridge that final tenth of a percent?

July 13, 2015

The robot wars will be won by spambots, unless Google engineers have anything to say about it.

The company announced on its Gmail blog on Thursday that it has been using Google’s artificial neural network to help with e-mail spam filtering. Already, the company says that it’s been able to block 99.9 percent of spam from reaching inboxes, while incorrectly classifying legitimate e-mail as spam only 0.05 percent of the time.
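To put those two rates in concrete terms, here is a small worked calculation. The rates are the ones Google quotes; the message counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Rates quoted by Google; the message counts below are hypothetical.
SPAM_BLOCK_RATE = 0.999        # 99.9 percent of spam is blocked
FALSE_POSITIVE_RATE = 0.0005   # 0.05 percent of legitimate mail is misfiled


def expected_errors(spam_count, legit_count):
    """Return (spam that reaches the inbox, legitimate mail sent to spam)."""
    missed_spam = spam_count * (1 - SPAM_BLOCK_RATE)
    misfiled_legit = legit_count * FALSE_POSITIVE_RATE
    return missed_spam, misfiled_legit


missed, misfiled = expected_errors(spam_count=10_000, legit_count=10_000)
print(f"{missed:.0f} spam messages get through; "
      f"{misfiled:.0f} legitimate messages are misfiled")
```

At those rates, roughly 10 of every 10,000 spam messages would still reach an inbox, and roughly 5 of every 10,000 legitimate messages would land in the spam folder.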

And it’s all thanks to data collection.

For the most part, Google’s system is based on Gmail’s “report spam” and “not spam” buttons. By taking this user input and referencing other user actions, the Internet giant can learn what counts as spam and what doesn’t. For e-mails that were sent with malicious intent, the server can learn to recognize them, parse them, and redirect them away from the inbox.
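The general idea of learning from user reports can be sketched with a toy word-counting classifier. This is a minimal naive Bayes illustration, not Gmail's actual system: the class name `ReportTrainedFilter`, its methods, and the sample messages are all hypothetical.

```python
import math
from collections import Counter


class ReportTrainedFilter:
    """Toy naive Bayes filter trained on user reports (not Gmail's system)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def report(self, text, label):
        """Record one user click; label is "spam" or "ham" ("not spam")."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def _log_score(self, words, label):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_reports = sum(self.message_counts.values())
        # Prior: share of reports carrying this label, with add-one smoothing.
        score = math.log((self.message_counts[label] + 1) / (total_reports + 2))
        for word in words:
            # Add-one smoothing, with one extra slot for never-seen words.
            score += math.log((counts[word] + 1) / (total_words + vocab + 1))
        return score

    def is_spam(self, text):
        words = text.lower().split()
        return self._log_score(words, "spam") > self._log_score(words, "ham")


spam_filter = ReportTrainedFilter()
spam_filter.report("win a free prize now", "spam")
spam_filter.report("free money claim your prize", "spam")
spam_filter.report("meeting agenda for monday", "ham")
spam_filter.report("lunch on monday with the team", "ham")

print(spam_filter.is_spam("claim your free prize"))
print(spam_filter.is_spam("agenda for the team meeting"))
```

Each click on a button adds one labeled example, so the filter's word statistics shift as more users weigh in. A production system would draw on far more signals than word counts.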

But spam can still make it past blockers in a variety of ways, the company says. Often, spam succeeds by using previously unaccounted-for domains (new ones such as .xyz or .horse can get past filters) or by mimicking desired e-mails (or “ham”). Despite new filters, spammers find ways to circumvent them.

Though we may not have completely eradicated spam as computer scientists had thought we would, Internet companies have been able to at least limit its pervasiveness.

The remaining problem lies not just in detecting which e-mails are junk, but in keeping legitimate ones out of the junk bin. “Blacklisting is an efficient anti-spam mechanism, but is becoming more and more prone to false positives,” reads a paper from MIT’s Spam Conference 2010, which brought experts together to discuss the future of spam detection. Often, the “coarse granularity” of blacklists sweeps non-malicious addresses into the junk bin, the report says.

And even with whitelists, or lists of approved online addresses, the report asserts that services are just using heuristics to curb spam rather than taking a more rigorous computational approach.
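The “coarse granularity” problem can be sketched in a few lines. The example below is an assumption-laden illustration rather than anything taken from the report: the addresses, the domain, and both function names are hypothetical.

```python
# Hypothetical addresses and domains, for illustration only.
BLOCKED_DOMAINS = {"bulkmail.example"}           # coarse: a whole domain
BLOCKED_ADDRESSES = {"promo@bulkmail.example"}   # fine: one known sender


def blocked_by_domain(sender):
    """Coarse blacklist: block anyone whose address ends in a listed domain."""
    return sender.rsplit("@", 1)[-1].lower() in BLOCKED_DOMAINS


def blocked_by_address(sender):
    """Fine-grained blacklist: block only the exact listed address."""
    return sender.lower() in BLOCKED_ADDRESSES


spammer = "promo@bulkmail.example"
bystander = "alice@bulkmail.example"  # legitimate user on the same domain

print(blocked_by_domain(spammer), blocked_by_domain(bystander))
print(blocked_by_address(spammer), blocked_by_address(bystander))
```

The domain-level list catches the spammer but also blocks the bystander, which is a false positive. The address-level list spares the bystander, at the cost of missing any new address the spammer registers.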

So Google is using its “neural network” – a series of learning supercomputers designed to “think” and identify imagery – to detect spam and help close that remaining tenth of a percent of error.

This type of artificial intelligence grows out of a branch of machine learning known as “deep learning.” These neural networks attempt to mimic higher-level thought and abstraction, and many see them as one of the roots for the development of artificial intelligence.

Google thinks this can stop junk. Instead of utilizing white- or blacklists to identify spam or ham e-mails, its neural network can use natural-language processing and information from other users to draw conclusions about the messages being analyzed.

But neural networks have their own problems, says Anselm Blumer, associate professor of computer science at Tufts University. To Dr. Blumer, these artificial “neural networks” approach learning from a perspective that is wholly different from how people actually think.

Neural networks apply limited layers of computation to draw conclusions and learn, which is different from the distributed, varied, and compounded approach that brains take, says Blumer, whose research is in machine learning, artificial intelligence (AI), and human-computer interaction.

“In a sense, neural networks is a bad name,” Blumer says. The computation simplifies the process, but deep-learning researchers are hoping that their work can pave the way for improved AI. With increased layers of abstraction, computational processes may do better to mimic human thought in understanding and creating concepts. Google’s AI recently demonstrated this type of learning in identifying images of cats (even though it had never seen a cat before). With more layers, computers may soon be able to learn at the same pace as humans.

Even so, decision making using neural networks can lead to what Blumer calls “overfitting.”

“A network like that is harder to train, and it’s much easier for it to come to false conclusions,” he says.
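Overfitting is not unique to neural networks, and its most extreme form can be shown with a model that simply memorizes its training data. The sketch below is a deliberately simplified illustration; the messages, labels, and both "models" are hypothetical.

```python
# Hypothetical labeled messages (True means spam), for illustration only.
train = [
    ("win a free prize", True),
    ("free money waiting", True),
    ("meeting moved to noon", False),
    ("notes from the meeting", False),
]
unseen = [
    ("claim your free gift", True),
    ("agenda for the meeting", False),
]


def accuracy(predict, data):
    """Fraction of messages for which the prediction matches the label."""
    return sum(predict(text) == label for text, label in data) / len(data)


# Overfit "model": memorizes every training message word for word.
memorized = dict(train)


def memorizer(text):
    return memorized.get(text, False)  # unseen text defaults to "not spam"


# Simpler rule: a single feature shared by the spam examples.
def keyword_rule(text):
    return "free" in text.split()


print(accuracy(memorizer, train), accuracy(memorizer, unseen))
print(accuracy(keyword_rule, train), accuracy(keyword_rule, unseen))
```

The memorizer is perfect on the messages it has seen and gets only half of the new ones right, while the simpler rule holds up on both sets in this small example. A model with more capacity has more room to memorize quirks of its training data in this way.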

Like the spam filters creating “false positives” of junk, artificial neural networks are looking to create things out of what they can find. If there isn’t anything there, they may run into the same problems that Gmail’s filter is currently facing.

Or it may decide that horses are full of dogs, like Google’s neural network did last month.

The problem, Blumer says, lies in the computing. It’s hard to program a computer that avoids false conclusions without also missing real ones.

Like a human bias, it’s easy for this artificial intelligence to prefer simpler conclusions. The difficulty lies in understanding and weighing the nuances of language – a problem that has plagued spam researchers for years.

But as the computing power for artificial intelligence improves, so will the spambots.

For right now, Google is focusing on improving its neural network to properly fit the needs of its users, and it will continue to refine its data using the “report spam” and “not spam” buttons in Gmail. And like the human brain, the more it learns, the more accurate its actions will be.