Google fights spam with artificial intelligence

Gmail announced that it had prevented 99.9 percent of spam from reaching inboxes, but will artificial intelligence help bridge that final tenth of a percent?

[Photo caption: Google DeepMind. Google's new AI program is capable of learning from experience, much like a human brain.]

The robot wars will be won by spambots, unless Google engineers have anything to say about it.

The company announced in its Gmail blog on Thursday that it has been using Google’s artificial neural network to help with e-mail spam filtering. Already, the company says that it’s been able to block 99.9 percent of spam from reaching inboxes, while incorrectly classifying legitimate e-mail as spam only 0.05 percent of the time.
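
The two figures Google cites are distinct error rates: one measures how much spam is caught, the other how often legitimate mail is wrongly flagged. As an illustration only (the mailbox sizes below are hypothetical, not Google's data), the arithmetic looks like this:

```python
# Illustrative only: Gmail's reported rates read as two separate
# error metrics on a hypothetical mailbox.

def filter_metrics(spam_total, spam_blocked, ham_total, ham_flagged):
    """Return (spam catch rate, false positive rate) as percentages."""
    catch_rate = 100.0 * spam_blocked / spam_total
    false_positive_rate = 100.0 * ham_flagged / ham_total
    return catch_rate, false_positive_rate

# Hypothetical mailbox: 10,000 spam messages, 20,000 legitimate ones.
catch, fp = filter_metrics(10_000, 9_990, 20_000, 10)
print(f"{catch:.1f}% of spam blocked")      # 99.9% of spam blocked
print(f"{fp:.2f}% of ham flagged as spam")  # 0.05% of ham flagged as spam
```

Note that the 0.05 percent figure is measured against legitimate mail, not against spam, which is why the two numbers don't sum to anything meaningful on their own.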

And it’s all thanks to data collection.

For the most part, Google’s system is based on Gmail’s “report spam” and “not spam” buttons. By taking this user input and cross-referencing it with other users’ actions, the Internet giant can learn what counts as spam and what doesn’t. For e-mails sent with malicious intent, the system can learn to recognize them, parse their contents, and redirect them away from the inbox.
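
The feedback loop described above can be sketched with a simple word-frequency (Naive Bayes) model. This is a minimal illustration, not Gmail's actual system; the class and message examples are invented:

```python
# Sketch: learning from user "report spam" / "not spam" clicks
# with a Naive Bayes word model (hypothetical, not Gmail's system).
from collections import Counter
import math

class FeedbackFilter:
    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()
        self.spam_msgs = 0
        self.ham_msgs = 0

    def report(self, message, is_spam):
        """Called when a user presses 'report spam' or 'not spam'."""
        words = message.lower().split()
        if is_spam:
            self.spam_words.update(words)
            self.spam_msgs += 1
        else:
            self.ham_words.update(words)
            self.ham_msgs += 1

    def is_spam(self, message):
        """Naive Bayes log-odds score with add-one smoothing."""
        score = math.log((self.spam_msgs + 1) / (self.ham_msgs + 1))
        spam_total = sum(self.spam_words.values())
        ham_total = sum(self.ham_words.values())
        for w in message.lower().split():
            p_spam = (self.spam_words[w] + 1) / (spam_total + 2)
            p_ham = (self.ham_words[w] + 1) / (ham_total + 2)
            score += math.log(p_spam / p_ham)
        return score > 0

f = FeedbackFilter()
f.report("win free money now", True)
f.report("meeting notes attached", False)
print(f.is_spam("free money offer"))   # True
```

Every user click adds evidence, which is why the article's point about data collection matters: the filter's accuracy grows directly out of the volume of labeled reports.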

But spam can still slip past blockers in a variety of ways, the company says. Often, spam succeeds by using previously unaccounted-for domains (new ones such as .xyz or .horse can get past filters) or by mimicking desired e-mails (or “ham”). Despite new filters, spammers keep finding ways to circumvent them.

Though spam has not been completely eradicated, as some computer scientists once predicted it would be, Internet companies have at least been able to limit its pervasiveness.

The remaining problem lies not in detecting which e-mails are junk, but in avoiding mislabeling legitimate ones. “Blacklisting is an efficient anti-spam mechanism, but is becoming more and more prone to false positives,” reads a paper from MIT’s Spam Conference 2010, which brought experts together to discuss the future of spam detection. Oftentimes, the “coarse granularity” of blacklists sweeps non-malicious addresses into the junk bin, the report says.

And even with whitelists, or lists of approved online addresses, the report asserts that services are merely applying heuristics to curb spam rather than taking a more rigorous computational approach.
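
The "coarse granularity" problem is easy to see in miniature: a blacklist that blocks an entire domain cannot distinguish a spammer from a legitimate sender who happens to share that domain. The domain and addresses below are hypothetical:

```python
# Sketch of why domain-level ("coarse granularity") blacklisting
# sweeps legitimate senders into the junk bin. Names are hypothetical.
BLACKLISTED_DOMAINS = {"bulkmailer.example"}   # the whole domain is blocked

def blocked(sender):
    domain = sender.rsplit("@", 1)[-1]
    return domain in BLACKLISTED_DOMAINS

# A spammer and an innocent sender can share the same domain:
print(blocked("spammer123@bulkmailer.example"))   # True  -- intended
print(blocked("legit.shop@bulkmailer.example"))   # True  -- false positive
print(blocked("friend@example.org"))              # False
```

Finer-grained rules (per-address, per-message-content) reduce these false positives, but each refinement is another heuristic, which is the conference paper's underlying complaint.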

So Google is using its “neural network” – a series of learning supercomputers designed to “think” and identify imagery – to detect spam and help close that remaining tenth of a percent of error.

This type of artificial intelligence grew out of a branch of machine learning known as “deep learning.” These neural networks attempt to mimic higher-level thought and abstraction, and many see them as one of the roots of artificial intelligence development.

Google thinks this can stop junk. Instead of utilizing white- or blacklists to identify spam or ham e-mails, its neural network can use natural-language processing and information from other users to draw conclusions about the messages being analyzed.
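The contrast with list-based filtering can be sketched with a toy learned classifier: instead of checking a message against a fixed list, a network learns numeric weights for word features from labeled examples. This is a deliberately tiny illustration (a one-layer model, nowhere near Google's actual architecture), with an invented vocabulary and invented training data:

```python
# Toy illustration (far simpler than Google's system): a one-layer
# network over bag-of-words features, trained by gradient descent
# on labeled examples instead of consulting fixed white/blacklists.
import math

VOCAB = ["free", "winner", "meeting", "report", "prize", "agenda"]

def features(message):
    words = message.lower().split()
    return [1.0 if v in words else 0.0 for v in VOCAB]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=200, lr=0.5):
    """Stochastic gradient descent on the logistic (log) loss."""
    w = [0.0] * len(VOCAB)
    b = 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = features(text)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - label                 # gradient of the log loss
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

labeled = [
    ("free prize winner", 1),
    ("claim your free prize", 1),
    ("meeting agenda attached", 0),
    ("quarterly report and agenda", 0),
]
w, b = train(labeled)
spam_prob = sigmoid(sum(wi * xi for wi, xi in zip(w, features("free prize"))) + b)
print(spam_prob > 0.5)   # True
```

Deep learning stacks many such layers so the model can learn combinations of features rather than single keywords; the single layer here is only the smallest building block of that idea.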

But neural networks have their own problems, says Anselm Blumer, associate professor of computer science at Tufts University. To Dr. Blumer, these artificial “neural networks” approach learning from a perspective that is wholly different from how people actually think.

Neural networks apply limited layers of computation to draw conclusions and learn, which is different from the distributed, varied, and compounded approach that brains take, says Blumer, whose research is in machine learning, artificial intelligence (AI), and human-computer interaction.

“In a sense, neural networks is a bad name,” Blumer says. The computation simplifies the process, but deep-learning researchers hope their work can pave the way for improved AI. With more layers of abstraction, computational processes may better mimic human thought in understanding and forming concepts. Google’s AI recently demonstrated this type of learning by identifying images of cats (even though it had never been told what a cat was). With more layers, computers may someday learn at a pace closer to that of humans.

Even so, decision making using neural networks can lead to what Blumer calls “overfitting.”

“A network like that is harder to train, and it’s much easier for it to come to false conclusions,” he says.
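
Overfitting, in the sense Blumer describes, means a model that fits its training data too closely and draws false conclusions from anything new. A deliberately extreme toy version of the contrast, with invented messages:

```python
# Toy illustration of overfitting: a "model" that memorizes its
# training messages scores perfectly on them but fails on new mail,
# while a simpler rule generalizes. Data is invented for illustration.
train_set = [("free money now", 1), ("team lunch friday", 0)]
test_set = [("free money offer", 1), ("lunch on friday", 0)]

def memorizer(message):
    # Overfit: exact-match lookup against the training data.
    for text, label in train_set:
        if message == text:
            return label
    return 0   # anything unseen defaults to "not spam"

def simple_rule(message):
    # Cruder but more general: a single keyword cue.
    return 1 if "free" in message.split() else 0

def accuracy(model, data):
    return sum(model(t) == y for t, y in data) / len(data)

print(accuracy(memorizer, train_set))    # 1.0  (perfect on training data)
print(accuracy(memorizer, test_set))     # 0.5  (misses the new spam)
print(accuracy(simple_rule, test_set))   # 1.0
```

Large networks with many parameters can behave like the memorizer if trained carelessly, which is why Blumer notes they are harder to train well.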

Like the spam filters creating “false positives” of junk, artificial neural networks look for patterns in whatever they can find. If there isn’t anything there, they may run into the same problems that Gmail’s filter is currently facing.

Or it may decide that horses are full of dogs, like Google’s neural network did last month.

The problem, Blumer says, lies in the computing. It’s hard to program a computer that avoids false conclusions without also missing real ones.

Like a human bias, it’s easy for this artificial intelligence to prefer simpler conclusions. The difficulty lies in understanding and weighing the nuances of natural language – a problem that has plagued spam researchers for years.

But as the computing power for artificial intelligence improves, so will the spambots.

For now, Google is focusing on improving its neural network to fit the needs of its users, and it will continue to refine its data using the “spam” versus “not spam” buttons in Gmail. And like the human brain, the more it learns, the more accurate its actions will be.
