MIT's CodePhage helps computers automatically detect, devour their own bugs

Massachusetts Institute of Technology researchers presented a system to detect bugs in programs as they run and repair them by borrowing functionality from other applications.


July 10, 2015

Researchers from the Massachusetts Institute of Technology say they've developed a way to automatically find and fix one of the most common types of software bugs without even seeing a program's source code. Instead, their system detects glitches as programs run and repairs them by borrowing functionality from other applications.

"It’s actually surprising that it works," says Emery Berger, a University of Massachusetts at Amherst professor who served on the review committee at the mid-June Association for Computing Machinery conference where the repair system, dubbed CodePhage, debuted.

The bug, known as a buffer overflow, has been a nuisance for programmers and security professionals for decades. It has been at the heart of malware and vulnerabilities ranging from the famous Morris Worm in the 1980s to widely publicized exploits named Venom and Ghost this year. Depending on how and where an overflow is exploited, it could theoretically give an attacker complete control over a system.


The flaw happens when more data is directed to a block of memory than it can handle – and, therefore, it overflows. The extra data overwrites whatever is stored in the next block of memory. That could mean changing information used in a program or adding in commands to run malicious programs. 
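To make the mechanism concrete, here is a minimal sketch in Python. It is a toy model, not real memory: a single byte array stands in for two adjacent blocks, an 8-byte input buffer followed by a 4-byte field the program relies on. All names and sizes are invented for illustration.

```python
# Toy model: one flat "memory" holding an 8-byte buffer followed
# directly by a 4-byte field that the program relies on.
BUF_START, BUF_SIZE = 0, 8
FLAG_START, FLAG_SIZE = 8, 4


def fresh_memory():
    """Return memory with an empty buffer and the field set to b'SAFE'."""
    memory = bytearray(BUF_SIZE + FLAG_SIZE)
    memory[FLAG_START:FLAG_START + FLAG_SIZE] = b"SAFE"
    return memory


def unchecked_write(memory, data):
    """Copy every input byte without comparing its length to the buffer size."""
    for offset, byte in enumerate(data):
        memory[BUF_START + offset] = byte


def checked_write(memory, data):
    """Copy the input only if it fits inside the buffer."""
    if len(data) > BUF_SIZE:
        raise ValueError("input larger than buffer")
    memory[BUF_START:BUF_START + len(data)] = data


def read_flag(memory):
    """Return the 4-byte field stored right after the buffer."""
    return bytes(memory[FLAG_START:FLAG_START + FLAG_SIZE])
```

Writing the 12 bytes `b"AAAAAAAAEVIL"` with `unchecked_write` fills the buffer and then replaces the neighboring field with `b"EVIL"`. The same input passed to `checked_write` is rejected and the field is left alone.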

When an overflow occurs, the CodePhage system searches public archives of open source software for a program that handles the same type of data without overflowing. It then adapts the data entry portion of the working program into a patch for the broken one. In CodePhage's designers' jargon, it finds a "donor" for the ailing application.
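The donor idea can be illustrated with a small Python sketch. This is not CodePhage's implementation, which the researchers say works without source code. It only shows the general shape of the approach under simplifying assumptions: a buggy "recipient" that trusts a length byte in its input, a check borrowed from a hypothetical donor that handles the same input safely, and a test that the patched program rejects the bad input while treating good inputs as before. Every function name here is hypothetical.

```python
BUFFER_SIZE = 4


def recipient_parse(data):
    """Hypothetical buggy program: trusts the length byte at the start of the input."""
    declared = data[0]
    buffer = bytearray(BUFFER_SIZE)
    for i in range(declared):
        buffer[i] = data[1 + i]  # an IndexError here stands in for an overflow
    return bytes(buffer[:declared])


def donor_check(data):
    """Hypothetical input check taken from a donor that reads the same format."""
    return (
        len(data) >= 1
        and data[0] <= BUFFER_SIZE
        and len(data) >= 1 + data[0]
    )


def make_patched(parse, check):
    """Wrap the recipient so the donor's check runs before any data is copied."""
    def patched(data):
        if not check(data):
            return None  # reject the input instead of overflowing
        return parse(data)
    return patched


def validate(original, patched, good_inputs, bad_input):
    """Accept the patch only if the bad input is rejected and good inputs are unchanged."""
    if patched(bad_input) is not None:
        return False
    return all(patched(x) == original(x) for x in good_inputs)
```

With `patched = make_patched(recipient_parse, donor_check)`, an input such as `bytes([6]) + b"abcdef"` crashes the original parser but is rejected by the patched one, while `bytes([3]) + b"abc"` yields `b"abc"` from both.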

Still, finding a compatible program is a challenge. Most software isn't designed using interchangeable parts. Similar segments of different programs can be wildly different: they can use different names, take different parameters, and incorporate different functionality.

"It's a lot like playing hide-and-go-seek in the dark and not knowing who else is playing the game," says MIT computer science Prof. Martin Rinard, who worked on the team at MIT's Computer Science and Artificial Intelligence Laboratory headed by research scientist Stelios Sidiroglou-Douskos.

CodePhage evaluates exactly what code to take from a donor by tracing the path of input data as it works its way into a computer's memory. 


Dr. Rinard hopes that automating the detection and patching processes will reduce the chance that an attacker can take advantage of the bug within a system. "One of the biggest problems is the significant delay after discovering a vulnerability until it's patched," he says. CodePhage makes the patching process much faster, Rinard says, because it "takes the human out of the process."

That's critical because, while many operating systems and programming languages have gotten better at handling buffer overflows, the problem remains incredibly common, says Amol Sarwate, director of vulnerability research at Qualys, the security firm that discovered Ghost, a vulnerability caused by a buffer overflow.

"If [CodePhage] could get rid of buffer overflows," says Mr. Sarwate, "it would get rid of the majority of headaches for people who defend networks."

CodePhage is still only a lab-tested prototype, notes Professor Berger of UMass Amherst, and given the complexity of real-world code, problems could arise when it is used at the scale of a production product. Still, he's optimistic it can succeed. So are the scientists at MIT CSAIL, who believe they can adapt CodePhage for any situation where coders make mistakes, even less critical ones such as writing inefficient code.