The secret linguistics clues researchers used to link DNC hack to Russia

Increasingly, governments and cybersecurity firms are relying on linguistic clues found in malicious code or metadata to identify lone hackers or the nations that are behind high-profile attacks.

August 1, 2016

Call it the telltale font.

For security researchers delving into the source of malicious software that infected the Democratic National Committee's computers, linguistic clues in computer fonts, messages buried in malicious applications, and even comments from the alleged culprit helped tie the attack back to Russia.

In fact, linguistics is becoming increasingly important as governments and cybersecurity firms seek to accurately identify lone hackers or the nations behind high-profile attacks. And the stakes for this kind of attribution are growing: the US has responded to recent breaches with sanctions and political pressure, and it could retaliate with military action in the future.

"In the digital world, we look at every aspect of communication," says Mario Vuksan, chief executive officer of the cybersecurity firm ReversingLabs. "From the way a hacking group connects to an asset to the way the binary code is written to text and email messages."

For instance, code could be compiled on machines configured with specific language settings. And hackers could tip their hand by using expressions common in certain countries or languages.

When it comes to investigating cybercrimes, techniques range from classical linguistic methods, such as word count analysis that examines patterns of language use, to more behavioral approaches that use lexical analysis to identify an actor's unique patterns or habits, says Steve Bongardt, a former agent in the FBI's Behavioral Analysis Unit who now works with the firm Fidelis Cybersecurity.
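
To make the word-count idea concrete, here is a minimal sketch, not the method used by Bongardt or Fidelis. It assumes two plain-text writing samples and a small, hand-picked list of English function words (both hypothetical choices). It computes how often each function word appears per 1,000 words in each sample, then compares the two profiles with cosine similarity. A high score is at most weak evidence of a shared author.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: common English function words. Writers tend to
# use these at habitual rates that are hard to consciously disguise.
FUNCTION_WORDS = ["the", "a", "an", "of", "to", "in", "and", "but",
                  "i", "is", "was", "that", "it", "for", "with"]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def profile(text: str) -> list[float]:
    """Return the rate of each function word per 1,000 tokens."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [1000.0 * counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two non-negative frequency profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    known = "I was in the office and I saw that the server was down."
    unknown = "I was at home and I saw that it was the end of the job."
    print(round(cosine_similarity(profile(known), profile(unknown)), 3))
```

Real stylometric work uses far larger samples and feature sets; on a few sentences, scores like this mean very little.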

Mr. Bongardt likens it to investigating a crime scene, with hacking groups or individuals falling back on well-worn modus operandi that govern how an attack is carried out and less regimented "rituals" that are just as suggestive of a particular actor.

But linguistic clues often fall far short of pinning attribution on any single actor, Bongardt and others agreed. Rather, they say, governments and law enforcement agencies investigating crimes need to look to the preponderance of evidence, most of it not linguistic, as they attempt to understand who was behind an incident.

In the case of the DNC hack, a previously unknown hacker who identified himself as Guccifer 2.0 claimed responsibility for the breach. He said he was Romanian, with no connections to the Russian government. But cybersecurity experts and tech journalists poked holes in those claims by closely analyzing his comments and other linguistic and cultural identifiers in metadata.

Initially, though, a profile of the suspected DNC hackers by the cybersecurity firm CrowdStrike relied on a wealth of technical evidence to support the theory that two groups with links to Russian intelligence were responsible.

CrowdStrike's analysis did not rely at all on linguistic clues. Rather, it compiled a list of 12 separate indicators of compromise that were common to the two hacking crews. They ranged from malicious programs to tools for managing malicious software and extracting sensitive data.
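
At its core, that kind of comparison is an overlap between two sets of observations. Here is a minimal sketch with invented placeholder indicators, not CrowdStrike's actual list. It represents each crew's observed indicators as a set of (type, value) pairs and intersects them; the `.example` domains are reserved names used purely for illustration.

```python
# Hypothetical indicators of compromise, as (type, value) pairs.
# These are placeholders, not indicators from any real report.
crew_a = {
    ("tool", "implant-alpha"),
    ("tool", "exfil-helper"),
    ("domain", "update-check.example"),
}
crew_b = {
    ("tool", "implant-alpha"),
    ("tool", "exfil-helper"),
    ("domain", "login-portal.example"),
}


def shared_indicators(a: set[tuple[str, str]],
                      b: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return indicators observed in both sets, sorted for stable output."""
    return sorted(a & b)


if __name__ == "__main__":
    for kind, value in shared_indicators(crew_a, crew_b):
        print(f"{kind}: {value}")
```

Shared tools or infrastructure point to a link between operations; they do not by themselves establish who is running them.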

But after Guccifer 2.0 emerged to claim responsibility for the DNC breach, researchers soon noted subtle clues in his speech, as well as in documents offered on his website, that cast doubt on his account of the hack. Among the clues the tech news site Ars Technica noted, for instance, was Russian-language text buried in the PDF of leaked opposition research on Donald Trump.
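
A check that surfaces this sort of clue can be very simple. Here is a minimal sketch that assumes the metadata fields (author, last-modified-by, and so on) have already been extracted from a document into a Python dictionary. The field names and values are invented placeholders, not the actual leaked files. Because the Unicode Cyrillic block runs from U+0400 to U+04FF, any character in that range flags a field for a closer look.

```python
# Unicode Cyrillic block: U+0400 through U+04FF.
CYRILLIC_START, CYRILLIC_END = 0x0400, 0x04FF


def has_cyrillic(value: str) -> bool:
    """True if any character in the string falls in the Cyrillic block."""
    return any(CYRILLIC_START <= ord(ch) <= CYRILLIC_END for ch in value)


def flag_cyrillic_fields(metadata: dict[str, str]) -> list[str]:
    """Return the sorted names of metadata fields containing Cyrillic text."""
    return sorted(name for name, value in metadata.items() if has_cyrillic(value))


if __name__ == "__main__":
    # Hypothetical metadata, as a PDF or Word parser might report it.
    sample = {
        "Author": "J. Smith",
        "LastModifiedBy": "Иван",  # placeholder Cyrillic name
        "Producer": "Example PDF Writer",
    }
    print(flag_cyrillic_fields(sample))  # ['LastModifiedBy']
```

A hit is only a lead: it suggests which keyboard layout or locale touched a file, not who was behind it, and such traces are easy to plant.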

But that kind of information is still not conclusive, says Mr. Vuksan of ReversingLabs, which makes attributing cyberattacks and breaches a challenge.

“Cyber being what it is, it’s an area where covert action can be done at different levels in many different ways,” he says. “Decoys, intelligence, and counter intelligence can all reside within the same breath.”

Still, clues buried in the language of blog posts, social media, or malicious code are critical in an age when nation-backed hackers aren’t above using disinformation campaigns to cover their tracks.

Experts say that Guccifer 2.0's claim of credit for the DNC hack is strikingly similar to claims of responsibility following an attack on the French TV5Monde network in April 2015. After attackers took over the network's websites and displayed images promoting the Islamic State, a group calling itself the CyberCaliphate said it was behind the breach.

However, on closer examination, the attack was carried out by the same group tied to the DNC hack, says Toni Gidwani, director of threat research operations at the firm ThreatConnect.

The purpose of such ruses isn’t to fool everyone, says Ms. Gidwani. Instead, she says, it’s to be "good enough" to create doubt about the prevailing narrative. "If you look at the broader Russian doctrine of cyberoperations, sowing discord is a measure of success."