Distributing malware by attaching tainted documents to emails is one of the oldest tricks in the book. It’s not just a theoretical risk: real attackers use malicious documents to infect targets all the time. So on top of its anti-spam and anti-phishing efforts, Gmail expanded its malware detection capabilities at the end of last year to include more tailored document monitoring. The good news is that it’s working.
At the RSA security conference in San Francisco on Tuesday, Google’s security and anti-abuse research lead Elie Bursztein will present findings on how the new deep-learning scanner for documents is faring against the 300 billion attachments it has to process each week. It’s challenging to tell the difference between legitimate documents in all their infinite variations and those that have specifically been manipulated to conceal something dangerous. Google says that 63 percent of the malicious documents it blocks each day are different from the ones its systems flagged the day before. But this is exactly the type of pattern-recognition problem where deep learning can be helpful.
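To make the 63 percent figure concrete, here is a minimal sketch of how a day-over-day novelty rate could be computed. Google has not said how it decides that two documents are "different," so the fingerprint sets and the `novelty_rate` function are assumptions made purely for illustration.

```python
def novelty_rate(today: set, yesterday: set) -> float:
    """Fraction of today's blocked documents not seen among yesterday's.

    Both arguments are sets of document fingerprints (for example, file
    hashes). The fingerprinting scheme is an assumption, not Google's method.
    """
    if not today:
        return 0.0
    # Set difference keeps only fingerprints that are new today.
    return len(today - yesterday) / len(today)


# Hypothetical fingerprints, shortened for readability.
blocked_yesterday = {"doc-a", "doc-b"}
blocked_today = {"doc-a", "doc-c", "doc-d", "doc-e"}
print(novelty_rate(blocked_today, blocked_yesterday))
```

A high rate on a measure like this means a list of previously seen files goes stale quickly, which is why a model that generalizes to unseen variants is useful.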
Currently 56 percent of malware threats against Gmail users come from Microsoft Office documents, and 2 percent come from PDFs. In the months that it’s been active, the new scanner has increased its daily malicious Office document detection by 10 percent.
“Ten percent matters,” Bursztein told WIRED. “We’re trying to close the gap as much as possible. We want to keep adding machine learning everywhere we can, where it makes sense. Machine learning does amazing things sometimes, but sometimes it’s overhyped. We try to use it as an extra layer rather than the only layer. We think that works way better.”
The document analyzer looks for common red flags, probes files for components that may have been purposefully obfuscated, and runs other checks, such as examining macros, the tool in Microsoft Word documents that chains commands together in a series and is often used in attacks. The volume of malicious documents that attackers send out varies widely day to day. Bursztein says that since its deployment, the document scanner has been particularly good at flagging suspicious documents sent in bursts by malicious botnets or through other mass distribution methods. He was also surprised to discover how effective the scanner is at analyzing Microsoft Excel documents, whose complicated file format can be difficult to assess.
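As a rough illustration of what checking a macro for red flags can look like, the sketch below scans macro source text for a few patterns commonly associated with malicious Office documents. It is not Google's scanner, which relies on deep learning; the `RED_FLAGS` patterns and the `find_red_flags` function are hypothetical and far simpler than any production system.

```python
import re
from typing import List

# Hypothetical patterns chosen for illustration only; not Google's rules.
RED_FLAGS = {
    # Macros that run automatically when the document is opened.
    "auto_exec": re.compile(r"\b(AutoOpen|Document_Open|Workbook_Open)\b", re.IGNORECASE),
    # Calls that can launch external programs.
    "shell_call": re.compile(r"\b(WScript\.Shell|Shell|CreateObject)\b", re.IGNORECASE),
    # Strings assembled from character codes, a common obfuscation trick.
    "obfuscation": re.compile(r"\bChrW?\(\s*\d+\s*\)", re.IGNORECASE),
}


def find_red_flags(macro_source: str) -> List[str]:
    """Return the names of the hypothetical red flags found in macro text."""
    return [name for name, pattern in RED_FLAGS.items() if pattern.search(macro_source)]


suspicious = "Sub AutoOpen()\n  Shell Chr(99) & Chr(109) & Chr(100)\nEnd Sub"
benign = 'Sub Hello()\n  MsgBox "hi"\nEnd Sub'
print(find_red_flags(suspicious))
print(find_red_flags(benign))
```

Fixed rules like these are easy for attackers to sidestep, which fits Bursztein's description of machine learning as an extra layer rather than the only one.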
Though a 10 percent detection increase may not sound like a lot, it’s a massive improvement at the scale Google operates on, and any gain is worthwhile given that malicious documents are a real threat around the world. Bursztein says that companies and nonprofits are three times more likely to be targeted by malicious documents than other organizations, and that government entities are five times more likely. Some industries are also more likely than others to be targeted. Transportation and critical infrastructure utilities, for example, face a much higher risk than the education sector.
The prevalence of malicious document attacks varies around the world, but for attackers the approach is always an option. Bursztein points out that kits for crafting malicious documents and tailoring them to evade antivirus scanners are readily available in online criminal forums, ranging in price from about $400 to $5,000.
While the scanner is catching more malicious documents than ever, Bursztein and his colleagues will continue to refine it in the hopes of blocking an even bigger chunk of the malware sent to Gmail accounts worldwide.
“Malware is something we did after spam and phishing, because malware is a bit harder,” he says. “We don’t have the malware itself in an email; the documents are all we have at that point. But we always want to improve our detection capabilities and with malicious documents we chose the one where we could make the most impact for our users.”
When a full-blown hack is just a rogue Word document download away, users will take whatever extra protections they can get.