Monday, 26 October 2015

RTF Exploit Document Extraction


On April 21, 2015, Microsoft released MS15-033.  The release included a fix for a memory corruption error in the SmartTag parsing library of MS Office 2007/2010/2013.  The vulnerability was assigned CVE-2015-1641.

I won't rehash the technicalities here, as others have already done a thorough job documenting the specifics (see here and here).  The important part for the purpose of this discussion is the makeup of the exploit document itself, and perhaps more so that of the encoded binary data used to hide the intended malware payload.

Note: if you want to follow along, here is a sample (hash: aafc9a94b1676172dfc55ae5660b5263888c603821b73b6d2540a7578b67c431) which exhibits the described behaviour.

RTF Document

I examined dozens of documents which exploit this vulnerability and they appear to have consistent structures, suggesting that either they are being built with a kit or builder of some sort or that they are all being constructed via modification of an initial in-the-wild example or proof of concept.

The documents are RTF format containing 4 objects: a reference to an OLE control (otkloadr.dll - see here for why) and 3 Microsoft Word document objects. As well, there is a large binary data chunk appended to the end of the file, after the valid RTF structure ends.
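To illustrate how the appended chunk can be separated from the RTF proper, here is a minimal sketch that walks the RTF group braces and returns whatever follows the brace that closes the top-level group. The function name `find_appended_data` is hypothetical, and the sketch assumes the document contains no raw `\bin` data with unbalanced braces; exploit documents are often deliberately malformed, so treat the result as a starting point rather than a guarantee.

```python
def find_appended_data(rtf: bytes) -> bytes:
    """Return the bytes that follow the end of the top-level RTF group.

    Assumption: braces inside the document are either balanced group
    delimiters or escaped with a backslash (\\{ and \\}).
    """
    start = rtf.find(b"{\\rtf")
    if start < 0:
        raise ValueError("no RTF header found")

    depth = 0
    i = start
    while i < len(rtf):
        c = rtf[i]
        if c == 0x5C:        # backslash: skip it and the character it escapes
            i += 2
            continue
        if c == 0x7B:        # '{' opens a group
            depth += 1
        elif c == 0x7D:      # '}' closes a group
            depth -= 1
            if depth == 0:
                return rtf[i + 1:]
        i += 1
    raise ValueError("unbalanced RTF groups")
```

Typical use would be `find_appended_data(open(path, "rb").read())`, with the result handed on to whatever carves the payload and decoy.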

One of the Word documents is responsible for triggering the vulnerability, and another contains a large binary component which is used in a heap spray.  This binary data contains a ROP chain and a first stage shellcode. The first stage shellcode locates and executes the second stage shellcode from the large binary chunk, and this second stage is responsible for decrypting itself.

Second stage shellcode, pre-decryption
Second stage shellcode, decrypted, showing ROR7 hash based API lookups
(resolution of hashes to API calls using FLARE team ShellCode Hashes ida-python plugin)
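For readers unfamiliar with rotate-based API hashing, the general idea can be sketched as follows. This is an assumed, commonly seen form (rotate the 32-bit accumulator right by 7, then add the next byte of the export name); the exact variant used by this shellcode, including whether it adds or XORs and whether it hashes the terminating null, is not reproduced here and should be confirmed against the disassembly. The names `ror32` and `ror7_hash` are illustrative only.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror7_hash(name: bytes) -> int:
    """Hash an export name: rotate right by 7, then add each byte.

    Assumption: additive variant, no terminating null included.
    """
    h = 0
    for byte in name:
        h = ror32(h, 7)
        h = (h + byte) & 0xFFFFFFFF
    return h
```

The shellcode walks a DLL's export table, hashes each name this way, and compares the result against hard-coded constants, which is why the API names never appear as strings in the decrypted code.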

Once decrypted, this shellcode is responsible for some key actions:

  1. Locate, decrypt, and execute the malware binary payload.
  2. Patch some key bytes in the registry to mask the MS Word crash (pursuant to the exploit).
  3. Locate, decrypt and display the decoy document. 

The malware payload and decoy document are both contained inside the large binary segment appended to the end of the RTF file.  The steps for locating and decrypting both the malware and the decoy are also found inside this second stage shellcode:

Second stage shellcode, locating and decrypting malware payload
The shellcode searches for a sequence of 0xBA bytes, and empirical evidence has shown that the binary chunk typically contains 7 consecutive 0xBA bytes to mark the start of the encrypted payload (although the shellcode does not explicitly require all seven).  The shellcode then decrypts each double word of data by XOR'ing it with the value 0xCAFEBABE and moves it to a buffer.  The end marker of 0xBBBBBBBB is sought, and again in practice a consecutive run of seven 0xBB bytes is typically seen in the samples.
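The per-double-word XOR can be sketched in a few lines. Because the shellcode runs on x86, each double word is read little-endian, so the key 0xCAFEBABE corresponds to the byte sequence BE BA FE CA in file order; this sketch applies those four bytes cyclically, which is equivalent for whole double words. Handling of a trailing partial double word (it is XOR'd with the leading key bytes) is an assumption, and `xor_dwords` is a hypothetical helper name.

```python
import struct

PAYLOAD_KEY = 0xCAFEBABE  # key described above for the malware payload


def xor_dwords(data: bytes, key: int) -> bytes:
    """XOR `data` with a 32-bit key, one little-endian double word at a time."""
    key_bytes = struct.pack("<I", key)  # 0xCAFEBABE -> BE BA FE CA
    return bytes(b ^ key_bytes[i % 4] for i, b in enumerate(data))


# An encrypted PE header would begin F3 E0 6E CA, which decodes to "MZ\x90\x00".
header = xor_dwords(bytes.fromhex("f3e06eca"), PAYLOAD_KEY)
```

Since XOR is its own inverse, the same function both encrypts and decrypts.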

Finally, using similar techniques, the shellcode extracts everything between the 0xBB markers and a run of 0xBC bytes.  This block of data is the encrypted decoy document, and it is similarly decrypted using an XOR operation against each double word, this time using 0xBAADF00D as the XOR key.

In summary then we can see that the overall high level structure of this binary data segment is:

Stage2_SC | 0xBA's | (enc) Payload | 0xBB's | (enc) Decoy | 0xBC's

Obviously this process is far too tedious for manual repetition, so I constructed a rudimentary script to seek, extract, and dump both the malware payload and decoy documents to disk.  It can be grabbed from github here.

Concluding Remarks

In summary, an important fact to note here is that while the CVE-2015-1641 vulnerability itself is new, there have been (and likely will continue to be) numerous vulnerabilities in MS Office handling of documents (in particular RTF files).  During the analysis of samples for this post, it became apparent that the second stage shellcode found in the examined CVE-2015-1641 exploit documents has been reused frequently, and simply shoe-horned into the RTF exploit du jour.

Numerous samples for CVE-2014-1761 and CVE-2012-0158, for example, use the same format of Stage2_SC + Marker + MalwarePayload + Marker + Decoy + Marker in their malicious document payloads.  As such, the script referenced above should work equally well against other RTF-based exploit documents which happen to utilize this same binary data segment format.