Breaking the Sound Barrier Part I: Fuzzing CoreAudio with Mach Messages
Guest post by Dillon Franke, Senior Security Engineer, 20% time on Project Zero
Every second, highly-privileged MacOS system daemons accept and process hundreds of IPC messages. In some cases, these message handlers accept data from sandboxed or unprivileged processes.
In this blog post, I’ll explore using Mach IPC messages as an attack vector to find and exploit sandbox escapes. I’ll detail how I used a custom fuzzing harness, dynamic instrumentation, and plenty of debugging/static analysis to identify a high-risk type confusion vulnerability in the coreaudiod system daemon. Along the way, I’ll discuss some of the difficulties and tradeoffs I encountered.
Transparently, this was my first venture into the world of MacOS security research and building a custom fuzzing harness. I hope this post serves as a guide to those who wish to embark on similar research endeavors.
I am open-sourcing the fuzzing harness I built, as well as several tools I wrote that were useful to me throughout this project. All of this can be found here: https://github.com/googleprojectzero/p0tools/tree/master/CoreAudioFuzz
The Approach: Knowledge-Driven Fuzzing
For this research project, I adopted a hybrid approach that combined fuzzing and manual reverse engineering, which I refer to as knowledge-driven fuzzing. This method, learned from my friend Ned Williamson, balances automation with targeted investigation. Fuzzing provided the means to quickly test a wide range of inputs and identify areas where the system’s behavior deviated from expectations. However, when the fuzzer’s code coverage plateaued or specific hurdles arose, manual analysis came into play, forcing me to dive deeper into the target’s inner workings.
Knowledge-driven fuzzing offers two key advantages. First, the research process never stagnates, as the goal of improving the code coverage of the fuzzer is always present. Second, achieving this goal requires a deep understanding of the code you are fuzzing. By the time you begin triaging legitimate, security-relevant crashes, the reverse engineering process will have given you extensive knowledge of the codebase, enabling analysis of crashes from an informed perspective.
The cycle I followed during this research is as follows:
- Identify an attack vector
- Choose a target
- Create a fuzzing harness
- Fuzz and produce crashes
- Analyze crashes and code coverage
- Iterate on the fuzzing harness
- Repeat steps 4-6
Standard browser sandboxing limits code execution by restricting direct operating system access. Consequently, exploiting a browser vulnerability typically requires the use of a separate “sandbox escape” vulnerability.
Since interprocess communication (IPC) mechanisms allow two processes to communicate with each other, they can naturally serve as a bridge from a sandboxed process to an unrestricted one. This makes them a prime attack vector for sandbox escapes, as shown below.
I chose Mach messages, the lowest-level IPC component in the MacOS operating system, as the attack vector of focus for this research. I chose them mostly out of a desire to understand MacOS IPC mechanisms at their most fundamental level, as well as because of the track record of historical security issues with Mach messages.
Previous Work and Background
Leveraging Mach messages in exploit chains is far from a novel idea. For example, Ian Beer identified a core design issue in 2016 with the XNU kernel related to the handling of task_t Mach ports, which allowed for exploitation via Mach messages. Another post showed how an in-the-wild exploit chain utilized Mach messages in 2019 for heap grooming techniques. I also drew much inspiration from Ret2 Systems’ blog post about leveraging Mach message handlers to find and weaponize a Safari sandbox escape.
I won’t spend too much time detailing the ins and outs of how Mach messages work (that is better left to a more comprehensive post on the subject), but here’s a brief overview of Mach IPC for this blog post:
- Mach messages are stored within kernel-managed message queues, represented by a Mach port
- A process can fetch a message from a given port if it holds the receive right for that port
- A process can send a message to a given port if it holds a send right to that port
MacOS applications can register a service with the bootstrap server, a special mach port which all processes have a send right to by default. This allows other processes to send a Mach message to the bootstrap server inquiring about a specific service, and the bootstrap server can respond with a send right to that service’s Mach port. MacOS system daemons register Mach services via launchd. You can view their .plist files within the /System/Library/LaunchAgents and /System/Library/LaunchDaemons directories to get an idea of the services registered. For example, the .plist file below highlights a Mach service registered for the Address Book application on MacOS using the identifier com.apple.AddressBook.AssistantService.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>POSIXSpawnType</key>
<string>Adaptive</string>
<key>Label</key>
<string>com.apple.AddressBook.AssistantService</string>
<key>MachServices</key>
<dict>
<key>com.apple.AddressBook.AssistantService</key>
<true/>
</dict>
<key>ProgramArguments</key>
<array>
<string>/System/Library/Frameworks/AddressBook.framework/Versions/A/Helpers/ABAssistantService.app/Contents/MacOS/ABAssistantService</string>
</array>
</dict>
</plist>
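For illustration, here is roughly how a client obtains a send right to such a registered service at runtime. This is a minimal sketch using the public bootstrap API; error handling is omitted, and the service name is simply the example from the plist above:
#include <mach/mach.h>
#include <servers/bootstrap.h>
mach_port_t lookup_service(const char *name) {
    mach_port_t service_port = MACH_PORT_NULL;
    // bootstrap_port is a send right to the bootstrap server that every process holds by default
    kern_return_t kr = bootstrap_look_up(bootstrap_port, name, &service_port);
    return (kr == KERN_SUCCESS) ? service_port : MACH_PORT_NULL;
}
// e.g. lookup_service("com.apple.AddressBook.AssistantService");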
Choose a Target
After deciding I wanted to research Mach services, the next question was which service to target. In order for a sandboxed process to send Mach messages to a service, it has to be explicitly allowed. If the process is using Apple’s App Sandbox feature, this is done within a .sb file, written using the TinyScheme format. The snippet below shows an excerpt of the sandbox file for a WebKit GPU Process. The allow mach-lookup directive is used to allow a sandboxed process to look up and send Mach messages to a service.
# File: /System/Volumes/Preboot/Cryptexes/Incoming/OS/System/Library/Frameworks/WebKit.framework/Versions/A/Resources/com.apple.WebKit.GPUProcess.sb
(with-filter (system-attribute apple-internal)
(allow mach-lookup
(global-name "com.apple.analyticsd")
(global-name "com.apple.diagnosticd")))
(allow mach-lookup
(global-name "com.apple.audio.audiohald")
(global-name "com.apple.CARenderServer")
(global-name "com.apple.fonts")
(global-name "com.apple.PowerManagement.control")
(global-name "com.apple.trustd.agent")
(global-name "com.apple.logd.events"))
This helped me narrow my focus significantly from all MacOS processes to processes with a sandbox-accessible Mach service:
In addition to inspecting the sandbox profiles, I used Jonathan Levin’s sbtool utility to test which Mach services could be interacted with for a given process. The tool (which was a bit outdated, but I was able to get it to compile) uses the builtin sandbox_exec function under the hood to provide a nice list of accessible Mach service identifiers:
❯ ./sbtool 2813 mach
com.apple.logd
com.apple.xpc.smd
com.apple.remoted
com.apple.metadata.mds
com.apple.coreduetd
com.apple.apsd
com.apple.coreservices.launchservicesd
com.apple.bsd.dirhelper
com.apple.logind
com.apple.revision
…Truncated…
Ultimately, I chose to take a look at the coreaudiod daemon, and specifically the com.apple.audio.audiohald service for the following reasons:
- It is a complex process
- It allows Mach communications from several impactful applications, including the Safari GPU process
- The Mach service had a large number of message handlers
- The service seemed to allow control and modification of audio hardware, which would likely require elevated privileges
- The coreaudiod binary and the CoreAudio Framework it heavily uses were both closed source, which would provide a unique reverse engineering challenge
Once I chose an attack vector and target, the next step was to create a fuzzing harness capable of sending input through the attack vector (a Mach message) at a proper location within the target.
A coverage-guided fuzzer is a powerful weapon, but only if its energy is focused in the right place—like a magnifying glass concentrating sunlight to start a fire. Without proper focus, the energy dissipates, achieving little impact.
Determining an Entry Point
Ideally, a fuzzer should perfectly replicate the environment and capabilities available to a potential attacker. However, this isn't always practical. Trade-offs often need to be made, such as accepting a higher rate of false positives for increased performance, simplified instrumentation, or ease of development. Therefore, identifying the “right place” to fuzz is highly dependent on the specific target and research goals.
Option 1: Interprocess Fuzzing
All Mach messages are sent and received using the mach_msg API, as shown below. Therefore, I thought the most intuitive way to fuzz coreaudiod’s Mach message handlers would be to write a fuzzing harness that called the mach_msg API and allow my fuzzer to modify the message contents to produce crashes. The approach would look something like this:
However, this approach had a large downside: since we were sending IPC messages, the fuzzing harness would be in a different process space than the target. This meant code coverage information would need to be shared across a process boundary, which is not supported by most fuzzing tools. Additionally, kernel message queue processing adds a significant performance overhead.
Option 2: Direct Harness
While requiring a bit more work up front, another option was to write a fuzzing harness that directly loaded and called the Mach message handlers of interest. This would have the massive advantage of putting our fuzzer and instrumentation in the same process as the message handlers, allowing us to more easily obtain code coverage.
One notable downside of this fuzzing approach is that it assumes all fuzzer-generated inputs pass the kernel’s Mach message validation layer, which in a real system occurs before a message handler gets called. As we’ll see later, this is not always the case. In my view, however, the pros of fuzzing in the same process space (speed and easy code coverage collection) outweighed the cons of a potential increase in false positives.
The approach would be as follows:
- Identify a suitable function for processing incoming mach messages
- Write a fuzzing harness to load the message handling code from coreaudiod
- Use a fuzzer to generate inputs and call the fuzzing harness
- Profit, hopefully
To start, I searched for the Mach service identifier, com.apple.audio.audiohald, but found no references to it within the coreaudiod binary. Next, I checked the libraries it loaded using otool. Logically, the CoreAudio framework seemed like a good candidate for housing the code for our message handler.
$ otool -L /usr/sbin/coreaudiod
/usr/sbin/coreaudiod:
/System/Library/PrivateFrameworks/caulk.framework/Versions/A/caulk (compatibility version 1.0.0, current version 1.0.0)
/System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio (compatibility version 1.0.0, current version 1.0.0)
/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 2602.0.255)
/usr/lib/libAudioStatistics.dylib (compatibility version 1.0.0, current version 1.0.0, weak)
/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 2602.0.255)
/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1700.255.5)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1345.120.2)
However, I was surprised to find that the path returned by otool did not exist!
$ stat /System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio
stat: /System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio: stat: No such file or directory
The Dyld Shared Cache
A bit of research showed me that as of MacOS Big Sur, most framework binaries are not stored on disk but within the dyld shared cache, a mechanism for pre-linking libraries to allow applications to run faster. Thankfully, IDA Pro, Binary Ninja, and Ghidra support parsing the dyld shared cache to obtain the libraries stored within. I also used this helpful tool to successfully extract libraries for additional analysis.
Once I had the CoreAudio Framework within IDA, I quickly found a call to bootstrap_check_in with the service identifier passed as an argument, proving the CoreAudio framework binary was responsible for setting up the Mach service I wanted to fuzz. However, it still wasn’t obvious where the message handling code was happening, despite quite a bit of reverse engineering.
It turns out this is due to the use of the Mach Interface Generator (MIG), an Interface Definition Language from Apple that makes it easier to write RPC clients and servers by abstracting away much of the Mach layer. When compiled, MIG message handling code gets bundled into a structure called a subsystem. One can easily grep for these subsystems to find their offsets:
$ nm -m ./System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio | grep -i subsystem
(undefined) external _CACentralStateDumpRegisterSubsystem (from AudioToolboxCore)
00007ff840470138 (__DATA_CONST,__const) non-external _HALC_HALB_MIGClient_subsystem
00007ff840470270 (__DATA_CONST,__const) non-external _HALS_HALB_MIGServer_subsystem
Next, I searched in IDA for cross-references to the _HALS_HALB_MIGServer_subsystem symbol, which identified the MIG server function that parsed incoming Mach messages! The routine is shown below, with the first parameter (the rdi register) being the incoming Mach message and the second (the rsi register) being the message to return to the client. The MIG server function extracted the msgh_id parameter from the Mach message and used that to index into the MIG subsystem. Then, the necessary function handler was called.
I further confirmed this by setting an LLDB breakpoint on the coreaudiod process (after disabling SIP) for the _HALB_MIGServer_server function. Then, I adjusted the volume on my system, and the breakpoint was hit:
In this example, tracing the message handler called from the MIG subsystem showed the _XObject_HasProperty function was called based on the Mach message’s msgh_id.
Depending on the msgh_id, a few dozen message handlers were accessible from the MIG subsystem. They are easily identifiable by the convenient __X prefix to their function names added by MIG.
The _HALB_MIGServer_server function struck a great balance between getting close to low-level message handling code while still resembling the inputs that a call to mach_msg would take. I decided this was the place to inject fuzz input into.
Creating a Basic Fuzzing Harness
After identifying the function I wanted to fuzz, the next step was to write a program to read a file and deliver the file’s contents as input to the target function. This might have been as easy as linking the CoreAudio library with my fuzzing harness and calling the _HALB_MIGServer_server function, but unfortunately the function was not exported.
Instead, I borrowed some logic from Ivan Fratric and his TinyInst tool (we’ll be talking about it a lot more later) which returns a provided symbol’s address from a library. The code parses the structure of Mach-O binaries, specifically their headers and load commands, to locate and extract symbol information. This made it possible to resolve and call the target function in my fuzzing harness, even when it wasn’t exported.
So, the high level function of my harness was as follows:
- Load the CoreAudio Library
- Get a function pointer for the target function from the CoreAudio Library
- Read an input from a file
- Call the target function with the input
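The following is a condensed sketch of that flow. It assumes a helper, here called resolve_private_symbol, which stands in for the Mach-O symbol-table parsing borrowed from TinyInst; the buffer sizes and error handling are simplified:
#include <dlfcn.h>
#include <stdio.h>
// Hypothetical helper: walks CoreAudio's Mach-O symbol table to find a non-exported symbol.
void *resolve_private_symbol(const char *library_path, const char *symbol_name);
typedef int (*mig_server_fn)(void *in_msg, void *out_msg);
int main(int argc, char **argv) {
    // 1. Load the CoreAudio framework (dyld resolves it from the shared cache).
    const char *path = "/System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio";
    dlopen(path, RTLD_NOW);
    // 2. Get a function pointer to the (non-exported) MIG server routine.
    mig_server_fn server = (mig_server_fn)resolve_private_symbol(path, "_HALB_MIGServer_server");
    // 3. Read an input file into a buffer shaped like a Mach message.
    unsigned char in_msg[0x1000] = {0}, out_msg[0x1000] = {0};
    FILE *f = fopen(argv[1], "rb");
    fread(in_msg, 1, sizeof(in_msg), f);
    fclose(f);
    // 4. Hand the raw bytes to the message handler as if the kernel had delivered them.
    return server(in_msg, out_msg);
}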
The full implementation of my fuzzing harness can be found here. An example of invoking the harness to send a message from an input file is shown below:
$ ./harness -f corpora/basic/1 -v
*******NEW MESSAGE*******
Message ID: 1010000 (XSystem_Open)
------ MACH MSG HEADER ------
msg_bits: 2319532353
msg_size: 56
msg_remote_port: 1094795585
msg_local_port: 1094795585
msg_voucher_port: 1094795585
msg_id: 1010000
------ MACH MSG BODY (32 bytes) ------
0x01 0x00 0x00 0x00 0x03 0x30 0x00 0x00 0x41 0x41 0x41 0x41 0x41 0x41 0x11 0x00 0x41 0x41 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
------ MACH MSG TRAILER ------
msg_trailer_type: 0
msg_trailer_size: 32
msg_seqno: 0
msg_sender: 0
------ MACH MSG TRAILER BODY (32 bytes) ------
0xf5 0x01 0x00 0x00 0xf5 0x01 0x00 0x00 0x14 0x00 0x00 0x00 0xf5 0x01 0x00 0x00 0x14 0x00 0x00 0x00 0x7e 0x02 0x00 0x00 0xa3 0x86 0x01 0x00 0x4f 0x06 0x00 0x00
Processing function result: 1
*******RETURN MESSAGE*******
------ MACH MSG HEADER ------
msg_bits: 1
msg_size: 36
msg_remote_port: 1094795585
msg_local_port: 0
msg_voucher_port: 0
msg_id: 1010100
------ MACH MSG BODY (12 bytes) ------
0x00 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Harvesting Legitimate Mach Messages
I now had a way to deliver data directly into the MIG subsystem (_HALB_MIGServer_server) I wanted to fuzz. However, I had no idea what specific message size, options, or data the handler was expecting. While a coverage-guided fuzzer will begin to uncover the proper message format over time, it is advantageous to obtain a seed corpus of legitimate inputs when first beginning to fuzz to improve efficiency.
To do this, I used LLDB to set a breakpoint on the MIG subsystem and dump the first argument (containing the incoming Mach message). Then, I played around with the operating system to cause Mach messages to be sent to coreaudiod. The Audio MIDI Setup MacOS application ended up being great for this, as it allows one to create, edit, and delete audio devices.
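The workflow looked roughly like the following (commands abbreviated; the exact breakpoint and read commands may differ from what I actually scripted):
$ sudo lldb -p $(pgrep coreaudiod)
(lldb) breakpoint set -n _HALB_MIGServer_server
(lldb) continue
(lldb) memory read --size 1 --count 64 --format x $rdi
When the breakpoint fired (for example, after adjusting the volume or editing a device in Audio MIDI Setup), the memory read dumps the incoming Mach message pointed to by rdi, which can then be saved as a corpus file.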
Fuzz and Produce Crashes
Armed with a small seed corpus and an input delivery mechanism, the next step was to configure a fuzzer to use the created fuzzing harness and obtain code coverage. I used the excellent Jackalope fuzzer built and maintained by Ivan Fratric. I chose Jackalope primarily for its high level of customizability—it allows easy implementation of custom mutators, instrumentation, and sample delivery. Additionally, I appreciated its seamless usage on macOS, particularly its code coverage capabilities powered by TinyInst. In contrast, I tried and failed to collect code coverage using Frida against system daemons on macOS.
I used the following command to start a Jackalope fuzzing run:
$ jackalope -in in/ -out out/ -delivery file -instrument_module CoreAudio -target_module harness -target_method _fuzz -nargs 1 -iterations 1000 -persist -loop -dump_coverage -cmp_coverage -generate_unwind -nthreads 5 -- ./harness -f @@
Iterate on the Fuzzing Harness
This harness quickly generated many crashes, a sign I was on the right track. However, I quickly learned that initial crashes are often not indicative of a security bug, but of a design bug in the fuzzing harness itself or an invalid assumption.
Iteration 1: Target Initialization
One of the difficulties with my fuzzing approach was that my target function (the Mach message handler) expected the HAL system to be in a specific state to begin receiving Mach messages. By simply calling the library function with my fuzzing harness, these assumptions were broken.
This caused errors to start popping up. As shown in the diagram below, the harness bypassed much of the bootstrapping functionality the coreaudiod process would normally take care of during startup.
Code coverage, as well as error messages, can be very helpful in determining some of the initialization steps a fuzzing harness is neglecting. For example, I noticed my data flow would always fail early in most Mach message handlers, logging the message Error: there is no system.
It turns out I needed to initialize the HAL System before I could interact correctly with the Mach APIs. In my case, calling the _AudioHardwareStartServer function in my fuzzing harness took care of most of the necessary initialization.
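In the harness, this boiled down to resolving and invoking that routine once before entering the fuzzing loop. A sketch is shown below; the void signature and the reuse of the symbol-resolution helper are assumptions:
// One-time HAL initialization performed by the harness before any fuzz input is delivered.
typedef void (*start_server_fn)(void);
start_server_fn start_server =
    (start_server_fn)resolve_private_symbol(coreaudio_path, "_AudioHardwareStartServer");
start_server();  // sets up the HAL state the Mach message handlers expect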
Iteration 2: API Call Chaining
My first crack at a fuzzing harness was cool, but it made a pretty large assumption: all accessible Mach message handlers functioned independently of each other. As I quickly learned, this assumption was incorrect. As I ran the fuzzer, error messages like the following one started popping up:
The error seemed to indicate the SetPropertyData Mach handler was expecting a client to be registered via a previous Mach message. Clearly, the Mach handlers I was fuzzing were stateful and depended on each other to function properly. My fuzzing harness would need to take this into consideration in order to have any hope of obtaining good code coverage on the target.
This highlights a common problem in the fuzzing world: most coverage-guided fuzzers accept a single input (a bunch of bytes), while many things we want to fuzz accept data in a completely different format, such as several arguments of different types, or even several function calls. This Google writeup explains the problem well, as does Ned Williamson’s OffensiveCon talk from 2019.
To get around this limitation, we can use a technique I refer to as API Call Chaining, which considers each fuzz input as a stream that can be read from to craft multiple valid inputs. Thus, each fuzzing iteration would be capable of generating multiple Mach messages. This simple but important insight allows a fuzzer to explore the interdependency of separate function calls using the same code-coverage informed input.
The FuzzedDataProvider class, which is part of LibFuzzer but can be included as a header for use with any fuzzing harness, is a great choice for consuming a fuzz sample and transforming it into a more meaningful data type. Consider the following pseudocode:
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  FuzzedDataProvider fuzz_data(data, size); // Initialize FDP
  while (fuzz_data.remaining_bytes() >= MACH_MSG_MIN_SIZE) { // Continue until we've consumed all bytes
    uint32_t msg_id = fuzz_data.ConsumeIntegralInRange<uint32_t>(1010000, 1010062);
    switch (msg_id) {
      case 1010000: {
        send_XSystem_Open_msg(fuzz_data);
        break;
      }
      case 1010001: {
        send_XSystem_Close_msg(fuzz_data);
        break;
      }
      case 1010002: {
        send_XSystem_GetObjectInfo_msg(fuzz_data);
        break;
      }
      // ... continued
    }
  }
  return 0;
}
This code transforms a blob of bytes into a mechanism that can repeatedly call APIs with fuzz data in a deterministic manner. What’s more, a coverage-guided fuzzer will be able to explore and identify a series of API calls that improves code coverage. From the fuzzer’s perspective, it is simply modifying an array of bytes, blissfully unaware of the additional complexity happening under the hood.
For example, my fuzzer quickly identified that most interactions with the audiohald service required a call to the _XSystem_Open message handler to register a client before most APIs could be called. The inputs the fuzzer saved to its corpus naturally reflected this fact over time.
Iteration 3: Mocking Out Buggy/Unneeded Functionality
Sometimes coverage plateaus, and a fuzzer struggles to explore new code paths. For example, say we’re fuzzing an HTTP server and it keeps getting stuck because it’s trying to read and parse configuration files on startup. If our focus was on the server’s request parsing and response logic, we might choose to mock out the functionality we don’t care about in order to focus the fuzzer’s code coverage exploration elsewhere.
In my fuzzing harness’ case, calling the initialization routines was causing my harness to try to register the com.apple.audio.audiohald Mach service with the bootstrap server, which was throwing an error because the service was already registered by launchd. Since my harness didn’t need to register the Mach service in order to inject messages (remember, our harness calls the MIG subsystem directly), I decided to mock out the functionality.
When dealing with pure C functions, function interposing can be used to easily modify a function’s behavior. In the example below, I declare a new version of the bootstrap_check_in function that simply returns KERN_SUCCESS, effectively nopping it out while telling the caller that the operation was successful.
#include <mach/mach.h>
#include <stdarg.h>
// Forward declaration for bootstrap_check_in
kern_return_t bootstrap_check_in(mach_port_t bootstrap_port, const char *service_name, mach_port_t *service_port);
// Custom implementation of bootstrap_check_in
kern_return_t custom_bootstrap_check_in(mach_port_t bootstrap_port, const char *service_name, mach_port_t *service_port) {
// Ensure service_port is non-null and set it to a non-zero value
if (service_port) {
*service_port = 1; // Set to a non-zero value
}
return KERN_SUCCESS; // Return 0 (KERN_SUCCESS)
}
// Interposing array for bootstrap_check_in
__attribute__((used)) static struct {
const void* replacement;
const void* replacee;
} interposers[] __attribute__((section("__DATA,__interpose"))) = {
{ (const void *)custom_bootstrap_check_in, (const void *)bootstrap_check_in }
};
In the case of C++ functions, I used TinyInst’s Hook API to modify problematic functionality. In one specific scenario, my fuzzer was constantly crashing the target because the CFRelease function was being called with a NULL pointer. Some further analysis told me that this was a non-security-relevant bug where a user’s input, which was assumed to contain a valid plist object, was not properly validated. If the plist object was invalid or NULL, a downstream call would receive NULL, and an abort would occur.
So, I wrote the following TinyInst hook, which checked whether the plist object passed into the function was NULL. If so, my hook returned the function call early, bypassing the buggy code.
void HALSWriteSettingHook::OnFunctionEntered() {
printf("HALS_SettingsManager::_WriteSetting Entered\n");
if (!GetRegister(RDX)) {
printf("NULL plist passed as argument, returning to prevent NULL CFRelease\n");
printf("Current $RSP: %p\n", GetRegister(RSP));
void *return_address;
RemoteRead((void*)GetRegister(RSP), &return_address, sizeof(void *));
printf("Current return address: %p\n", GetReturnAddress());
printf("Current $RIP: %p\n", GetRegister(RIP));
SetRegister(RAX, 0);
SetRegister(RIP, GetReturnAddress());
printf("$RIP register is now: %p\n", GetRegister(ARCH_PC));
SetRegister(RSP, GetRegister(RSP) + 8); // Simulate a ret instruction
printf("$RSP is now: %p\n", GetRegister(RSP));
}
}
Next, I modified Jackalope to use my instrumentation using the CreateInstrumentation API. That way, my hook was applied during each fuzzing iteration, and the annoying NULL CFRelease calls stopped happening. The output below shows the hook preventing a crash from a NULL plist object passed to the troublesome API:
Instrumented module CoreAudio, code size: 7516156
Hooking function __ZN11HALS_System13_WriteSettingEP11HALS_ClientPK10__CFStringPKv in module CoreAudio
HALS_SettingsManager::_WriteSetting Entered
NULL plist passed as argument, returning to prevent NULL CFRelease
Current $RSP: 0x7ff7bf83b358
Current return address: 0x7ff8451e7430
Current $RIP: 0x7ff84533a675
$RIP register is now: 0x7ff8451e7430
$RSP is now: 0x7ff7bf83b360
Total execs: 6230
Unique samples: 184 (0 discarded)
Crashes: 3 (2 unique)
Hangs: 0
Offsets: 13550
Execs/s: 134
The code to reproduce and build this fuzzer with custom instrumentation can be found here: https://github.com/googleprojectzero/p0tools/tree/master/CoreAudioFuzz/jackalope-modifications
Iteration 4: Improving Sample Structure
The great thing about a fuzzing-centric auditing technique is that it highlights knowledge gaps in the code you are auditing. As you address these gaps, you gain a deeper understanding of the structure and constraints of the inputs that your fuzzing harness should generate. These insights enable you to refine your harness to produce more targeted inputs, effectively penetrating deeper code paths and improving overall code coverage. The following subsections highlight examples of how I identified and implemented opportunities to iterate on my fuzzing harness, significantly enhancing its efficiency and effectiveness.
Message Handler Syntax Checks
Code coverage results from fuzzing runs are incredibly telling. I noticed that after running my fuzzer for a few days, it was having trouble exploring past the beginning of most of the Mach message handlers. One simple example is shown below (explored basic blocks are highlighted in blue), where several comparisons were not being passed, causing the function to error out early on. Here, the rdi register is the incoming Mach message we sent to the handler.
The comparisons were checking that the Mach message was well formatted, with a message length set to 0x34 and various options set within the message. If it wasn’t, it was discarded.
With this in mind, I modified my fuzzing harness to set the fields in the Mach messages I sent to the _XIOContext_SetClientControlPort handler such that they passed these conditions. The fuzzer could modify other pieces of the message as it pleased, but since these aspects needed to conform to strict guidelines, I simply hardcoded them.
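A rough sketch of that kind of fix-up is shown below: the harness pins the fields the handler’s sanity checks require and leaves the rest of the body to the fuzzer. The 0x34 length comes from the check described above; the msgh_bits value is illustrative:
#include <mach/message.h>
// Called on each fuzzer-generated message destined for _XIOContext_SetClientControlPort.
static void fixup_header(mach_msg_header_t *hdr) {
    hdr->msgh_size = 0x34;  // the handler rejects any other length
    hdr->msgh_bits = MACH_MSGH_BITS(MACH_MSG_TYPE_COPY_SEND, MACH_MSG_TYPE_MAKE_SEND_ONCE);
    // bytes after the header remain fully fuzzer-controlled
}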
These small modifications were the beginning of an input structure I was building for my target. The efficiency of my fuzzing improved astronomically after adding these guidelines to the fuzzer - my code coverage increased by 2000% shortly thereafter.
Out-of-Line (OOL) Message Data
I noticed my fuzzing setup started generating tons of crashes from a call to mig_deallocate, which frees a given address. At first, I thought I had found an interesting bug, since I could control the address passed to mig_deallocate:
I quickly learned, however, that Mach messages can contain various types of Out-of-line (OOL) data. This allows a client to allocate a memory region and place a pointer to it within the Mach message, which will be processed and, in some cases, freed by the message handler. When sending a Mach message with the mach_msg API, the XNU kernel will validate that the memory pointed to by OOL descriptors is properly owned and accessible by the client process.
I hadn’t found a vulnerability; my fuzzing harness was simply attached to the target at a point downstream which bypassed the normal memory checks that would have been performed by the kernel. To remedy this, I modified my fuzzing harness to support allocating space for OOL data and passing the valid memory address within the Mach messages I fuzzed.
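A minimal sketch of that change, under the assumption that the fuzzed message carries a single mach_msg_ool_descriptor_t: before calling the handler, the harness allocates a region it owns and points the descriptor at it.
#include <mach/mach.h>
static void backfill_ool_descriptor(mach_msg_ool_descriptor_t *desc, size_t size) {
    vm_address_t buffer = 0;
    if (vm_allocate(mach_task_self(), &buffer, size, VM_FLAGS_ANYWHERE) != KERN_SUCCESS)
        return;
    desc->address = (void *)buffer;        // point the descriptor at memory we actually own
    desc->size = (mach_msg_size_t)size;
    desc->deallocate = FALSE;
    desc->copy = MACH_MSG_VIRTUAL_COPY;
    desc->type = MACH_MSG_OOL_DESCRIPTOR;
}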
The Vulnerability
After many fuzzing harness iterations, lldb “next instruction” commands, and hours spent overheating my MacBook Pro, I had finally begun to acquire an understanding of the CoreAudio framework and generate some meaningful crashes.
But first, some background knowledge.
The Hardware Abstraction Layer (HAL)
The com.apple.audio.audiohald Mach service exposes an interface known as the Hardware Abstraction Layer (HAL). The HAL allows clients to interact with audio devices, plugins, and settings on the operating system, represented in the coreaudiod process as C++ objects of type HALS_Object.
In order to interact with the HAL, a client must first register itself. There are a few ways to do this, but the simplest is using the _XSystem_Open Mach API. Calling this API will invoke the HALS_System::AddClient method, which uses the Mach message’s audit token to create a client (clnt) HALS_Object to map subsequent requests to that client. The code block below shows an IDA decompilation snippet of the creation of a clnt object.
v85[0] = v5 != 0;
v28 = v83[0];
v29 = 'clnt';
HALS_Object::HALS_Object((HALS_Object *)v13, 'clnt', 0, (__int64)v83[0], v30);
*(_QWORD *)v13 = &unk_7FF850E56640;
*(_OWORD *)(v13 + 72) = 0LL;
*(_OWORD *)(v13 + 88) = 0LL;
*(_DWORD *)(v13 + 104) = 1065353216;
Stepping into the HALS_Object constructor, we can see that a mutex is acquired, the next available object ID is fetched, and then a call is made to HALS_ObjectMap::MapObject.
void __fastcall HALS_Object::HALS_Object(HALS_Object *this, _BOOL4 a2, unsigned int a3, __int64 a4, HALS_Object *a5)
{
unsigned int v5; // r12d
HALB_Mutex::Locker *v6; // r15
unsigned int v7; // ebx
HALS_Object *v8; // rdx
int v9; // eax
v5 = a3;
*(_QWORD *)this = &unk_7FF850E7C200;
*((_DWORD *)this + 2) = 0;
*((_DWORD *)this + 3) = HALB_MachPort::CreatePort(0LL, a2, a3);
*((_WORD *)this + 8) = 257;
*((_WORD *)this + 10) = 1;
pthread_once(&HALS_ObjectMap::sObjectInfoListInitialized, HALS_ObjectMap::Initialize);
v6 = HALS_ObjectMap::sObjectInfoListMutex;
HALB_Mutex::Lock(HALS_ObjectMap::sObjectInfoListMutex);
v7 = (unsigned int)HALS_ObjectMap::sNextObjectID;
LODWORD(HALS_ObjectMap::sNextObjectID) = (_DWORD)HALS_ObjectMap::sNextObjectID + 1;
HALB_Mutex::Locker::~Locker(v6);
*((_DWORD *)this + 6) = v7;
*((_DWORD *)this + 7) = a2;
if ( !v5 )
v5 = a2;
*((_DWORD *)this + 8) = v5;
if ( a4 )
v9 = *(_DWORD *)(a4 + 24);
else
v9 = 0;
*((_DWORD *)this + 9) = v9;
*((_QWORD *)this + 5) = &stru_7FF850E86420;
*((_BYTE *)this + 48) = 0;
*((_DWORD *)this + 13) = 0;
HALS_ObjectMap::MapObject((HALS_ObjectMap *)v7, (__int64)this, v8);
}
The HALS_ObjectMap::MapObject function adds the freshly allocated object to a linked list stored on the heap. I wrote a program using the TinyInst Hook API that iterates through each object in the list and dumps its raw contents:
To modify an existing HALS_Object, most of the HAL Mach message handlers use the HALS_ObjectMap::CopyObjectByObjectID function, which accepts an integer ID (parsed from the Mach message’s body), looks it up in the Object Map, and returns a pointer to the corresponding HALS_Object.
For example, here’s a small snippet of the _XSystem_GetObjectInfo Mach message handler, which calls the HALS_ObjectMap::CopyObjectByObjectID function before accessing information about the object and returning it.
HALS_Client::EvaluateSandboxAllowsMicAccess(v5);
v7 = (HALS_ObjectMap *)HALS_ObjectMap::CopyObjectByObjectID((HALS_ObjectMap *)v3);
v8 = v7;
if ( !v7 )
{
v13 = __cxa_allocate_exception(0x10uLL);
*(_QWORD *)v13 = &unk_7FF850E85518;
v13[2] = 560947818;
__cxa_throw(v13, (struct type_info *)&`typeinfo for'CAException, CAException::~CAException);
}
An Intriguing Crash
Whenever my fuzzer produced a crash, I always took the time to fully understand the crash’s root cause. Often, the crashes were not security relevant (e.g., a NULL dereference), but fully understanding the reason behind each crash helped me understand the target better and surface invalid assumptions I was making with my fuzzing harness. Eventually, when I did identify security-relevant crashes, I had a good understanding of the context surrounding them.
The first indication from my fuzzer that a vulnerability might exist was a memory access violation during an indirect call instruction, where the target address was calculated using an index into the rax register. As shown in the following backtrace, the crash occurred shallowly within the _XIOContext_Fetch_Workgroup_Port Mach message handler.
Further investigating the context of the crash in IDA, I noticed that the rax register triggering the invalid memory access was directly derived from a call to the HALS_ObjectMap::CopyObjectByObjectID function.
Specifically, it attempted the following:
- Fetch a HALS_Object from the Object Map based on an ID provided in the Mach message
- Dereference the address a1 at offset 0x68 of the HALS_Object
- Dereference the address a2 at offset 0x0 of a1
- Call the function pointer at offset 0x168 of a2
The operations leading to the crash indicated that at offset 0x68 of the HALS_Object it fetched, the code expected a pointer to an object with a vtable. The code would then look up a function within the vtable, which would presumably retrieve the object’s “workgroup port.”
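In pseudocode, the access pattern that crashed looks roughly like this (offsets taken from the analysis above; variable names are illustrative):
HALS_Object *obj = HALS_ObjectMap::CopyObjectByObjectID(object_id); // ID taken from the Mach message
void *a1 = *(void **)((char *)obj + 0x68);  // assumed to be an IOContext-specific object pointer
void *a2 = *(void **)a1;                    // assumed to be that object's vtable pointer
void (*fn)(void) = *(void (**)(void))((char *)a2 + 0x168);
fn();                                       // indirect call through attacker-influenced data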
When the fetched object was of type ioct (IOContext), everything functioned as normal. However, the test input my fuzzer generated was causing the function to fetch a HALS_Object of a different type, which led to an invalid function call. The following diagram shows how an attacker able to influence the pointer at offset 0x68 of a HALS_Object might hijack control flow.
This vulnerability class is referred to as a type confusion: the vulnerable code assumes that a retrieved object or struct is of a specific type, but it is possible to provide a different one. The object’s memory layout might be completely different, meaning memory accesses and vtable lookups might occur in the wrong place, or even out of bounds. Type confusion vulnerabilities can be extremely powerful because they often allow the construction of reliable exploits.
Affected Functions
The _XIOContext_Fetch_Workgroup_Port Mach message handler wasn’t the only function that assumed it was dealing with an ioct object without checking the type. The table below shows several other message handlers that suffered from the same issue:
Mach Message Handler                 | Affected Routine
_XIOContext_Fetch_Workgroup_Port     | _XIOContext_Fetch_Workgroup_Port
_XIOContext_Start                    | ___ZNK14HALS_IOContext22HasEnabledInputStreamsEv_block_invoke
_XIOContext_StartAtTime              | ___ZNK14HALS_IOContext16GetNumberStreamsEb_block_invoke
_XIOContext_Start_With_WorkInterval  | ___ZNK14HALS_IOContext22HasEnabledInputStreamsEv_block_invoke
_XIOContext_SetClientControlPort     | _XIOContext_SetClientControlPort
_XIOContext_Stop                     | _XIOContext_Stop
Apple did perform proper type checking on some of the Mach message handlers. For example, the _XIOContext_PauseIO message handler, shown below, calls a function that checks whether the fetched object is of type ioct before using it. It is not clear why these checks were implemented in certain areas, but not others.
The impact of this vulnerability can range from an information leak to control flow hijacking. In this case, since the vulnerable code is performing a function call, an attacker could potentially control the data at the offset read during the type confusion, allowing them to control the function pointer and redirect execution. Alternatively, if the attacker can provide an object smaller than 0x68 bytes, an out-of-bounds read would be possible, paving the way for further exploitation opportunities such as memory corruption or arbitrary code execution.
Creating a Proof of Concept
Because my fuzzing harness was connected downstream in the Mach message handling process, it was important to build an end-to-end proof of concept that used the mach_msg API to send a Mach message to the vulnerable message handler within coreaudiod. Otherwise, we might have triggered a false positive, as we did in the case of the mig_deallocate crash, where we thought we had a bug but were actually just bypassing security checks.
In this case, however, the bug was triggerable using the mach_msg API, making it a legitimate opportunity for use as a sandbox escape. The proof-of-concept code I put together for triggering this issue on MacOS Sequoia 15.0.1 can be found here.
It’s worth noting that code running on Apple Silicon uses Pointer Authentication Codes (PACs), which could make exploitation more difficult. In order to exploit this bug through an invalid vtable call, an attacker would need the ability to sign pointers, which would be possible if the attacker gained native code execution in an Apple-signed process. However, I only analyzed and tested this issue on x86-64 versions of MacOS.
How Apple Fixed the Issue
I reported this type confusion vulnerability to Apple on October 9, 2024. It was fixed on December 11, 2024, assigned CVE-2024-54529, and a patch was introduced in MacOS Sequoia 15.2, Sonoma 14.7.2, and Ventura 13.7.2. Interestingly, Apple mentions that the vulnerability allowed for code execution with kernel privileges. That part surprised me, since as far as I could tell, execution was only possible as the _coreaudiod group, which is not equivalent to kernel privileges.
Apple’s fix was simple: since each HALS Object contains information about its type, the patch adds a check within the affected functions to ensure the fetched object is of type ioct before dereferencing the object and performing a function call.
You might have noticed that the offset dereferenced within the HALS Object is 0x70 in the updated version, but was 0x68 in the vulnerable version. Often, such struct layout changes are not security relevant, but simply reflect other bug fixes or added features.
Recommendations
To prevent similar type confusion vulnerabilities in the future, Apple should consider modifying the CopyObjectByObjectID function (or any others that make assumptions about an object’s type) to include a type check. This could be achieved by passing the expected object type as an argument and verifying the type of the fetched object before returning it. This approach is similar to how deserialization functions often include a template parameter to ensure type safety.
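A hedged sketch of what such a type-checked lookup could look like (the method names, reference semantics, and type-tag accessor are hypothetical; the real HALS_Object interface is Apple-internal):
HALS_Object *CopyObjectByObjectIDAndType(uint32_t object_id, uint32_t expected_type) {
    HALS_Object *object = HALS_ObjectMap::CopyObjectByObjectID(object_id);
    if (object != nullptr && object->GetObjectType() != expected_type) {
        object->Release();  // wrong type: drop the reference and fail...
        return nullptr;     // ...instead of handing back a confusable object
    }
    return object;
}
// A caller such as _XIOContext_Fetch_Workgroup_Port would then request the 'ioct' type explicitly.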
Conclusion
This blog post described my journey into the world of MacOS vulnerability research and fuzzing. I hope I have shown how a knowledge-driven fuzzing approach can allow rapid prototyping and iteration, a deep understanding of the target, and high impact bugs.
In my next post, I will perform a detailed walkthrough of my experience attempting to exploit CVE-2024-54529.
The Windows Registry Adventure #6: Kernel-mode objects
Posted by Mateusz Jurczyk, Google Project Zero
Welcome back to the Windows Registry Adventure! In the previous installment of the series, we took a deep look into the internals of the regf hive format. Understanding this foundational aspect of the registry is crucial, as it illuminates the design principles behind the mechanism, as well as its inherent strengths and weaknesses. The data stored within the regf file represents the definitive state of the hive. Knowing how to parse this data is sufficient for handling static files encoded in this format, such as when writing a custom regf parser to inspect hives extracted from a hard drive. However, for those interested in how regf files are managed by Windows at runtime, rather than just their behavior in isolation, there's a whole other dimension to explore: the multitude of kernel-mode objects allocated and maintained throughout the lifecycle of an active hive. These auxiliary objects are essential for several reasons:
- To track all currently loaded hives, their properties (e.g., load flags), their memory mappings, and the relationships between them (especially for delta hives overlaid on top of each other).
- To synchronize access to keys and hives within the multithreaded Windows environment.
- To cache hive information for faster access compared to direct memory mapping lookups.
- To integrate the registry with the NT Object Manager and support standard operations (opening/closing handles, setting/querying security descriptors, enforcing access checks, etc.).
- To manage the state of pending transactions before they are fully committed to the underlying hive.
To address these diverse requirements, the Windows kernel employs numerous interconnected structures. In this post, we will examine some of the most critical ones, how they function, and how they can be effectively enumerated and inspected using WinDbg. It's important to note that Microsoft provides official definitions only for some registry-related structures through PDB symbols for ntoskrnl.exe. In many cases, I had to reverse-engineer the relevant code to recover structure layouts, as well as infer the types and names of particular fields and enums. Throughout this write-up, I will clearly indicate whether each structure definition is official or reverse-engineered. If you spot any inaccuracies, please let me know. The definitions presented here are primarily derived from Windows Server 2019 with the March 2022 patches (kernel build 10.0.17763.2686), which was the kernel version used for the majority of my registry code analysis. However, over 99% of registry structure definitions appear to be identical between this version and the latest Windows 11, making the information directly applicable to the latest systems as well.
Hive structures
Given that hives are the most intricate type of registry object, it's not surprising that their kernel-mode descriptors are equally complex and lengthy. The primary hive descriptor structure in Windows, known as _CMHIVE, spans a substantial 0x12F8 bytes – exceeding 4 KiB, the standard memory page size on x86-family architectures. Contained within _CMHIVE, at offset 0, is another structure of type _HHIVE, which occupies 0x600 bytes, as depicted in the diagram below:
This relationship mirrors that of other common Windows object pairs, such as _EPROCESS / _KPROCESS and _ETHREAD / _KTHREAD. Because _HHIVE is always allocated as a component of the larger _CMHIVE structure, their pointer types are effectively interchangeable. If you encounter a decompiled access using a _HHIVE* pointer that extends beyond the size of the structure, it almost certainly indicates a reference to a field within the encompassing _CMHIVE object.
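A minimal sketch of this nesting, using the sizes quoted above (all fields other than the embedded _HHIVE are omitted):
// _HHIVE occupies the first 0x600 bytes of every _CMHIVE allocation.
typedef struct _HHIVE {
    unsigned int Signature;
    /* ... remaining fields, 0x600 bytes in total ... */
} HHIVE, *PHHIVE;
typedef struct _CMHIVE {
    HHIVE Hive;                     // offset 0
    /* ... remaining fields, 0x12F8 bytes in total ... */
} CMHIVE, *PCMHIVE;
// Because the _HHIVE sits at offset 0, a _HHIVE* can be trivially "upcast":
static inline PCMHIVE CmHiveFromHHive(PHHIVE Hive) {
    return (PCMHIVE)Hive;           // equivalent to CONTAINING_RECORD(Hive, CMHIVE, Hive)
}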
But why are two distinct structures dedicated to representing a single registry hive? While technically not required, this separation likely serves to delineate fields associated with different abstraction layers of the hive. Specifically:
- _HHIVE manages the low-level aspects of the hive, including the hive header, bins, and cells, as well as in-memory mappings and synchronization state with its on-disk counterpart (e.g., dirty sectors).
- _CMHIVE handles more abstract information about the hive, such as the cache of security descriptors, pointers to high-level kernel objects like the root Key Control Block (KCB), and the associated transaction resource manager (_CM_RM structure).
The next subsections will provide a deeper look into the responsibilities and inner workings of these two structures.
_HHIVE structure overview
The primary role of the _HHIVE structure is to manage the memory-related state of a hive. This allows higher-level registry code to perform operations such as allocating, freeing, and marking cells as "dirty" without needing to handle the low-level implementation details. The _HHIVE structure comprises 49 top-level members, most of which will be described in larger groups below:
0: kd> dt _HHIVE
nt!_HHIVE
+0x000 Signature : Uint4B
+0x008 GetCellRoutine : Ptr64 _CELL_DATA*
+0x010 ReleaseCellRoutine : Ptr64 void
+0x018 Allocate : Ptr64 void*
+0x020 Free : Ptr64 void
+0x028 FileWrite : Ptr64 long
+0x030 FileRead : Ptr64 long
+0x038 HiveLoadFailure : Ptr64 Void
+0x040 BaseBlock : Ptr64 _HBASE_BLOCK
+0x048 FlusherLock : _CMSI_RW_LOCK
+0x050 WriterLock : _CMSI_RW_LOCK
+0x058 DirtyVector : _RTL_BITMAP
+0x068 DirtyCount : Uint4B
+0x06c DirtyAlloc : Uint4B
+0x070 UnreconciledVector : _RTL_BITMAP
+0x080 UnreconciledCount : Uint4B
+0x084 BaseBlockAlloc : Uint4B
+0x088 Cluster : Uint4B
+0x08c Flat : Pos 0, 1 Bit
+0x08c ReadOnly : Pos 1, 1 Bit
+0x08c Reserved : Pos 2, 6 Bits
+0x08d DirtyFlag : UChar
+0x090 HvBinHeadersUse : Uint4B
+0x094 HvFreeCellsUse : Uint4B
+0x098 HvUsedCellsUse : Uint4B
+0x09c CmUsedCellsUse : Uint4B
+0x0a0 HiveFlags : Uint4B
+0x0a4 CurrentLog : Uint4B
+0x0a8 CurrentLogSequence : Uint4B
+0x0ac CurrentLogMinimumSequence : Uint4B
+0x0b0 CurrentLogOffset : Uint4B
+0x0b4 MinimumLogSequence : Uint4B
+0x0b8 LogFileSizeCap : Uint4B
+0x0bc LogDataPresent : [2] UChar
+0x0be PrimaryFileValid : UChar
+0x0bf BaseBlockDirty : UChar
+0x0c0 LastLogSwapTime : _LARGE_INTEGER
+0x0c8 FirstLogFile : Pos 0, 3 Bits
+0x0c8 SecondLogFile : Pos 3, 3 Bits
+0x0c8 HeaderRecovered : Pos 6, 1 Bit
+0x0c8 LegacyRecoveryIndicated : Pos 7, 1 Bit
+0x0c8 RecoveryInformationReserved : Pos 8, 8 Bits
+0x0c8 RecoveryInformation : Uint2B
+0x0ca LogEntriesRecovered : [2] UChar
+0x0cc RefreshCount : Uint4B
+0x0d0 StorageTypeCount : Uint4B
+0x0d4 Version : Uint4B
+0x0d8 ViewMap : _HVP_VIEW_MAP
+0x110 Storage : [2] _DUAL
Signature
Equal to 0xBEE0BEE0, it is a unique signature of the _HHIVE / _CMHIVE structures. It may be useful in digital forensics for identifying these structures in raw memory dumps, and is yet another reference to bees in the Windows registry implementation.
Function pointers
Next up, there are six function pointers, initialized in HvHiveStartFileBacked and HvHiveStartMemoryBacked, and pointing at internal kernel handlers for the following operations:
Pointer name        | Pointer value                              | Operation
GetCellRoutine      | HvpGetCellPaged or HvpGetCellFlat          | Translate cell index to virtual address
ReleaseCellRoutine  | HvpReleaseCellPaged or HvpReleaseCellFlat  | Release previously translated cell index
Allocate            | CmpAllocate                                | Allocate kernel memory within global registry quota
Free                | CmpFree                                    | Free kernel memory within global registry quota
FileWrite           | CmpFileWrite                               | Write data to hive file
FileRead            | CmpFileRead                                | Read data from hive file
As we can see, these functions provide the basic functionality of operating on kernel memory, cell indexes, and the hive file. In my opinion, the most important of them is GetCellRoutine, whose typical destination, HvpGetCellPaged, performs the cell map walk in order to translate a cell index into the corresponding address within the hive mapping.
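Conceptually, a cell access through this pointer looks something like the snippet below (a sketch assuming the commonly documented prototype taking the hive pointer and a cell index; names are illustrative):
// Translate a cell index to a virtual address, use the cell, then release the translation.
PCELL_DATA CellData = Hive->GetCellRoutine(Hive, CellIndex);
if (CellData != NULL) {
    /* ... read or modify the cell's contents ... */
    Hive->ReleaseCellRoutine(Hive, CellIndex);
}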
It is natural to think that these function pointers could prove useful for exploitation if an attacker managed to corrupt them through a buffer overflow or a use-after-free condition. That was indeed the case in Windows 10 and earlier, but in Windows 11, these calls are now de-virtualized, and most call sites reference one of HvpGetCellPaged / HvpGetCellFlat and HvpReleaseCellPaged / HvpReleaseCellFlat directly, without referring to the pointers. This is great for security, as it completely eliminates the usefulness of those fields in any offensive scenarios.
Here's an example of a GetCellRoutine call in Windows 10, disassembled in IDA Pro:
And the same call in Windows 11:
Hive load failure information
This is a pointer to a public _HIVE_LOAD_FAILURE structure, which is passed as the first argument to the SetFailureLocation function every time an error occurs while loading a hive. It can be helpful in tracking which validity checks have failed for a given hive, without having to trace the entire loading process.
Base block
A pointer to a copy of the hive header, represented by the _HBASE_BLOCK structure.
Synchronization locks
There are two locks with the following purpose:
- FlusherLock – synchronizes access to the hive between clients changing data inside cells and the flusher thread;
- WriterLock – synchronizes access to the hive between writers that modify the bin/cell layout.
They are officially of type _CMSI_RW_LOCK, but they boil down to _EX_PUSH_LOCK, and they are used with standard kernel APIs such as ExAcquirePushLockSharedEx.
Dirty blocks information
Between offsets 0x58 and 0x84, _HHIVE stores several data structures representing the state of synchronization between the in-memory and on-disk instances of the hive.
Hive flags
First of all, there are two flags at offset 0x8C that indicate if the hive mapping is flat and if the hive is read-only. Secondly, there is a 32-bit HiveFlags member that stores further flags which aren't (as far as I know) included in any public Windows symbols. I have managed to reverse-engineer and infer the meaning of the constants I have observed, resulting in the following enum:
enum _HV_HIVE_FLAGS
{
HIVE_VOLATILE = 0x1,
HIVE_NOLAZYFLUSH = 0x2,
HIVE_PRELOADED = 0x10,
HIVE_IS_UNLOADING = 0x20,
HIVE_COMPLETE_UNLOAD_STARTED = 0x40,
HIVE_ALL_REFS_DROPPED = 0x80,
HIVE_ON_PRELOADED_LIST = 0x400,
HIVE_FILE_READ_ONLY = 0x8000,
HIVE_SECTION_BACKED = 0x20000,
HIVE_DIFFERENCING = 0x80000,
HIVE_IMMUTABLE = 0x100000,
HIVE_FILE_PAGES_MUST_BE_KEPT_LOCAL = 0x800000,
};
Below is a one-liner explanation of each flag:
- HIVE_VOLATILE: the hive exists in memory only; set, e.g., for \Registry and \Registry\Machine\HARDWARE.
- HIVE_NOLAZYFLUSH: changes to the hive aren't automatically flushed to disk and require a manual flush; set, e.g., for \Registry\Machine\SAM.
- HIVE_PRELOADED: the hive is one of the default, system ones; set, e.g., for \Registry\Machine\SOFTWARE, \Registry\Machine\SYSTEM, etc.
- HIVE_IS_UNLOADING: the hive is currently being loaded or unloaded in another thread and shouldn't be accessed before the operation is complete.
- HIVE_COMPLETE_UNLOAD_STARTED: the unloading process of the hive has started in CmpCompleteUnloadKey.
- HIVE_ALL_REFS_DROPPED: all references to the hive through KCBs have been dropped.
- HIVE_ON_PRELOADED_LIST: the hive is linked into a linked-list via the PreloadedHiveList field.
- HIVE_FILE_READ_ONLY: the underlying hive file is read-only and shouldn't be modified; indicates that the hive was loaded with the REG_OPEN_READ_ONLY flag set.
- HIVE_SECTION_BACKED: the hive is mapped in memory using section views.
- HIVE_DIFFERENCING: the hive is a differencing one (version 1.6, loaded under \Registry\WC).
- HIVE_IMMUTABLE: the hive is immutable and cannot be modified; indicates that it was loaded with the REG_IMMUTABLE flag set.
- HIVE_FILE_PAGES_MUST_BE_KEPT_LOCAL: the kernel always maintains a local copy of every page of the hive, either by locking it in physical memory or creating a private copy through the CoW mechanism.
Between offsets 0xA4 and 0xCC, there are a number of fields having to do with log file management, i.e. the .LOG1/.LOG2 files accompanying the main hive file on disk.
Hive version
The Version field stores the minor version of the hive, which should theoretically be an integer between 3 and 6. However, as mentioned in the previous blog post, it is possible to set it to an arbitrary 32-bit value either by specifying a major version equal to 0 and any desired minor version, or by enticing the kernel to recover the hive header from a log file, abusing the fact that the HvAnalyzeLogFiles function is more permissive than HvpGetHiveHeader. Nevertheless, I haven't found any security implications of this behavior.
View map
The view map holds all the essential information about how the hive is mapped in memory. The specific implementation of registry memory management has evolved considerably over the years, with its details changing between consecutive system versions. In the latest ones, the view map is represented by the top-level _HVP_VIEW_MAP public structure:
0: kd> dt _HVP_VIEW_MAP
nt!_HVP_VIEW_MAP
+0x000 SectionReference : Ptr64 Void
+0x008 StorageEndFileOffset : Int8B
+0x010 SectionEndFileOffset : Int8B
+0x018 ProcessTuple : Ptr64 _CMSI_PROCESS_TUPLE
+0x020 Flags : Uint4B
+0x028 ViewTree : _RTL_RB_TREE
The semantics of its respective fields are as follows:
- SectionReference: Contains a kernel-mode handle to a section object corresponding to the hive file, created via ZwCreateSection in CmSiCreateSectionForFile.
- StorageEndFileOffset: Stores the maximum size of the hive that can be represented with file-backed sections at any given time. Initially set to the size of the loaded hive, it can dynamically increase or decrease at runtime for mutable (normal) hives.
- SectionEndFileOffset: Represents the size of the hive file section at the time of loading. It is never modified past the first initialization in HvpViewMapStart, and seems to be mostly used as a safeguard against extending an immutable hive file beyond its original size.
- ProcessTuple: A structure of type _CMSI_PROCESS_TUPLE, it identifies the host process of the hive's section views. This field currently always points to the global CmpRegistryProcess object, which corresponds to the dedicated "Registry" process that hosts all hive mappings in the system. However, this field could enable a more fine-grained separation of hive mappings across multiple processes, should Microsoft choose to implement such a feature.
- Flags: Represents a set of memory management flags relevant to the entire hive. These flags are not publicly documented; however, through reverse engineering, I have determined their purpose to be as follows:
- VIEW_MAP_HIVE_FILE_IMMUTABLE (0x1): Indicates that the hive has been loaded as immutable, meaning no data is ever saved back to the underlying hive file.
- VIEW_MAP_MUST_BE_KEPT_LOCAL (0x2): Indicates that all of the hive data must be persistently stored in memory, and not just accessible through file-backed sections. This is likely to protect against double-fetch conditions involving hives loaded from remote network shares.
- VIEW_MAP_CONTAINS_LOCKED_PAGES (0x4): Indicates that some of the hive's pages are currently locked in physical memory using ZwLockVirtualMemory.
- ViewTree: This is the root of a view tree structure, which contains the descriptors of each continuous section view mapped in memory.
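For convenience, here are the same flags expressed as an enum, analogous to the _HV_HIVE_FLAGS definition earlier (the names are my reverse-engineered ones, with values as listed above):
enum _HVP_VIEW_MAP_FLAGS
{
  VIEW_MAP_HIVE_FILE_IMMUTABLE   = 0x1,
  VIEW_MAP_MUST_BE_KEPT_LOCAL    = 0x2,
  VIEW_MAP_CONTAINS_LOCKED_PAGES = 0x4,
};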
Overall, the implementation of low-level hive memory management in Windows is more complex than might initially seem necessary. This complexity arises from the kernel's need to gracefully handle a variety of corner cases and interactions. For example, hives may be loaded as immutable, which indicates that the hive may be operated on in memory, but changes must not be flushed to disk. Simultaneously, the system must support recovering data from .LOG files, including the possibility of extending the hive beyond its original on-disk length. At runtime, it must also be possible to efficiently modify the registry data, as well as shrink and extend it on demand. To further complicate matters, Windows enforces different rules for locking hive pages in memory depending on the backing volume of the file, carefully balancing optimal memory usage and system security guarantees. These and many other factors collectively contribute to the complexity of hive memory management.
To better understand how the view tree is organized, let's first analyze the general logic of the hive mapping code.
The hive mapping logic
The main kernel function responsible for mapping a hive in memory is HvLoadHive. It implements the overall logic and coordinates various sub-routines responsible for performing more specialized tasks, in the following order:
- Header Validation: The kernel reads and inspects the hive's header to ascertain its integrity, ensuring that the hive has not been tampered with or corrupted. Relevant function: HvpGetHiveHeader.
- Log Analysis: The kernel processes the hive's transaction logs, scrutinising them to identify any pending changes or inconsistencies that necessitate recovery procedures. Relevant function: HvAnalyzeLogFiles.
- Initial Section Mapping: A section object is created based on the hive file, and further segmented into multiple views, each aligned to 4 KiB boundaries and capped at 2 MiB. At this point, the kernel prioritizes the creation of an initial mapping without focusing on the granular layout of individual bins within the hive. Relevant function: HvpViewMapStart.
- Cell Map Initialization: The cell map, a component that translates cell indexes to memory addresses, is initialized. Its entries are configured to point to the newly created views. Relevant function: HvpMapHiveImageFromViewMap.
- Log Recovery (if required): If the preceding log analysis reveals the need for data recovery, the kernel attempts to restore data integrity. This is the earliest point at which the newly created memory mappings may already be modified and marked as "dirty", indicating that their contents have been altered and require synchronisation with the on-disk representation. Relevant function: HvpPerformLogFileRecovery.
- Bin Mapping: In this final stage, the kernel establishes definitive memory mappings for each bin within the hive, ensuring that each bin occupies a contiguous region of memory. This process may necessitate creating new views, eliminating existing ones, or adjusting their boundaries to accommodate the specific arrangement of bins. Relevant function: HvpRemapAndEnlistHiveBins.
Now that we understand the primary components of the loading process, we can examine the internal structure of the section view tree in more detail.
The view tree
Let's consider an example hive consisting of three bins of sizes 256 KiB, 2 MiB and 128 KiB, respectively. After step 3 ("Initial Section Mapping"), the section views created by the kernel are as follows:
As we can see, at this point, the kernel doesn't concern itself with bin boundaries or continuity: all it needs to achieve is to make every page of the hive accessible through a section view for log recovery purposes. In simple terms, the way that HvpViewMapStart (or more specifically, HvpViewMapCreateViewsForRegion) works is that it creates as many 2 MiB views as necessary, followed by one last view that covers the remaining part of the file. So in our example, we have the first view that covers bin 1 and the beginning of bin 2, and the second view that covers the trailing part of bin 2 and the entire bin 3. It's important to note that memory continuity is only guaranteed within the scope of a single view, and views 1 and 2 may be mapped at completely different locations in the virtual address space.
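To make this splitting rule more concrete, here is a minimal, self-contained C sketch (my own illustration, not kernel code) that reproduces the initial view layout for the example hive used above:

/*
 * Sketch of the view-carving behavior attributed to
 * HvpViewMapCreateViewsForRegion: full 2 MiB views first, then one smaller
 * view covering whatever remains of the hive data.
 */
#include <stdint.h>
#include <stdio.h>

#define VIEW_MAX_SIZE (2ULL * 1024 * 1024)  /* 2 MiB cap per view */

static void PrintInitialViews(uint64_t HiveDataSize) {
  uint64_t Offset = 0;
  int View = 1;

  while (Offset < HiveDataSize) {
    uint64_t Length = HiveDataSize - Offset;
    if (Length > VIEW_MAX_SIZE) {
      Length = VIEW_MAX_SIZE;
    }
    printf("View %d: file offsets [0x%llx, 0x%llx)\n", View++,
           (unsigned long long)Offset, (unsigned long long)(Offset + Length));
    Offset += Length;
  }
}

int main(void) {
  /* Bins of 256 KiB, 2 MiB and 128 KiB, as in the example above. */
  PrintInitialViews((256 + 2048 + 128) * 1024ULL);
  return 0;
}

Running it prints exactly two views: a full 2 MiB one and a trailing 384 KiB one, matching the layout described above.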
Later in step 6, the system ensures that every bin is mapped as a contiguous block of memory before handing off the hive to the client. This is done by iterating through all the bins, and for every bin that spans more than one view in the current view map, the following operations are performed:
- If the start and/or the end of the bin fall into the middle of existing views, these views are truncated from either side. Furthermore, if there are any views that are fully covered by the bin, they are freed and removed from the tree.
- A new, dedicated section view is created for the bin and inserted into the view tree.
In our hypothetical scenario, the resulting view layout would be as follows:
As we can see, the kernel shrinks views 1 and 2, and creates a new view 3 corresponding to bin 2 to fill the gap. The final layout of the binary tree of section view descriptors is illustrated below:
Knowing this, we can finally examine the structure of a single view tree entry. It is not included in the public symbols, but I named it _HVP_VIEW. My reverse-engineered version of its definition is as follows:
struct _HVP_VIEW
{
RTL_BALANCED_NODE Node;
LARGE_INTEGER ViewStartOffset;
LARGE_INTEGER ViewEndOffset;
SSIZE_T ValidStartOffset;
SSIZE_T ValidEndOffset;
PBYTE MappingAddress;
SIZE_T LockedPageCount;
_HVP_VIEW_PAGE_FLAGS PageFlags[];
};
The role of each particular field is documented below:
- Node: This is the structure used to link all of the entries into a single red-black tree, passed to helper kernel functions such as RtlRbInsertNodeEx and RtlRbRemoveNode.
- ViewStartOffset and ViewEndOffset: This offset pair specifies the overall byte range covered by the underlying section view object in the hive file. Their difference corresponds to the cumulative length of the red and green boxes in a single row in the diagrams above.
- ValidStartOffset and ValidEndOffset: This offset pair specifies the valid range of the hive accessible through this view, i.e. the green rectangles in the diagrams. It must always be a subset of the [ViewStartOffset, ViewEndOffset] range, and may dynamically change while re-mapping bins (as just shown in this section), as well as when shrinking and extending the hive.
- MappingAddress: This is the base address of the section view mapping in memory, as returned by ZwMapViewOfSection. It is valid in the context of the process specified by _HVP_VIEW_MAP.ProcessTuple (currently always the "Registry" process). It covers the entire range between [ViewStartOffset, ViewEndOffset], but only pages between [ValidStartOffset, ValidEndOffset] are accessible, and the rest of the section view is marked as PAGE_NOACCESS.
- LockedPageCount: Specifies the number of pages locked in virtual memory using ZwLockVirtualMemory within this view.
- PageFlags: A variable-length array that specifies a set of flags for each memory page in the [ViewStartOffset, ViewEndOffset] range.
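As a small illustration of how this array might be consumed, the sketch below assumes (my assumption, not something confirmed by symbols) that PageFlags is indexed by the 4 KiB page number relative to ViewStartOffset; the stand-in types and helper name are hypothetical:

#include <stdint.h>

#define HIVE_PAGE_SHIFT 12  /* 4 KiB pages */

/* Simplified stand-ins for the reverse-engineered structures above. */
typedef uint8_t HVP_VIEW_PAGE_FLAGS_SKETCH;

typedef struct {
  int64_t ViewStartOffset;
  int64_t ViewEndOffset;
  HVP_VIEW_PAGE_FLAGS_SKETCH PageFlags[1];  /* variable-length in reality */
} HVP_VIEW_SKETCH;

/* Caller must ensure ViewStartOffset <= FileOffset < ViewEndOffset. */
static HVP_VIEW_PAGE_FLAGS_SKETCH
GetPageFlags(const HVP_VIEW_SKETCH *View, int64_t FileOffset) {
  return View->PageFlags[(FileOffset - View->ViewStartOffset) >> HIVE_PAGE_SHIFT];
}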
I haven't found any (un)official sources documenting the set of supported page flags, so below is my attempt to name them and explain their meaning:
- VIEW_PAGE_VALID (0x1): Indicates if the page is valid – true for pages between [ValidStartOffset, ValidEndOffset], false otherwise. If this flag is clear, all other flags are irrelevant/unused.
The flag is set:
- When creating section views during hive loading, first the initial ones in HvpViewMapStart, and then the bin-specific ones in HvpRemapAndEnlistHiveBins.
- When extending an active hive in HvpViewMapExtendStorage.
The flag is cleared:
- When trimming the existing views in HvpRemapAndEnlistHiveBins to make room for new ones.
- When shrinking the hive in HvpViewMapShrinkStorage.
- VIEW_PAGE_COW_BY_CALLER (0x2): Indicates if the kernel maintains a copy of the page through the copy-on-write (CoW) mechanism, as initiated by a client action, e.g. a registry operation that modified data in a cell and thus resulted in marking the page as dirty.
The flag is set:
- When dirtying a hive cell, in HvpViewMapMakeViewRangeCOWByCaller.
The flag is cleared:
- When flushing the registry changes to disk, in HvpViewMapMakeViewRangeUnCOWByCaller.
- VIEW_PAGE_COW_BY_POLICY (0x4): Indicates if the kernel maintains a copy of the page through the copy-on-write (CoW) mechanism, as required by the policy that all pages of non-local hives (hives loaded from volumes other than the system volume) must always remain in memory.
The flag is set:
- In HvpViewMapMakeViewRangeValid, as an alternative way of keeping a local copy of the hive pages in memory (if locking fails, or the caller doesn't want the pages locked).
- In HvpViewMapMakeViewRangeCOWByCaller, when converting previously locked pages to the "CoW by policy" state.
- In HvpMappedViewConvertRegionFromLockedToCOWByPolicy, when lazily converting previously locked pages to the "CoW by policy" state in a thread that runs every 60 seconds (as indicated by CmpLazyLocalizeIntervalInSeconds).
The flag is cleared:
- In HvpViewMapMakeViewRangeUnCOWByPolicy, which currently only ever seems to happen for hives loaded from the system volume, i.e. "\SystemRoot" and "\OSDataRoot", as listed in the global CmpWellKnownVolumeList array.
- VIEW_PAGE_WRITABLE (0x8): Indicates if the page is currently marked as writable, typically as a result of a modifying operation on the page that hasn't yet been flushed to disk.
The flag is set:
- In HvpViewMapMakeViewRangeCOWByCaller, when marking a cell as dirty.
The flag is cleared:
- In HvpViewMapMakeViewRangeUnCOWByCaller, when flushing the hive changes to disk.
- In HvpViewMapSealRange, when setting the memory as read-only for miscellaneous reasons (after performing log file recovery, etc.).
- VIEW_PAGE_LOCKED (0x10): Indicates if the page is currently locked in physical memory.
The flag is set:
- In HvpViewMapMakeViewRangeValid if the caller requests page locking, and there is enough space left in the 64 MiB working set of the Registry process. In practice, this boils down to locking the initial 2 MiB hive mappings created in HvpViewMapStart for all app hives and for normal hives outside of the system disk volume.
The flag is cleared:
- Whenever the state of the page changes to CoW-by-policy or Invalid, in one of the following functions: HvpViewMapMakeViewRangeCOWByCaller, HvpMappedViewConvertRegionFromLockedToCOWByPolicy, HvpViewMapMakeViewRangeUnCOWByPolicy, HvpViewMapMakeViewRangeInvalid.
The semantics of most of the flags are straightforward, but perhaps VIEW_PAGE_COW_BY_POLICY and VIEW_PAGE_LOCKED warrant a slightly longer explanation. The two flags are mutually exclusive, and they represent nearly identical ways to achieve the same goal: ensure that a copy of each hive page remains resident in memory or a pagefile. Under normal circumstances, the kernel could simply create the necessary section views in their default form, and let the memory management subsystem decide how to handle their pages most efficiently. However, one of the guarantees of the registry is that once a hive has been loaded, it must remain operational for as long as it is active in the system. On the other hand, section views have the property that (parts of) their underlying data may be completely evicted by the kernel, and later re-read from the original storage medium such as the hard drive. So, it is possible to imagine a situation where:
- A hive is loaded from a removable drive (e.g. a CD-ROM or flash drive) or a network share,
- Due to high memory pressure from other applications, some of the hive pages are evicted from memory,
- The removable drive with the hive file is ejected from the system,
- A client subsequently tries to operate on the hive, but parts of it are unavailable and cannot be fetched again from the original source.
This could cause some significant problems and make the registry code fail in unexpected ways. It would also constitute a security vulnerability: the kernel assumes that once it has opened and sanitized the hive file, its contents remain consistent for as long as the hive is used. This is achieved by opening the file with exclusive access, but if the hive data was ever re-read by the Windows memory manager, a malicious removable drive or an attacker-controlled network share could ignore the exclusivity request and provide different, invalid data on the second read. This would result in a kind of "double fetch" condition and potentially lead to kernel memory corruption.
To address both the reliability and security concerns, Windows makes sure to never evict pages corresponding to hives for which exclusive access cannot be guaranteed. This covers hives loaded from a location other than the system volume, and since Windows 10 19H1, also all app hives regardless of the file location. The first way to achieve this is by locking the pages directly in physical memory with a ZwLockVirtualMemory call. It is used for the initial ≤ 2 MiB section views created while loading a hive, up to the working set limit of the Registry process currently set at 64 MiB. The second way is by taking advantage of the copy-on-write mechanism – that is, marking the relevant pages as PAGE_WRITECOPY and subsequently touching each of them using the HvpViewMapTouchPages helper function. This causes the memory manager to create a private copy of each memory page containing the same data as the original, thus preventing them from ever being unavailable for registry operations.
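The "touching" technique itself is simple enough to illustrate. The following hedged sketch shows the concept in user-mode terms (it is not the actual HvpViewMapTouchPages code): once a range has been re-protected as PAGE_WRITECOPY, writing each page's own contents back to itself is enough to force the memory manager to materialize a private copy of that page.

#include <stddef.h>
#include <stdint.h>

#define HIVE_PAGE_SIZE 0x1000

static void TouchPagesToForceCow(volatile uint8_t *Base, size_t Length) {
  for (size_t Offset = 0; Offset < Length; Offset += HIVE_PAGE_SIZE) {
    /* The contents are left unchanged, but the write triggers the
       copy-on-write fault and detaches the page from the backing file. */
    Base[Offset] = Base[Offset];
  }
}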
Between the two types of resident pages, the CoW type effectively becomes the default option in the long term. Eventually most pages converge to this state, even if they initially start as locked. This is because locked pages transition to CoW on multiple occasions, e.g. when converted by the background CmpDoLocalizeNextHive thread that runs every 60 seconds, or during the modification of a cell. On the other hand, once a page transitions to the CoW state, it never reverts to being locked. A diagram illustrating the transitions between the page residence states in a hive loaded from removable/remote storage is shown below:
For normal hives loaded from the system volume (i.e. without the VIEW_MAP_MUST_BE_KEPT_LOCAL flag set), the state machine is much simpler:
As a side note, CVE-2024-43452 was an interesting bug that exploited a flaw in the page residency protection logic. The bug arose because some data wasn't guaranteed to be resident in memory and could be fetched twice from a remote SMB share during bin mapping. This occurred early in the hive loading process, before page residency protections were fully in place. The kernel trusted the data from the second read without re-validation, allowing it to be maliciously set to invalid values, resulting in kernel memory corruption.
Cell maps
As discussed in Part 5, almost every cell contains references to other cells in the hive in the form of cell indexes. Consequently, virtually every registry operation involves multiple rounds of translating cell indexes into their corresponding virtual addresses in order to traverse the registry structure. Section views are stored in a red-black tree, so the search complexity is O(log n). This may seem decent, but considering that on a typical system the registry is read much more often than it is extended or shrunk, it makes sense to further optimize the search operation at the cost of less efficient insertion/deletion. And this is exactly what cell maps are: a way to achieve O(1) search complexity in exchange for a slower insertion/deletion complexity of O(n) instead of O(log n). Thanks to this technique, HvpGetCellPaged – perhaps the hottest function in the Windows registry implementation – executes in constant time.
In technical terms, cell maps are pagetable-like structures that divide the 32-bit hive address space into smaller, nested layers consisting of so-called directories, tables, and entries. As a reminder, the layout of cell indexes and cell maps is illustrated in the diagram below, based on a similar diagram in the Windows Internals book, which itself draws from Mark Russinovich's 1999 article, Inside the Registry:
Given the nature of the data structure, the corresponding cell map walk involves dereferencing three nested arrays based on the subsequent 1, 10 and 9-bit parts of the cell index, and then adding the final 12-bit offset to the page-aligned address of the target block. The internal kernel structures matching the respective layers of the cell map are _DUAL, _HMAP_DIRECTORY, _HMAP_TABLE and _HMAP_ENTRY, all publicly accessible via the ntoskrnl.exe PDB symbols. The entry point to the cell map is the Storage array at the end of the _HHIVE structure:
0: kd> dt _HHIVE
nt!_HHIVE
[...]
+0x118 Storage : [2] _DUAL
The index into the two-element array represents the storage type, 0 for stable and 1 for volatile, so a single _DUAL structure describes a 2 GiB view of a specific storage space:
0: kd> dt _DUAL
nt!_DUAL
+0x000 Length : Uint4B
+0x008 Map : Ptr64 _HMAP_DIRECTORY
+0x010 SmallDir : Ptr64 _HMAP_TABLE
+0x018 Guard : Uint4B
+0x020 FreeDisplay : [24] _FREE_DISPLAY
+0x260 FreeBins : _LIST_ENTRY
+0x270 FreeSummary : Uint4B
Let's examine the semantics of each field:
- Length: Expresses the current length of the given storage space in bytes. Directly after loading the hive, the stable length is equal to the size of the hive on disk (including any data recovered from log files, minus the 4096 bytes of the header), and the volatile space is empty by definition. Only cell map entries within the [0, Length - 1] range are guaranteed to be valid.
- Map: Points to the actual directory structure represented by _HMAP_DIRECTORY.
- SmallDir: Part of the "small dir" optimization, discussed in the next section.
- Guard: Its specific role is unclear, as the field is always initialized to 0xFFFFFFFF upon allocation and never used afterwards. I expect that it is some kind of debugging remnant from the early days of the registry development, presumably related to the small dir optimization.
- FreeDisplay: A data structure used to optimize searches for free cells during the cell allocation process. It consists of 24 buckets, each corresponding to a specific cell size range and represented by the _FREE_DISPLAY structure, indicating which pages in the hive may potentially contain free cells of the given length.
- FreeBins: The head of a doubly-linked list that links the descriptors of entirely empty bins in the hive, represented by the _FREE_HBIN structures.
- FreeSummary: A bitmask indicating which buckets within FreeDisplay have any hints set for the given cell size. A zero bit at a given position means that there are no free cells of the specific size range anywhere in the hive.
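As a quick illustration of the FreeSummary semantics, a check for whether a given size bucket may contain any free cells could look like the sketch below (the helper name is mine):

int BucketMayContainFreeCells(const _DUAL *Dual, unsigned Bucket /* 0..23 */) {
  /* A cleared bit means there are no free-cell hints in FreeDisplay[Bucket]
     anywhere in the hive. */
  return (Dual->FreeSummary >> Bucket) & 1;
}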
The next level in the cell map hierarchy is the _HMAP_DIRECTORY structure:
0: kd> dt _HMAP_DIRECTORY
nt!_HMAP_DIRECTORY
+0x000 Directory : [1024] Ptr64 _HMAP_TABLE
As we can see, it is simply a 1024-element array of pointers to _HMAP_TABLE:
0: kd> dt _HMAP_TABLE
nt!_HMAP_TABLE
+0x000 Table : [512] _HMAP_ENTRY
Further, we get a 512-element array of _HMAP_ENTRY structures, the final level of the cell map:
0: kd> dt _HMAP_ENTRY
nt!_HMAP_ENTRY
+0x000 BlockOffset : Uint8B
+0x008 PermanentBinAddress : Uint8B
+0x010 MemAlloc : Uint4B
This last level contains a descriptor of a single page in the hive and warrants a deeper analysis. Let's start by noting that the four least significant bits of PermanentBinAddress correspond to a set of undocumented flags that control various aspects of the page behavior. I was able to reverse-engineer them and partially recover their names, largely thanks to the fact that some older Windows 10 builds contained non-inlined functions operating on these flags, with revealing names like HvpMapEntryIsDiscardable or HvpMapEntryIsTrimmed:
enum _MAP_ENTRY_FLAGS
{
MAP_ENTRY_NEW_ALLOC = 0x1,
MAP_ENTRY_DISCARDABLE = 0x2,
MAP_ENTRY_TRIMMED = 0x4,
MAP_ENTRY_DUMMY = 0x8,
};
Here's a brief summary of their meaning based on my understanding:
- MAP_ENTRY_NEW_ALLOC: Indicates that this is the first page of a bin. Cell indexes pointing into this page must specify an offset within the range of [0x20, 0xFFF], as they cannot fall into the first 32 bytes that correspond to the _HBIN structure.
- MAP_ENTRY_DISCARDABLE: Indicates that the whole bin is empty and consists of a single free cell.
- MAP_ENTRY_TRIMMED: Indicates that the page has been marked as "trimmed" in HvTrimHive. More specifically, this property is related to hive reorganization, and is set during the loading process on some number of trailing pages that only contain keys accessed during boot, or not accessed at all since the last reorganization. The overarching goal is likely to prevent introducing unnecessary fragmentation in the hive by avoiding mixing together keys with different access histories.
- MAP_ENTRY_DUMMY: Indicates that the page is allocated from the kernel pool and isn't part of a section view.
With this in mind, let's dive into the details of each _HMAP_ENTRY structure member:
- PermanentBinAddress: The lower 4 bits contain the above flags. The upper 60 bits represent the base address of the bin mapping corresponding to this page.
- BlockOffset: This field has a dual functionality. If the MAP_ENTRY_DISCARDABLE flag is set, it is a pointer to a descriptor of a free bin, _FREE_HBIN, linked into the _DUAL.FreeBins linked list. If it is clear (the typical case), it expresses the offset of the page relative to the start of the bin. Therefore, the virtual address of the block's data in memory can be calculated as (PermanentBinAddress & (~0xF)) + BlockOffset.
- MemAlloc: If the MAP_ENTRY_NEW_ALLOC flag is set, it contains the size of the bin, otherwise it is zero.
And this concludes the description of how cell maps are structured. Taking all of it into account, the implementation of the HvpGetCellPaged function starts to make a lot of sense. Its pseudocode comes down to the following:
_CELL_DATA *HvpGetCellPaged(_HHIVE *Hive, HCELL_INDEX Index) {
  _HMAP_ENTRY *Entry = &Hive->Storage[Index >> 31].Map          // 1-bit storage type (stable/volatile)
                            ->Directory[(Index >> 21) & 0x3FF]  // 10-bit directory index
                            ->Table[(Index >> 12) & 0x1FF];     // 9-bit table index
  // Bin base + page offset within the bin + 12-bit offset within the page,
  // plus 4 bytes to skip over the cell's size field.
  return (_CELL_DATA *)((Entry->PermanentBinAddress & (~0xF)) + Entry->BlockOffset + (Index & 0xFFF) + 4);
}
The same process is followed, for example, by the implementation of the WinDbg !reg cellindex extension, which also translates a pair of a hive pointer and a cell index into the virtual address of the cell.
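For reference, the extension takes the hive descriptor address and the cell index as its two arguments, i.e. it is invoked as !reg cellindex <HiveAddress> <CellIndex>.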
The small dir optimization
There is one other implementation detail about the cell maps worth mentioning here – the small dir optimization. Let's start with the observation that a majority of registry hives in Windows are relatively small, below 2 MiB in size. This can be easily verified by using the !reg hivelist command in WinDbg, and taking note of the values in the "Stable Length" and "Volatile Length" columns. Most of them usually contain values between several kilobytes and hundreds of kilobytes. This would mean that if the kernel allocated the full first-level directory for these hives (taking up 1024 entries × 8 bytes = 8 KiB on 64-bit platforms), they would still only use the first element in it, leading to a non-trivial waste of memory – especially in the context of the early 1990s when the registry was first implemented. In order to optimize this common scenario, Windows developers employed an unconventional approach to simulate a 1-item long "array" with the SmallDir member of type _HMAP_TABLE in the _DUAL structure, and have the _DUAL.Map pointer point at it instead of a separate pool allocation when possible. Later, whenever the hive grows and requires more than one element of the cell map directory, the kernel falls back to the standard behavior and performs a normal pool allocation for the directory array.
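Below is a conceptual C sketch of the trick with simplified stand-in types and hypothetical helper names; it only illustrates how the Map pointer can be aimed at the embedded SmallDir slot instead of a full pool-allocated directory:

#include <stdlib.h>

typedef struct { void *Table[512]; } HMAP_TABLE_SKETCH;
typedef struct { HMAP_TABLE_SKETCH *Directory[1024]; } HMAP_DIRECTORY_SKETCH;

typedef struct {
  unsigned Length;
  HMAP_DIRECTORY_SKETCH *Map;
  HMAP_TABLE_SKETCH *SmallDir;
} DUAL_SKETCH;

/* A single table covers 512 pages, i.e. up to 2 MiB of hive data. Small
   hives reuse the SmallDir slot as a one-element "directory"; larger ones
   get the full 1024-pointer directory from the (simulated) pool. */
static void InitializeCellMapDirectory(DUAL_SKETCH *Dual, int FitsInOneTable) {
  if (FitsInOneTable) {
    Dual->SmallDir = calloc(1, sizeof(*Dual->SmallDir));
    Dual->Map = (HMAP_DIRECTORY_SKETCH *)&Dual->SmallDir;  /* Directory[0] aliases SmallDir */
  } else {
    Dual->SmallDir = NULL;
    Dual->Map = calloc(1, sizeof(*Dual->Map));
  }
}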
A revised diagram illustrating the cell map layout of a small hive is shown below:
Here, we can see that indexes 1 through 1023 of the directory array are invalid. Instead of correctly initialized _HMAP_TABLE structures, they point into "random" data corresponding to other members of the _DUAL and the larger _CMHIVE structure that happen to be located after _DUAL.SmallDir. Ordinarily, this is merely a low-level detail that doesn't have any meaningful implications, as all actively loaded hives remain internally consistent and always contain cell indexes that remain within the bounds of the hive's storage space. However, if we look at it through the security lens of hive-based memory corruption, this behavior suddenly becomes very interesting. If an attacker was able to implant an out-of-bounds cell index with the directory index greater than 0 into a hive, they would be able to get the kernel to operate on invalid (but deterministic) data as part of the cell map walk, and enable a powerful arbitrary read/write primitive. In addition to the small dir optimization, this technique is also enabled by the fact that the HvpGetCellPaged routine doesn't perform any bounds checks of the cell indexes, instead blindly trusting that they are always valid.
If you are curious to learn more about the exploitation aspect of out-of-bounds cell indexes, it was the main subject of my Practical Exploitation of Registry Vulnerabilities in the Windows Kernel talk given at OffensiveCon 2024 (slides and video recording are available). I will also discuss it in more detail in one of the future blog posts focused specifically on the security impact of registry vulnerabilities.
_CMHIVE structure overview
Beyond the first member of type _HHIVE at offset 0, the _CMHIVE structure contains more than 3 KiB of further information describing an active hive. This data relates to concepts more abstract than memory management, such as the registry tree structure itself. Below, instead of a field-by-field analysis, we'll focus on the general categories of information within _CMHIVE, organized loosely by increasing complexity of the data structures:
- Reference count: a 32-bit refcount primarily used during short-term operations on the hive, to prevent the object from being freed while actively operated on. These are used by the thin wrappers CmpReferenceHive and CmpDereferenceHive.
- File handles and sizes: handles and current sizes of the hive files on disk, such as the main hive file (.DAT) and the accompanying log files (.LOG, .LOG1, .LOG2). The handles are stored in FileHandles array, and the sizes reside in ActualFileSize and LogFileSizes.
- Text strings: some informational strings that may prove useful when trying to identify a hive based on its _CMHIVE structure. For example, the hive file name is stored in FileUserName, and the hive mount point path is stored in HiveRootPath.
- Timestamps: there are several timestamps that can be found in the hive descriptor, such as DirtyTime, UnreconciledTime or LastWriteTime.
- List entries: instances of the _LIST_ENTRY structure used to link the hive into various double-linked lists, such as the global list of hives in the system (HiveList, starting at nt!CmpHiveListHead), or the list of hives within a common trust class (TrustClassEntry).
- Synchronization mechanisms: various objects used to synchronize access to the hive as a whole, or some of its parts. Examples include HiveRundown, SecurityLock and HandleClosePendingEvent.
- Unload history: a 128-element array that stores the number of steps that have been successfully completed in the process of unloading the hive. Its specific purpose is unclear; it might be a debugging artifact retained from older versions of Windows.
- Late unload state: objects related to deferred unloading of registry hives (LateUnloadWorkItemState, LateUnloadFinishedEvent, LateUnloadWorkItem).
- Hive layout information: the hive reorganization process in Windows tries to optimize hives by grouping together keys accessed during system runtime, followed by keys accessed during system boot, followed by completely unused keys. If a hive is structured according to this order during load, the kernel saves information about the boundaries between the three distinct areas in the BootStart, UnaccessedStart and UnaccessedEnd members of _CMHIVE.
- Flushing state and dirty block information: any state that has to do with marking cells as dirty and synchronizing their contents to disk. There are a significant number of fields related to the functionality, with names starting with "Flush...", "Unreconciled..." and "CapturedUnreconciled...".
- Volume context: a pointer to a public _CMP_VOLUME_CONTEXT structure, which provides extended information about the disk volume of the hive file. As an example, it is used in the internal CmpVolumeContextMustHiveFilePagesBeKeptLocal routine to determine whether the volume is a system one, and consequently whether certain security/reliability assumptions are guaranteed for it or not.
- KCB table and root KCB: a table of the globally visible KCB (Key Control Block) structures corresponding to keys in the hive, and a pointer to the root key's KCB. I will discuss KCBs in more detail in the "Key structures" section below.
- Security descriptor cache: a cache of all security descriptors present in the hive, allocated from the kernel pool and thus accessible more efficiently than the underlying hive mappings. In my bug reports, I have often taken advantage of the security cache as a straightforward way to demonstrate the exploitability of security descriptor use-after-frees. A security node UAF can be easily converted into an UAF of its pool-based cached object, which then reliably triggers a Blue Screen of Death when Special Pool is enabled. The security cache of any given hive can be enumerated using the !reg seccache command in WinDbg.
- Transaction-related objects: a pointer to a _CM_RM structure that describes the Resource Manager object associated with the hive, if "heavyweight" transactions (i.e. KTM transactions) are enabled for it.
Last but not least, _CMHIVE has its own Flags field that is different from _HHIVE.Flags. As usual, the flags are not documented, so the listing below is a product of my own analysis:
enum _CM_HIVE_FLAGS
{
CM_HIVE_UNTRUSTED = 0x1,
CM_HIVE_IN_SID_MAPPING_TABLE = 0x2,
CM_HIVE_HAS_RM = 0x8,
CM_HIVE_IS_VIRTUALIZABLE = 0x10,
CM_HIVE_APP_HIVE = 0x20,
CM_HIVE_PROCESS_PRIVATE = 0x40,
CM_HIVE_MUST_BE_REORGANIZED = 0x400,
CM_HIVE_DIFFERENCING_WRITETHROUGH = 0x2000,
CM_HIVE_CLOUDFILTER_PROTECTED = 0x10000,
};
A brief description of each of them is as follows:
- CM_HIVE_UNTRUSTED: the hive is "untrusted" in the sense of registry symbolic links; in other words, it is not one of the default system hives loaded on boot. The distinction is that trusted hives can freely link to all other hives in the system, while untrusted ones can only link to hives within their so-called trust class. This is to prevent confused deputy-style privilege escalation attacks in the system.
- CM_HIVE_IN_SID_MAPPING_TABLE: the hive is linked into an internal data structure called the "SID mapping table" (nt!CmpSIDToHiveMapping), used to efficiently look up the user class hives mounted at \Registry\User\<SID>_Classes for the purposes of registry virtualization.
- CM_HIVE_HAS_RM: KTM transactions are enabled for this hive, meaning that the corresponding .blf and .regtrans-ms files are present in the same directory as the main hive file. The flag is clear if the hive is an app hive or if it was loaded with the REG_HIVE_NO_RM flag set.
- CM_HIVE_IS_VIRTUALIZABLE: accesses to this hive may be subject to registry virtualization. As far as I know, the only hive with this flag set is currently HKLM\SOFTWARE, which seems in line with the official documentation.
- CM_HIVE_APP_HIVE: this is an app hive, i.e. it was loaded under \Registry\A with the REG_APP_HIVE flag set.
- CM_HIVE_PROCESS_PRIVATE: this hive is private to the loading process, i.e. it was loaded with the REG_PROCESS_PRIVATE flag set.
- CM_HIVE_MUST_BE_REORGANIZED: the hive fragmentation threshold (by default 1 MiB) has been exceeded, and the hive should undergo the reorganization process at the next opportunity. The flag is simply a means of communication between the CmCheckRegistry and CmpReorganizeHive internal routines, both of which execute during hive loading.
- CM_HIVE_DIFFERENCING_WRITETHROUGH: this is a delta hive loaded in the writethrough mode, which technically means that the DIFF_HIVE_WRITETHROUGH flag was specified in the DiffHiveFlags member of the VRP_LOAD_DIFFERENCING_HIVE_INPUT structure, as discussed in Part 4.
- CM_HIVE_CLOUDFILTER_PROTECTED: new flag added in December 2024 as part of the fix for CVE-2024-49114. It indicates that the hive file has been protected against being converted to a Cloud Filter placeholder by setting the "$Kernel.CFDoNotConvert" extended attribute (EA) on the file in CmpAdjustFileCFSafety.
This concludes the documentation of the hive descriptor structure, arguably the largest and most complex object in the Windows registry implementation.
Key structures
The second most important objects in the registry are keys. They can be basically thought of as the essence of the registry, as nearly every registry operation involves them in some way. They are also the one and only registry element that is tightly integrated with the Windows NT Object Manager. This comes with many benefits, as client applications can operate on the registry using standardized handles, and can leverage automatic security checks and object lifetime management. However, this integration also presents its own challenges, as it requires the Configuration Manager to interact with the Object Manager correctly and handle its intricacies and edge cases securely. For this reason, internal key-related structures play a crucial role in the registry implementation. They help organize key state in a way that simplifies keeping it up-to-date and internally consistent. For security researchers, understanding these structures and their semantics is invaluable. This knowledge enables you to quickly identify bugs in existing code or uncover missing handling of unusual but realistic conditions.
The two fundamental key structures in the Windows kernel are the key body (_CM_KEY_BODY) and key control block (_CM_KEY_CONTROL_BLOCK). The key body is directly associated with a key handle in the NT Object Manager, similar to the role that the _FILE_OBJECT structure plays for file handles. In other words, this is the initial object that the kernel obtains whenever it calls ObReferenceObjectByHandle to reference a user-supplied handle. Multiple key body structures may exist concurrently for a single key, whenever several programs hold active handles to it. Conversely, the key control block represents the global state of a specific key and is used to manage its general properties. This means that for most keys in the system, there is at most one KCB allocated at a time. There may be no KCB for keys that haven't been accessed yet (as they are initialized by the kernel lazily), and there may be more than one KCB for the same registry path if the key has been deleted and created again (these two instances of the key are treated as separate entities, with one of them being marked as deleted/non-existent). Taking this into account, the relationship between key bodies and KCBs is many-to-one, with all of the key bodies of a single KCB being connected in a doubly-linked list, as shown in the diagram below:
The following subsections provide more detail about each of these two structures.
Key body
The key body structure is allocated and initialized in the internal CmpCreateKeyBody routine, and freed by the NT Object Manager when all references to the object are dropped. It is a relatively short and simple object with the following definition:
0: kd> dt _CM_KEY_BODY
nt!_CM_KEY_BODY
+0x000 Type : Uint4B
+0x004 AccessCheckedLayerHeight : Uint2B
+0x008 KeyControlBlock : Ptr64 _CM_KEY_CONTROL_BLOCK
+0x010 NotifyBlock : Ptr64 _CM_NOTIFY_BLOCK
+0x018 ProcessID : Ptr64 Void
+0x020 KeyBodyList : _LIST_ENTRY
+0x030 Flags : Pos 0, 16 Bits
+0x030 HandleTags : Pos 16, 16 Bits
+0x038 Trans : _CM_TRANS_PTR
+0x040 KtmUow : Ptr64 _GUID
+0x048 ContextListHead : _LIST_ENTRY
+0x058 EnumerationResumeContext : Ptr64 Void
+0x060 RestrictedAccessMask : Uint4B
+0x064 LastSearchedIndex : Uint4B
+0x068 LockedMemoryMdls : Ptr64 Void
Let's quickly go over each field:
- Type: for normal keys (i.e. almost all of them), this field is set to a magic value of 0x6B793032 ('ky02'). However, for predefined keys, this is the 32-bit value of the link's target key with the highest bit set. This member is therefore used to distinguish between regular keys and predefined ones, for example in CmObReferenceObjectByHandle. Predefined keys have now been largely deprecated, but it is still possible to observe a non-standard Type value by opening a handle to one of the two last remaining ones: HKLM\Software\Microsoft\Windows NT\CurrentVersion\Perflib\009 and CurrentLanguage under the same path.
- AccessCheckedLayerHeight: a new field added in November 2023 as part of the fix for CVE-2023-36404. It is used for layered keys and contains the index of the lowest layer in the key stack that was access-checked when opening the key. It is later taken into account during other registry operations, in order to avoid leaking data from lower-layer, more restrictive keys that could have been created since the handle was opened.
- KeyControlBlock: a pointer to the corresponding key control block.
- NotifyBlock: an optional pointer to the notify block associated with this handle. This is related to the key notification functionality in Windows and is described in more detail in the "Key notification structures" section below.
- ProcessID: the PID of the process that created the handle. It doesn't seem to serve any purpose in the kernel other than to be enumerable using the NtQueryOpenSubKeysEx system call (which requires SeRestorePrivilege, and is therefore available to administrators only).
- KeyBodyList: the list entry used to link all the key bodies within a single KCB together.
- Flags: a set of flags concerning the specific key body. Here's my interpretation of them based on reverse engineering:
- KEY_BODY_HIVE_UNLOADED (0x1): indicates that the underlying hive of the key has been unloaded and is no longer active.
- KEY_BODY_DONT_RELOCK (0x2): this seems to be a short-term flag used to communicate between CmpCheckKeyBodyAccess/CmpCheckOpenAccessOnKeyBody and the nested CmpDoQueryKeyName routine, in order to indicate that the key's KCB is already locked and shouldn't be relocked again.
- KEY_BODY_DONT_DEINIT (0x4): if this flag is set, CmpDeleteKeyObject returns early and doesn't proceed with the regular deinitialization of the key body object. However, it is unclear if/where the flag is set in the code, as I personally haven't found any instances of it happening during my analysis.
- KEY_BODY_DELETED (0x8): indicates that the key has been deleted since the handle was opened, and it no longer exists.
- KEY_BODY_DONT_VIRTUALIZE (0x10): indicates that registry virtualization is disabled for this handle, as a result of opening the key with the (undocumented but present in SDK headers) REG_OPTION_DONT_VIRTUALIZE flag.
- HandleTags: from the kernel perspective, this is simply a general purpose 16-bit storage that can be set by clients on a per-handle basis using NtSetInformationKey with the KeySetHandleTagsInformation information class, and queried with NtQueryKey and the KeyHandleTagsInformation information class. As far as I know, the kernel doesn't dictate how this field should be used and leaves it up to the registry clients. In practice, it seems to be mostly used for purposes related to WOW64 and the Registry Redirector, storing flags such as KEY_WOW64_64KEY (0x100) and KEY_WOW64_32KEY (0x200), as well as some internal ones. The WOW64 functionality is implemented in KernelBase.dll, and functions such as ConstructKernelKeyPath and LocalBaseRegOpenKey are a good starting point for reverse engineering, if you're curious to learn more. I have also observed the 0x1000 handle tag being set in the internal IopApplyMutableTagToRegistryKey kernel routine for keys such as HKLM\System\ControlSet001\Control\Class\{4D36E968-E325-11CE-BFC1-08002BE10318}\0000, but I'm unsure of its meaning.
- Trans: Indicates the transactional state of the handle. If the handle is not transacted (i.e. it wasn't opened with one of RegOpenKeyTransacted or RegCreateKeyTransacted), it is set to zero. Otherwise, the lowest bit specifies the type of the transaction: 0 for KTM and 1 for lightweight transactions. The remaining bits form a pointer to the associated transaction object, either of the TmTransactionObjectType type (represented by the _KTRANSACTION structure), or of the CmRegistryTransactionType type (represented by a non-public structure that I've personally named _CM_LIGHTWEIGHT_TRANS_OBJECT).
- KtmUow: if the handle is associated with a KTM transaction, this field stores the GUID that uniquely identifies it. For non-transacted and lightweight-transacted handles, the field is unused.
- ContextListHead: this is the head of the doubly-linked list of contexts that have been associated with the key body using the CmSetCallbackObjectContext function. It is related to the registry callbacks functionality; see also the Specifying Context Information MSDN article for more details.
- EnumerationResumeContext: this is part of an optimization of the subkey enumeration process of layered keys (implemented in CmpEnumerateLayeredKey). Performing full enumeration of a layered key from scratch up to the given index is a very complex task, and repeating it over and over for each iteration of an enumeration loop would be very inefficient. The resume context helps address the problem for sequential enumeration by saving the intermediate state reached at an NtEnumerateKey call with a given index, and being able to resume from it when a request for index+1 comes next. It also has the added benefit of making it possible to stop and restart the enumeration process in the scope of a single system call, which is used to pause the operation and temporarily release some locks if the code detects that the registry is particularly congested. This happens at the intersection of the CmEnumerateKey and CmpEnumerateLayeredKey functions, with the latter potentially returning STATUS_RETRY and the former resuming the operation if such a situation arises.
- RestrictedAccessMask, LastSearchedIndex, LockedMemoryMdls: relatively new fields introduced in Windows 10 and 11, which I haven't looked very deeply into and thus won't discuss in detail here.
After a key handle is translated into the corresponding _CM_KEY_BODY structure using the ObReferenceObjectByHandle(CmKeyObjectType) call, typically early in the execution of a registry-related system call, there are three primary operations that are usually performed. First, the kernel does a key status check by evaluating the expression KeyBody.Flags & 9 to determine if the key is associated with an unloaded hive (flag 0x1) or has been deleted (flag 0x8). This check is essential because most registry operations are only permitted on active, existing keys, and enforcing this condition is a fundamental step for guaranteeing registry state consistency. Second, the code accesses the KeyControlBlock pointer, which provides further access to the hive pointer (KCB.KeyHive), the key's cell index (KCB.KeyCell), and other necessary fields and data structures required to perform any meaningful read/write actions on the key. Finally, the code checks the key body's Trans/KtmUow members to determine if the handle is part of a transaction, and if so, the transaction is used as additional context for the action requested by the caller. Accesses to other members of the _CM_KEY_BODY structure are less frequent and serve more specialized purposes.
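In pseudocode, the typical prologue of such a registry system call handler boils down to roughly the following (simplified types and spelled-out constants; not literal kernel code):

#define KEY_BODY_HIVE_UNLOADED 0x1
#define KEY_BODY_DELETED       0x8

NTSTATUS SketchOfRegistryOperationPrologue(_CM_KEY_BODY *KeyBody) {
  // 1. Reject handles whose hive has been unloaded or whose key was deleted
  //    (this is the "Flags & 9" check mentioned above).
  if (KeyBody->Flags & (KEY_BODY_HIVE_UNLOADED | KEY_BODY_DELETED)) {
    return STATUS_KEY_DELETED;
  }

  // 2. Reach the key's global state and its backing cell through the KCB.
  _CM_KEY_CONTROL_BLOCK *Kcb = KeyBody->KeyControlBlock;
  _HHIVE *Hive = Kcb->KeyHive;
  HCELL_INDEX Cell = Kcb->KeyCell;

  // 3. Pick up the transaction context, if the handle is transacted; the
  //    lowest bit of Trans encodes the transaction type, the rest a pointer.
  void *Transaction = (void *)(KeyBody->Trans & ~1ull);

  // ... perform the actual operation using Hive, Cell and Transaction ...
  return STATUS_SUCCESS;
}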
Key control block
The key control block object can be thought of as the heart of the Windows kernel registry tree representation. It is effectively the descriptor of a single key in the system, and the second most important key-related object after the key node. It is always allocated from the kernel pool, and serves four main purposes:
- Mirrors frequently used information from the key node to make it faster to access by the kernel code. This includes building an efficient, in-memory representation of the registry tree to optimize the traversal time when referring to registry paths.
- Works as a single point of reference for all active handles to a specific key, and helps synchronize access to the key in the multithreaded Windows environment.
- Represents any pending, transacted state of the registry key that has been introduced by a client, but not fully committed yet.
- Represents any complex relationships between registry keys that extend beyond the internal structure of the hive. The primary example are differencing hives, which are overlaid on top of each other, and whose corresponding keys form so-called key stacks.
Blog post #2 in this series highlighted the dramatic growth of the registry codebase across successive Windows versions, illustrating the subsystem's steady expansion over the last few decades. Similarly, the size of the Key Control Block (KCB) itself has nearly doubled over time, from 168 bytes in Windows XP x64 to 312 bytes in the latest Windows 11 release. This expansion underscores the increasing amount of information associated with every registry key, which the kernel must manage consistently and securely.
The KCB structure layout is present in the PDB symbols and can be displayed in WinDbg:
0: kd> dt _CM_KEY_CONTROL_BLOCK
nt!_CM_KEY_CONTROL_BLOCK
+0x000 RefCount : Uint8B
+0x008 ExtFlags : Pos 0, 16 Bits
+0x008 Freed : Pos 16, 1 Bit
+0x008 Discarded : Pos 17, 1 Bit
+0x008 HiveUnloaded : Pos 18, 1 Bit
+0x008 Decommissioned : Pos 19, 1 Bit
+0x008 SpareExtFlag : Pos 20, 1 Bit
+0x008 TotalLevels : Pos 21, 10 Bits
+0x010 KeyHash : _CM_KEY_HASH
+0x010 ConvKey : _CM_PATH_HASH
+0x018 NextHash : Ptr64 _CM_KEY_HASH
+0x020 KeyHive : Ptr64 _HHIVE
+0x028 KeyCell : Uint4B
+0x030 KcbPushlock : _EX_PUSH_LOCK
+0x038 Owner : Ptr64 _KTHREAD
+0x038 SharedCount : Int4B
+0x040 DelayedDeref : Pos 0, 1 Bit
+0x040 DelayedClose : Pos 1, 1 Bit
+0x040 Parking : Pos 2, 1 Bit
+0x041 LayerSemantics : UChar
+0x042 LayerHeight : Int2B
+0x044 Spare1 : Uint4B
+0x048 ParentKcb : Ptr64 _CM_KEY_CONTROL_BLOCK
+0x050 NameBlock : Ptr64 _CM_NAME_CONTROL_BLOCK
+0x058 CachedSecurity : Ptr64 _CM_KEY_SECURITY_CACHE
+0x060 ValueList : _CHILD_LIST
+0x068 LinkTarget : Ptr64 _CM_KEY_CONTROL_BLOCK
+0x070 IndexHint : Ptr64 _CM_INDEX_HINT_BLOCK
+0x070 HashKey : Uint4B
+0x070 SubKeyCount : Uint4B
+0x078 KeyBodyListHead : _LIST_ENTRY
+0x078 ClonedListEntry : _LIST_ENTRY
+0x088 KeyBodyArray : [4] Ptr64 _CM_KEY_BODY
+0x0a8 KcbLastWriteTime : _LARGE_INTEGER
+0x0b0 KcbMaxNameLen : Uint2B
+0x0b2 KcbMaxValueNameLen : Uint2B
+0x0b4 KcbMaxValueDataLen : Uint4B
+0x0b8 KcbUserFlags : Pos 0, 4 Bits
+0x0b8 KcbVirtControlFlags : Pos 4, 4 Bits
+0x0b8 KcbDebug : Pos 8, 8 Bits
+0x0b8 Flags : Pos 16, 16 Bits
+0x0bc Spare3 : Uint4B
+0x0c0 LayerInfo : Ptr64 _CM_KCB_LAYER_INFO
+0x0c8 RealKeyName : Ptr64 Char
+0x0d0 KCBUoWListHead : _LIST_ENTRY
+0x0e0 DelayQueueEntry : _LIST_ENTRY
+0x0e0 Stolen : Ptr64 UChar
+0x0f0 TransKCBOwner : Ptr64 _CM_TRANS
+0x0f8 KCBLock : _CM_INTENT_LOCK
+0x108 KeyLock : _CM_INTENT_LOCK
+0x118 TransValueCache : _CHILD_LIST
+0x120 TransValueListOwner : Ptr64 _CM_TRANS
+0x128 FullKCBName : Ptr64 _UNICODE_STRING
+0x128 FullKCBNameStale : Pos 0, 1 Bit
+0x128 Reserved : Pos 1, 63 Bits
+0x130 SequenceNumber : Uint8B
I will not document each member individually, but will instead cover them in larger groups according to their common themes and functions.
Reference count
Key Control Blocks are among the most frequently referenced registry objects, as almost every persistent registry operation involves an associated KCB. These blocks are referenced in various ways: by a subkey's KCB.ParentKcb pointer, a symbolic link key's KCB.LinkTarget pointer, through the global KCB tree, via open key handles (and the corresponding key bodies), in pending transacted operations (e.g., the _CM_KCB_UOW.KeyControlBlock pointer), and so on.
For system stability and security, it's crucial to accurately track all these active KCB references. This is done using the RefCount field, the first member in the KCB structure (offset 0x0). Historically a 16-bit field, it later grew to a 32-bit integer, and on modern systems it is a native-word-sized value, i.e. 64 bits on 64-bit platforms. Whenever kernel code needs to operate on a KCB or store a pointer to it, it should increment the RefCount using functions from the CmpReferenceKeyControlBlock family. Conversely, when a KCB reference is no longer needed, functions like CmpDereferenceKeyControlBlock should decrement the count. When RefCount reaches zero, the kernel knows the structure is no longer in use and can safely free it.
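In practice, the expected discipline boils down to the following pseudocode; the exact prototypes of these internal routines are not public and may differ from what is sketched here:

if (!CmpReferenceKeyControlBlock(Kcb)) {
  // Reference acquisition can fail, e.g. due to the overflow protection
  // mentioned below, in which case the KCB must not be used.
  return STATUS_INSUFFICIENT_RESOURCES;
}
// ... safely operate on Kcb: it cannot be freed while the reference is held ...
CmpDereferenceKeyControlBlock(Kcb);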
Besides standard reference counting, KCBs employ optimizations to delay certain memory management processes. This avoids excessive KCB allocation and deallocation when a KCB is briefly unreferenced. Two mechanisms are used: delay deref and delay close. The former delays the actual refcount decrement, while the latter postpones object deallocation even after RefCount reaches zero. Callers must use the specialized function CmpDelayDerefKeyControlBlock for the delayed dereference.
From a low-level security perspective, it's worth considering potential issues related to the reference counting. Integer overflow might seem like a possibility, but it's practically impossible due to the field's width and additional overflow protection present in the CmpReferenceKeyControlBlock-like functions. A more realistic concern is a scenario where the kernel accidentally decrements the refcount by a larger value than the number of released references. This could lead to premature KCB deallocation and a use-after-free condition. Therefore, accurate KCB reference counting is a crucial area to investigate when researching Windows for registry vulnerabilities.
Basic key information
As mentioned earlier, one of the most important types of information in the KCB is the unique identifier of the key in the hive, consisting of the _HHIVE descriptor pointer (KeyHive) and the corresponding key cell index (KeyCell). Very frequently, the kernel uses these two members to obtain the address of the key node mapping, which resembles the following pattern in the decompiled code:
_HHIVE *Hive = Kcb->KeyHive;
_CM_KEY_NODE *KeyNode = Hive->GetCellRoutine(Hive, Kcb->KeyCell);
//
// Further operations on KeyNode...
//
Cached data from the key node
Whenever some information about a key needs to be queried based on its handle, it is generally more efficient to read it from the KCB than the key node. The reason is that a pool-based KCB access requires fewer memory fetches (it avoids the cell map walk), bypasses the context switch to the Registry process, and eliminates the potential need to page in hive data from disk. Consequently, the following types of information are cached inside KCBs:
- Key name, which is stored in a public _CM_NAME_CONTROL_BLOCK structure and pointed to by the NameBlock member. Every unique key name in the system has its own instance of the _CM_NAME_CONTROL_BLOCK object, which is reference-counted and shared across all KCBs of keys with that name. This is an optimization designed to prevent storing multiple redundant copies of the same string in kernel memory.
- Flags, stored in the Flags member and being an exact copy of the _CM_KEY_NODE.Flags value. There is also the KcbUserFlags field that caches the value of _CM_KEY_NODE.UserFlags, and KcbVirtControlFlags, which caches the value of _CM_KEY_NODE.VirtControlFlags. The semantics of all of these bitmasks were discussed in Part 5.
- Security descriptor, stored in a separate _CM_KEY_SECURITY_CACHE structure and pointed to by CachedSecurity.
- Subkey count, stored in the SubKeyCount field. It expresses the cumulative number of the key's stable and volatile subkeys, i.e. it is equal to the sum of _CM_KEY_NODE.SubKeyCounts[0] and SubKeyCounts[1].
- Value list, stored in the ValueList structure of type _CHILD_LIST, and equivalent to _CM_KEY_NODE.ValueList.
- Key limits, represented by KcbMaxNameLen, KcbMaxValueNameLen and KcbMaxValueDataLen. They correspond to the key node fields with the same names without the "Kcb" prefix.
- Fully qualified path, stored in FullKCBName. It is lazily initialized in the internal CmpConstructAndCacheName function, either when resolving a symbolic link, or as a result of calling the documented CmCallbackGetKeyObjectID API. A previously initialized path may be marked as stale by setting FullKCBNameStale (the least significant bit of the FullKCBName pointer).
It is essential for system security that the information found in KCBs is always synchronized with their key node counterparts. This is one of the most fundamental assumptions of the Windows registry implementation, and failure to guarantee it typically results in memory corruption or other severe security vulnerabilities.
Extended flags
In addition to the flags fields that simply mirror the corresponding values from the key node, like Flags, KcbUserFlags and KcbVirtControlFlags, there is also a set of extended flags that are KCB-specific. They are stored in the following fields:
+0x008 ExtFlags : Pos 0, 16 Bits
+0x008 Freed : Pos 16, 1 Bit
+0x008 Discarded : Pos 17, 1 Bit
+0x008 HiveUnloaded : Pos 18, 1 Bit
+0x008 Decommissioned : Pos 19, 1 Bit
+0x008 SpareExtFlag : Pos 20, 1 Bit
[...]
+0x040 DelayedDeref : Pos 0, 1 Bit
+0x040 DelayedClose : Pos 1, 1 Bit
+0x040 Parking : Pos 2, 1 Bit
For the eight explicitly defined flags, here's a brief explanation:
- Freed: the KCB has been freed, but the underlying pool allocation may still be alive as part of the CmpFreeKCBListHead (older systems) or CmpKcbLookaside (Windows 10 and 11) lookaside lists.
- Discarded: the KCB has been unlinked from the global KCB tree and is not available for name-based lookups, but there may still be active references to it via open handles. It is typically set for keys that have been deleted, and for old instances of keys that have been renamed.
- HiveUnloaded: the underlying hive has been unloaded.
- Decommissioned: the KCB is no longer used (its reference count dropped to zero) and it is ready to be freed, but it hasn't been freed just yet.
- SpareExtFlag: as the name suggests, this is a spare bit that may be associated with a new flag in the future.
- DelayedDeref: the key is subject to a "delayed deref" mechanism, due to having been dereferenced using CmpDelayDerefKeyControlBlock instead of CmpDereferenceKeyControlBlock. This serves to defer the actual dereferencing of the KCB by some time, anticipating its near-future need and thus avoiding a redundant free-allocate sequence.
- DelayedClose: the key is subject to a "delayed close" mechanism, which is similar to delayed deref, but it involves delaying the freeing of a KCB structure even if its refcount has dropped to zero.
- Parking: the purpose of this bit is unclear, and it seems to be currently unused.
Last but not least, the ExtFlags member stores a further set of flags, which can be expressed as the following enum:
enum _CM_KCB_EXT_FLAGS
{
CM_KCB_NO_SUBKEY = 0x1,
CM_KCB_SUBKEY_ONE = 0x2,
CM_KCB_SUBKEY_HINT = 0x4,
CM_KCB_SYM_LINK_FOUND = 0x8,
CM_KCB_KEY_NON_EXIST = 0x10,
CM_KCB_NO_DELAY_CLOSE = 0x20,
CM_KCB_INVALID_CACHED_INFO = 0x40,
CM_KCB_READ_ONLY_KEY = 0x80,
CM_KCB_READ_ONLY_SUBKEY = 0x100,
};
Let's break it down:
- CM_KCB_NO_SUBKEY, CM_KCB_SUBKEY_ONE, CM_KCB_SUBKEY_HINT: these flags are currently obsolete, and were originally related to an old performance optimization. CM_KCB_NO_SUBKEY indicated that the key had no subkeys. CM_KCB_SUBKEY_ONE indicated that the key had exactly one subkey, and its 32-bit hint value was stored in KCB.HashKey. Finally, CM_KCB_SUBKEY_HINT indicated that the hints of all subkeys were stored in a dynamically allocated buffer pointed to by KCB.IndexHint. According to my analysis, none of the flags seem to be used in modern versions of Windows, even though their related fields in the KCB structure still exist.
- CM_KCB_SYM_LINK_FOUND: indicates that the key is a symbolic link whose target KCB has already been resolved during a previous access, and is cached in KCB.CachedChildList.RealKcb (older systems) or KCB.LinkTarget (Windows 10 and 11). It is an optimization designed to speed up the process of traversing symlinks, by performing the path lookup only once and later referring directly to the cached KCB where possible.
- CM_KCB_KEY_NON_EXIST: this is another deprecated flag that existed in historical implementations of the registry, but doesn't seem to be used anymore.
- CM_KCB_NO_DELAY_CLOSE: indicates that the key mustn't be subject to the "delayed close" mechanism, and instead should be freed as soon as all references to it are dropped.
- CM_KCB_INVALID_CACHED_INFO: this flag simply indicates that the IndexHint/HashKey/SubKeyCount fields contain out-of-date information that shouldn't be relied on.
- CM_KCB_READ_ONLY_KEY: this key is designated as read-only and, therefore, is not modifiable. The flag can be set by using the undocumented NtLockRegistryKey system call, which can only be invoked from kernel mode. Shout out to James Forshaw who wrote an interesting post about it on his blog.
- CM_KCB_READ_ONLY_SUBKEY: the exact meaning and usage of the flag is unclear, but it appears to be enabled for keys with at least one descendant subkey marked as read-only. Specifically, the internal CmLockKeyForWrite function (the main routine behind NtLockRegistryKey's logic) sets it iteratively for every parent key of the read-only key, up to and including the hive's root.
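Since these extended flags often come up when staring at raw KCB dumps, a small convenience helper that decodes an ExtFlags value into the names listed above can be handy. This is only a debugging aid built from the enum shown earlier, not part of any Windows interface:
#include <cstdint>
#include <string>

// Decodes a raw KCB.ExtFlags value into the CM_KCB_* flag names defined above.
// Purely a debugging convenience; not a Windows API.
std::string DecodeKcbExtFlags(uint32_t ext_flags) {
  static const struct { uint32_t bit; const char* name; } kFlags[] = {
      {0x1, "CM_KCB_NO_SUBKEY"},            {0x2, "CM_KCB_SUBKEY_ONE"},
      {0x4, "CM_KCB_SUBKEY_HINT"},          {0x8, "CM_KCB_SYM_LINK_FOUND"},
      {0x10, "CM_KCB_KEY_NON_EXIST"},       {0x20, "CM_KCB_NO_DELAY_CLOSE"},
      {0x40, "CM_KCB_INVALID_CACHED_INFO"}, {0x80, "CM_KCB_READ_ONLY_KEY"},
      {0x100, "CM_KCB_READ_ONLY_SUBKEY"},
  };
  std::string out;
  for (const auto& f : kFlags) {
    if (ext_flags & f.bit) {
      if (!out.empty()) out += " | ";
      out += f.name;
    }
  }
  return out.empty() ? "(none)" : out;
}
For example, DecodeKcbExtFlags(0x48) returns "CM_KCB_SYM_LINK_FOUND | CM_KCB_INVALID_CACHED_INFO".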
For fast, lockless access, the KCB stores the first four key bodies opened for the key in the KeyBodyArray field; the KeyBodyListHead field maintains the head of a doubly-linked list for any additional ones.
KCB lock
The KcbPushlock member within the KCB structure is a lock used to synchronize access to the key during various registry system calls. It is passed to the standard kernel pushlock APIs, such as ExAcquirePushLockSharedEx, ExAcquirePushLockExclusiveEx, and ExReleasePushLockEx.
Transacted state
The key control block is central to managing the transacted state of registry keys, maintaining pending changes in memory before they are committed to the hive. Several fields within the KCB are specifically dedicated to this function:
- KCBUoWListHead: This field is a list head that anchors a list of Unit of Work (UoW) structures. Each UoW represents a specific action taken within a transaction, such as creating or deleting a key, or setting or deleting a value. This list allows the system to track all pending transactional operations related to a particular key, and it is crucial for ensuring atomicity, as it records the operations that must be applied or rolled back as a single unit.
- TransKCBOwner: This field is used to identify the transaction object that "owns" the key. It is set on the KCBs of transactionally created keys, and signifies that the key is currently only visible in the context of the specific transaction. Once the transaction commits, this field is cleared, and the key becomes visible in the global registry tree.
- KCBLock and KeyLock: Two so-called intent locks of type _CM_INTENT_LOCK, which are used to ensure that no two transactions can be associated with a single key if their respective operations could invalidate each other's state. According to my understanding, KCBLock protects the consistency of the KCB in this regard, and KeyLock protects the key node. The !reg ixlock WinDbg command is designed to display the internal state of these locks.
- TransValueCache: This field is a structure that caches value entries associated with a particular KCB, if at least one of its values has been modified in an active transaction. Before a value is set, modified or deleted within a transaction for the first time, a copy of the current value list is taken and stored here. When a transaction is committed, the TransValueCache state is applied back to the key's persistent value list. On rollback, the list is simply discarded.
- TransValueListOwner: This field is a pointer to a transaction that currently "owns" the TransValueCache. At any given time, for each key, there may be at most one active transaction that has any pending operations involving the key's values.
These fields collectively form the core of transaction management within the Windows registry. Ever since their introduction in Windows Vista, they must be handled correctly as part of every registry action, whether it is a read or a write, transacted or not. The kernel must incorporate any pending transacted state into information queries, must not allow two contradictory transactions to exist for the same key at the same time, and must not allow a non-transacted operation to break the assumptions of an active transaction without invalidating it first. Any bugs in managing the transacted state may have significant security implications, with some interesting examples being CVE-2023-21748 and CVE-2023-23420. The specific structures used to store the transacted state, such as _CM_TRANS or _CM_KCB_UOW, are discussed in more detail in the "Transaction structures" section below.
Layered key state
Layered keys were introduced in Windows 10 version 1607 to support containerization through differencing hives. Because overlaying hives on top of each other is primarily a runtime concept, the KCB is the natural place to hold the state related to this feature, and there are three main members involved in this process:
- LayerSemantics: This 2-bit field indicates the state of a key within the layering system. It is an exact copy of the key's _CM_KEY_NODE.LayerSemantics value, cached in KCB for easier/quicker access. For a detailed overview of its possible values, please refer to Part 5.
- LayerHeight: This field specifies the level of the key within the differencing hive stack. A higher LayerHeight indicates that the key is higher up in the stack of layered hives, and a value of zero is used for base hives (i.e. normal non-differencing hives loaded on the host system).
- LayerInfo: This is a pointer to a _CM_KCB_LAYER_INFO structure, which describes the key's position within the stack of differencing hives. Among other things, it contains a pointer to the lower layer on the key stack, and the head of a list of layers above the current one.
The specifics of the structures associated with this functionality are discussed in the "Layered keys" section below.
KCB tree structure
While key bodies are a common way to access KCB structures, they're not the only method. They are integral when you have an open handle to a key, as operations on the handle follow the handle → key body → KCB translation path. However, looking up keys by name or path is also crucial. Whenever a key is opened or created, the request relies on either an existing handle and a relative path (a single subkey name or a longer path with backslash-separated names), or an absolute path starting with "\Registry\". In this scenario, the kernel needs to quickly check if a KCB exists for the given key and to obtain its address if it does. To achieve this, KCBs are organized into their own tree structure, which the kernel can traverse. The tree is rooted in CmpRegistryRootObject (specifically CmpRegistryRootObject->KeyControlBlock, as CmpRegistryRootObject itself is the key body representing the \Registry key), and mirrors the current registry layout from a high-level perspective.
Let's highlight several key points:
- KCB Existence: There's no guarantee that a corresponding KCB exists for every registry key. KCBs are allocated lazily only when a key is opened, created, or when a KCB that depends on the one being created is about to be allocated.
- Consistent KCB Tree Structure: The KCB tree structure is always consistent. If a KCB exists for a key, then KCBs for all its ancestors up to the root \Registry key must also exist.
- Cached Information in KCBs: KCBs contain cached information from the key node, plus additional runtime information that may not yet be in the hive (e.g., pending transactions). Before performing any operation on a key, it's crucial to consult its KCB.
- KCB Uniqueness: At any given time, there can be only one KCB corresponding to a specific key attached to the tree. It's possible for multiple KCBs of the same key to exist in memory, but only if some of them correspond to deleted instances, in which case they are no longer visible in the global tree (only through the handles, until they are closed). Before creating a new KCB, the kernel should always ensure that there isn't an existing one, and if there is, use it. Failing to maintain this invariant can lead to severe consequences, as illustrated by CVE-2023-23420.
- KCB Tree and Hives: The KCB tree combines key descriptors from different hives and therefore must implement support for "exit nodes" and "entry nodes", as described in the previous blog post. Both exit and entry nodes have corresponding KCBs that can be viewed and analyzed in WinDbg. Resolving transitions between exit and entry nodes generally involves reading the (_HHIVE*, root cell index) pair from the exit node and then locating and navigating to the corresponding KCB in the destination hive. To speed up this process, the kernel uses an optimization that sets the CM_KCB_SYM_LINK_FOUND flag (0x8) in the exit node's KCB and stores the entry node's KCB address in KCB.LinkTarget, simulating a resolved symbolic link and avoiding the need to look up the entry's KCB every time the key is traversed. In the diagram above, entry keys are marked in blue, exit nodes in orange, and the special connection between them by the connector with black squares.
- Key Depth: Every open key in the system has a depth in the global tree, representing the number of nesting levels separating it from the root. This value is stored in the TotalLevels field. For example, the root key \Registry has a depth of 1, and the key \Registry\Machine\Software\Microsoft\Windows has a depth of 5.
- Parent KCB Pointer: Every initialized KCB structure (whether attached to the tree or not) contains a pointer to its parent KCB in the ParentKcb field. The only exception is the global root \Registry, for which this pointer is NULL.
Now that we understand how the KCB tree works conceptually, let's examine how it is represented in memory. Interestingly, the KCB structure itself doesn't store a list of its subkeys. Instead, it relies on a simple 32-bit hash of the key's name string for fast lookups by name. The hash is calculated by multiplying successive characters of the string by powers of 37, where the first character is multiplied by the highest power and the last by the lowest (37⁰, which is 1). This allows for a straightforward iterative implementation, shown below as C++ code:
#include <cctype>
#include <cstdint>
#include <string>

// 37-based rolling hash over the uppercased key name.
uint32_t HashString(const std::string& str) {
  uint32_t hash = 0;
  for (size_t i = 0; i < str.size(); i++) {
    // The cast avoids undefined behavior in toupper() for chars with the high bit set.
    hash = hash * 37 + toupper(static_cast<unsigned char>(str[i]));
  }
  return hash;
}
Some example outputs of the algorithm are:
HashString("Microsoft") = 0x7f00cd26
HashString("Windows") = 0x2f7de68b
HashString("CurrentVersion") = 0x7e25f69d
To calculate the hash of a path with multiple components, the same algorithm steps are repeated. However, in this case, the hashes of the successive path parts are treated similarly to the letters in the previous example. Therefore, the following formula is used to calculate the hash of the full "Microsoft\Windows\CurrentVersion" path:
0x7f00cd26 × 37² + 0x2f7de68b × 37¹ + 0x7e25f69d × 37⁰ = 0x86a158ea
The hash value calculated for each key, based on its path relative to the hive's root, is stored in KCB.ConvKey.Hash. Consequently, the hash value for the standard system key HKLM\Software\Microsoft\Windows\CurrentVersion is 0x86a158ea.
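To make the combination step concrete, here is a short sketch that reuses the HashString routine from above and folds per-component hashes into a full path hash. The backslash-splitting helper is my own addition, but the arithmetic follows the formula shown earlier, and the function reproduces the 0x86a158ea value for "Microsoft\Windows\CurrentVersion":
#include <cstdint>
#include <string>

uint32_t HashString(const std::string& str);  // defined earlier in this post

// Computes the ConvKey-style hash of a relative registry path by hashing each
// backslash-separated component and folding the results together with the same
// multiply-by-37 scheme used for individual characters.
uint32_t HashPath(const std::string& path) {
  uint32_t hash = 0;
  size_t start = 0;
  while (start <= path.size()) {
    size_t end = path.find('\\', start);
    if (end == std::string::npos) end = path.size();
    hash = hash * 37 + HashString(path.substr(start, end - start));
    start = end + 1;
  }
  return hash;
}

// HashPath("Microsoft\\Windows\\CurrentVersion") == 0x86a158ea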
Every hive has a directory of the KCBs within it, structured as a hashmap with a fixed number of buckets. Each bucket comprises a linked list of the KCBs located there. Internally, this directory is referred to as the "KCB cache" and is represented by the following two fields in the _CMHIVE structure:
+0x670 KcbCacheTable : Ptr64 _CM_KEY_HASH_TABLE_ENTRY
+0x678 KcbCacheTableSize : Uint4B
KcbCacheTable is a pointer to a dynamically allocated array of _CM_KEY_HASH_TABLE_ENTRY structures, and KcbCacheTableSize specifies the number of buckets (i.e., the number of elements in the KcbCacheTable array). In practice, the size of this KCB cache is 128 buckets for the virtual \Registry hive, 512 for the vast majority of hives loaded in the system, and 1024 for two specific system hives: HKLM\Software and HKLM\System. Given a specific key with a name hash denoted as ConvKey, its KCB can be found in the cache bucket indexed as follows:
TmpHash = 101027 * (ConvKey ^ (ConvKey >> 9));
CacheIndex = (TmpHash ^ (TmpHash >> 9)) & (Hive->KcbCacheTableSize - 1);
//
// Kcb can be found in Hive->KcbCacheTable[CacheIndex]
//
The operation of translating a key's path hash to its KCB cache table index (excluding the modulo KcbCacheTableSize step) is called "finalization". There's even a WinDbg helper command that can perform this action for us: !reg finalize. We can test it on the hash we calculated for the "Microsoft\Windows\CurrentVersion" path:
0: kd> !reg finalize 0x86a158ea
Finalized Hash for Hash=0x86a158ea: 0xc2c65312
So, the finalized hash is 0xc2c65312, and since the KCB cache of the SOFTWARE hive has 1024 buckets, the index of the HKLM\Software\Microsoft\Windows\CurrentVersion key in the array is given by the lowest 10 bits of the finalized hash, i.e. 0x312. We can verify that our calculations are correct by finding the SOFTWARE hive in memory and listing the keys located in its individual buckets:
0: kd> !reg hivelist
ah...
| ffffe10d2dad4000 | 4da2000 | ffffe10d2da78000 | 3a6000 | ffffe10d3489f000 | ffffe10d2d8ff000 | emRoot\System32\Config\SOFTWARE
...
0: kd> !reg openkeys ffffe10d2dad4000
...
Index 312: 86a158ea kcb=ffffe10d2d576a30 cell=000a58e8 f=00200000 \REGISTRY\MACHINE\SOFTWARE\MICROSOFT\WINDOWS\CURRENTVERSION
...
As we can see, our calculations are indeed accurate. We could achieve a similar result with the !reg hashindex command, which takes the address of the _HHIVE object and the ConvKey for a given key, and then prints out information about the corresponding bucket.
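The same finalization can also be reproduced outside the debugger in a few lines of user-mode code. This is a sketch based on the pseudocode shown earlier, so treat it as an approximation of the kernel's logic rather than a verbatim copy, but it does yield the same values as the WinDbg session above:
#include <cstdint>
#include <cstdio>

// Mirrors the "finalization" step from the pseudocode above: scramble the
// 32-bit path hash, then mask it down to a bucket index.
uint32_t FinalizeHash(uint32_t conv_key) {
  uint32_t tmp = 101027u * (conv_key ^ (conv_key >> 9));
  return tmp ^ (tmp >> 9);
}

int main() {
  uint32_t conv_key = 0x86a158ea;               // Microsoft\Windows\CurrentVersion
  uint32_t finalized = FinalizeHash(conv_key);  // 0xc2c65312
  uint32_t index = finalized & (1024 - 1);      // 0x312 in the SOFTWARE hive's cache
  printf("finalized=%08x index=%03x\n", finalized, index);
  return 0;
}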
Within a single bucket in the KCB cache, all the KCBs are linked together in a singly-linked list starting at the _CM_KEY_HASH_TABLE_ENTRY.Entry pointer. The subsequent elements are accessible through the _CM_KEY_HASH.NextHash field, which points to the KCB.KeyHash structure in the next KCB on the list. A diagram of this data structure is shown below:
Now that we understand how the KCB objects are internally organized, let's examine how name lookups are implemented. Suppose we want to take a single step through a path and find the KCB of the next subkey based on its parent KCB and the key name. The process is as follows (assuming the parent is not an exit node):
- Get the pointer to the hive descriptor on which we are currently operating from ParentKcb->KeyHive.
- Calculate the hash of the subkey name based on its full path relative to the hive in which it is located.
- Calculate the appropriate index in the KCB cache based on the name hash and iterate through the linked list, comparing:
- The hash of the key name.
- The pointer to the parent KCB.
- If both of the above match, perform a full comparison of the key name. If it matches, we have found the subkey.
The process is particularly interesting because it is not based on directly iterating through the subkeys of a given key, but instead on iterating through all the keys in the particular cache bucket. Thanks to the use of hashing, the vast majority of checks of potential candidates for the sought-after subkey are reduced to a single comparison of two 32-bit numbers, making the whole process quite efficient. The performance is mostly dependent on the total number of keys in the hive and the number of hash collisions for the specific cache index.
If you'd like to dive deeper into the implementation of KCB tree traversal, I recommend analyzing the internal function CmpFindKcbInHashEntryByName, which performs a single step through the tree as described above. Another useful function to analyze is CmpPerformCompleteKcbCacheLookup, which recursively searches the tree to find the deepest KCB object corresponding to one of the elements of a given path.
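To give a rough idea of what a single lookup step looks like, here is a heavily simplified sketch. The structure layouts below are made up for illustration (in reality, _CM_KEY_HASH is embedded in the KCB, so the kernel recovers the owning KCB from the list entry itself rather than via a separate pointer), and the real CmpFindKcbInHashEntryByName additionally has to deal with locking, compressed names, and deleted keys:
#include <cstdint>
#include <string>

// Simplified stand-ins for the kernel structures discussed above. Only the
// fields needed for the bucket walk are modeled, and the Owner back-pointer
// is an illustration-only shortcut.
struct Kcb;
struct KeyHash {
  uint32_t ConvKey;   // 32-bit hash of the key's path relative to its hive
  KeyHash* NextHash;  // next entry in the same cache bucket
  Kcb*     Owner;
};
struct Kcb {
  KeyHash     KeyHash;
  Kcb*        ParentKcb;
  std::string Name;   // uncompressed key name, for illustration only
};
struct HashTableEntry { KeyHash* Entry; };

// One step of a name lookup: find the KCB of a subkey of "parent" given the
// subkey's relative-path hash ("conv_key") and its name.
Kcb* FindSubkeyKcb(HashTableEntry* cache, uint32_t bucket_count,
                   Kcb* parent, uint32_t conv_key, const std::string& name) {
  uint32_t tmp = 101027u * (conv_key ^ (conv_key >> 9));
  uint32_t index = (tmp ^ (tmp >> 9)) & (bucket_count - 1);
  for (KeyHash* h = cache[index].Entry; h != nullptr; h = h->NextHash) {
    Kcb* kcb = h->Owner;
    // Cheap checks first: the 32-bit hash and the parent KCB pointer.
    if (h->ConvKey != conv_key || kcb->ParentKcb != parent) continue;
    // Only then perform the full (case-insensitive in the real kernel)
    // name comparison.
    if (kcb->Name == name) return kcb;
  }
  return nullptr;
}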
For those experimenting in WinDbg, here are a few useful commands related to KCBs and their trees:
- !reg findkcb: This command finds the address of the KCB in the global tree that corresponds to the given fully qualified registry path, if it exists.
- !reg querykey: Similar to the command above, but in addition to providing the KCB address, it also prints the hive descriptor address, the corresponding key node address, and information about subkeys and values of the given key.
- !reg kcb: This command prints basic information about a key based on its KCB. Its advantage is that it translates flag names into their textual equivalents (e.g., CompressedName, NoDelete, HiveEntry, etc.), but it often doesn't provide the specific information one is looking for. In that case, it might be necessary to use the dt _CM_KEY_CONTROL_BLOCK command to dump the entire structure.
So far, this blog post has described only a few of the most important registry structures, which are essential to know for anyone conducting research in this area. However, in total, there are over 150 different structures used in the Windows kernel and related to the registry, and only about half are documented through debug symbols or on Microsoft's website. While it's impossible to detail the operation and function of all of these structures in one article, this section aims to at least provide an overview of a majority of them, to note which of them are publicly available, and to briefly describe how they are used internally.
The layout of many structures corresponding to the most complex mechanisms is publicly unknown at the time of writing and requires significant time and energy to reconstruct. Even then, the correct meaning of each field and flag cannot be guaranteed. Therefore, the information below should be used with caution and verified against the specific Windows version(s) in question before relying on it in any way.
Key opening/creation
Parse context (in PDB: ❌)
Given that the registry is integrated with the standard Windows object model, all operations on registry paths (both absolute and relative) must be performed through the standard NT Object Manager interface.
For example, the NtCreateKey syscall calls the CmCreateKey helper function. At this point, there are no further calls to Configuration Manager, but instead, there is a call to ObOpenObjectByNameEx (a more advanced version of ObOpenObjectByName). Several levels down, the kernel will transfer execution back to the registry code, specifically to the CmpParseKey callback, which is the entry point responsible for handling all path operations (i.e., all key open/create actions). This means that the CmCreateKey and CmpParseKey functions, which work together, cannot pass an arbitrary number of input and output arguments to each other. They only have one pointer (ParseContext) at their disposal, which can serve as a communication channel. Thus, the agreement between these functions is that the pointer points to a special "parse context" structure, which has three main roles:
- Pass the input configuration of a given operation, e.g. information about:
- operation mode (open/create),
- transactionality of the operation,
- following of symbolic links,
- flags related to WOW64 functionality,
- optional class data of the created key.
- Pass some return information, such as whether the key was opened or created,
- Cache certain information within a single "parse" request, e.g.:
- information on whether registry virtualization is enabled for a given process,
- when following a symbolic link, a pointer to the originating hive descriptor, in order to check whether the given transition is allowed within the hive trust class,
- when following a symbolic link, a pointer to the KCB of its target (or the closest possible ancestor).
Reconstructing the layout of this structure is a critical step in getting a better understanding of how the key opening/creation process works internally.
Path info (in PDB: ❌)
When a client references a key by name, one of the first actions taken by the CmpParseKey function (or more specifically, CmpDoParseKey) is to take the string representing that name (absolute or relative), break it into individual parts separated by backslashes, and calculate the 32-bit hashes for each of them. This ensures that parsing only occurs once and doesn't need to be repeated. The structure where the result of this operation is stored is called "path info".
According to the documentation, a single registry path reference can contain a maximum of 32 levels of nesting. Therefore, the path info structure allows for the storage of 32 elements: the first 8 are stored directly within the structure, and, if the path is deeply nested, an additional 24 are kept in a supplementary structure allocated on demand from the kernel pools. The functions that operate on this object are CmpComputeComponentHashes, CmpExpandPathInfo, CmpValidateComponents, CmpGetComponentNameAtIndex, CmpGetComponentHashAtIndex, and CmpCleanupPathInfo.
Interestingly, I discovered an off-by-one bug in the CmpComputeComponentHashes function, which allows an attacker to write 25 values into a 24-element array. However, due to a fortunate coincidence, path info structures are allocated from a special lookaside list with allocation sizes significantly larger than the length of the structure itself. As a result, this buffer overflow is not exploitable in practice, which has also been confirmed by Microsoft. More information about this issue, as well as the reversed definition of this structure, can be found in my original report.
Key notifications
_CM_NOTIFY_BLOCK (in PDB: ✅)
The first time RegNotifyChangeKeyValue or the underlying NtNotifyChangeMultipleKeys syscall is called on a given handle, a notify block structure is assigned to the corresponding key body object. This structure serves as the central control point for all notification requests made on that handle in the future. It also stores the configuration defined in the initial API call, which, once set, cannot be changed without closing and reopening the key. This is in line with the official MSDN documentation:
"This function should not be called multiple times with the same value for the hKey but different values for the bWatchSubtree and dwNotifyFilter parameters. The function will succeed but the changes will be ignored. To change the watch parameters, you must first close the key handle by calling RegCloseKey, reopen the key handle by calling RegOpenKeyEx, and then call RegNotifyChangeKeyValue with the new parameters."
The !reg notifylist command in WinDbg can list all active notify blocks in the system, allowing you to check which keys are currently being monitored for changes.
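To see one of these notify blocks come to life, it's enough to register a single change notification from user mode. Below is a minimal example using the documented API; while the call is waiting, the notify block for the watched key should show up in the !reg notifylist output:
#include <windows.h>
#include <cstdio>

// Minimal example: open HKCU\Software and wait synchronously for a change.
// The first RegNotifyChangeKeyValue call on a handle is what causes the
// kernel to attach a notify block to the corresponding key body.
int main() {
  HKEY key = nullptr;
  LSTATUS status = RegOpenKeyExW(HKEY_CURRENT_USER, L"Software", 0,
                                 KEY_NOTIFY, &key);
  if (status != ERROR_SUCCESS) {
    printf("RegOpenKeyExW failed: %ld\n", status);
    return 1;
  }
  // Blocks until a subkey or value anywhere under HKCU\Software changes.
  status = RegNotifyChangeKeyValue(
      key,
      TRUE,                                                // watch the subtree
      REG_NOTIFY_CHANGE_NAME | REG_NOTIFY_CHANGE_LAST_SET,
      nullptr,                                             // no event handle
      FALSE);                                              // synchronous wait
  printf("RegNotifyChangeKeyValue returned: %ld\n", status);
  RegCloseKey(key);
  return 0;
}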
Post block (in PDB: ❌)
Each post block object corresponds to a single wait for changes to a given key. Many post block objects can be assigned to one notify block object at the same time. The network of relationships in this structure becomes even more complex when using the NtNotifyChangeMultipleKeys syscall with a non-empty SubordinateObjects argument, in which case two separate post blocks share a third data structure (the so-called post block union). However, the details of this topic are beyond the scope of this post.
The WinDbg !reg postblocklist command allows you to see how many active post blocks are assigned to each process/thread, but unfortunately, it does not show any detailed information about their contents.
Registry callbacks
REG_*_INFORMATION (in PDB: ✅)
These structures are used for supplying callbacks with precise information about operations performed on the registry, and are part of the documented Windows interface. Consequently, not only their definitions but also detailed descriptions of the meaning of each field are published directly by Microsoft. A complete list of these structures can be found on MSDN, e.g., on the EX_CALLBACK_FUNCTION callback function (wdm.h) page.
However, I have found in my research that in addition to the official registry callback interface, there is also a less official extension that Microsoft uses internally in VRegDriver, the module that supports differencing hives. If a given client, instead of using the official CmRegisterCallbackEx function, calls the internal CmpRegisterCallbackInternal function with the fifth argument set to 1, this callback will be internally marked as "extended". Extended callbacks, in addition to the information provided by the standard structures, also receive a handful of additional information related to differencing hives and layered keys. At the time of writing, the differences occur in the structures representing the RegNtPreLoadKey, RegNtPreCreateKeyEx, RegNtPreOpenKeyEx actions and their "post" counterparts.
Callback descriptor (in PDB: ❌)
The structure represents a single registry callback registered through the CmRegisterCallback or CmRegisterCallbackEx API. Once allocated, it is attached to a double-linked list represented by the global CallbackListHead object.
Object context descriptor (in PDB: ❌)
A descriptor structure for a key body-specific context that can be assigned through the CmSetCallbackObjectContext API. This descriptor is then inserted into a linked list that starts at _CM_KEY_BODY.ContextListHead.
Callback context (in PDB: ❌)
An internal structure used in the CmpCallCallBacksEx function to store the current state during the callback invocation process. For example, it's used to invoke the appropriate "post" type callbacks in case of an error in one of the "pre" type callbacks. These objects are freed by the dedicated CmpFreeCallbackContext function, which additionally caches a certain number of allocations in the global CmpCallbackContextSList list. This allows future requests for objects of this type to be quickly fulfilled.
Registry virtualization
Replication stack (in PDB: ❌)
A core task of registry virtualization is the replication of keys, which involves creating an identical copy of a given key structure. This occurs under the path HKU\<SID>_Classes\VirtualStore when an application, subject to virtualization, attempts to create a key in a location where it lacks proper permissions. The entire operation is coordinated by the CmpReplicateKeyToVirtual function and consists of two main stages. First, a "replication stack" object is created and initialized in the CmpBuildVirtualReplicationStack function. This object specifies the precise key structure to be created within the virtualization process. Second, the actual creation of these keys based on this object occurs within the CmpDoBuildVirtualStack function.
Transactions
_KTRANSACTION (in PDB: ✅)
A structure corresponding to a KTM transaction object, which is created by the CreateTransaction function or its low-level equivalent NtCreateTransaction.
Lightweight transaction object (in PDB: ❌)
A direct counterpart of _KTRANSACTION, but for lightweight transactions, created by the NtCreateRegistryTransaction system call. It is very simple and only consists of a bitmask of the current transaction state, a push lock for synchronization, and a pointer to the corresponding _CM_TRANS object.
_CM_KCB_UOW (in PDB: ✅)
The structure represents a single, active transactional operation linked to a specific key. In some scenarios, one logical operation corresponds to one such object (e.g., the UoWSetSecurityDescriptor type). In other cases, multiple UoWs are created for a single operation (e.g., UoWAddThisKey assigned to a newly created key, and UoWAddChildKey assigned to its parent).
This critical structure has multiple functions. The key ones are connecting to KCB intent locks and keeping any pending state related to a given operation, both before and during the transaction commit phase.
_CM_UOW_* (in PDB: ✅)
Auxiliary sub-structures of _CM_KCB_UOW, which store information about the temporary state of the registry associated with a specific type of transactional operation. Specifically, the four structures are: _CM_UOW_KEY_STATE_MODIFICATION, _CM_UOW_SET_SD_DATA, _CM_UOW_SET_VALUE_KEY_DATA and _CM_UOW_SET_VALUE_LIST_DATA.
_CM_TRANS (in PDB: ✅)
A descriptor of a specific registry transaction, usually associated with a particular hive. In special cases, if operations are performed on multiple hives within a single transaction, then multiple _CM_TRANS objects may exist for it. Given the address of the _CM_TRANS object, it is possible to list all operations associated with this transaction in WinDbg using the !reg uowlist command.
_CM_RM (in PDB: ✅)
A descriptor of a specific resource manager. It only exists if the given hive has KTM transactions enabled, and never exists for app hives or hives loaded with the REG_HIVE_NO_RM flag.
Think of this structure as being associated with one set of .blf / .regtrans-ms log files, which usually means one _CM_RM structure is assigned to one hive. The exception is system hives (e.g. SOFTWARE, SYSTEM etc.) which all share the same resource manager that exists under the CmRmSystem global variable.
Given the address of a _CM_RM object in WinDbg, you can list all associated transactions using the !reg translist command.
_CM_INTENT_LOCK (in PDB: ✅)
This structure represents an intent lock, with two instances (KCBLock and KeyLock) residing in the KCB. Their primary function is to ensure key consistency by preventing the assignment of two different transactions that contain conflicting modifications of a key. Given the object's address, WinDbg's !reg ixlock command can display some details about it.
Serialized log records (in PDB: ❌)
KTM transacted registry operations are logged to .blf files on disk to enable consistent state restoration in case of unexpected shutdown during transaction commit. The CmAddLogForAction function serializes the _CM_KCB_UOW object into a flat buffer and writes it to the log file using the CLFS interface. While the _CM_KCB_UOW structure can be found in public symbols, their corresponding serialized representations cannot. Notably, there was an information disclosure vulnerability (CVE-2023-28271) that was directly related to these structures.
Rollback packet (in PDB: ❌)
When a client performs a non-transactional operation that modifies a key, and there's an active transaction associated with that key, the transaction must be rolled back before the operation can be executed to prevent an inconsistent state. This is achieved using a structure that contains a list of transactions to be rolled back. This structure is passed to the CmpAbortRollbackPacket function, which carries out the rollback. Although the official layout of this structure is unknown, in practice it is quite simple, consisting of three fields: the current capacity, the current fill level of the list, and a pointer to a dynamically allocated array of transactions.
Differencing hives (VRegDriver)
IOCTL input structures (in PDB: ❌)
The VRegDriver module works by creating the \Device\VRegDriver device, and communicates with its clients by supporting nine distinct IOCTLs within the corresponding VrpIoctlDeviceDispatch handler function. These IOCTLs, exclusively accessible to administrator users, facilitate loading and unloading differencing hives, configuring registry redirections for specific containers, and a few other operations. Each IOCTL requires a specific input data structure, none of which are officially documented. Therefore, practical use of this interface necessitates reverse engineering the required structures to understand their initialization. An example of a reversed structure, corresponding to IOCTL 0x220008 and provisionally named VRP_LOAD_DIFFERENCING_HIVE_INPUT, was showcased in blog post #4. This enabled the creation of a proof-of-concept exploit for a differencing hive vulnerability (CVE-2023-36404), demonstrating the ability to load custom hives and, consequently, expose the flaw.
Silo context (in PDB: ❌)
This silo-specific context structure is set by the VRegDriver during silo initialization using the PsInsertPermanentSiloContext function. It is later retrieved by PsGetPermanentSiloContext and used during both IOCTL handling and path translation for containerized processes. A brief analysis suggests that it primarily contains the GUID of the associated silo, a push lock used for synchronization, and a user-configured list of namespaces for the given container, which is a set of source and target paths between which redirection should occur.
Key context (in PDB: ❌)
This structure stores the context specific to a particular key being subject to path translation within a silo. It is usually allocated for each key opened within the context of a containerized process, and assigned to its key body using the CmSetCallbackObjectContext API. It primarily stores the original path of the key before translation — as the client believes it has access to — and several other auxiliary fields.
Callback context (open/create) (in PDB: ❌)
The callback-specific context structure stores shared data between "pre" and "post" callbacks for a given operation. This context is generally accessed through the CallContext field within the REG_*_INFORMATION structure relevant to the specific operation. In practice, VRegDriver only has one instance of a special structure defined for this purpose, used when handling the RegNtPreCreateKeyEx/RegNtPreOpenKeyEx callbacks. It saves specific data (RootObject, CompleteName, RemainingName) before the open/create request, to restore their original values in the "post" callback.
Extra parameter (in PDB: ❌)
This structure also appears to be used for temporarily storing the original key path during translation. However, its scope encompasses the entire key creation/opening process, rather than just a single callback. This means it can store information across callbacks, even when symbolic links or write-through hives are encountered during path traversal, causing the CmpParseKey function to return STATUS_REPARSE or STATUS_REPARSE_GLOBAL and restart the path lookup process. Although the concept of a whole operation context seems broadly applicable, currently there is only one type of "extra parameter" being used, represented by the GUID VRP_ORIGINAL_KEY_NAME_PARAMETER_GUID {85b8669a-cfbb-4ac0-b689-6daabfe57722}.
Layered keys
_CM_KCB_LAYER_INFO (in PDB: ✅)
This is likely the only structure related to layered keys whose definition is public. It is part of every KCB and contains information about the placement of the key in the global, "vertical" tree of layered key instances. In practice, this means that it stores a pointer to the KCB at one level lower (its parent, so to speak), and the head of a linked list with KCBs at one level higher (KCB.LayerHeight+1), if any exist.
Key node stack (in PDB: ❌)
A stack containing all instances of a given layered key, starting from its level all the way down to level zero (the base key). Each key in this structure is represented by a (Hive, KeyCell) pair. If the key actually exists at a given level (KeyCell ≠ -1, indicating a state other than Merge-Unbacked), it is also represented by a direct, resolved pointer to its _CM_KEY_NODE structure.
Since Windows 10 introduced support for layered keys, many places in the code that previously identified a single key as _CM_KEY_NODE* now require passing the entire key node stack structure. This is because operations on layered keys usually require knowledge of the state of lower level keys (e.g. their layered semantics, subkeys, values), not just the key represented by the handle used by the caller.
Places where the key node stack structure is used can be identified by calls to its related helper functions, such as those for initialization (CmpInitializeKeyNodeStack) and cleanup (CmpCleanupKeyNodeStack), as well as any others containing the string "KeyNodeStack".
KCB stack (in PDB: ❌)
This structure, analogous to the key node stack, represents keys using KCBs. Its use is most clearly revealed by references to the CmpStartKcbStack and CmpStartKcbStackForTopLayerKcb functions in code, though many other internal routines with "KcbStack" in their names also operate on it.
Both the KCB stack and the key node stack share an optimization where the first two levels are stored inline, with additional levels allocated in kernel pools only when necessary. This is likely due to the fact that most systems, even those with layered keys, typically only use one level of nesting (two levels total). Thus, this optimization avoids costly memory allocation and deallocation in these common scenarios.
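As a purely illustrative sketch of that pattern (the real layouts and field names of the KCB stack and key node stack are not public, so everything below is made up), the idea is roughly:
#include <cstdint>

// Illustration of the "first two levels inline, the rest allocated on demand"
// pattern described above. All names are hypothetical; this is not the actual
// KCB stack or key node stack layout.
struct LayerStackSketch {
  void*    InlineLevels[2];  // covers the common single-layer (two-level) case
  uint32_t LevelCount;
  void**   ExtraLevels;      // pool-allocated only for deeper hive stacks

  void* GetLevel(uint32_t i) const {
    return (i < 2) ? InlineLevels[i] : ExtraLevels[i - 2];
  }
};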
Enum stack (in PDB: ❌)
This data structure allows for the enumeration of subkeys within a given layered key. Its primary use is within the CmpEnumerateLayeredKey function, which serves as the handler for the NtEnumerateKey operation specifically for layered keys. At an even higher level, this corresponds to the RegEnumKeyExW API function. The complexity of this structure is evident from the fact that there are 19 internal helper functions, all starting with the name CmpKeyEnumStack, that operate on it.
Enum resume context (in PDB: ❌)
This data structure, directly tied to the subkey enumeration, primarily serves as an optimization mechanism. After executing a specific number (N) of enumeration steps, it stores the internal state of the enum stack. This allows subsequent requests for subkey N+1 to resume the enumeration process from the previous point, bypassing the need to repeat the initial steps. Linked to a specific handle, it is stored within _CM_KEY_BODY.EnumerationResumeContext.
The KCB.SequenceNumber field, directly related to this structure, monitors whether a given key has significantly changed since a previous point in time. This enables the CmpKeyEnumStackVerifyResumeContext helper function to determine if the current registry state is consistent enough for the existing enumeration resume context to be used for further enumeration, or if the entire process needs to be restarted.
Value enum stack (in PDB: ❌)
This data structure, used to enumerate values for layered keys, is similar in complexity to those used to list subkeys. The main function utilizing it is CmEnumerateValueFromLayeredKey. Additionally, there are 10 helper functions named CmpValueEnumStack[...] that operate on this structure.
Sorted value enum stack (in PDB: ❌)
The structure is similar to the standard value enum stack, but is used to iterate over the values of a given layered key while preserving lexicographical order. Helper functions from the CmpSortedValueEnumStack[...] family (9 in total) correspond to this structure. This functionality is used exclusively in the CmpGetValueCountForKeyNodeStack function, which is responsible for returning the number of values for a given key.
The reason for the existence of this mechanism in parallel with the regular "value enum stack" is not entirely clear, but I suspect it serves as an optimization for value counting operations. This is supported by the fact that while layered keys first appeared in Windows 10 1607 (Redstone, build 14393), the sorted value enum stack was not introduced until the later version of Windows 10 1703 (Redstone 2, build 15063). In the first iteration of the layered key implementation, CmpGetValueCountForKeyNodeStack was implemented using the standard value enum stack. This lends credibility to the hypothesis that these mechanisms are functionally equivalent, but the "sorted" version is faster at counting unique values when direct access to them is not required.
Subtree enumerator (in PDB: ❌)
This structure enables the enumeration of both the direct subkeys of a layered key and all its deeper descendants. It is relatively complex, and its associated functions begin with CmpSubtreeEnumerator[...] (also 9 in total). This mechanism is primarily needed to implement the "rename" operation on layered keys. First, it allows verification that the caller has KEY_READ and DELETE permissions for all descendant keys in the subtree, and second, it enables setting the LayerSemantics value for these descendants to Supersede-Tree (0x3).
Discard/replace context (in PDB: ❌)
This data structure is employed during key deletion to ensure that KCB structures corresponding to higher-level Merge-Unbacked keys reliant on the deleted key are also marked as deleted. Subsequently, "fresh" KCB objects representing the non-existent key are inserted into the tree in their place. The two primary functions associated with this mechanism are CmpPrepareDiscardAndReplaceKcbAndUnbackedHigherLayers and CmpCommitDiscardAndReplaceKcbAndUnbackedHigherLayers.
Conclusion
The goal of this post was to provide a thorough overview of the structures used in the Configuration Manager subsystem in Windows, with particular emphasis on the most important and frequently used ones, i.e. those describing hives and keys. I wanted to share this knowledge because there are not many publicly available sources that accurately describe the registry's operation from the implementation side, especially with regard to the most recent code developments in Windows 10 and 11. I would also like to once again use this opportunity to appeal to Microsoft to make more information available through public PDB symbols – this would greatly facilitate the work of security researchers in the future.
This post concludes the part of the series focusing solely on the inner workings of the registry. In the next, seventh installment, we will shift our perspective and examine the registry's role in the overall security of the system, with a deep focus on vulnerability research. Stay tuned!