The Windows Registry Adventure #6: Kernel-mode objects
Posted by Mateusz Jurczyk, Google Project Zero
Welcome back to the Windows Registry Adventure! In the previous installment of the series, we took a deep look into the internals of the regf hive format. Understanding this foundational aspect of the registry is crucial, as it illuminates the design principles behind the mechanism, as well as its inherent strengths and weaknesses. The data stored within the regf file represents the definitive state of the hive. Knowing how to parse this data is sufficient for handling static files encoded in this format, such as when writing a custom regf parser to inspect hives extracted from a hard drive. However, for those interested in how regf files are managed by Windows at runtime, rather than just their behavior in isolation, there's a whole other dimension to explore: the multitude of kernel-mode objects allocated and maintained throughout the lifecycle of an active hive. These auxiliary objects are essential for several reasons:
- To track all currently loaded hives, their properties (e.g., load flags), their memory mappings, and the relationships between them (especially for delta hives overlaid on top of each other).
- To synchronize access to keys and hives within the multithreaded Windows environment.
- To cache hive information for faster access compared to direct memory mapping lookups.
- To integrate the registry with the NT Object Manager and support standard operations (opening/closing handles, setting/querying security descriptors, enforcing access checks, etc.).
- To manage the state of pending transactions before they are fully committed to the underlying hive.
To address these diverse requirements, the Windows kernel employs numerous interconnected structures. In this post, we will examine some of the most critical ones, how they function, and how they can be effectively enumerated and inspected using WinDbg. It's important to note that Microsoft provides official definitions only for some registry-related structures through PDB symbols for ntoskrnl.exe. In many cases, I had to reverse-engineer the relevant code to recover structure layouts, as well as infer the types and names of particular fields and enums. Throughout this write-up, I will clearly indicate whether each structure definition is official or reverse-engineered. If you spot any inaccuracies, please let me know. The definitions presented here are primarily derived from Windows Server 2019 with the March 2022 patches (kernel build 10.0.17763.2686), which was the kernel version used for the majority of my registry code analysis. However, over 99% of registry structure definitions appear to be identical between this version and the latest Windows 11, making the information directly applicable to the latest systems as well.
Hive structures
Given that hives are the most intricate type of registry object, it's not surprising that their kernel-mode descriptors are equally complex and lengthy. The primary hive descriptor structure in Windows, known as _CMHIVE, spans a substantial 0x12F8 bytes – exceeding 4 KiB, the standard memory page size on x86-family architectures. Contained within _CMHIVE, at offset 0, is another structure of type _HHIVE, which occupies 0x600 bytes, as depicted in the diagram below:
This relationship mirrors that of other common Windows object pairs, such as _EPROCESS / _KPROCESS and _ETHREAD / _KTHREAD. Because _HHIVE is always allocated as a component of the larger _CMHIVE structure, their pointer types are effectively interchangeable. If you encounter a decompiled access using a _HHIVE* pointer that extends beyond the size of the structure, it almost certainly indicates a reference to a field within the encompassing _CMHIVE object.
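Since this embedding relationship comes up repeatedly, here is a minimal C sketch of it. The byte sizes model the 0x600/0x12F8 lengths quoted above, but the field names (other than Signature) are placeholders of my own, not the real layouts:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the real structures; only the embedding
   relationship is modeled, not the actual field layouts. */
typedef struct _HHIVE {
    unsigned int Signature;        /* 0xBEE0BEE0 */
    unsigned char Reserved[0x5FC]; /* pad to the 0x600-byte size */
} HHIVE;

typedef struct _CMHIVE {
    HHIVE Hive;                    /* _HHIVE always lives at offset 0 */
    unsigned char HigherLevelState[0x12F8 - 0x600]; /* pad to 0x12F8 */
} CMHIVE;

/* Because _HHIVE sits at offset 0, the two pointer types are effectively
   interchangeable: a _HHIVE* can be cast back to the enclosing _CMHIVE*
   without any offset adjustment. */
static CMHIVE *CmHiveFromHHive(HHIVE *Hive) {
    return (CMHIVE *)Hive;
}
```

This is also why decompilers show out-of-bounds accesses through _HHIVE* pointers: the access simply lands in the _CMHIVE tail that the cast above makes reachable.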
But why are two distinct structures dedicated to representing a single registry hive? While technically not required, this separation likely serves to delineate fields associated with different abstraction layers of the hive. Specifically:
- _HHIVE manages the low-level aspects of the hive, including the hive header, bins, and cells, as well as in-memory mappings and synchronization state with its on-disk counterpart (e.g., dirty sectors).
- _CMHIVE handles more abstract information about the hive, such as the cache of security descriptors, pointers to high-level kernel objects like the root Key Control Block (KCB), and the associated transaction resource manager (_CM_RM structure).
The next subsections will provide a deeper look into the responsibilities and inner workings of these two structures.
_HHIVE structure overview
The primary role of the _HHIVE structure is to manage the memory-related state of a hive. This allows higher-level registry code to perform operations such as allocating, freeing, and marking cells as "dirty" without needing to handle the low-level implementation details. The _HHIVE structure comprises 49 top-level members, most of which will be described in larger groups below:
0: kd> dt _HHIVE
nt!_HHIVE
+0x000 Signature : Uint4B
+0x008 GetCellRoutine : Ptr64 _CELL_DATA*
+0x010 ReleaseCellRoutine : Ptr64 void
+0x018 Allocate : Ptr64 void*
+0x020 Free : Ptr64 void
+0x028 FileWrite : Ptr64 long
+0x030 FileRead : Ptr64 long
+0x038 HiveLoadFailure : Ptr64 Void
+0x040 BaseBlock : Ptr64 _HBASE_BLOCK
+0x048 FlusherLock : _CMSI_RW_LOCK
+0x050 WriterLock : _CMSI_RW_LOCK
+0x058 DirtyVector : _RTL_BITMAP
+0x068 DirtyCount : Uint4B
+0x06c DirtyAlloc : Uint4B
+0x070 UnreconciledVector : _RTL_BITMAP
+0x080 UnreconciledCount : Uint4B
+0x084 BaseBlockAlloc : Uint4B
+0x088 Cluster : Uint4B
+0x08c Flat : Pos 0, 1 Bit
+0x08c ReadOnly : Pos 1, 1 Bit
+0x08c Reserved : Pos 2, 6 Bits
+0x08d DirtyFlag : UChar
+0x090 HvBinHeadersUse : Uint4B
+0x094 HvFreeCellsUse : Uint4B
+0x098 HvUsedCellsUse : Uint4B
+0x09c CmUsedCellsUse : Uint4B
+0x0a0 HiveFlags : Uint4B
+0x0a4 CurrentLog : Uint4B
+0x0a8 CurrentLogSequence : Uint4B
+0x0ac CurrentLogMinimumSequence : Uint4B
+0x0b0 CurrentLogOffset : Uint4B
+0x0b4 MinimumLogSequence : Uint4B
+0x0b8 LogFileSizeCap : Uint4B
+0x0bc LogDataPresent : [2] UChar
+0x0be PrimaryFileValid : UChar
+0x0bf BaseBlockDirty : UChar
+0x0c0 LastLogSwapTime : _LARGE_INTEGER
+0x0c8 FirstLogFile : Pos 0, 3 Bits
+0x0c8 SecondLogFile : Pos 3, 3 Bits
+0x0c8 HeaderRecovered : Pos 6, 1 Bit
+0x0c8 LegacyRecoveryIndicated : Pos 7, 1 Bit
+0x0c8 RecoveryInformationReserved : Pos 8, 8 Bits
+0x0c8 RecoveryInformation : Uint2B
+0x0ca LogEntriesRecovered : [2] UChar
+0x0cc RefreshCount : Uint4B
+0x0d0 StorageTypeCount : Uint4B
+0x0d4 Version : Uint4B
+0x0d8 ViewMap : _HVP_VIEW_MAP
+0x110 Storage : [2] _DUAL
Signature
Equal to 0xBEE0BEE0, it is a unique signature of the _HHIVE / _CMHIVE structures. It may be useful in digital forensics for identifying these structures in raw memory dumps, and is yet another reference to bees in the Windows registry implementation.
Function pointers
Next up, there are six function pointers, initialized in HvHiveStartFileBacked and HvHiveStartMemoryBacked, and pointing at internal kernel handlers for the following operations:
- GetCellRoutine → HvpGetCellPaged or HvpGetCellFlat: translate a cell index to a virtual address.
- ReleaseCellRoutine → HvpReleaseCellPaged or HvpReleaseCellFlat: release a previously translated cell index.
- Allocate → CmpAllocate: allocate kernel memory within the global registry quota.
- Free → CmpFree: free kernel memory within the global registry quota.
- FileWrite → CmpFileWrite: write data to the hive file.
- FileRead → CmpFileRead: read data from the hive file.
As we can see, these functions provide the basic functionality of operating on kernel memory, cell indexes, and the hive file. In my opinion, the most important of them is GetCellRoutine, whose typical destination, HvpGetCellPaged, performs the cell map walk in order to translate a cell index into the corresponding address within the hive mapping.
It is natural to think that these function pointers could prove useful for exploitation if an attacker managed to corrupt them through a buffer overflow or a use-after-free condition. That was indeed the case in Windows 10 and earlier, but in Windows 11, these calls are now de-virtualized, and most call sites reference one of HvpGetCellPaged / HvpGetCellFlat and HvpReleaseCellPaged / HvpReleaseCellFlat directly, without referring to the pointers. This is great for security, as it completely eliminates the usefulness of those fields in any offensive scenarios.
Here's an example of a GetCellRoutine call in Windows 10, disassembled in IDA Pro:
And the same call in Windows 11:
Hive load failure information
This is a pointer to a public _HIVE_LOAD_FAILURE structure, which is passed as the first argument to the SetFailureLocation function every time an error occurs while loading a hive. It can be helpful in tracking which validity checks have failed for a given hive, without having to trace the entire loading process.
Base block
A pointer to a copy of the hive header, represented by the _HBASE_BLOCK structure.
Synchronization locks
There are two locks with the following purposes:
- FlusherLock – synchronizes access to the hive between clients changing data inside cells and the flusher thread;
- WriterLock – synchronizes access to the hive between writers that modify the bin/cell layout.
They are officially of type _CMSI_RW_LOCK, but they boil down to _EX_PUSH_LOCK, and they are used with standard kernel APIs such as ExAcquirePushLockSharedEx.
Dirty blocks information
Between offsets 0x58 and 0x84, _HHIVE stores several data structures representing the state of synchronization between the in-memory and on-disk instances of the hive.
Hive flags
First of all, there are two flags at offset 0x8C that indicate if the hive mapping is flat and if the hive is read-only. Secondly, there is a 32-bit HiveFlags member that stores further flags which aren't (as far as I know) included in any public Windows symbols. I have managed to reverse-engineer and infer the meaning of the constants I have observed, resulting in the following enum:
enum _HV_HIVE_FLAGS
{
HIVE_VOLATILE = 0x1,
HIVE_NOLAZYFLUSH = 0x2,
HIVE_PRELOADED = 0x10,
HIVE_IS_UNLOADING = 0x20,
HIVE_COMPLETE_UNLOAD_STARTED = 0x40,
HIVE_ALL_REFS_DROPPED = 0x80,
HIVE_ON_PRELOADED_LIST = 0x400,
HIVE_FILE_READ_ONLY = 0x8000,
HIVE_SECTION_BACKED = 0x20000,
HIVE_DIFFERENCING = 0x80000,
HIVE_IMMUTABLE = 0x100000,
HIVE_FILE_PAGES_MUST_BE_KEPT_LOCAL = 0x800000,
};
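As a small illustration of how these bits combine in practice, consider the sketch below. The helper function and its exact logic are my own (the kernel has no such routine); it simply expresses the observation that volatile, no-lazy-flush, and immutable hives are never flushed to disk automatically:

```c
#include <assert.h>

/* A subset of the reverse-engineered hive flags listed above. */
#define HIVE_VOLATILE     0x1
#define HIVE_NOLAZYFLUSH  0x2
#define HIVE_IMMUTABLE    0x100000

/* Illustrative helper (mine, not the kernel's): a hive's changes never
   reach disk automatically if it is memory-only, lazy flushing is
   disabled, or it was loaded as immutable. */
static int HiveIsFlushedAutomatically(unsigned int HiveFlags) {
    return !(HiveFlags & (HIVE_VOLATILE | HIVE_NOLAZYFLUSH | HIVE_IMMUTABLE));
}
```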
Below is a one-liner explanation of each flag:
- HIVE_VOLATILE: the hive exists in memory only; set, e.g., for \Registry and \Registry\Machine\HARDWARE.
- HIVE_NOLAZYFLUSH: changes to the hive aren't automatically flushed to disk and require a manual flush; set, e.g., for \Registry\Machine\SAM.
- HIVE_PRELOADED: the hive is one of the default, system ones; set, e.g., for \Registry\Machine\SOFTWARE, \Registry\Machine\SYSTEM, etc.
- HIVE_IS_UNLOADING: the hive is currently being loaded or unloaded in another thread and shouldn't be accessed before the operation is complete.
- HIVE_COMPLETE_UNLOAD_STARTED: the unloading process of the hive has started in CmpCompleteUnloadKey.
- HIVE_ALL_REFS_DROPPED: all references to the hive through KCBs have been dropped.
- HIVE_ON_PRELOADED_LIST: the hive is linked into a linked-list via the PreloadedHiveList field.
- HIVE_FILE_READ_ONLY: the underlying hive file is read-only and shouldn't be modified; indicates that the hive was loaded with the REG_OPEN_READ_ONLY flag set.
- HIVE_SECTION_BACKED: the hive is mapped in memory using section views.
- HIVE_DIFFERENCING: the hive is a differencing one (version 1.6, loaded under \Registry\WC).
- HIVE_IMMUTABLE: the hive is immutable and cannot be modified; indicates that it was loaded with the REG_IMMUTABLE flag set.
- HIVE_FILE_PAGES_MUST_BE_KEPT_LOCAL: the kernel always maintains a local copy of every page of the hive, either by locking it in physical memory or creating a private copy through the CoW mechanism.
Between offsets 0xA4 to 0xCC, there are a number of fields having to do with log file management, i.e. the .LOG1/.LOG2 files accompanying the main hive file on disk.
Hive version
The Version field stores the minor version of the hive, which should theoretically be an integer between 3 and 6. However, as mentioned in the previous blog post, it is possible to set it to an arbitrary 32-bit value either by specifying a major version equal to 0 and any desired minor version, or by enticing the kernel to recover the hive header from a log file, and abusing the fact that the HvAnalyzeLogFiles function is more permissive than HvpGetHiveHeader. Nevertheless, I haven't found any security implications of this behavior.
View map
The view map holds all the essential information about how the hive is mapped in memory. The specific implementation of registry memory management has evolved considerably over the years, with its details changing between consecutive system versions. In the latest ones, the view map is represented by the top-level _HVP_VIEW_MAP public structure:
0: kd> dt _HVP_VIEW_MAP
nt!_HVP_VIEW_MAP
+0x000 SectionReference : Ptr64 Void
+0x008 StorageEndFileOffset : Int8B
+0x010 SectionEndFileOffset : Int8B
+0x018 ProcessTuple : Ptr64 _CMSI_PROCESS_TUPLE
+0x020 Flags : Uint4B
+0x028 ViewTree : _RTL_RB_TREE
The semantics of its respective fields are as follows:
- SectionReference: Contains a kernel-mode handle to a section object corresponding to the hive file, created via ZwCreateSection in CmSiCreateSectionForFile.
- StorageEndFileOffset: Stores the maximum size of the hive that can be represented with file-backed sections at any given time. Initially set to the size of the loaded hive, it can dynamically increase or decrease at runtime for mutable (normal) hives.
- SectionEndFileOffset: Represents the size of the hive file section at the time of loading. It is never modified past the first initialization in HvpViewMapStart, and seems to be mostly used as a safeguard against extending an immutable hive file beyond its original size.
- ProcessTuple: A structure of type _CMSI_PROCESS_TUPLE, it identifies the host process of the hive's section views. This field currently always points to the global CmpRegistryProcess object, which corresponds to the dedicated "Registry" process that hosts all hive mappings in the system. However, this field could enable a more fine-grained separation of hive mappings across multiple processes, should Microsoft choose to implement such a feature.
- Flags: Represents a set of memory management flags relevant to the entire hive. These flags are not publicly documented; however, through reverse engineering, I have determined their purpose to be as follows:
- VIEW_MAP_HIVE_FILE_IMMUTABLE (0x1): Indicates that the hive has been loaded as immutable, meaning no data is ever saved back to the underlying hive file.
- VIEW_MAP_MUST_BE_KEPT_LOCAL (0x2): Indicates that all of the hive data must be persistently stored in memory, and not just accessible through file-backed sections. This is likely to protect against double-fetch conditions involving hives loaded from remote network shares.
- VIEW_MAP_CONTAINS_LOCKED_PAGES (0x4): Indicates that some of the hive's pages are currently locked in physical memory using ZwLockVirtualMemory.
- ViewTree: This is the root of a view tree structure, which contains the descriptors of each continuous section view mapped in memory.
Overall, the implementation of low-level hive memory management in Windows is more complex than might initially seem necessary. This complexity arises from the kernel's need to gracefully handle a variety of corner cases and interactions. For example, hives may be loaded as immutable, which indicates that the hive may be operated on in memory, but changes must not be flushed to disk. Simultaneously, the system must support recovering data from .LOG files, including the possibility of extending the hive beyond its original on-disk length. At runtime, it must also be possible to efficiently modify the registry data, as well as shrink and extend it on demand. To further complicate matters, Windows enforces different rules for locking hive pages in memory depending on the backing volume of the file, carefully balancing optimal memory usage and system security guarantees. These and many other factors collectively contribute to the complexity of hive memory management.
To better understand how the view tree is organized, let's first analyze the general logic of the hive mapping code.
The hive mapping logic
The main kernel function responsible for mapping a hive in memory is HvLoadHive. It implements the overall logic and coordinates various sub-routines responsible for performing more specialized tasks, in the following order:
- Header Validation: The kernel reads and inspects the hive's header to ascertain its integrity, ensuring that the hive has not been tampered with or corrupted. Relevant function: HvpGetHiveHeader.
- Log Analysis: The kernel processes the hive's transaction logs, scrutinizing them to identify any pending changes or inconsistencies that necessitate recovery procedures. Relevant function: HvAnalyzeLogFiles.
- Initial Section Mapping: A section object is created based on the hive file, and further segmented into multiple views, each aligned to 4 KiB boundaries and capped at 2 MiB. At this point, the kernel prioritizes the creation of an initial mapping without focusing on the granular layout of individual bins within the hive. Relevant function: HvpViewMapStart.
- Cell Map Initialization: The cell map, a component that translates cell indexes to memory addresses, is initialized. Its entries are configured to point to the newly created views. Relevant function: HvpMapHiveImageFromViewMap.
- Log Recovery (if required): If the preceding log analysis reveals the need for data recovery, the kernel attempts to restore data integrity. This is the earliest point at which the newly created memory mappings may already be modified and marked as "dirty", indicating that their contents have been altered and require synchronization with the on-disk representation. Relevant function: HvpPerformLogFileRecovery.
- Bin Mapping: In this final stage, the kernel establishes definitive memory mappings for each bin within the hive, ensuring that each bin occupies a contiguous region of memory. This process may necessitate creating new views, eliminating existing ones, or adjusting their boundaries to accommodate the specific arrangement of bins. Relevant function: HvpRemapAndEnlistHiveBins.
Now that we understand the primary components of the loading process, we can examine the internal structure of the section view tree in more detail.
The view tree
Let's consider an example hive consisting of three bins of sizes 256 KiB, 2 MiB and 128 KiB, respectively. After step 3 ("Initial Section Mapping"), the section views created by the kernel are as follows:
As we can see, at this point, the kernel doesn't concern itself with bin boundaries or continuity: all it needs to achieve is to make every page of the hive accessible through a section view for log recovery purposes. In simple terms, the way that HvpViewMapStart (or more specifically, HvpViewMapCreateViewsForRegion) works is that it creates as many 2 MiB views as necessary, followed by one last view that covers the remaining part of the file. So in our example, we have the first view that covers bin 1 and the beginning of bin 2, and the second view that covers the trailing part of bin 2 and the entire bin 3. It's important to note that memory continuity is only guaranteed within the scope of a single view, and views 1 and 2 may be mapped at completely different locations in the virtual address space.
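The splitting scheme described above can be sketched as follows. The function name and types are mine, and the real HvpViewMapCreateViewsForRegion of course also creates and maps the underlying section views, which is omitted here:

```c
#include <assert.h>
#include <stddef.h>

#define VIEW_MAX_SIZE (2 * 1024 * 1024) /* views are capped at 2 MiB */

typedef struct {
    size_t Start; /* file offset where the view begins */
    size_t End;   /* file offset one past the last covered byte */
} VIEW_RANGE;

/* Split [0, HiveDataSize) into as many 2 MiB views as necessary,
   followed by one final view covering the remainder. Returns the number
   of views written into the caller-provided array. */
static size_t CreateInitialViews(size_t HiveDataSize, VIEW_RANGE *Views,
                                 size_t MaxViews) {
    size_t count = 0, offset = 0;
    while (offset < HiveDataSize && count < MaxViews) {
        size_t len = HiveDataSize - offset;
        if (len > VIEW_MAX_SIZE)
            len = VIEW_MAX_SIZE;
        Views[count].Start = offset;
        Views[count].End = offset + len;
        offset += len;
        count++;
    }
    return count;
}
```

For the example hive (256 KiB + 2 MiB + 128 KiB = 0x260000 bytes of data), this yields exactly the two views from the diagram: [0, 0x200000) and [0x200000, 0x260000).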
Later in step 6, the system ensures that every bin is mapped as a contiguous block of memory before handing off the hive to the client. This is done by iterating through all the bins, and for every bin that spans more than one view in the current view map, the following operations are performed:
- If the start and/or the end of the bin fall into the middle of existing views, these views are truncated from either side. Furthermore, if there are any views that are fully covered by the bin, they are freed and removed from the tree.
- A new, dedicated section view is created for the bin and inserted into the view tree.
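These two remapping steps can be sketched with the simplified routine below. The types and function name are mine, the real HvpRemapAndEnlistHiveBins also creates and maps the new section view, and (as stated above) the routine is only invoked for bins spanning more than one view, so a bin nested entirely inside a single view never occurs here:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t Start, End; /* half-open [Start, End) file-offset range */
} VIEW_RANGE;

/* Trim every existing view against the bin's range: views fully covered
   by the bin are dropped, views straddling one of its edges are
   truncated, and a dedicated, contiguous view for the bin itself is
   appended. Returns the new view count. */
static size_t RemapBin(VIEW_RANGE *Views, size_t Count, size_t MaxViews,
                       size_t BinStart, size_t BinEnd) {
    size_t out = 0;
    for (size_t i = 0; i < Count; i++) {
        VIEW_RANGE v = Views[i];
        if (v.Start >= BinStart && v.End <= BinEnd)
            continue; /* fully covered by the bin: free/remove it */
        if (v.Start < BinEnd && v.End > BinStart) {
            /* overlaps one edge of the bin: truncate that side */
            if (v.Start < BinStart)
                v.End = BinStart;
            else
                v.Start = BinEnd;
        }
        Views[out++] = v;
    }
    if (out < MaxViews) { /* dedicated view covering the whole bin */
        Views[out].Start = BinStart;
        Views[out].End = BinEnd;
        out++;
    }
    return out;
}
```

Running this against the example layout (two initial views, bin 2 spanning both) shrinks views 1 and 2 and appends a third view covering bin 2, matching the scenario discussed next.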
In our hypothetical scenario, the resulting view layout would be as follows:
As we can see, the kernel shrinks views 1 and 2, and creates a new view 3 corresponding to bin 2 to fill the gap. The final layout of the binary tree of section view descriptors is illustrated below:
Knowing this, we can finally examine the structure of a single view tree entry. It is not included in the public symbols, but I named it _HVP_VIEW. My reverse-engineered version of its definition is as follows:
struct _HVP_VIEW
{
RTL_BALANCED_NODE Node;
LARGE_INTEGER ViewStartOffset;
LARGE_INTEGER ViewEndOffset;
SSIZE_T ValidStartOffset;
SSIZE_T ValidEndOffset;
PBYTE MappingAddress;
SIZE_T LockedPageCount;
_HVP_VIEW_PAGE_FLAGS PageFlags[];
};
The role of each particular field is documented below:
- Node: This is the structure used to link all of the entries into a single red-black tree, passed to helper kernel functions such as RtlRbInsertNodeEx and RtlRbRemoveNode.
- ViewStartOffset and ViewEndOffset: This offset pair specifies the overall byte range covered by the underlying section view object in the hive file. Their difference corresponds to the cumulative length of the red and green boxes in a single row in the diagrams above.
- ValidStartOffset and ValidEndOffset: This offset pair specifies the valid range of the hive accessible through this view, i.e. the green rectangles in the diagrams. It must always be a subset of the [ViewStartOffset, ViewEndOffset] range, and may dynamically change while re-mapping bins (as just shown in this section), as well as when shrinking and extending the hive.
- MappingAddress: This is the base address of the section view mapping in memory, as returned by ZwMapViewOfSection. It is valid in the context of the process specified by _HVP_VIEW_MAP.ProcessTuple (currently always the "Registry" process). It covers the entire range between [ViewStartOffset, ViewEndOffset], but only pages between [ValidStartOffset, ValidEndOffset] are accessible, and the rest of the section view is marked as PAGE_NOACCESS.
- LockedPageCount: Specifies the number of pages locked in virtual memory using ZwLockVirtualMemory within this view.
- PageFlags: A variable-length array that specifies a set of flags for each memory page in the [ViewStartOffset, ViewEndOffset] range.
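The invariants tying these fields together can be expressed with a couple of tiny helpers (both of my own making, purely for illustration): the valid range must be a subset of the view range, and a file offset indexes into PageFlags relative to ViewStartOffset:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 0x1000

/* Given a view covering [ViewStart, ViewEnd) of the hive file, map a
   file offset inside the view to the index of its per-page entry in the
   PageFlags array. */
static size_t PageFlagsIndex(size_t ViewStart, size_t FileOffset) {
    return (FileOffset - ViewStart) / PAGE_SIZE;
}

/* [ValidStart, ValidEnd) must always be a subset of [ViewStart, ViewEnd). */
static int ValidRangeIsConsistent(size_t ViewStart, size_t ViewEnd,
                                  size_t ValidStart, size_t ValidEnd) {
    return ViewStart <= ValidStart && ValidStart <= ValidEnd &&
           ValidEnd <= ViewEnd;
}
```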
I haven't found any (un)official sources documenting the set of supported page flags, so below is my attempt to name them and explain their meaning:
- VIEW_PAGE_VALID (0x1): Indicates if the page is valid – true for pages between [ValidStartOffset, ValidEndOffset], false otherwise. If this flag is clear, all other flags are irrelevant/unused.
  The flag is set:
  - When creating section views during hive loading, first the initial ones in HvpViewMapStart, and then the bin-specific ones in HvpRemapAndEnlistHiveBins.
  - When extending an active hive in HvpViewMapExtendStorage.
  The flag is cleared:
  - When trimming the existing views in HvpRemapAndEnlistHiveBins to make room for new ones.
  - When shrinking the hive in HvpViewMapShrinkStorage.
- VIEW_PAGE_COW_BY_CALLER (0x2): Indicates if the kernel maintains a copy of the page through the copy-on-write (CoW) mechanism, as initiated by a client action, e.g. a registry operation that modified data in a cell and thus resulted in marking the page as dirty.
  The flag is set:
  - When dirtying a hive cell, in HvpViewMapMakeViewRangeCOWByCaller.
  The flag is cleared:
  - When flushing the registry changes to disk, in HvpViewMapMakeViewRangeUnCOWByCaller.
- VIEW_PAGE_COW_BY_POLICY (0x4): Indicates if the kernel maintains a copy of the page through the copy-on-write (CoW) mechanism, as required by the policy that all pages of non-local hives (hives loaded from volumes other than the system volume) must always remain in memory.
  The flag is set:
  - In HvpViewMapMakeViewRangeValid, as an alternative way of keeping a local copy of the hive pages in memory (if locking fails, or the caller doesn't want the pages locked).
  - In HvpViewMapMakeViewRangeCOWByCaller, when converting previously locked pages to the "CoW by policy" state.
  - In HvpMappedViewConvertRegionFromLockedToCOWByPolicy, when lazily converting previously locked pages to the "CoW by policy" state in a thread that runs every 60 seconds (as indicated by CmpLazyLocalizeIntervalInSeconds).
  The flag is cleared:
  - In HvpViewMapMakeViewRangeUnCOWByPolicy, which currently only ever seems to happen for hives loaded from the system volume, i.e. "\SystemRoot" and "\OSDataRoot", as listed in the global CmpWellKnownVolumeList array.
- VIEW_PAGE_WRITABLE (0x8): Indicates if the page is currently marked as writable, typically as a result of a modifying operation on the page that hasn't yet been flushed to disk.
  The flag is set:
  - In HvpViewMapMakeViewRangeCOWByCaller, when marking a cell as dirty.
  The flag is cleared:
  - In HvpViewMapMakeViewRangeUnCOWByCaller, when flushing the hive changes to disk.
  - In HvpViewMapSealRange, when setting the memory as read-only for miscellaneous reasons (after performing log file recovery, etc.).
- VIEW_PAGE_LOCKED (0x10): Indicates if the page is currently locked in physical memory.
  The flag is set:
  - In HvpViewMapMakeViewRangeValid if the caller requests page locking, and there is enough space left in the 64 MiB working set of the Registry process. In practice, this boils down to locking the initial 2 MiB hive mappings created in HvpViewMapStart for all app hives and for normal hives outside of the system disk volume.
  The flag is cleared:
  - Whenever the state of the page changes to CoW-by-policy or Invalid in the following functions:
    - HvpViewMapMakeViewRangeCOWByCaller
    - HvpMappedViewConvertRegionFromLockedToCOWByPolicy
    - HvpViewMapMakeViewRangeUnCOWByPolicy
    - HvpViewMapMakeViewRangeInvalid
The semantics of most of the flags are straightforward, but perhaps VIEW_PAGE_COW_BY_POLICY and VIEW_PAGE_LOCKED warrant a slightly longer explanation. The two flags are mutually exclusive, and they represent nearly identical ways to achieve the same goal: ensure that a copy of each hive page remains resident in memory or a pagefile. Under normal circumstances, the kernel could simply create the necessary section views in their default form, and let the memory management subsystem decide how to handle their pages most efficiently. However, one of the guarantees of the registry is that once a hive has been loaded, it must remain operational for as long as it is active in the system. On the other hand, section views have the property that (parts of) their underlying data may be completely evicted by the kernel, and later re-read from the original storage medium such as the hard drive. So, it is possible to imagine a situation where:
- A hive is loaded from a removable drive (e.g. a CD-ROM or flash drive) or a network share,
- Due to high memory pressure from other applications, some of the hive pages are evicted from memory,
- The removable drive with the hive file is ejected from the system,
- A client subsequently tries to operate on the hive, but parts of it are unavailable and cannot be fetched again from the original source.
This could cause some significant problems and make the registry code fail in unexpected ways. It would also constitute a security vulnerability: the kernel assumes that once it has opened and sanitized the hive file, its contents remain consistent for as long as the hive is used. This is achieved by opening the file with exclusive access, but if the hive data was ever re-read by the Windows memory manager, a malicious removable drive or an attacker-controlled network share could ignore the exclusivity request and provide different, invalid data on the second read. This would result in a kind of "double fetch" condition and potentially lead to kernel memory corruption.
To address both the reliability and security concerns, Windows makes sure to never evict pages corresponding to hives for which exclusive access cannot be guaranteed. This covers hives loaded from a location other than the system volume, and since Windows 10 19H1, also all app hives regardless of the file location. The first way to achieve this is by locking the pages directly in physical memory with a ZwLockVirtualMemory call. It is used for the initial ≤ 2 MiB section views created while loading a hive, up to the working set limit of the Registry process currently set at 64 MiB. The second way is by taking advantage of the copy-on-write mechanism – that is, marking the relevant pages as PAGE_WRITECOPY and subsequently touching each of them using the HvpViewMapTouchPages helper function. This causes the memory manager to create a private copy of each memory page containing the same data as the original, thus preventing them from ever being unavailable for registry operations.
Between the two types of resident pages, the CoW type effectively becomes the default option in the long term. Eventually most pages converge to this state, even if they initially start as locked. This is because locked pages transition to CoW on multiple occasions, e.g. when converted by the background CmpDoLocalizeNextHive thread that runs every 60 seconds, or during the modification of a cell. On the other hand, once a page transitions to the CoW state, it never reverts to being locked. A diagram illustrating the transitions between the page residence states in a hive loaded from removable/remote storage is shown below:
For normal hives loaded from the system volume (i.e. without the VIEW_MAP_MUST_BE_KEPT_LOCAL flag set), the state machine is much simpler:
As a side note, CVE-2024-43452 was an interesting bug that exploited a flaw in the page residency protection logic. The bug arose because some data wasn't guaranteed to be resident in memory and could be fetched twice from a remote SMB share during bin mapping. This occurred early in the hive loading process, before page residency protections were fully in place. The kernel trusted the data from the second read without re-validation, allowing it to be maliciously set to invalid values, resulting in kernel memory corruption.
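The residency rules described above can be condensed into a tiny state checker. The enum and function below are my own illustration rather than kernel code; they merely encode the one-way nature of the locked-to-CoW transition:

```c
#include <assert.h>

/* Residency states for a page of a hive that must be kept local
   (names are mine, derived from the flags discussed above). */
typedef enum {
    PAGE_LOCKED,        /* pinned in physical memory via ZwLockVirtualMemory */
    PAGE_COW_BY_POLICY, /* private copy maintained through copy-on-write */
} PAGE_RESIDENCY;

/* Locked pages may convert to CoW (e.g. via the lazy 60-second thread,
   or when a cell is modified), but a CoW page never reverts to being
   locked. */
static int IsValidTransition(PAGE_RESIDENCY from, PAGE_RESIDENCY to) {
    if (from == PAGE_LOCKED)
        return 1; /* staying locked or converting to CoW are both fine */
    return to == PAGE_COW_BY_POLICY; /* CoW is a one-way street */
}
```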
Cell maps
As discussed in Part 5, almost every cell contains references to other cells in the hive in the form of cell indexes. Consequently, virtually every registry operation involves multiple rounds of translating cell indexes into their corresponding virtual addresses in order to traverse the registry structure. Section views are stored in a red-black tree, so the search complexity is O(log n). This may seem decent, but considering that on a typical system the registry is read much more often than it is extended or shrunk, it makes sense to further optimize the search operation at the cost of less efficient insertion/deletion. And this is exactly what cell maps are: a way of accepting slower insertion/deletion, O(n) instead of O(log n), in exchange for a faster O(1) search. Thanks to this technique, HvpGetCellPaged – perhaps the hottest function in the Windows registry implementation – executes in constant time.
In technical terms, cell maps are pagetable-like structures that divide the 32-bit hive address space into smaller, nested layers consisting of so-called directories, tables, and entries. As a reminder, the layout of cell indexes and cell maps is illustrated in the diagram below, based on a similar diagram in the Windows Internals book, which itself draws from Mark Russinovich's 1999 article, Inside the Registry:
Given the nature of the data structure, the corresponding cell map walk involves dereferencing three nested arrays based on the subsequent 1, 10 and 9-bit parts of the cell index, and then adding the final 12-bit offset to the page-aligned address of the target block. The internal kernel structures matching the respective layers of the cell map are _DUAL, _HMAP_DIRECTORY, _HMAP_TABLE and _HMAP_ENTRY, all publicly accessible via the ntoskrnl.exe PDB symbols. The entry point to the cell map is the Storage array at the end of the _HHIVE structure:
0: kd> dt _HHIVE
nt!_HHIVE
[...]
+0x118 Storage : [2] _DUAL
The index into the two-element array represents the storage type, 0 for stable and 1 for volatile, so a single _DUAL structure describes a 2 GiB view of a specific storage space:
0: kd> dt _DUAL
nt!_DUAL
+0x000 Length : Uint4B
+0x008 Map : Ptr64 _HMAP_DIRECTORY
+0x010 SmallDir : Ptr64 _HMAP_TABLE
+0x018 Guard : Uint4B
+0x020 FreeDisplay : [24] _FREE_DISPLAY
+0x260 FreeBins : _LIST_ENTRY
+0x270 FreeSummary : Uint4B
Let's examine the semantics of each field:
- Length: Expresses the current length of the given storage space in bytes. Directly after loading the hive, the stable length is equal to the size of the hive on disk (including any data recovered from log files, minus the 4096 bytes of the header), and the volatile space is empty by definition. Only cell map entries within the [0, Length - 1] range are guaranteed to be valid.
- Map: Points to the actual directory structure represented by _HMAP_DIRECTORY.
- SmallDir: Part of the "small dir" optimization, discussed in the next section.
- Guard: Its specific role is unclear, as the field is always initialized to 0xFFFFFFFF upon allocation and never used afterwards. I suspect it is a debugging remnant from the early days of registry development, presumably related to the small dir optimization.
- FreeDisplay: A data structure used to optimize searches for free cells during the cell allocation process. It consists of 24 buckets, each corresponding to a specific cell size range and represented by the _FREE_DISPLAY structure, indicating which pages in the hive may potentially contain free cells of the given length.
- FreeBins: The head of a doubly-linked list that links the descriptors of entirely empty bins in the hive, represented by the _FREE_HBIN structures.
- FreeSummary: A bitmask indicating which buckets within FreeDisplay have any hints set for the given cell size. A zero bit at a given position means that there are no free cells of the specific size range anywhere in the hive.
The next level in the cell map hierarchy is the _HMAP_DIRECTORY structure:
0: kd> dt _HMAP_DIRECTORY
nt!_HMAP_DIRECTORY
+0x000 Directory : [1024] Ptr64 _HMAP_TABLE
As we can see, it is simply a 1024-element array of pointers to _HMAP_TABLE:
0: kd> dt _HMAP_TABLE
nt!_HMAP_TABLE
+0x000 Table : [512] _HMAP_ENTRY
Further down, each table is a 512-element array of _HMAP_ENTRY structures (embedded directly rather than referenced by pointer), which form the final level of the cell map:
0: kd> dt _HMAP_ENTRY
nt!_HMAP_ENTRY
+0x000 BlockOffset : Uint8B
+0x008 PermanentBinAddress : Uint8B
+0x010 MemAlloc : Uint4B
This last level contains a descriptor of a single page in the hive and warrants a deeper analysis. Let's start by noting that the four least significant bits of PermanentBinAddress correspond to a set of undocumented flags that control various aspects of the page behavior. I was able to reverse-engineer them and partially recover their names, largely thanks to the fact that some older Windows 10 builds contained non-inlined functions operating on these flags, with revealing names like HvpMapEntryIsDiscardable or HvpMapEntryIsTrimmed:
enum _MAP_ENTRY_FLAGS
{
MAP_ENTRY_NEW_ALLOC = 0x1,
MAP_ENTRY_DISCARDABLE = 0x2,
MAP_ENTRY_TRIMMED = 0x4,
MAP_ENTRY_DUMMY = 0x8,
};
Here's a brief summary of their meaning based on my understanding:
- MAP_ENTRY_NEW_ALLOC: Indicates that this is the first page of a bin. Cell indexes pointing into this page must specify an offset within the range of [0x20, 0xFFF], as they cannot fall into the first 32 bytes that correspond to the _HBIN structure.
- MAP_ENTRY_DISCARDABLE: Indicates that the whole bin is empty and consists of a single free cell.
- MAP_ENTRY_TRIMMED: Indicates that the page has been marked as "trimmed" in HvTrimHive. More specifically, this property is related to hive reorganization, and is set during the loading process on some number of trailing pages that only contain keys accessed during boot, or not accessed at all since the last reorganization. The overarching goal is likely to prevent introducing unnecessary fragmentation in the hive by avoiding mixing together keys with different access histories.
- MAP_ENTRY_DUMMY: Indicates that the page is allocated from the kernel pool and isn't part of a section view.
With this in mind, let's dive into the details of each _HMAP_ENTRY structure member:
- PermanentBinAddress: The lower 4 bits contain the above flags. The upper 60 bits represent the base address of the bin mapping corresponding to this page.
- BlockOffset: This field has a dual functionality. If the MAP_ENTRY_DISCARDABLE flag is set, it is a pointer to a descriptor of a free bin, _FREE_HBIN, linked into the _DUAL.FreeBins linked list. If it is clear (the typical case), it expresses the offset of the page relative to the start of the bin. Therefore, the virtual address of the block's data in memory can be calculated as (PermanentBinAddress & (~0xF)) + BlockOffset.
- MemAlloc: If the MAP_ENTRY_NEW_ALLOC flag is set, it contains the size of the bin, otherwise it is zero.
And this concludes the description of how cell maps are structured. Taking all of it into account, the implementation of the HvpGetCellPaged function starts to make a lot of sense. Its pseudocode comes down to the following:
_CELL_DATA *HvpGetCellPaged(_HHIVE *Hive, HCELL_INDEX Index) {
  _HMAP_ENTRY *Entry = &Hive->Storage[Index >> 31].Map
                            ->Directory[(Index >> 21) & 0x3FF]
                            ->Table[(Index >> 12) & 0x1FF];

  return (_CELL_DATA *)((Entry->PermanentBinAddress & (~0xF)) +
                        Entry->BlockOffset + (Index & 0xFFF) + 4);
}
The same process is followed, for example, by the implementation of the WinDbg !reg cellindex extension, which also translates a pair of a hive pointer and a cell index into the virtual address of the cell.
The small dir optimization

There is one other implementation detail about the cell maps worth mentioning here – the small dir optimization. Let's start with the observation that a majority of registry hives in Windows are relatively small, below 2 MiB in size. This can be easily verified by using the !reg hivelist command in WinDbg, and taking note of the values in the "Stable Length" and "Volatile Length" columns. Most of them contain values ranging from several kilobytes to hundreds of kilobytes. This means that if the kernel allocated the full first-level directory for these hives (taking up 1024 entries × 8 bytes = 8 KiB on 64-bit platforms), they would still only use its first element, leading to a non-trivial waste of memory – especially in the context of the early 1990s when the registry was first implemented. In order to optimize this common scenario, Windows developers employed an unconventional approach: the SmallDir member of the _DUAL structure (of type _HMAP_TABLE*) simulates a one-element "array", and the _DUAL.Map pointer points at it instead of a separate pool allocation whenever possible. Later, whenever the hive grows and requires more than one element of the cell map directory, the kernel falls back to the standard behavior and performs a normal pool allocation for the directory array.
A revised diagram illustrating the cell map layout of a small hive is shown below:
Here, we can see that indexes 1 through 1023 of the directory array are invalid. Instead of correctly initialized _HMAP_TABLE structures, they point into "random" data corresponding to other members of the _DUAL and the larger _CMHIVE structure that happen to be located after _DUAL.SmallDir. Ordinarily, this is merely a low-level detail that doesn't have any meaningful implications, as all actively loaded hives remain internally consistent and always contain cell indexes that remain within the bounds of the hive's storage space. However, if we look at it through the security lens of hive-based memory corruption, this behavior suddenly becomes very interesting. If an attacker was able to implant an out-of-bounds cell index with the directory index greater than 0 into a hive, they would be able to get the kernel to operate on invalid (but deterministic) data as part of the cell map walk, and enable a powerful arbitrary read/write primitive. In addition to the small dir optimization, this technique is also enabled by the fact that the HvpGetCellPaged routine doesn't perform any bounds checks of the cell indexes, instead blindly trusting that they are always valid.
If you are curious to learn more about the exploitation aspect of out-of-bounds cell indexes, it was the main subject of my Practical Exploitation of Registry Vulnerabilities in the Windows Kernel talk given at OffensiveCon 2024 (slides and video recording are available). I will also discuss it in more detail in one of the future blog posts focused specifically on the security impact of registry vulnerabilities.
_CMHIVE structure overview

Beyond the first member of type _HHIVE at offset 0, the _CMHIVE structure contains more than 3 KiB of further information describing an active hive. This data relates to concepts more abstract than memory management, such as the registry tree structure itself. Below, instead of a field-by-field analysis, we'll focus on the general categories of information within _CMHIVE, organized loosely by increasing complexity of the data structures:
- Reference count: a 32-bit refcount primarily used during short-term operations on the hive, to prevent the object from being freed while actively operated on. It is manipulated by the thin wrappers CmpReferenceHive and CmpDereferenceHive.
- File handles and sizes: handles and current sizes of the hive files on disk, such as the main hive file (.DAT) and the accompanying log files (.LOG, .LOG1, .LOG2). The handles are stored in FileHandles array, and the sizes reside in ActualFileSize and LogFileSizes.
- Text strings: some informational strings that may prove useful when trying to identify a hive based on its _CMHIVE structure. For example, the hive file name is stored in FileUserName, and the hive mount point path is stored in HiveRootPath.
- Timestamps: there are several timestamps that can be found in the hive descriptor, such as DirtyTime, UnreconciledTime or LastWriteTime.
- List entries: instances of the _LIST_ENTRY structure used to link the hive into various double-linked lists, such as the global list of hives in the system (HiveList, starting at nt!CmpHiveListHead), or the list of hives within a common trust class (TrustClassEntry).
- Synchronization mechanisms: various objects used to synchronize access to the hive as a whole, or some of its parts. Examples include HiveRundown, SecurityLock and HandleClosePendingEvent.
- Unload history: a 128-element array that stores the number of steps that have been successfully completed in the process of unloading the hive. Its specific purpose is unclear; it might be a debugging artifact retained from older versions of Windows.
- Late unload state: objects related to deferred unloading of registry hives (LateUnloadWorkItemState, LateUnloadFinishedEvent, LateUnloadWorkItem).
- Hive layout information: the hive reorganization process in Windows tries to optimize hives by grouping together keys accessed during system runtime, followed by keys accessed during system boot, followed by completely unused keys. If a hive is structured according to this order during load, the kernel saves information about the boundaries between the three distinct areas in the BootStart, UnaccessedStart and UnaccessedEnd members of _CMHIVE.
- Flushing state and dirty block information: any state that has to do with marking cells as dirty and synchronizing their contents to disk. There are a significant number of fields related to the functionality, with names starting with "Flush...", "Unreconciled..." and "CapturedUnreconciled...".
- Volume context: a pointer to a public _CMP_VOLUME_CONTEXT structure, which provides extended information about the disk volume of the hive file. As an example, it is used in the internal CmpVolumeContextMustHiveFilePagesBeKeptLocal routine to determine whether the volume is a system one, and consequently whether certain security/reliability assumptions are guaranteed for it or not.
- KCB table and root KCB: a table of the globally visible KCB (Key Control Block) structures corresponding to keys in the hive, and a pointer to the root key's KCB. I will discuss KCBs in more detail in the "Key structures" section below.
- Security descriptor cache: a cache of all security descriptors present in the hive, allocated from the kernel pool and thus accessible more efficiently than the underlying hive mappings. In my bug reports, I have often taken advantage of the security cache as a straightforward way to demonstrate the exploitability of security descriptor use-after-frees. A security node UAF can be easily converted into an UAF of its pool-based cached object, which then reliably triggers a Blue Screen of Death when Special Pool is enabled. The security cache of any given hive can be enumerated using the !reg seccache command in WinDbg.
- Transaction-related objects: a pointer to a _CM_RM structure that describes the Resource Manager object associated with the hive, if "heavyweight" transactions (i.e. KTM transactions) are enabled for it.
Last but not least, _CMHIVE has its own Flags field that is different from _HHIVE.Flags. As usual, the flags are not documented, so the listing below is a product of my own analysis:
enum _CM_HIVE_FLAGS
{
CM_HIVE_UNTRUSTED = 0x1,
CM_HIVE_IN_SID_MAPPING_TABLE = 0x2,
CM_HIVE_HAS_RM = 0x8,
CM_HIVE_IS_VIRTUALIZABLE = 0x10,
CM_HIVE_APP_HIVE = 0x20,
CM_HIVE_PROCESS_PRIVATE = 0x40,
CM_HIVE_MUST_BE_REORGANIZED = 0x400,
CM_HIVE_DIFFERENCING_WRITETHROUGH = 0x2000,
CM_HIVE_CLOUDFILTER_PROTECTED = 0x10000,
};
A brief description of each of them is as follows:
- CM_HIVE_UNTRUSTED: the hive is "untrusted" in the sense of registry symbolic links; in other words, it is not one of the default system hives loaded on boot. The distinction is that trusted hives can freely link to all other hives in the system, while untrusted ones can only link to hives within their so-called trust class. This is to prevent confused deputy-style privilege escalation attacks in the system.
- CM_HIVE_IN_SID_MAPPING_TABLE: the hive is linked into an internal data structure called the "SID mapping table" (nt!CmpSIDToHiveMapping), used to efficiently look up the user class hives mounted at \Registry\User\<SID>_Classes for the purposes of registry virtualization.
- CM_HIVE_HAS_RM: KTM transactions are enabled for this hive, meaning that the corresponding .blf and .regtrans-ms files are present in the same directory as the main hive file. The flag is clear if the hive is an app hive or if it was loaded with the REG_HIVE_NO_RM flag set.
- CM_HIVE_IS_VIRTUALIZABLE: accesses to this hive may be subject to registry virtualization. As far as I know, the only hive with this flag set is currently HKLM\SOFTWARE, which seems in line with the official documentation.
- CM_HIVE_APP_HIVE: this is an app hive, i.e. it was loaded under \Registry\A with the REG_APP_HIVE flag set.
- CM_HIVE_PROCESS_PRIVATE: this hive is private to the loading process, i.e. it was loaded with the REG_PROCESS_PRIVATE flag set.
- CM_HIVE_MUST_BE_REORGANIZED: the hive fragmentation threshold (by default 1 MiB) has been exceeded, and the hive should undergo the reorganization process at the next opportunity. The flag is simply a means of communication between the CmCheckRegistry and CmpReorganizeHive internal routines, both of which execute during hive loading.
- CM_HIVE_DIFFERENCING_WRITETHROUGH: this is a delta hive loaded in the writethrough mode, which technically means that the DIFF_HIVE_WRITETHROUGH flag was specified in the DiffHiveFlags member of the VRP_LOAD_DIFFERENCING_HIVE_INPUT structure, as discussed in Part 4.
- CM_HIVE_CLOUDFILTER_PROTECTED: new flag added in December 2024 as part of the fix for CVE-2024-49114. It indicates that the hive file has been protected against being converted to a Cloud Filter placeholder by setting the "$Kernel.CFDoNotConvert" extended attribute (EA) on the file in CmpAdjustFileCFSafety.
This concludes the documentation of the hive descriptor structure, arguably the largest and most complex object in the Windows registry implementation.
Key structures

The second most important objects in the registry are keys. They can basically be thought of as the essence of the registry, as nearly every registry operation involves them in some way. They are also the one and only registry element that is tightly integrated with the Windows NT Object Manager. This comes with many benefits, as client applications can operate on the registry using standardized handles, and can leverage automatic security checks and object lifetime management. However, this integration also presents its own challenges, as it requires the Configuration Manager to interact with the Object Manager correctly and handle its intricacies and edge cases securely. For this reason, internal key-related structures play a crucial role in the registry implementation. They help organize key state in a way that simplifies keeping it up-to-date and internally consistent. For security researchers, understanding these structures and their semantics is invaluable. This knowledge enables you to quickly identify bugs in existing code or uncover missing handling of unusual but realistic conditions.
The two fundamental key structures in the Windows kernel are the key body (_CM_KEY_BODY) and key control block (_CM_KEY_CONTROL_BLOCK). The key body is directly associated with a key handle in the NT Object Manager, similar to the role that the _FILE_OBJECT structure plays for file handles. In other words, this is the initial object that the kernel obtains whenever it calls ObReferenceObjectByHandle to reference a user-supplied handle. A number of key body structures may exist concurrently for a single key whenever several programs hold active handles to it. Conversely, the key control block represents the global state of a specific key and is used to manage its general properties. This means that for most keys in the system, there is at most one KCB allocated at a time. There may be no KCB for keys that haven't been accessed yet (as they are initialized by the kernel lazily), and there may be more than one KCB for the same registry path if the key has been deleted and created again (these two instances of the key are treated as separate entities, with one of them being marked as deleted/non-existent). Taking this into account, the relationship between key bodies and KCBs is many-to-one, with all of the key bodies of a single KCB being connected in a doubly-linked list, as shown in the diagram below:
The following subsections provide more detail about each of these two structures.
Key body

The key body structure is allocated and initialized in the internal CmpCreateKeyBody routine, and freed by the NT Object Manager when all references to the object are dropped. It is a relatively short and simple object with the following definition:
0: kd> dt _CM_KEY_BODY
nt!_CM_KEY_BODY
+0x000 Type : Uint4B
+0x004 AccessCheckedLayerHeight : Uint2B
+0x008 KeyControlBlock : Ptr64 _CM_KEY_CONTROL_BLOCK
+0x010 NotifyBlock : Ptr64 _CM_NOTIFY_BLOCK
+0x018 ProcessID : Ptr64 Void
+0x020 KeyBodyList : _LIST_ENTRY
+0x030 Flags : Pos 0, 16 Bits
+0x030 HandleTags : Pos 16, 16 Bits
+0x038 Trans : _CM_TRANS_PTR
+0x040 KtmUow : Ptr64 _GUID
+0x048 ContextListHead : _LIST_ENTRY
+0x058 EnumerationResumeContext : Ptr64 Void
+0x060 RestrictedAccessMask : Uint4B
+0x064 LastSearchedIndex : Uint4B
+0x068 LockedMemoryMdls : Ptr64 Void
Let's quickly go over each field:
- Type: for normal keys (i.e. almost all of them), this field is set to a magic value of 0x6B793032 ('ky02'). However, for predefined keys, this is the 32-bit value of the link's target key with the highest bit set. This member is therefore used to distinguish between regular keys and predefined ones, for example in CmObReferenceObjectByHandle. Predefined keys have now been largely deprecated, but it is still possible to observe a non-standard Type value by opening a handle to one of the last two remaining ones: HKLM\Software\Microsoft\Windows NT\CurrentVersion\Perflib\009 and CurrentLanguage under the same path.
- AccessCheckedLayerHeight: a new field added in November 2023 as part of the fix for CVE-2023-36404. It is used for layered keys and contains the index of the lowest layer in the key stack that was access-checked when opening the key. It is later taken into account during other registry operations, in order to avoid leaking data from lower-layer, more restrictive keys that could have been created since the handle was opened.
- KeyControlBlock: a pointer to the corresponding key control block.
- NotifyBlock: an optional pointer to the notify block associated with this handle. This is related to the key notification functionality in Windows and is described in more detail in the "Key notification structures" section below.
- ProcessID: the PID of the process that created the handle. It doesn't seem to serve any purpose in the kernel other than to be enumerable using the NtQueryOpenSubKeysEx system call (which requires SeRestorePrivilege, and is therefore available to administrators only).
- KeyBodyList: the list entry used to link all the key bodies within a single KCB together.
- Flags: a set of flags concerning the specific key body. Here's my interpretation of them based on reverse engineering:
- KEY_BODY_HIVE_UNLOADED (0x1): indicates that the underlying hive of the key has been unloaded and is no longer active.
- KEY_BODY_DONT_RELOCK (0x2): this seems to be a short-term flag used to communicate between CmpCheckKeyBodyAccess/CmpCheckOpenAccessOnKeyBody and the nested CmpDoQueryKeyName routine, in order to indicate that the key's KCB is already locked and shouldn't be relocked again.
- KEY_BODY_DONT_DEINIT (0x4): if this flag is set, CmpDeleteKeyObject returns early and doesn't proceed with the regular deinitialization of the key body object. However, it is unclear if/where the flag is set in the code, as I personally haven't found any instances of it happening during my analysis.
- KEY_BODY_DELETED (0x8): indicates that the key has been deleted since the handle was opened, and it no longer exists.
- KEY_BODY_DONT_VIRTUALIZE (0x10): indicates that registry virtualization is disabled for this handle, as a result of opening the key with the (undocumented but present in SDK headers) REG_OPTION_DONT_VIRTUALIZE flag.
- HandleTags: from the kernel perspective, this is simply a general purpose 16-bit storage that can be set by clients on a per-handle basis using NtSetInformationKey with the KeySetHandleTagsInformation information class, and queried with NtQueryKey and the KeyHandleTagsInformation information class. As far as I know, the kernel doesn't dictate how this field should be used and leaves it up to the registry clients. In practice, it seems to be mostly used for purposes related to WOW64 and the Registry Redirector, storing flags such as KEY_WOW64_64KEY (0x100) and KEY_WOW64_32KEY (0x200), as well as some internal ones. The WOW64 functionality is implemented in KernelBase.dll, and functions such as ConstructKernelKeyPath and LocalBaseRegOpenKey are a good starting point for reverse engineering, if you're curious to learn more. I have also observed the 0x1000 handle tag being set in the internal IopApplyMutableTagToRegistryKey kernel routine for keys such as HKLM\System\ControlSet001\Control\Class\{4D36E968-E325-11CE-BFC1-08002BE10318}\0000, but I'm unsure of its meaning.
- Trans: Indicates the transactional state of the handle. If the handle is not transacted (i.e. it wasn't opened with one of RegOpenKeyTransacted or RegCreateKeyTransacted), it is set to zero. Otherwise, the lowest bit specifies the type of the transaction: 0 for KTM and 1 for lightweight transactions. The remaining bits form a pointer to the associated transaction object, either of the TmTransactionObjectType type (represented by the _KTRANSACTION structure), or of the CmRegistryTransactionType type (represented by a non-public structure that I've personally named _CM_LIGHTWEIGHT_TRANS_OBJECT).
- KtmUow: if the handle is associated with a KTM transaction, this field stores the GUID that uniquely identifies it. For non-transacted and lightweight-transacted handles, the field is unused.
- ContextListHead: this is the head of the doubly-linked list of contexts that have been associated with the key body using the CmSetCallbackObjectContext function. It is related to the registry callbacks functionality; see also the Specifying Context Information MSDN article for more details.
- EnumerationResumeContext: this is part of an optimization of the subkey enumeration process of layered keys (implemented in CmpEnumerateLayeredKey). Performing full enumeration of a layered key from scratch up to the given index is a very complex task, and repeating it over and over for each iteration of an enumeration loop would be very inefficient. The resume context helps address the problem for sequential enumeration by saving the intermediate state reached at an NtEnumerateKey call with a given index, and being able to resume from it when a request for index+1 comes next. It also has the added benefit of making it possible to stop and restart the enumeration process in the scope of a single system call, which is used to pause the operation and temporarily release some locks if the code detects that the registry is particularly congested. This happens at the intersection of the CmEnumerateKey and CmpEnumerateLayeredKey functions, with the latter potentially returning STATUS_RETRY and the former resuming the operation if such a situation arises.
- RestrictedAccessMask, LastSearchedIndex, LockedMemoryMdls: relatively new fields introduced in Windows 10 and 11, which I haven't looked very deeply into and thus won't discuss in detail here.
After a key handle is translated into the corresponding _CM_KEY_BODY structure using the ObReferenceObjectByHandle(CmKeyObjectType) call, typically early in the execution of a registry-related system call, there are three primary operations that are usually performed. First, the kernel does a key status check by evaluating the expression KeyBody.Flags & 9 to determine if the key is associated with an unloaded hive (flag 0x1) or has been deleted (flag 0x8). This check is essential because most registry operations are only permitted on active, existing keys, and enforcing this condition is a fundamental step for guaranteeing registry state consistency. Second, the code accesses the KeyControlBlock pointer, which provides further access to the hive pointer (KCB.KeyHive), the key's cell index (KCB.KeyCell), and other necessary fields and data structures required to perform any meaningful read/write actions on the key. Finally, the code checks the key body's Trans/KtmUow members to determine if the handle is part of a transaction, and if so, the transaction is used as additional context for the action requested by the caller. Accesses to other members of the _CM_KEY_BODY structure are less frequent and serve more specialized purposes.
Key control block

The key control block object can be thought of as the heart of the Windows kernel registry tree representation. It is effectively the descriptor of a single key in the system, and the second most important key-related object after the key node. It is always allocated from the kernel pool, and serves four main purposes:
- Mirrors frequently used information from the key node to make it faster to access by the kernel code. This includes building an efficient, in-memory representation of the registry tree to optimize the traversal time when referring to registry paths.
- Works as a single point of reference for all active handles to a specific key, and helps synchronize access to the key in the multithreaded Windows environment.
- Represents any pending, transacted state of the registry key that has been introduced by a client, but not fully committed yet.
- Represents any complex relationships between registry keys that extend beyond the internal structure of the hive. The primary example are differencing hives, which are overlaid on top of each other, and whose corresponding keys form so-called key stacks.
Blog post #2 in this series highlighted the dramatic growth of the registry codebase across successive Windows versions, illustrating the subsystem's steady expansion over the last few decades. Similarly, the size of the Key Control Block (KCB) itself has nearly doubled over time, from 168 bytes in Windows XP x64 to 312 bytes in the latest Windows 11 release. This expansion underscores the increasing amount of information associated with every registry key, which the kernel must manage consistently and securely.
The KCB structure layout is present in the PDB symbols and can be displayed in WinDbg:
0: kd> dt _CM_KEY_CONTROL_BLOCK
nt!_CM_KEY_CONTROL_BLOCK
+0x000 RefCount : Uint8B
+0x008 ExtFlags : Pos 0, 16 Bits
+0x008 Freed : Pos 16, 1 Bit
+0x008 Discarded : Pos 17, 1 Bit
+0x008 HiveUnloaded : Pos 18, 1 Bit
+0x008 Decommissioned : Pos 19, 1 Bit
+0x008 SpareExtFlag : Pos 20, 1 Bit
+0x008 TotalLevels : Pos 21, 10 Bits
+0x010 KeyHash : _CM_KEY_HASH
+0x010 ConvKey : _CM_PATH_HASH
+0x018 NextHash : Ptr64 _CM_KEY_HASH
+0x020 KeyHive : Ptr64 _HHIVE
+0x028 KeyCell : Uint4B
+0x030 KcbPushlock : _EX_PUSH_LOCK
+0x038 Owner : Ptr64 _KTHREAD
+0x038 SharedCount : Int4B
+0x040 DelayedDeref : Pos 0, 1 Bit
+0x040 DelayedClose : Pos 1, 1 Bit
+0x040 Parking : Pos 2, 1 Bit
+0x041 LayerSemantics : UChar
+0x042 LayerHeight : Int2B
+0x044 Spare1 : Uint4B
+0x048 ParentKcb : Ptr64 _CM_KEY_CONTROL_BLOCK
+0x050 NameBlock : Ptr64 _CM_NAME_CONTROL_BLOCK
+0x058 CachedSecurity : Ptr64 _CM_KEY_SECURITY_CACHE
+0x060 ValueList : _CHILD_LIST
+0x068 LinkTarget : Ptr64 _CM_KEY_CONTROL_BLOCK
+0x070 IndexHint : Ptr64 _CM_INDEX_HINT_BLOCK
+0x070 HashKey : Uint4B
+0x070 SubKeyCount : Uint4B
+0x078 KeyBodyListHead : _LIST_ENTRY
+0x078 ClonedListEntry : _LIST_ENTRY
+0x088 KeyBodyArray : [4] Ptr64 _CM_KEY_BODY
+0x0a8 KcbLastWriteTime : _LARGE_INTEGER
+0x0b0 KcbMaxNameLen : Uint2B
+0x0b2 KcbMaxValueNameLen : Uint2B
+0x0b4 KcbMaxValueDataLen : Uint4B
+0x0b8 KcbUserFlags : Pos 0, 4 Bits
+0x0b8 KcbVirtControlFlags : Pos 4, 4 Bits
+0x0b8 KcbDebug : Pos 8, 8 Bits
+0x0b8 Flags : Pos 16, 16 Bits
+0x0bc Spare3 : Uint4B
+0x0c0 LayerInfo : Ptr64 _CM_KCB_LAYER_INFO
+0x0c8 RealKeyName : Ptr64 Char
+0x0d0 KCBUoWListHead : _LIST_ENTRY
+0x0e0 DelayQueueEntry : _LIST_ENTRY
+0x0e0 Stolen : Ptr64 UChar
+0x0f0 TransKCBOwner : Ptr64 _CM_TRANS
+0x0f8 KCBLock : _CM_INTENT_LOCK
+0x108 KeyLock : _CM_INTENT_LOCK
+0x118 TransValueCache : _CHILD_LIST
+0x120 TransValueListOwner : Ptr64 _CM_TRANS
+0x128 FullKCBName : Ptr64 _UNICODE_STRING
+0x128 FullKCBNameStale : Pos 0, 1 Bit
+0x128 Reserved : Pos 1, 63 Bits
+0x130 SequenceNumber : Uint8B
I will not document each member individually, but will instead cover them in larger groups according to their common themes and functions.
Reference count

Key Control Blocks are among the most frequently referenced registry objects, as almost every persistent registry operation involves an associated KCB. These blocks are referenced in various ways: by a subkey's KCB.ParentKcb pointer, a symbolic link key's KCB.LinkTarget pointer, through the global KCB tree, via open key handles (and the corresponding key bodies), in pending transacted operations (e.g., the _CM_KCB_UOW.KeyControlBlock pointer), and so on.
For system stability and security, it's crucial to accurately track all these active KCB references. This is done using the RefCount field, the first member in the KCB structure (offset 0x0). Historically a 16-bit field, it was later widened to a 32-bit integer, and on modern systems it is a native word in size, i.e. 64 bits on x64 platforms. Whenever kernel code needs to operate on a KCB or store a pointer to it, it should increment the RefCount using functions from the CmpReferenceKeyControlBlock family. Conversely, when a KCB reference is no longer needed, functions like CmpDereferenceKeyControlBlock should decrement the count. When RefCount reaches zero, the kernel knows the structure is no longer in use and can safely free it.
Besides standard reference counting, KCBs employ optimizations to delay certain memory management processes. This avoids excessive KCB allocation and deallocation when a KCB is briefly unreferenced. Two mechanisms are used: delay deref and delay close. The former delays the actual refcount decrement, while the latter postpones object deallocation even after RefCount reaches zero. Callers must use the specialized function CmpDelayDerefKeyControlBlock for the delayed dereference.
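The delay-close idea can be illustrated with a minimal sketch. All types and names below are invented for this example; the kernel's actual implementation is considerably more involved:

```cpp
#include <cstddef>
#include <list>

// Illustrative entry with a plain refcount and a "parked" marker.
struct Entry {
  int refCount = 1;
  bool onDelayList = false;
};

class DelayCloseQueue {
 public:
  // When the count drops to zero, park the entry instead of freeing it.
  void Dereference(Entry* e) {
    if (--e->refCount == 0) {
      e->onDelayList = true;
      delayed_.push_back(e);  // parked, not freed
    }
  }
  // Re-referencing a parked entry resurrects it cheaply,
  // avoiding a redundant free-allocate sequence.
  void Reference(Entry* e) {
    if (e->refCount++ == 0) {
      e->onDelayList = false;
      delayed_.remove(e);
    }
  }
  // A periodic worker would actually free whatever is still parked;
  // here we just report how many entries it would reclaim.
  std::size_t Flush() {
    std::size_t freed = delayed_.size();
    for (Entry* e : delayed_) e->onDelayList = false;
    delayed_.clear();
    return freed;
  }
 private:
  std::list<Entry*> delayed_;
};
```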
From a low-level security perspective, it's worth considering potential issues related to the reference counting. Integer overflow might seem like a possibility, but it's practically impossible due to the field's width and additional overflow protection present in the CmpReferenceKeyControlBlock-like functions. A more realistic concern is a scenario where the kernel accidentally decrements the refcount by a larger value than the number of released references. This could lead to premature KCB deallocation and a use-after-free condition. Therefore, accurate KCB reference counting is a crucial area to investigate when researching Windows for registry vulnerabilities.
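As an illustration of the overflow protection mentioned above, here is a minimal sketch of a checked reference increment that refuses to wrap the counter. The structure and function names are made up for this example; this is not the kernel's actual code:

```cpp
#include <cstdint>

// Illustrative stand-in for a reference-counted object.
struct RefCountedStub {
  uintptr_t RefCount = 1;  // native word size, as on modern systems
};

// Returns false instead of wrapping the counter around to zero,
// in the spirit of the CmpReferenceKeyControlBlock family.
bool ReferenceStub(RefCountedStub* obj) {
  if (obj->RefCount == UINTPTR_MAX) {
    return false;  // refuse the reference rather than overflow
  }
  obj->RefCount++;
  return true;
}

// Returns true when the count hits zero and the object could be freed.
bool DereferenceStub(RefCountedStub* obj) {
  return --obj->RefCount == 0;
}
```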
Basic key information
As mentioned earlier, one of the most important types of information in the KCB is the unique identifier of the key in the hive, consisting of the _HHIVE descriptor pointer (KeyHive) and the corresponding key cell index (KeyCell). Very frequently, the kernel uses these two members to obtain the address of the key node mapping, which resembles the following pattern in the decompiled code:
_HHIVE *Hive = Kcb->KeyHive;
_CM_KEY_NODE *KeyNode = Hive->GetCellRoutine(Hive, Kcb->KeyCell);
//
// Further operations on KeyNode...
//
Cached data from the key node
Whenever some information about a key needs to be queried based on its handle, it is generally more efficient to read it from the KCB than the key node. The reason is that a pool-based KCB access requires fewer memory fetches (it avoids the cell map walk), bypasses the context switch to the Registry process, and eliminates the potential need to page in hive data from disk. Consequently, the following types of information are cached inside KCBs:
- Key name, which is stored in a public _CM_NAME_CONTROL_BLOCK structure and pointed to by the NameBlock member. Every unique key name in the system has its own instance of the _CM_NAME_CONTROL_BLOCK object, which is reference-counted and shared across all KCBs of keys with that name. This is an optimization designed to prevent storing multiple redundant copies of the same string in kernel memory.
- Flags, stored in the Flags member and being an exact copy of the _CM_KEY_NODE.Flags value. There is also the KcbUserFlags field that caches the value of _CM_KEY_NODE.UserFlags, and KcbVirtControlFlags, which caches the value of _CM_KEY_NODE.VirtControlFlags. The semantics of all of these bitmasks were discussed in Part 5.
- Security descriptor, stored in a separate _CM_KEY_SECURITY_CACHE structure and pointed to by CachedSecurity.
- Subkey count, stored in the SubKeyCount field. It expresses the cumulative number of the key's stable and volatile subkeys, i.e. it is equal to the sum of _CM_KEY_NODE.SubKeyCounts[0] and SubKeyCounts[1].
- Value list, stored in the ValueList structure of type _CHILD_LIST, and equivalent to _CM_KEY_NODE.ValueList.
- Key limits, represented by KcbMaxNameLen, KcbMaxValueNameLen and KcbMaxValueDataLen. They correspond to the key node fields with the same names without the "Kcb" prefix.
- Fully qualified path, stored in FullKCBName. It is lazily initialized in the internal CmpConstructAndCacheName function, either when resolving a symbolic link, or as a result of calling the documented CmCallbackGetKeyObjectID API. A previously initialized path may be marked as stale by setting FullKCBNameStale (the least significant bit of the FullKCBName pointer).
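The stale-bit encoding in the last bullet can be illustrated with a short tagged-pointer sketch. The helper names below are made up; only the bit layout (flag in bit 0 of the pointer) follows the text:

```cpp
#include <cstdint>

// Pointers to a _UNICODE_STRING are at least 2-byte aligned, so bit 0
// of a FullKCBName-style pointer is free to act as the stale flag.

inline uintptr_t MarkStale(uintptr_t fullKcbName) {
  return fullKcbName | 1;  // set the FullKCBNameStale bit
}

inline bool IsStale(uintptr_t fullKcbName) {
  return (fullKcbName & 1) != 0;
}

inline uintptr_t NamePointer(uintptr_t fullKcbName) {
  return fullKcbName & ~static_cast<uintptr_t>(1);  // strip the flag bit
}
```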
It is essential for system security that the information found in KCBs is always synchronized with their key node counterparts. This is one of the most fundamental assumptions of the Windows registry implementation, and failure to guarantee it typically results in memory corruption or other severe security vulnerabilities.
Extended flags
In addition to the flags fields that simply mirror the corresponding values from the key node, like Flags, KcbUserFlags and KcbVirtControlFlags, there is also a set of extended flags that are KCB-specific. They are stored in the following fields:
+0x008 ExtFlags : Pos 0, 16 Bits
+0x008 Freed : Pos 16, 1 Bit
+0x008 Discarded : Pos 17, 1 Bit
+0x008 HiveUnloaded : Pos 18, 1 Bit
+0x008 Decommissioned : Pos 19, 1 Bit
+0x008 SpareExtFlag : Pos 20, 1 Bit
[...]
+0x040 DelayedDeref : Pos 0, 1 Bit
+0x040 DelayedClose : Pos 1, 1 Bit
+0x040 Parking : Pos 2, 1 Bit
For the eight explicitly defined flags, here's a brief explanation:
- Freed: the KCB has been freed, but the underlying pool allocation may still be alive as part of the CmpFreeKCBListHead (older systems) or CmpKcbLookaside (Windows 10 and 11) lookaside lists.
- Discarded: the KCB has been unlinked from the global KCB tree and is not available for name-based lookups, but there may still be active references to it via open handles. It is typically set for keys that have been deleted, and for old instances of keys that have been renamed.
- HiveUnloaded: the underlying hive has been unloaded.
- Decommissioned: the KCB is no longer used (its reference count dropped to zero) and it is ready to be freed, but it hasn't been freed just yet.
- SpareExtFlag: as the name suggests, this is a spare bit that may be associated with a new flag in the future.
- DelayedDeref: the key is subject to a "delayed deref" mechanism, due to having been dereferenced using CmpDelayDerefKeyControlBlock instead of CmpDereferenceKeyControlBlock. This serves to defer the actual dereferencing of the KCB by some time, anticipating its near-future need and thus avoiding a redundant free-allocate sequence.
- DelayedClose: the key is subject to a "delayed close" mechanism, which is similar to delayed deref, but it involves delaying the freeing of a KCB structure even if its refcount has dropped to zero.
- Parking: the purpose of this bit is unclear, and it seems to be currently unused.
Last but not least, the ExtFlags member stores a further set of flags, which can be expressed as the following enum:
enum _CM_KCB_EXT_FLAGS
{
CM_KCB_NO_SUBKEY = 0x1,
CM_KCB_SUBKEY_ONE = 0x2,
CM_KCB_SUBKEY_HINT = 0x4,
CM_KCB_SYM_LINK_FOUND = 0x8,
CM_KCB_KEY_NON_EXIST = 0x10,
CM_KCB_NO_DELAY_CLOSE = 0x20,
CM_KCB_INVALID_CACHED_INFO = 0x40,
CM_KCB_READ_ONLY_KEY = 0x80,
CM_KCB_READ_ONLY_SUBKEY = 0x100,
};
Let's break it down:
- CM_KCB_NO_SUBKEY, CM_KCB_SUBKEY_ONE, CM_KCB_SUBKEY_HINT: these flags are currently obsolete, and were originally related to an old performance optimization. CM_KCB_NO_SUBKEY indicated that the key had no subkeys. CM_KCB_SUBKEY_ONE indicated that the key had exactly one subkey, and its 32-bit hint value was stored in KCB.HashKey. Finally, CM_KCB_SUBKEY_HINT indicated that the hints of all subkeys were stored in a dynamically allocated buffer pointed to by KCB.IndexHint. According to my analysis, none of the flags seem to be used in modern versions of Windows, even though their related fields in the KCB structure still exist.
- CM_KCB_SYM_LINK_FOUND: indicates that the key is a symbolic link whose target KCB has already been resolved during a previous access, and is cached in KCB.CachedChildList.RealKcb (older systems) or KCB.LinkTarget (Windows 10 and 11). It is an optimization designed to speed up the process of traversing symlinks, by performing the path lookup only once and later referring directly to the cached KCB where possible.
- CM_KCB_KEY_NON_EXIST: this is another deprecated flag that existed in historical implementations of the registry, but doesn't seem to be used anymore.
- CM_KCB_NO_DELAY_CLOSE: indicates that the key mustn't be subject to the "delayed close" mechanism, and instead should be freed as soon as all references to it are dropped.
- CM_KCB_INVALID_CACHED_INFO: this flag simply indicates that the IndexHint/HashKey/SubKeyCount fields contain out-of-date information that shouldn't be relied on.
- CM_KCB_READ_ONLY_KEY: this key is designated as read-only and, therefore, is not modifiable. The flag can be set by using the undocumented NtLockRegistryKey system call, which can only be called from kernel-mode. Shout out to James Forshaw who wrote an interesting post about it on his blog.
- CM_KCB_READ_ONLY_SUBKEY: the exact meaning and usage of the flag is unclear, but it appears to be enabled for keys with at least one descendant subkey marked as read-only. Specifically, the internal CmLockKeyForWrite function (the main routine behind NtLockRegistryKey's logic) sets it iteratively for every parent key of the read-only key, up to and including the hive's root.
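The propagation behavior attributed to CmLockKeyForWrite above can be sketched as follows. The KcbStub type and the function name are illustrative; the real routine operates on actual KCBs and stops at the hive's root rather than walking an arbitrary chain:

```cpp
#include <cstdint>

// Illustrative stand-in for a KCB, with only the fields this sketch needs.
struct KcbStub {
  uint32_t ExtFlags = 0;
  KcbStub* ParentKcb = nullptr;  // NULL for the topmost key in the sketch
};

constexpr uint32_t CM_KCB_READ_ONLY_KEY = 0x80;
constexpr uint32_t CM_KCB_READ_ONLY_SUBKEY = 0x100;

// Mark a key as read-only, then flag every ancestor as having a
// read-only descendant, mirroring the iterative walk described above.
void LockKeyForWriteSketch(KcbStub* key) {
  key->ExtFlags |= CM_KCB_READ_ONLY_KEY;
  for (KcbStub* p = key->ParentKcb; p != nullptr; p = p->ParentKcb) {
    p->ExtFlags |= CM_KCB_READ_ONLY_SUBKEY;
  }
}
```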
To optimize access, the KCB stores the first four key bodies in the KeyBodyArray field, enabling fast, lockless access to them; the KeyBodyListHead field maintains the head of a doubly-linked list for any additional ones.
KCB lock
The KcbPushlock member within the KCB structure is a lock used to synchronize access to the key during various registry system calls. This lock is passed to standard kernel pushlock APIs, such as ExAcquirePushLockSharedEx, ExAcquirePushLockExclusiveEx, and ExReleasePushLockEx.
Transacted state
The key control block is central to managing the transacted state of registry keys, maintaining pending changes in memory before they are committed to the hive. Several fields within the KCB are specifically dedicated to this function:
- KCBUoWListHead: This field is a list head that anchors a list of Unit of Work (UoW) structures. Each UoW represents a specific action taken within a transaction, such as creating or deleting a key, or setting or deleting a value. This list allows the system to track all pending transactional operations related to a particular key, and it is crucial for ensuring atomicity, as it records the operations that must be applied or rolled back as a single unit.
- TransKCBOwner: This field is used to identify the transaction object that "owns" the key. It is set on the KCBs of transactionally created keys, and signifies that the key is currently only visible in the context of the specific transaction. Once the transaction commits, this field is cleared, and the key becomes visible in the global registry tree.
- KCBLock and KeyLock: Two so-called intent locks of type _CM_INTENT_LOCK, which are used to ensure that no two transactions can be associated with a single key if their respective operations could invalidate each other's state. According to my understanding, KCBLock protects the consistency of the KCB in this regard, and KeyLock protects the key node. The !reg ixlock WinDbg command is designed to display the internal state of these locks.
- TransValueCache: This field is a structure that caches value entries associated with a particular KCB, if at least one of its values has been modified in an active transaction. Before a value is set, modified or deleted within a transaction for the first time, a copy of the current value list is taken and stored here. When a transaction is committed, the TransValueCache state is applied back to the key's persistent value list. On rollback, the list is simply discarded.
- TransValueListOwner: This field is a pointer to a transaction that currently "owns" the TransValueCache. At any given time, for each key, there may be at most one active transaction that has any pending operations involving the key's values.
These fields collectively form the core of transaction management within the Windows registry. Ever since their introduction in Windows Vista, they must be handled correctly as part of every registry action, whether it reads or writes, and whether it is transacted or not. This is because the kernel must incorporate any pending transacted state into information queries, must not allow two contradictory transactions to exist at the same time, and must not let a non-transacted operation break the assumptions of an active transaction without invalidating it first. Bugs in managing the transacted state may have significant security implications, with some interesting examples being CVE-2023-21748 and CVE-2023-23420. The specific structures used to store the transacted state, such as _CM_TRANS or _CM_KCB_UOW, are discussed in more detail in the "Transaction structures" section below.
Layered key state
Layered keys were introduced in Windows 10 version 1607 to support containerisation through differencing hives. Because overlaying hives on top of each other is primarily a runtime concept, the Key Control Block (KCB) is the natural place to hold the state related to this feature, and there are three main members involved in this process:
- LayerSemantics: This 2-bit field indicates the state of a key within the layering system. It is an exact copy of the key's _CM_KEY_NODE.LayerSemantics value, cached in KCB for easier/quicker access. For a detailed overview of its possible values, please refer to Part 5.
- LayerHeight: This field specifies the level of the key within the differencing hive stack. A higher LayerHeight indicates that the key is higher up in the stack of layered hives, and a value of zero is used for base hives (i.e. normal non-differencing hives loaded on the host system).
- LayerInfo: This is a pointer to a _CM_KCB_LAYER_INFO structure, which describes the key's position within the stack of differencing hives. Among other things, it contains a pointer to the lower layer on the key stack, and the head of a list of layers above the current one.
The specifics of the structures associated with this functionality are discussed in the "Layered keys" section below.
KCB tree structure
While key bodies are a common way to access KCB structures, they're not the only method. They are integral when you have an open handle to a key, as operations on the handle follow the handle → key body → KCB translation path. However, looking up keys by name or path is also crucial. Whether a key is opened or created, it relies on either an existing handle and a relative path (single subkey name or a longer path with backslash-separated names), or an absolute path starting with "\Registry\". In this scenario, the kernel needs to quickly check if a KCB exists for the given key and to obtain its address if it does. To achieve this, KCBs are organized into their own tree structure, which the kernel can traverse. The tree is rooted in CmpRegistryRootObject (specifically CmpRegistryRootObject->KeyControlBlock, as CmpRegistryRootObject itself is the key body representing the \Registry key), and mirrors the current registry layout from a high-level perspective.
Let's highlight several key points:
- KCB Existence: There's no guarantee that a corresponding KCB exists for every registry key. KCBs are allocated lazily only when a key is opened, created, or when a KCB that depends on the one being created is about to be allocated.
- Consistent KCB Tree Structure: The KCB tree structure is always consistent. If a KCB exists for a key, then KCBs for all its ancestors up to the root \Registry key must also exist.
- Cached Information in KCBs: KCBs contain cached information from the key node, plus additional runtime information that may not yet be in the hive (e.g., pending transactions). Before performing any operation on a key, it's crucial to consult its KCB.
- KCB Uniqueness: At any given time, there can be only one KCB corresponding to a specific key attached to the tree. It's possible for multiple KCBs of the same key to exist in memory, but only if some of them correspond to deleted instances, in which case they are no longer visible in the global tree (only through the handles, until they are closed). Before creating a new KCB, the kernel should always ensure that there isn't an existing one, and if there is, use it. Failing to maintain this invariant can lead to severe consequences, as illustrated by CVE-2023-23420.
- KCB Tree and Hives: The KCB tree combines key descriptors from different hives and therefore must implement support for "exit nodes" and "entry nodes", as described in the previous blog post. Both exit and entry nodes have corresponding KCBs that can be viewed and analyzed in WinDbg. Resolving transitions between exit and entry nodes generally involves reading the (_HHIVE*, root cell index) pair from the exit node and then locating and navigating to the corresponding KCB in the destination hive. To speed up this process, the kernel uses an optimization that sets the CM_KCB_SYM_LINK_FOUND flag (0x8) in the exit node's KCB and stores the entry node's KCB address in KCB.LinkTarget, simulating a resolved symbolic link and avoiding the need to look up the entry's KCB every time the key is traversed. In the diagram above, entry keys are marked in blue, exit nodes in orange, and the special connection between them by the connector with black squares.
- Key Depth: Every open key in the system has a depth in the global tree, representing the number of nesting levels separating it from the root. This value is stored in the TotalLevels field. For example, the root key \Registry has a depth of 1, and the key \Registry\Machine\Software\Microsoft\Windows has a depth of 5.
- Parent KCB Pointer: Every initialized KCB structure (whether attached to the tree or not) contains a pointer to its parent KCB in the ParentKcb field. The only exception is the global root \Registry, for which this pointer is NULL.
Now that we understand how the KCB tree works conceptually, let's examine how it is represented in memory. Interestingly, the KCB structure itself doesn't store a list of its subkeys. Instead, it relies on a simple 32-bit hash of the text string for fast lookups by name. The hash is calculated by multiplying successive characters of the string by powers of 37, where the first character is multiplied by the highest power and the last by the lowest (37⁰, which is 1). This allows for a straightforward iterative implementation, shown below in C++ code:
uint32_t HashString(const std::string& str) {
  uint32_t hash = 0;
  for (size_t i = 0; i < str.size(); i++) {
    hash = hash * 37 + toupper((unsigned char)str[i]);
  }
  return hash;
}
Some example outputs of the algorithm are:
HashString("Microsoft") = 0x7f00cd26
HashString("Windows") = 0x2f7de68b
HashString("CurrentVersion") = 0x7e25f69d
To calculate the hash of a path with multiple components, the same algorithm steps are repeated. However, in this case, the hashes of the successive path parts are treated similarly to the letters in the previous example. Therefore, the following formula is used to calculate the hash of the full "Microsoft\Windows\CurrentVersion" path:
0x7f00cd26 × 37² + 0x2f7de68b × 37¹ + 0x7e25f69d × 37⁰ = 0x86a158ea
The hash value calculated for each key, based on its path relative to the hive's root, is stored in KCB.ConvKey.Hash. Consequently, the hash value for the standard system key HKLM\Software\Microsoft\Windows\CurrentVersion is 0x86a158ea.
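Combining the two steps, the full path hash can be reproduced with a short sketch. HashPath is a made-up helper name for illustration; the kernel computes these hashes per component during path parsing:

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Per-component hash, identical to the algorithm shown earlier.
uint32_t HashString(const std::string& str) {
  uint32_t hash = 0;
  for (unsigned char c : str) {
    hash = hash * 37 + std::toupper(c);
  }
  return hash;
}

// Hash a backslash-separated relative path by feeding the component
// hashes into the same 37-based scheme.
uint32_t HashPath(const std::string& path) {
  uint32_t hash = 0;
  std::string component;
  for (size_t i = 0; i <= path.size(); i++) {
    if (i == path.size() || path[i] == '\\') {
      hash = hash * 37 + HashString(component);
      component.clear();
    } else {
      component += path[i];
    }
  }
  return hash;
}
```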
Every hive has a directory of the KCBs within it, structured as a hashmap with a fixed number of buckets. Each bucket comprises a linked list of the KCBs located there. Internally, this directory is referred to as the "KCB cache" and is represented by the following two fields in the _CMHIVE structure:
+0x670 KcbCacheTable : Ptr64 _CM_KEY_HASH_TABLE_ENTRY
+0x678 KcbCacheTableSize : Uint4B
KcbCacheTable is a pointer to a dynamically allocated array of _CM_KEY_HASH_TABLE_ENTRY structures, and KcbCacheTableSize specifies the number of buckets (i.e., the number of elements in the KcbCacheTable array). In practice, the size of this KCB cache is 128 buckets for the virtual \Registry hive, 512 for the vast majority of hives loaded in the system, and 1024 for two specific system hives: HKLM\Software and HKLM\System. Given a specific key with a name hash denoted as ConvKey, its KCB can be found in the cache bucket indexed as follows:
TmpHash = 101027 * (ConvKey ^ (ConvKey >> 9));
CacheIndex = (TmpHash ^ (TmpHash >> 9)) & (Hive->KcbCacheTableSize - 1);
//
// Kcb can be found in Hive->KcbCacheTable[CacheIndex]
//
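This finalization step is easy to verify in a standalone sketch. The function names below are made up; the constants and shifts come from the pseudocode above:

```cpp
#include <cstdint>

// Mixes the 32-bit path hash (ConvKey) before indexing the per-hive
// KCB cache table; all arithmetic is naturally modulo 2^32.
uint32_t FinalizeHash(uint32_t convKey) {
  uint32_t tmp = 101027u * (convKey ^ (convKey >> 9));
  return tmp ^ (tmp >> 9);
}

// tableSize is assumed to be a power of two (128, 512 or 1024 in practice).
uint32_t CacheIndex(uint32_t convKey, uint32_t tableSize) {
  return FinalizeHash(convKey) & (tableSize - 1);
}
```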
The operation of translating a key's path hash to its KCB cache table index (excluding the modulo KcbCacheTableSize step) is called "finalization". There's even a WinDbg helper command that can perform this action for us: !reg finalize. We can test it on the hash we calculated for the "Microsoft\Windows\CurrentVersion" path:
0: kd> !reg finalize 0x86a158ea
Finalized Hash for Hash=0x86a158ea: 0xc2c65312
So, the finalized hash is 0xc2c65312, and since the KCB cache of the SOFTWARE hive has 1024 buckets, the index of the HKLM\Software\Microsoft\Windows\CurrentVersion key in the array is given by the lowest 10 bits, i.e. 0x312. We can verify that our calculations are correct by finding the SOFTWARE hive in memory and listing the keys located in its individual buckets:
0: kd> !reg hivelist
...
| ffffe10d2dad4000 | 4da2000 | ffffe10d2da78000 | 3a6000 | ffffe10d3489f000 | ffffe10d2d8ff000 | emRoot\System32\Config\SOFTWARE
...
0: kd> !reg openkeys ffffe10d2dad4000
...
Index 312: 86a158ea kcb=ffffe10d2d576a30 cell=000a58e8 f=00200000 \REGISTRY\MACHINE\SOFTWARE\MICROSOFT\WINDOWS\CURRENTVERSION
...
As we can see, our calculations have been proven to be accurate. We could achieve a similar result with the !reg hashindex command, which takes the address of the _HHIVE object and the ConvKey for a given key, and then prints out information about the corresponding bucket.
Within a single bucket in the KCB cache, all the KCBs are linked together in a singly-linked list starting at the _CM_KEY_HASH_TABLE_ENTRY.Entry pointer. The subsequent elements are accessible through the _CM_KEY_HASH.NextHash field, which points to the KCB.KeyHash structure in the next KCB on the list. A diagram of this data structure is shown below:
Now that we understand how the KCB objects are internally organized, let's examine how name lookups are implemented. Suppose we want to take a single step through a path and find the KCB of the next subkey based on its parent KCB and the key name. The process is as follows (assuming the parent is not an exit node):
- Get the pointer to the hive descriptor on which we are currently operating from ParentKcb->KeyHive.
- Calculate the hash of the subkey name based on its full path relative to the hive in which it is located.
- Calculate the appropriate index in the KCB cache based on the name hash and iterate through the linked list, comparing:
- The hash of the key name.
- The pointer to the parent KCB.
- If both of the above match, perform a full comparison of the key name. If it matches, we have found the subkey.
The process is particularly interesting because it is not based on directly iterating through the subkeys of a given key, but instead on iterating through all the keys in the particular cache bucket. Thanks to the use of hashing, the vast majority of checks of potential candidates for the sought-after subkey are reduced to a single comparison of two 32-bit numbers, making the whole process quite efficient. The performance is mostly dependent on the total number of keys in the hive and the number of hash collisions for the specific cache index.
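Under the simplifying assumption of a minimal KCB layout, the single-step lookup described above can be sketched as follows. The Kcb struct and function names here are illustrative stand-ins, not the real kernel definitions:

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Illustrative, heavily simplified stand-in for a KCB; the real
// _CM_KEY_CONTROL_BLOCK is far larger.
struct Kcb {
  uint32_t ConvKey;   // hash of the key's hive-relative path
  Kcb* ParentKcb;     // parent key control block
  std::string Name;   // last path component
  Kcb* NextHash;      // next KCB in the same cache bucket
};

static bool EqualNamesIgnoreCase(const std::string& a, const std::string& b) {
  if (a.size() != b.size()) return false;
  for (size_t i = 0; i < a.size(); i++) {
    if (std::toupper((unsigned char)a[i]) != std::toupper((unsigned char)b[i]))
      return false;
  }
  return true;
}

// One lookup step: find the KCB of parent's subkey called `name`,
// given the bucket's list head and the subkey's precomputed ConvKey.
Kcb* FindSubkeyInBucket(Kcb* bucketHead, const Kcb* parent,
                        uint32_t convKey, const std::string& name) {
  for (Kcb* kcb = bucketHead; kcb != nullptr; kcb = kcb->NextHash) {
    // Cheap rejects first: the 32-bit hash and the parent pointer...
    if (kcb->ConvKey != convKey || kcb->ParentKcb != parent) continue;
    // ...and only then the full (case-insensitive) name comparison.
    if (EqualNamesIgnoreCase(kcb->Name, name)) return kcb;
  }
  return nullptr;
}
```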
If you'd like to dive deeper into the implementation of KCB tree traversal, I recommend analyzing the internal function CmpFindKcbInHashEntryByName, which performs a single step through the tree as described above. Another useful function to analyze is CmpPerformCompleteKcbCacheLookup, which recursively searches the tree to find the deepest KCB object corresponding to one of the elements of a given path.
For those experimenting in WinDbg, here are a few useful commands related to KCBs and their trees:
- !reg findkcb: This command finds the address of the KCB in the global tree that corresponds to the given fully qualified registry path, if it exists.
- !reg querykey: Similar to the command above, but in addition to providing the KCB address, it also prints the hive descriptor address, the corresponding key node address, and information about subkeys and values of the given key.
- !reg kcb: This command prints basic information about a key based on its KCB. Its advantage is that it translates flag names into their textual equivalents (e.g., CompressedName, NoDelete, HiveEntry, etc.), but it often doesn't provide the specific information one is looking for. In that case, it might be necessary to use the dt _CM_KEY_CONTROL_BLOCK command to dump the entire structure.
So far, this blog post has described only a few of the most important registry structures, which are essential to know for anyone conducting research in this area. However, in total, there are over 150 different structures used in the Windows kernel and related to the registry, and only about half are documented through debug symbols or on Microsoft's website. While it's impossible to detail the operation and function of all of these structures in one article, this section aims to at least provide an overview of a majority of them, to note which of them are publicly available, and to briefly describe how they are used internally.
The layout of many structures corresponding to the most complex mechanisms is publicly unknown at the time of writing and requires significant time and energy to reconstruct. Even then, the correct meaning of each field and flag cannot be guaranteed. Therefore, the information below should be used with caution and verified against the specific Windows version(s) in question before relying on it in any way.
Key opening/creation
In PDB | Structure name | Description
❌ | Parse context
Given that the registry is integrated with the standard Windows object model, all operations on registry paths (both absolute and relative) must be performed through the standard NT Object Manager interface.
For example, the NtCreateKey syscall calls the CmCreateKey helper function. At this point, there are no further calls to Configuration Manager, but instead, there is a call to ObOpenObjectByNameEx (a more advanced version of ObOpenObjectByName). Several levels down, the kernel will transfer execution back to the registry code, specifically to the CmpParseKey callback, which is the entry point responsible for handling all path operations (i.e., all key open/create actions). This means that the CmCreateKey and CmpParseKey functions, which work together, cannot pass an arbitrary number of input and output arguments to each other. They only have one pointer (ParseContext) at their disposal, which can serve as a communication channel. Thus, the agreement between these functions is that the pointer points to a special "parse context" structure, which has three main roles:
- Pass the input configuration of a given operation, e.g. information about:
- operation mode (open/create),
- transactionality of the operation,
- following of symbolic links,
- flags related to WOW64 functionality,
- optional class data of the created key.
- Pass some return information, such as whether the key was opened or created,
- Cache certain information within a single "parse" request, e.g.:
- information on whether registry virtualization is enabled for a given process,
- when following a symbolic link, a pointer to the originating hive descriptor, in order to check whether the given transition is allowed within the hive trust class,
- when following a symbolic link, a pointer to the KCB of its target (or the closest possible ancestor).
Reconstructing the layout of this structure is a critical step in getting a better understanding of how the key opening/creation process works internally.
❌ | Path info
When a client references a key by name, one of the first actions taken by the CmpParseKey function (or more specifically, CmpDoParseKey) is to take the string representing that name (absolute or relative), break it into individual parts separated by backslashes, and calculate the 32-bit hashes for each of them. This ensures that parsing only occurs once and doesn't need to be repeated. The structure where the result of this operation is stored is called "path info".
According to the documentation, a single registry path reference can contain a maximum of 32 levels of nesting. The path info structure therefore provides storage for 32 elements: the first 8 are stored directly within the structure, and if the path is more deeply nested, an additional 24 elements are kept in a supplementary structure allocated on demand from kernel pools. The functions that operate on this object are CmpComputeComponentHashes, CmpExpandPathInfo, CmpValidateComponents, CmpGetComponentNameAtIndex, CmpGetComponentHashAtIndex, and CmpCleanupPathInfo.
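The two-tier storage scheme can be sketched as follows. PathInfoSketch and its methods are made-up names, and only the sizes (8 inline plus 24 spilled, 32 total) come from the text; the real structure layout is not public:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>

// Sketch of the described storage: the first 8 component hashes live
// inline, components 9..32 go to an on-demand allocation (the role the
// text attributes to CmpExpandPathInfo).
class PathInfoSketch {
 public:
  bool AddComponentHash(uint32_t hash) {
    if (count_ >= 32) return false;  // documented 32-level limit
    if (count_ < 8) {
      inline_[count_] = hash;
    } else {
      if (!extra_) extra_ = std::make_unique<std::array<uint32_t, 24>>();
      (*extra_)[count_ - 8] = hash;
    }
    count_++;
    return true;
  }
  uint32_t HashAt(std::size_t i) const {
    return i < 8 ? inline_[i] : (*extra_)[i - 8];
  }
  std::size_t Count() const { return count_; }

 private:
  std::size_t count_ = 0;
  std::array<uint32_t, 8> inline_{};
  std::unique_ptr<std::array<uint32_t, 24>> extra_;
};
```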
Interestingly, I discovered an off-by-one bug in the CmpComputeComponentHashes function, which allows an attacker to write 25 values into a 24-element array. However, due to a fortunate coincidence, path info structures are allocated from a special lookaside list with allocation sizes significantly larger than the length of the structure itself. As a result, this buffer overflow is not exploitable in practice, which has also been confirmed by Microsoft. More information about this issue, as well as the reversed definition of this structure, can be found in my original report.
Key notifications
In PDB | Structure name | Description
✅ | _CM_NOTIFY_BLOCK
The first time RegNotifyChangeKeyValue or the underlying NtNotifyChangeMultipleKeys syscall is called on a given handle, a notify block structure is assigned to the corresponding key body object. This structure serves as the central control point for all notification requests made on that handle in the future. It also stores the configuration defined in the initial API call, which, once set, cannot be changed without closing and reopening the key. This is in line with the official MSDN documentation:
"This function should not be called multiple times with the same value for the hKey but different values for the bWatchSubtree and dwNotifyFilter parameters. The function will succeed but the changes will be ignored. To change the watch parameters, you must first close the key handle by calling RegCloseKey, reopen the key handle by calling RegOpenKeyEx, and then call RegNotifyChangeKeyValue with the new parameters."
The !reg notifylist command in WinDbg can list all active notify blocks in the system, allowing you to check which keys are currently being monitored for changes.
❌ | Post block
Each post block object corresponds to a single wait for changes to a given key. Many post block objects can be assigned to one notify block object at the same time. The network of relationships in this structure becomes even more complex when using the NtNotifyChangeMultipleKeys syscall with a non-empty SubordinateObjects argument, in which case two separate post blocks share a third data structure (the so-called post block union). However, the details of this topic are beyond the scope of this post.
The WinDbg !reg postblocklist command allows you to see how many active post blocks are assigned to each process/thread, but unfortunately, it does not show any detailed information about their contents.
Registry callbacks
In PDB | Structure name | Description
✅ | REG_*_INFORMATION
These structures are used for supplying callbacks with precise information about operations performed on the registry, and are part of the documented Windows interface. Consequently, not only their definitions but also detailed descriptions of the meaning of each field are published directly by Microsoft. A complete list of these structures can be found on MSDN, e.g., on the EX_CALLBACK_FUNCTION callback function (wdm.h) page.
However, I have found in my research that in addition to the official registry callback interface, there is also a less official extension that Microsoft uses internally in VRegDriver, the module that supports differencing hives. If a given client, instead of using the official CmRegisterCallbackEx function, calls the internal CmpRegisterCallbackInternal function with the fifth argument set to 1, this callback will be internally marked as "extended". Extended callbacks, in addition to the information provided by the standard structures, also receive several additional pieces of information related to differencing hives and layered keys. At the time of writing, the differences occur in the structures representing the RegNtPreLoadKey, RegNtPreCreateKeyEx, and RegNtPreOpenKeyEx actions and their "post" counterparts.
❌
Callback descriptor
The structure represents a single registry callback registered through the CmRegisterCallback or CmRegisterCallbackEx API. Once allocated, it is attached to a doubly-linked list represented by the global CallbackListHead object.
❌
Object context descriptor
A descriptor structure for a key body-specific context that can be assigned through the CmSetCallbackObjectContext API. This descriptor is then inserted into a linked list that starts at _CM_KEY_BODY.ContextListHead.
❌
Callback context
An internal structure used in the CmpCallCallBacksEx function to store the current state during the callback invocation process. For example, it's used to invoke the appropriate "post" type callbacks in case of an error in one of the "pre" type callbacks. These objects are freed by the dedicated CmpFreeCallbackContext function, which additionally caches a certain number of allocations in the global CmpCallbackContextSList list. This allows future requests for objects of this type to be quickly fulfilled.
Registry virtualization
In PDB
Structure name
Description
❌
Replication stack
A core task of registry virtualization is the replication of keys, which involves creating an identical copy of a given key structure. This occurs under the path HKU\<SID>_Classes\VirtualStore when an application, subject to virtualization, attempts to create a key in a location where it lacks proper permissions. The entire operation is coordinated by the CmpReplicateKeyToVirtual function and consists of two main stages. First, a "replication stack" object is created and initialized in the CmpBuildVirtualReplicationStack function. This object specifies the precise key structure to be created within the virtualization process. Second, the actual creation of these keys based on this object occurs within the CmpDoBuildVirtualStack function.
Transactions
In PDB
Structure name
Description
✅
_KTRANSACTION
A structure corresponding to a KTM transaction object, which is created by the CreateTransaction function or its low-level equivalent NtCreateTransaction.
❌
Lightweight transaction object
A direct counterpart of _KTRANSACTION, but for lightweight transactions, created by the NtCreateRegistryTransaction system call. It is very simple and only consists of a bitmask of the current transaction state, a push lock for synchronization, and a pointer to the corresponding _CM_TRANS object.
✅
_CM_KCB_UOW
The structure represents a single, active transactional operation linked to a specific key. In some scenarios, one logical operation corresponds to one such object (e.g., the UoWSetSecurityDescriptor type). In other cases, multiple UoWs are created for a single operation (e.g., UoWAddThisKey assigned to a newly created key, and UoWAddChildKey assigned to its parent).
This critical structure has multiple functions. The key ones are connecting to KCB intent locks and keeping any pending state related to a given operation, both before and during the transaction commit phase.
✅
_CM_UOW_*
Auxiliary sub-structures of _CM_KCB_UOW, which store information about the temporary state of the registry associated with a specific type of transactional operation. Specifically, the four structures are: _CM_UOW_KEY_STATE_MODIFICATION, _CM_UOW_SET_SD_DATA, _CM_UOW_SET_VALUE_KEY_DATA and _CM_UOW_SET_VALUE_LIST_DATA.
✅
_CM_TRANS
A descriptor of a specific registry transaction, usually associated with a particular hive. In special cases, if operations are performed on multiple hives within a single transaction, then multiple _CM_TRANS objects may exist for it. Given the address of the _CM_TRANS object, it is possible to list all operations associated with this transaction in WinDbg using the !reg uowlist command.
✅
_CM_RM
A descriptor of a specific resource manager. It only exists if the given hive has KTM transactions enabled, and never exists for app hives or hives loaded with the REG_HIVE_NO_RM flag.
Think of this structure as being associated with one set of .blf / .regtrans-ms log files, which usually means one _CM_RM structure is assigned to one hive. The exception is system hives (e.g. SOFTWARE, SYSTEM etc.) which all share the same resource manager that exists under the CmRmSystem global variable.
Given the address of a _CM_RM object in WinDbg, you can list all associated transactions using the !reg translist command.
✅
_CM_INTENT_LOCK
This structure represents an intent lock, with two instances (KCBLock and KeyLock) residing in the KCB. Their primary function is to ensure key consistency by preventing the assignment of two different transactions that contain conflicting modifications of a key. Given the object's address, WinDbg's !reg ixlock command can display some details about it.
❌
Serialized log records
KTM transacted registry operations are logged to .blf files on disk to enable consistent state restoration in case of unexpected shutdown during transaction commit. The CmAddLogForAction function serializes the _CM_KCB_UOW object into a flat buffer and writes it to the log file using the CLFS interface. While the _CM_KCB_UOW structure can be found in public symbols, its serialized representation cannot. Notably, there was an information disclosure vulnerability (CVE-2023-28271) that was directly related to these serialized records.
❌
Rollback packet
When a client performs a non-transactional operation that modifies a key, and there's an active transaction associated with that key, the transaction must be rolled back before the operation can be executed to prevent an inconsistent state. This is achieved using a structure that contains a list of transactions to be rolled back. This structure is passed to the CmpAbortRollbackPacket function, which carries out the rollback. Although the official layout of this structure is unknown, in practice it is quite simple, consisting of three fields: the current capacity, the current fill level of the list, and a pointer to a dynamically allocated array of transactions.
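From the three fields described above, the rollback packet can be sketched roughly like this. This is a hedged ctypes model: the field names and exact widths are my guesses based on the reversed behavior, not official Microsoft definitions.

```python
import ctypes

# Hypothetical reconstruction of the rollback packet layout -- names
# and types are inferred, not taken from public symbols.
class CM_ROLLBACK_PACKET(ctypes.Structure):
    _fields_ = [
        ("Capacity", ctypes.c_uint32),      # slots allocated in the array
        ("Count", ctypes.c_uint32),         # slots currently in use
        ("Transactions", ctypes.c_void_p),  # -> dynamically allocated array of _CM_TRANS*
    ]

packet = CM_ROLLBACK_PACKET(Capacity=4, Count=0, Transactions=None)
print(ctypes.sizeof(CM_ROLLBACK_PACKET))
```

The growable-array shape (capacity, fill level, pointer) is consistent with a packet that accumulates transactions to abort before CmpAbortRollbackPacket walks the list.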
Differencing hives (VRegDriver)
In PDB
Structure name
Description
❌
IOCTL input structures
The VRegDriver module works by creating the \Device\VRegDriver device, and communicates with its clients by supporting nine distinct IOCTLs within the corresponding VrpIoctlDeviceDispatch handler function. These IOCTLs, exclusively accessible to administrator users, facilitate loading and unloading differencing hives, configuring registry redirections for specific containers, and a few other operations. Each IOCTL requires a specific input data structure, none of which are officially documented. Therefore, practical use of this interface necessitates reverse engineering the required structures to understand their initialization. An example of a reversed structure, corresponding to IOCTL 0x220008 and provisionally named VRP_LOAD_DIFFERENCING_HIVE_INPUT, was showcased in blog post #4. This enabled the creation of a proof-of-concept exploit for a differencing hive vulnerability (CVE-2023-36404), demonstrating the ability to load custom hives and, consequently, trigger the flaw.
❌
Silo context
This silo-specific context structure is set by the VRegDriver during silo initialization using the PsInsertPermanentSiloContext function. It is later retrieved by PsGetPermanentSiloContext and used during both IOCTL handling and path translation for containerized processes. A brief analysis suggests that it primarily contains the GUID of the associated silo, a push lock used for synchronization, and a user-configured list of namespaces for the given container, which is a set of source and target paths between which redirection should occur.
❌
Key context
This structure stores the context specific to a particular key being subject to path translation within a silo. It is usually allocated for each key opened within the context of a containerized process, and assigned to its key body using the CmSetCallbackObjectContext API. It primarily stores the original path of the key before translation (the path the client believes it is accessing) and several other auxiliary fields.
❌
Callback context (open/create)
The callback-specific context structure stores shared data between "pre" and "post" callbacks for a given operation. This context is generally accessed through the CallContext field within the REG_*_INFORMATION structure relevant to the specific operation. In practice, VRegDriver only has one instance of a special structure defined for this purpose, used when handling the RegNtPreCreateKeyEx/RegNtPreOpenKeyEx callbacks. It saves specific data (RootObject, CompleteName, RemainingName) before the open/create request, to restore their original values in the "post" callback.
❌
Extra parameter
This structure also appears to be used for temporarily storing the original key path during translation. However, its scope encompasses the entire key creation/opening process, rather than just a single callback. This means it can store information across callbacks, even when symbolic links or write-through hives are encountered during path traversal, causing the CmpParseKey function to return STATUS_REPARSE or STATUS_REPARSE_GLOBAL and restart the path lookup process. Although the concept of a whole operation context seems broadly applicable, currently there is only one type of "extra parameter" being used, represented by the GUID VRP_ORIGINAL_KEY_NAME_PARAMETER_GUID {85b8669a-cfbb-4ac0-b689-6daabfe57722}.
Layered keys
In PDB
Structure name
Description
✅
_CM_KCB_LAYER_INFO
This is likely the only structure related to layered keys whose definition is public. It is part of every KCB and contains information about the placement of the key in the global, "vertical" tree of layered key instances. In practice, this means that it stores a pointer to the KCB at one level lower (its parent, so to speak), and the head of a linked list with KCBs at one level higher (KCB.LayerHeight+1), if any exist.
❌
Key node stack
A stack containing all instances of a given layered key, starting from its level all the way down to level zero (the base key). Each key in this structure is represented by a (Hive, KeyCell) pair. If the key actually exists at a given level (KeyCell ≠ -1, indicating a state other than Merge-Unbacked), it is also represented by a direct, resolved pointer to its _CM_KEY_NODE structure.
Since Windows 10 introduced support for layered keys, many places in the code that previously identified a single key as _CM_KEY_NODE* now require passing the entire key node stack structure. This is because operations on layered keys usually require knowledge of the state of lower level keys (e.g. their layered semantics, subkeys, values), not just the key represented by the handle used by the caller.
Places where the key node stack structure is used can be identified by calls to its related helper functions, such as those for initialization (CmpInitializeKeyNodeStack) and cleanup (CmpCleanupKeyNodeStack), as well as any others containing the string "KeyNodeStack".
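As an illustration of what one entry of such a stack carries, here is a minimal Python model. All names are my own descriptive inventions; the real kernel layout of the key node stack is not published.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative model only -- invented names, not the kernel's layout.
@dataclass
class KeyNodeStackEntry:
    hive: int                       # address of the _HHIVE backing this layer
    key_cell: int                   # cell index; 0xFFFFFFFF if Merge-Unbacked
    key_node: Optional[int] = None  # resolved _CM_KEY_NODE*, when the cell exists

def is_merge_unbacked(entry: KeyNodeStackEntry) -> bool:
    # A key that does not exist at this layer has no backing cell.
    return entry.key_cell == 0xFFFFFFFF

# A two-level layered key: the top layer is unbacked, level 0 is the base key.
stack: List[KeyNodeStackEntry] = [
    KeyNodeStackEntry(hive=0x3000, key_cell=0xFFFFFFFF),
    KeyNodeStackEntry(hive=0x1000, key_cell=0x20, key_node=0x2000),
]
```

Passing the whole stack, rather than a single _CM_KEY_NODE*, is what lets layered-key operations consult the state of every level below the caller's key.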
❌
KCB stack
This structure, analogous to the key node stack, represents keys using KCBs. Its use is most clearly revealed by references to the CmpStartKcbStack and CmpStartKcbStackForTopLayerKcb functions in code, though many other internal routines with "KcbStack" in their names also operate on it.
Both the KCB stack and the key node stack share an optimization where the first two levels are stored inline, with additional levels allocated in kernel pools only when necessary. This is likely due to the fact that most systems, even those with layered keys, typically only use one level of nesting (two levels total). Thus, this optimization avoids costly memory allocation and deallocation in these common scenarios.
❌
Enum stack
This data structure allows for the enumeration of subkeys within a given layered key. Its primary use is within the CmpEnumerateLayeredKey function, which serves as the handler for the NtEnumerateKey operation specifically for layered keys. At an even higher level, this corresponds to the RegEnumKeyExW API function. The complexity of this structure is evident from the fact that there are 19 internal helper functions, all starting with the name CmpKeyEnumStack, that operate on it.
❌
Enum resume context
This data structure, directly tied to the subkey enumeration, primarily serves as an optimization mechanism. After executing a specific number (N) of enumeration steps, it stores the internal state of the enum stack. This allows subsequent requests for subkey N+1 to resume the enumeration process from the previous point, bypassing the need to repeat the initial steps. Linked to a specific handle, it is stored within _CM_KEY_BODY.EnumerationResumeContext.
The KCB.SequenceNumber field, directly related to this structure, monitors whether a given key has significantly changed since a previous point in time. This enables the CmpKeyEnumStackVerifyResumeContext helper function to determine if the current registry state is consistent enough for the existing enumeration resume context to be used for further enumeration, or if the entire process needs to be restarted.
❌
Value enum stack
This data structure, used to enumerate values for layered keys, is comparable in complexity to those used to list subkeys. The main function utilizing it is CmEnumerateValueFromLayeredKey. Additionally, there are 10 helper functions named CmpValueEnumStack[...] that operate on this structure.
❌
Sorted value enum stack
The structure is similar to the standard value enum stack, but is used to iterate over the values of a given layered key while preserving lexicographical order. Helper functions from the CmpSortedValueEnumStack[...] family (9 in total) correspond to this structure. This functionality is used exclusively in the CmpGetValueCountForKeyNodeStack function, which is responsible for returning the number of values for a given key.
The reason for the existence of this mechanism in parallel with the regular "value enum stack" is not entirely clear, but I suspect it serves as an optimization for value counting operations. This is supported by the fact that while layered keys first appeared in Windows 10 1607 (Redstone, build 14393), the sorted value enum stack was not introduced until the later version of Windows 10 1703 (Redstone 2, build 15063). In the first iteration of the layered key implementation, CmpGetValueCountForKeyNodeStack was implemented using the standard value enum stack. This lends credibility to the hypothesis that these mechanisms are functionally equivalent, but the "sorted" version is faster at counting unique values when direct access to them is not required.
❌
Subtree enumerator
This structure enables the enumeration of both the direct subkeys of a layered key and all its deeper descendants. It is relatively complex, and its associated functions begin with CmpSubtreeEnumerator[...] (also 9 in total). This mechanism is primarily needed to implement the "rename" operation on layered keys. First, it allows verification that the caller has KEY_READ and DELETE permissions for all descendant keys in the subtree, and second, it enables setting the LayerSemantics value for these descendants to Supersede-Tree (0x3).
❌
Discard/replace context
This data structure is employed during key deletion to ensure that KCB structures corresponding to higher-level Merge-Unbacked keys reliant on the deleted key are also marked as deleted. Subsequently, "fresh" KCB objects representing the non-existent key are inserted into the tree in their place. The two primary functions associated with this mechanism are CmpPrepareDiscardAndReplaceKcbAndUnbackedHigherLayers and CmpCommitDiscardAndReplaceKcbAndUnbackedHigherLayers.
Conclusion
The goal of this post was to provide a thorough overview of the structures used in the Configuration Manager subsystem in Windows, with particular emphasis on the most important and frequently used ones, i.e. those describing hives and keys. I wanted to share this knowledge because there are not many publicly available sources that accurately describe the registry's operation from the implementation side, especially relevant to the most recent code developments in Windows 10 and 11. I would also like to once again use this opportunity to appeal to Microsoft to make more information available through public PDB symbols – this would greatly facilitate the work of security researchers in the future.
This post concludes the part of the series focusing solely on the inner workings of the registry. In the next, seventh installment, we will shift our perspective and examine the registry's role in the overall security of the system, with a deep focus on vulnerability research. Stay tuned!
Blasting Past Webp
An analysis of the NSO BLASTPASS iMessage exploit
Posted by Ian Beer, Google Project Zero
On September 7, 2023 Apple issued an out-of-band security update for iOS:
Around the same time on September 7th 2023, Citizen Lab published a blog post linking the two CVEs fixed in iOS 16.6.1 to an "NSO Group Zero-Click, Zero-Day exploit captured in the wild":
"[The target was] an individual employed by a Washington DC-based civil society organization with international offices...
The exploit chain was capable of compromising iPhones running the latest version of iOS (16.6) without any interaction from the victim.
The exploit involved PassKit attachments containing malicious images sent from an attacker iMessage account to the victim."
The day before, on September 6th 2023, Apple reported a vulnerability to the WebP project, indicating in the report that they planned to ship a custom fix for Apple customers the next day.
The WebP team posted their first proposed fix in the public git repo the next day, and five days after that on September 12th Google released a new Chrome stable release containing the WebP fix. Both Apple and Google marked the issue as exploited in the wild, alerting other integrators of WebP that they should rapidly integrate the fix as well as causing the security research community to take a closer look...
A couple of weeks later on September 21st 2023, former Project Zero team lead Ben Hawkes (in collaboration with @mistymntncop) published the first detailed writeup of the root cause of the vulnerability on the Isosceles Blog. A couple of months later, on November 3rd, a group called Dark Navy published their first blog post: a two-part analysis (Part 1 - Part 2) of the WebP vulnerability and a proof-of-concept exploit targeting Chrome (CVE-2023-4863).
Whilst the Isosceles and Dark Navy posts explained the underlying memory corruption vulnerability in great detail, they were unable to solve another fascinating part of the puzzle: just how exactly do you land an exploit for this vulnerability in a one-shot, zero-click setup? As we'll soon see, the corruption primitive is very limited. Without access to the samples it was almost impossible to know.
In mid-November, in collaboration with Amnesty International Security Lab, I was able to obtain a number of BLASTPASS PKPass sample files as well as crash logs from failed exploit attempts.
This blog post covers my analysis of those samples and the journey to figure out how one of NSO's recent zero-click iOS exploits really worked. For me that journey began by immediately taking three months of paternity leave, and resumed in March 2024 where this story begins:
Setting the scene
For a detailed analysis of the root-cause of the WebP vulnerability and the primitive it yields, I recommend first reading the three blog posts I mentioned earlier (Isosceles, Dark Navy 1, Dark Navy 2). I won't restate their analyses here (both because you should read their original work, and because it's quite complicated!) Instead I'll briefly discuss WebP and the corruption primitive the vulnerability yields.
WebP
WebP is a relatively modern image file format, first released in 2010. In reality WebP is actually two completely distinct image formats: a lossy format based on the VP8 video codec and a separate lossless format. The two formats share nothing apart from both using a RIFF container and the string WEBP for the first chunk name. From that point on (12 bytes into the file) they are completely different. The vulnerability is in the lossless format, with the RIFF chunk name VP8L.
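The two formats are easy to tell apart programmatically. Here is a minimal sketch of the container check (my own helper, not code from libwebp):

```python
import struct

def webp_variant(data: bytes) -> str:
    """Tell the two WebP formats apart by the first chunk's fourcc:
    'VP8 ' is the lossy format, 'VP8L' the lossless one."""
    if data[:4] != b"RIFF" or data[8:12] != b"WEBP":
        raise ValueError("not a WebP file")
    fourcc = data[12:16].decode("ascii").strip()
    (size,) = struct.unpack("<I", data[16:20])
    return f"{fourcc} chunk, {size} bytes"

# A minimal hand-built lossless header (not a decodable image).
hdr = b"RIFF" + struct.pack("<I", 16) + b"WEBP" + b"VP8L" + struct.pack("<I", 4)
print(webp_variant(hdr))  # VP8L chunk, 4 bytes
```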
Lossless WebP makes extensive use of Huffman coding; there are at least 10 huffman trees present in the BLASTPASS sample. In the file they're stored as canonical huffman trees, meaning that only the code lengths are retained. At decompression time those lengths are converted directly into a two-level huffman decoding table, with the five largest tables all getting squeezed together into the same pre-allocated buffer. The (it turns out not quite) maximum size of these tables is pre-computed based on the number of symbols they encode. If you're up to this part and you're slightly lost, the other three blogposts referenced above explain this in detail.
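The canonical-tree trick (storing only the per-symbol code lengths) can be sketched in a few lines. This is the standard canonical-code reconstruction algorithm, written by me for illustration; it is not WebP's actual two-level table builder.

```python
def canonical_codes(lengths):
    """Rebuild canonical Huffman codes from per-symbol code lengths,
    which is all a VP8L stream stores for each tree."""
    code = 0
    prev_len = 0
    codes = {}
    # Assign strictly increasing codes in (length, symbol) order.
    for length, sym in sorted((l, s) for s, l in enumerate(lengths) if l):
        code <<= length - prev_len
        codes[sym] = (code, length)
        code += 1
        prev_len = length
    return codes

# Lengths [2, 1, 3, 3] yield the codes 10, 0, 110, 111 for symbols 0..3.
print(canonical_codes([2, 1, 3, 3]))
```

Because the decoder trusts these lengths when sizing its tables, length sequences describing an invalid tree are exactly what the exploit feeds it.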
With control over the symbol lengths it's possible to define all sorts of strange trees, many of which aren't valid. The fundamental issue was that the WebP code only checked the validity of the tree after building the decoding table. But the pre-computed size of the decoding table was only correct for valid trees.
As the Isosceles blog post points out, this means that a fundamental part of the vulnerability is that triggering the bug is always detected, albeit only after memory has been corrupted, and image parsing stops just a few lines of code later. This presents another exploitation mystery: in a zero-click context, how do you exploit a bug where every time the issue is triggered it also stops parsing any attacker-controlled data?
The second mystery involves the actual corruption primitive. The vulnerability will write a HuffmanCode structure at a known offset past the end of the huffman tables buffer:
// Huffman lookup table entry
typedef struct {
uint8_t bits;
uint16_t value;
} HuffmanCode;
As Dark Navy points out, whilst the bits and value fields are nominally attacker-controlled, in reality there isn't that much flexibility. The fifth huffman table (the one at the end of the preallocated buffer, part of which can get written out-of-bounds) only has 40 symbols, limiting value to a maximum of 39 (0x27), and bits will be between 1 and 7 (for a second-level table entry). There's a padding byte between bits and value which makes the largest value that could be written out-of-bounds 0x00270007. And it just so happens that that's exactly the value which the exploit does write — and they likely didn't have that much choice about it.
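The 0x00270007 constant can be confirmed with a quick look at the struct's memory layout; ctypes mirrors the C alignment rules, including the padding byte between the two fields:

```python
import ctypes

class HuffmanCode(ctypes.Structure):
    # Mirrors the C struct above; the uint16 field is 2-byte aligned,
    # so a padding byte sits between bits and value.
    _fields_ = [("bits", ctypes.c_uint8),
                ("value", ctypes.c_uint16)]

entry = HuffmanCode(bits=7, value=0x27)
raw = bytes(entry)                  # b'\x07\x00\x27\x00' on little-endian
word = int.from_bytes(raw, "little")
print(hex(ctypes.sizeof(HuffmanCode)), hex(word))  # 0x4 0x270007
```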
There's also not much flexibility in the huffman table allocation size. The table allocation in the exploit is 12072 (0x2F28) bytes, which will get rounded up to fit within a 0x3000 byte libmalloc small region. The code lengths are chosen such that the overflow occurs like this:
To summarize: The 32-bit value 0x270007 will be written 0x58 bytes past the end of a 0x3000 byte huffman table allocation. And then WebP parsing will fail, and the decoder will bail out.
Déjà vu?
Long-term readers of the Project Zero blog might be experiencing a sense of déjà vu at this point... haven't I already written a blog post about an NSO zero-click iPhone zero day exploiting a vulnerability in a slightly obscure lossless compression format used in an image parsed from an iMessage attachment?
BLASTPASS has many similarities with FORCEDENTRY, and my initial hunch (which turned out to be completely wrong) was that this exploit might take a similar approach to build a weird machine using some fancier WebP features. To that end I started out by writing a WebP parser to see what features were actually used.
Transformation
In a very similar fashion to JBIG2, WebP also supports invertible transformations on the input pixel data:
My initial theory was that the exploit might operate in a similar fashion to FORCEDENTRY and apply sequences of these transformations outside of the bounds of the image buffer to build a weird machine. But after implementing enough of the WebP format in python to parse every bit of the VP8L chunk it became pretty clear that it was only triggering the Huffman table overflow and nothing more. The VP8L chunk was only 1052 bytes, and pretty much all of it was the 10 Huffman tables needed to trigger the overflow.
What's in a pass?
Although BLASTPASS is often referred to as an exploit for "the WebP vulnerability", the attackers don't actually just send a WebP file (even though that is supported in iMessage). They send a PassKit PKPass file, which contains a WebP. There must be a reason for this. So let's step back and actually take a look at one of the sample files I received:
171K sample.pkpass
$ file sample.pkpass
sample.pkpass: Zip archive data, at least v2.0 to extract, compression method=deflate
There are five files inside the PKPass zip archive:
60K background.png
5.5M logo.png
175B manifest.json
18B pass.json
3.3K signature
The 5.5MB logo.png is the WebP image, just with a .png extension instead of .webp:
$ file logo.png
logo.png: RIFF (little-endian) data, Web/P image
The closest thing to a specification for the PKPass format appears to be the Wallet Developer Guide, and whilst it doesn't explicitly state that the .png files should actually be Portable Network Graphics images, that's presumably the intention. This is yet another parallel with FORCEDENTRY, where a similar trick was used to reach the PDF parser when attempting to parse a GIF.
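The extension-vs-magic mismatch is easy to surface with a small sniffer. The helper and magic table below are my own illustration, not part of any PassKit tooling:

```python
import io
import zipfile

# A few well-known 4-byte magics; anything else is reported as "other".
MAGIC = {
    b"\x89PNG": "png",
    b"RIFF": "riff (WebP et al.)",
    b"II*\x00": "tiff (little-endian)",
    b"MM\x00*": "tiff (big-endian)",
}

def sniff_pkpass(fileobj):
    """Sniff each archive member's real type from its magic bytes
    rather than trusting the file extension."""
    out = {}
    with zipfile.ZipFile(fileobj) as z:
        for name in z.namelist():
            out[name] = MAGIC.get(z.read(name)[:4], "other")
    return out

# Build a tiny stand-in archive: a "png" that is really a WebP.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("logo.png", b"RIFF\x00\x00\x00\x00WEBP")
print(sniff_pkpass(buf))  # {'logo.png': 'riff (WebP et al.)'}
```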
PKPass files require a valid signature which is contained in manifest.json and signature. The signature has a presumably fake name and more timestamps indicating that the PKPass is very likely being generated and signed on the fly for each exploit attempt.
pass.json is just this:
{"pass": "PKpass"}
Finally background.png:
$ file background.png
background.png: TIFF image data, big-endian, direntries=15, height=16, bps=0, compression=deflate, PhotometricIntepretation=RGB, orientation=upper-left, width=48
Curious. Another file with a misleading extension; this time a TIFF file with a .png extension.
We'll return to this TIFF later in the analysis as it plays a critical role in the exploit flow, but for now we'll focus on the WebP, with one short diversion:
Blastdoor
So far I've only mentioned the WebP vulnerability, but the Apple advisory I linked at the start of this post mentions two separate CVEs:
The first, CVE-2023-41064 in ImageIO, is the WebP bug (though, just to keep things confusing, the upstream WebP fix has a different CVE, CVE-2023-4863; they're the same vulnerability).
The second, CVE-2023-41061 in "Wallet", is described in the Apple advisory as: "A maliciously crafted attachment may result in arbitrary code execution".
The Isosceles blog post hypothesises:
"Citizen Lab called this attack "BLASTPASS", since the attackers found a clever way to bypass the "BlastDoor" iMessage sandbox. We don't have the full technical details, but it looks like by bundling an image exploit in a PassKit attachment, the malicious image would be processed in a different, unsandboxed process. This corresponds to the first CVE that Apple released, CVE-2023-41061."
This theory makes sense — FORCEDENTRY had a similar trick where the JBIG2 bug was actually exploited inside IMTranscoderAgent instead of the more restrictive sandbox of BlastDoor. But in all my experimentation, as well as all the in-the-wild crash logs I've seen, this hypothesis doesn't seem to hold.
The PKPass file and the images enclosed within do get parsed inside the BlastDoor sandbox and that's where the crashes occur or the payload executes — later on we'll also see evidence that the NSExpression payload which eventually gets evaluated expects to be running inside BlastDoor.
My guess is that CVE-2023-41061 is more likely referring to the lax parsing of PKPasses which didn't reject images which weren't PNGs.
In late 2024, I received another set of in-the-wild crash logs including two which do in fact strongly indicate that there was also a path to hit the WebP vulnerability in the MobileSMS process, outside the BlastDoor sandbox! Interestingly, the timestamps indicate that these devices were targeted in November 2023, two months after the vulnerability was patched.
In those cases the WebP code was reached inside the MobileSMS process via a ChatKit CKPassPreviewMediaObject created by a CKAttachmentMessagePartChatItem.
What's in a WebP?
I mentioned that the VP8L chunk in the WebP file is only around 1KB. Yet in the file listing above the WebP file is 5.5MB! So what's in the rest of it? Expanding out my WebP parser we see that there's one more RIFF chunk:
EXIF : 0x586bb8
exif is Intel byte alignment
EXIF has n_entries=1
tag=8769 fmt=4 n_components=1 data=1a
subIFD has n_entries=1
tag=927c fmt=7 n_components=586b8c data=2c
It's a (really really huge) EXIF - the standard format which cameras use to store image metadata — stuff like the camera model, exposure time, f-stop etc.
It's a tag-based format and pretty much all 5.5MB is inside one tag with the id 0x927c. So what's that?
Looking through an online list of EXIF tags just below the lens FocalLength tag and above the UserComment tag we spot 0x927c:
It's the very-vague-yet-fascinating sounding: "MakerNote - Manufacturer specific information."
Looking to Wikipedia for some clarification on what that actually is, we learn that
"the "MakerNote" tag contains information normally in a proprietary binary format."
Modifying the webp parser to now dump out the MakerNote tag we see:
$ file sample.makernote
sample.makernote: Apple binary property list
Apple's chosen format for the "proprietary binary format" is binary plist!
And indeed: looking through the ImageIO library in IDA there's a clear path between the WebP parser, the EXIF parser, the MakerNote parser and the binary plist parser.
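That path can be approximated with a short chunk walk. This is a simplified reconstruction of my own: a real EXIF parser would walk the TIFF IFDs down to MakerNote tag 0x927c, whereas here I just scan the EXIF payload for the bplist magic:

```python
import struct

def iter_riff_chunks(data: bytes):
    """Yield (fourcc, payload) for each top-level chunk of a RIFF/WEBP file."""
    assert data[:4] == b"RIFF" and data[8:12] == b"WEBP"
    off = 12
    while off + 8 <= len(data):
        fourcc = data[off:off + 4]
        (size,) = struct.unpack("<I", data[off + 4:off + 8])
        yield fourcc, data[off + 8:off + 8 + size]
        off += 8 + size + (size & 1)  # chunk payloads are 2-byte aligned

def find_bplist_in_exif(data: bytes):
    """Locate an embedded binary plist inside the EXIF chunk by its
    'bplist00' magic (string-scanning shortcut, not real IFD parsing)."""
    for fourcc, payload in iter_riff_chunks(data):
        if fourcc == b"EXIF":
            idx = payload.find(b"bplist00")
            return payload[idx:] if idx >= 0 else None
    return None
```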
unbplisting
I covered the binary plist format in a previous blog post. That was the second time I'd had to analyse a large bplist. The first time (for the FORCEDENTRY sandbox escape) it was possible mostly by hand, just using the human-readable output of plutil. Last year, for the Safari sandbox escape analysis, the bplist was 437KB and I had to write a custom bplist parser to figure out what was going on. Keeping the exponential curve going, this year's bplist was 10x larger again.
In this case it's fairly clear that the bplist must be a heap groom - and at 5.5MB, presumably a fairly complicated one. So what's it doing?
Switching Views
I had a hunch that the bplist would use duplicate dictionary keys as a fundamental building block for the heap groom, but running my parser it didn't output any... until I realised that my tool stored the parsed dictionaries directly as python dictionaries before dumping them. Fixing the tool to instead keep lists of keys and values, it became clear that there were duplicate keys. Lots of them:
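The pitfall is easy to reproduce with toy data (this is an illustration of the parser bug, not the exploit's actual groom): Python's built-in dict silently keeps only the last occurrence of a repeated key, which is exactly why the duplicates were invisible at first.

```python
# Toy pairs standing in for spray entries -- note the repeated key.
pairs = [("spray", "A" * 4), ("spray", "B" * 4), ("pad", "C" * 4)]

as_dict = dict(pairs)   # dict keeps only the LAST value per key
as_list = list(pairs)   # a pair list preserves every entry

print(len(as_dict), len(as_list))  # 2 3
print(as_dict["spray"])            # BBBB -- the first "spray" entry vanished
```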
In the Safari exploit writeup I described how I used different visualisation techniques to try to explore the structure of the objects, looking for patterns I could use to simplify what was going on. In this case, modifying the parser to emit well-formed curly brackets and indentation then relying on VS Code's automatic code-folding proved to work well enough for browsing around and getting a feel for the structure of the groom object.
Sometimes the right visualisation technique is sufficient to figure out what the exploit is trying to do. In this case, where the primitive is a heap-based buffer overflow, the groom will inevitably try to put two things next to each other in memory and I want to know "what two things?"
But no matter how long I stared and scrolled, I couldn't figure anything out. Time to try something different.
Instrumentation
I wrote a small helper to load the bplist using the same API as the MakerNote parser and ran it using the Mac Instruments app:
Parsing the single 5.5MB bplist causes nearly half a million allocations, churning through nearly a gigabyte of memory. Just looking through this allocation summary it's clear there's lots of CFString and CFData objects, likely used for heap shaping. Looking further down the list there are other interesting numbers:
The 20,000 in the last line is far too round a number to be a coincidence. This number matches up with the number of __NSDictionaryM objects allocated:
Finally, at the very bottom of the list there are two more allocation patterns which stand out:
There are two sets of very large allocations: eighty 1MB allocations and forty-four 4MB ones.
I modified my bplist tool again to dump out each unique string or data buffer, along with a count of how many times it was seen and its hash. Looking through the file listing there's a clear pattern:
Object Size    Count
0x3FFFFF       44
0xFFFFF        80
0x3FFF         20
0x26A9         24978
0x2554         44
0x23FF         5822
0x22A9         4
0x1FFF         2
0x1EA9         26
0x1D54         40
0x17FF         66
0x13FF         66
0x3FF          322
0x3D7          404
0xF            112882
0x8            3
There are a large number of allocations which fall just below a "round" number in hexadecimal: 0x3ff, 0x13ff, 0x17ff, 0x1fff, 0x23ff, 0x3fff... That heavily hints that they are sized to fall exactly within certain allocator size buckets.
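As a rough illustration of why such sizes are chosen (assuming a 16-byte quantum for the tiny rack and 512-byte blocks for the small rack, as described later in this post):

```python
def round_up(size, quantum):
    # Round an allocation request up to a whole number of quanta.
    return (size + quantum - 1) // quantum * quantum

# Each request lands exactly one byte under a boundary, so the rounded
# allocation completely fills its bucket with no slack.
assert round_up(0x3FF, 16) == 0x400       # tiny rack: 16-byte quantum
assert round_up(0x17FF, 0x200) == 0x1800  # small rack: exactly 12 blocks
assert round_up(0x23FF, 0x200) == 0x2400  # small rack: exactly 18 blocks
```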
Almost all of the allocations are just filled with zeros or 'A's. But the 1MB one is quite different:
$ hexdump -C 170ae757_80.bin | head -n 20
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 80 26 00 00 01 00 00 00 |.........&......|
00000020 1f 00 00 00 00 00 00 00 10 00 8b 56 02 00 00 00 |...........V....|
00000030 b0 c3 31 16 02 00 00 00 60 e3 01 00 00 00 00 00 |..1.....`.......|
00000040 20 ec 46 58 02 00 00 00 00 00 00 00 00 00 00 00 | .FX............|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 60 bf 31 16 02 00 00 00 |........`.1.....|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000004b0 00 00 00 00 00 00 00 00 10 c4 31 16 02 00 00 00 |..........1.....|
000004c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000004e0 02 1c 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
000004f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000500 00 00 00 00 00 00 00 00 70 80 33 16 02 00 00 00 |........p.3.....|
00000510 b8 b5 e5 57 02 00 00 00 ff ff ff ff ff ff ff ff |...W............|
00000520 58 c4 31 16 02 00 00 00 00 00 00 00 00 00 00 00 |X.1.............|
00000530 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000550 50 75 2c 18 02 00 00 00 01 00 00 00 00 00 00 00 |Pu,.............|
Further on in the hexdump of the 1MB object there's clearly an NSExpression payload; this payload is also visible just by running strings on the WebP file. Matthias Frielingsdorf from iVerify gave a talk at BlackHat Asia with an initial analysis of this NSExpression payload; we'll return to that at the end of this blog post.
Equally striking (and visible in the hexdump above): there are clearly pointers in there. It's too early in the analysis to know whether this is a payload which gets rebased somehow, or whether there's a separate ASLR disclosure step.
On a slightly higher level this hexdump looks a little bit like an Objective-C or C++ object, though some things are strange. Why are the first 24 bytes all zero? Why isn't there an isa pointer or vtable? It looks a bit like there are a number of integer fields before the pointers, but what are they? At this stage of the analysis, I had no idea.
Thinking dynamically

I had tried a lot to reproduce the exploit primitives on a real device. I built tooling to dynamically generate and sign legitimate PKPass files that I could send via iMessage to test devices, and I could crash a lot, but I never seemed to get very far into the exploit: the iOS version range where the heap grooming works seems to be pretty small, and I didn't have an exact device and iOS version match to test on.
Regardless of what I tried (sending the original exploits via iMessage, sending custom PKPasses with the trigger and groom, rendering the WebP directly in a test app, or using the PassKit APIs to render the PKPass file), the best I could manage dynamically was to trigger a heap metadata integrity check failure, which I assumed was indicative of the exploit failing.
(Amusingly, using the legitimate APIs to render the PKPass inside an app failed with an error that the PKPass file was malformed. And indeed, the exploit sample PKPass is malformed: it's missing multiple required files. But the "secure" PKPass BlastDoor parser entrypoint (PKPassSecurePreviewContextCreateMessagesPreview) is, in this regard at least, less strict and will attempt to render an incomplete and invalid PKPass).
Though getting the whole PKPass parsed was proving tricky, with a bit of reversing it was possible to call the correct underlying CoreGraphics APIs to render the WebP and also get the EXIF/MakerNote parsed. By then setting a breakpoint when the huffman tables were allocated, I had hoped it would be obvious what the overflow target was. But it was actually totally unclear what the following object was (here X3 points to the start of the huffman tables, which are 0x3000 bytes large):
(lldb) x/6xg $x3+0x3000
0x112000000: 0x0000000111800000 0x0000000000000000
0x112000010: 0x00000000001a1600 0x0000000000000004
0x112000020: 0x0000000000000001 0x0000000000000019
The first qword (0x111800000) is a valid pointer, but this is clearly not an Objective-C object, nor did it seem to look like any other recognizable object or have much to do with either the bplist or WebP. But running the tests a few times, there was a curious pattern:
(lldb) x/6xg $x3+0x3000
0x148000000: 0x0000000147800000 0x0000000000000000
0x148000010: 0x000000000019c800 0x0000000000000004
0x148000020: 0x0000000000000001 0x0000000000000019
The huffman table is 0x2F28 bytes, which the allocator rounds up to 0x3000. And in both of those test runs, adding the allocation size to the huffman table pointer yielded a suspiciously round number. There's no way that's a coincidence. Running a few more tests, the table+0x3000 pointer is always 8MB aligned. I remembered from some presentations I'd read on the iOS userspace allocator that 8MB is a meaningful number. Here's one from Synacktiv:
8MB is the size of the iOS userspace default allocator's small rack regions. It looks like they might be trying to groom the allocator not to target application-specific data but allocator metadata. Time to dive into some libmalloc internals!
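A quick sanity check on that alignment observation, using the two observed values of (huffman table + 0x3000) from the lldb runs above:

```python
REGION_SIZE = 8 << 20  # 8MB: the small rack's region size

# The table+0x3000 pointers from the two test runs: both land exactly
# on an 8MB region boundary.
for end in (0x112000000, 0x148000000):
    assert end % REGION_SIZE == 0
```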
libmalloc

I'd suggest reading the presentations linked above for a good overview of the iOS default userspace malloc implementation. Libmalloc manages memory on four levels of abstraction. From largest to smallest those are: rack, magazine, region and block. The size split between the tiny, small and large racks depends on the platform. Almost all the relevant allocations for this exploit come from the small rack, so that's the one I'll focus on.
Reading through the libmalloc source I noticed that the region trailer, whilst still called a trailer, has now been moved to the start of the region object. The small rack manages memory in regions of 8MB. Each 8MB region gets split up into (for our purposes) three relevant parts: a header, an array of metadata words, then blocks of 512 bytes which form the allocations:
The first 0x28 bytes are a header where the first two fields form a linked-list of small regions:
typedef struct region_trailer {
struct region_trailer *prev;
struct region_trailer *next;
unsigned bytes_used;
unsigned objects_in_use;
mag_index_t mag_index;
volatile int32_t pinned_to_depot;
bool recirc_suitable;
rack_dispose_flags_t dispose_flags;
} region_trailer_t;
The small region manages memory in units of 512 bytes called blocks. On iOS allocations from the small region consist of contiguous runs of up to 31 blocks. Each block has an associated 16-bit metadata word called a small meta word, which itself is subdivided into a "free" flag in the most-significant bit, and a 15-bit count.
To mark a contiguous run of blocks as in-use (belonging to an allocation), the first meta word has its free flag cleared and its count set to the number of blocks in the run. On free, an allocation is first placed on a lookaside list for rapid reuse, without really being freed. But once an allocation does get freed, the allocator will attempt to greedily coalesce neighbouring chunks. While in-use runs can never exceed 31 blocks, free runs can grow to encompass the entire region.
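The meta word encoding described above can be sketched in a few lines:

```python
FREE_BIT = 1 << 15

def decode_meta(word):
    # Split a 16-bit small meta word into (free flag, block count).
    return bool(word & FREE_BIT), word & 0x7FFF

def encode_meta(free, count):
    return (FREE_BIT if free else 0) | count

assert decode_meta(0x0003) == (False, 3)  # in-use run of 3 blocks
assert decode_meta(0x8027) == (True, 39)  # a free run of 39 blocks
assert encode_meta(True, 39) == 0x8027
```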
The groom

Below you can see the state of the meta words array for the small region directly following the one containing the huffman table as its last allocation:
(lldb) x/200wh 0x148000028
0x148000028: 0x0019 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000038: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000048: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000058: 0x0000 0x0003 0x0000 0x0000 0x0018 0x0000 0x0000 0x0000
0x148000068: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000078: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000088: 0x0000 0x0000 0x0000 0x0000 0x0003 0x0000 0x0000 0x001c
0x148000098: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x1480000a8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x1480000b8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x1480000c8: 0x0000 0x0000 0x0000 0x001d 0x0000 0x0000 0x0000 0x0000
With some simple maths we can convert indexes in the meta words array into their corresponding heap pointers. Doing that it's possible to dump the memory associated with the allocations shown above. The larger 0x19, 0x18 and 0x1c allocations all seem to be generic groom allocations, but the two 0x3 block allocations appear more interesting. The first one (with the first metadata word at 0x14800005a, shown in yellow) is the code_lengths array which gets freed directly after the huffman table building fails. The blue 0x3 block run (with the first metadata word at 0x148000090) is the backing buffer for a CFSet object from the MakerNote and contains object pointers.
Recall that the corruption primitive will write the dword 0x270007 at 0x58 bytes off the end of the 0x3000 allocation (and that allocation happens to sit directly in front of this small region). That corruption has the following effect on the two meta words at 0x148000058:
(lldb) x/200wh 0x148000028
0x148000028: 0x0019 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000038: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000048: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000058: 0x0007 0x0027 0x0000 0x0000 0x0018 0x0000 0x0000 0x0000
0x148000068: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000078: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000088: 0x0000 0x0000 0x0000 0x0000 0x0003 0x0000 0x0000 0x001c
0x148000098: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x1480000a8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x1480000b8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x1480000c8: 0x0000 0x0000 0x0000 0x001d 0x0000 0x0000 0x0000 0x0000
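As a sanity check, the corrupted pair of words above is just the overflow dword 0x270007 reinterpreted as two little-endian 16-bit meta words:

```python
import struct

# The overflow writes the 32-bit value 0x00270007; in the 16-bit meta
# words array that shows up as 0x0007 followed by 0x0027.
words = struct.unpack("<HH", struct.pack("<I", 0x270007))
assert words == (0x0007, 0x0027)
```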
It's changed the size of an in-use allocation from 3 blocks to 39 (or from 1536 to 19968 bytes). I mentioned before that the maximum size of an in-use allocation is meant to be 31 blocks, but this doesn't seem to be checked in every single free path. If things don't quite work out, you'll hit a runtime check. But if things do work out you end up with a situation like this:
(lldb) x/200wh 0x148000028
0x148000028: 0x0019 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000038: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000048: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000058: 0x0007 0x8027 0x0000 0x0000 0x0018 0x0000 0x0000 0x0000
0x148000068: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000078: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000088: 0x0000 0x0000 0x0000 0x0000 0x0003 0x0000 0x0000 0x001c
0x148000098: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x8027
0x1480000a8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x1480000b8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x1480000c8: 0x0000 0x0000 0x0000 0x001d 0x0000 0x0000 0x0000 0x0000
The yellow (0x8027) allocation now extends beyond its original three blocks and completely overlaps the following green (0x18) and blue (0x3) as well as the start of the purple (0x1c) allocation.
But as soon as this corruption occurs, WebP parsing fails and it's not going to make any other allocations. So what are they doing? How are they able to leverage these overlapping allocations? I was pretty stumped.
One theory was that perhaps it was some internal ImageIO or BlastDoor-specific object which reallocated the overlapping memory. Another theory was that perhaps the exploit had two parts: this first part which puts overlapping entries on the allocator freelist, and another file, sent later, to exploit that. Maybe I was lacking that file? But then, why would there be that huge 1MB payload with NSExpressions in it? That didn't add up.
Puzzling pieces

As is so often the case, after stepping back and not thinking about the problem for a while I realised that I'd completely overlooked and forgotten something critical. Right at the very start of the analysis I had run file on all the files inside the PKPass and noted that background.png was actually not a PNG but a TIFF, and had then completely forgotten about it. But now the solution seemed obvious: the reason to use a PKPass rather than just a WebP is that the PKPass parser will render multiple images in sequence, and there must be something in the TIFF which reallocates the overlapping allocation with something useful.
Libtiff comes with a suite of tools for parsing TIFF files. tiffdump displays the headers and EXIF tags:
$ tiffdump background-15.tiff
background-15.tiff:
Magic: 0x4d4d <big-endian> Version: 0x2a <ClassicTIFF>
Directory 0: offset 68 (0x44) next 0 (0)
ImageWidth (256) SHORT (3) 1<48>
ImageLength (257) SHORT (3) 1<16>
BitsPerSample (258) SHORT (3) 4<8 8 8 8>
Compression (259) SHORT (3) 1<8>
Photometric (262) SHORT (3) 1<2>
StripOffsets (273) LONG (4) 1<8>
Orientation (274) SHORT (3) 1<1>
SamplesPerPixel (277) SHORT (3) 1<4>
StripByteCounts (279) LONG (4) 1<59>
PlanarConfig (284) SHORT (3) 1<1>
ExtraSamples (338) SHORT (3) 1<2>
700 (0x2bc) BYTE (1) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>
33723 (0x83bb) UNDEFINED (7) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>
34377 (0x8649) BYTE (1) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>
ICC Profile (34675) UNDEFINED (7) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>
The presence of the four 15KB buffers is notable, but they seemed to mostly just be zeros. Here's the output from tiffinfo:
$ tiffinfo -c -j -d -s -z background-15.tiff
=== TIFF directory 0 ===
TIFF Directory at offset 0x44 (68)
Image Width: 48 Image Length: 16
Bits/Sample: 8
Compression Scheme: AdobeDeflate
Photometric Interpretation: RGB color
Extra Samples: 1<unassoc-alpha>
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 4
Planar Configuration: single image plane
XMLPacket (XMP Metadata):
RichTIFFIPTC Data: <present>, 15347 bytes
Photoshop Data: <present>, 15347 bytes
ICC Profile: <present>, 15347 bytes
1 Strips:
0: [ 8, 59]
Strip 0:
00 00 00 00 00 00 00 00 84 13 00 00 01 00 00 00 01 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cd ab 34 12 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
This dumps the decompressed TIFF strip buffer, and this looks much more interesting! There's clearly some structure, though not a lot of it. Is this really enough to do something useful? It looks like there could be some sort of object, but I didn't recognise the structure, and had no idea how replacing an object with this would be useful. I explored two possibilities:
1) Alpha blending: This is actually the raw TIFF strip after decompression but before the rendering step which applies the alpha, so it was possible that this got rendered "on top" of another object. That seemed like a reasonable explanation for why the object seemed so sparse; perhaps the idea was to just "move" a pointer value. The first 16 bytes of the strip look like this:
00 00 00 00 00 00 00 00 84 13 00 00 01 00 00 00
which when viewed as two 64-bit values look like this:
0x0000000000000000 0x0000000100001384
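That reinterpretation is just a little-endian unpack of the first 16 strip bytes:

```python
import struct

# The first 16 bytes of the decompressed strip, viewed as two 64-bit
# little-endian values.
strip_prefix = bytes.fromhex("00000000000000008413000001000000")
qwords = struct.unpack("<QQ", strip_prefix)
assert qwords == (0x0, 0x100001384)
```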
It seemed sort-of plausible that rendering the 0x100001384 on top of another pointer might be a neat primitive, but there was something that didn't quite add up. This pointer-ish value is at the start of the strip buffer, so if the overlapping allocation got reallocated with this strip buffer directly, nothing interesting would happen, as the overlapping parts are further along. Maybe the overlapping buffer gets split up multiple times, but this was seeming less and less likely, and I couldn't reproduce this part of the exploit to actually observe what happened.
2) This is an object: The other theory I had was that this actually was an object. The 8 zero bytes at the start were certainly strange… so then what's the significance of the next 8 bytes?
84 13 00 00 01 00 00 00
I tried using lldb's memory find command to see if there were other instances of that exact byte sequence occurring in a test iOS app rendering the WebP then the TIFF using the CoreGraphics APIs:
(lldb) memory find -e 0x100001384 -- 0x100000000 0x200000000
data not found within the range.
Nope, plus it was very, very slow.
One thing I had noticed was that this byte sequence was similar to one near the start of the 1MB groom object:
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 80 26 00 00 01 00 00 00 |.........&......|
00000020 1f 00 00 00 00 00 00 00 10 00 8b 56 02 00 00 00 |...........V....|
00000030 b0 c3 31 16 02 00 00 00 60 e3 01 00 00 00 00 00 |..1.....`.......|
They're not identical, but it seemed a strange coincidence.
I took a bunch of test app core dumps using lldb's process save-core command and wrote some simple code to search for similar-ish byte patterns. After some experimentation I managed to find something:
1c7b2600 49 d2 e4 29 02 00 00 01 84 13 00 00 02 00 00 00 |I..)............|
1c7b2610 42 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |B...............|
1c7b2620 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
1c7b2630 c0 92 d6 83 02 00 00 00 00 93 d6 83 02 00 00 00 |................|
Converting those coredump offsets into VM addresses and looking them up revealed:
(lldb) x/10xg 0x121E47600
0x121e47600: 0x0100000229e4d249 0x0000000200001384
0x121e47610: 0x0000000000000042 0x0000000000000000
0x121e47620: 0x0000000000000000 0x0000000000000000
(lldb) image lookup --address 0x229e4d248
Address: CoreFoundation[0x00000001dceed248] (CoreFoundation.__DATA_DIRTY.__objc_data + 7800)
Summary: (void *)0x0000000229e4d0e0: __NSCFArray
It's an __NSCFArray, which is the Foundation (Objective-C) "toll-free bridged" version of the Core Foundation (C) CFArray type! This was the hint that I was looking for to identify the significance of the TIFF and that 1MB groom object, which also contains a similar byte sequence.
Cores and Foundations

Even though Apple hasn't updated the open-source version of CoreFoundation for almost a decade, the old source is still helpful. Here's what a CoreFoundation object looks like:
/* All CF "instances" start with this structure. Never refer to
* these fields directly -- they are for CF's use and may be added
* to or removed or change format without warning. Binary
* compatibility for uses of this struct is not guaranteed from
* release to release.
*/
typedef struct __CFRuntimeBase {
uintptr_t _cfisa;
uint8_t _cfinfo[4];
#if __LP64__
uint32_t _rc;
#endif
} CFRuntimeBase;
So the header is an Objective-C isa pointer followed by four bytes of _cfinfo, followed by a reference count. Taking a closer look at the uses of _cfinfo:
CF_INLINE CFTypeID __CFGenericTypeID_inline(const void *cf) {
// yes, 10 bits masked off, though 12 bits are
// there for the type field; __CFRuntimeClassTableSize is 1024
uint32_t *cfinfop = (uint32_t *)&(((CFRuntimeBase *)cf)->_cfinfo);
CFTypeID typeID = (*cfinfop >> 8) & 0x03FF; // mask up to 0x0FFF
return typeID;
}
It seems that the second byte in _cfinfo is a type identifier. And indeed, running expr (int) CFArrayGetTypeID() in lldb prints 19 (0x13), which matches up with both the object found in the coredump and the strange (or now not so strange) object in the TIFF strip buffer.
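Mirroring __CFGenericTypeID_inline makes the match easy to verify:

```python
def cf_type_id(cfinfo_dword):
    # __CFGenericTypeID_inline: the type ID lives in bits 8..17 of the
    # _cfinfo dword (masked to 10 bits).
    return (cfinfo_dword >> 8) & 0x3FF

# The fake object's _cfinfo bytes are 84 13 00 00, i.e. the
# little-endian dword 0x00001384.
assert cf_type_id(0x00001384) == 19  # == CFArrayGetTypeID()
```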
X steps forwards, Y steps back

Looking through more of the CoreFoundation code it seems that the object in the TIFF strip buffer is a CFArray with inline storage containing one element with the value 0x1234abcd. It also seems that it's possible for CF objects to have NULL isa pointers, which explains why the first 8 bytes of the fake object are zero.
This is interesting, but it still doesn't actually get us any closer to figuring out what the next step of the exploit is. If the CFArray is meant to overlap with something, then what? And what interesting side-effects could a CFArray with only a single element with the value 0x1234abcd possibly have?
This seems like one step forward and two steps back, but there's something else which we can now figure out: what that 1MB groom object actually is. Let's take a look at the start of it again:
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 80 26 00 00 01 00 00 00 |.........&......|
00000020 1f 00 00 00 00 00 00 00 10 00 8b 56 02 00 00 00 |...........V....|
00000030 b0 c3 31 16 02 00 00 00 48 e3 01 00 00 00 00 00 |..1.....H.......|
00000040 20 ec 46 58 02 00 00 00 00 00 00 00 00 00 00 00 | .FX............|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 60 bf 31 16 02 00 00 00 |........`.1.....|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
It looks like another CF object, starting at +0x10 in the buffer, with the same NULL isa pointer, a reference count of 1 and a _cfinfo of {0x80, 0x26, 0, 0}. The type identifiers aren't actually fixed; they're allocated dynamically via calls to _CFRuntimeRegisterClass like this:
CFTypeID CFArrayGetTypeID(void) {
static dispatch_once_t initOnce;
dispatch_once(&initOnce, ^{ __kCFArrayTypeID = _CFRuntimeRegisterClass(&__CFArrayClass); });
return __kCFArrayTypeID;
}
The CFTypeIDs are really just indexes into the __CFRuntimeClassTable array, and even though the types are allocated dynamically the ordering seems sufficiently stable that the hardcoded type values in the exploit work. 0x26 is the CFTypeID for CFReadStream:
struct _CFStream {
CFRuntimeBase _cfBase;
CFOptionFlags flags;
CFErrorRef error;
struct _CFStreamClient *client;
void *info;
const struct _CFStreamCallBacks *callBacks;
CFLock_t streamLock;
CFArrayRef previousRunloopsAndModes;
dispatch_queue_t queue;
};
Looking through the CFStream code it seems to call various callback functions during object destruction — that seems like a very likely path towards code execution, though with some significant caveats:
Caveat I: It's still unclear how an overlapping allocation in the small malloc region could lead to a CFRelease being called on this 1MB allocation.
Caveat II: What about ASLR? There have been some tricks in the past targeting "universal gadgets" which work across multiple slides. Nemo also had a neat objective-c trick for defeating ASLR in the past, so it's plausible that there's something like that here.
Caveat III: What about PAC? If it's a data-only attack then maybe PAC isn't an issue, but if they are trying to JOP they'd need a trick beyond just an ASLR leak, as all forward control flow edges should be protected by PAC.
Special Delivery

Around this time in my analysis Matthias Frielingsdorf offered me the use of an iPhone running iOS 16.6, the same version as the targeted ITW victim. With Matthias' vulnerable iPhone I was able to use the Dopamine jailbreak to attach lldb to MessagesBlastDoorService, and after a few tries was able to reproduce the exploit right up to the CFRelease call on the fake CFReadStream, confirming that that part of my analysis was correct!
Collecting a few crashes led, yet again, to even more questions...
Caveat I: Mysterious Pointers

Similar to the analysis of the huffman tables, there was a clear pattern in the fake object pointers, and this time it was even stranger. The crash site was here:
LDR X8, [X19,#0x30]
LDR X8, [X8,#0x58]
At this point X19 points to the fake CFReadStream object, and collecting a few X19 values there's a pretty clear pattern:
0x000000075f000010
0x0000000d4f000010
The fake object is inside a 1MB heap allocation, but all those fake object addresses are always 16 bytes above a 16MB-aligned address. It seemed really strange to me to end up with a pointer 0x10 bytes past such a round number. What kind of construct would lead to the creation of such a pointer? Even though I did have a debugger attached to MessagesBlastDoorService, it wasn't a time-travel debugger, so figuring out the history of such a pointer was non-trivial. Using the same core dump analysis techniques I could see that the pointer which would end up in X19 was also present in the backing buffer of the CFSet described earlier. But how did it get there?
Having found the strange CFArray inside the TIFF I was heavily biased towards believing that this must have something to do with it, so I wrote some tooling to modify the fake CFArrays in the exploit's TIFF. The theory was that by messing with that CFArray, I could cause a crash when it was used and figure out what was going on. But making minor changes to the strip buffer didn't seem to have any effect: the exploit still worked! Even replacing the entire strip buffer with A's didn't stop the exploit working... What's going on?
Stepping back

I had made a list of the primitives I thought might lead to the creation of such a strange looking pointer; first on the list was a partial pointer overwrite, though at the time that didn't explain the CFArray. Now, having shown that the CFArray can't be involved, it was time to go back to the list. And to step back even further and make sure I'd really looked at all of that TIFF...
There were still those four other metadata buffers in the tiffdump output I'd shown earlier:
700 (0x2bc) BYTE (1) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>
33723 (0x83bb) UNDEFINED (7) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>
34377 (0x8649) BYTE (1) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>
ICC Profile (34675) UNDEFINED (7) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>
I'd just dismissed them, but maybe I shouldn't have. I had actually already dumped the full contents of each of those buffers and checked that there wasn't anything else apart from the zeros. They were all zeros, except for the third-to-last byte of each, which was 0x10; I'd considered that completely uninteresting. Uninteresting, that is, unless you wanted to partially overwrite the three least-significant bytes of a little-endian pointer value with 0x000010!
Let's look back at the SMALL metadata:
0x148000058: 0x0007 0x8027 0x0000 0x0000 0x0018 0x0000 0x0000 0x0000
0x148000068: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000078: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000088: 0x0000 0x0000 0x0000 0x0000 0x0003 0x0000 0x0000 0x001c
0x148000098: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x8027
Each of those four metadata buffers in the TIFF is 15347 bytes, which is 0x3bf3. Looked at another way, that's 0x3c00 (the size rounded up to the next 0x200 block size), minus 5, minus 8.
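A quick check of that block arithmetic:

```python
size = 15347                     # each TIFF metadata buffer, 0x3bf3 bytes
blocks = -(-size // 0x200)       # ceiling division to whole 0x200 blocks
assert blocks * 0x200 == 0x3c00  # the request rounds up to 0x3c00...
assert blocks == 30              # ...which is exactly 30 blocks
```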
0x3c00 is exactly 30 0x200 byte blocks. Each 16-bit word in the metadata array shown above corresponds to one 0x200 block, where the overlapping chunk in yellow starts at 0x14800005a. Counting forwards 30 chunks means that the end of a 0x3c00 allocation overlaps perfectly with the end of the original blue three-chunk allocation:
0x148000058: 0x0007 0x8027 0x0000 0x0000 0x0018 0x0000 0x0000 0x0000
0x148000068: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000078: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x148000088: 0x0000 0x0000 0x0000 0x0000 0x0003 0x0000 0x0000 0x001c
0x148000098: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x8027
This has the effect of overwriting all but the last 16 bytes of the blue allocation with zeros, then overwriting the three least-significant bytes of the second-to-last pointer-sized value with 10 00 00. If that memory happened to contain a pointer, this "shifts" the pointer down to the nearest 16MB boundary, then adds 0x10! (For those who saw my 2024 OffensiveCon talk, this was the missing link between the overlapping allocations and code execution that I mentioned.)
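The overwrite can be sketched in a couple of lines (the pre-corruption pointer values here are made up for illustration; the results match the observed X19 values):

```python
def clobber_low3(ptr):
    # Replace the three least-significant bytes of a little-endian
    # pointer with 10 00 00: i.e. round down to a 16MB (2**24)
    # boundary, then add 0x10.
    return (ptr & ~0xFFFFFF) | 0x10

assert clobber_low3(0x75F3A9C40) == 0x75F000010
assert clobber_low3(0xD4F81F200) == 0xD4F000010
```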
As mentioned earlier, that blue allocation starting with 0x0003 is the backing buffer of a CFSet object from the bplist inside the WebP MakerNote. The set is constructed in a very precise fashion such that the target pointer (the one to be rounded down) ends up as the second-to-last pointer in the backing buffer. The 1MB object is then also groomed such that it falls on a 16MB boundary below the object which the CFSet entry originally points to. Then when that CFSet is destructed it calls CFRelease on each object, causing the fake CFReadStream destructor to run.
Caveat II: ASLR

We've looked at the whole flow from huffman table overflow to CFRelease being invoked on a fake CFReadStream, but there's still stuff missing. The second open question I discussed earlier was ASLR. I had theorised that maybe it used a trick like a universal gadget, but is that the case?
In addition to the samples, I was also able to obtain a number of crash logs from failed exploit attempts where those samples were thrown, which meant I could figure out the ASLR slide of the MessagesBlastDoorService when the exploit failed. In combination with the target device and exact OS build (also contained in the crash log) I could then obtain the matching dyld_shared_cache, subtract the runtime ASLR slide from a bunch of the pointer-looking things in the 1MB object and take a look at them.
The simple answer is: the 1MB object contains a large number of hardcoded, pre-slid, valid pointers. There's no weird machine, tricks or universal gadget here. By the time the PKPass is built and sent by the attackers they already know both the target device type and build as well as the runtime ASLR slide of the MessagesBlastDoorService...
Based on analysis by iVerify, as well as analysis of earlier exploit chains published by Citizen Lab, my current working theory is that the large amount of HomeKit traffic seen in those cases is likely a separate ASLR/memory disclosure exploit.
Caveat III: Pointer Authentication

In the years since PAC was introduced we've seen a whole spectrum of interesting ways to either defeat, or just avoid, PAC. So what did these attackers do? To understand that, let's follow the CFReadStream destruction code closely. (All these code snippets are from the most recently available version of CF from 2015, but the code doesn't seem to have changed much.)
Here's the definition of the CFReadStream:
static const CFRuntimeClass __CFReadStreamClass = {
    0,
    "CFReadStream",
    NULL,      // init
    NULL,      // copy
    __CFStreamDeallocate,
    NULL,
    NULL,
    NULL,      // copyHumanDesc
    __CFStreamCopyDescription
};
When a CFReadStream is passed to CFRelease, it will call __CFStreamDeallocate:
static void __CFStreamDeallocate(CFTypeRef cf) {
    struct _CFStream *stream = (struct _CFStream *)cf;
    const struct _CFStreamCallBacks *cb =
        _CFStreamGetCallBackPtr(stream);
    CFAllocatorRef alloc = CFGetAllocator(stream);
    _CFStreamClose(stream);
    ...
_CFStreamGetCallBackPtr just returns the CFStream's callBacks field:
CF_INLINE const struct _CFStreamCallBacks *_CFStreamGetCallBackPtr(struct _CFStream *stream) {
    return stream->callBacks;
}
Here's _CFStreamClose:
CF_PRIVATE void _CFStreamClose(struct _CFStream *stream) {
    CFStreamStatus status = _CFStreamGetStatus(stream);
    const struct _CFStreamCallBacks *cb =
        _CFStreamGetCallBackPtr(stream);
    if (status == kCFStreamStatusNotOpen ||
        status == kCFStreamStatusClosed ||
        (status == kCFStreamStatusError &&
         __CFBitIsSet(stream->flags, HAVE_CLOSED)))
    {
        // Stream is not open from the client's perspective;
        // do not callout and do not update our status to "closed"
        return;
    }
    if (! __CFBitIsSet(stream->flags, HAVE_CLOSED)) {
        __CFBitSet(stream->flags, HAVE_CLOSED);
        __CFBitSet(stream->flags, CALLING_CLIENT);
        if (cb->close) {
            cb->close(stream, _CFStreamGetInfoPointer(stream));
        }
        ...
_CFStreamGetStatus extracts the status bitfield from the flags field via the __CFStreamGetStatus macro:
#define __CFStreamGetStatus(x) __CFBitfieldGetValue((x)->flags, MAX_STATUS_CODE_BIT, MIN_STATUS_CODE_BIT)
Looking at the 1MB object again, the flags field is the first non-base field:
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 80 26 00 00 01 00 00 00 |.........&......|
00000020 1f 00 00 00 00 00 00 00 10 00 8b 56 02 00 00 00 |...........V....|
00000030 b0 c3 31 16 02 00 00 00 48 e3 01 00 00 00 00 00 |..1.....H.......|
00000040 20 ec 46 58 02 00 00 00 00 00 00 00 00 00 00 00 | .FX............|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 60 bf 31 16 02 00 00 00 |........`.1.....|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
That gives a status code of 0x1f with all the other flag bits clear. This gets through both conditional branches to reach this close callback call:
if (cb->close) {
cb->close(stream, _CFStreamGetInfoPointer(stream));
}
At this point we need to switch to looking at the assembly to see what's really happening:
__CFStreamClose
var_30= -0x30
var_20= -0x20
var_10= -0x10
var_s0= 0
PACIBSP
STP X24, X23, [SP,#-0x10+var_30]!
STP X22, X21, [SP,#0x30+var_20]
STP X20, X19, [SP,#0x30+var_10]
STP X29, X30, [SP,#0x30+var_s0]
ADD X29, SP, #0x30
MOV X19, X0
BL __CFStreamGetStatus
CBZ X0, loc_187076958
The fake CFReadStream is the first argument to this function, so it's passed in the X0 register. It's then stored into X19 so that it survives the call to __CFStreamGetStatus.
Skipping ahead past the flag checks we reach the callback callsite (this is also the crash site seen earlier):
LDR X8, [X19,#0x30]
...
LDR X8, [X8,#0x58]
CBZ X8, loc_187076758
LDR X1, [X19,#0x28]
MOV X0, X19
BLRAAZ X8
Let's walk through each instruction in turn:
First it loads the 64-bit value from X19+0x30 into X8:
LDR X8, [X19,#0x30]
Looking at the hexdump of the 1MB object above, this will load the value 0x25846ec20.
From the crash reports we know the runtime ASLR slide of the MessagesBlastDoorService when this exploit was thrown was 0x3A8D0000, so subtracting that we can figure out where in the shared cache this pointer should point:
0x25846ec20-0x3A8D0000=0x21DB9EC20
It points into the __const segment of the TextToSpeechMauiSupport library in the shared cache.
The next instruction adds 0x58 to that TextToSpeechMauiSupport pointer and reads a 64-bit value from there:
LDR X8, [X8,#0x58] // x8 := [0x21DB9EC20+0x58]
This loads the pointer to the function _DataSectionWriter_CommitDataBlock from 0x21DB9EC78.
IDA is simplifying something for us here: the function pointer loaded there is actually signed with the A-family instruction key with a zero context. This signing happens transparently (either during load or when the page is faulted in).
The remaining four instructions then check that the pointer wasn't NULL, load X1 from offset +0x28 in the fake 1MB object, move the pointer to the fake object back into X0 and call the PAC'ed _DataSectionWriter_CommitDataBlock function pointer via BLRAAZ:
CBZ X8, loc_187076758
LDR X1, [X19,#0x28]
MOV X0, X19
BLRAAZ X8
Callback-Oriented Programming

A well-known attack against PAC is to swap two valid, PAC'ed pointers which are signed in the same way but point to different places (e.g. swapping two function pointers with different semantics, allowing you to exploit those semantic differences).
Since many PAC-protected pointers are signed with the A-family instruction key and a zero context value, there are plenty of pointers to choose from. "Just" having an ASLR defeat shouldn't be enough to achieve this though; surely you'd need to disclose the actual PAC'ed pointer value? But that's not what happened above.
Notice that the CFStream objects don't directly contain the callback function pointers — there's an extra level of indirection. The CFStream object contains a pointer to a callback structure, and that structure has the PAC'd function pointers. And crucially: that first pointer, the one to the callbacks structure, isn't protected by PAC. This means that the attackers can freely swap pointers to callback structures, operating one-level removed from the function pointers.
This might seem like a severe constraint, but the dyld_shared_cache is vast and there are easily enough pre-existing callback structures to build a "callback-oriented JOP" chain, chaining together unsigned pointers to signed function pointers.
The initial portion of the payload is a large callback-oriented JOP chain which is used to bootstrap the evaluation of the next payload stage, a large NSExpression.
Similarities

There are a number of similarities between this exploit chain and PWNYOURHOME, an earlier exploit also attributed by Citizen Lab to NSO, described in this blog post from April 2023.
That chain also had an initial stage targeting HomeKit, followed by a stage targeting MessagesBlastDoorService and also involving a MakerNote object; the Citizen Lab post claims that, at the time, the MakerNote was inside a PNG file. My guess would be that the PNG was being used as the delivery mechanism for the MakerNote bplist heap-grooming primitives discussed in this post.
Based on Citizen Lab's description it also seems like PWNYOURHOME was leveraging a similar callback-oriented JOP technique, and it seems likely that there was also a HomeKit-based ASLR disclosure. The PWNYOURHOME post has a couple of extra details around a minor fix which Apple made, preventing parsing of "certain HomeKit messages unless they arrive from a plausible source." But there still aren't enough details to figure out the underlying vulnerability or primitive. It seems likely to me that the same issue, or a variant thereof, was still in use in BLASTPASS.
Key material

Matthias from iVerify presented an initial analysis of the NSExpression payload at Black Hat Asia in April 2024. In early July 2024, Matthias and I took a closer look at the final stages of the NSExpression payload, which decrypt an AES-encrypted NSExpression and execute it.
It seems very likely that the encrypted payload contains a BlastDoor sandbox escape. Although the BlastDoor sandbox profile is fairly restrictive it still allows access to a number of system services like notifyd, logd and mobilegestalt. In addition to the syscall attack surface there's also a non-trivial IOKit driver attack surface:
...
(allow iokit-open-user-client
(iokit-user-client-class "IOSurfaceRootUserClient")
(iokit-user-client-class "IOSurfaceAcceleratorClient")
(iokit-user-client-class "AGXDevice"))
(allow iokit-open-service)
(allow mach-derive-port)
(allow mach-kernel-endpoint)
(allow mach-lookup
(require-all
(require-not (global-name "com.apple.diagnosticd"))
(require-any
(global-name "com.apple.logd")
(global-name "com.apple.system.notification_center")
(global-name "com.apple.mobilegestalt.xpc"))))
...
(This profile snippet was generated using the Cellebrite labs' fork of SandBlaster)
In FORCEDENTRY the sandbox escape was contained directly in the NSExpression payload (though that was an escape from the less-restrictive IMTranscoderAgent sandbox). This time around it seems extra care has been taken to prevent analysis of the sandbox escape.
The question is: where does the key come from? We had a few theories:
- Perhaps the key is just obfuscated, and by completely reversing the NSExpression payload we can find it?
- Perhaps the key is derived from some target-specific information?
- Perhaps the key was somehow delivered in some other way and can be read from inside BlastDoor?
We spent a day analysing the NSExpression payload and concluded that the third theory appeared to be the correct one. The NSExpression walks up the native stack looking for the communication ports back to imagent. It then hijacks that communication, effectively taking over responsibility for parsing all subsequent incoming requests from imagent for "defusing" of iMessage payloads. The NSExpression loops 100 times, parsing incoming requests as XPC messages, reading the request xpc dictionary then the data xpc data object to get access to the raw, binary iMessage format. It waits until the device receives another iMessage with a specific format, and from that message extracts an AES key which is then used to decrypt the next NSExpression stage and evaluate it.
We were unable to recover any messages with the matching format and therefore unable to analyse the next stage of the exploit.
Conclusion

In contrast to FORCEDENTRY, BLASTPASS's separation of the ASLR disclosure and RCE phases removed the need for a novel weird machine. Whilst the heap groom was impressively complicated and precise, the exploit still relied on well-known exploitation techniques. Furthermore, the MakerNote bplist groom and callback-JOP PAC defeat techniques appear to have been in use for multiple years, based on similarities with Citizen Lab's blog post from 2023, which looked at devices compromised in 2022.
Enforcing much stricter requirements on the format of the bplist inside the MakerNote (for example: a size limit or a strict-parser mode which rejects duplicate keys) would seem prudent. The callback-JOP issue is likely harder to mitigate.
The HomeKit aspect of the exploit chain remains mostly a mystery, but it seems very likely that it was somehow involved in the ASLR disclosure. Samuel Groß's 2021 post "A Look at iMessage in iOS 14" mentioned that Apple added support for re-randomizing the shared cache slide of certain services. Ensuring that BlastDoor has a unique ASLR slide could be a way to mitigate this.
This is the second in-the-wild NSO exploit which relied on simply renaming a file extension to access a parser in an unexpected context which shouldn't have been allowed.
FORCEDENTRY had a .gif which was really a .pdf.
BLASTPASS had a .png which was really a .webp.
A basic principle of sandboxing is treating all incoming attacker-controlled data as untrusted, and not simply trusting a file extension.
This speaks to a broader challenge in sandboxing: that current approaches based on process isolation can only take you so far. They increase the length of an exploit chain, but don't necessarily reduce the size of the initial remote attack surface. Accurately mapping, then truly reducing the scope of that initial remote attack surface should be a top priority.