- This work performs the first analysis of WeChat’s tracking ecosystem. Using reverse engineering methods to intercept WeChat’s network requests, we identified exactly what types of data the WeChat app is sending to its servers, and when.
- We found that the most fine-grained activity tracking data is sent during Mini Program execution. All Mini Programs, and thereby their users, are enrolled in usage tracking, meaning that a large amount of users’ activity in the Mini Program is sent to WeChat and not just the Mini Program developers themselves.
- Permission boundaries between Mini Programs and the host WeChat platform are unclear. As one consequence, we found that granting permissions such as location permission during the use of a Mini Program will also enable the larger transmission of geolocation data to WeChat.
With over 1.2 billion monthly active users, WeChat is the most popular messaging and social media platform in China and third in the world. According to some market research, network traffic from WeChat made up 34% of Chinese mobile traffic in 2018. WeChat has in many ways monopolized messaging in China, making it necessary for individuals in China to use. WeChat has also evolved beyond simply messaging. People commonly use WeChat as a social media platform to share updates with contacts, as a platform for conducting financial transactions, and also as a platform for downloading and using other programs, referred to as “Mini Programs.”
WeChat has not only become the default way to contact people in China, but its ecosystem also encompasses many other necessities of daily life, like performing financial transactions or calling taxis. Many inside and outside China, therefore, use WeChat out of necessity. Besides individuals in China, diaspora populations, family members, journalists, international activists, diplomats, people who do business in China, and just about anyone with a relationship in China also use WeChat out of necessity. WeChat also complies with Chinese government and local police requests for data and information, essentially becoming a mass surveillance tool for local authorities. It also operates a massive content censorship ecosystem for the features on its platform.
Understanding what data the WeChat application and ecosystem transmits, and to whom, may be especially important due to the heavily automated surveillance and content censorship ecosystem operated by the platform. For vulnerable populations that must use WeChat (for instance, domestic journalists and foreign correspondents, grassroots and diaspora activists), knowing the limitations of the app can protect them. This kind of risk assessment requires a more granular understanding of information flows within the WeChat ecosystem.
In the case of WeChat, a large portion of network communications, including when messaging, viewing WeChat’s “Moments” posts, or sometimes even when using WeChat Mini Programs, utilize a proprietary encryption protocol called MMTLS. The closed-source and undocumented nature of this network protocol has also made it difficult for researchers to conduct a thorough review of information flows from the application. In our study we had to develop our own tools to enable us to thoroughly study the information that flows back to WeChat servers.
Our primary research question investigates what information flows back to WeChat servers across the WeChat ecosystem. To set the stage for this work, we first reverse engineer the networking stack in order to develop instrumentation and tooling to study WeChat network requests. In a follow-up report, we will further discuss our full understanding of the network stack, including the custom network security measures employed by WeChat to encrypt message data.
We then use this tooling to systematically categorize and study the composition of network requests flowing from the WeChat client to the server in various contexts in the WeChat ecosystem.
WeChat is often described as a “super-app” not only due to the large number of features which it has accrued since its inception, but also because it is a popular platform for third-party applications. Originally developed as a chat app, the WeChat platform has grown to support voice calls, “Moments” social media posts, and file sharing. Specific to China, the app also serves as a major payment and financial transaction platform. Globally, the app also supports Mini Programs, which are a key feature relied on by a majority of WeChat users.
Despite WeChat’s global popularity, the platform has come under repeated scrutiny for security and privacy issues. The platform lacks end-to-end encryption of chat messages, giving Tencent visibility into all messages sent over the platform. Users with mainland China accounts are subjected to automated, keyword-based censorship of messages, and images they post or send via chat message are censored according to the text contained in those images and according to their similarity to images on an internal blacklist. While such censorship is commonly believed to only affect users with mainland China accounts, previous work has found how even the communications of non-China users are subjected to political surveillance and used to build up WeChat’s Chinese political censorship system.
What are Mini Programs and why are they important?
Mini Programs are lightweight applications that can be downloaded and launched within the WeChat program, and can also sync and link with users’ WeChat account and certain associated data, like their contact information. The breadth and variety of Mini Programs is essentially the same as any other application ecosystem, covering e-commerce, health, public services, gaming, and any other service you could feasibly imagine an app would be used for. Some Mini Programs deal with particularly sensitive data, such as health apps (e.g. Tencent Health), government service apps (e.g., the local contact tracing apps that were compulsory during the COVID-19 pandemic), or apps that perform financial transactions on behalf of the user (e.g., shopping apps like Pinduoduo or budgeting apps like “Small Ledger”/收款小账本).
The Mini Program ecosystem has also undergone criticism around various privacy and security issues. A 2020 report from CNCERT/CC found that 60 percent of the 50 banking applications that they investigated did not encrypt any user data transmitted over the network, among a litany of other common security issues. Another report investigated the third-party tracking ecosystem across 52 commonly used Mini Programs, found that many shared user data with third parties, and generally provided unsatisfactory notice and opt-outs to users.
Since the CNCERT report, WeChat has attempted to tighten user data controls within its Mini Program ecosystem. One such example is that web requests can no longer be directly made from the Mini Program, and instead must go through WeChat’s internal API. This way, WeChat can enforce which third-parties are contacted during Mini Program usage, and also enforce minimum security for network requests, in particular by ensuring that they are all encrypted using HTTPS.
Even though WeChat and Mini Programs are popular, data transmission and information flow across these third-party ecosystems is understudied when compared to Android and iOS. Previous work has attempted to use automated methods to analyze the Windows desktop version of WeChat, finding that various identifiers related to a user’s machine are transmitted to WeChat’s servers.
In this section we explain our methods for analyzing WeChat’s data flows and tracking. Our primary analysis method is the use of reverse engineering. In our report, we analyzed the Android version of the app, specifically version 8.0.23 (android:versionCode = 2160) released on May 26, 2022, downloaded from the WeChat website. We used an account registered to a U.S. number for the analysis, which changes the behavior of the application compared to a mainland Chinese number. Our setup may not be comprehensive, and the full limitations are discussed in a section below.
Reverse engineering methods
In this work, we utilized both static and dynamic analysis methods, static referring to those methods which do not involve running an analyzed application and dynamic referring to those which do. For static analysis, we used Jadx, a popular Android decompiler, to decompile WeChat’s Android Dex files into Java source code. We also used Ghidra and IDA Pro to analyze the native libraries bundled with WeChat.
For dynamic analysis, we analyzed the application installed on a rooted Google Pixel 4 phone and an emulated Android OS, using Frida to hook into the app’s functions and manipulate and export application memory. We also performed network analysis of WeChat’s network traffic using Wireshark. However, due to WeChat’s use of nonstandard cryptographic libraries like MMTLS, standard network traffic analysis tools that might work with HTTPS/TLS do not work for all of WeChat’s network activity. Our use of Frida was paramount for capturing the data and information flows we detail in this report. These Frida scripts are designed to intercept WeChat’s request data immediately before WeChat sends it to its MMTLS encryption module. The Frida scripts we used are published in our Github repository.
Using these dynamic analysis methods, we then manually simulate multiple common “user activities” that a user may perform in practice. The discrete user activities we tested are as follows:
- Account creation
- Account login
- Loading messages
- Sending messages
- Sending multimedia messages, including images and video
- Loading Moments feed
- Posting Moments
- Posting multimedia Moments, including images and video
- Commenting and interacting with Moments
- Mini Program usage
For each of these activities we identify the destination of all network requests made during them and record the types of data transmitted during them. We collect data under two primary modes: a “permissive” and “restrictive” mode. In a “permissive” mode, we grant the application all sensitive permissions, to demonstrate to what extent permissioned APIs are relied on during regular operation of the app.
To detect sensitive content flows, we identify various pieces of sensitive data related to the current user. For certain fields that are user-controllable, we use identifiable strings of text. We then attempt to find these pieces of data in WeChat’s plaintext, uncompressed transmissions. For instance, we identified the following information to track their transmissions:
- Network carrier
- Phone number
- User’s internal WeChat ID
- OS details – Android API version
- Device details – model, brand
For spoofable numeric data, such as geolocation coordinates or screen size on an Android emulator, we search for the combination of certain numbers (such as both the width and height, or all three geocoordinates) to avoid matching false positives. For fully user-controllable fields, we inject recognizable and descriptive names, such as “USERNAME_XXXXX” for the name of the account. A full list of identified information used to track content flows can be found in our Github repository.
In order to study WeChat behavior during Mini Program usage, we load “WeChat Example Mini Program,” which is a test application that can trigger various API calls. We also collect data from five popular Mini Programs across different categories from 2021 from a few blog posts, based on usage and search metrics, since there is no Mini Program ordering provided by WeChat. The five Mini Programs we studied were Meituan (food), Pinduoduo (online shopping), Didi Chuxing (taxi service), Tongcheng Travel (tourism), and Kuaishou (online video). In these five Mini Programs, we simply browsed their main interface without logging in to them.
This investigation only looks at client behavior, and is subject to other common limitations in privacy research that can only perform client analysis. Much of the data that is sent to WeChat servers may be required for functionality of the application. For instance, WeChat servers can certainly see chat messages since WeChat can censor them according to their content. We cannot say what WeChat is doing with the data that they collect, but we can make inferences about what is possible. Certain limited inferences about data sharing can be made based on prior work like this report, which indicates that messages sent by non-mainland-Chinese users are used to train censorship algorithms for mainland Chinese users. In this report, we focus on the version of WeChat for non-mainland-Chinese users.
Our investigation was also limited due to legal and ethical constraints. It has become increasingly difficult to obtain Chinese phone numbers for investigation due to the strict phone number and associated government ID requirements. Therefore, we did not test on Chinese phone numbers, which causes WeChat to behave differently. In addition, without a mainland Chinese account, the types of interaction with certain features and Mini Programs were limited. For instance, we did not perform financial transactions on the application.
In terms of our environment, we only investigated the Android application on a rooted Google Pixel 4 and an emulator for that same device, both on Android 12. Though we did not observe any operational differences in behavior between using the emulator and a regular device, there may be downstream effects to our use of an emulator to collect much of the data in this report.
We also only evaluated a recent version of WeChat (8.0.23 released on May 26, 2022). Testing different versions of WeChat, the backwards-compatibility of the servers with older versions of the application, and testing on a variety of Android operating systems with variations in API version, are great avenues for future work.
We also acknowledge that the scope of the Mini Programs work is limited compared to the size of the entire ecosystem, as we only take a few Mini Programs to investigate thoroughly. We attempt to offset this by focusing on the relationship between Mini Program API calls and data sent back to WeChat servers. A comprehensive and representative overview of the entire ecosystem would be a great avenue for future work.
Finally, the WeChat codebase is vast. Over the years, the application has grown to cover various features, including ones that may transmit especially sensitive data (such as linking financial data and performing financial transactions). In this work, we do not purport to cover all of them. In fact, our research packages our preliminary understanding of WeChat. We provide our most accurate understanding of the codebase, but acknowledge that there exist limitations in our methodology and the vast nature of the WeChat application.
In this section we detail our findings concerning WeChat’s tracking during regular application usage as well as during mini-program usage. Further technical detail, such as the structure of network data and how we captured this data, can be found in the Appendix and in our Github repository.
First-party tracking on Mini Programs
The data collection observed on Mini Programs is likely in-place to enable the application monitoring and analytics features provided by WeChat, namely, “We分析” or “WeAnalyze”. According to the “WeAnalyze Mini Programs data analytics platform integration guide”, there are two data dashboards that can be enabled by the Mini Program developer:
- Basic dashboard: No manual data upload needed. Contains visitor traffic data.
- Advanced dashboard: Need to upload data using the wx.reportEvent API in the Mini Program. The dashboard shows the uploaded data.
All Mini Programs we studied collect at least basic user traffic data, and there is no notice to users. To put this data collection into perspective, it would be an equivalent privacy violation if Google Play Store automatically injected Google Analytics tracking scripts into all applications that were available on the platform.
During regular Mini Program usage, user interactions are sent back to WeChat servers. In addition, extremely verbose logging data is sent to WeChat servers during Mini Program usage; we do not observe this type of logging data sent during any of our other experiments. Many of the requests observed, especially extraneous logging and activity tracking, do not serve a baseline functionality for the Mini Programs, instead records and tracks user behavior across the program.
Details of tracking during Mini Program usage
The Java class JsApiOperateRealtimeReport (which pings endpoint /wxartrappsvr/route), sends the following data back to WeChat servers:
"os_version": "Android 12",
AppBrandIDKeyBatchReport (/wxausrevent/wxaappidkeybatchreport) sends a large stream of logging data, including API calls made by the Mini Program and other internal API calls made by the host platform, to WeChat servers.
More examples of each type of request can be found in our Github repository.
Infrastructure and server locations
Upon startup, the application sends a regular HTTP GET request to the endpoint, http://dns.weixin.qq.com/cgi-bin/micromsg-bin/newgetdns, which is a formatted list of WeChat server domain names and associated IP addresses. This is combined with other MMTLS requests (for instance, we also observe MMTLS requests to /cgi-bin/micromsg-bin/getcdndns) to resolve certain IP addresses that the client might choose from. The full contents of this endpoint as we observed it can be found in our Github repository.
In this file, we can identify a number of other prefixes, including “ml”, “sh”, “sz”, in addition to the “hk” and “sg” prefixes we observed. “sz” may stand for Shenzhen, the location of Tencent’s headquarters. Though it was not observed in our testing, requests to “szshort.weixin.qq.com” are commonly observed in online posts, possibly from developers located in China. This domain is also mentioned in a test case in some of Tencent’s open-source networking code.
In general, we did not observe changes in the region prefix that was preferred by MMTLS requests. However, cross-referencing from existing data and observations by other researchers, we can conclude that the servers that are preferred by the WeChat application depend on a combination of IP address location, especially if the user is logged out, and the registered phone number locale (if the user is logged in).
WeChat makes use of some dangerous permissions (as defined by Google), in particular “Location” and “Files and media”, during regular operation of the application, if those permissions are granted to the application.
In addition, a network request on opening the app contains an encrypted header containing device serial numbers, specifically the IMEI1 and IMSI2. On Android 10 devices and above, where the API no longer permits obtaining the IMEI, a dummy string “1234567890ABCDEF” is sent in place of the IMEI. Operating systems that do not have this protection in place would likely send the full IMEI.
These data points demonstrate that WeChat makes opportunistic use of device access to many types of sensitive data, which means hardening the operating system and device access to certain permissions does restrict the amount and type of data that is available to WeChat. However, in practice, this landscape is complicated by the fact that WeChat also manages the permissions for its Mini Programs.
Any particular Mini Program can request an OS-level sensitive permission, such as reading storage or accessing the user location. In order to grant any Mini Program a particular permission, then, by design, WeChat must also request that permission from the operating system. There is no way to enable a particular permission for a Mini Program but not for the host WeChat application.
WeChat’s Mini Program ecosystem is often compared to an application ecosystem for a particular operating system; in this analogy, WeChat, or other super-apps, are akin to an operating system. OS-level permissioning systems have been studied at length in the past, and there are an increasing number of studies on the misuse of permission boundaries within super-apps.
Tracking during regular application usage
Here, we describe notable personal data tracking during general/regular application usage, outside the scope of what may be necessary for the operation of the WeChat application.
First, many WeChat MMTLS requests include, as a header:
- Platform data, specifically OS API version
- User’s internal WeChat UID
|Operating system details||sdk_gphone64_arm64arm64-v8a|
|WeChat internal UIN||a 10-digit number|
|WeChat internal UID (“MMGUID”)||a 15 character hex string”A9543d226cb39f0f_1683146442889″|
Table 1: Data items transmitted in WeChat MMTLS request headers.
When the user opens the WeChat application, or logs into the WeChat application, a comprehensive device report is sent, including:
|Phone number||Our U.S. testing phone number|
|IMEI||1234567890ABCDEF (for Android 10+)|
|Operating system details||sdk_gphone64_arm64arm64-v8a|
|Major OS release version||12 (Android 12)|
|Minor OS release version||8015633|
|Build display name||sdk_gphone64_arm64 12 S2B2.211203.006 8015633|
Table 2: Data items transmitted when a user opens or logs into WeChat.
WeChat sends various batch reports that send application usage and monitoring data, as well as location data, if particular permissions are enabled by the user. A class called “CliReportKV” that collects and sends detailed logging messages that are collected over a large period of time, and “NetTypeReporter” sends networking and location data at a regular interval back to WeChat servers.
Finally, we find that when running with all permissions enabled, WeChat accesses the location API on any regular startup of the application, while logged in. Enabling location permissions automatically enables the “People Nearby” feature, which broadcasts your location so you can identify and interact with WeChat users nearby.
Privacy policies and Weixin as “third party”
We identify additional privacy disclosure issues under this WeChat and Weixin dichotomy, and inconsistencies between the wording of these privacy policies to the observed behavior of the app.
From our discussions above, we make the following recommendations to the platform and platform users to improve user privacy.
Recommendations for WeChat:
- Remove forced enrollment of Mini Program analysis and tracking features, and change to an opt-in model. Currently, both developers and users are automatically enrolled into the We分析 tracking program with little notification. There is currently no way to opt-out of the program.
- Enact a more fine-grained permissioning model. Certain Mini Programs might need permissions, but users should have some guarantee that the host platform cannot abuse those permissions if granted.
- Allow users to opt-out of analytics tracking during usage of “Weixin” services. Users should be able to opt-out of tracking that is not necessary to the function of the app.
Recommendations for users:
- Use stricter permissions. In modern versions of Android, it is possible to restrict certain permissions (like location access) to when the application asks for it. Given the opportunistic use of enabled permissions by the application and the lack of a more fine-grained permissioning model between Mini Programs and the host WeChat platform, we recommend locking down application permissions.
- Update your device’s OS regularly for security features. Many new security features on modern versions of Android are working correctly to enforce permission boundaries and limit certain types of identifiers that are available to the application (such as IMEI). We recommend using an OS that has all of these features in place, and regularly updating for additional security features down the line.
We note that these improvements are incremental at best, and recommendations for users should be taken in context with a particular threat model. For users with certain high risk profiles, no amount of these adjustments will make WeChat completely safe to use. While we would generally suggest using alternative messaging applications if possible, we understand that most users use WeChat out of necessity, and that using more secure messaging applications can itself be a risk factor depending on the particular situation.
We would like to thank Jakub Dałek, Jonathan Mayer, Prateek Mittal, and Masashi Nishihata for their guidance and feedback on this project and report. Research for this project was supervised by Ron Deibert.
Appendix: Program components related to network requests
The contents of each MMTLS network message we captured included a serialized Protobuf object and a request URI that is internal to WeChat operation. From the deserialized, unencrypted data contents, associated metadata, and static analysis of code paths around these requests, we can infer the purpose and usage of this particular data.
WeChat internally provides a unified Java API for components to make network requests. Each API endpoint is represented in a Java class (which we refer to as “API class”) that is extended from the NetSceneBase class.
Each API class defines the following important properties:
- Internal URI, usually starts with /cgi-bin/.
- Request type, usually a 3 to 4 digit integer.
The most important API interfaces during request making are:
- NetSceneQueue.checkAndRun(): Starting a request of a specific API class. This function is a fixed implementation in NetSceneBase. This function is invoked from external components that need to make network requests.
- NetSceneBase.doScene() : Filling data fields. Implemented in API class.
- GYNetEndI.onGYNetEnd() : Callback when response is received. Implemented in the API class.
In the rest of this section, we illustrate the structure outlined above with a real API class: NetSceneEncryptCheckResUpdate. Note that most of the symbol names are given by us based on our understanding of its purpose.
In NetSceneEncryptCheckResUpdate, the request type is either 784 or 722 depending on the value of a configuration EcdhMgr.auth_info_prefs_use_new_ecdh.
NetSceneEncryptCheckResUpdate’s request URI is defined in its subclass EncryptCheckResUpdateRR. The URI also depends on the configuration value EcdhMgr.auth_info_prefs_use_new_ecdh.
NetSceneEncryptCheckResUpdate inherits most of its properties and methods from AbstractNetsceneCheckresUpdate, which implements doScene.
AbstractNetsceneCheckresUpdate also implements onGYNetEnd.
Finally, when is this request sent? We see NetSceneQueue.checkAndRun() called with NetSceneEncryptCheckResUpdate only in a simple wrapper within the class itself:
The wrapper checkandrun() is called from com.tencent.mm.ui.LauncherUI.onCreate(), which is called during the application startup. This fits our traffic observation that API request /cgi-bin/micromsg-bin/secencryptcheckresupdate gets sent during application startup.
More examples of request types, request data structures can be found in our Github repository.