This post was updated on November 19th, 2013; the updated content appears below.
This report describes the technical details of client-side censorship functionality in the LINE messenger client for Android, and a method for disabling it. This is the first in a series of research reports analyzing information controls and privacy in mobile messaging applications used in Asia. An introduction to the project can be found here.
On May 20, 2013, Twitter user @hirakujira reported finding a list of 150 blocked words within Lianwo (连我), the Chinese version of the LINE application. This finding was prompted by a string in the program related to blocking capability. For more information on which specific keywords are being blocked, Jason Q. Ng translated the keyword lists from Chinese to English and describes the context behind them. The first keyword list extracted by @hirakujira is described in a series of blog posts (full list available here) and the most recent keyword list uncovered by Citizen Lab and translated by Ng is available here.
A more in-depth analysis of the international LINE client is motivated by similar messages such as this one in the application’s resource files (found at res/values*/strings.xml), which clearly indicates that it is not just the Lianwo version of LINE that censors messages within the application:
<string name="chathistory_chinese_user_word_error">This message contains forbidden words and cannot be sent.</string>
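A search for strings like this one can be sketched in a few lines of Python. This is a minimal illustration, assuming the resource files have already been decoded to plain-text XML; the helper name find_strings and the sample document are ours, and only the single string entry comes from the application.

```python
import xml.etree.ElementTree as ET

# Sample resource file; the one <string> entry is the message quoted above.
SAMPLE_STRINGS_XML = """<resources>
    <string name="chathistory_chinese_user_word_error">This message contains forbidden words and cannot be sent.</string>
</resources>
"""


def find_strings(xml_text, needle):
    """Return {name: text} for <string> entries whose name or text contains needle."""
    root = ET.fromstring(xml_text)
    needle = needle.lower()
    hits = {}
    for elem in root.iter("string"):
        name = elem.get("name", "")
        text = "".join(elem.itertext())
        if needle in name.lower() or needle in text.lower():
            hits[name] = text
    return hits


# Case-insensitive search for a word that hints at blocking behaviour.
matches = find_strings(SAMPLE_STRINGS_XML, "forbidden")
```

Running the same helper over every res/values*/strings.xml file would show which localizations carry the message.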
Reverse engineering the LINE application reveals that when the user’s country is set to China, it will enable censorship functionality: it will download a list of censored words from Naver’s website, and then disallow sending any messages that contain any of those words, as well as replace those words with asterisks when received. The region data is encrypted in newer versions of LINE, likely due to users changing it in order to get free in-app downloadable content. Fortunately, it is still possible to disable censorship by changing the encrypted region.
The code analysis in this report was performed on LINE v3.8.5 for Android (APK MD5: 56c9076d56cc20f618df83eaf97a52dc), and the behavior was confirmed on an Android device running v3.9.3 downloaded directly from the Google Play store, current as of November 14, 2013. The presence of censorship functionality has been confirmed as far back as v3.4.2, released on January 18, 2013, using APK files found at AndroidDrawer.
This verification makes it difficult for researchers outside of China to test and verify censorship implemented in LINE.
The application uses two URLs to retrieve a word list. The URLs are apparently not entered into the settings database if LINE is installed using a non-Chinese region, but can be found in the file app-config.properties at the root of the LINE .apk file. These URLs have been present since v3.4.2.
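Because an .apk file is a ZIP archive, the properties file can be read without installing the application. The sketch below is one way to do this in Python; the parsing is deliberately simplified (no line continuations or escapes), and the property names in the demo archive are hypothetical stand-ins, not the real keys.

```python
import io
import zipfile


def read_app_config(apk_file):
    """Parse app-config.properties from the root of an APK into a dict."""
    with zipfile.ZipFile(apk_file) as apk:
        raw = apk.read("app-config.properties").decode("utf-8", errors="replace")
    props = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and the two comment styles used in .properties files.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def url_entries(props):
    """Keep only the properties whose value looks like an HTTP(S) URL."""
    return {k: v for k, v in props.items() if v.startswith(("http://", "https://"))}


# Demo on an in-memory stand-in archive with hypothetical property names.
demo_apk = io.BytesIO()
with zipfile.ZipFile(demo_apk, "w") as z:
    z.writestr(
        "app-config.properties",
        "# hypothetical entries\n"
        "example.wordlist.url=https://example.invalid/words\n"
        "example.flag=true\n",
    )
demo_apk.seek(0)
config = read_app_config(demo_apk)
```

Passing the path of a real .apk file instead of the in-memory archive works the same way, since zipfile.ZipFile accepts either.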
The URLs and their internal descriptions are:
The file bwi has information on the version of the bad words file, containing a string that looks like this:
21,4320,v21,
One interesting thing to note is that the file is not directly accessible over HTTP as referenced in the above URL without a user agent string being set to “Android Mobile LA/xx”, where xx is a location code. It is, however, accessible via HTTPS without the UA string:
The raw bad words file is not actually hosted at the above URL; the filename above has the version number appended to it. For example, the v21 file is hosted at https://line.naver.jp/app/resources/bwraw.v21. As of October 31, 2013, the files for v21 and v20 are available. The file can be accessed over HTTPS without the user agent string described above, and over HTTP with it.
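The URL construction and the user agent requirement can be sketched as follows. This is an illustration only: it builds request objects without sending them, it assumes the version tag is the last non-empty field of the bwi string (the meaning of the other fields is not established here), and the location code "CN" is our example value.

```python
import urllib.request

# Host and path taken from the v21 URL quoted above.
RAW_LIST_LOCATION = "line.naver.jp/app/resources/bwraw"


def parse_version_tag(bwi_text):
    """Return the last non-empty comma-separated field, e.g. 'v21' (assumed layout)."""
    fields = [f.strip() for f in bwi_text.split(",") if f.strip()]
    if not fields:
        raise ValueError("empty version string")
    return fields[-1]


def raw_list_url(version_tag, scheme="https"):
    """Append the version tag to the raw word list filename."""
    return f"{scheme}://{RAW_LIST_LOCATION}.{version_tag}"


def build_request(url, location_code=None):
    """Build (but do not send) a request, optionally with the LINE-style user agent."""
    headers = {}
    if location_code is not None:
        headers["User-Agent"] = f"Android Mobile LA/{location_code}"
    return urllib.request.Request(url, headers=headers)


tag = parse_version_tag("21,4320,v21,")
# HTTPS needs no special user agent; plain HTTP does.
https_request = build_request(raw_list_url(tag))
http_request = build_request(raw_list_url(tag, scheme="http"), location_code="CN")
```

Either request could then be passed to urllib.request.urlopen, although the files may no longer be served at these addresses.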
The bad word files are Base64 encoded and encrypted using AES in cipher block chaining (CBC) mode with PKCS#7 padding. Decryption uses a static key stored in the binary:
For reference, the 256-bit AES key is:
4c aa 91 2f 80 eb 3e d4 04 a5 2c 3c 48 d2 5b d6
f7 2e 81 92 db 55 9e 49 aa 1b 99 67 e7 51 75 a7
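The steps surrounding the AES operation can be sketched with the Python standard library alone. The block below parses the key, Base64-decodes a blob, and removes PKCS#7 padding from already-decrypted data. The AES-CBC step itself is intentionally left out: it needs a third-party library, and the initialization vector is not given in the text above. The function names are ours.

```python
import base64

KEY_HEX = """
4c aa 91 2f 80 eb 3e d4 04 a5 2c 3c 48 d2 5b d6
f7 2e 81 92 db 55 9e 49 aa 1b 99 67 e7 51 75 a7
"""
# Strip all whitespace before converting the hex digits to bytes.
AES_KEY = bytes.fromhex("".join(KEY_HEX.split()))

BLOCK_SIZE = 16  # AES block size in bytes


def decode_blob(b64_text):
    """Base64-decode a word list blob and check it is a whole number of AES blocks."""
    data = base64.b64decode(b64_text)
    if not data or len(data) % BLOCK_SIZE:
        raise ValueError("ciphertext is not a multiple of the AES block size")
    return data


def pkcs7_unpad(padded):
    """Remove PKCS#7 padding from decrypted data, validating every padding byte."""
    if not padded or len(padded) % BLOCK_SIZE:
        raise ValueError("padded data must be a non-empty multiple of the block size")
    n = padded[-1]
    if n < 1 or n > BLOCK_SIZE or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS#7 padding")
    return padded[:-n]
```

A full decryption would call decode_blob, run AES-256-CBC with AES_KEY over the result, and finish with pkcs7_unpad.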
Once downloaded, the bad words file is stored in the application’s cache directory as cbw.dat. If this list is unavailable, LINE will default to using a smaller internal list:
The application also sets preference information on the downloaded word lists. The delay time of 10800000 milliseconds is equivalent to 3 hours:
The internal list of censored words is also Base64 encoded and encrypted using the same key, and has remained unchanged since its introduction in v3.4.2:
For a decrypted version of this list, see [here].
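The cache-then-fallback behaviour and the three-hour delay described above can be sketched as follows. This is our reading rather than the application's code: the function names are hypothetical, and treating the delay as a minimum interval between update checks is an assumption.

```python
import tempfile
from pathlib import Path

REFRESH_DELAY_MS = 10_800_000
# 10,800,000 ms / 1000 = 10,800 s; / 60 = 180 min; / 60 = 3 h
REFRESH_DELAY_HOURS = REFRESH_DELAY_MS / 1000 / 60 / 60


def load_word_list_blob(cache_dir, internal_blob):
    """Prefer the downloaded cbw.dat; fall back to the internal list if it is unavailable."""
    cached = Path(cache_dir) / "cbw.dat"
    try:
        data = cached.read_bytes()
    except OSError:
        return internal_blob, "internal"
    if not data:
        return internal_blob, "internal"
    return data, "downloaded"


def refresh_due(last_check_ms, now_ms, delay_ms=REFRESH_DELAY_MS):
    """Assumed semantics: a new download is attempted once the delay has elapsed."""
    return now_ms - last_check_ms >= delay_ms


# Demo in a throwaway directory standing in for the application's cache directory.
with tempfile.TemporaryDirectory() as cache_dir:
    before = load_word_list_blob(cache_dir, b"INTERNAL")
    (Path(cache_dir) / "cbw.dat").write_bytes(b"DOWNLOADED")
    after = load_word_list_blob(cache_dir, b"INTERNAL")
```

Both blobs are still encrypted at this point; the loader only decides which one is handed to the decryption step.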
The bad words check is only performed if the region stored in LINE’s settings is “CN”:
Users in other regions are not subject to censorship, and changing the region code from CN to any other region should prevent the censorship functions from being invoked.
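The region gate on outbound messages amounts to the logic sketched below. The function names and the placeholder word are ours, and matching by exact, case-sensitive substring is an assumption based on the strict matching described in this report.

```python
def censorship_enabled(region):
    """The check only runs when the stored region is exactly "CN"."""
    return region == "CN"


def can_send(message, region, bad_words):
    """Return False if the message would be blocked for this region."""
    if not censorship_enabled(region):
        return True
    # Assumed matching rule: a word blocks the message if it appears verbatim.
    return not any(word in message for word in bad_words)


placeholder_list = ["badword"]  # stand-in entry, not taken from the real list
blocked_in_cn = can_send("this has badword inside", "CN", placeholder_list)
allowed_elsewhere = can_send("this has badword inside", "US", placeholder_list)
```

With these definitions the same message is rejected for a CN installation and accepted for a US one, which is why changing the stored region is sufficient to disable the check.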
The LINE installation’s region is stored in a SQLite database on the Android device, in the file /data/data/jp.naver.line.android/databases/naver_line.
To access the database, the Android device must be rooted. The database can be opened using either a SQLite GUI program such as SQLite Editor, or by using the command line program sqlite3 from a shell on the Android device. This example uses the command line program, which is preinstalled on many Android devices.
Listing the tables shows the type of information stored in the database:
The database setting contains the region as well as other user account details. One interesting thing to note is that some of these values are Base64 encoded [2]:
This change is a recent development: as of mid-2012, users were changing their region in order to download free stickers and in-app enhancements only available in other areas. Two examples of this can be found here and here. Compare the Base64 values above to a screenshot of the database from a blog post on July 25, 2012:
Source: Frank’s Blog
When unencoded, these settings do not show up as the expected plaintext; the reason they are Base64 encoded is not just for obfuscation, but also to convert encrypted binary data into printable characters. It is likely that the region is stored encrypted to prevent people from changing their region to have access to more downloadable content.
For example, decoding PROFILE_REGION gives 16 bytes of binary data:
00000000 26 c8 9b 97 37 68 d1 8c 83 42 43 f5 84 f2 83 0c |&...7h...BC.....|
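Reading and decoding the value can be reproduced on a copy of the database with Python's sqlite3 and base64 modules. In the sketch below, the column names key and value are assumptions, and an in-memory table stands in for the real file; the 16 bytes are the ones shown in the dump above.

```python
import base64
import sqlite3

# The 16 bytes from the hex dump above.
REGION_BLOB = bytes.fromhex("26c89b973768d18c834243f584f2830c")


def read_setting(conn, key):
    """Fetch one value from the setting table (column names are assumed)."""
    row = conn.execute("SELECT value FROM setting WHERE key = ?", (key,)).fetchone()
    return None if row is None else row[0]


def hexdump_line(data, offset=0):
    """Format bytes as offset, hex pairs, and printable ASCII."""
    hex_part = " ".join(f"{b:02x}" for b in data)
    ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in data)
    return f"{offset:08x} {hex_part} |{ascii_part}|"


# In-memory stand-in for the naver_line database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE setting (key TEXT PRIMARY KEY, value TEXT)")
conn.execute(
    "INSERT INTO setting VALUES (?, ?)",
    ("PROFILE_REGION", base64.b64encode(REGION_BLOB).decode("ascii")),
)

encoded = read_setting(conn, "PROFILE_REGION")
decoded = base64.b64decode(encoded)
dump = hexdump_line(decoded)
conn.close()
```

The decoded value is exactly 16 bytes long, the size of a single AES block, which is consistent with a short region code having been encrypted.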
By looking through the decompiled binary, we can see that this data is encrypted. Like the bad words file, it uses AES, but with slightly different options.
Unlike the static key used for the bad words file, the settings are encrypted using a key derived from the android_id value of the phone. More information on this value can be found here, and the value can be determined using a program such as Android Id Info.
This function generates the AES decryption key by taking the android_id identifier, hashing it using a standard Java function, then applying another transformation algorithm to it to derive the key. The magicNumber value is hardcoded into the program and is equal to 15485863.
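Only the first step of this derivation is sketched here, and with a caveat: we assume the "standard Java function" is String.hashCode, which the text above does not name. The later transformation involving magicNumber is not reproduced because its details are not given. The android_id value is a made-up example.

```python
def java_string_hashcode(s):
    """Reimplement Java's String.hashCode for strings of BMP characters.

    Java computes s[0]*31^(n-1) + ... + s[n-1] in 32-bit signed arithmetic.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # emulate 32-bit overflow
    # Reinterpret the unsigned 32-bit result as a signed Java int.
    return h - 0x100000000 if h >= 0x80000000 else h


example_android_id = "0123456789abcdef"  # hypothetical 64-bit ID in hex form
hashed_id = java_string_hashcode(example_android_id)
```

Because the hash is a 32-bit value, the key can take at most 2^32 distinct values under this assumption, regardless of how the later transformation expands it.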
AES decryption is performed in electronic codebook (ECB) mode with no padding:
Using our example from above, we can decrypt the string to determine the region:
To turn off censorship, we reverse the process in the above example: encrypt a new region code and write it back to the database. On a China-region installation, changing the region code from CN to another region (e.g. CA or US) will disable censorship functions. To assist users, we are releasing the LINE Region Code Encrypter Tool, developed by Seth Hardy and Greg Wiseman (Senior Data Visualization Developer, Citizen Lab), for changing regions in the LINE client to disable regionally-based keyword censorship in the application.
Update – November 19th, 2013
An article by Kaylene Hong of TheNextWeb provides a response from LINE Corporation regarding the presence of regionally-based keyword censorship in their chat program: “A LINE spokesperson drew a clear line between Lianwo and Line, emphasizing that the Chinese version of LINE is different from the global version.” However, all of the censorship functionality described in this report is implemented in the global version of LINE, and it is unclear whether the Lianwo version of LINE has additional functionality or is merely rebranded.
Incoming messages in LINE are also subject to censorship when the region is set to China. The LINE client will replace any words found in bad word files (internal or downloaded) with asterisk characters. As with outbound messages, inbound censorship requires a strict match, and can be bypassed by adding spaces or other punctuation.
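The inbound behaviour, including the bypass, can be sketched as below. The function name and the placeholder word are ours, and replacing each character of a matched word with one asterisk is an assumption, since the number of asterisks used is not stated above.

```python
def mask_inbound(message, region, bad_words):
    """Replace exact occurrences of listed words with asterisks for CN installations."""
    if region != "CN":
        return message
    # Longest words first, so a short entry cannot break up a longer match.
    for word in sorted(bad_words, key=len, reverse=True):
        if word:
            message = message.replace(word, "*" * len(word))
    return message


placeholder_list = ["badword"]  # stand-in entry, not taken from the real list
masked = mask_inbound("say badword now", "CN", placeholder_list)
bypassed = mask_inbound("say bad word now", "CN", placeholder_list)
```

In this sketch, masked has the seven-letter placeholder replaced by seven asterisks, while bypassed is returned unchanged because the inserted space defeats the exact match.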
1. Names such as this are not the original names assigned to the functions, which were removed from the program when it was compiled. These informative names are assigned by an analyst while exploring the program.
2. Values we are not looking at in this analysis have been changed to protect user data.