You may never have heard of YY, 9158, Sina Show, or GuaGua, but in China these are four of the most popular social video platforms (SVPs). Collectively, these applications have over 1 billion registered users.

Social video platforms offer real-time video streaming and social networking features that enable users to broadcast content and interact with groups over video, voice, and text. One of the most popular uses is broadcasting karaoke performances. SVPs are primarily monetized through the sale of virtual goods (such as virtual roses) that users give to performers during broadcasts. While musical performances account for the majority of revenue, SVPs are expanding into gaming, education, financial analysis, and online dating applications.

Like other social media companies operating in China, SVPs face a complex array of regulations and are liable for content posted to their platforms. Companies are expected to invest in staff and technology to ensure compliance with government regulations. Failure to comply can lead to fines or revocation of operating licenses.

Today, we shine a light on how content filtering and monitoring operate on these platforms with the release of our paper “Every Rose Has Its Thorn: Censorship and Surveillance on Social Video Platforms in China”, presented at the 2015 USENIX Free and Open Communications on the Internet (FOCI) workshop.

Through reverse engineering, we find keyword censorship on all four platforms and keyword surveillance capabilities on YY. Each platform implements these controls on the client side (i.e., in the application itself rather than on a remote server), which allows us to extract the full keyword lists from the software binaries. In total, we reveal a dataset of 17,547 unique keywords used to trigger censorship. We translate and contextualize each keyword and group the keywords into content categories. This is the largest dataset of sensitive keywords currently available to researchers, and it builds on our previous research on chat applications in China, which produced a dataset of 4,256 unique censorship and surveillance keywords. Our dataset is available on our GitHub page.
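To illustrate why client-side filtering exposes the keyword lists, here is a minimal Python sketch of a client that checks a message against lists bundled with the application. It is not the platforms' actual code. It assumes simple case-insensitive substring matching, assumes a censorship hit blocks the message while a surveillance hit only flags it, and uses hypothetical names (`CENSOR_KEYWORDS`, `SURVEIL_KEYWORDS`, `filter_message`) and placeholder keywords.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder lists. Because filtering happens on the client,
# the real lists must ship inside the application binary, where they can be
# extracted by reverse engineering.
CENSOR_KEYWORDS = ["keyword_a", "keyword_b"]
SURVEIL_KEYWORDS = ["keyword_c"]


@dataclass
class Decision:
    send: bool  # False means the client blocks the message
    censored_by: list[str] = field(default_factory=list)
    reported_for: list[str] = field(default_factory=list)


def matches(text: str, keywords: list[str]) -> list[str]:
    """Return every keyword that occurs in text (case-insensitive substring match)."""
    folded = text.casefold()
    return [kw for kw in keywords if kw.casefold() in folded]


def filter_message(text: str) -> Decision:
    censored = matches(text, CENSOR_KEYWORDS)
    # Assumption for illustration: a surveillance hit does not block the
    # message; it is only flagged here (a real client might upload it).
    reported = matches(text, SURVEIL_KEYWORDS)
    return Decision(send=not censored, censored_by=censored, reported_for=reported)


if __name__ == "__main__":
    print(filter_message("hello"))
    print(filter_message("text with KEYWORD_A"))
    print(filter_message("text with keyword_c"))
```

The design point the sketch makes is that the decision never leaves the device, so whoever holds the binary holds the list.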

Key Findings

Inconsistencies in the content and implementation of keyword lists across companies and platforms

We compare our dataset to keyword data previously extracted from chat applications used in China and find very limited overlap both among the SVP keyword lists and between the SVPs and other platforms. This result supports previous findings suggesting that companies receive only general directives from authorities and have some flexibility in how they implement them.
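One simple way to quantify how much keyword lists overlap is to count shared keywords for each pair of lists and compute the Jaccard index (shared keywords divided by all distinct keywords in the pair). The Python sketch below uses that measure as an assumption; the paper's exact comparison method may differ, and the lists and platform names are hypothetical placeholders, not data from the study.

```python
from itertools import combinations

# Hypothetical toy keyword lists standing in for lists extracted from
# different applications. These are NOT real data from the study.
keyword_lists: dict[str, set[str]] = {
    "platform_1": {"k1", "k2", "k3", "k4"},
    "platform_2": {"k3", "k5", "k6"},
    "platform_3": {"k7", "k8"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(lists: dict[str, set[str]]) -> dict[tuple[str, str], tuple[int, float]]:
    """For every pair of lists, return (number of shared keywords, Jaccard index)."""
    return {
        (x, y): (len(lists[x] & lists[y]), jaccard(lists[x], lists[y]))
        for x, y in combinations(sorted(lists), 2)
    }


if __name__ == "__main__":
    for pair, (shared, j) in pairwise_overlap(keyword_lists).items():
        print(pair, shared, round(j, 3))
```

Reporting the raw shared count alongside the Jaccard index helps when lists differ greatly in size, since a small list fully contained in a large one still yields a low Jaccard score.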

Range of targeted content including criticism of the government and collective action

While there is limited direct overlap in unique keywords, we see common trends across lists in the topics targeted, including criticism of the government and collective action. These findings serve as a counterpoint to previous work from King et al., who posit that content related to collective action is heavily censored on Chinese social media while content critical of the government is often allowed to persist.

The diversity of tactics used to implement censorship undoubtedly leads to diversity in what content is ultimately restricted. We therefore offer a note of caution about applying any comprehensive theory to an ecosystem as varied and fast-changing as the Chinese Internet.