Censored Contagion II: A Timeline of Information Control on Chinese Social Media During COVID-19

Summary

August 2020 marks over half a year since the COVID-19 pandemic spread across the world. From the beginning of the pandemic, we have been tracking censorship related to COVID-19 on Chinese social media platforms.

In March, we released a report documenting censorship on WeChat (the preeminent chat application in China) and YY (a popular Chinese live streaming platform) between December 2019 and February 2020. On YY, we found that censorship of COVID-19 content on the platform started on December 31, 2019 at the very start of the outbreak. On WeChat, we documented a broad scope of censored COVID-19 content ranging from criticism of the government to general health information.

Since the publication of that report, we have conducted daily tests on WeChat and collected 2,174 censored keywords related to COVID-19 between January 18 and May 14, 2020 (see Appendix for details on our methods). This data provides a view into how narratives and messaging on the pandemic are controlled and molded on social media in China. In this report, we present a timeline that groups a selection of censored keywords into themes representing key milestones over the last six months of the pandemic, revealing a censored history of COVID-19. To supplement this timeline, we include digital illustrations, such as comics and cartoons, related to COVID-19 and China’s handling of the outbreak that we found censored on WeChat Moments. These illustrations were originally collected from Twitter between March 30 and May 22, 2020.

Control of information on Chinese social media is mediated through a system of intermediary liability in which companies are held responsible for the content on their platforms. While specific government directives may be issued to companies during sensitive periods, a general state of control is maintained by pushing responsibility for censorship down to the companies. Failure to comply with content control regulations can result in fines, revocation of business licenses, and other penalties.

The level of direct government guidance for each censorship decision made by Chinese social media companies during the pandemic is unclear. However, it is apparent that from the beginning of the outbreak there has been official pressure to restrict information. As COVID-19 spread, we observed censorship of a wide range of speech, from criticism of the Chinese government, rumours, and conspiracy theories to health information.

The timeline is organized into three thematic periods:

COVID-19 Outbreak in China

The first period from December 31, 2019 to March 2020 covers the emergence and spread of COVID-19 in China. Censored keywords in this period focus on early warning of the virus, interactions between China and the World Health Organization (WHO), general health information, and criticism of China’s response to COVID-19.

COVID-19 Goes Global

On March 11, 2020, the WHO declared COVID-19 a pandemic. As the virus became global, the focus of censored content went beyond issues in China to cover international responses to COVID-19 and international criticism of the Chinese government.

US Becomes Epicentre of the Pandemic

In late March, the U.S. became the global epicentre of the pandemic. As the virus devastated healthcare systems and communities across the country, questions and rumours surrounding the origin of COVID-19 were propagated by leaders in the U.S. and China, straining relations between the two countries. Censored content in this period includes conspiracy theories, U.S. criticism of China’s political system, critical and neutral references to China-US relations, and U.S. domestic politics.

 

COVID-19 Outbreak in China

COVID-19 first emerged in Wuhan, China in late December 2019. The official narrative from the Chinese government regarding this early stage is that Beijing only learned of the issue fully in January. It did not make a public announcement capturing the seriousness of the infection until January 20, 2020 when President Xi addressed the nation. Beijing then took decisive action by locking down the city of Wuhan followed by Hubei province as a whole. Throughout this period, the Chinese government tightly controlled information and delayed the release of data to the WHO and the international community.

COVID-19 information on Chinese social media was also closely guarded. On December 31, 2019, a day after Dr. Li Wenliang and other medical professionals warned colleagues of the COVID-19 outbreak, popular live streaming platform YY added 45 keywords to its censorship list, all of which made references to the then-unknown virus that displayed symptoms similar to SARS. The timing of this censorship shows there was official pressure to censor content on the virus at very early stages of the outbreak. On WeChat, we found a broad range of censored content including criticism of China’s response to the virus and basic health information, such as the fact that the virus spreads through human contact.

December 31, 2019

The Huanan Seafood Wholesale Market in Wuhan, Hubei Province. Photographer: Noel Celis/AFP via Getty Images [Source]

Livestreaming Platform YY Begins to Censor Keywords Related to an Unknown SARS-like Virus

A leaked document revealed that the Wuhan Municipal Health Commission (WMHC) issued a “gag order” on December 30, 2019 that prohibited medical workers from releasing any information regarding the outbreak to the public. On December 31, 2019, the WMHC released a public notice for the first time addressing recent pneumonia cases but rejected speculation of human-to-human transmission or infection among medical workers. On the same day, YY began censoring keywords referencing the then-unknown virus that displayed symptoms similar to SARS. These findings suggest that there was official pressure on social media companies to censor COVID-19-related content in late December.
Keyword | Language | Translation | Date Found Censored
武汉不明肺炎 | Simplified Chinese | Unknown Wuhan pneumonia | December 31, 2019
爆發sars疫情 | Traditional Chinese | SARS outbreak in Wuhan | December 31, 2019
武汉海鲜市场 | Simplified Chinese | Wuhan Seafood Market | December 31, 2019
武汉卫生委员会 | Simplified Chinese | Wuhan Health Committee | December 31, 2019

Table 1: Sample censored keywords.

Censored image on Chinese social media
Censored image

December 30, 2019 – February 2020

Memorial for Dr. Ai Fen
Lam Yik Fei for The New York Times [Source]

Early COVID-19 Warnings

On December 30, 2019, Dr. Ai Fen, a director at Wuhan Central Hospital, posted information on WeChat about the new virus. She was later reprimanded for her efforts and told not to spread information about COVID-19. That same day, Wuhan-based ophthalmologist Dr. Li Wenliang privately warned fellow doctors of a potential outbreak of COVID-19, and was subsequently reprimanded by Wuhan authorities for “spreading rumours.” After returning to his practice, Dr. Li was infected with COVID-19 and died around February 7, 2020. As a narrative depicting Dr. Li as a martyr grew amongst the public, references to him were blocked on WeChat.
Keyword | Language | Translation | Date Found Censored
冠状病毒+人传人+李文亮 | Simplified Chinese | Coronavirus + human-to-human + Li Wenliang | February 9, 2020
大陸+肺炎+中共+李文亮 | Traditional Chinese | China + Pneumonia + CPC + Li Wenliang | February 11, 2020
诺贝尔和平奖+提名+李文亮 | Simplified Chinese | Nobel Peace Prize + Nomination + Li Wenliang | February 11, 2020
新华社+烈士+中共+李文亮 | Simplified Chinese | Xinhua News Agency + Martyrs + CCP + Li Wenliang | April 5, 2020

Table 2: Sample censored keywords.

Censored image on Chinese social media
Censored image

January 1-20, 2020

Chinese Foreign Ministry Spokesperson Hua Chunying conducting a daily briefing
Jason Lee/Reuters [Source]

Authorities in China Send Mixed Signals on Information Disclosure

Chinese Foreign Ministry Spokesperson Hua Chunying’s daily briefing on February 3, 2020 revealed that China had “notified the U.S. of the epidemic and our control measures altogether 30 times since January 3.” But it was not until January 20 that China’s National Health Commission (NHC) informed the Chinese public about the epidemic situation for the first time. Zhong Nanshan, leader of the NHC’s expert group, confirmed “human-to-human” transmission in the media. Between January 1 and 19, Wuhan officials emphasized that information about the outbreak could not be leaked, especially “not to the media,” and insisted that the virus “could be contained.”

Before Chinese leadership confirmed person-to-person transmission of the virus, numerous groups criticized the regime’s downplaying of the virus and denial of a dangerous outbreak.

Keyword | Language | Translation | Date Found Censored
1月3日起+30次向美方通报+疫情信息 | Simplified Chinese | Since January 3 + notified US of + epidemic | February 5, 2020
人传人+排查+研究+病毒 | Simplified Chinese | Person-to-person + investigation + research + virus | February 28, 2020
美国疾控中心+冠状病毒 | Simplified Chinese | US Center for Disease Control + Coronavirus | February 11, 2020

Table 3: Sample censored keywords.

Censored image on Chinese social media
Censored image

January 28, 2020

WHO Director-General Dr. Tedros Adhanom met with Chinese leadership including Xi Jinping
Xinhua News Agency [Source]

WHO Meets with Chinese Leadership

On January 28, 2020, WHO Director-General Dr. Tedros Adhanom led a delegation to Beijing and met with Chinese leadership including Xi Jinping. At the meeting, Xi Jinping said “I’ve been personally deploying, personally instructing the prevention and control of the epidemic.” Part of Xi’s quote (in italics) was modified by Xinhua News Agency.

Keyword | Language | Translation | Date Found Censored
某人+亲自 | Simplified Chinese | Someone + Himself [“someone” is a code referencing Xi Jinping] | February 3, 2020
亲自+皇上 | Simplified Chinese | Himself + Emperor [“Emperor” is a code referencing Xi Jinping] | February 3, 2020
世卫+亲自指挥 | Simplified Chinese | WHO + Personally instruct | February 4, 2020

Table 4: Sample censored keywords.

Censored image on Chinese social media
Censored image

February – March 2020

Ren Zhiqiang, a Chinese real estate tycoon, attends a conference in Beijing last November. Ren, 54, is locked in a battle with the government over the question of free speech.
ChinaFotoPress via Getty Images [Source]

Domestic Criticism of China’s COVID-19 Response

References to criticism of China’s COVID-19 response from prominent domestic critics are broadly censored on WeChat. For example, references to Chinese real estate tycoon Ren Zhiqiang’s widely circulated article “The lives of the people are ruined by the virus and a seriously sick system” have been consistently censored on the platform. A longstanding critic of the CCP, Ren is known as “Big Cannon Ren” and was reported missing on March 12, shortly after he published his scathing essay.

Keyword | Language | Translation | Date Found Censored
掩盖事实+任志强 | Simplified Chinese | Cover up the facts + Ren Zhiqiang | March 17, 2020
任总+失联 | Simplified Chinese | Mr. Ren + Missing | March 29, 2020
任总+公开信 | Simplified Chinese | President Ren + Open Letter | April 12, 2020
紅二代+任大炮 | Traditional Chinese | Red Second Generation + Ren Cannon | April 13, 2020

Table 5: Sample censored keywords.

Censored image on Chinese social media
Censored image

COVID-19 Goes Global

As COVID-19 spread beyond China’s borders, criticism of China’s early response to the virus increased. With Europe replacing China as COVID-19’s epicentre, sharp critiques of Chinese leadership were heard from leaders in France and Germany. Chinese diplomats responded in turn with their own criticisms of COVID-19 responses from Western governments.

Around this period we found censored keywords on WeChat referencing criticisms of China from Europe, both from global leaders and outspoken members of the media.

Mentions of the Red Cross and the WHO in conjunction with references to Chinese leadership were also censored. This blocking may be due to China’s aim to counter rumours that Chinese officials pressured the WHO to downplay the severity of the initial COVID-19 outbreak. Censorship of such criticisms reflects the Chinese state’s continued efforts to portray its COVID-19 response as timely and adequate, as tensions began to rise internationally and critiques of its response were increasingly circulated on Chinese social media. The downplaying of international criticism on WeChat, together with the heavy publicizing of Chinese mask diplomacy, points to an effort to guide the narrative on China’s domestic and international epidemic response.

March 11, 2020

Image of WHO Director-General Dr. Tedros Adhanom
Fabrice Coffrini/AFP/Getty Images [Source]

COVID-19 Becomes Global Pandemic

Concerned by the alarming levels of the virus’ spread, the WHO announced that the COVID-19 outbreak was a global health emergency on January 30. On March 11, the WHO declared COVID-19 a global pandemic. WeChat began blocking references to the WHO, the Red Cross, and to outbreaks occurring around the world.

Keyword | Language | Translation | Date Found Censored
疫情+红会+4+政府+湖北 | Simplified Chinese | Epidemic situation + Red society + 4 + Government + Hubei | February 4, 2020
湖北+红十字+政府+不知 | Simplified Chinese | Hubei + Red Cross + Government + I don’t know | February 25, 2020
土耳其+中东+沙特+病毒 | Simplified Chinese | Turkey + Middle East + Saudi Arabia + Virus | March 13, 2020
疫情+英國+政府+官員 | Traditional Chinese | Outbreak + UK + government + officials | April 6, 2020
世卫+土共 | Simplified Chinese | WHO + Turkish Communist Party | April 11, 2020

Table 6: Sample censored keywords.

Censored image on Chinese social media
Censored image

January – April 2020

Picture of South Korean President Moon Jae-in
South Korea Presidential Blue House/Yonhap via AP [Source]

COVID-19 Response by Other Countries

On January 20, 2020, the first confirmed cases of COVID-19 were reported in Japan, South Korea, and Thailand. The following day, the U.S. reported its first confirmed case in Washington state. In later weeks, as the virus continued its rapid spread across Europe, EU country representatives clashed with Chinese leaders and diplomatic authorities over each other’s COVID-19 responses. In one case, the Chinese Embassy in Paris announced in a blog post that “Residents of retirement homes were made to sign certificates of ‘waiver of emergency care’; the nursing staff of the Ehpad [state-funded care homes] abandoned their posts overnight, deserted collectively, leaving their residents to die of hunger and disease.” Following considerable backlash, the Chinese Ambassador to France publicly supported the statement, but our censorship tests show references to this incident were blocked on WeChat.

Keyword | Language | English Translation | Date Found Censored
文在寅+中央+總統+病毒 | Traditional Chinese | Moon Jae-in + Central + President + Virus | February 26, 2020
普京+中俄+俄罗斯+病毒 | Simplified Chinese | Putin + China Russia + Russia + Virus | March 1, 2020
駐法大使+疫情+外交官+外長+不符合 | Traditional Chinese | Ambassador to France + epidemic situation + diplomat + foreign minister + non-conformance | April 15, 2020
駐法大使+擅離職守+生命+西方國家 | Traditional Chinese | Ambassador to France + Abandoning one’s post + Life + Western Countries | April 17, 2020

Table 7: Sample censored keywords.

Censored image on Chinese social media
Censored image

February – April 2020

Image of a worker in a warehouse
Reuters [Source]

China’s Mask Diplomacy and Blowback

As Italy, Spain, and other European countries overtook China in total number of COVID-19 deaths, China began engaging in “mask diplomacy” by offering medical supplies to the region. In the meantime, state media and government officials actively countered international criticism of China while promoting China’s efforts in fighting the pandemic. The German newspaper Bild issued an open letter to Xi Jinping: “You shut down every newspaper and website that is critical of your rule, but not the stalls where bat soup is sold. You are not only monitoring your people, you are endangering them – and with them, the rest of the world.”

Keyword | Language | English Translation | Date Found Censored
新冠肺炎+影响+一带一路 | Simplified Chinese | New Crown Pneumonia + Impact + One Belt One Road | February 2, 2020
隐瞒疫情 + 周刊 + 德 | Simplified Chinese | Concealing the epidemic + Weekly [magazine] + Germany | February 13, 2020
撒钱 + 意大利 | Simplified Chinese | Spend Money + Italy | March 18, 2020
口罩外交+促成教宗+毫無所悉+中梵開展 | Traditional Chinese | Mask Diplomacy + Contribute to the Pope + Don’t know anything + China-Vatican | April 21, 2020
德国+200+公开信 | Simplified Chinese | Germany + 200 + open letter | April 22, 2020

Table 8: Sample censored keywords.

Censored image on Chinese social media
Censored image

US Becomes Epicentre of the Pandemic

While WeChat censored a range of references to China’s relations with various countries during the pandemic, the majority of international relations-related keywords we found blocked on WeChat centered on tensions between the U.S. and China.

Censorship on WeChat reflects the ongoing clash of Beijing’s and Washington’s narratives surrounding COVID-19 on the international stage. On one hand, China attempts to downplay its early cover-up of the crisis at home and portray itself as a transparent and responsible world leader in the fight against the pandemic. On the other hand, the U.S. tries to scapegoat China for the wide spread of COVID-19 cases and for its own failure to handle the pandemic in a timely and proper manner.

In March 2020, Chinese government official Zhao Lijian promoted a conspiracy theory on Twitter that patient zero of COVID-19 was in the U.S. In April 2020, as the U.S. became the pandemic’s epicentre, American Republicans released a strategic report to the party’s political candidates on how to harmonize their messaging against China. The report encourages Republican leaders to assert that they will “push for sanctions on China for its role in spreading this pandemic.”

Whereas censorship was largely concerned with domestic politics in the early days of the pandemic, we found that the volume of censored keywords on WeChat referencing China’s relations with the U.S. increased as the U.S. became the pandemic’s epicentre. Censored content included references to criticism of China’s political system by American leaders, conspiratorial discussions of the origin of the coronavirus by both Chinese and U.S. politicians, interactions between President Xi and President Trump, and the politicization of the pandemic in the U.S.

March 27, 2020

Image of President Donald Trump
Reuters. Photo by Carlos Barria [Source]

US Reports Most Cases of COVID-19 in the World

On March 27, 2020, the U.S. overtook China as the country with the most COVID-19 cases in the world. According to news reports, U.S. intelligence officials had warned the Trump administration as far back as late November that a virus was sweeping through Wuhan, China. These reports add to criticism of how the U.S. administration responded to the pandemic.

Keyword | Language | Translation | Date Found Censored
解除封锁+美国总统+病毒+特朗普 | Simplified Chinese | Lifts lockdown + U.S. President + Virus + Trump | April 23, 2020
疫情+皇帝+川普 | Simplified Chinese | Pandemic + Emperor + Trump | April 23, 2020
川普老中医 | Simplified Chinese | Experienced Traditional Chinese Medicine doctor Trump | April 25, 2020
民主党+川普+病毒+广告 | Simplified Chinese | Democrats + Trump + Virus + Advertise | May 7, 2020
川普+测试+感染+政府+病毒 | Simplified Chinese | Trump + Test + Infection + Government + Virus | May 13, 2020
川普+疫情+病毒+美元 | Simplified Chinese | Trump + Pandemic + Virus + U.S. dollar | May 19, 2020

Table 9: Sample censored keywords.

Censored image on Chinese social media
Censored image

February – March 2020

Screenshot of a WSJ article that reads "China is the real sick man of Asia".
Kevin Frayer/Getty Images [Source]

U.S.-China Tensions Lead to Expulsion of Journalists

On February 19, 2020, China expelled three Wall Street Journal correspondents following an op-ed that called China the “real sick man of Asia.” Washington then slashed the number of journalists permitted to work in the U.S. at five major Chinese state-owned media outlets and imposed visa restrictions on journalists with Chinese citizenship. China followed up with the expulsion of more American journalists.

Keyword | Language | Translation | Date Found Censored
Sick+Man+Real+China+Asia | English | n/a | February 24, 2020
官員+東亞病夫+華爾街日報+中方 | Traditional Chinese | Officials + Asia’s Sick Man + Wall Street Journal + Chinese authorities | February 26, 2020

Table 10: Sample censored keywords.

March – April 2020

US and China Trade Blame on Origins of the Virus

On March 12, 2020, Chinese diplomat Zhao Lijian promoted a conspiracy theory on Twitter that COVID-19 originated in the U.S. On April 6, China published its own version of the timeline on “COVID-19 information sharing and international cooperation,” stressing China’s timely notification of the pandemic to countries including the U.S. On April 17, Republicans in the U.S. released a strategic report to Republican political candidates outlining how to “dump China” using COVID-19 as political fuel.

Keyword | Language | Translation | Date Found Censored
班农+生化实验室 | Simplified Chinese | Bannon + Bio Lab | March 1, 2020
生化武器+政府+病毒+官員 | Traditional Chinese | Bioweapon + government + virus + officials | March 1, 2020
甩锅+赵立坚+病毒来源 | Simplified Chinese | Shifts blame + Zhao Lijian + Virus origin | April 18, 2020
甩锅+国家主席+指挥 | Simplified Chinese | Shifts blame + President + Direct | April 19, 2020
塔尼耶+P4+乃人工合成 | Simplified Chinese | (Luc) Montagnier + P4 + is artificially synthesized | April 23, 2020

Table 11: Sample censored keywords.

Censored image on Chinese social media
Censored image

April – May 2020

Picture of a street sign that reads "Li Wenliang Plaza"
HKFP [Source]

Politicization of COVID-19 Continues

On May 27, 2020, U.S. Secretary of State Mike Pompeo tweeted in support of pro-democracy Hong Kong protesters amid the COVID-19 pandemic. U.S. senators introduced a bill to rename the street outside the Chinese embassy in Washington, D.C. “Li Wenliang Plaza.” President Trump said that China “will do anything they can” to make him lose his re-election bid in November. Meanwhile, China has launched an extensive propaganda campaign to paint its coronavirus response as transparent and effective, and, to shift the blame back onto the U.S., Chinese Foreign Minister Wang Yi remarked that Washington had been infected by a “political virus” to continually attack China.

Keyword | Language | Translation | Date Found Censored
反中亂港分子+蓬佩奧+醜惡行徑 | Traditional Chinese | Anti-China disruptive elements in Hong Kong + Pompeo + Ugly behaviours | April 19, 2020
法案+主席+病毒+對香港 | Traditional Chinese | Bill + President + Virus + To Hong Kong | April 20, 2020
中共+法案+主席+參議院 | Traditional Chinese | CCP + Bill + President + Senate | April 20, 2020
民主党+川普+大选+共和党 | Simplified Chinese | Democratic Party + Trump + Election + Republican Party | May 2, 2020
駐美大使館+改為+李文亮廣場 | Traditional Chinese | [China’s] embassy in the U.S. + Change to + Li Wenliang Plaza | May 9, 2020
李文亮+发文+逝者安息+美国议员+把中国 | Simplified Chinese | Li Wenliang + Post article + Rest in peace + U.S. senator + Treats China | May 31, 2020
选民+大选+病毒+特朗普 | Simplified Chinese | Voter + Election + Virus + Trump | May 31, 2020

Table 12: Sample censored keywords.

Censored image on Chinese social media
Censored image

Conclusion

The censored content we collected from Chinese social media invites unique reflection on the COVID-19 pandemic and demonstrates the harms of information control.

Throughout the pandemic, information on COVID-19 has been tightly controlled on Chinese social media, even after China managed to keep the spread of the virus under control at later stages. The themes of censored content show areas of sensitivity for the Chinese government, ranging from how the virus is contained in China to international diplomacy and ongoing tensions between the U.S. and China. Outside of this political content, we also found censorship of health-related information, including the number of confirmed COVID-19 cases and deaths, as well as references to personal protective equipment supplies and medical facilities. Censorship of this type of health-related information can hinder disease prevention and awareness. For example, if messages on future waves of COVID-19 infections in China are censored on WeChat, the public could be put at risk.

Censorship of COVID-19 on Chinese social media illustrates the ongoing politicization of the pandemic and the importance of fact-based, open, and effective communications pertaining to public health awareness and response.

Acknowledgements

Thanks to Miles Kenyon, Adam Senft, and Chris Parsons for review and comments, and to Pellaeon Lin and an anonymous researcher for assistance. Professor Ron Deibert provided supervision and guidance to the project.

Appendix: Methods

Documenting Censorship on YY

YY censors keywords client-side, meaning that the rules to perform censorship are found inside of the application. YY has a built-in list of keywords that it checks against each chat message before the message is sent. If a message contains a keyword from the list, then the message is not sent. The application downloads an updated keyword list each time it is run, which means the list changes over time.
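The client-side check described above can be sketched in a few lines of Python. This is only an illustration of the general technique: the function names and the in-memory keyword set are assumptions made for this example and are not taken from the YY application. The two sample keywords come from Table 1.

```python
def is_blocked(message, keywords):
    """Return True if the message contains any keyword from the list."""
    return any(keyword in message for keyword in keywords)


def send_message(message, keywords):
    """Return the message if it may be sent, or None if it is blocked.

    Hypothetical stand-in for the client's send routine: the check runs
    locally, before anything leaves the application.
    """
    return None if is_blocked(message, keywords) else message


# Two sample keywords from Table 1; a real list would be far longer.
sample_keywords = {"武汉不明肺炎", "武汉海鲜市场"}

print(send_message("武汉海鲜市场今天关闭", sample_keywords))  # blocked: prints None
print(send_message("今天天气很好", sample_keywords))  # not blocked: prints the message
```

Because the whole list ships with the client, anyone who can read the list can evaluate this check offline, which is what makes exhaustive documentation possible.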

Client-side censorship makes it possible for us to reverse engineer the application and then download and decode the exhaustive keyword lists YY uses to trigger censorship. Using this method, we have been tracking all updates to YY’s client-side keyword blacklist since February 2015 on an hourly basis.
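Tracking updates to a keyword list amounts to comparing successive snapshots of it. The sketch below shows one simple way to do that with set differences. It assumes each snapshot has already been downloaded and decoded into a set of strings; the snapshot contents are hypothetical apart from the keywords borrowed from Table 1, and the function name is illustrative rather than part of our tooling.

```python
def diff_keyword_lists(previous, current):
    """Return (added, removed) keywords between two snapshots of a list."""
    added = current - previous
    removed = previous - current
    return added, removed


# Hypothetical snapshots of a keyword list taken one hour apart.
snapshot_earlier = {"示例关键词"}
snapshot_later = {"示例关键词", "武汉不明肺炎", "武汉海鲜市场"}

added, removed = diff_keyword_lists(snapshot_earlier, snapshot_later)
print(sorted(added))    # keywords that appear only in the later snapshot
print(sorted(removed))  # keywords that were dropped (none in this example)
```

Recording the time of each snapshot alongside the added set is what allows a keyword to be dated, as in the "Date Found Censored" column of Table 1.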

Documenting Censorship on WeChat

WeChat only censors content server-side, meaning that all the rules to perform censorship are on a remote server. When a message is sent from one WeChat user to another, it passes through a server managed by Tencent (WeChat’s parent company) that detects whether the message includes blacklisted keywords before it is sent to the recipient. Documenting censorship on a system with a server-side implementation requires devising a sample of keywords to test, running those keywords through the app, and recording the results. In previous work, we developed an automated system for testing content on WeChat to determine if it is censored.

WeChat censors a message based on whether it contains a blacklisted keyword combination. A keyword combination consists of one or more keyword components. When a keyword combination consists of only one component (e.g., “习近平到武汉” [Xi Jinping goes to Wuhan]), a message is filtered if it contains that component. For a keyword combination that contains more than one component (e.g., “习近平” [Xi Jinping] and “疫情蔓延” [Epidemic Spread]), a message is censored only if every component in the combination appears somewhere in the message, although the components need not be adjacent to each other. Multi-component combinations allow censorship rules to target content more precisely.
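The matching rule described above can be expressed compactly. The following Python sketch is an illustration of that rule only, not WeChat’s server-side implementation, which we cannot observe directly. It assumes combinations are written with “+” between components, following the notation used in the tables of this report; the function names are hypothetical.

```python
def parse_combination(combination):
    """Split a 'a+b+c' keyword combination into its components."""
    return [part.strip() for part in combination.split("+") if part.strip()]


def matches_combination(message, combination):
    """True only if every component appears somewhere in the message."""
    return all(component in message for component in parse_combination(combination))


def is_censored(message, combinations):
    """True if the message matches at least one blacklisted combination."""
    return any(matches_combination(message, c) for c in combinations)


# The two example combinations given in the paragraph above.
blacklist = ["习近平到武汉", "习近平+疫情蔓延"]

print(is_censored("习近平到武汉考察", blacklist))        # single component present
print(is_censored("习近平强调防止疫情蔓延", blacklist))  # both components present, not adjacent
print(is_censored("各地防止疫情蔓延", blacklist))        # only one of two components present
```

Requiring all components lets a rule stay silent on ordinary messages about the epidemic while still catching messages that pair the topic with a specific name.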

Scripting Chats

To discover censored keyword combinations on WeChat, we script group chat conversations. We programmatically collect articles listed on the front page of a collection of news websites. We then extract each article’s text, consisting of its title and body text, and send it in a WeChat group chat consisting of three test accounts: one registered to a mainland Chinese phone number and two registered to Canadian phone numbers (none of these accounts were ever used by actual users). We use one of the Canadian accounts to send messages and the Chinese account to passively monitor whether messages sent in the group chat have been filtered. The other Canadian account acts as a “third wheel,” existing only to facilitate the creation of a group chat (i.e., a chat with three or more users). Throughout this process, we limit our test accounts to interacting with each other in the group chat, and they never interact with real users of the platform.

After we send the extracted article text as a message in the WeChat group chat, if the Chinese account does not receive it, we flag the message text as containing one or more keyword combinations that trigger text censorship. We then perform further tests to reduce the article text to the minimum number of characters required to trigger censorship. Finally, we group each resulting keyword combination into content categories based on the underlying context.
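One simple way to picture the reduction step is a greedy loop that deletes one character at a time and keeps the deletion whenever the shortened text is still censored. The Python sketch below is an assumption-laden illustration, not the procedure used in our testing system: the `oracle` callable stands in for sending a message through the group chat and checking whether the Chinese account receives it, and here it is simulated locally with a single hypothetical rule. The result is a text from which no single character can be removed without losing the censorship trigger; it does not by itself reveal where one component ends and the next begins.

```python
def minimize_text(text, oracle):
    """Greedily drop characters while the oracle still reports censorship."""
    if not oracle(text):
        raise ValueError("the starting text is not censored")
    i = 0
    while i < len(text):
        candidate = text[:i] + text[i + 1:]
        if oracle(candidate):
            text = candidate  # character i was not needed; retry same index
        else:
            i += 1  # character i is required; keep it and move on
    return text


def simulated_oracle(message):
    """Stand-in for a live test: censors if both components are present."""
    return all(part in message for part in ("习近平", "疫情蔓延"))


article = "据报道，习近平今日谈到防止疫情蔓延的措施。"
print(minimize_text(article, simulated_oracle))  # 习近平疫情蔓延
```

Each oracle call corresponds to one test message in a live setting, so a practical system would want a strategy that needs far fewer probes than one per character.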

Discovering Keyword Censorship

We ran our testing from January 1 to May 31, 2020, from a University of Toronto network. Our collection of news websites from which we extracted articles consisted of Chinese state media, Chinese-language news aggregators that post trending articles published by state and commercial media in China, news websites based in Hong Kong and Taiwan, and data provided by WeChat Scope and CoFacts. In previous work, we found that extracting article text from these news sources is an effective way to discover censored keyword combinations related to events over a defined time period.

In total, we found 2,174 keywords blocked. The table below provides a breakdown of censored keywords found within each month of our testing period.

Date | Number of Keywords Found Blocked
January 2020 | 162
February 2020 | 645
March 2020 | 501
April 2020 | 628
May 2020 | 238

Table 13: Number of keywords found blocked.
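As a quick arithmetic check, the monthly figures in Table 13 can be summed to confirm that they account for the reported total. The snippet below simply restates the table’s numbers; the variable names are illustrative.

```python
# Monthly counts copied from Table 13.
monthly_counts = {
    "January 2020": 162,
    "February 2020": 645,
    "March 2020": 501,
    "April 2020": 628,
    "May 2020": 238,
}

total = sum(monthly_counts.values())
print(total)  # 2174, matching the 2,174 keywords reported in total
```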