
Planet Netsweeper Section 1 – Methodology & Technical Findings

This is part two of a four-part report on the global proliferation of Netsweeper


This section details the research questions that informed our study. We also outline in detail the methods that we adopted to identify Netsweeper installations worldwide, and those that we employed to reduce the findings to countries of interest. We also present high-level technical findings and observations.

1.1 Research Questions

Our research for this report was guided by the following questions:

  1. Can we identify all Netsweeper installations on the Internet? What technical methods and tools can we use to do that?
  2. What tools and methods can we use to confirm which of these Netsweeper installations are on the networks of consumer-facing ISPs?
  3. Are any of the installations that are identified on consumer-facing ISPs located in jurisdictions in which their use represents a human rights concern?
  4. What can we say about how censorship is applied by the installations found in jurisdictions associated with human rights concerns? What types of content are censored? How is it censored? How transparent is such censorship to users? What is the legal and regulatory framework governing censorship in these jurisdictions?
  5. Can we confirm if the installations found in jurisdictions that are associated with human rights concerns are actively serviced by Netsweeper, Inc.?

1.1.2 Countries of interest

Netsweeper has customers around the world. While our prior research has focused on the use of Netsweeper technology in countries of the Global South, the company also has customers in the Global North, including Canada, where it is headquartered, and in the United Kingdom, where it opened an office in 2017. Many of the purchasers of Netsweeper products are institutional customers, particularly in the education sector, where the company advertises compliance with both U.S. (CIPA) and U.K. (OFSTED) guidelines regulating children’s access to online content. Other customers in these countries include private companies seeking to control employee access to the Internet.

Our primary research interest pertains to the filtering of content on consumer-facing ISPs. In most cases, filtering on consumer ISPs does not have an opt-out option, which leaves users with no alternative for accessing blocked content (unless they are able to switch to a non-filtering provider). This same dynamic is not at play in the case of employees or students who experience website or Internet blocking in an institutional or a corporate setting. As a result, we have chosen to exclude institutional and private-sector Netsweeper installations from deeper analysis.

We further focus on countries that routinely violate human rights in areas of free expression, as we think that these countries are more likely to abuse filtering technologies to restrict access to political or human rights content. We selected countries ranked as “Authoritarian” by the 2017 Economist Democracy Index and added other countries that are not ranked as “Authoritarian,” including India, Pakistan, and Somalia, because of the unique history and characteristics of Internet filtering in these countries. India has a long and complex history with Internet filtering that has been the subject of many contentious public debates. Historically, Pakistan has censored the Internet extensively, including blocking all of YouTube in 2008. Somalia is a failed state torn by insurgencies and persistent violence.

1.2 Methodology

Our technical methodology is divided into three phases. In the first phase, we collected a list of IP addresses that might be associated with Netsweeper installations. In the second phase, we filtered our list to include only bona fide Netsweeper installations deployed on consumer ISPs in countries of interest. In the third phase, we examined what content these Netsweeper installations were blocking and whether they may have been communicating with Netsweeper, Inc.

| Purpose | Methods | Data source |
| --- | --- | --- |
| Develop a list of IP addresses of Netsweeper installations | Searching existing Internet scanning data sources | Censys, Shodan |
| Develop a list of IP addresses of Netsweeper installations | Searching existing Internet censorship data sources | OONI, ICLab, packet captures, ad hoc testing |
| Filter our list of IP addresses to bona fide Netsweeper installations on consumer-facing ISPs | Remotely scanning the IP addresses | Specialized scanning |
| Identify content blocked by these Netsweeper installations | Searching existing Internet censorship data sources | OONI, ICLab, packet captures, ad hoc testing |
| Identify content blocked by these Netsweeper installations | Remotely scanning IP addresses in countries of interest using HTTP Host headers aimed at triggering censorship | Host Header test |
| Identify whether the Netsweeper installation may be communicating with Netsweeper, Inc. | Running our Beacon Box test | Beacon Box test |

Table 1.1. Our methodology

1.2.1 Developing a list of IP addresses of Netsweeper devices

We developed our list of IP addresses by examining existing Internet scanning data from two sources and existing censorship measurement data from two sources.

Existing Internet scanning data

Shodan and Censys are two platforms that probe most Internet-connected devices at regular intervals and make the results publicly accessible. In previous work, we developed various signatures for how Netsweeper devices respond to the probes that Shodan and Censys send. We queried these services daily for results matching our fingerprints. Figure 1.1 shows the specific queries we sent to Shodan and Censys.

Figure 1.1. Signatures used to identify Netsweeper installations in Censys and Shodan search

The IP addresses we collected provide a broad picture of publicly visible Netsweeper installations, including both public ISP installations, and institutional and private sector installations.

Existing Internet censorship data

The Open Observatory of Network Interference (OONI) and Information Controls Lab (ICLab) collect data on Internet filtering and network interference from vantage points all around the world by convincing volunteers in various countries to run specialized measurement tools. The tools include web connectivity tests that attempt to access lists of potentially censored content, collect the resulting responses, and then analyze them for evidence of censorship. OONI and ICLab data are both publicly searchable.

We searched OONI and ICLab data using signatures (Figure 1.2) that we developed in our prior work to identify additional Netsweeper installations.

Figure 1.2. Signatures used to identify Netsweeper installations in OONI/ICLab data.

We included the blockpage IP addresses in our list of IP addresses of possible Netsweeper installations. We also used OONI and ICLab data (Section 1.2.3) to identify blocked websites.

1.2.2 Filtering our list of IP addresses

We next sought to narrow our list of IP addresses (Section 1.2.1) to bona fide Netsweeper installations filtering content on consumer-facing ISPs. We first ran probes against each IP address to see whether the IP was associated with a bona fide Netsweeper installation. Second, we probed each IP to see whether the installation was on a consumer-facing ISP.

Is the IP address a bona fide Netsweeper installation?

We ran a variety of tests to answer this question, described in Table 1.2.

| Question to be answered | Data source | Value suggestive of Netsweeper installation | Test code |
| --- | --- | --- | --- |
| Do the headers for a request for the IP address show a redirection to http://<IP address>/webadmin? | Headers from HTTP HEAD request to http://<IP address> | Redirection to http://<IP address>/webadmin | b1 |
| Is the redirect from the previous data point followed by a redirect to http://<IP address>/webadmin/redirect? | Headers from redirection to http://<IP address>/webadmin | Redirection to http://<IP address>/webadmin/redirect | b2 |
| Does an attempt to access http://<IP address>/webadmin return a valid page? | HTTP GET request to http://<IP address>/webadmin | Valid page | b3 |
| Does an attempt to access http://<IP address>/webadmin/alert return a valid page? | HTTP GET request to http://<IP address>/webadmin/alert | Valid page | b4 |
| Does an attempt to access http://<IP address>/webadmin/deny return a valid page? | HTTP GET request to http://<IP address>/webadmin/deny | Valid page | b5 |
| Does an attempt to access http://<IP address>:8081/auth/Login.action return a valid page? | HTTP GET request to http://<IP address>:8081/auth/Login.action | Page containing copyright notice: “2009 Netsweeper Inc.” | b6 |
| Does the sysdesc SNMP value of the IP address contain the string “.netsw”? | Public GET of SNMPv2 value: “SysDescr” | E.g. “Linux NS-WebAdmin 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Mar 13 00:26:49 UTC 2013 x86_64” | b_snmp |
| Does a reverse DNS resolution of the IP address suggest that the IP address belongs to a Netsweeper installation? | Reverse DNS lookup on the IP | A domain name which is indicative of a Netsweeper installation (e.g. nsfilter2.spg.more.net) | rdns |
| Does the page returned from /deny define CSS templates which suggest a Netsweeper installation? | HTTP GET request to http://<IP address>/webadmin/deny | “Shared”, “Webadmin2012”, “Webadmin2016” | css |
| Does the /deny page include a “mailto” link which suggests it is a Netsweeper installation? | HTTP GET request to http://<IP address>/webadmin/deny | HTML page body contains “mailto:” link suggestive of Netsweeper | denypage_mailto |
| Does the page returned from /deny contain an HTML title which suggests a Netsweeper installation? | HTTP GET request to http://<IP address>/webadmin/deny | “Access Denied” | denypage_title |

Table 1.2. Summary of data points collected to validate potential Netsweeper installations. The “Test code” values are referenced in the data analysis of our country case studies in Section 2.

Discussion of tests

In general, we considered an IP address to belong to a bona fide Netsweeper installation if the following Boolean expression was matched:

b_snmp || (b1 && b2) || b6 || (b1 && b3 && b4 && b5)

The b_snmp test, which checks whether the SNMP sysDescr value contains the string “.netsw”, is a very good indication that Netsweeper software is installed, as this string is unlikely to appear in servers not running software developed by Netsweeper. Similarly, the b6 test tells us whether or not a visit to the path “/auth/Login.action” on port 8081 returns a page with a copyright notice of “2009 Netsweeper Inc.”

We do not weight some of the other tests as highly, as they could be matched by non-Netsweeper products. For instance, test b1 only measures whether a direct visit to the IP address redirects to the path: /webadmin. It seems conceivable that non-Netsweeper products could match this test, as “webadmin” is a common word. The tests b3 to b5 all return true if any page is returned in response to their respective queries. A web server that is configured to respond with HTTP 200 to any request would likely return “True” to all these tests. However, it is less likely that a non-Netsweeper server would be in our initial list of IP addresses, because of how we generated that list (Section 1.2.1).

The rdns, css, denypage_title, and denypage_mailto tests do not have Boolean return values. Therefore, the strength of these tests depends on how clearly the returned value identifies the function of the server. For example, if the deny page title was “Netsweeper – Blocked,” it would be a strong indicator of a Netsweeper installation; if the title was “Not Found,” that would be a weak indicator.
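The Boolean expression above can be written out as a small function. This is a minimal sketch only: the `ProbeResults` container and the `is_bona_fide` name are hypothetical, and the field names simply mirror the test codes in Table 1.2. The non-Boolean tests (rdns, css, denypage_title, denypage_mailto) are deliberately left out, since they are weighed by inspection rather than by this rule.

```python
from dataclasses import dataclass


@dataclass
class ProbeResults:
    """Boolean outcomes of the Table 1.2 tests for one IP address (hypothetical container)."""
    b1: bool = False      # redirect to /webadmin
    b2: bool = False      # further redirect to /webadmin/redirect
    b3: bool = False      # /webadmin returns a valid page
    b4: bool = False      # /webadmin/alert returns a valid page
    b5: bool = False      # /webadmin/deny returns a valid page
    b6: bool = False      # port 8081 login page carries the Netsweeper copyright notice
    b_snmp: bool = False  # SNMP system description contains ".netsw"


def is_bona_fide(r: ProbeResults) -> bool:
    """Evaluate b_snmp || (b1 && b2) || b6 || (b1 && b3 && b4 && b5)."""
    return (
        r.b_snmp
        or (r.b1 and r.b2)
        or r.b6
        or (r.b1 and r.b3 and r.b4 and r.b5)
    )
```

Note that b1 alone, or b3 to b5 without b1, does not satisfy the rule, which matches the lower weight given to those tests in the discussion.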

Is the installation on a consumer-facing ISP?

We ran a variety of tests to answer this question, described in Table 1.3.

| Question to be answered | Data source | Value suggestive of consumer-facing ISP |
| --- | --- | --- |
| Does the page returned from /deny contain links to domains which suggest who is responsible for administering the installation? | HTTP GET request to http://<IP address>/webadmin/deny | “nsblock.<ISP NAME>.com” |
| Does a reverse DNS resolution of the IP address suggest who is responsible for administering the installation? | Reverse DNS lookup on the IP | A domain name which is indicative of the administrator of the installation (e.g. restrict.kw.zain.com) |
| Does the sysdesc SNMP value of the IP address suggest who is responsible for administering the installation? | Public GET of SNMPv2 value: “SysDescr” | E.g. “Linux NS-WebAdmin 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Mar 13 00:26:49 UTC 2013 x86_64” |
| Does the /deny page include a “mailto” link which suggests who is responsible for administering the installation? | HTTP GET request to http://<IP address>/webadmin/deny | HTML page body contains “mailto:” link indicative of the installation’s administrator |
| Do the OONI or ICLab measurements for this installation show a blockpage that includes logos or text indicating an ISP or government authority? | OONI and ICLab | Blockpage contains logos or text indicating an ISP or government authority |
| Do the OONI or ICLab measurements for this installation show censorship from multiple vantage points? | OONI and ICLab | Multiple different vantage points experiencing censorship by a single Netsweeper installation |
| Do our results from Section 1.2.1 show multiple adjacent IP addresses on the same network? | Censys, Shodan, OONI, and ICLab | Multiple adjacent IP addresses on the same network |

Table 1.3. Summary of data points collected to validate whether Netsweeper installations are on consumer-facing ISPs

1.2.3 Identify content blocked by Netsweeper installations

We further examined bona fide Netsweeper installations on consumer-facing ISPs in countries of interest in order to determine what websites they were blocking and whether or not they might be communicating with Netsweeper, Inc.

Ad-hoc manual testing

In some cases, we collected limited data from users who had access to a vantage point on a network in a country of interest. In such cases, users who had access to a network of interest accessed a set of websites within a web browser and noted the responses. Identifying if a site is inaccessible as a result of deliberate filtering is context-specific and is discussed in further detail in specific country case studies. This type of testing has limitations: it relies on manual data entry and interpretation of results observed. This testing leads to a higher likelihood of error than automated testing.

OONI and ICLab data

We examined our results from OONI and ICLab (Section 1.2.1) to determine which websites were being blocked. OONI and ICLab use the same testing lists, which include a global list tested in every country, and a per-country local list. The lists are manually created by volunteers and there is variation in the size of the lists and the scope of content they cover. As a result, they may only find a subset of censorship that is present at the time of testing. These lists do not provide an exhaustive inventory of Internet filtering.

Host Header test

We also used a measurement technique that does not require a vantage point on the censored network. This test involves sending requests to IP addresses on a censored network and observing if any of these packets receive an injected blockpage.

To begin, we conducted a zmap scan of the Internet, sending all IPv4 addresses a request containing a Host field that might be blocked by Netsweeper. We picked low-risk URLs, such as invalid URLs that did not point to any web content, or the Netsweeper “deny page test” (e.g., denypagetests.netsweeper.com/category/catno/32) for these global scans in order to avoid a situation where a target IP address might be implicated in circumventing censorship. We examined responses to our scan with an IPID value of 242, which our previous research had shown as being a characteristic of Netsweeper injections. We selected a subset of those IPs for further in-depth testing. In order to ensure ethical testing, we selected only IPs tagged as an “infrastructure router” on Censys or IPs that were clearly operated by ISPs themselves and not ISP customers. We then tested these IPs by sending requests for URLs in our local testing list and double-checked our results.
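The two building blocks of this test, a request carrying a chosen Host header and a check of the IPID of the response packet, can be sketched as follows. This is an illustration under stated assumptions, not the scanning code used in the study: the function names are hypothetical, the sketch does no network I/O, and it assumes the caller already has the raw bytes of an IPv4 packet (for example from a packet capture). The value 242 is the one reported in the text.

```python
import struct

NETSWEEPER_IPID = 242  # IPID value the text associates with Netsweeper injections


def build_probe(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request carrying the given Host header."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def ipv4_identification(packet: bytes) -> int:
    """Return the 16-bit Identification field (bytes 4-5) of a raw IPv4 header."""
    if len(packet) < 20:
        raise ValueError("too short to contain an IPv4 header")
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    (ident,) = struct.unpack_from("!H", packet, 4)
    return ident


def looks_injected(packet: bytes) -> bool:
    """True if the packet's IPID matches the value reported for Netsweeper."""
    return ipv4_identification(packet) == NETSWEEPER_IPID
```

An IPID match on its own is only a hint, since any sender may happen to use the same value; in the text it is combined with the content of the returned blockpage.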

Beacon Box test

We next sought to determine if Netsweeper installations were communicating with infrastructure controlled by Netsweeper, Inc. This test uses properties of the Netsweeper content categorization system to demonstrate communication between the installation and databases used for categorization maintained by Netsweeper, Inc. A positive result on this test can suggest that the company has an ongoing relationship with an installation in a country and thus may have the ability to know how services are used (or misused) in a particular jurisdiction.

Netsweeper’s Internet filtering system is made up of two components. The first is software that intercepts requests for websites and determines if they are to be denied or permitted and the second is a database of website categorizations. The software component looks up how a requested website is categorized through the database component. If a requested website belongs to a content category that has been selected for filtering, the website is blocked.

Given the highly dynamic nature of web content, assigning categories to that content is a significant undertaking; as a consequence, categorization of web content is a key method that filtering vendors use to differentiate their services. According to Netsweeper’s “Live Stats” website, they typically categorize on the order of tens of millions of websites per day. Each Netsweeper customer has a local copy of that database. If a website is requested that has not been categorized in that local database (e.g., a newly-registered domain) the local installation will contact Netsweeper’s cloud-based categorization engine, which will fetch the website, categorize it, and make that categorization available to customer installations to be included in their local databases, within a few seconds.
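The lookup flow described above (local database first, cloud categorizer for unknown sites, then a block decision by category) can be sketched as below. Every name here is hypothetical: the text does not describe Netsweeper's actual interfaces, so the local database is modelled as a plain dictionary and the cloud engine as a callable supplied by the caller.

```python
from typing import Callable, Dict, Set


def decide(
    host: str,
    local_db: Dict[str, str],
    blocked_categories: Set[str],
    cloud_lookup: Callable[[str], str],
) -> str:
    """Return "deny" or "allow" for a requested host."""
    category = local_db.get(host)
    if category is None:
        # Uncategorized site (e.g. a newly registered domain): ask the
        # cloud categorizer, then cache the answer in the local database.
        category = cloud_lookup(host)
        local_db[host] = category
    return "deny" if category in blocked_categories else "allow"
```

The cloud lookup for previously unseen hosts is the step that the Beacon Box test relies on: a request for a fresh domain should cause the categorizer to fetch that domain.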

Figure 1.3. The Netsweeper Filtering Process

We registered a set of new domains on which we hosted innocuous text content. We divided the domains into two groups: (1) a control group that we never accessed from anywhere and (2) a test group that we accessed in a country of interest. We expected that server logs from the control group would be empty and that server logs from the test group would show two entries:

  1. An HTTP GET request for our website from the vantage point

  2. A second HTTP GET request from a different IP address within a few seconds
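The expected log pattern above can be checked mechanically. The following is a minimal sketch, assuming the web server log has already been parsed into (timestamp, client IP) pairs. The `Hit` type, the `find_follow_up` name, and the 5-second default window are illustrative assumptions; the text only says "within a few seconds".

```python
from typing import Iterable, NamedTuple, Optional


class Hit(NamedTuple):
    timestamp: float  # seconds, e.g. a Unix timestamp
    client_ip: str


def find_follow_up(
    hits: Iterable[Hit], vantage_ip: str, window: float = 5.0
) -> Optional[Hit]:
    """Return the first request from a different IP that arrives within
    `window` seconds of the vantage point's first request, if any."""
    ordered = sorted(hits)
    first = next((h for h in ordered if h.client_ip == vantage_ip), None)
    if first is None:
        return None
    for h in ordered:
        if h.client_ip != vantage_ip and 0 <= h.timestamp - first.timestamp <= window:
            return h
    return None
```

For a control domain the log should be empty, so the function returns `None`. For a test domain, a non-`None` result corresponds to the second entry in the list above.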

Figure 1.4. An explanation of the flow of information in the Beacon Box test.

In prior research in Yemen, our control group behaved as expected and the test group all showed a request within one second from an IP address belonging to a customer of cloud provider Rackspace. In prior research in Bahrain, our control group behaved as expected and the test group all showed requests within one second from IP addresses belonging to a customer of cloud provider DigitalOcean. A 2015 forum post by a user of Australian ISP Telstra describes a similar follow-up visit from a Rackspace-hosted IP address, a practice which Telstra confirmed to be Netsweeper, Inc.’s categorization process.

1.3 General Technical Findings

In this section, we summarize the general findings of our data collection. For our case studies of bona fide Netsweeper installations on consumer-facing ISPs in specific countries of interest, see Section 2.

1.3.1 Netsweeper installations

Our data collection period ran for seven months from August 31, 2017 to April 9, 2018. We identified the possible installations listed in Table 1.4 after collecting Internet scanning data and on-network measurements that matched our signature (Section 1.2.1). This list includes installations being used in institutional settings as well as those operated at private businesses. There may also be matches to our Netsweeper signature present in this table that are false positives.

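| Country | Number of IP addresses | Number of Autonomous Systems (AS) |
| --- | --- | --- |
| Canada | 80 | 8 |
| United States | 70 | 29 |
| Great Britain | 69 | 17 |
| India | 42 | 13 |
| Pakistan | 20 | 2 |
| Bahrain | 12 | 9 |
| Afghanistan | 10 | 2 |
| Qatar | 8 | 1 |
| Ireland | 8 | 3 |
| Australia | 8 | 5 |
| Yemen | 6 | 1 |
| Somalia | 6 | 3 |
| Saudi Arabia | 5 | 2 |
| Kuwait | 5 | 2 |
| Sudan | 4 | 2 |
| New Zealand | 4 | 3 |
| Indonesia | 4 | 3 |
| Cyprus | 3 | 1 |
| United Arab Emirates | 3 | 1 |
| South Africa | 2 | 2 |
| Singapore [1] | 1 | 1 |
| Palestinian Territory | 1 | 1 |
| Netherlands | 1 | 1 |
| Greece | 1 | 1 |
| Dominica | 1 | 1 |
| Germany | 1 | 1 |
| Colombia | 1 | 1 |
| Brunei Darussalam | 1 | 1 |
| Argentina | 1 | 1 |
| Albania | 1 | 1 |
| TOTAL: 30 countries | 379 IP addresses | 111 ASNs |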

Table 1.4. List of all possible Netsweeper IP addresses found

Note that a single installation may be double-counted in Table 1.4 if it was associated with more than one IP address during our data collection period. Geolocation information is based on the latest MaxMind GeoIP2 Country database at the time of collection. We manually corrected some incorrect geolocations that we noticed, such as the ASN “VIVA Bahrain,” which geolocated to Saudi Arabia, despite being a Bahraini ISP.

We narrowed our findings from the master list of all Netsweeper installations to focus on installations being used to censor content on consumer-facing ISPs in countries of interest. Our countries of interest are any country ranked “Authoritarian” in the 2017 Economist Democracy Index, along with India, Pakistan, and Somalia. We added these latter three countries because of the unique history, political and security situation, and characteristics of Internet filtering in the countries (Section 1.1.2). Table 1.5 below identifies Netsweeper installations in countries of interest.

| Country | Economist 2017 Democracy Index ranking | IP addresses of Netsweeper installations | Autonomous System names | Names of ISPs |
| --- | --- | --- | --- | --- |
| Afghanistan | Authoritarian | 10 | Afghantelecom Government Communication Network | Afghan Telecom |
| | | | Etisalat Afghan | Etisalat Afghanistan |
| Bahrain | Authoritarian | 16 | Batelco | Batelco |
| | | | Etisalcom Bahrain Company W.L.L. | Etisalcom |
| | | | Kalaam Telecom Bahrain B.S.C. | Kalaam Telecom |
| | | | Mena Broadband Services WLL | Mena Broadband Services |
| | | | Northstar Technology Company W.L.L. | Northstar Technology Company |
| | | | Nuetel Communications S.P.C | Nuetel |
| | | | Rapid Telecommunications W.L.L. | Rapid Telecom |
| | | | ViaCloud WLL | Viacloud |
| | | | VIVA Bahrain BSC Closed | VIVA |
| | | | Zain Bahrain B.s.c. | Zain Bahrain |
| India | Flawed Democracy | 42 | BHARTI Airtel Ltd. | Bharti Airtel |
| | | | Bharti Airtel Ltd. AS for GPRS Service | Bharti Airtel |
| | | | Hathway IP Over Cable Internet | Hathway |
| | | | Hughes Escorts Communications Limited Is A Satellite Based Broadband Isp & Asp | Hughes Communications |
| | | | National Internet Backbone | BSNL Broadband |
| | | | Net4India Ltd | Net4 |
| | | | Pacific Internet India Pvt. Ltd. | PacNet |
| | | | Primesoftex Ltd | Prime Softex |
| | | | Reliance Communications Ltd.DAKC MUMBAI | Reliance Communications |
| | | | Reliance Jio Infocomm Ltd | Jio |
| | | | TATA Communications formerly VSNL is Leading ISP | TATA Communications |
| | | | TATA SKY BROADBAND PRIVATE LIMITED | TATA Sky |
| | | | Telstra Global | Telstra |
| Kuwait | Authoritarian | 5 | Fast Telecommunications Company W.L.L. | Fastelco |
| | | | Mobile Telecommunications Company | Zain |
| Pakistan | Hybrid Regime | 20 | Pakistan Telecommunication Company Limited | PTCL |
| | | | Paknet Limited Merged into PTCL | Paknet |
| Qatar | Authoritarian | 8 | Ooredoo Q.S.C. | Ooredoo |
| Saudi Arabia | Authoritarian | 1 | Etihad Atheeb Telecom Company | Go |
| Sudan | Authoritarian | 4 | KANARTEL | Canar/Canartel |
| | | | Sudatel | Sudatel |
| Somalia | N/A | 7 | Golis-Telecom-AS | Golis Telecom |
| | | | HORMUUD | Hormuud Telecom |
| | | | O3b Limited | O3b |
| UAE | Authoritarian | 3 | Emirates Integrated Telecommunications Company PJSC (EITC-DU) | du |
| Yemen | Authoritarian | 6 | Public Telecommunication Corporation | Yemennet |

Table 1.5. Summary of Netsweeper installations identified in countries of interest

We discuss these installations in more detail in Section 2.

1.3.2 What is blocked?

We collected data concerning the blocking of URLs (Section 1.2.3) and summarize our findings in Table 1.6.

| Metric | Value |
| --- | --- |
| Number of times in our testing where a blockpage was returned | 20,607 |
| Number of URLs blocked per country (sum over all countries where blocking observed) | 2,464 |
| Number of countries where a blockpage was ever returned, including both countries of interest and non-interest | 17 |
| Number of content categories ever seen in a blockpage query string | 18 |

Table 1.6. Overview of observed blocking behaviour.

Netsweeper assigns all URLs to a set of content categories. System administrators select from the set of available content categories to decide which content to block. System administrators can also add URLs to categories such as the “Custom” category.

| Category | Number of URLs on testing lists that we saw blocked at least once, in at least one country, in each category [2] |
| --- | --- |
| Custom | 1,493 |
| Pornography | 490 |
| [Blank] [3] | 141 |
| Web Proxy | 136 |
| Gambling | 76 |
| Substance Abuse | 45 |
| Alternative Lifestyles | 28 |
| Alcohol | 19 |
| Hate Speech | 13 |
| Nudity | 6 |
| Multiple Categories | 7 |
| Criminal Skills | 3 |
| Viruses | 2 |
| Sex Education | 1 |
| Phishing | 1 |
| Matrimonial | 1 |
| Match Making | 1 |
| Abortions | 1 |
| TOTAL | 2,464 |

Table 1.7. Content categories found in blockpages

The disproportionate number of URLs blocked in the “Custom” category is due to data collected from India. All URLs found blocked in India were assigned to this content category and data from this country contributed significantly to the large number of blocked URLs.

1.3.3 Beacon Box tests

We conducted seven Beacon Box tests on seven ISPs. Each test was performed with newly registered domain names. These tests showed communication between installations at three ISP networks and infrastructure that we believe is controlled by Netsweeper, Inc. Table 1.8 summarizes the results of these tests.

| Country | ISP | Time of initial visit | Follow-up visit | User-agent of follow-up visitor |
| --- | --- | --- | --- | --- |
| Kuwait | Zain | 14:25:22.783 | 14:25:23.116 from 162.243.69.215 (DigitalOcean) | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0 |
| India | Airtel | 09:38:17.188 | 09:38:19.380 from 159.203.196.79 (DigitalOcean) | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0 |
| Yemen | Yemennet | 07:22:50.293 | 07:22:50.485 from 159.203.42.143 (DigitalOcean) | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0 |

Table 1.8. Summary of our positive Beacon Box tests

In these three cases, the initial visit to our newly-created domain was followed within roughly two seconds (0.333, 2.192, and 0.192 seconds in Kuwait, India, and Yemen, respectively) by a visit from a DigitalOcean-hosted IP address. In all three cases, the user-agent string was identical, perhaps indicating that the same software was running on all three DigitalOcean IP addresses. These results were as expected, given our previous testing in 2016 in Bahrain and 2015 in Yemen.
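The delay between each initial visit and its follow-up can be computed directly from the timestamps in Table 1.8. The sketch below assumes that both timestamps in a row fall on the same day and share a clock, which the table implies but does not state; the `VISITS` mapping and the `delay_seconds` name are illustrative.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"

# (time of initial visit, time of follow-up visit), copied from Table 1.8
VISITS = {
    "Kuwait (Zain)": ("14:25:22.783", "14:25:23.116"),
    "India (Airtel)": ("09:38:17.188", "09:38:19.380"),
    "Yemen (Yemennet)": ("07:22:50.293", "07:22:50.485"),
}


def delay_seconds(initial: str, follow_up: str) -> float:
    """Seconds elapsed between two same-day HH:MM:SS.fff timestamps."""
    delta = datetime.strptime(follow_up, FMT) - datetime.strptime(initial, FMT)
    return delta.total_seconds()


DELAYS = {name: delay_seconds(*times) for name, times in VISITS.items()}
# Kuwait: 0.333 s, India: 2.192 s, Yemen: 0.192 s
```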

We also ran Beacon Box tests that produced negative results (i.e., the test did not result in any follow-up visits). The negative results were from Airtel and Air Jaldi in India, PTCL in Pakistan, and Ooredoo in Qatar. It is not clear why these tests did not lead to follow-up visits from the Netsweeper categorizer.

We conclude that the Netsweeper installations on the ISPs in Table 1.8 are likely actively communicating with and receiving URL categorization services from infrastructure controlled or maintained by Netsweeper, Inc. Also of note with respect to these communications, there are potential privacy concerns regarding transmission of user web request data to a foreign jurisdiction.

1.3.4 Host Header tests

Our host header tests found Netsweeper-injected responses on 14 ISPs in six countries.

| Country | ISP |
| --- | --- |
| Afghanistan | Asix |
| | Etisalat Afghan |
| Bahrain | Bahrain Internet Exchange |
| | Batelco |
| | Infonas WLL |
| | Kalaam Telecom Bahrain B.S.C. |
| | Mena Broadband Services WLL |
| | Nuetel Communications S.P.C |
| | Rapid Telecommunications W.L.L. |
| India | CityCom Networks Pvt Ltd |
| | Hathway IP Over Cable Internet |
| | Telstra Global |
| Japan | Telstra Global |
| United States | Windstream Communications Inc |
| Yemen | Public Telecommunication Corporation |

Table 1.9. Positive results of our Host Header test

Bahrain Case Study

We identified an infrastructure IP address in Bahrain and sent a series of Host Header probes to the IP address containing each URL in the Bahrain local testing list. We received blockpages for 57 of these URLs. The blockpages were consistent with the blockpage seen by Bahraini Internet users and were returned in packets with an IPID value of 242. The results of this testing are discussed further in the Bahrain country case study in Section 2.

Figure 1.5. A sample packet containing a blockpage returned during our Host Header testing.

1.3.5 Miscategorization

Although Netsweeper and other filtering companies promote the breadth of their website categorization databases and the effectiveness of their automated categorization methods, it is inevitable that content will be miscategorized. Automated categorization systems can misinterpret the presence of certain keywords, such as by confusing sexual health material for adult content or mistaking drug rehabilitation services for those promoting drug use. Prior research on the filtering product SmartFilter showed how errant categorizations can have large impacts on the accessibility of content and can leave both content creators and users with few opportunities for recourse.

Our data collection identified a number of apparent content miscategorizations. In some cases, we can identify the same miscategorization across several Netsweeper installations, which indicates that Netsweeper’s categorization system may be responsible. In other cases, it is unclear whether Netsweeper or the operator of a single Netsweeper installation may be responsible for a miscategorization. Even temporary or unintended miscategorizations can prevent people from accessing information, often with minimal avenues for recourse.

Google searches for “gay” and “lesbian” classified as pornography

We found that Google searches for the keywords “gay” (i.e., http://www.google.com/search?q=gay) and “lesbian” (i.e., http://www.google.com/search?q=lesbian) were blocked in the UAE, Bahrain, and Yemen. In the UAE and Bahrain, these searches were blocked because that URL was included in the “Pornography” category. Testing data from Yemen did not indicate the category to which the blocked URL belonged, but it may be because of the same miscategorization.

However, it is unlikely that a user would actually see a blockpage for a specific Google search, because if they visit the homepage of www.google.com prior to conducting their search, they will be automatically redirected to HTTPS, which obscures the user’s search terms from Netsweeper.

Other miscategorizations as pornography

One of the dangers of automated categorization systems is that content might be miscategorized based on the presence of certain keywords or terms. For example, the website of the Center for Health and Gender Equity (http://www.genderhealth.org/), which contains content discussing sexual and reproductive health, was found categorized as “pornography.”

In our testing data, the website of the World Health Organization (WHO) was also found to be blocked in the “pornography” category in the UAE and Kuwait. In addition to the WHO homepage (http://www.who.int), several other WHO URLs that were tested were also blocked, including the WHO’s pages on sexual and reproductive health (http://www.who.int/reproductivehealth/), HIV/AIDS (http://www.who.int/topics/hiv_aids/), and a website on avian influenza (http://www.who.int/influenza/human_animal_interface). However, these websites did not appear to be blocked in every test in the UAE and Kuwait; some tests showed that they were accessible.

A number of sites that do not appear to host any sexual content were also blocked as a result of being categorized as pornography in at least one instance. Importantly, we do not know whether these miscategorizations were a result of Netsweeper’s categorization process or erroneous manual intervention by the operators of a single Netsweeper installation.

Site Description | URL
The Christian Science Monitor | http://www.csmonitor.com
World Union for Progressive Judaism | https://wupj.org
Center for Health and Gender Equity | http://www.genderhealth.org/
Change Illinois, a political advocacy group in Illinois | http://www.changeil.org
White Honor, a white supremacist website | http://whitehonor.com/
BackTrack Linux | http://www.backtrack-linux.org
Middle East Transparent, a news website | https://middleeasttransparent.com/fr/

Table 1.10. Non-pornographic sites observed categorized as Pornography, either due to Netsweeper or due to erroneous manual intervention by the operators of a single Netsweeper installation

Previous research published by the ONI showed how Netsweeper’s categorization of the social media platform Tumblr as pornography (potentially due to the presence of pornographic content on some Tumblr sites) led to the entire platform being blocked in Kuwait, Qatar, the UAE, and Yemen. Such a “one-size-fits-all” approach is likely to cause significant collateral impact given the diverse types of content hosted on social media and media sharing platforms.

Multiple miscategorizations of gay.com

The URL http://www.gay.com was blocked in Yemen, Afghanistan, and the UAE, where it was variously categorized as “Pornography,” “Match Making,” “Alternative Lifestyles,” and “Web Proxy.” The site was previously an LGBTQ social networking and personals site but, since 2016, has been the homepage of the Los Angeles LGBT Center. It is possible that the categorization of the website is out of date in some cases.

Alternative lifestyles category

Figure 1.6. Filtered LGBTQ content in the UAE.

One category provided by Netsweeper, called “Alternative Lifestyles,” warrants special discussion. The category is defined by Netsweeper as follows:

“This includes sites that reference topics on habits or behaviors related to social relations, dress, expressions, or recreation that are important enough to significantly influence the lives of a sector of the population. It can include the full range of non-traditional sexual practices, interests and orientations. Some sites may contain graphic images or sexual material with no pornographic intent.”

The category itself raises a number of concerns. First, the framing of LGBTQ identities as “non-traditional” illustrates the inherently discriminatory nature of this content category. By creating this category, Netsweeper is enabling censorship authorities to implement the wholesale blocking of LGBTQ content, including websites of civil rights and advocacy organizations, HIV/AIDS prevention organizations, and LGBTQ media and cultural groups. This category appears to serve no other purpose beyond facilitating the blocking of non-pornographic LGBTQ content.

The problematic use of this Netsweeper content category was flagged in 2011 by the ACLU in their complaint to the Missouri Research & Education Network (MOREnet). MOREnet had used the Alternative Lifestyles category to block LGBTQ content in more than 100 school districts across the state. Following the ACLU’s outreach, MOREnet disabled the blocking of the Alternative Lifestyles category. Network filtering company Lightspeed Systems removed their own similar “education.lifestyle” content category, which contained non-pornographic LGBTQ content, following similar complaints from the ACLU.

We found 28 sites blocked in the Alternative Lifestyles content category (all in the UAE), including:

Site Description | URL
Gay & Lesbian Alliance Against Defamation | http://www.glaad.org
Human Rights Campaign | http://www.hrc.org
The International Lesbian, Gay, Bisexual, Trans and Intersex Association | http://ilga.org/
Gay Men’s Health Centre | http://www.gmhc.org
The International Foundation for Gender Education | http://www.ifge.org
Queerty, an LGBTQ online magazine | http://www.queerty.com
Transsexual road map | http://www.tsroadmap.com/
Gay Calgary | http://www.gaycalgary.com
GlobalGayz, an LGBTQ travel and culture site | http://www.globalgayz.com
Caritas International, a Catholic relief, social services and development organization | http://www.caritas.org

Table 1.11. Sites observed categorized as Alternative Lifestyles

Other, unexplained miscategorizations

Some sites were likely miscategorized as “Web Proxy” in at least one instance. Such sites include:

Site Description | URL
Date.com | http://www.date.com/
B’nai B’rith International | http://bnaibrith.org
World Jewish Congress | http://www.worldjewishcongress.org
Vanguard Blog from the LA LGBT Center | http://www.gay.com/
Feminist Majority Foundation | http://www.feminist.org
Jewish Defense League | http://www.jdl.org/
TMZ, a celebrity news site | http://www.tmz.com
Former Catholic | http://www.formercatholic.com
The Bahai Faith | http://www.bahai-faith.org/

Table 1.12. Non-proxy sites observed categorized as Web Proxy

We also found 11 Blogspot-hosted URLs that were blocked in Kuwait as a result of being assigned to the “Viruses” category. It is not clear why this was the case.

1.3.6 Blocking content by country

Netsweeper has a feature that allows for the blocking of websites from specific countries. The company’s documentation lists “Countries” as one of the main category groups, alongside web content, web apps, and protocols. It is not clear what justifiable use case would require the blocking of all content from a specific country or set of countries. Our past research found that all content under Israel’s country-code top-level domain (.il) was blocked in Yemen, although we cannot be sure that this blocking was implemented using this feature.
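As a rough illustration of what blocking by country-code top-level domain could look like, consider the sketch below. We do not know how Netsweeper’s “Countries” category is implemented (it could, for instance, rely on IP geolocation rather than domain names), so the function name `matches_country_tld`, its parameters, and the example hostnames are assumptions made purely for illustration.

```python
from urllib.parse import urlsplit


def matches_country_tld(url: str, blocked_cc_tlds: set[str]) -> bool:
    """Return True if the URL's hostname ends in one of the given ccTLDs.

    Assumption: a country is identified only by the final DNS label of the
    hostname (e.g., "il"); this ignores where the server is actually hosted.
    """
    host = urlsplit(url).hostname or ""
    tld = host.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in blocked_cc_tlds


# Hypothetical hostnames under the reserved "example" label.
print(matches_country_tld("http://www.example.co.il/news", {"il"}))
print(matches_country_tld("http://www.example.com/news", {"il"}))
```

A rule this coarse would block every site under the matching domain regardless of its content, which is why a blanket country filter is difficult to justify.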


  1. The reverse DNS entry for the installation found in Singapore is apacdemo.netsweeper.com; we believe that this installation is for sales demonstration purposes and is used by Netsweeper for marketing in the Asia-Pacific region.
  2. It is possible that some URLs might be added to these categories by individual operators and do not represent categorizations performed by Netsweeper, Inc.
  3. Some measurements did not include a content category; these instances are labelled as “[Blank]”.
