This is part two of a four-part report on the global proliferation of Netsweeper
Section 1: Methodology & Technical Findings
This section details the research questions that informed our study. We also outline in detail the methods that we adopted to identify Netsweeper installations worldwide, and those that we employed to reduce the findings to countries of interest. We also present high-level technical findings and observations.
1.1 Research Questions
Our research for this report was guided by the following questions:
- Can we identify all Netsweeper installations on the Internet? What technical methods and tools can we use to do that?
- What tools and methods can we use to confirm which of these Netsweeper installations are on the networks of consumer-facing ISPs?
- Are any of the installations that are identified on consumer-facing ISPs located in jurisdictions in which their use represents a human rights concern?
- What can we say about how censorship is applied by the installations found in jurisdictions associated with human rights concerns? What types of content are censored? How is it censored? How transparent is such censorship to users? What is the legal and regulatory framework governing censorship in these jurisdictions?
- Can we confirm if the installations found in jurisdictions that are associated with human rights concerns are actively serviced by Netsweeper, Inc.?
1.1.2 Countries of interest
Netsweeper has customers around the world. While our prior research has focused on the use of Netsweeper technology in countries of the Global South, the company also has customers in the Global North, including Canada, where it is headquartered, and in the United Kingdom, where it opened an office in 2017. Many of the purchasers of Netsweeper products are institutional customers, particularly in the education sector, where the company advertises compliance with both U.S. (CIPA) and U.K. (OFSTED) guidelines regulating children’s access to online content. Other customers in these countries include private companies seeking to control employee access to the Internet.
Our primary research interest pertains to the filtering of content on consumer-facing ISPs. In most cases, filtering on consumer ISPs does not have an opt-out option, which leaves users with no alternative for accessing blocked content (unless they are able to switch to a non-filtering provider). This same dynamic is not at play in the case of employees or students who experience website or Internet blocking in an institutional or a corporate setting. As a result, we have chosen to exclude institutional and private-sector Netsweeper installations from deeper analysis.
We further focus on countries that routinely violate human rights in areas of free expression, as we think that these countries are more likely to abuse filtering technologies to restrict access to political or human rights content. We selected countries ranked as “Authoritarian” by the 2017 Economist Democracy Index and added three countries that are not ranked as “Authoritarian” (India, Pakistan, and Somalia) because of the unique history and characteristics of Internet filtering in those countries. India has a long and complex history with Internet filtering that has been the subject of many contentious public debates. Historically, Pakistan has censored the Internet extensively, including blocking all of YouTube in 2008. Somalia is a failed state torn by insurgencies and persistent violence.
1.2 Methodology
Our technical methodology is divided into three phases. In the first phase, we collected a list of IP addresses that might be associated with Netsweeper installations. In the second phase, we filtered our list to include only bona fide Netsweeper installations deployed on consumer ISPs in countries of interest. In the third phase, we examined what content these Netsweeper installations were blocking and whether they may have been communicating with Netsweeper, Inc.
Purpose | Methods | Data Source |
---|---|---|
Develop a list of IP addresses of Netsweeper installations | Searching existing Internet scanning data sources | Censys, Shodan |
Develop a list of IP addresses of Netsweeper installations | Searching existing Internet censorship data sources | OONI, ICLab, Packet captures, Ad hoc testing |
Filter our list of IP addresses to bona fide Netsweeper installations on consumer-facing ISPs | Remotely scanning the IP addresses | Specialized scanning |
Identify content blocked by these Netsweeper installations | Searching existing Internet censorship data sources | OONI, ICLab, Packet captures, Ad hoc testing |
Identify content blocked by these Netsweeper installations | Remotely scanning IP addresses in countries of interest using HTTP Host headers aimed at triggering censorship | Host Header test |
Identify whether the Netsweeper installation may be communicating with Netsweeper, Inc. | Running our Beacon Box test | Beacon Box test |
Table 1.1. Our methodology
1.2.1 Developing a list of IP addresses of Netsweeper devices
We developed our list of IP addresses by examining existing Internet scanning data from two sources and existing censorship measurement data from two sources.
Existing Internet scanning data
Shodan and Censys are two platforms that probe most Internet-connected devices at regular intervals and make the results publicly accessible. In previous work, we developed various signatures for how Netsweeper devices respond to the probes that Shodan and Censys send. We queried these services daily for results matching our fingerprints. Figure 1.1 shows the specific queries we sent to Shodan and Censys.
The IP addresses we collected provide a broad picture of publicly visible Netsweeper installations, including both public ISP installations and institutional and private-sector installations.
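Because the services were queried daily, the same installation typically reappears across many days and in both sources. A minimal sketch of folding each day's matches into a single master list follows. It assumes each query's results have already been reduced to a list of IP strings; the function and field names are hypothetical, and the actual Shodan and Censys queries (Figure 1.1) are not reproduced here.

```python
from datetime import date
from typing import Dict, Iterable


def merge_daily_results(
    master: Dict[str, dict], day: date, source: str, ips: Iterable[str]
) -> None:
    """Fold one day's fingerprint matches from one source into the master list.

    Each entry records the first and last day an IP matched and which
    sources (e.g. "shodan", "censys") reported it. Names are hypothetical.
    """
    for ip in ips:
        entry = master.setdefault(
            ip, {"first_seen": day, "last_seen": day, "sources": set()}
        )
        entry["first_seen"] = min(entry["first_seen"], day)
        entry["last_seen"] = max(entry["last_seen"], day)
        entry["sources"].add(source)


# Example with documentation-range addresses (not real findings).
master: Dict[str, dict] = {}
merge_daily_results(master, date(2017, 9, 1), "shodan", ["192.0.2.10"])
merge_daily_results(master, date(2017, 9, 2), "censys", ["192.0.2.10", "198.51.100.7"])
```

Keeping first- and last-seen dates per IP makes it easy to tell long-lived installations from one-off matches over a multi-month collection period.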
Existing Internet censorship data
The Open Observatory of Network Interference (OONI) and Information Controls Lab (ICLab) collect data on Internet filtering and network interference from vantage points around the world by recruiting volunteers in various countries to run specialized measurement tools. The tools include web connectivity tests that attempt to access lists of potentially censored content, collect the resulting responses, and then analyze them for evidence of censorship. OONI and ICLab data are both publicly searchable.
We searched OONI and ICLab data using signatures (Figure 1.2) that we developed in our prior work to identify additional Netsweeper installations.
We included the blockpage IP addresses in our list of IP addresses of possible Netsweeper installations. We also used OONI and ICLab data (Section 1.2.3) to identify blocked websites.
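The signatures themselves (Figure 1.2) are not reproduced in this section. As a rough illustration only, the sketch below assumes a measurement has been reduced to the final URL a request was redirected to, and treats a redirect to a /webadmin/deny path on a bare IPv4 address as a blockpage match. Real OONI and ICLab records have their own schemas, which this does not model, and the function names are hypothetical.

```python
import ipaddress
from typing import Iterable, Optional, Set
from urllib.parse import urlsplit


def blockpage_ip(final_url: str) -> Optional[str]:
    """Return the host IP if a URL looks like a Netsweeper deny page.

    Assumption: the deny page is served from a bare IPv4 address under
    the /webadmin/deny path, as probed in Table 1.2.
    """
    parts = urlsplit(final_url)
    host = parts.hostname
    if host is None or not parts.path.startswith("/webadmin/deny"):
        return None
    try:
        return str(ipaddress.IPv4Address(host))
    except ipaddress.AddressValueError:
        return None  # deny page served from a hostname, not a bare IP


def collect_blockpage_ips(final_urls: Iterable[str]) -> Set[str]:
    """Collect the distinct blockpage IPs found among measurement URLs."""
    ips: Set[str] = set()
    for url in final_urls:
        ip = blockpage_ip(url)
        if ip is not None:
            ips.add(ip)
    return ips
```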
1.2.2 Filtering our list of IP addresses
We next sought to narrow our list of IP addresses (Section 1.2.1) to bona fide Netsweeper installations filtering content on consumer-facing ISPs. We first ran probes against each IP address to see whether the IP was associated with a bona fide Netsweeper installation. Second, we probed each IP to see whether the installation was on a consumer-facing ISP.
Is the IP address a bona fide Netsweeper installation?
We ran a variety of tests to answer this question, described in Table 1.2.
Question to be answered | Data source | Value suggestive of Netsweeper installation | Test code |
---|---|---|---|
Do the headers for a request for the IP address show a redirection to http://<IP address>/webadmin? | Headers from HTTP HEAD request to http://<IP address> | Redirection to http://<IP address>/webadmin | b1 |
Is the redirect from a previous data point followed by a redirect to http://<IP address>/webadmin/redirect? | Headers from redirection to http://<IP address>/webadmin | Redirection to http://<IP address>/webadmin/redirect | b2 |
Does an attempt to access http://<IP address>/webadmin return a valid page? | HTTP GET request to http://<IP address>/webadmin | Valid page | b3 |
Does an attempt to access http://<IP address>/webadmin/alert return a valid page? | HTTP GET request to http://<IP address>/webadmin/alert | Valid page | b4 |
Does an attempt to access http://<IP address>/webadmin/deny return a valid page? | HTTP GET request to http://<IP address>/webadmin/deny | Valid page | b5 |
Does an attempt to access http://<IP address>:8081/auth/Login.action return a valid page? | HTTP GET request of http://<IP address>:8081/auth/Login.action | Page containing copyright notice: “2009 Netsweeper Inc.” | b6 |
Does the sysdesc SNMP value of the IP address contain the string “.netsw”? | Public GET of SNMPv2 value: “SysDescr” | E.g. “Linux NS-WebAdmin 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Mar 13 00:26:49 UTC 2013 x86_64” | b_snmp |
Does a reverse DNS resolution of the IP address suggest that the IP address belongs to a Netsweeper installation? | Reverse DNS lookup on the IP | A domain name which is indicative of a Netsweeper installation (e.g. nsfilter2.spg.more.net ) | rdns |
Does the page returned from /deny define CSS templates which suggest a Netsweeper installation? | HTTP GET request from http://<IP address>/webadmin/deny | “Shared”, “Webadmin2012”, “Webadmin2016” | css |
Does the /deny page include a “mailto” link which suggests it is a Netsweeper installation? | HTTP GET request of http://<IP address>/webadmin/deny | HTML page body contains “mailto:” link suggestive of Netsweeper | denypage_mailto |
Does the page returned from /deny contain an HTML title which suggests a Netsweeper installation? | HTTP GET request from http://<IP address>/webadmin/deny | “Access Denied” | denypage_title |
Table 1.2. Summary of data points collected to validate potential Netsweeper installations. The “Test code” values are referenced in the data analysis of our country case studies in Section 2.
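The HTTP-based tests b1 to b6 in Table 1.2 can be expressed as a handful of request checks. The sketch below is a hedged illustration, not our scanning code: it assumes a hypothetical `fetch(method, url)` callable that returns `(status, headers, body)` without following redirects (injected so the logic can be exercised offline), and it assumes a “valid page” means an HTTP 200 response with a non-empty body.

```python
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical transport: fetch(method, url) -> (status, headers, body).
# It must NOT follow redirects, so that Location headers stay observable.
Fetch = Callable[[str, str], Tuple[int, Dict[str, str], str]]


def _redirect_target(fetch: Fetch, method: str, url: str) -> Optional[str]:
    status, headers, _ = fetch(method, url)
    location = headers.get("Location")
    if 300 <= status < 400 and location:
        # Resolve relative Location headers and ignore a trailing slash.
        return urljoin(url, location).rstrip("/")
    return None


def _valid_page(fetch: Fetch, url: str) -> bool:
    # Assumption: a "valid page" is an HTTP 200 with a non-empty body.
    status, _, body = fetch("GET", url)
    return status == 200 and bool(body.strip())


def run_http_tests(ip: str, fetch: Fetch) -> Dict[str, bool]:
    base = f"http://{ip}"
    b1 = _redirect_target(fetch, "HEAD", base) == f"{base}/webadmin"
    b2 = b1 and _redirect_target(fetch, "GET", f"{base}/webadmin") == f"{base}/webadmin/redirect"
    status, _, body = fetch("GET", f"{base}:8081/auth/Login.action")
    return {
        "b1": b1,
        "b2": b2,
        "b3": _valid_page(fetch, f"{base}/webadmin"),
        "b4": _valid_page(fetch, f"{base}/webadmin/alert"),
        "b5": _valid_page(fetch, f"{base}/webadmin/deny"),
        "b6": status == 200 and "2009 Netsweeper Inc." in body,
    }
```

A real transport would wrap an HTTP client with automatic redirect following disabled; otherwise b1 and b2 could never see the intermediate redirects.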
Discussion of tests
In general, we considered an IP address to belong to a bona fide Netsweeper installation if the following Boolean expression over the test codes in Table 1.2 evaluated to true:
b_snmp || (b1 && b2) || b6 || (b1 && b3 && b4 && b5)
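The decision rule above translates directly into code. In this sketch, tests that were not run (missing keys) are assumed to count as false; the function names are hypothetical.

```python
from typing import Mapping


def snmp_indicates_netsweeper(sys_descr: str) -> bool:
    # b_snmp: the SNMPv2 SysDescr value contains the string ".netsw".
    return ".netsw" in sys_descr


def is_bona_fide(results: Mapping[str, bool]) -> bool:
    """Evaluate b_snmp || (b1 && b2) || b6 || (b1 && b3 && b4 && b5).

    Assumption: a test that was not run is treated as False.
    """

    def passed(name: str) -> bool:
        return bool(results.get(name, False))

    return (
        passed("b_snmp")
        or (passed("b1") and passed("b2"))
        or passed("b6")
        or (passed("b1") and passed("b3") and passed("b4") and passed("b5"))
    )
```

Note that b3 to b5 alone never suffice: without b1 they cannot distinguish a catch-all web server from a Netsweeper installation.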
A positive b_snmp test, which checks whether the SNMP SysDescr value contains the string “.netsw”, is a very good indication that Netsweeper software is installed, as this string is unlikely to appear on servers not running software developed by Netsweeper. The b6 test is similarly strong: it checks whether a visit to the path “/auth/Login.action” on port 8081 returns a page with the copyright notice “2009 Netsweeper Inc.”
We do not weight some of the other tests as highly, as they could be matched by non-Netsweeper products. For instance, test b1 only measures whether a direct visit to the IP address redirects to the path /webadmin. Non-Netsweeper products could plausibly match this test, as “webadmin” is a common word. Tests b3 to b5 all return true if any page is returned in response to their respective queries, so a web server configured to respond with HTTP 200 to every request would likely pass all three. However, such a non-Netsweeper server is unlikely to appear in our initial list of IP addresses, because of how we generated that list (Section 1.2.1).
The rdns, css, denypage_title, and denypage_mailto tests do not have Boolean return values. The strength of these tests therefore depends on how clearly the returned value identifies the function of the server. For example, a deny page titled “Netsweeper – Blocked” would be a strong indicator of a Netsweeper installation, whereas a title of “Not Found” would be a weak one.
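One way to make the graded judgment above explicit is a simple keyword ladder over the free-form values. The keyword lists below are hypothetical, drawn loosely from the examples in Tables 1.2 and 1.3; they are not Netsweeper's or our actual criteria, and in practice these values were interpreted manually.

```python
# Hypothetical keyword lists for illustration only.
STRONG_HINTS = ("netsweeper",)
MODERATE_HINTS = ("nsfilter", "nsblock")


def indicator_strength(value: str) -> str:
    """Grade a free-form value (rDNS name, deny page title, CSS name, mailto).

    Returns "strong", "moderate", or "weak" using case-insensitive
    substring matches against the hypothetical keyword lists.
    """
    v = value.lower()
    if any(hint in v for hint in STRONG_HINTS):
        return "strong"
    if any(hint in v for hint in MODERATE_HINTS):
        return "moderate"
    return "weak"
```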
Is the installation on a consumer-facing ISP?
We ran a variety of tests to answer this question, described in Table 1.3.
Question to be answered | Data source | Value suggestive of consumer-facing ISP |
---|---|---|
Does the page returned from /deny contain links to domains which suggest who is responsible for administering the installation? | HTTP GET request from http://<IP address>/webadmin/deny | “nsblock.<ISP NAME>.com” |
Does a reverse DNS resolution of the IP address suggest who is responsible for administering the installation? | Reverse DNS lookup on the IP | A domain name which is indicative of the administrator of the installation (e.g: restrict.kw.zain.com) |
Does the sysdesc SNMP value of the IP address suggest who is responsible for administering the installation? | Public GET of SNMPv2 value: “SysDescr” | E.g. “Linux NS-WebAdmin 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Mar 13 00:26:49 UTC 2013 x86_64” |
Does the /deny page include a “mailto” link suggesting who is responsible for administering the installation? | HTTP GET request of http://<IP address>/webadmin/deny | HTML page body contains “mailto:” link indicative of the installation’s administrator |
Do the OONI or ICLab measurements for this installation show a blockpage that includes logos or text indicating an ISP or government authority? | OONI and ICLab | Blockpage contains logos or text indicating an ISP or government authority |
Do the OONI or ICLab measurements for this installation show censorship from multiple vantage points? | OONI and ICLab | Multiple different vantage points experiencing censorship by a single Netsweeper installation |
Do our results from Section 1.2.1 show multiple adjacent IP addresses on the same network? | Censys, Shodan, OONI, and ICLab | Multiple adjacent IP addresses on the same network |
Table 1.3. Summary of data points collected to validate whether Netsweeper installations are on consumer-facing ISPs
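The last indicator in Table 1.3, multiple adjacent IP addresses on the same network, can be checked mechanically with Python's standard `ipaddress` module. This sketch assumes that “the same network” can be approximated by a shared /24 prefix (a hypothetical default); it reports runs of two or more consecutive addresses.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List


def adjacent_runs(ips: Iterable[str], prefixlen: int = 24) -> List[List[str]]:
    """Return runs of two or more consecutive IPv4 addresses.

    Addresses are grouped by their enclosing /prefixlen network (an
    assumed proxy for "the same network"); runs never cross groups.
    """
    groups: Dict[ipaddress.IPv4Network, List[int]] = defaultdict(list)
    for s in ips:
        addr = ipaddress.IPv4Address(s)
        net = ipaddress.IPv4Network(f"{addr}/{prefixlen}", strict=False)
        groups[net].append(int(addr))

    runs: List[List[str]] = []

    def flush(run: List[int]) -> None:
        if len(run) > 1:
            runs.append([str(ipaddress.IPv4Address(x)) for x in run])

    for net in sorted(groups):
        values = sorted(set(groups[net]))
        current = [values[0]]
        for v in values[1:]:
            if v == current[-1] + 1:
                current.append(v)
            else:
                flush(current)
                current = [v]
        flush(current)
    return runs
```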
1.2.3 Identify content blocked by Netsweeper installations
We further examined bona fide Netsweeper installations on consumer-facing ISPs in countries of interest in order to determine what websites they were blocking and whether or not they might be communicating with Netsweeper, Inc.
Ad-hoc manual testing
In some cases, we collected limited data from users who had access to a vantage point on a network in a country of interest. These users visited a set of websites in a web browser and noted the responses. Whether a site is inaccessible because of deliberate filtering is context-specific and is discussed in further detail in the individual country case studies. This type of testing has limitations: because it relies on manual data entry and interpretation of the observed results, it is more error-prone than automated testing.
OONI and ICLab data
We examined our results from OONI and ICLab (Section 1.2.1) to determine which websites were being blocked. OONI and ICLab use the same testing lists, which comprise a global list tested in every country and a per-country local list. The lists are created manually by volunteers, and they vary in size and in the scope of content they cover. As a result, testing based on them may detect only a subset of the censorship present at the time of testing; these lists do not provide an exhaustive inventory of Internet filtering.
Host Header test
We also used a measurement technique that does not require a vantage point on the censored network. This test involves sending requests to IP addresses on a censored network and observing whether any of these requests receive an injected blockpage in response.
To begin, we conducted a ZMap scan of the Internet, sending all IPv4 addresses a request containing a Host field that might be blocked by Netsweeper. For these global scans, we picked low-risk URLs, such as invalid URLs that did not point to any web content, or the Netsweeper “deny page test” (e.g., denypagetests.netsweeper.com/category/catno/32), in order to avoid a situation where a target IP address might be implicated in circumventing censorship. We examined the responses to our scan that carried an IPID value of 242, which our previous research had shown to be characteristic of Netsweeper injections. We selected a subset of those IPs for further in-depth testing. To ensure ethical testing, we selected only IPs tagged as an “infrastructure router” on Censys or IPs that were clearly operated by ISPs themselves and not by ISP customers. We then tested these IPs by sending requests for URLs in our local testing list and double-checked our results.
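ZMap handles the Internet-wide sending; the sketch below only illustrates the two pieces specific to this test: building a request whose Host header names the test domain, and reading the IPID (the 16-bit Identification field at byte offset 4 of the IPv4 header) from a captured response packet. Packet capture itself (raw sockets or pcap) is assumed and not shown, and the function names are hypothetical. An IPID of 242 is a signal, not proof, since an unrelated host can emit that value by chance.

```python
import struct

NETSWEEPER_IPID = 242  # IPID value reported above as characteristic of injections


def build_probe(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 request whose Host header names the test domain."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def ipv4_ipid(packet: bytes) -> int:
    """Extract the 16-bit Identification field from a raw IPv4 header."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    (ident,) = struct.unpack_from("!H", packet, 4)
    return ident


def looks_injected(packet: bytes) -> bool:
    return ipv4_ipid(packet) == NETSWEEPER_IPID
```

For example, `build_probe("denypagetests.netsweeper.com", "/category/catno/32")` produces a request for the deny page test URL mentioned above.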
Beacon Box test
We next sought to determine if Netsweeper installations were communicating with infrastructure controlled by Netsweeper, Inc. This test uses properties of the Netsweeper content categorization system to demonstrate communication between the installation and databases used for categorization maintained by Netsweeper, Inc. A positive result on this test can suggest that the company has an ongoing relationship with an installation in a country and thus may have the ability to know how services are used (or misused) in a particular jurisdiction.
Netsweeper’s Internet filtering system is made up of two components. The first is software that intercepts requests for websites and determines whether they are to be denied or permitted; the second is a database of website categorizations. The software component looks up how a requested website is categorized in the database component. If a requested website belongs to a content category that has been selected for filtering, the website is blocked.
Given the highly dynamic nature of web content, assigning categories to that content is a significant undertaking; as a consequence, categorization of web content is a key method that filtering vendors use to differentiate their services. According to Netsweeper’s “Live Stats” website, they typically categorize on the order of tens of millions of websites per day. Each Netsweeper customer has a local copy of that database. If a website is requested that has not been categorized in that local database (e.g., a newly-registered domain) the local installation will contact Netsweeper’s cloud-based categorization engine, which will fetch the website, categorize it, and make that categorization available to customer installations to be included in their local databases, within a few seconds.
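The property the Beacon Box test relies on can be seen in a toy model of a local categorization database with a cloud fallback. This is not Netsweeper's implementation: `LocalCategoryDB` and `cloud_categorize` are hypothetical names, and the callable merely stands in for the remote categorizer that, per the description above, fetches the site itself.

```python
from typing import Callable, Dict, List


class LocalCategoryDB:
    """Toy model of a local categorization database with a cloud fallback.

    cloud_categorize stands in for the vendor's remote categorizer; in
    this model it is just a callable that returns a category name.
    """

    def __init__(self, cloud_categorize: Callable[[str], str]) -> None:
        self._db: Dict[str, str] = {}
        self._cloud = cloud_categorize
        self.cloud_lookups: List[str] = []  # domains that triggered a remote lookup

    def category(self, domain: str) -> str:
        if domain not in self._db:
            # Uncategorized (e.g., newly registered): ask the cloud once, then cache.
            self.cloud_lookups.append(domain)
            self._db[domain] = self._cloud(domain)
        return self._db[domain]
```

In this model, only the first request for a never-before-seen domain causes a remote lookup, which is why a freshly registered domain visited once from a filtered network would be expected to receive exactly one extra visit from the categorizer.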
We registered a set of new domains on which we hosted innocuous text content. We divided the domains into two groups: (1) a control group that we never accessed from anywhere and (2) a test group that we accessed from a vantage point in a country of interest. We expected the server logs for the control group to be empty and the server logs for each test-group domain to show two entries:
- An HTTP GET request for our website from the vantage point
- A second HTTP GET request from a different IP address within a few seconds
In prior research in Yemen, our control group behaved as expected and the test group all showed a request within one second from an IP address belonging to a customer of cloud provider Rackspace. In prior research in Bahrain, our control group behaved as expected and the test group all showed requests within one second from IP addresses belonging to a customer of cloud provider DigitalOcean. A 2015 forum post by a user of Australian ISP Telstra describes a similar follow-up visit from a Rackspace-hosted IP address, a practice which Telstra confirmed to be Netsweeper, Inc.’s categorization process.
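The expected log pattern can be checked programmatically. The sketch below assumes server logs have been parsed into simple records with a timestamp, client IP, and requested domain; the record type and function names are hypothetical, and the five-second window is an assumed reading of “within a few seconds.”

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class LogEntry:
    time: datetime
    client_ip: str
    domain: str


def find_follow_up(entries: Iterable[LogEntry], domain: str, vantage_ip: str,
                   window: timedelta = timedelta(seconds=5)) -> Optional[LogEntry]:
    """Return the first request for `domain` from a non-vantage IP that arrives
    within `window` of the vantage point's first request, or None."""
    hits = sorted((e for e in entries if e.domain == domain), key=lambda e: e.time)
    first = next((e for e in hits if e.client_ip == vantage_ip), None)
    if first is None:
        return None  # the vantage point's own request never arrived
    for e in hits:
        if e.client_ip != vantage_ip and first.time <= e.time <= first.time + window:
            return e
    return None


def control_group_clean(entries: Iterable[LogEntry], control_domains: Set[str]) -> bool:
    """True if no control-group domain appears anywhere in the logs."""
    return not any(e.domain in control_domains for e in entries)
```

Checking the control group matters as much as finding the follow-up: any request for a never-visited domain would point to scanning or log noise rather than a categorization lookup.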
1.3 General Technical Findings
In this section, we summarize the general findings of our data collection. For our case studies of bona fide Netsweeper installations on consumer-facing ISPs in specific countries of interest, see Section 2.
1.3.1 Netsweeper installations
Our data collection period ran for seven months from August 31, 2017 to April 9, 2018. We identified the possible installations listed in Table 1.4 after collecting Internet scanning data and on-network measurements that matched our signature (Section 1.2.1). This list includes installations being used in institutional settings as well as those operated at private businesses. There may also be matches to our Netsweeper signature present in this table that are false positives.
Country | Number of IP addresses | Number of Autonomous Systems (AS) |
---|---|---|
Canada | 80 | 8 |
United States | 70 | 29 |
Great Britain | 69 | 17 |
India | 42 | 13 |
Pakistan | 20 | 2 |
Bahrain | 12 | 9 |
Afghanistan | 10 | 2 |
Qatar | 8 | 1 |
Ireland | 8 | 3 |
Australia | 8 | 5 |
Yemen | 6 | 1 |
Somalia | 6 | 3 |
Saudi Arabia | 5 | 2 |
Kuwait | 5 | 2 |
Sudan | 4 | 2 |
New Zealand | 4 | 3 |
Indonesia | 4 | 3 |
Cyprus | 3 | 1 |
United Arab Emirates | 3 | 1 |
South Africa | 2 | 2 |
Singapore¹ | 1 | 1 |
Palestinian Territory | 1 | 1 |
Netherlands | 1 | 1 |
Greece | 1 | 1 |
Dominica | 1 | 1 |
Germany | 1 | 1 |
Colombia | 1 | 1 |
Brunei Darussalam | 1 | 1 |
Argentina | 1 | 1 |
Albania | 1 | 1 |
TOTAL (30 countries) | 379 | 111 |
Table 1.4. List of all possible Netsweeper IP addresses found
Note that a single installation may be double-counted in Table 1.4 if it was associated with more than one IP address during our data collection period. Geolocation information is based on the latest MaxMind GeoIP2 Country database at the time of collection. We manually corrected some incorrect geolocations that we noticed, such as the AS “VIVA Bahrain,” which geolocated to Saudi Arabia despite being a Bahraini ISP.
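Per-country counts like those in Table 1.4 can be derived from (IP, AS) records after applying manual geolocation corrections. In the sketch below the override mechanism, keyed by AS-name prefix, is hypothetical; only the VIVA Bahrain correction itself comes from the text. The ASNs in the example are from the range reserved for documentation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Manual corrections keyed by AS-name prefix (hypothetical mechanism).
GEO_OVERRIDES: Dict[str, str] = {"VIVA Bahrain": "Bahrain"}


def corrected_country(as_name: str, geolocated: str) -> str:
    for prefix, country in GEO_OVERRIDES.items():
        if as_name.startswith(prefix):
            return country
    return geolocated


def per_country_counts(
    records: Iterable[Tuple[str, int, str, str]]
) -> Dict[str, Tuple[int, int]]:
    """records: (ip, asn, as_name, geolocated_country) tuples.

    Returns {country: (distinct IP addresses, distinct ASNs)}.
    """
    ips: Dict[str, Set[str]] = defaultdict(set)
    asns: Dict[str, Set[int]] = defaultdict(set)
    for ip, asn, as_name, country in records:
        country = corrected_country(as_name, country)
        ips[country].add(ip)
        asns[country].add(asn)
    return {c: (len(ips[c]), len(asns[c])) for c in ips}
```

Because the same AS can be counted in more than one country, per-country ASN counts need not sum to the overall number of distinct ASNs.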
We narrowed our findings from the master list of all Netsweeper installations to focus on installations being used to censor content on consumer-facing ISPs in countries of interest. Our countries of interest are any country ranked “Authoritarian” in the 2017 Economist Democracy Index, along with India, Pakistan, and Somalia. We added these latter three countries because of the unique history, political and security situation, and characteristics of Internet filtering in the countries (Section 1.1.2). Table 1.5 below identifies Netsweeper installations in countries of interest.
Country | Economist 2017 Democracy Index Ranking | IP addresses of Netsweeper installations | Autonomous System Names | Names of ISPs |
---|---|---|---|---|
Afghanistan | Authoritarian | 10 | Afghantelecom Government Communication Network; Etisalat Afghan | Afghan Telecom; Etisalat Afghanistan |
Bahrain | Authoritarian | 16 | Batelco; Etisalcom Bahrain Company W.L.L.; Kalaam Telecom Bahrain B.S.C.; Mena Broadband Services WLL; Northstar Technology Company W.L.L.; Nuetel Communications S.P.C; Rapid Telecommunications W.L.L.; ViaCloud WLL; VIVA Bahrain BSC Closed; Zain Bahrain B.s.c. | Batelco; Etisalcom; Kalaam Telecom; Mena Broadband Services; Northstar Technology Company; Nuetel; Rapid Telecom; Viacloud; VIVA; Zain Bahrain |
India | Flawed Democracy | 42 | BHARTI Airtel Ltd.; Bharti Airtel Ltd. AS for GPRS Service; Hathway IP Over Cable Internet; Hughes Escorts Communications Limited Is A Satellite Based Broadband Isp & Asp; National Internet Backbone; Net4India Ltd; Pacific Internet India Pvt. Ltd.; Primesoftex Ltd; Reliance Communications Ltd.DAKC MUMBAI; Reliance Jio Infocomm Ltd; TATA Communications formerly VSNL is Leading ISP; TATA SKY BROADBAND PRIVATE LIMITED; Telstra Global | Bharti Airtel; Bharti Airtel; Hathway; Hughes Communications; BSNL Broadband; Net4; PacNet; Prime Softex; Reliance Communications; Jio; TATA Communications; TATA Sky; Telstra |
Kuwait | Authoritarian | 5 | Fast Telecommunications Company W.L.L.; Mobile Telecommunications Company | Fastelco; Zain |
Pakistan | Hybrid Regime | 20 | Pakistan Telecommunication Company Limited; Paknet Limited Merged into PTCL | PTCL; Paknet |
Qatar | Authoritarian | 8 | Ooredoo Q.S.C. | Ooredoo |
Saudi Arabia | Authoritarian | 1 | Etihad Atheeb Telecom Company | Go |
Sudan | Authoritarian | 4 | KANARTEL; Sudatel | Canar/Canartel; Sudatel |
Somalia | N/A | 7 | Golis-Telecom-AS; HORMUUD; O3b Limited | Golis Telecom; Hormuud Telecom; O3b |
UAE | Authoritarian | 3 | Emirates Integrated Telecommunications Company PJSC (EITC-DU) | du |
Yemen | Authoritarian | 6 | Public Telecommunication Corporation | Yemennet |
Table 1.5. Summary of Netsweeper installations identified in countries of interest
We discuss these installations in more detail in Section 2.
1.3.2 What is blocked?
We collected data concerning the blocking of URLs (Section 1.2.3) and summarize our findings in Table 1.6.
Metric | Value |
---|---|
Number of times in our testing where a blockpage was returned | 20,607 |
Number of blocked URLs, counted separately for each country and summed over all countries where blocking was observed | 2,464 |
Number of countries where a blockpage was ever returned, including both countries of interest and other countries | 17 |
Number of content categories ever seen in a blockpage query string | 18 |
Table 1.6. Overview of observed blocking behaviour.
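The metrics in Table 1.6 mix a raw event count with deduplicated counts. A minimal sketch of how they relate, assuming each observed blockpage has been reduced to a hypothetical `(country, url, category)` record: a URL blocked in two countries counts twice in the per-country sum, while repeated blockpages for the same URL in the same country count once.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class BlockEvent(NamedTuple):
    country: str
    url: str
    category: Optional[str]  # None when the blockpage named no category


def blocking_overview(events: Iterable[BlockEvent]) -> Dict[str, int]:
    evts = list(events)
    return {
        # every blockpage observation, including repeats
        "blockpages_returned": len(evts),
        # distinct (country, URL) pairs, i.e. per-country counts summed
        "urls_blocked_per_country_summed": len({(e.country, e.url) for e in evts}),
        "countries_with_blockpages": len({e.country for e in evts}),
        "categories_seen": len({e.category for e in evts if e.category}),
    }
```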
Netsweeper assigns all URLs to a set of content categories. System administrators select from the set of available content categories to decide which content to block. System administrators can also add URLs to categories such as the “Custom” category.
Category | Number of URLs on testing lists that we saw blocked at least once, in at least one country, in each category² |
---|---|
Custom | 1,493 |
Pornography | 490 |
[Blank]³ | 141 |
Web Proxy | 136 |
Gambling | 76 |
Substance Abuse | 45 |
Alternative Lifestyles | 28 |
Alcohol | 19 |
Hate Speech | 13 |
Nudity | 6 |
Multiple Categories | 7 |
Criminal Skills | 3 |
Viruses | 2 |
Sex Education | 1 |
Phishing | 1 |
Matrimonial | 1 |
Match Making | 1 |
Abortions | 1 |
TOTAL | 2,464 |
Table 1.7. Content categories found in blockpages
The disproportionate number of URLs blocked in the “Custom” category is due to data collected from India. All URLs found blocked in India were assigned to this content category and data from this country contributed significantly to the large number of blocked URLs.
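The selection mechanism described above, with administrator-chosen categories plus manually added URLs, can be modeled as a small policy function. This is a simplified sketch, not Netsweeper's logic: treating custom-list URLs as the “Custom” category with precedence over the vendor categorization, and the “Uncategorized” fallback, are both assumptions, and the category names in the example are illustrative.

```python
from typing import Dict, Set


def decide(url: str, vendor_categories: Dict[str, str], custom_urls: Set[str],
           blocked: Set[str]) -> str:
    """Return "deny" or "allow" under a simplified category policy.

    Assumption: a URL on the administrator's custom list is treated as the
    "Custom" category, taking precedence over the vendor's categorization.
    Unknown URLs fall back to "Uncategorized".
    """
    if url in custom_urls:
        category = "Custom"
    else:
        category = vendor_categories.get(url, "Uncategorized")
    return "deny" if category in blocked else "allow"
```

Under this model, a blockpage naming “Custom” reflects an operator's list rather than the vendor's categorization, which is why the category reported in a blockpage matters when interpreting results.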
1.3.3 Beacon Box tests
We conducted seven Beacon Box tests on seven ISPs. Each test was performed with newly registered domain names. These tests showed communication between installations at three ISP networks and infrastructure that we believe is controlled by Netsweeper, Inc. Table 1.8 summarizes the results of these tests.
Country | ISP | Time of initial visit | Follow-up visit | User-agent of follow-up visitor |
---|---|---|---|---|
Kuwait | Zain | 14:25:22.783 | 14:25:23.116 from 162.243.69.215 (DigitalOcean) | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0 |
India | Airtel | 09:38:17.188 | 09:38:19.380 from 159.203.196.79 (DigitalOcean) | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0 |
Yemen | Yemennet | 07:22:50.293 | 07:22:50.485 from 159.203.42.143 (DigitalOcean) | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0 |
Table 1.8. Summary of our positive Beacon Box tests
In these three cases, the initial visit to our newly-created domain was followed within roughly 0.2 to 2.2 seconds by a visit from a DigitalOcean-hosted IP address. In all three cases, the user-agent string was identical, perhaps indicating that the same software was running on all three DigitalOcean IP addresses. These results were as expected, given our previous testing in 2016 in Bahrain and 2015 in Yemen.
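The gaps between the initial and follow-up visits can be recomputed from the timestamps in Table 1.8. The sketch assumes both timestamps in each row were recorded by the same server clock on the same day, since the table lists times only; the `VISITS` list and `delay_seconds` function are hypothetical names.

```python
from datetime import datetime

# (country, ISP, initial visit, follow-up visit), transcribed from Table 1.8
VISITS = [
    ("Kuwait", "Zain", "14:25:22.783", "14:25:23.116"),
    ("India", "Airtel", "09:38:17.188", "09:38:19.380"),
    ("Yemen", "Yemennet", "07:22:50.293", "07:22:50.485"),
]


def delay_seconds(initial: str, follow_up: str) -> float:
    """Seconds between two HH:MM:SS.fff timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(follow_up, fmt) - datetime.strptime(initial, fmt)
    return delta.total_seconds()


for country, isp, t0, t1 in VISITS:
    print(f"{country} ({isp}): {delay_seconds(t0, t1):.3f} s")
```

Running it prints delays of 0.333, 2.192, and 0.192 seconds for Zain, Airtel, and Yemennet respectively.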
We also ran Beacon Box tests that produced negative results (i.e., the test did not result in any follow-up visits). The negative results were from Airtel and Air Jaldi in India, PTCL in Pakistan, and Ooredoo in Qatar. It is not clear why these tests did not lead to follow-up visits from the Netsweeper categorizer.
We conclude that the Netsweeper installations on the ISPs in Table 1.8 are likely actively communicating with, and receiving URL categorization services from, infrastructure controlled or maintained by Netsweeper, Inc. These communications also raise potential privacy concerns, since user web request data is transmitted to a foreign jurisdiction.
1.3.4 Host Header tests
Our Host Header tests found Netsweeper-injected responses on 14 ISPs in six countries.
Country | ISP |
---|---|
Afghanistan | Asix |
Afghanistan | Etisalat Afghan |
Bahrain | Bahrain Internet Exchange |
Bahrain | Batelco |
Bahrain | Infonas WLL |
Bahrain | Kalaam Telecom Bahrain B.S.C. |
Bahrain | Mena Broadband Services WLL |
Bahrain | Nuetel Communications S.P.C |
Bahrain | Rapid Telecommunications W.L.L. |
India | CityCom Networks Pvt Ltd |
India | Hathway IP Over Cable Internet |
India | Telstra Global |
Japan | Telstra Global |
United States | Windstream Communications Inc |
Yemen | Public Telecommunication Corporation |
Table 1.9. Positive results of our Host Header test
Bahrain Case Study
We identified an infrastructure IP address in Bahrain and sent a series of Host Header probes to the IP address containing each URL in the Bahrain local testing list. We received blockpages for 57 of these URLs. The blockpages were consistent with the blockpage seen by Bahraini Internet users and were returned in packets with an IPID value of 242. The results of this testing are discussed further in the Bahrain country case study in Section 2.
1.3.5 Miscategorization
Although Netsweeper and other filtering companies promote the breadth of their website categorization databases and the effectiveness of their automated categorization methods, it is inevitable that content will be miscategorized. Automated categorization systems can misinterpret the presence of certain keywords, such as by confusing sexual health material for adult content or mistaking drug rehabilitation services for those promoting drug use. Prior research on the filtering product SmartFilter showed how errant categorizations can have large impacts on the accessibility of content and can leave both content creators and users with few opportunities for recourse.
Our data collection identified a number of apparent content miscategorizations. In some cases, we can identify the same miscategorization across several Netsweeper installations, which indicates that Netsweeper’s categorization system may be responsible. In other cases, it is unclear whether Netsweeper or the operator of a single Netsweeper installation may be responsible for a miscategorization. Even temporary or unintended miscategorizations can prevent people from accessing information, often with minimal avenues for recourse.
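One way to operationalize this distinction is to count, for each (URL, category) pair, how many distinct installations returned it in a blockpage. The sketch assumes blockpage observations have been tagged with a hypothetical installation identifier. A pair seen on several installations points toward the vendor's categorization, though separate operators could in principle make the same manual change independently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def shared_miscategorizations(
    observations: Iterable[Tuple[str, str, str]], min_installations: int = 2
) -> Dict[Tuple[str, str], Set[str]]:
    """observations: (installation_id, url, category) taken from blockpages.

    Returns the (url, category) pairs seen on at least `min_installations`
    distinct installations, mapped to the installations that reported them.
    """
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for installation, url, category in observations:
        seen[(url, category)].add(installation)
    return {k: v for k, v in seen.items() if len(v) >= min_installations}
```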
Google searches for “gay” and “lesbian” classified as pornography
We found that Google searches for the keywords “gay” (i.e., http://www.google.com/search?q=gay) and “lesbian” (i.e., http://www.google.com/search?q=lesbian) were blocked in the UAE, Bahrain, and Yemen. In the UAE and Bahrain, these searches were blocked because those URLs were included in the “Pornography” category. Testing data from Yemen did not indicate the category to which the blocked URLs belonged, but the blocking may be due to the same miscategorization.
However, it is unlikely that a user would actually see a blockpage for a specific Google search, because if they visit the homepage of www.google.com prior to conducting their search, they will be automatically redirected to HTTPS, which obscures the user’s search terms from Netsweeper.
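The difference between the two cases can be illustrated with a simplified model of what a passive, on-path filter can read. The sketch assumes that for HTTPS only the hostname is exposed (for example through the TLS SNI field or DNS), while the path and query string, which carry the search terms, are encrypted; `visible_to_filter` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def visible_to_filter(url: str) -> str:
    """What a passive on-path filter can read of a request, in a simplified model.

    Plain HTTP: the whole URL (host, path, query) is in cleartext.
    HTTPS: assume only the hostname is exposed; path and query are encrypted.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        return parts.hostname or ""
    return url
```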
Other miscategorizations as pornography
One of the dangers of automated categorization systems is that content might be miscategorized based on the presence of certain keywords or terms. For example, the website of the Center for Health and Gender Equity (http://www.genderhealth.org/), which contains content discussing sexual and reproductive health, was found to be categorized as “pornography.”
In our testing data, the website of the World Health Organization (WHO) was also found to be blocked in the “pornography” category in the UAE and Kuwait. In addition to the WHO homepage (http://www.who.int), several other WHO URLs that were tested were also blocked, including the WHO’s pages on sexual and reproductive health (http://www.who.int/reproductivehealth/), HIV/AIDS (http://www.who.int/topics/hiv_aids/), and a website on avian influenza (http://www.who.int/influenza/human_animal_interface). These websites were not blocked in every test in the UAE and Kuwait, however; some tests showed them to be accessible.
A number of sites that do not appear to host any sexual content were also blocked as a result of being categorized as pornography in at least one instance. Importantly, we do not know whether these miscategorizations were a result of Netsweeper’s categorization process or of erroneous manual intervention by the operators of a single Netsweeper installation.
Site Description | URL |
---|---|
The Christian Science Monitor | http://www.csmonitor.com |
World Union for Progressive Judaism | https://wupj.org |
Center for Health and Gender Equity | http://www.genderhealth.org/ |
Change Illinois, a political advocacy group in Illinois | http://www.changeil.org |
White Honor, a white supremacist website | http://whitehonor.com/ |
BackTrack Linux | http://www.backtrack-linux.org |
Middle East Transparent, a news website | https://middleeasttransparent.com/fr/ |
Table 1.10. Non-pornographic sites observed categorized as Pornography, either due to Netsweeper or due to erroneous manual intervention by the operators of a single Netsweeper installation
Previous research published by the ONI showed how Netsweeper’s categorization of the social media platform Tumblr as pornography, potentially due to the presence of pornographic content on some Tumblr sites, led to the entire platform being blocked in Kuwait, Qatar, the UAE, and Yemen. A “one-size-fits-all” approach is likely to cause significant collateral impact given the diverse types of content hosted on social media and media-sharing platforms.
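The collateral impact of categorizing an entire platform can be illustrated with a minimal Python sketch of a domain-level category lookup, in which any hostname under a listed domain inherits that domain’s category. The database `CATEGORY_DB` and the function `lookup_category` are hypothetical; we do not know how Netsweeper matches URLs against its categories internally.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hypothetical category database keyed on domain names (illustrative only).
CATEGORY_DB = {"tumblr.com": "Pornography"}


def lookup_category(url: str, db: dict[str, str]) -> str | None:
    """Return the category of the most specific listed domain covering the host."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    labels = host.split(".")
    # Try "a.b.tumblr.com", then "b.tumblr.com", then "tumblr.com", and so on.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in db:
            return db[candidate]
    return None  # host is not covered by any listed domain


print(lookup_category("https://cooking-blog.tumblr.com/post/1", CATEGORY_DB))
```

With a single domain-level entry, every blog hosted on the platform is assigned the same category regardless of its content; URL-level or per-blog entries would be narrower but require far more classification effort.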
Multiple miscategorizations of gay.com
The URL http://www.gay.com was blocked in Yemen, Afghanistan, and the UAE, where it was variously categorized as “Pornography,” “Match Making,” “Alternative Lifestyles,” and “Web Proxy.” The site was formerly an LGBTQ social networking and personals site but, since 2016, has been the homepage of the Los Angeles LGBT Center. Its categorization may therefore be out of date in some cases.
Alternative lifestyles category
One category provided by Netsweeper, called “Alternative Lifestyles,” warrants special discussion. The category is defined by Netsweeper as follows:
“This includes sites that reference topics on habits or behaviors related to social relations, dress, expressions, or recreation that are important enough to significantly influence the lives of a sector of the population. It can include the full range of non-traditional sexual practices, interests and orientations. Some sites may contain graphic images or sexual material with no pornographic intent.”
The category itself raises a number of concerns. First, the framing of LGBTQ identities as “non-traditional” illustrates the inherently discriminatory nature of this content category. By creating this category, Netsweeper is enabling censorship authorities to implement the wholesale blocking of LGBTQ content, including websites of civil rights and advocacy organizations, HIV/AIDS prevention organizations, and LGBTQ media and cultural groups. This category appears to serve no purpose other than facilitating the blocking of non-pornographic LGBTQ content.
The problematic use of this Netsweeper content category was flagged in 2011 by the ACLU in its complaint to the Missouri Research & Education Network (MOREnet). MOREnet had used the Alternative Lifestyles category to block LGBTQ content in more than 100 school districts across the state. Following the ACLU’s outreach, MOREnet disabled the blocking of the Alternative Lifestyles category. Network filtering company Lightspeed Systems removed its own similar “education.lifestyle” content category, which contained non-pornographic LGBTQ content, following similar complaints from the ACLU.
We found 28 sites blocked in the Alternative Lifestyles content category (all in the UAE), including:
Site Description | URL |
---|---|
Gay & Lesbian Alliance Against Defamation | http://www.glaad.org |
Human Rights Campaign | http://www.hrc.org |
The International Lesbian, Gay, Bisexual, Trans and Intersex Association | http://ilga.org/ |
Gay Men’s Health Crisis | http://www.gmhc.org |
The International Foundation for Gender Education | http://www.ifge.org |
Queerty, an LGBTQ online magazine | http://www.queerty.com |
Transsexual road map | http://www.tsroadmap.com/ |
Gay Calgary | http://www.gaycalgary.com |
GlobalGayz, an LGBTQ travel and culture site | http://www.globalgayz.com |
Caritas Internationalis, a Catholic relief, social services and development organization | http://www.caritas.org |
Table 1.11. Sites observed categorized as Alternative Lifestyles
Other, unexplained miscategorizations
Some sites that do not appear to operate as web proxies were categorized as “Web Proxy” in at least one instance. These include:
Site Description | URL |
---|---|
Date.com | http://www.date.com/ |
B’nai B’rith International | http://bnaibrith.org |
World Jewish Congress | http://www.worldjewishcongress.org |
Vanguard Blog from the LA LGBT Center | http://www.gay.com/ |
Feminist Majority Foundation | http://www.feminist.org |
Jewish Defense League | http://www.jdl.org/ |
TMZ, a celebrity news site | http://www.tmz.com |
Former Catholic | http://www.formercatholic.com |
The Bahai Faith | http://www.bahai-faith.org/ |
Table 1.12. Non-proxy sites observed categorized as Web Proxy
We also found 11 Blogspot-hosted URLs that were blocked in Kuwait after being assigned to the “Viruses” category. The reason for this categorization is unclear.
1.3.6 Blocking content by country
Netsweeper has a feature that allows the blocking of websites from specific countries. The company’s documentation lists “Countries” as one of the main category groups, alongside web content, web apps, and protocols. It is not clear what justifiable use case would require blocking all content from a specific country or set of countries. Our past research found that all content on Israel’s country-code top-level domain (.il) was blocked in Yemen, although we cannot be sure that this blocking was implemented using this feature.
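One way a blanket rule like the .il blocking observed in Yemen could be expressed is by matching the final label of each requested hostname against a list of country-code TLDs, as in the minimal Python sketch below. This is an assumption for illustration: Netsweeper’s documentation does not tell us how its “Countries” group decides a site’s country (it could, for example, rely on IP geolocation instead), and `BLOCKED_CCTLDS` and `blocked_by_cctld` are hypothetical names.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hypothetical block list of country-code TLDs, for illustration only.
BLOCKED_CCTLDS = {"il"}


def blocked_by_cctld(url: str, blocked: set[str] = BLOCKED_CCTLDS) -> bool:
    """Return True if the URL's hostname ends in a blocked country-code TLD."""
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    tld = host.rstrip(".").rsplit(".", 1)[-1]  # last label, e.g. "il"
    return tld in blocked


print(blocked_by_cctld("http://www.example.co.il/"))
print(blocked_by_cctld("https://example.com/page"))
```

A rule of this kind ignores what a site actually contains: every news outlet, university, or government page under the blocked TLD is treated identically.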
- The reverse DNS entry for the installation found in Singapore is apacdemo.netsweeper.com; we believe that this installation is for sales demonstration purposes and is used by Netsweeper for marketing in the Asia-Pacific region.
- It is possible that some URLs might be added to these categories by individual operators and do not represent categorizations performed by Netsweeper, Inc.
- Some measurements did not include a content category; these instances are labelled as “[Blank]”.