A team of scientists from Stony Brook University, the University of Massachusetts, the University of California at Berkeley, and the University of Toronto in Canada set out to measure the extent of Chinese Internet censorship by studying how the “Great Firewall of China” works.
The study lasted more than nine months. For it, the experts built a system called GFWatch, which queried domains inside and outside the Chinese Internet space and then checked how the “Great Firewall” reacted and interfered with connections at the DNS level (to prevent Chinese users from accessing a domain, or to restrict access to the country’s internal sites).
With the help of GFWatch, the researchers checked 534 million different domains, testing approximately 411 million domains every day to record blocks and then recheck whether the blocks they had discovered were persistent. As a result, they calculated that the “Great Firewall” is currently blocking about 311,000 domains: roughly 270,000 are blocked as intended, while another 41,000 appear to have been blocked by accident.
The errors arose because the Chinese authorities tried to block domains using regular expressions to filter DNS queries, but did not account for cases where a short domain name is part of a longer one, so unrelated sites were also caught by the blocking. For example, the country’s authorities banned access to reddit.com and, in doing so, accidentally blocked booksreddit.com, geareddit.com and 1,087 other sites.
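The overblocking described above can be illustrated with a small Python sketch. The two patterns below are hypothetical, since the firewall's actual rules are not given here. The first matches any name that merely ends in the blocked string, and the second also requires a label boundary (the start of the name or a dot) before it.

```python
import re

# Hypothetical rules for illustration only; the real filter patterns are not
# published in this article.
LOOSE_RULE = re.compile(r"reddit\.com$")          # matches any name ending in "reddit.com"
STRICT_RULE = re.compile(r"(^|\.)reddit\.com$")   # requires a label boundary before "reddit.com"


def is_blocked(domain: str, rule: re.Pattern) -> bool:
    """Return True if the rule matches the (lower-cased) domain name."""
    return rule.search(domain.lower()) is not None


domains = [
    "reddit.com",
    "www.reddit.com",
    "booksreddit.com",
    "geareddit.com",
    "example.com",
]

# Domains caught by the loose rule but not by the boundary-aware one,
# i.e. unrelated sites blocked as collateral damage.
overblocked = [
    d for d in domains
    if is_blocked(d, LOOSE_RULE) and not is_blocked(d, STRICT_RULE)
]

print(overblocked)
```

With this sample list, the loose rule also catches booksreddit.com and geareddit.com, while the boundary-aware rule is limited to reddit.com and its subdomains.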
The research team compiled a list of 311,000 blocked domains to determine what type of content the Chinese authorities are banning the most. Using services like FortiGuard, researchers found that about 40% of blocked sites are newly registered domains, which the Chinese authorities block proactively until they categorize and whitelist their content.
As for the other “prohibited” domains, most of them host business content, pornography, or information related to IT. The blocked domains also include sites that host tools for bypassing blocking, gambling resources, personal blogs, entertainment portals, news and media sites, as well as domains with malicious and fraudulent content.
It is also notable that after the start of the coronavirus pandemic, many domains related to COVID-19 were added to the blocklist. These included, in particular, covid19classaction.it, covid19song.info, covidcon.org, ccpcoronavirus.com, covidhaber.net, and covid-19truth.info. Some of these sites contain material blaming China for the coronavirus pandemic.
“We found that most of the domains blocked by the ‘great Chinese firewall’ are unpopular and do not even make the list of the most popular sites,” the researchers said.
For example, out of a sample of 138,700 domains, only 1.3% of sites (about 1,800) are among the 100,000 most popular sites on the Internet (according to the Tranco rating).
In addition, the researchers said they have identified cases where Chinese DNS blocking, which usually involves altering the DNS records returned to Chinese users, accidentally corrupted DNS records outside of the Chinese Internet space on some DNS providers’ networks. Such errors affected at least 77,000 sites.