The Great Firewall of China is an important tool for the Chinese Government in their efforts to censor the Internet. It works, in part, by inspecting web traffic to determine whether or not particular words are present. If the Chinese Government does not approve of one of the words in a web page (or a web request), perhaps it says “f” “a” “l” “u” “n”, then the connection is closed and the web page will be unavailable — it has been censored.
This user-level effect has been known for some time… but up until now, no-one seems to have looked more closely into what is actually happening (or when they have, they have misunderstood the packet level events).
It turns out [caveat: in the specific cases we’ve closely examined, YMMV] that the keyword detection is not actually being done in large routers on the borders of the Chinese networks, but in nearby subsidiary machines. When these machines detect the keyword, they do not actually prevent the packet containing the keyword from passing through the main router (this would be horribly complicated to achieve and still allow the router to run at the necessary speed). Instead, these subsiduary machines generate a series of TCP reset packets, which are sent to each end of the connection. When the resets arrive, the end-points assume they are genuine requests from the other end to close the connection — and obey. Hence the censorship occurs.
However, because the original packets are passed through the firewall unscathed, if both of the endpoints were to completely ignore the firewall’s reset packets, then the connection will proceed unhindered! We’ve done some real experiments on this — and it works just fine!! Think of it as the Harry Potter approach to the Great Firewall — just shut your eyes and walk onto Platform 9¾.
Ignoring resets is trivial to achieve by applying simple firewall rules… and has no significant effect on ordinary working. If you want to be a little more clever you can examine the hop count (TTL) in the reset packets and determine whether the values are consistent with them arriving from the far end, or if the value indicates they have come from the intervening censorship device. We would argue that there is much to commend examining TTL values when considering defences against denial-of-service attacks using reset packets. Having operating system vendors provide this new functionality as standard would also be of practical use because Chinese citizens would not need to run special firewall-busting code (which the authorities might attempt to outlaw) but just off-the-shelf software (which they would necessarily tolerate).
There’s a little more to this story (but not much) and all is revealed in our academic paper (Clayton, Murdoch, Watson) which will be presented at the 6th Workshop on Privacy Enhancing Technologies being held here in Cambridge this week.
NB: There’s also rather more to censorship in China than just the “Great Firewall” keyword detecting system — some sites are blocked unconditionally, and it is necessary to use other techniques, such as proxies, to deal with that. However, these static blocks are far more expensive for the Chinese Government to maintain, and are inherently more fragile and less adaptive to change as content moves around. So there remains real value in exposing the inadequacy of the generic system.
The bottom line though, is that a great deal of the effectiveness of the Great Chinese Firewall depends on systems agreeing that it should work … wasn’t there once a story about the Emperor’s New Clothes ?