If you’d tried to download a file from either Filesonic or Fileserve some time between 3am on 15/04/2011 and 7pm on 16/04/2011 [1], you might’ve noticed something kind of odd. You might have got an error similar to one of the following.
Your first assumption for the first case, as mine was, might be that you’ve been assigned an IP address previously assigned to someone who has failed the CAPTCHA many times. However, the second claims someone else is downloading at the same time. At first glance, this seems to leave only two possibilities: someone else on your connection is downloading a file (nope) or your ISP is doing large-scale NAT (nope, which is a relief).
After a brief IM with an acquaintance who has clearly done their homework but wishes to remain anonymous, I was informed I’d been IWF’d, something that didn’t sound particularly pleasant and, as it turns out, isn’t.
Wait, who are the IWF? What are they doing exactly?
The IWF are the quango who have taken it upon themselves to filter bits of the Internet in the UK. ISPs then subscribe to a list of blocked domains and individual URLs. Enforcing the blocking of an individual domain is relatively easy: don’t respond to DNS requests if the user happens to be using your nameserver, but more importantly, drop any packets to and from the associated IP. Therefore this can be done at the IP layer. [2]
Sites which allow user content, including one-click file hosting sites (e.g. Rapidshare), present a more complex challenge for censors. Since anyone can upload files to them, people can upload the types of files the IWF block. Although these files will almost always eventually be removed by the sites’ administrators following a complaint, there will inevitably be a delay between a complaint being filed and the actual takedown. In this interim period, the IWF want to prevent the files from being downloaded. However, blocking the whole domain would be too aggressive, so the IWF want to block only individual URLs. Unfortunately, this can only be done at the HTTP layer, since that is where URLs exist, which means it’s necessary to pass all traffic for the site through an HTTP proxy. This is done by your ISP routing any packets addressed to, say, Fileserve’s IP address to their own proxy instead of to Fileserve’s servers. Your ISP’s proxy then checks whether the URL is blocked. If it is, several different things may happen depending on your ISP. In some cases the connection is dropped. In others a 404 File Not Found (dishonest) or a 403 Forbidden (honest) is returned. If the URL isn’t blocked, the ISP’s proxy makes the request to Fileserve on your behalf and relays Fileserve’s response.
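To make the proxy’s decision concrete, here is a minimal sketch in Python of the check described above. It is an illustration under stated assumptions, not how any real ISP’s filter is implemented: the blocklist entry, the `fileserve.example` hostname and the function names are all made up, and the real IWF list is not public.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist entry, for illustration only.
BLOCKED_URLS = {"http://fileserve.example/file/abc123"}


def normalise(url: str) -> str:
    """Lower-case the scheme and host so trivially different spellings match."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{host}{path}{query}"


def proxy_decision(host_header, request_target, blocked=BLOCKED_URLS, blocked_status=403):
    """Decide what the filtering proxy does with one HTTP request.

    Returns ("respond", status) for a blocked URL, otherwise ("forward", None),
    meaning the proxy fetches the page itself and relays the answer.
    """
    url = normalise(f"http://{host_header}{request_target}")
    if url in {normalise(b) for b in blocked}:
        return ("respond", blocked_status)
    return ("forward", None)
```

Note that the proxy needs both the Host header and the request path to rebuild the URL, which is why this check cannot be done by looking at IP packets alone. The `blocked_status` parameter stands in for the per-ISP choice between the honest 403 and the dishonest 404.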
Actually, it turns out this second technique is fully general, so there’s no need to use the first one, and Cleanfeed and co don’t.
How does this relate to the original problem?
Putting lots of people behind a proxy breaks many web services. This is because, as far as the service is concerned, it appears as if all of a particular ISP’s users are coming from a few IP addresses (for example, Virgin Media Broadband have just three proxy servers). Many web services make the (often false, but close enough to true to be workable) assumption that one IP address means one user. Perhaps the most famous case of the IWF causing this type of breakage is when they blocked Wikipedia’s Virgin Killer page, thereby breaking anonymous edits. In the Fileserve case, it only takes a relatively small proportion of users to fail the CAPTCHA for everyone to be locked out. Even if that weren’t the case, the one file per IP policy of both Fileserve and Filesonic ensures that only a few people from a whole ISP will be able to download at once.
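A toy simulation shows how badly the one file per IP rule interacts with a handful of proxies. This is only a sketch: the `OneDownloadPerIP` class is my guess at the shape of such a policy, not Fileserve’s or Filesonic’s actual code, and all addresses are invented.

```python
class OneDownloadPerIP:
    """Toy model of a site that allows one active download per source IP."""

    def __init__(self):
        self.active = set()

    def start(self, ip: str) -> bool:
        """Try to start a download; refuse if this IP already has one running."""
        if ip in self.active:
            return False
        self.active.add(ip)
        return True

    def finish(self, ip: str) -> None:
        """Mark the download from this IP as finished."""
        self.active.discard(ip)


def count_successes(site: OneDownloadPerIP, source_ips) -> int:
    """Count how many simultaneous download attempts the site accepts."""
    return sum(site.start(ip) for ip in source_ips)


# 1000 users, each seen by the site under their own (made-up) address.
direct = [f"10.0.{i // 250}.{i % 250 + 1}" for i in range(1000)]

# The same 1000 users, all seen as one of three (made-up) proxy addresses.
proxies = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
proxied = [proxies[i % 3] for i in range(1000)]
```

Passing `direct` to `count_successes` with a fresh site lets every user through, while passing `proxied` lets through only one user per proxy.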
How can I be sure I’ve been IWF’d?
(This answer is likely to be more technically in depth than the rest of the post, so feel free to skip it.) There are many different techniques, and this is an area in which your mileage is likely to vary significantly, given that each ISP has a different setup for filtering traffic. Two particular systems are Cleanfeed and WebMinder. In general, though, there are three families of techniques:
- Inspecting the HTTP response headers.
- Using traceroute in various ways.
- Convincing the website to display what it thinks your IP address is and comparing it to your external IP received from an unblocked site, say whatismyip.com.
The first way is probably easiest. Here’s an example from Virgin Media Broadband:
$ telnet filesonic.com 80
Trying 78.140.176.180...
Connected to filesonic.com.
Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.0 301 Moved Permanently
Server: nginx
Date:
Content-Type: text/html
Content-Length: 178
Location: http://www.filesonic.com/
Age: 0
Via: HTTP/1.1 webcache1-know.server.virginmedia.net (Traffic-Server/5.7.0-59705 [cMs f ])
The important line here is the Via header. The HTTP standard specifies that proxy servers must insert this; however, some are naughty and don’t. If your response has a Via header and you aren’t explicitly using a proxy server, then its presence probably indicates you’re being IWF’d. It is possible that some reverse proxies will insert a Via header. To be sure, you should look at the hostname of the proxy server after the “Via: ”. If it contains your ISP’s name, then it seems very likely you’re being IWF’d. Instead of telnet, you may prefer to use the header inspection tool in your web browser (e.g. in Chrome spanner→tools→developer tools).
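If you’d rather script the Via check than read headers by eye, something like the following Python sketch should do. The function names and the keyword list are my own inventions. The parsing is deliberately naive: it splits on commas and would be confused by a comma inside a comment, so treat a match as a hint rather than proof.

```python
import http.client


def via_proxy_hosts(headers):
    """Return the proxy hostnames named in any Via header values.

    `headers` is a list of (name, value) pairs, as returned by
    http.client.HTTPResponse.getheaders().
    """
    hosts = []
    for name, value in headers:
        if name.lower() != "via":
            continue
        for entry in value.split(","):
            fields = entry.split()
            # Each entry looks like "<protocol> <received-by> [comment]".
            if len(fields) >= 2:
                hosts.append(fields[1].lower())
    return hosts


def looks_like_isp_proxy(headers, isp_keywords):
    """True if any Via hostname contains one of the given ISP keywords."""
    return any(
        keyword.lower() in host
        for host in via_proxy_hosts(headers)
        for keyword in isp_keywords
    )


def fetch_headers(host, timeout=10):
    """Send a HEAD request for / over plain HTTP and return the response headers."""
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().getheaders()
    finally:
        conn.close()
```

For example, `looks_like_isp_proxy(fetch_headers("filesonic.com"), ["virginmedia"])` would repeat the telnet check above for a Virgin Media connection. You’d need to substitute keywords that appear in your own ISP’s hostnames.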
Unfortunately, most ISPs aren’t this up front. Therefore it is likely you will have to resort to using traceroute in most cases. One technique with traceroute is to:
- Do a traceroute.
- Check each intermediate node against a list of known IWF proxies. One such list was created during the Wikipedia incident. Even if there isn’t an exact match, it is worth being suspicious in the case that one of the nodes is in the same class C (or possibly, as is the case with Virgin Media Broadband, class B) subnet as a known IWF proxy. It’s probably also wise to look for suspicious names in the hostname such as webblock, proxy, or webcache.
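The hop check can be sketched in Python using the standard `ipaddress` module. Here I’m treating “class C” as a /24 and “class B” as a /16. The function name is mine, and the proxy address in the usage note is a placeholder from the documentation range, not a real IWF proxy.

```python
import ipaddress

# Substrings the text above suggests looking for in hop hostnames.
SUSPICIOUS_WORDS = ("webblock", "proxy", "webcache")


def classify_hop(hop_ip, hop_name, known_proxies, prefix_len=24):
    """Return a list of reasons to be suspicious of one traceroute hop.

    prefix_len=24 corresponds to the old "class C" size; use 16 for "class B".
    An empty list means nothing suspicious was found.
    """
    hop = ipaddress.ip_address(hop_ip)
    reasons = []
    for proxy in known_proxies:
        if hop == ipaddress.ip_address(proxy):
            reasons.append(f"exact match with {proxy}")
        else:
            # strict=False lets us build the enclosing network from a host address.
            net = ipaddress.ip_network(f"{proxy}/{prefix_len}", strict=False)
            if hop in net:
                reasons.append(f"same /{prefix_len} as {proxy}")
    for word in SUSPICIOUS_WORDS:
        if word in (hop_name or "").lower():
            reasons.append(f"hostname contains '{word}'")
    return reasons
```

You’d call this once per hop in your traceroute output, for example `classify_hop("192.0.2.77", "", ["192.0.2.10"])`, and look more closely at any hop that comes back with a non-empty list.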
Another technique is to traceroute multiple IP addresses known to not be IWF’d and your suspected IWF’d address. If it’s IWF’d the path should deviate from the normal path sooner, indicating the packet is making its way to your ISP’s proxy, which is within the ISP’s network.
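As a rough sketch of this comparison in Python: take each traceroute as a list of hop addresses, see how many leading hops the known-good paths share with each other, and flag the suspect if it shares fewer than that with all of them. The function names are made up, and this is only a heuristic, since paths can differ for perfectly innocent reasons such as load balancing.

```python
from itertools import combinations


def common_prefix_len(path_a, path_b):
    """Number of leading hops two traceroute paths have in common."""
    shared = 0
    for hop_a, hop_b in zip(path_a, path_b):
        if hop_a != hop_b:
            break
        shared += 1
    return shared


def leaves_early(suspect_path, baseline_paths):
    """True if the suspect path departs from the usual route sooner than
    the baseline (believed unfiltered) paths depart from each other.

    Needs at least two baseline paths to establish what "usual" looks like.
    """
    if len(baseline_paths) < 2:
        raise ValueError("need at least two baseline paths")
    baseline_shared = min(
        common_prefix_len(a, b) for a, b in combinations(baseline_paths, 2)
    )
    suspect_shared = max(
        common_prefix_len(suspect_path, b) for b in baseline_paths
    )
    return suspect_shared < baseline_shared
```

Each path is just the ordered list of hop IPs copied from traceroute’s output. Hops that don’t respond (shown as `*`) would need handling before using this on real data.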
Some ISPs may route the ICMP packets sent by standard traceroute differently from the HTTP traffic we’re concerned about. Therefore it might be desirable to use tcptraceroute with 80 as the destination port for this purpose.
Anyway, I’d be very interested if people want to post their experience with detection of this on different ISPs in the comments.
Edit 18/04/2011: These techniques are actually detecting the presence of a transparent proxy. It is the use of a transparent proxy which causes all these symptoms. Some ISPs use transparent proxies not only for implementing the IWF blacklist, but for other purposes too. One such purpose is caching, since this decreases the ISP’s usage of networks upstream, therefore decreasing their bills. This practice has been prevalent in the past. I was under the impression that it had largely fallen out of favour, but it has been suggested that Virgin Media Broadband use this technique for caching purposes. I have not confirmed this yet.
Edit 18/04/2011: Virgin Media Broadband do not use transparent proxies for caching any more, although they definitely have in the past. Keep in mind that it is still possible that your ISP is using a transparent caching proxy. If they are, then to be honest you really should stop using an ISP stuck in the nineties.
Edit 19/04/2011: It looks like there’s a site called censorleaks.com which may be able to tell you automatically whether a site has been IWF’d. I can’t vouch for the site’s accuracy. I believe the name is a misnomer; they are probably using a technique similar to one of the ones outlined here on an ISP known to implement the IWF blacklist, rather than having direct access to a copy of the IWF blacklist.
People within the IWF must view illegal content to verify whether it should be taken down. Can’t they be arrested?
Apparently, and this is nth hand information where n≥3, there are six people in the UK who have special permission to view this content and the verification is done in a sealed room. So no.
Edit 17/04/2011: As is so often the case with such hearsay, this is incorrect, or at least not the full story. Rather the police have published a Memorandum of Understanding which is more general purpose. Thanks to mkb for pointing this out. The IWF are mentioned in particular in the memo so it’s possible the original statement related to there being six employees of the IWF who do the actual checking.
What can I do about it?
As a workaround, the usual trick of using your own proxy based somewhere outside IWF land works. This includes other countries, but also some ISPs within the UK, since IWF blocking doesn’t necessarily apply to all Internet providers within the UK. The IWF have a list of companies who receive their blocklist. As a rule of thumb, most home ISPs have it, but some commercial and educational ISPs may not. For example, JANET does not subscribe to the IWF blocklist and therefore the situation will vary from university to university.
You are unlikely to have much luck with public proxies since in this case, again, you are sharing an IP with many other people. Still, given there are so many public proxies, you may have some luck if you manage to find an unpopular one.
Longer term, it is important to keep in mind that ISPs subscribe to the IWF voluntarily. The reason they bother at all is that there’s a large lobby, including politicians and tabloids [3], who conflate being able to access certain URLs with the act of abusing a child. The only way to counter a large lobby is to create your own. This is left as an exercise for the reader.
[1] The actual interval is probably larger, but this was the only time I was monitoring. At the time of writing it seems like it may still be in effect for Filesonic.
[2] Well, maybe not, since this might result in overaggressive blocking where there isn’t a one-to-one mapping between IPs and domains/web sites, so DNS manipulation may be preferable. Of course in that case the website may still be accessible by visiting the IP address but manipulating the Host header. Anyway, I digress.
[3] I include The Observer in this, who may very well have helped pave the way for bringing the IWF into existence with their mud-flinging claim that Clive Feather (a director of Demon) “provides paedophiles with access to thousands of photographs of children being sexually abused” by providing unfiltered USENET access.









