Web content labelling

As we all know, the web contains a certain amount of content that some people don’t want to look at, and/or do not wish their children to look at. Removing the material is seldom an option (it may well be entirely lawfully hosted, and indeed many other people may be perfectly happy for it to be there). Since centralised blocking of such material just isn’t going to happen, the best way forward is the installation of blocking software on the end-user’s machine. This software will have blacklists and whitelists provided from a central server, and it will provide some useful reassurance to parents that their youngest children have some protection. Older children can of course just turn the systems off, as has recently been widely reported for the Australian NetAlert system.

A related idea is that websites should rate themselves according to widely agreed criteria, and this would allow visitors to know what to expect on the site. Such ratings would of course be freely available, unlike the blocking software which tends to cost money (to pay for the people making the whitelists and blacklists).

I’ve never been a fan of these self-rating systems whose criteria always seem to be based on a white, middle-class, presbyterian view of wickedness, and — at least initially — were hurriedly patched together from videogame rating schemes. More than a decade ago I lampooned the then widely hyped RSACi system by creating a site that scored “4 4 4 4”, the highest (most unacceptable) score in every category: http://www.happyday.demon.co.uk/awful.htm and just recently, I was reminded of this in the context of an interview for an EU review of self-regulation.

However, leaving aside the value of rating, the key problem with the notion of self-rating websites (and this was quickly obvious even in the first flush of enthusiasm in the mid-1990s) is that it is really very difficult for a webmaster to rate their site in an honest and helpful manner without going to considerable effort. This means that it is just not economic to spend your money so as to assist the small minority of visitors who might be offended by some small aspect of your content.

So The Sun cheerfully put a correct “may contain partial nudity” onto their “page 3” pictures of topless women, but to avoid having to think about such things they scored all their news pages “1 1 1 1”, even on the day when the headline was “Boy 12 rapes Girl 11” and the story gave some of the details…

For webmasters who wanted to make an honest attempt at correctly rating every single one of their pages, and one was Sylvia Spruck Wrigley at Demon Internet (where I worked in the 1990s), the whole process was extremely time consuming. I recall her spending ages trying to work out how to rate the pages within a Guy Fawkes themed section of the website; what was a suitable rating for a page that mentioned 1605 interrogation techniques and the punishment for treason?

The RSACi scheme was eventually swept up into ICRA, and PICS was developed as a meta-system for handling the “algebra” when multiple ratings schemes exist in parallel. However, because it was hard work to do it properly and only a minority of webmasters could be bothered to even produce generic “1 1 1 1” labels for their corporate sites, the idea of self-labelling sites has almost died away (in dot-com speak, it no longer has mindshare). Nevertheless, despite continued evidence of its failure, the approach continued to get support from EU funds (so here’s a relaunch in late 2000), and even in 2007 Microsoft continue to ship a Content Advisor so that parents can set the particular rating levels that their offspring may visit.

An easy category to understand is “bad language”, so the ICRA rating scheme has the categories:

4: Abusive or vulgar terms
3: Profanity or swearing
2: Mild expletives
1: None of the above

and a parent can select what their kids are permitted to view, perhaps with a different value for the 8 year old and the 10 year old (older children are generally considered to be more mature).
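The comparison that a filter of this kind has to make can be sketched in a few lines of Python. This is only an illustration of the idea, not how Content Advisor is actually implemented; the names `LANGUAGE_LEVELS`, `is_allowed` and the per-child limits are hypothetical.

```python
# Levels for the "bad language" category, as listed above.
LANGUAGE_LEVELS = {
    1: "None of the above",
    2: "Mild expletives",
    3: "Profanity or swearing",
    4: "Abusive or vulgar terms",
}


def is_allowed(page_level, max_level, allow_unrated=False):
    """Return True if a page may be shown to a child.

    page_level: the page's self-declared level, or None if it has no label.
    max_level:  the highest level the parent permits for this child.
    """
    if page_level is None:
        # Unlabelled pages are a policy decision, not a rating.
        return allow_unrated
    if page_level not in LANGUAGE_LEVELS:
        raise ValueError(f"unknown language level: {page_level!r}")
    return page_level <= max_level


# Hypothetical per-child settings chosen by a parent.
limits = {"8 year old": 1, "10 year old": 2}

print(is_allowed(2, limits["10 year old"]))  # mild expletives permitted
print(is_allowed(2, limits["8 year old"]))   # but not for the younger child
```

Note that the check trusts whatever level the page declares: a page labelled 1 is shown no matter what its text actually says.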

In passing one might note that modern approaches to labelling such as this W3C effort are looking at much more complex scenarios and subtlety of labelling, going way beyond 1,2,3,4. If they are ever implemented, they’ll pose a considerable challenge to user interface designers!

One of the groups who still think that self-rating is a Good Idea is politicians (it’s an easy out — they don’t need to consider censoring anything; webmasters will self-label their pages and thereby quietly censor themselves from being viewed by those who set Content Advisor to show that they care). Of course politicians get to tell people what to do, so they have told their civil servants to put ICRA ratings onto UK Government websites.

You can see Content Advisor in action at the Department for Culture Media and Sport. They proudly display an ICRA logo on their front page, and they have labelled their gambling pages (they look after gaming, horse racing etc) to show that there is indeed discussion of gambling. Some parents may have thought that they were only blocking poker websites, or discussions of horse-racing odds, but the DCMS webmaster has made sure they can’t find out about what the Gambling Act 2005 does either.

Surprisingly perhaps, there’s a certain amount of “bad language” on Government websites and you might think that this would be easy for the webmasters to get right, if they cared (and had the time to care). However, the evidence is that even getting this right is too much trouble.

For example, looking at the Home Office website, it currently hosts 14 documents that contain the word “fuck” and 4 that contain the word “cunt” (their research studies and reports of prison inspections often report witnesses in their own words). However, they are all in PDF files — so Content Advisor won’t prevent them being viewed 🙁

The Department of Health had five “fucks” when I last looked (though mysteriously Google doesn’t seem to fully index their site at the moment) and these are in HTML documents, and so it is perfectly possible to label them correctly. However, to take one example, Chapter 19 of the Inquiry into Child Abuse in North Wales is viewable here, but when one inspects the ICRA tags for the page:

<meta http-equiv="pics-label" content='(pics-1.1 "http://www.icra.org/ratingsv02.html" l gen true for "http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/Browsable/DH_4927518" r (nz 1 vz 1 lz 1 oz 1 cz 1) "http://www.rsac.org/ratingsv01.html" l gen true for "http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/Browsable/DH_4927518" r (s 0 n 0 v 0 l 0))'/>

or, rather more simply, feeds the page’s URL into the ICRA label checker, one discovers that there is no “potentially offensive language” on the page. However, when one reads it one finds:

B’s only complaint about Ysgol Talfryn is that he was assaulted by a teacher there at about 10.10 am on 22 February 1989. His account of this was that the teacher (Y) asked him, unusually, to read aloud, whereupon B told Y to “fuck off”.

My! How standards have fallen at the Department of Health. This text is apparently not “abusive or vulgar”, “profanity or swearing” or even a “mild expletive”…. why, back in my grandmother’s day, even the now innocuous “bloody” would have frightened the horses!
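For anyone who wants to see what a page claims about itself without using the label checker, the ratings can be pulled out of the meta tag with a short script. The following is a rough Python sketch that handles labels shaped like the one quoted above, not a complete PICS parser; the function name `parse_pics_label` is made up for this example.

```python
import re

# The content attribute of the pics-label meta tag quoted above
# (spaces introduced by line wrapping inside the URLs removed).
PAGE = ("http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/"
        "PublicationsPolicyAndGuidance/Browsable/DH_4927518")
CONTENT = (
    '(pics-1.1 "http://www.icra.org/ratingsv02.html" l gen true '
    f'for "{PAGE}" r (nz 1 vz 1 lz 1 oz 1 cz 1) '
    '"http://www.rsac.org/ratingsv01.html" l gen true '
    f'for "{PAGE}" r (s 0 n 0 v 0 l 0))'
)

# A quoted rating-service URL, the "l" label marker, then the
# parenthesised name/value pairs that follow "r".
_LABEL = re.compile(r'"([^"]+)"\s+l\b.*?\br\s*\(([^)]*)\)', re.DOTALL)


def parse_pics_label(content):
    """Map each rating-service URL to its {category: value} ratings."""
    labels = {}
    for service, body in _LABEL.findall(content):
        tokens = body.split()
        # Tokens alternate between category name and integer value.
        labels[service] = dict(zip(tokens[0::2], map(int, tokens[1::2])))
    return labels


for service, ratings in parse_pics_label(CONTENT).items():
    print(service, ratings)
```

All this recovers is what the webmaster declared; whether the declaration matches the text of the page is exactly the part that nobody checks.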

5 thoughts on “Web content labelling”

Nick Towner says:

2007-09-18 at 09:21 UTC

Before the Internet became so popular, we had similar problems in the telephony world: consumers asked the phone company to block calls to premium rate numbers from their line, only to discover that they could no longer call the out-of-hours GP service either. In the Netherlands, this led to a switch to the 0900 prefix for serious services, 0909 for entertainment and 0906 for services with an erotic character, and disputes over which numbers existing services should get.

Dave Berry says:

2007-09-24 at 20:26 UTC

So, you don’t have any answers, just a general feeling of self-satisfied smugness that you want to share with the world. Congratulations!

Richard Clayton says:

2007-09-24 at 20:46 UTC

I was pointing out that self-rating isn’t an answer and why this is the case. Since the scheme is still being endorsed by politicians a decade after its flaws were apparent to everyone else, this seems a useful thing to do, since it could prevent further wasted effort.

The technical answer to blocking bad things is filtering software run by the consenting on their own (end-user) machines. However, this isn’t cheap — so people cast around for other solutions, in my view pointlessly. So you’re quite right in perceiving that apart from that I have no other answers.

Clive Robinson says:

2007-09-27 at 12:05 UTC

Richard,

One major issue that you indirectly touched on in your last paragraph is that the ratings are effectively fixed at a static point in time; that is, you (or somebody) decide on a given day how a page should be rated against the public moral sensibility currently believed to be prevalent on that day.

Also, it is not just “curse” or “swear” words that change their meaning with time. One extreme example is the expression “bachelor gay”, which once meant somebody who was actively chasing women (i.e. a lothario, rake or blade). Today most people would give it an almost opposite meaning.

There is also the dual use of a word within a culture, which can cause problems, as the Howard League for Penal Reform found out.

Words also have different meanings depending on the culture, such as “nonce” and “decrypt”, which are common in the security vocabulary. But “nonce” is used in Britain as slang for a paedophile, and in many cultures “decrypt” means the same as exhume.

I suspect that any rating system is doomed to fail for these fairly fundamental reasons as well as the more obvious ones.

Other rating systems, such as rating by an official organisation like the “Board of Film Censors”, are obviously not scalable. At the other extreme, a “web of trust” system (similar to that of PGP key rings), where individuals give a page a rating that others might look at, is going to have pitfalls, as has been seen on eBay for rating sellers.

From the technical perspective I suspect you are right when you say “so people cast around for other solutions, in my view pointlessly” in that there are none that are going to work.

However, from the political perspective people do not want to hear this, as they do not want their little “Jimmy” / “Cindy” being subjected to such “morally offensive” materials. So it is easy for politicians to spend other people’s money on tilting at windmills as long as it improves their ratings. From this perspective (and the kickbacks into party funds from grateful technology companies) these systems are a great success, so expect to see many, many more of them…

The real solution (though nobody wants to talk about it) is of course the old, time-tested one of “self responsibility”. That is, as part of their learning process nearly all children and quite a few adults are going to be curious about many things. And sometimes curiosity leads to them being hurt, which is one of the reasons most of us do not put our hands into flames.

Sometimes even quite innocent searches (black holes for instance) will turn up unexpected and possibly quite nasty content; this cannot be avoided in a free / open society. Parents need to accept this as one of the ground rules and either educate their children to expect that there will be things out there that will be upsetting, and how to deal with them, or completely remove access. As history shows, the latter option is always doomed to fail.

Oh, where you say “white, middle-class, presbyterian”, they have a four letter acronym for this in the U.S., which is WASP (White Anglo Saxon Protestant). It has always mildly amused me, due to “waspish behaviour” being an almost perfect description of the way they behave 8)

Philip says:

2007-10-01 at 13:06 UTC

There is an in-built offending-and-or-dangerous-content blocker included with my broadband internet package. It seems to think that Google Groups is offensive and blocks the site. Admittedly, I did search there to find advice on how to stop the router from limiting connections to two computers/MAC addresses at a time.

The problem of putting up barriers in the internet has two sides – on the one hand there are governments and other authorities (e.g. religious or commercial entities) that may want to censor unfavourable reports.
On the other hand, there are individuals who may want to block certain web content themselves – either to ensure their children are completely sandboxed from any harm, or to remove adverts and certain unwanted web sites and increase browsing ‘efficiency’. E.g. one could use content rating to find only technical information when searching for some kind of gadget, but no sales offers (which completely spam any search engine’s results).
Naturally, self-characterisation of web sites is doomed to failure. No advertisers or merchants would label their sites (or banners) such that they can be sorted out easily.

I think it would be worth a discussion whether these kinds of tools would give censorship an easy game, or whether there is a benefit to end users in terms of ‘efficiency’.

Light Blue Touchpaper

Security Research, Computer Laboratory, University of Cambridge
