OK. Short version of the story in InformationWeek: Woman puts up a website. She puts a “webwrap” agreement at the bottom – i.e. basically a contract that says if you use the site then you agree to the contract. Still some question as to whether such a mechanism is binding, but anyway…
So the Internet Archive of course comes along and indexes her site. Which apparently is a violation of the webwrap. So she sues, representing herself, I believe. The court throws out everything on a preliminary motion by IA except for the breach of contract.
InformationWork observes that “Her suit asserts that the Internet Archive’s programmatic visitation of her site constitutes acceptance of her terms, despite the obvious inability of a Web crawler to understand those terms and the absence of a robots.txt file to warn crawlers away.” (my emphasis). They then conclude with this statement:
If a notice such as Shell’s is ultimately construed to represent just such a “meaningful opportunity” to an illiterate computer, the opt-out era on the Net may have to change. Sites that rely on automated content gathering like the Internet Archive, not to mention Google, will have to convince publishers to opt in before indexing or otherwise capturing their content. Either that or they’ll have to teach their Web spiders how to read contracts.
They already have – sort of. It’s called robots.txt – the thing referred to above. For those of you who haven’t heard of this, its a little file that you put on the top level of your site and which is the equivalent of a “no soliciation” sign on your door. Its been around for at least a decade (probably longer) and most (if not all) search engines
From the Internet Archive’s FAQ:
How can I remove my site’s pages from the Wayback Machine?
The Internet Archive is not interested in preserving or offering access to Web sites or other Internet documents of persons who do not want their materials in the collection. By placing a simple robots.txt file on your Web server, you can exclude your site from being crawled as well as exclude any historical pages from the Wayback Machine.
Internet Archive uses the exclusion policy intended for use by both academic and non-academic digital repositories and archivists. See our exclusion policy.
You can find exclusion directions at exclude.php. If you cannot place the robots.txt file, opt not to, or have further questions, email us at info at archive dot org.
standardized methods of communications – privacy policies, etc. – more. Question is, will people be required to use it, or simply disregard and act dumb?