silly lawsuit of the week

OK. Short version of the story in InformationWeek: Woman puts up a website. She puts a “webwrap” agreement at the bottom – i.e. basically a contract that says if you use the site then you agree to the contract. Still some question as to whether such a mechanism is binding, but anyway…

So the Internet Archive of course comes along and indexes her site. Which apparently is a violation of the webwrap. So she sues, representing herself, I believe. The court throws out everything on a preliminary motion by IA except for the breach of contract.

InformationWeek observes that “Her suit asserts that the Internet Archive’s programmatic visitation of her site constitutes acceptance of her terms, despite the obvious inability of a Web crawler to understand those terms and the absence of a robots.txt file to warn crawlers away.” (my emphasis). They then conclude with this statement:

If a notice such as Shell’s is ultimately construed to represent just such a “meaningful opportunity” to an illiterate computer, the opt-out era on the Net may have to change. Sites that rely on automated content gathering like the Internet Archive, not to mention Google, will have to convince publishers to opt in before indexing or otherwise capturing their content. Either that or they’ll have to teach their Web spiders how to read contracts.

(my emphasis).

They already have – sort of. It’s called robots.txt – the thing referred to above. For those of you who haven’t heard of this, it’s a little file that you put on the top level of your site and which is the equivalent of a “no solicitation” sign on your door. It’s been around for at least a decade (probably longer) and most (if not all) search engines respect it.

From the Internet Archive’s FAQ:

How can I remove my site’s pages from the Wayback Machine?

The Internet Archive is not interested in preserving or offering access to Web sites or other Internet documents of persons who do not want their materials in the collection. By placing a simple robots.txt file on your Web server, you can exclude your site from being crawled as well as exclude any historical pages from the Wayback Machine.

Internet Archive uses the exclusion policy intended for use by both academic and non-academic digital repositories and archivists. See our exclusion policy.

You can find exclusion directions at exclude.php. If you cannot place the robots.txt file, opt not to, or have further questions, email us at info at archive dot org.
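For the curious, here is a rough sketch of what “reading” a robots.txt looks like from the crawler’s side, using the parser that ships in Python’s standard library. The file contents, the example.com URL and the `may_fetch` helper are all made up for illustration, and I’m assuming `ia_archiver` as the user-agent name for the archive’s crawler. It is not how the Internet Archive (or anyone else) actually implements this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: turn away one crawler, leave everyone else alone.
# "ia_archiver" is an assumed user-agent name used only for illustration.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some/page.html"  # placeholder URL
    for agent in ("ia_archiver", "SomeOtherBot"):
        print(agent, may_fetch(ROBOTS_TXT, agent, url))
```

With this sample file the check comes back false for `ia_archiver` and true for any other bot, which is about as close to a machine-readable “no solicitation” sign as you can get.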

Standardized methods of communication – privacy policies, etc. – more. Question is, will people be required to use it, or simply disregard and act dumb?

colophon

From time to time, you may notice that this blog isn’t working, or something is messed up, or you see an error message. One of the reasons I decided to set up this blog rather than using something like Blogger, WordPress.com, etc., was to muck about with the bits and pieces from time to time. I find tweaking PHP code, looking at new plugins and editing themes to be a nice break from drafting 50-page master procurement agreements. In fact, I would have liked to do everything from the ground up (i.e. set up the box, Linux, Apache, MySQL, PHP, etc.) but these days hosting service providers make the proposition of setting that up much less attractive. I figured taking care of some (but not all) of the bits and pieces would satisfy my tweaking desires and let me keep somewhat acquainted with such things. Of course, not being an elite hacker inevitably leads to things that break from time to time.

Switched to WP Engine. Their service was too amazing to resist. I realized that I didn’t have time to do everything, after trying out Linode (great service, by the way).

Anyway, the great (and for the most part free) software and other stuff used to create techblawg.ca:

  • WordPress – amazingly great and overall very, very cool blogging software
  • Theme – Responsive by CyberChimps
  • MySQL – the stunning database engine that will one day take over the entire world, but which in the meantime serves as the back-end database storing all the bits and pieces for WordPress
  • PHP – the remarkably versatile scripting language that WordPress uses
  • Plugins – a whole bunch of little individual bits of code that plug in to WordPress to extend functionality in a million different ways. There is a long, long, long list of different plugins used on techblawg, so for the time being I won’t be listing them all out here

Without the work and dedication of all the folks who created the tools listed above and made them freely available, many blogs (indeed, many sites) would simply not be in existence as it would have otherwise not been practical to create them. I guess this is the exact opposite of the tragedy of the commons.

Other things that power this blog are myself, David Ma, and huge quantities of coffee. Hope you enjoy it.