This story appeared on Network World Fusion at
http://www.nwfusion.com/columnists/2005/032805bradner.html
'Net Insider
Refusal, ignorance, arrogance or PR?
By Scott Bradner
Network World, 03/28/05
In mid-March, French news service Agence France Presse sued Google
in a U.S. District Court for copyright violations. The news service demanded
that Google stop including its material on the Google News site and asked for
$17.5 million in compensatory damages. You will pardon me if I express some
doubts about the actual motivation for this lawsuit.
I've written in the past about Google News. I consider it one of
the most useful sites on the Internet. I use it to fill out the news snippets
that I get from most other news sources. That said, I get frustrated at Google
News links to subscription-only sites because I can't access some of the
stories that look interesting. I've always assumed that such sites welcome
Google's pointers because they get free advertising for themselves and thus
might get some additional customers.
In that context, this lawsuit makes me wonder what's up with AFP.
Google News doesn't show full articles, so I find it hard to understand what
damage could mount up to more than $17 million. Maybe AFP has a very high
opinion of its ability to come up with inventive headlines and feels that other
news organizations will rip them off if the headlines, which Google News does
show, are visible. Or maybe the reason that AFP doesn't want Google News to
point to its material is that AFP fears getting more subscribers will mean it
would have to hire more people to deal with them.
Even if I don't understand why a company in the business of
selling its services does not want more people to know about those services, it
doesn't look like it would be all that hard for AFP to ensure that Google skips
over its sites. Google has an easy-to-find Web page that says quite clearly how
to keep a site from being scanned (www.google.com/remove.html). Basically, if
you want Google to skip all or part of your site, you put a file named
"robots.txt" at the root of each Web server you want protected. For example,
your whole site will be skipped if that file contains these two lines:
User-agent: *
Disallow: /
Robots.txt files can get quite fancy.
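For example, here is a sketch of a slightly fancier file that would keep only
Google's crawler out of a single directory while leaving the rest of the site,
and every other crawler, alone. The "/wire-stories/" directory name is just a
placeholder for illustration, not an actual AFP path:
# keep Google's crawler out of one (hypothetical) directory
User-agent: Googlebot
Disallow: /wire-stories/
# everyone else may index everything
User-agent: *
Disallow:
A crawler that honors the convention follows the record matching its own
user-agent name, so this would hide one directory from Google while letting
other crawlers index the whole site.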
I suppose it's possible that the Google News Web crawlers don't
pay attention to the robots.txt files that Google says it respects for its
other Web crawling, but that doesn't seem likely. It is likelier that AFP
somehow didn't know how easy it would be to do two minutes' worth of work
itself, on its own Web site, to ensure that its material would not be included.
That tactic would have taken far less effort than pestering Google to try to
get it to stop scanning, as the news service claims to have done. It also would
have taken far less effort than filing a lawsuit. Well, maybe it's not all that
likely that no one at AFP knew about robots.txt files - maybe there is some
other reason it didn't take the easy path. The two that spring to mind are
arrogance ("stop," said King Canute to the tide, "splash,"
said the tide to King Canute) or a desire for publicity.
Disclaimer: Of course you never see either arrogance or a desire
for publicity in relationship to Harvard, so the above observation is mine
alone.