This story appeared on Network World Fusion at
http://www.nwfusion.com/columnists/2005/032805bradner.html
'Net Insider
Refusal, ignorance, arrogance or PR?
By Scott Bradner
Network World, 03/28/05
In mid-March, French news service Agence France Presse sued Google
in a U.S. District Court for copyright violations. The news service demanded
that Google stop including its material on the Google News site and asked for
$17.5 million in compensatory damages. You will pardon me if I express some
doubts about the actual motivation for this lawsuit.
I've written in the past about Google News. I consider it one of
the most useful sites on the Internet. I use it to fill out the news snippets
that I get from most other news sources. That said, I get frustrated at Google
News links to subscription-only sites because I can't access some of the
stories that look interesting. I've always assumed that such sites welcome
Google's pointers because they get free advertising for themselves and thus
might get some additional customers.
In that context, this lawsuit makes me wonder what's up with AFP.
Google News doesn't show full articles, so I find it hard to understand what
damage could mount up to more than $17 million. Maybe AFP has a very high
opinion of its ability to come up with inventive headlines and feels that other
news organizations will rip them off if the headlines, which Google News does
show, are visible. Or maybe the reason that AFP doesn't want Google News to
point to its material is that AFP fears getting more subscribers will mean it
would have to hire more people to deal with them.
Even if I don't understand why a company in the business of
selling its services does not want more people to know about those services, it
doesn't look like it would be all that hard for AFP to ensure that Google skips
over its sites. Google has an easy-to-find Web page that says quite clearly how
to keep a site from being scanned (www.google.com/remove.html). Basically, if
you want Google to skip all or part of your site, you put a file named
"robots.txt" at the root of each Web server you want protected. For example,
your whole site will be skipped if that file contains these two lines:
User-agent: *
Disallow: /
Robots.txt files can get quite fancy.
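For example, here is a sketch of a slightly fancier file that would keep only
Google's crawler out of a single directory while leaving the rest of the site,
and every other crawler, alone. The "/wire-stories/" directory name is just a
placeholder for illustration, not an actual AFP path:
# keep Google's crawler out of one (hypothetical) directory
User-agent: Googlebot
Disallow: /wire-stories/
# everyone else may index everything
User-agent: *
Disallow:
A crawler that honors the convention follows the record matching its own
user-agent name, so this would hide one directory from Google while letting
other crawlers index the whole site.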
I suppose it's possible that the Google News Web crawlers don't
pay attention to the robots.txt files that Google says it respects for its
other Web crawling, but that doesn't seem likely. It is likelier that AFP
somehow didn't know how easy it would be to do two minutes' worth of work
itself, on its own Web site, to ensure that its material would not be included.
That tactic would have taken far less effort than pestering Google to try to
get it to stop scanning, as the news service claims to have done. It also would
have taken far less effort than filing a lawsuit. Well, maybe it's not all that
likely that no one at AFP knew about robots.txt files - maybe there is some
other reason it didn't take the easy path. The two that spring to mind are
arrogance ("stop," said King Canute to the tide, "splash,"
said the tide to King Canute) or a desire for publicity.
Disclaimer: Of course you never see either arrogance or a desire
for publicity in relationship to Harvard, so the above observation is mine
alone.