The following text is
copyright 2007 by Network World; permission is hereby given for reproduction,
as long as attribution is given and this notice is included.
Google: looking good by doing less evil
By: Scott Bradner
Google made news in mid-March by saying that they were going
to reduce the length of time that they keep personally identifiable
information about their users from infinite to merely
obscene. There are some positives in this announcement, but it mostly emphasizes
how bad things are now, and will continue to be.
Google announced their new plans in a blog entry on March
14th ("Taking steps to further improve our privacy practices,"
http://googleblog.blogspot.com/2007/03/taking-steps-to-further-improve-our.html). Google had been under pressure for a
long time over their assumption that it was fine to keep a lifelong record of
every search query each of their users ever executed, along with the IP address
the query was executed from and a cookie ID that links together queries from your
computer even if the IP address changes.
Google is not alone in this belief; to one degree or another,
all of the search engine companies have said they save the same basic
information, although AOL says it does not keep IP addresses. Google does not
exactly say why they think they need to keep a record of all of your queries;
their log retention FAQ (http://216.239.57.110/blog_resources/google_log_retention_policy_faq.pdf)
says vaguely: "We use this
information to improve the quality of our services and for other business
purposes. For example, we use this information for fraud detection and
prevention purposes, to identify system problems and to combat denial of
service attacks." But it is
reasonable to assume that the main reason they keep the logs is that they are
trying to get into our heads to see how we think so they can feed us ads that we
will respond to. Google has done
quite well in convincing advertisers that they know how to do this, and the logs
are the way they do it.
But even given this actual reason for the logs, it's hard to
see why they need years' worth of logs in which individual searchers can be
easily identified. Under their new policy they will keep the logs forever but
will make some simple tweaks to the data after 18-24 months to make it a little
harder to identify the individual searcher. These tweaks are not likely to be all that effective in
actually hiding people's identities, as AOL found out when it released a pile
of similar data. (See "Thanks for
nothing AOL," http://www.networkworld.com/columnists/2006/082806bradner.html.) I would think that the most reliable
information Google needs about me in order to target ads comes from the
last few months; it's not all that often that I'll still be interested in a
topic I was looking at four years ago.
Google says that the 18-24 month period was chosen to be
compatible with possible future data retention laws in various parts of the
world. But the FAQ admits that such
laws generally do not exist yet, and when they do, the required retention period could be as
short as 6 months. Why not base
the Google retention period on the laws where the hardware is
located?
Maybe Google just wants an excuse for long retention because
it is afraid that it has not yet thought of all the ways it can exploit the
information it has about us.
Google has come very late to the realization that some
people are worried about the information Google stores about them. This is a good first step, but it would
be far better for Google to actually anonymize their information in a few days
or weeks rather than years.
disclaimer: Harvard does not forget easily, at least not its
former students, since they are a revenue source, but it has not expressed an
opinion on others remembering activities, so the above is my own opinion.