The following text is copyright 2006 by Network World.  Permission is hereby given for reproduction, as long as attribution is given and this notice is included.

 

Thanks AOL, you have endangered all of us

 

By Scott Bradner

 

As you all know by now, in early August someone over at AOL had a brain fart and posted a few months of raw search queries by 650,000 AOL customers.  The queries were semi-anonymized by using ID numbers rather than IP addresses for the query source.  But as a number of people quickly found out, lots of individuals could be easily identified by looking at what they searched for.  The New York Times even published a front-page story about one of the searchers they were able to identify.  What AOL did was stupid and, to their credit, AOL said so, but we are all in greater danger because of what AOL did, even if we are not all AOL users.

 

Even though AOL removed their posting quite quickly, the data has been mirrored in a number of places around the net (see http://www.gregsadetsky.com/aol-data/).  AOL's data is quite an interesting and sometimes deeply troubling snapshot of the psychology of the US in the Internet age.  A number of web sites that provide analysis of or access to the data have sprung up in the last week or so (e.g., http://www.dontdelete.com/).  In addition to the Times, a number of other newspapers have done their own analysis of the data or have asked outside researchers to do so.  For example, the Wall Street Journal reported that over 3 million of the 26 million queries were "some form of explicit sexual search" vs. a little over 3,000 for "god".

 

I grabbed a copy of the data and poked around a bit.  There are a lot of people looking for not-so-nice things: over 800 queries included "child porn," 38 queries included both the words "build" and "bomb," over 400 queries included "bestiality," over 131,000 queries included "nude," almost 39,000 queries included the word "kill" (of which 900 were queries for "To Kill a Mockingbird"), over 20,000 queries included "HIV" and over 11,000 included "AIDS," and 284 queries included the word "toothpick."

 

Somewhat worrisome for AOL (and Google) is that over half a million of the searches on the AOL search engine were in fact requests to use Google.  Google may not like it, but "to google" is becoming the generic term for performing an Internet search.

 

The biggest problem with AOL releasing this data is not the potential privacy invasion of its own users, although that can be significant.  I expect a number of police agencies have already served, or will soon serve, AOL with subpoenas demanding identification information for specific people who searched for suspicious content, such as building bombs or child porn.

 

The biggest problem with what AOL did is that it destroyed the security-through-obscurity cover over all the data that the search engine companies have been collecting on their users.  Sure, a lot of people have been worried about this data, particularly after the US government, in some type of wild goose chase, subpoenaed huge amounts of raw search info from these companies (see "King George I on privacy" - http://www.networkworld.com/columnists/2006/013006bradner.html).  But now every divorce lawyer, private eye or local cop will first think of this easy way to get dirt on someone.  Have fun trying to explain some of the searches you have done in the past to your soon-to-be ex's lawyer.

 

There are ways to search without leaving a trail, for example by using anonymizing networks such as Tor (see http://tor.eff.org/).  See this story on how to set it up to anonymize your Google searches: http://www.infoshop.org/inews/article.php?story=20060813064045562

 

It would be great if Congress would just put a halt to these massive databases, but your privacy is apparently not a Congressional concern (see "Congress fails to grasp security risk" - http://www.networkworld.com/columnists/2006/081406bradner.html).  So have your browser remove Google's cookies (see "Are Microsoft's cookies super?" - http://www.networkworld.com/columnists/2006/051506bradner.html) and use an anonymizing net or public Internet kiosk if you want to do a search that might be hard to explain five years from now.

 

disclaimer:  Harvard is infrequently anonymous and even less frequently does web searches, so the above is my own ramble and warning.