The following text is
copyright 2006 by Network World; permission is hereby given for reproduction,
as long as attribution is given and this notice is included.
A few petabytes,
congresscritters and lawyers to go.
By Scott Bradner
I started playing with digitized
literature almost 25 years ago. A
lot has changed in the digital books biz since then. Some of the history, current status, future possibilities
and clashing business models in this area were recently explored in a cover
"manifesto" in the New York Times Magazine by Wired writer Kevin
Kelly. Spoiler: it will all come
out fine in the end, but the length of time you will have to wait depends on
when Congress stops moving the copyright goal posts.
In the summer of 1982 a Classics
graduate student working in the computer lab I ran in the Harvard Psychology
Department got a copy of the Thesaurus Linguae Graecae, a large batch of
classical Greek literature that had been typed into computers someplace outside
of the US with David Packard paying the bill. I, along with people in the Harvard Classics and English
departments, convinced the university administration to pay for a huge, for the
time, 300 MB disk drive to store this text as well as a collection of Middle
English literature. Over the next
few years the graduate student, Greg Crane,
(http://www.perseus.tufts.edu/About/grc.html) now a professor at Tufts
University, put together the first version of what became the Perseus Project
(http://www.perseus.tufts.edu/PerseusInfo.html), a web-like mixture of text and
clickable links to other material (but done many years before the web and
search engines showed up).
This very well indexed, on-line text changed what sorts of
things would be reasonable PhD thesis topics. Before Greg's work a student could get a thesis based on
years of index-card-based investigation of how specific words were used in
classical Greek; after Greg that became a weekend task.
Kelly's Times Magazine article (http://www.nytimes.com/2006/05/14/magazine/14publishing.html)
explores what happens in a future where you might have petabytes of
digital material being attacked by cutting edge search engines. Kelly estimates that a 50 petabyte disk
farm could hold all the 32 million books, 750 million articles and essays, 25
million songs, 500 million images, 500K movies, TV shows and short films and
100 billion public web pages.
Quite a bit of the material is already digitized: new books, DVD movies
and CD music, for example. The
article describes multiple projects underway to try to catch up with digitizing
older books and discusses the legal and access issues that Congress's ever-extending
copyright period is causing.
A few years ago in this column I quoted a student who told me
"if it is not on the web then it does not exist." ("How big is
the world?" http://www.networkworld.com/archive/1999b/1018bradner.html)
The same point was reinforced last week when I suggested that a graduate
student see if he could find some information on a particular topic in the
library one floor down from my office: he admitted he had been in the
library only once or twice, and never to look anything up.
Kelly paints a picture where
physical libraries might not be needed, other than for books published by
companies whose lawyers are not ready to embrace a searchable digital
world. In Kelly's future world
books are no longer individual items but instead are parts of a vast relational
database on steroids, where your biggest problems will be figuring out how to ask
the question you want answered and figuring out what is left that could
make a good thesis topic. All in
all, a very good read.
disclaimer: If physical libraries fade away, Harvard
will wind up with a lot of prime real estate that will be bitterly
fought over. I did not ask the university library folk for their views on the
NYT article, so the above review is my own.