How big is the world?
By Scott Bradner
Network World, 10/18/99
An undergraduate student told me last year that "if it was
not on the Web then it did not exist." The "it"
she was talking about was research material.
She had a very important point (not to mention the implications a
statement like that has at a research university such as Harvard,
which has a breathtaking variety of available resources in its
libraries and museums). Most people - and most students are
people - are beginning to act as if the Web is the world's only
real data source. This is more than a bit troubling on many
fronts.
The Web is now big enough to pass at first glance for a world
surrogate. The Online Computer Library Center (OCLC) recently
published its annual research results at
www.oclc.org/oclc/research/projects/webstats/. OCLC projects that
there are some 3.6 million Web sites (+/- a 3% fudge factor) with
288 million Web pages (+/- 35%). The center only classifies
42,000 of those sites (+/- 30%) as adult sites, though those
sites sure do raise a political ruckus far in excess of their
numbers.
OCLC has quite a good methodology, which is well explained in a
document reachable from the center's site. So you should feel
comfortable trusting the center's numbers as a first
approximation.
There is clearly a lot of stuff out there. But what are the
characteristics of what is there and what is not there?
One of the biggest problems with the 'Net is knowing the
qualifications of those people creating and posting information.
A particular document could have come from a future Nobel Prize
winner writing in his field or it could have originated from a
demented teenager spewing out fantasies. Unquestioning reliance
on what you read on the 'Net is as productive as unquestioning
reliance on what you read in a supermarket checkout line.
Another significant problem with using the 'Net as a primary or
sole source of information is that the 'Net is woefully
incomplete. Very little current information is actually online.
Some areas are far better represented than others, with the
national newspapers and some areas of scientific research leading
the way.
But there is a real dearth of material from most areas. This is
largely a result of the fact that most people like to get paid
for their labors. The Web currently offers mostly no-cost access to
information. People with valuable content, such as most printed
books, tend to avoid putting it up lest they reduce sales of
their content. Out-of-print books might seem a good target
for 'Net-based access, but copyright laws get in the way.
You're missing a lot if your world is just the Web.
Disclaimer: With 200K or so alumni, Harvard's world is the world.
But the above warning is my own.