How big is the world?

By Scott Bradner
Network World, 10/18/99

An undergraduate student told me last year that "if it was not on the Web then it did not exist." The "it" she was talking about was research material.

She had a very important point (not to mention the implications a statement like that has at a research university such as Harvard, which has a breathtaking variety of available resources in its libraries and museums). Most people - and most students are people - are beginning to act as if the Web is the world's only real data source. This is more than a bit troubling on many fronts.

The Web is now big enough to pass at first glance for a world surrogate. The Online Computer Library Center (OCLC) recently published its annual research results at www.oclc.org/oclc/research/projects/webstats/. OCLC projects that there are some 3.6 million Web sites (+/- a 3% fudge factor) with 288 million Web pages (+/- 35%). The center classifies only 42,000 of those sites (+/- 30%) as adult sites, though those sites sure do raise a political ruckus far in excess of their numbers.

OCLC has quite a good methodology, which is well explained in a document reachable from the center's site. So you should feel comfortable trusting the center's numbers as a first approximation.

There is clearly a lot of stuff out there. But what are the characteristics of what is there and what is not there?

One of the biggest problems with the 'Net is knowing the qualifications of those people creating and posting information. A particular document could have come from a future Nobel Prize winner writing in his field or it could have originated from a demented teenager spewing out fantasies. Unquestioning reliance on what you read on the 'Net is as productive as unquestioning reliance on what you read in a supermarket checkout line.

Another significant problem with using the 'Net as a primary or sole source of information is that the 'Net is woefully incomplete. Very little current information is actually online. Some areas are far better represented than others, with the national newspapers and some areas of scientific research leading the way.

But there is a real dearth of material from most areas. This is largely because most people like to get paid for their labors. The Web currently offers mostly no-cost access to information. People with valuable content, such as most printed books, tend to avoid putting it up lest they reduce sales of their content. Out-of-print books might seem a good target for 'Net-based access, but copyright laws get in the way.

You're missing a lot if your world is just the Web.

Disclaimer: With 200K or so alumni, Harvard's world is the world. But the above warning is my own.