We recently discussed search engines ( http://www.langa.com/newsletters/2001/2001-01-04.htm#2 , http://www.langa.com/newsletters/2001/2001-01-08.htm#1 , http://www.winmag.com/columns/explorer/2001/01.htm ), and that prompted reader Rod Padrick to write about an amazing site he found:
One of the pages, "Deep Web Sites," indicates that the 60 largest known deep Web sites contain about 750 terabytes of data (on an HTML-included basis), or roughly 40 times the size of the known surface Web. These sites appear in a broad array of domains, from science to law to images and commerce. The total number of records or documents within this group is about 85 billion.
Basically, the folks at BrightPlanet found that "Deep Web sources store their content in searchable databases that only produce results dynamically in response to a direct request." Ordinary "spider" indexing of "surface" web sites misses this content, which BrightPlanet says is truly vast:
- Public information on the deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web
- The deep Web contains 7,500 terabytes of information, compared to 19 terabytes of information in the surface Web
- The deep Web contains nearly 550 billion individual documents compared to the 1 billion of the surface Web
- More than an estimated 100,000 deep Web sites presently exist
- 60 of the largest deep Web sites collectively contain about 750 terabytes of information – sufficient by themselves to exceed the size of the surface Web by 40 times
- On average, deep Web sites receive about 50% greater monthly traffic than surface sites and are more highly linked to than surface sites; however, the typical (median) deep Web site is not well known to the Internet search public
- The deep Web is the largest growing category of new information on the Internet
- Deep Web sites tend to be narrower with deeper content than conventional surface sites
- Total quality content of the deep Web is at least 1,000 to 2,000 times greater than that of the surface Web
- Deep Web content is highly relevant to every information need, market and domain
- More than half of the deep Web content resides in topic specific databases
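If you'd like to sanity-check those multiples yourself, here is a minimal Python sketch that recomputes the ratios from the figures quoted in the list above. The variable names are my own, and the numbers are simply BrightPlanet's estimates as quoted, not independent measurements.

```python
# Figures as quoted in the list above (BrightPlanet's estimates).
# All names here are illustrative, not from the report.
DEEP_WEB_TB = 7_500               # terabytes in the deep Web
SURFACE_WEB_TB = 19               # terabytes in the surface Web
TOP_60_SITES_TB = 750             # terabytes in the 60 largest deep Web sites
DEEP_WEB_DOCS = 550_000_000_000   # "nearly 550 billion" documents
SURFACE_WEB_DOCS = 1_000_000_000  # "1 billion" documents

# How many times larger is one quantity than another?
size_ratio = DEEP_WEB_TB / SURFACE_WEB_TB
top_60_ratio = TOP_60_SITES_TB / SURFACE_WEB_TB
doc_ratio = DEEP_WEB_DOCS / SURFACE_WEB_DOCS

print(f"Deep Web vs. surface Web, by size:      {size_ratio:.0f}x")
print(f"60 largest deep sites vs. surface Web:  {top_60_ratio:.0f}x")
print(f"Deep Web vs. surface Web, by documents: {doc_ratio:.0f}x")
```

By size, the quoted terabyte figures work out to roughly 395 times, which sits at the low end of the "400 to 550 times" claim; the document counts give 550 times, which seems to account for the high end. The 60 largest sites come out at a little over 39 times the surface Web, in line with the "40 times" figure.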
It’s amazing reading, and you’ll find the full report at http://www.completeplanet.com/tutorials/deepweb/index.asp .