Data from search engine analysis run on Feb. 21, 2000.
For 25 specific, single-word queries, Fast (at either AlltheWeb.com or the Lycos advanced search) found the most hits, followed by Northern Light and AltaVista. Excite and Google also show gains. On individual searches, Fast ranked first on 18 of the 25 searches, yet on some searches other search engines found more (with one tie for first):
This chart compares the size of the databases of the Web search engines. For this comparison, I used 25 single-keyword searches that are processed almost identically by each search engine.
This comparison is based on the reported number of hits
from each database, verified by visiting the last page of results when
possible. This is not a measure based on precision, recall, or relevance
but only on the raw database size. As such, it provides an important
measure of database coverage. For earlier comparisons see below:
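The tallying behind a comparison like this can be sketched in a few lines of Python. The engine names and hit counts below are invented placeholders, not figures from this study, and the function names `total_hits` and `first_place_counts` are hypothetical. The sketch only shows how verified per-query counts might roll up into a total per engine and a count of first-place finishes, with a tie credited to every tied engine.

```python
from collections import Counter

# Hypothetical verified hit counts: query -> {engine: hits}.
# These numbers are invented for illustration only.
SAMPLE_COUNTS = {
    "query_a": {"EngineX": 120, "EngineY": 95, "EngineZ": 40},
    "query_b": {"EngineX": 15, "EngineY": 15, "EngineZ": 9},
    "query_c": {"EngineX": 60, "EngineY": 88, "EngineZ": 71},
}


def total_hits(counts):
    """Sum each engine's hits across all queries."""
    totals = Counter()
    for per_engine in counts.values():
        totals.update(per_engine)
    return dict(totals)


def first_place_counts(counts):
    """Count how often each engine had the most hits; ties credit all leaders."""
    firsts = Counter()
    for per_engine in counts.values():
        best = max(per_engine.values())
        for engine, hits in per_engine.items():
            if hits == best:
                firsts[engine] += 1
    return dict(firsts)


if __name__ == "__main__":
    totals = total_hits(SAMPLE_COUNTS)
    # Print engines from largest to smallest total.
    for engine, hits in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(engine, hits)
    print(first_place_counts(SAMPLE_COUNTS))
```

Keeping totals and first-place counts separate reflects the point made above: an engine can lead overall while still being beaten on individual searches.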
Specific Database Notes
AltaVista clusters results, but this analysis used the Advanced Search, which does not cluster by site. AltaVista is notorious for inconsistencies in reporting the number of hits it finds. Each search result set was checked, and only the number of hits available for display was counted. Since the advanced search can display only the first 1,000 results, none of the search terms used here returned more than that number. Because AltaVista can time out on a search and not give a full results set, its total database size may be under-represented here. However, the figure does reflect what searchers can find when using AltaVista.
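Counting only the hits available for display amounts to paging through a result set until it runs out. The following is a minimal sketch of that idea, not the procedure actually used for this study: `fetch_page` is a hypothetical callable standing in for whatever returns one page of results, and the 1,000 default mirrors the display limit described above.

```python
def count_displayable_hits(fetch_page, max_results=1000):
    """Count hits actually shown by paging until an empty page or the display cap.

    fetch_page is a hypothetical callable: page number -> list of results.
    """
    shown = 0
    page = 0
    while shown < max_results:
        results = fetch_page(page)
        if not results:
            break  # ran past the last page of results
        shown += len(results)
        page += 1
    return min(shown, max_results)


if __name__ == "__main__":
    # Fake result set: three full pages of 10 and a final page of 7.
    pages = [list(range(10))] * 3 + [list(range(7))]
    fake_fetch = lambda i: pages[i] if i < len(pages) else []
    print(count_displayable_hits(fake_fetch))
```

A count obtained this way can be lower than the number the engine reports, which is the discrepancy this note is concerned with.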
Google includes some results (URLs) that it has not actually indexed. These can be readily identified by the lack of an extract or a "cached" copy. They are URLs that are linked from other pages but have not necessarily been verified yet by the Google spider. For this reason, the Google size shown here is larger than its database of fully indexed pages.
HotBot clusters results by site, and there is no way to uncluster them. This makes it difficult to accurately measure the size of its database. For this comparison, the advanced search was used. All the top-level domains in the results were noted, and then the search was re-run using the domain limitation with all of the top-level domains found ORed together. Though tedious, this effectively turned off the site clustering to find HotBot's total number of hits.
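The bookkeeping for the HotBot workaround can be sketched as follows. This is only an illustration: the function names are hypothetical, and the `.com OR .org` output format is an assumption, not HotBot's actual domain-limit syntax. The sketch gathers the top-level domains seen in a set of result URLs and joins them with OR for re-running the search.

```python
from urllib.parse import urlparse


def top_level_domains(urls):
    """Return the sorted set of top-level domains seen in a list of result URLs."""
    tlds = set()
    for url in urls:
        host = urlparse(url).hostname
        # Skip URLs with no host or a bare host name; numeric IPs are not handled.
        if host and "." in host:
            tlds.add(host.rsplit(".", 1)[1].lower())
    return sorted(tlds)


def domain_filter(tlds):
    """Join domains with OR; the syntax a real engine expects may differ."""
    return " OR ".join("." + tld for tld in tlds)


if __name__ == "__main__":
    # Placeholder URLs, not results from the study.
    urls = [
        "http://www.example.com/a",
        "http://example.org/",
        "http://foo.example.com/b",
        "http://example.de/x",
    ]
    print(domain_filter(top_level_domains(urls)))
```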
Northern Light automatically recognizes and searches English word variants and plurals. For that reason, only nonplural terms were used. Only the Web portion of Northern Light was searched, not its Special Collection. Northern Light also clusters hits by site, with no ability to disable the site clustering. The number of reported hits was used, rather than trying to verify the number under each site. This could cause a misrepresentation of its size.
Excite provides no capability for searching all languages simultaneously (it defaults to English only). Therefore, each search was run separately in every language, and the resulting numbers were combined to produce the Excite total.
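Combining the per-language results is a simple sum. A minimal sketch follows, with invented language labels and counts and a hypothetical function name. It assumes that no page is counted under more than one language, which this page does not confirm.

```python
def combined_total(hits_by_language):
    """Add up the hit counts reported for each language-restricted search.

    Assumes the language-restricted result sets do not overlap.
    """
    return sum(hits_by_language.values())


if __name__ == "__main__":
    # Placeholder counts for one query, not data from the study.
    sample = {"English": 42, "German": 7, "French": 3}
    print(combined_total(sample))
```

If the same page could appear under two language settings, the sum would overstate the engine's coverage.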
MSN Search will only display up to 200 hits, so their reported numbers above that amount could not be verified.
AOL Search includes the Open Directory, an AOL database, and an Inktomi database. As with the other search engines that use an Inktomi database, only the Inktomi results were counted.
Lycos provides access to the Fast database in its advanced search. That version is represented by Fast. On this chart, the column labeled Lycos is the regular Lycos search engine.
More details on the study's methodology provide an example of the comparison process used here.
While decisions about which Web search engine to use should not be based on size alone, this information is especially important when looking for very specific keywords, phrases, and areas of specialized interest. See also the following statistical analyses:
A Notess.com Web Site
©1999-2007 by Greg R. Notess, all rights reserved
Search Engine Showdown