Search Engine Showdown
Search Engine Statistics: Database Total Size Estimates
by Greg R. Notess

Data from search engine analysis run on Jan. 5, 1999

Northern Light: 115,455,526
MSN Web Search: 43,957,342

This table shows the estimated total size of each search engine's available database on the date the comparison was run. Both Northern Light and AltaVista have 'tricks' that can be used to get an up-to-the-moment count. On Northern Light, use the Power Search limited to the Web and search for count or not count. On the AltaVista advanced search, enter only an asterisk in the Boolean expression box. Note that the results differ depending on whether the Count documents option is checked. AltaVista's 'trick' reported about 130 million on Jan. 5th, while Northern Light's gave exactly the same figure as the company provided.

Since only the Northern Light figure appears accurate, I took the number reported by Northern Light and estimated the size of each of the other databases by multiplying that number by the engine's percentage of total hits from the 15 searches I used for reporting relative size. Since all the other numbers are derived from Northern Light's number, this method is somewhat suspect. However, it does demonstrate some interesting inconsistencies between the reported numbers and the number of records available when actually searching the databases.
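The arithmetic behind these estimates can be sketched roughly as follows. This is only one reading of the method: it assumes each engine's estimate is the Northern Light figure scaled by that engine's total hits relative to Northern Light's total hits across the same searches. The function name `estimate_sizes`, the engine names "Engine B" and "Engine C", and all hit totals are hypothetical placeholders, not data from my 15 searches.

```python
def estimate_sizes(reference_engine, reference_size, hits_by_engine):
    """Scale a trusted reference size by each engine's hits relative to the reference.

    Assumption: estimate = reference_size * engine_hits / reference_hits.
    """
    reference_hits = hits_by_engine[reference_engine]
    if reference_hits <= 0:
        raise ValueError("reference engine must have a positive hit total")
    return {
        engine: round(reference_size * hits / reference_hits)
        for engine, hits in hits_by_engine.items()
    }


# Hypothetical hit totals summed over the same set of searches (illustrative only).
hits = {"Northern Light": 2000, "Engine B": 1000, "Engine C": 400}

# The reference size is the Northern Light count reported on Jan. 5, 1999.
estimates = estimate_sizes("Northern Light", 115_455_526, hits)

for engine, size in estimates.items():
    print(f"{engine}: {size:,}")
```

Because every estimate is a multiple of the reference figure, any error in the Northern Light count, or in the small sample of searches, carries through to all the other engines in proportion.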

So why these discrepancies? Beyond the limitation of basing the estimates on a small number of searches and on Northern Light's reported number, there are several factors that may explain these results.

The Inktomi-based search engines (HotBot, Snap, and MSN Web Search) are run on clusters of computers. According to Inktomi, at any point in time, some of the computers may be down for backup or other maintenance. Consequently, their entire database may not be searched. My estimates thus reflect what was available to be searched at the time I ran the searches.

AltaVista will time out on some searches and deliver only partial results. Since my numbers are based on the actual number of hits found, that may cause AltaVista's size to be under-represented. On the other hand, if Inktomi and AltaVista do not have their full databases available to searchers, what is the use of that extra portion of the database if it is inaccessible?

Lycos does not index every word on Web pages, so it will come out lower in my estimates, which are based on the number of results. However, since the full page is not indexed in Lycos, my measure may be a better guide for users trying to find unique words or phrases on the Web.

The other notes on the relative size page apply to these estimates as well.

While decisions about which Web search engine to use should not be based on size alone, this information should be part of the decision. See also statistics on the lack of overlap between the search engines, unique hits, dead links, and the change over time.