Search Engine Showdown
 
 

Review of Google

Last updated Oct. 10, 2006.
by Greg R. Notess

Google has become for many the pre-eminent Web search engine. In Feb. 1999 it moved from Alpha test version to Beta and officially launched Sept. 21, 1999.

Since that time it has made its mark with its relevance ranking based on link analysis, cached pages, and aggressive growth. Since its beta release, it has had phrase searching and the - for NOT, but it did not add an OR operation until Oct. 2000. In Dec. 2000, it added title searching. In June 2000 it announced a database of over 560 million pages, which grew to over 600 million by the end of 2000 and then 1.5 billion in Dec. 2001. The 2+ billion reported on their home page as of April 2002 includes indexed pages, unindexed URLs, and other file formats. By Nov. 2002, they moved their claim up to 3 billion, and in Feb. 2004 it went to 4 billion. While no official claim is given, 20+ billion is one current estimate.

Databases:

Google also has many other databases and services. Above their regular Web results, hits may be displayed from these other databases in what Google calls OneBox results. In addition it offers several specialized subsets: a government database of the .gov and .mil sites; University searches; a Linux search; an Apple/Macintosh search; and a Microsoft search.

The Google database is used by AOL and several other sites. Yahoo! dropped Google in Feb. 2004 in favor of its own database, after it had switched from Inktomi to Google in July 2000 and then reaffirmed and more closely integrated Google results in Oct. 2002.

Strengths:
  * Size and scope: It is one of the largest, and includes PDF, DOC, PS, and many other file types
  * Relevance based on sites' linkages and authority
  * Cached archive of Web pages as they looked when they were indexed
  * Additional databases: Google Groups, News, Directory, Books, Scholar, etc. (see above).

Weaknesses: See also the Google Inconsistencies Page
  * Limited search features: no nesting, no truncation, does not support full Boolean
  * Link searches must be exact and are incomplete
  * Only indexes first 101 KB of a Web page and about 120 KB of PDFs
  * May search for plural/singular, synonyms, and grammatical variants without telling you
  * Not as comprehensive as legend has it

Default Operation:
Multiple search terms are processed as an AND operation by default. Phrase matches are ranked higher.

Boolean Searching:
Google uses an automatic Boolean AND between terms and has slowly been moving towards more Boolean support; however, it does not yet support the AND operator, NOT operator, or full Boolean searching with the ability to nest operators. In Feb. 1999, Google added the - symbol to perform a NOT function. In Oct. 2000, they added the ability to use an OR (which must be in upper case) to do some Boolean OR operations. See the Boolean Searching on Google page for more details on how to get Google to do certain kinds of Boolean searches.
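The operators described above can be sketched as a small query builder. This is only an illustration of the syntax (implicit AND, - for NOT, upper-case OR); the function name build_query and its parameters are hypothetical, and the code assembles a query string without contacting Google.

```python
def build_query(required, excluded=(), any_of=()):
    """Assemble a Google-style query string (hypothetical helper).

    required: terms separated by spaces, which Google treats as AND
    excluded: terms prefixed with - to perform the NOT function
    any_of:   alternatives joined by OR, which must be in upper case
    """
    parts = list(required)
    if any_of:
        parts.append(" OR ".join(any_of))
    parts.extend("-" + term for term in excluded)
    return " ".join(parts)


print(build_query(["jaguar"], excluded=["car"], any_of=["cat", "feline"]))
# jaguar cat OR feline -car
```

Because nesting is not supported, the sketch deliberately emits no parentheses and allows only one OR group.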

The + could once be used to require a term, but since the default operation was AND, the + was never really needed, and for a while it caused the following message to appear:

Google always searches for pages containing all the words in your query, so you do not need to use + in front of words.

However, the + can be used for forcing a search on stop words and for requiring Google to search for only that exact term without any possible plural/singular, synonyms, and grammatical variants.

Proximity Searching:
In Feb. 1999, Google added phrase searching designated in the usual manner by enclosing the phrase in "double quotes." Google also detects phrase matches even when the quotes are not used and usually ranks phrase matches higher. No other proximity searching is directly available. However, using the wildcard word within a phrase trick described below, the unofficial Google API Proximity Search (GAPS) tool can reproduce proximity searching up to a distance of 3 words. Unfortunately, GAPS stopped working in the Spring of 2006.

Truncation:
No truncation is available. Some automatic plural searching and word stemming occurs for English words and can be turned off by using the plus sign in front of each term that should not be stemmed. However, within phrases, there is a trick which can be used for a wildcard word. Use an asterisk * within a phrase search to match any word in that position. So, for example, to find "a little neglect may breed mischief" when you are not sure of the second to last word, search "a little neglect may * mischief". Multiple asterisks can be used as in "a little * * * mischief".  However, Google changed the processing in Aug. 2005 so that a single asterisk will sometimes represent more than just a single word. This is the only way Google supports a wildcard symbol.
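As a rough sketch of the wildcard-word trick, the helper below builds a quoted phrase and places an asterisk wherever a word is unknown. The name wildcard_phrase and the use of None to mark an unknown word are assumptions made for this example.

```python
def wildcard_phrase(words):
    """Build a quoted phrase query, using * for each unknown word.

    Unknown words are passed as None (a convention chosen for this sketch).
    """
    return '"' + " ".join("*" if word is None else word for word in words) + '"'


print(wildcard_phrase(["a", "little", "neglect", "may", None, "mischief"]))
# "a little neglect may * mischief"
print(wildcard_phrase(["a", "little", None, None, None, "mischief"]))
# "a little * * * mischief"
```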

While not exactly truncation, the synonym operator, a tilde ~ placed before a search term with no space, tells Google to look for synonyms. So a search on yosemite ~trails will find pages that have terms like 'hiking,' 'rides,' and 'maps.' This synonym finder will sometimes include plural, singular, or other grammatical variants as well. So the earlier search also found matches with 'trail' and 'trailer.' So the ~ can be used to get something a bit closer to truncation, but not very close. Bear in mind that the ~ only works in Google's Web database and only for English language terms.

Case Sensitivity:
Google has no case sensitive searching. Using either lower or upper case results in the same hits.

Field Searching:
Google offers several field searches connected with entering URLs. In the December 2000 revision of its advanced search form, it added several title and URL field searches.

Note that some field searching cannot be combined with other query words. In other words, a search such as uniqueword link:name.com will be processed as if only the field search were present, as in link:name.com. The uniqueword is ignored. The intitle: and inurl: fields can be combined with other search terms but allintitle: and allinurl: cannot.

| Field | Explanation |
| --- | --- |
| intitle: | Finds pages that have the term(s) in the HTML title element. Can be combined with other search terms. Example: intitle:search engines. This should find 'search' in the title and 'engines' anywhere in the page. |
| inurl: | Finds pages that have the term(s) somewhere in the URL (host name, path, or filename). Can be combined with other search terms. Example: inurl:searchenginewatch. |
| allintitle: | Finds pages that have the term(s) in the HTML title element. Example: allintitle:search engines. |
| link: | Finds pages which contain hypertext links to the exact specified URL. Example: link:searchengineshowdown.com/search finds pages with links to this site. Unfortunately, on Google, link: searches are incomplete and do not retrieve all pages that Google has indexed that link to the specified URL. |
| allinurl: | Finds pages that have the term(s) somewhere in the URL (host name, path, or filename). Example: allinurl:searchenginewatch. |
| site: | Finds pages from the designated Web site. Path and file names can be included. Example: site:notess.com/write |
| allinanchor: | Finds pages that have the term(s) somewhere in the links to the page. |
| related: | Invokes GoogleScout to find other pages similar in linkage patterns to the given URL and at a similar hierarchical level. The URL must be exact. In other words, related:notess.com and related:www.notess.com find different results. |
| numrange: | Finds a range of numbers. Either 5..11 or numrange:5-11 works. See number searching section below. |
| pricerange: | Finds a range of numbers prefixed by the $ sign. Either $5..11 or pricerange:5-11 works. See number searching section below. |
| flink: | Used to find pages linked from the given URL. No longer working as of Oct. 30, 1999. Example: flink:notess.com |
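The combination rules noted in this section can be captured in a small helper. This is a minimal sketch: field_query and STANDALONE_ONLY are hypothetical names, only the prefixes this review explicitly describes as non-combinable are checked, and every other prefix is passed through unchecked. Google itself silently ignores the extra words in a link: search; the sketch raises an error instead so that the problem is visible.

```python
# Prefixes this review says cannot be combined with other query words.
STANDALONE_ONLY = {"allintitle:", "allinurl:", "link:"}


def field_query(prefix, value, extra_terms=()):
    """Return a query string such as 'intitle:search engines'.

    Hypothetical helper for illustration; it only builds a string.
    """
    if prefix in STANDALONE_ONLY and extra_terms:
        raise ValueError(f"{prefix} cannot be combined with other search terms")
    return " ".join([prefix + value, *extra_terms])


print(field_query("intitle:", "search", ["engines"]))  # intitle:search engines
print(field_query("site:", "notess.com/write"))        # site:notess.com/write
```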

Before the official release in Sept. 1999, clicking the small bar graph at the beginning of a displayed hit would automatically run a link: search, but that graphic disappeared with the official launch. Another field search which can be used is related:[URL] which invokes GoogleScout to find other pages similar in linkage patterns to the given URL.

Limits:
Google has language, domain, date, filetype, and adult content limits. The date limit, added in July 2001, is only available on the Advanced Search page. Only three options are available: Past 3 Months, Past 6 Months, or Past Year.

The file type limit was added to the Advanced Search page in Nov. 2001, when other file types were added to the Google index. The Advanced Search page only offers file type limits under the label of File Formats for PDF, Word (.doc), Excel (.xls), PowerPoint (.ppt), and Rich Text Format (.rtf). Using the filetype: prefix, the file type limit can also be used for PostScript (.ps), Text (.txt), .htm, WordPerfect (.wpd), and other file extensions. To use the prefix command, just put the extension immediately after filetype: as in differentials filetype:ps.

Google introduced the language limit in April 2000 with eleven languages, which was expanded to 24 as of Aug. 2000. As of July 2001, Russian was added. In Nov. 2001, Arabic and Turkish joined the group, followed in early 2002 by Catalan, Croatian, Indonesian, Serbian, Slovak, and Slovenian, for 34 language limit options. These are available on the Advanced Search page and their Language Tools page.

To choose more than one at a time use the preferences page, which also offers a choice for which of 14 languages the surrounding text will be displayed in.

In May 2000, a family filter was added which tries to exclude adult Web pages. Turn it on from the preferences page or in the Advanced Search. By default, "Moderate Filtering" is turned on which is supposed to "Filter explicit images only." In other words, the Web (or text) search is not filtered, but the image results are. The Strict Filtering option which will "filter both explicit text and explicit images" will turn on the filter for the Web and Images databases (and the Directory) but not Groups, News, and Froogle. Some ads will be filtered. Also, the Strict Filtering option will block all results for certain words. However, none of the filters, even the Strict Filtering, blocks all explicit content.

The Advanced Search offers a domain limit, which can be used to limit results to those from the specified domain or it can be used to exclude results from a specified domain.

Stop Words:
Google searches almost all words except for operators like AND and OR if they are not in a phrase. These can be searched by putting + in front of them. As of March 2000, 'the' was a stop word that could not be searched even with the + sign. But by 2002, 'the' could be searched with the plus. As of Nov. 2001, stop words within a phrase no longer require a + sign and will automatically be searched. Also, if only stop words are entered even without phrase markings, they will be searched. In the early years, you had to be sure to only place the + in front of stop words. If a + was placed in front of a non-stop word in the same query, all + signs would be ignored.

Sorting:
Results are sorted by relevance which is determined by Google's PageRank analysis, determined by links from other pages with a greater weight given to authoritative sites. Pages are also clustered by site. Only two pages per site will be displayed, with the second indented. Others are available via the [ More results from . . . ] link. If the search finds less than 1,000 results when clustered with two pages per site and if you page forward to the last page, after the last record the following message will appear:

In order to show you the most relevant results, we have omitted some entries very similar to the 63 already displayed. If you like, you can repeat the search with the omitted results included.

Clicking the "repeat the search" option will bring up more pages, some of which are near or exact duplicates of pages already found while others are pages that were clustered under a site listing. However, clicking on that link will not necessarily retrieve all results that have been clustered under a site. You can also just add &filter=0 to the end of a search results URL. To see all results available on Google, you need to check under each site cluster as well as using the "repeat the search" option.
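Adding the filter=0 parameter by hand is easy to get wrong when a URL already has a query string, so here is a minimal sketch using Python's standard urllib.parse module. The function name with_filter_off and the sample URL are illustrative assumptions; only the filter=0 parameter itself comes from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_filter_off(results_url):
    """Return the results URL with filter=0 set (hypothetical helper)."""
    parts = urlsplit(results_url)
    # Drop any existing filter parameter so it is not duplicated.
    params = [(key, value)
              for key, value in parse_qsl(parts.query, keep_blank_values=True)
              if key != "filter"]
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Illustrative URL, not taken from the review.
print(with_filter_off("http://www.google.com/search?q=search+engines"))
# http://www.google.com/search?q=search+engines&filter=0
```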

Display:
The display includes the title, URL, a brief extract showing text near the search terms, the file size, and for many hits, a link to a cached copy of the page. This cached copy is from Google's index and may be older than the version currently available on the Web. The cached copy will display highlighted search terms. If more than one search term is used, each has a different color highlighting. The default output is 10 hits per screen, but the searcher can also choose 20, 30, 50, or 100 hits at a time on the preferences page. In June 1999, numeric relevance scores and "phrase match" or "partial phrase match" indicators were removed. In Sept. 1999, the graphic relevancy bar with its link to a link: search was removed. At the same time, a GoogleScout link was added. GoogleScout is now just labeled as "Similar pages" and finds other pages similar in linkage patterns to the displayed hit. In April 2000, Google started clustering results by site. Formerly, hits from the same site would be listed indented under the first. As of April 2000, only the first two hits are displayed (with the second one indented) and the rest are available under a [ More results from hostname ] link.
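The two-per-site clustering can be imitated locally to see its effect on a ranked list. This is only a sketch under stated assumptions: it is not Google's implementation, the function name cluster_by_site is hypothetical, and the sample URLs are invented for the example.

```python
from urllib.parse import urlsplit


def cluster_by_site(ranked_urls, per_site=2):
    """Split a ranked list into displayed hits and hits held back per host."""
    by_host = {}
    for url in ranked_urls:
        by_host.setdefault(urlsplit(url).hostname, []).append(url)

    shown = []  # first hits per host, kept together as in the display
    more = {}   # remaining hits, as under a "More results from" link
    for host, urls in by_host.items():
        shown.extend(urls[:per_site])
        if len(urls) > per_site:
            more[host] = urls[per_site:]
    return shown, more


ranked = [
    "http://a.example/1",
    "http://b.example/1",
    "http://a.example/2",
    "http://a.example/3",
]
shown, more = cluster_by_site(ranked)
print(shown)  # ['http://a.example/1', 'http://a.example/2', 'http://b.example/1']
print(more)   # {'a.example': ['http://a.example/3']}
```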

With the addition of non-HTML files in 2001, Google added two notes to the display to identify those files. Before the title in the first line of the display, [PDF] or [PS] or [XLS] is used to denote the different file format. On some, a second line of the display lists
File Format: PDF/Adobe Acrobat - View as Text.

Around Aug. 2001, Google started refreshing the indexing of certain pages (those with daily updates) more frequently than the rest of the database. These were marked with "Fresh!" after the URL and size. In Dec. 2001, this tag was changed to list the indexing date. As of Feb. 2002, 3 million pages were being refreshed on an almost daily basis. Google no longer reports such numbers.

Special Search Features:
Cached Pages: Google was the first general search engine to provide access to pages as they were at the time they were indexed, designated as "cached" pages. For alternative sources of cached pages, see the archives page.

Character Searching: Google is also the only search engine that searches for some characters. As of Sept. 2003, it would search for the ampersand & and the underscore _ characters by themselves or as part of a character string. In other words, a search on adv_search gets different results than "adv search" and &tc differs from tc. While it would not search # or + in most cases, it does differentiate c#, c++, c+, and c. It does not, however, differentiate c*, c+@, or c+-, interpreting c* as c and both c+- and c+@ as c+. (These c+ type strings are all various programming languages.) In March 2004, it started searching for the $  when it precedes a number (see more details on number searching below). Other punctuation marks may change the sorting of results. Tomi Häsä reports that searching for I/O works as does searching for sharped musical pitches as in a#, c#, f#, and g#. The &, + and _ also can be used one or more times in the middle or at the end of a character or a word or between characters and words as in a+, a_, C++, page_count, a&b&c, and i&&.

Number Searching: (New, March 2004) Google handles numbers in some special ways and can search for a range of numbers. When it searches for numbers, it also finds numbers with and without commas. The number range search finds decimal numbers within the range as well. In other words, a search like chennai 565011 finds pages with 565,011 while the number range of  5..11 will match numbers such as 5, 7, 9, 11, and 7.23. Both plain number searches and number range searches can be combined with other terms and can be included in phrase searching. 
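The matching behavior described above (commas ignored, decimals included in a range) can be imitated on local text. This is a sketch of the described behavior, not of how Google implements it; the regular expression and the function name numbers_in_range are assumptions made for the example.

```python
import re

# Digits, optional comma-separated thousands groups, optional decimal part.
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def numbers_in_range(text, low, high):
    """Return the numbers in text that fall within low..high, inclusive."""
    hits = []
    for token in NUMBER.findall(text):
        value = float(token.replace(",", ""))  # 565,011 is read as 565011
        if low <= value <= high:
            hits.append(token)
    return hits


sample = "Sizes 5, 7.23, 11 and 12; population 565,011"
print(numbers_in_range(sample, 5, 11))            # ['5', '7.23', '11']
print(numbers_in_range(sample, 565011, 565011))   # ['565,011']
```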

Number Range Searching Syntax:

Number Searching Syntax Notes:

 Documentation
Google Help Pages
Google Zeitgeist (search patterns and trends)
Press Releases