Oct. 10, 2006.
by Greg R. Notess
Google has become for many the pre-eminent Web search engine. In Feb. 1999 it moved from Alpha test version to Beta and officially launched Sept. 21, 1999.
See News & Blog Posts
Google Database Components 3/02
Google's Unindexed URLs 3/02
Boolean Searching on Google 11/00
Excite vs. Google: Contradictory Directions 6/00
Since that time it has made its mark with its relevance ranking based on link analysis, cached pages, and aggressive growth. Since its beta release, it has had phrase searching and the - for NOT, but it did not add an OR operation until Oct. 2000. In Dec. 2000, it added title searching. In June 2000 it announced a database of over 560 million pages, which grew to over 600 million by the end of 2000 and then 1.5 billion in Dec. 2001. The 2+ billion reported on their home page as of April 2002 includes indexed pages, unindexed URLs, and other file formats. By Nov. 2002, they moved their claim up to 3 billion, and in Feb. 2004 it went to 4 billion. While no official claim is given, 20+ billion is one current estimate.
- Web: Indexed Web pages (including URLs that it has not fully indexed) plus additional file types in the Web database such as PDF, .ps, .doc, .xls, .txt, .ppt, .rtf, .asp, .wpd, and more. See Google Database Components for more details.
- Ads: Paid advertisements usually shown on the right side (or top) under a "Sponsored Links" heading
- Images: Picture database
- Groups: Usenet news database
- News: Past 30 days of Web-based news sites
- Book Search: Full text books with only limited viewing of in-copyright books
- Google Scholar: Academic papers, articles, reports, and citations
- Directory: A version of the Open Directory with entries ranked in Google's PageRank order
- Froogle: Shopping and product search
- Catalog Search: Scanned product catalogs
Google also has many other databases and services. Above their regular Web results, hits may be displayed from these other databases in what Google calls OneBox results. In addition it offers several specialized subsets: a government database of the .gov and .mil sites; University searches; a Linux search; an Apple/Macintosh search; and a Microsoft search.
The Google database is used by AOL and several other sites. Yahoo! dropped Google in Feb. 2004 in favor of its own database, after it had switched from Inktomi to Google in July 2000 and then reaffirmed and more closely integrated Google results in Oct. 2002.
Strengths:
* Size and scope: It is one of the largest, and includes PDF, DOC, PS, and many other file types
* Relevance based on sites' linkages and authority
* Cached archive of Web pages as they looked when they were indexed
* Additional databases: Google Groups, News, Directory, Books, Scholar, etc. (see above).
Weaknesses: See also the Google Inconsistencies Page
* Limited search features: no nesting, no truncation, does not support full Boolean
* Link searches must be exact and are incomplete
* Only indexes first 101 KB of a Web page and about 120 KB of PDFs
* May search for plural/singular, synonyms, and grammatical variants without telling you
* Not as comprehensive as legend has it
Multiple search terms are processed as an AND operation by default. Phrase matches are ranked higher.
Google uses an automatic Boolean AND between terms and has slowly been moving towards more Boolean support; however, it does not yet support the AND operator, NOT operator, or full Boolean searching with the ability to nest operators. In Feb. 1999, Google added the - symbol to perform a NOT function. In Oct. 2000, they added the ability to use an OR (which must be in upper case) to do some Boolean OR operations. See the Boolean Searching on Google page for more details on how to get Google to do certain kinds of Boolean searches.
The + could formerly be used to require a term, but since the default operation was AND, the + was never really needed, and for a while it caused the following message to appear:
Google always searches for pages containing all the words in your query, so you do not need to use + in front of words.
However, the + can be used for forcing a search on stop words and for requiring Google to search for only that exact term without any possible plural/singular, synonyms, and grammatical variants.
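The operators described above can be combined mechanically. The following is a minimal sketch, not anything Google provides: the function name `build_query` and its parameters are hypothetical, and it only strings together the -, OR, and + syntax as this review describes it. Since nesting is not supported, it makes no attempt to group terms.

```python
def build_query(required=(), excluded=(), any_of=(), exact=()):
    """Compose a query string from the operators described in this review.

    required: plain terms (Google applies AND between them by default)
    excluded: terms prefixed with - (the NOT function)
    any_of:   alternatives joined by OR, which must be in upper case
    exact:    terms prefixed with + (stop words, or no automatic variants)
    """
    parts = list(required)
    parts += ["+" + term for term in exact]
    if any_of:
        parts.append(" OR ".join(any_of))
    parts += ["-" + term for term in excluded]
    return " ".join(parts)


# Hypothetical example terms, chosen only for illustration.
print(build_query(required=["search", "engine"],
                  any_of=["review", "comparison"],
                  excluded=["yahoo"]))
```

Note that there is no explicit AND anywhere in the output; a space between terms is all that is needed.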
In Feb. 1999, Google added phrase searching designated in the usual manner by enclosing the phrase in "double quotes." Google also detects phrase matches even when the quotes are not used and usually ranks phrase matches higher. No other proximity searching is directly available. However, using the wildcard word within a phrase trick described below, the unofficial Google API Proximity Search (GAPS) tool can reproduce proximity searching up to a distance of 3 words. Unfortunately, GAPS stopped working in the Spring of 2006.
No truncation is available. Some automatic plural searching and word stemming occurs for English words and can be turned off by using the plus sign in front of each term that should not be stemmed. However, within phrases, there is a trick which can be used for a wildcard word. Use an asterisk * within a phrase search to match any word in that position. So, for example, to find "a little neglect may breed mischief" when you are not sure of the second to last word, search
"a little neglect may * mischief".
Multiple asterisks can be used as in
"a little * * * mischief".
However, Google changed the processing in Aug. 2005 so that a single asterisk will sometimes represent more than just a single word. This is the only way Google supports a wildcard symbol.
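To make the one-asterisk-per-word behavior concrete, here is a rough local simulation in Python. It is only a sketch under stated assumptions: the helper `phrase_to_regex` is hypothetical, it treats each * as exactly one word (the behavior before the Aug. 2005 change), and it says nothing about how Google actually implements the match.

```python
import re


def phrase_to_regex(phrase):
    """Turn a phrase pattern with * placeholders into a regular expression.

    Each * stands for exactly one word; every other token must match
    literally. Matching ignores case, as Google's searching does.
    """
    pieces = []
    for token in phrase.split():
        pieces.append(r"\w+" if token == "*" else re.escape(token))
    return re.compile(r"\b" + r"\s+".join(pieces) + r"\b", re.IGNORECASE)


text = "A little neglect may breed mischief."
for pattern in ("a little neglect may * mischief",
                "a little * * * mischief",
                "a little * mischief"):
    found = phrase_to_regex(pattern).search(text) is not None
    print(pattern, "->", found)
```

The first two patterns match the sample sentence; the third does not, because one asterisk cannot stand in for three words under this assumption.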
While not exactly truncation, the synonym operator is a tilde ~ placed before a search term, with no space, which tells Google to look for synonyms. So a search on yosemite ~trails will find pages that have terms like 'hiking,' 'rides,' and 'maps.' This synonym finder will sometimes include plural, singular, or other grammatical variants as well. So the earlier search also found matches with 'trail' and 'trailer.' So the ~ can be used to get something a bit closer to truncation, but not very close. Bear in mind that the ~ only works in Google's Web database and only for English language terms.
Google has no case sensitive searching. Using either lower or upper case results in the same hits.
Note that some field searches cannot be combined with other query words. In other words, a search entered such as uniqueword link:name.com will be processed as if only the field search were present, as in link:name.com. The uniqueword is ignored. The intitle: and inurl: fields can be combined with other search terms.
|intitle:||Finds pages that have the term(s) in the HTML title element. Can be combined with other search terms. intitle:search engines. This should find 'search' in the title and 'engines' anywhere in the page.|
|inurl:||Finds pages that have the term(s) somewhere in the URL (host name, path, or filename). Can be combined with other search terms. inurl:searchenginewatch.|
|allintitle:||Finds pages that have the term(s) in the HTML title element. allintitle:search engines.|
|link:||Finds pages which contain hypertext links to the exact specified URL. link:searchengineshowdown.com/search finds pages with links to this site. Unfortunately, on Google, link: searches are incomplete and do not retrieve all pages that Google has indexed that link to the specified URL.|
|allinurl:||Finds pages that have the term(s) somewhere in the URL (host name, path, or filename). allinurl:searchenginewatch.|
|site:||Finds pages from the designated Web site. Path and file names can be included. site:notess.com/write|
|allinanchor:||Finds pages that have the term(s) somewhere in the links to the page.|
|related:||Invokes GoogleScout to find other pages similar in linkage patterns to the given URL and at a similar hierarchical level. The URL must be exact. In other words, related:notess.com and related:www.notess.com find different results.|
|numrange:||Finds a range of numbers. Either 5..11 or numrange:5-11 works. See number searching section below.|
|pricerange:||Finds a range of numbers prefixed by the $ sign. Either $5..11 or pricerange:5-11 works. See number searching section below.|
|flink:||Used to find pages linked from the given URL. No longer working as of Oct. 30, 1999. flink:notess.com|
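The prefix syntax in the table is simple enough to assemble in code. The sketch below is illustrative only: `field_query` is a hypothetical helper, and the one rule it enforces (that link: cannot be mixed with other words) is taken from the note above the table rather than from any official documentation.

```python
def field_query(field, value, other_terms=()):
    """Build a field search such as site:notess.com/write.

    The prefix, a colon, and the value are joined with no space.
    Other terms are appended after the field search, as in the
    intitle: example in the table.
    """
    if field == "link" and other_terms:
        # Per the review, the extra words would simply be ignored,
        # so refuse to build a query that looks combined but is not.
        raise ValueError("link: cannot be combined with other query words")
    return " ".join([f"{field}:{value}", *other_terms])


print(field_query("site", "notess.com/write"))
print(field_query("intitle", "search", ["engines"]))
```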
Before the official release in Sept. 1999, clicking the small bar graph at the beginning of a displayed hit would automatically run a link: search, but that graphic disappeared with the official launch. Another field search which can be used is related:[URL], which invokes GoogleScout to find other pages similar in linkage patterns to the given URL.
Google has language, domain, date, filetype, and adult content limits. The date limit, added in July 2001, is only available on the Advanced Search page. Only three options are available: Past 3 Months, Past 6 Months, or Past Year.
The file type limit was added to the Advanced Search page in Nov. 2001, along with the addition of other file types to the Google index. The Advanced Search page only offers file type limits, under the label of File Formats, for PDF, Word (.doc), Excel (.xls), PowerPoint (.ppt), and Rich Text Format (.rtf). Using the filetype: prefix, the file type limit can also be used for PostScript (.ps), Text (.txt), .htm, WordPerfect (.wpd), and other file extensions. To use the prefix command, just put the extension immediately after filetype: with no space.
Google introduced the language limit in April 2000 with eleven languages which was expanded as of Aug. 2000 to 24. As of July 2001, Russian was added. In Nov. 2001, Arabic and Turkish and then in early 2002 Catalan, Croatian, Indonesian, Serbian, Slovak, and Slovenian joined the group for the following 34 language limit options. These are available on the Advanced Search page and their Language Tools page.
- Chinese (Simplified & Traditional)
To choose more than one at a time use the preferences page, which also offers a choice for which of 14 languages the surrounding text will be displayed in.
In May 2000, a family filter was added which tries to exclude adult Web pages. Turn it on from the preferences page or in the Advanced Search. By default, "Moderate Filtering" is turned on which is supposed to "Filter explicit images only." In other words, the Web (or text) search is not filtered, but the image results are. The Strict Filtering option which will "filter both explicit text and explicit images" will turn on the filter for the Web and Images databases (and the Directory) but not Groups, News, and Froogle. Some ads will be filtered. Also, the Strict Filtering option will block all results for certain words. However, none of the filters, even the Strict Filtering, blocks all explicit content.
The Advanced Search offers a domain limit, which can be used to limit results to those from the specified domain or it can be used to exclude results from a specified domain.
Google searches almost all words except for operators like AND and OR if they are not in a phrase. These can be searched by putting + in front of them. As of March 2000, 'the' was a stop word that could not be searched even with the + sign. But by 2002, 'the' could be searched with the plus. As of Nov. 2001, stop words within a phrase no longer require a + sign and will automatically be searched. Also, if only stop words are entered even without phrase markings, they will be searched. In the early years, you had to be sure to only place the + in front of stop words. If a + was placed in front of a non-stop word in the same query, all + signs would be ignored.
Results are sorted by relevance which is determined by Google's PageRank analysis, determined by links from other pages with a greater weight given to authoritative sites. Pages are also clustered by site. Only two pages per site will be displayed, with the second indented. Others are available via the [ More results from . . . ] link. If the search finds less than 1,000 results when clustered with two pages per site and if you page forward to the last page, after the last record the following message will appear:
In order to show you the most relevant results, we have omitted some entries very similar to the 63 already displayed. If you like, you can repeat the search with the omitted results included.
Clicking the "repeat the search" option will bring up more pages, some of which are near or exact duplicates of pages already found while others are pages that were clustered under a site listing. However, clicking on that link will not necessarily retrieve all results that have been clustered under a site.
You can also just add &filter=0 to the end of a search results URL. To see all results available on Google, you need to check under each site cluster as well as using the "repeat this search" option.
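Appending the parameter by hand works, but it can also be done programmatically. This is a small sketch using Python's standard urllib.parse module; the function name `with_omitted_results` and the sample URL are assumptions made for illustration, and only the filter=0 parameter itself comes from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_omitted_results(url):
    """Return the search results URL with filter=0 set in its query string.

    Any existing filter parameter is replaced rather than duplicated.
    """
    parts = urlsplit(url)
    params = [(key, value)
              for key, value in parse_qsl(parts.query, keep_blank_values=True)
              if key != "filter"]
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# The URL below is an illustrative example, not one taken from the review.
print(with_omitted_results("http://www.google.com/search?q=notess"))
```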
The display includes the title, URL, a brief extract showing text near the search terms, the file size, and for many hits, a link to a cached copy of the page. This cached copy is from Google's index and may be older than the version currently available on the Web. The cached copy will display highlighted search terms. If more than one search term is used, each has a different color highlighting. The default output is 10 hits per screen, but the searcher can also choose 20, 30, 50, or 100 hits at a time on the preferences page. In June 1999, numeric relevance scores and "phrase match" or "partial phrase match" indicators were removed. In Sept. 1999, the graphic relevancy bar with its link to a link: search was removed. At the same time, a GoogleScout link was added. GoogleScout is now just labeled as "Similar pages" and finds other pages similar in linkage patterns to the displayed hit. In April 2000, Google started clustering results by site. Formerly, hits from the same site would be listed indented under the first. As of April 2000, only the first two hits are displayed (with the second one indented) and the rest are available under a [ More results from hostname ] link.
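The clustering behavior can be pictured with a short simulation. This is a sketch of the display rule as described in this review (two hits shown per site, the rest behind a "More results" link), not Google's actual algorithm; the function `cluster_by_site` and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def cluster_by_site(ranked_urls, shown_per_site=2):
    """Group a relevance-ranked list of URLs by host name.

    Each cluster appears at the position of its best-ranked hit, shows
    at most shown_per_site URLs, and counts the remainder as the hits
    that would sit behind a "More results from hostname" link.
    """
    clusters = {}
    for url in ranked_urls:
        clusters.setdefault(urlsplit(url).netloc, []).append(url)
    return [
        {
            "host": host,
            "shown": urls[:shown_per_site],
            "more_results": len(urls) - len(urls[:shown_per_site]),
        }
        for host, urls in clusters.items()
    ]


# Made-up ranked hits, for illustration only.
hits = [
    "http://notess.com/a",
    "http://example.org/x",
    "http://notess.com/b",
    "http://notess.com/c",
]
for cluster in cluster_by_site(hits):
    print(cluster)
```

This also shows why checking under each site cluster matters: the third notess.com hit never appears in the main list.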
With the addition of non-HTML files in 2001, Google added two notes to the display to identify those files. Before the title in the first line of the display, [PDF] or [PS] or [XLS] is used to denote the different file format. On some, a second line of the display lists File Format: PDF/Adobe Acrobat - View as Text.
Around Aug. 2001, Google started refreshing the indexing of certain pages (those with daily updates) more frequently than the rest of the database. These were marked with "Fresh!" after the URL and size. In Dec. 2001, this tag was changed to list the indexing date. As of Feb. 2002, 3 million pages were being refreshed on an almost daily basis. Google no longer reports such numbers.
Special Search Features:
Cached Pages: Google was the first general search engine to provide access to pages as they were at the time they were indexed, designated as "cached" pages. For alternative sources of cached pages, see the archives page.
Character Searching: Google is also the only search engine that searches for some characters. As of Sept. 2003, it would search for the ampersand & and the underscore _ characters by themselves or as part of a character string. In other words, a search on
different results than
"adv search" and
tc. While it would not search # or + in most cases, it does
It does not, however, differentiate
c and both
c+. (These c+ type strings are all various programming languages.)
In March 2004, it started searching for the $ when it precedes a number (see more details on number searching below). Other punctuation marks may change the sorting of results.
Tomi Häsä reports that searching for I/O works, as does searching for sharped musical pitches as in a#, c#, f#, and g#. The &, + and _ also can be used one or more times in the middle or at the end of a character or a word or between characters and words as in a+, a_, C++, page_count, a&b&c, and i&&.
Number Searching: (New, March 2004) Google handles numbers in some special ways and can search for a range of numbers. When it searches for numbers, it also finds numbers with and without commas. The number range search finds decimal numbers within the range as well. In other words, a search like chennai 565011 finds pages with 565,011 while the number range of 5..11 will match numbers such as 5, 7, 9, 11, and 7.23. Both plain number searches and number range searches can be combined with other terms and can be included in phrase searches.
Number Range Searching Syntax:
- Smaller number, two periods, larger number, as in 5..11
- Use the prefix of numrange, a colon, and then the smaller number, a dash, and the larger number, as in numrange:5-11
- For open ended ranges, just leave off one of the numbers. For example, to
search for all numbers equal to or larger than 534, use either
numrange:534-. To find only numbers smaller than 16, use either
- A variant of the number search is the price range search. Currently, it recognizes the dollar ($) sign when it is placed immediately in front of the number with no space, but it does not yet recognize the pound (£), Yen (¥), or Euro (€) characters. Either use the $ sign with the .. syntax as in $5..11 (the second number could also have the $ sign, but it is not required, while the first number must have it to work), or use pricerange:5-11 without the $ sign.
Number Searching Syntax Notes:
- Be sure to put the smaller number first or the range operation won't work
- Must be positive numbers (although you may find negatives, the - sign is not searched and is interpreted as a NOT operator)
- Numbers and number ranges can be used within a phrase search
- A plain number search also will match a number with a comma. In other
2001finds pages with 2001 but not pages with 2,001 while
- A number range search, like most other Google searches, will also find pages that do not contain the number but are linked from other pages that contain the number in the linking anchor text. Check the cached copy of the page for a header saying "These terms only appear in links pointing to this page."
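The matching rules for closed number ranges can be illustrated with a short Python sketch. It is an approximation built only from the behavior described in this section (inclusive endpoints, decimals matched, commas ignored, smaller number first); the helpers `parse_range` and `number_in_range` are hypothetical, and open-ended and price ranges are deliberately left out.

```python
import re

NUMBER = r"(\d+(?:\.\d+)?)"


def parse_range(query):
    """Parse '5..11' or 'numrange:5-11' into a (low, high) pair of floats."""
    match = (re.fullmatch(NUMBER + r"\.\." + NUMBER, query)
             or re.fullmatch(r"numrange:" + NUMBER + "-" + NUMBER, query))
    if match is None:
        raise ValueError(f"not a closed number range: {query!r}")
    low, high = float(match.group(1)), float(match.group(2))
    if low > high:
        # The smaller number must come first or the range will not work.
        raise ValueError("put the smaller number first")
    return low, high


def number_in_range(token, low, high):
    """Check one number as found on a page; commas in it are ignored."""
    value = float(token.replace(",", ""))
    return low <= value <= high


low, high = parse_range("5..11")
for token in ("5", "7.23", "11", "12"):
    print(token, number_in_range(token, low, high))
```

Both syntaxes parse to the same pair of bounds, which matches the statement that either form works.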