Review of Gigablast
Last updated Mar. 31, 2008.
by Greg R.
Gigablast appeared in pre-beta form in March 2002 and debuted in beta on July 21, 2002. This newer search engine has some very nice features. It lacks some advanced search features, but it does include cached pages, good date reporting, and, as of Aug. 2003, PDF, Word, PowerPoint, PostScript, and Excel files. In Oct. 2003, Gigablast added indexing of all generic meta tags. A new version was launched in early 2008 with Freshness Dating limits.
Databases:
Gigablast has a single database of indexed Web pages, including PDF, Microsoft
Word, PowerPoint, PostScript, and Excel files. Other databases available
include:
- Images (from PicSearch)
- Video
- Directory (from the Open Directory)
Several meta search engines use Gigablast results along with other sites like HereUAre.
Strengths:
* Date reporting (including date indexed and date last modified)
* Cached pages and links to the Wayback Machine
* Includes PDF and other file types and cached HTML versions of these
other file types
* No advertising
Weaknesses:
* Small operation with a slightly smaller database than others which
is not always refreshed as frequently
* Lacks truncation, proximity, and other advanced search features.
Default Operation:
Multiple search terms are processed as an AND operation by default, so adding
more terms should return fewer results. The relevance ranking algorithm generally
puts phrase matches highest. At the end of a results list, Gigablast offers the
option "No more results found. Show relevant partial matches for your query." which
runs an OR operation instead.
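
The difference between the default AND operation and the partial-match OR operation can be illustrated with a toy example. This is only a sketch: the tiny index and the page numbers below are invented for illustration, and nothing here reflects how Gigablast actually stores or ranks pages.

```python
# Hypothetical inverted index: term -> set of page ids containing that term.
INDEX = {
    "search": {1, 2, 3, 4},
    "engine": {2, 3, 4},
    "review": {3, 5},
}

def match_all(terms):
    """AND operation: pages that contain every term."""
    sets = [INDEX.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()

def match_any(terms):
    """OR operation: pages that contain at least one term."""
    return set().union(*(INDEX.get(term, set()) for term in terms))

print(sorted(match_all(["search", "engine"])))
print(sorted(match_all(["search", "engine", "review"])))
print(sorted(match_any(["search", "engine", "review"])))
```

In this sketch, adding the third term shrinks the AND result, while the OR operation returns every page that matches any of the terms.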
Boolean Searching:
Full Boolean and implied Boolean searching is available. The - can be placed directly in front of a term to exclude it like a Boolean NOT.
As of Sept. 2003, the Boolean operators AND, OR, and AND NOT (which must be in uppercase) can be used and can
be nested using parentheses.
The advanced search has separate lines for ALL, ANY, and NONE. One unique advanced technique
is available: putting a space, two periods, and another space between terms
should exclude exact phrase matches.
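
As a rough illustration of the syntax described above, the following sketch assembles query strings with uppercase operators and nested parentheses. The helper names (`all_of`, `any_of`, `but_not`, `minus`) are hypothetical and exist only for this example; the code builds text to type into the search box and does not contact Gigablast.

```python
def all_of(*terms):
    """Join terms with the uppercase AND operator inside parentheses."""
    return "(" + " AND ".join(terms) + ")"

def any_of(*terms):
    """Join terms with the uppercase OR operator inside parentheses."""
    return "(" + " OR ".join(terms) + ")"

def but_not(expression, unwanted):
    """Exclude a term with the uppercase AND NOT operator."""
    return expression + " AND NOT " + unwanted

def minus(term):
    """Implied Boolean form: a - placed directly in front of the term."""
    return "-" + term

# Full Boolean query with nesting.
print(but_not(all_of("search", any_of("engine", "directory")), "advertising"))

# Implied Boolean query using the - prefix.
print("search engine " + minus("advertising"))
```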
Proximity Searching:
Phrase searching is available by using "double quotes" around a phrase or by entering terms into the "this exact phrase" box on the
advanced search form, which actually has two such boxes so that two phrases can be specified. No further proximity searching is available.
Truncation:
No truncation is currently available.
Case Sensitivity:
Searches are not case sensitive. Search terms entered in lowercase, uppercase, or mixed case all get the same number of hits.
Field Searching:
| Field | Explanation |
|---|---|
| ip: | Page is in the specified IP range. Incomplete numbers are truncated. ip:216.32.120 finds any computer in 216.32.120.* |
| link: | Pages include a link to the specified URL. link:searchengineshowdown.com finds pages with links to this site. |
| site: | Results are only from the specified site. site:nasa.gov finds pages at NASA's Web site |
| suburl: | Pages have the term(s) somewhere in the URL (host name, path, or filename). suburl:searchenginewatch |
| title: | Hits have the term(s) in the HTML title element. title:"search engines" |
| type: | File type. Options as of March 2008 are |
| url: | Result must be exactly this URL and nothing else. url:www.slashdot.com/index.html |
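
The prefixes in the table can be combined with a value mechanically. The sketch below is a hypothetical helper, not part of any Gigablast tool: it checks the prefix against the fields listed above and wraps multi-word values in double quotes, mirroring the title:"search engines" example.

```python
# Field prefixes taken from the table above.
FIELDS = {"ip", "link", "site", "suburl", "title", "type", "url"}

def field_query(field, value):
    """Return a field search string such as site:nasa.gov."""
    if field not in FIELDS:
        raise ValueError("unknown field: " + field)
    # Multi-word values are quoted so they stay attached to the field.
    if " " in value:
        value = '"' + value + '"'
    return field + ":" + value

print(field_query("site", "nasa.gov"))
print(field_query("title", "search engines"))
print(field_query("ip", "216.32.120"))
```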
Limits:
Gigablast has language, domain, "freshness dating," file type, and family filter
limits. Most are available by using the field searches above or
the advanced search page. For a
domain limit, use a site: field search or the "Restrict to these Sites"
box in the advanced search.
A file type limit is only available by using the type: field search.
The Freshness Dating limit is available on both the main page and the advanced search page. This limit tries to detect a more accurate publication date for a Web page. Gigablast describes it as follows: "the date that a search engine last saw a particular page is VERY different than the date that the page was actually published to the web, or modified in some substantial form. Gigablast's patent-pending 'publication date detection' algorithms estimate the date that a particular page was first published or most recently edited or modified. The algorithms ignore webpage clocks or date counters." The limit options include the last day, week, month, or year, along with a "custom" option that leads to the advanced search form, where the searcher can specify a starting and ending date.
The family filter limit on the advanced search is a simple check box and is not on by default for Web searches but is on by default for image searching.
Language limits on the advanced search page include the following languages:
- Arabic
- Bengali
- Chinese (China)
- Chinese (Taiwan)
- Dutch
- English
- Finnish
- French
- German
- Greek
- Hebrew
- Hindi
- Indonesian
- Italian
- Japanese
- Korean
- Norwegian
- Polish
- Portuguese
- Russian
- Spanish
- Swedish
- Thai
- Vietnamese
Stop Words:
Gigablast does ignore very common words such as 'the', 'of', 'and', 'it',
'is', and 'or'. However, they will be searched if preceded by a + or if they are
included within a phrase search. Originally, Gigablast had no stop words.
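
The stop word rules can be mimicked with a small toy parser. This is a sketch under stated assumptions, not Gigablast's actual query parser: the stop word set contains only the six examples named above, and Boolean operators are not handled.

```python
import re

# Only the examples named in the review; the real list is not known here.
STOP_WORDS = {"the", "of", "and", "it", "is", "or"}

def searched_terms(query):
    """Return the parts of a query that would actually be searched."""
    # A token is either a double-quoted phrase or a run of non-space characters.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    kept = []
    for token in tokens:
        if token.startswith('"'):
            kept.append(token)        # stop words inside a phrase are searched
        elif token.startswith("+"):
            kept.append(token[1:])    # a leading + forces the word to be searched
        elif token.lower() not in STOP_WORDS:
            kept.append(token)
    return kept

print(searched_terms("the history of +the web"))
print(searched_terms('"to be or not to be" is'))
```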
Sorting:
By default, results are sorted in order of relevance score. The advanced search also
used to have an option for date sorting, but that was gone by Oct. 2003. Site clustering is turned on by default, so only one page per site is displayed. This can be turned off in the advanced search.
Display:
Gigablast displays the title, a 1-2 line keyword-in-context extract, the URL, file size, the
"publication" date of the page, a link to a cached copy of the page as it
looked when Gigablast crawled it, and a link to older copies of the page at the
Wayback Machine. Ten results at a time are displayed, although the advanced search gives an option for up to 50.
Unique:
Meta Tag Searching and Display: [In 2008, this changed so that the display
is only available within an XML feed. See the help file for details.]
Gigablast is the only search engine indexing meta tags beyond just the meta
description that some others index. It is also the only search engine that can
display meta tags in the results list. Gigablast claims to be indexing all
"generic" meta tags. Displaying them requires adding a command to the URL of
the results list. At the end of the URL, add &dt= followed by the word(s) for
the meta tags, then a colon, and then a number representing how many
characters from each meta tag should be displayed. For example, adding
&dt=keywords+author+generator+description:30 will display the meta tag content
for the meta keywords, meta author, meta generator, and meta description tags
for any records retrieved. Use a + between meta tag words. It seems that this
"generic" meta tag approach excludes more complex meta tags like Dublin Core,
which use a syntax like DC.Creator. The dot syntax will not work for the
display command, although Gigablast does index some of the content of these
tags.
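
The &dt= command described above can be assembled programmatically. The following is a minimal sketch that assumes the caller already has the URL of a results list; the function name `add_meta_display` and the example URL are hypothetical, and the function only builds a string without sending a request. Names containing a dot are rejected because the review notes that the dot syntax does not work for the display command.

```python
def add_meta_display(results_url, meta_tags, max_chars):
    """Append a &dt= display command to the URL of a results list."""
    if not meta_tags:
        raise ValueError("at least one meta tag name is required")
    for name in meta_tags:
        if "." in name:
            raise ValueError("dot syntax is not supported: " + name)
    # Tag names are joined with +, followed by a colon and the character count.
    return results_url + "&dt=" + "+".join(meta_tags) + ":" + str(max_chars)

url = add_meta_display(
    "http://search.example/search?q=library",  # hypothetical results URL
    ["keywords", "author", "generator", "description"],
    30,
)
print(url)
```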
Date Display: It also is the only search engine that clearly displays the date it crawled a Web page and the date reported by the Web page at the time it was crawled. And, a unique feature: by putting two spaces between terms, phrase matches are excluded.
