Gigablast Category Archive
After no activity since 2005, the last several weeks have seen additions to Gigablast's press page. Who? Gigablast is an older search engine that has done some interesting and unique things in the past. It does have its own, unique database. Now it has databases for blog search (last 30 days only) and a Wikipedia search. Also, the main blog search page has a link to the "Most Linked-To News Posts for the Last 24 Hours" along with subsets for several different languages. Gigablast also now claims to be "the leading clean-energy search engine" with 90% of its power coming from wind energy. Strangely, the new "press news" consists of single line statements with no link to a longer story. Still, if you haven't tried Gigablast in a while, it might be time to take a look at yet another Google alternative.
For one of my upcoming columns in Online, I compared the various custom search engines and other tools for building a topical search engine from a subset of a major search engine's database: tools like Gigablast Custom Topic Search, Google Custom Search Engine, Live Search Macros, Swickis, Rollyo, and Yahoo! Search Builder. I compared a number of features (including the maximum number of sites, whether they support subdirectories, and whether they have usage statistics). This information can now be seen on my new Customize Your Own Search Engine page.
Ever heard of HereUAre, which has "Over 10 billion pages indexed?" Try a search and you may recognize the results as coming from Gigablast. So what's the connection? This leads to a rather strange story of a vanished press release that I've been researching on and off for the past month or so. Here's the story.
In trying to update my site a while back, I came across one page that linked to a June 19, 2006 press release from Gigablast about a database size increase to 10 billion and a new "report as spam" feature. The linked page (beta.gigablast.com/prnew.html) was no longer live. I did find a cached copy of the page, from Sept. 10, 2006, only at MSN Search. (No cached copies were available on Oct. 8 at Google, Yahoo!, Ask, or the Wayback Machine.) Fortunately, when I came across it, I FURLed the MSN Search cached copy of the page. Since FURL saves a copy of the page, I have the text from the press release. I'm glad I did, since I could not find a cached copy or a link at Live or any of the other main search engines when I checked today.
To summarize the release, Gigablast now has a database with over 10 billion pages, and here is where it calls it the "HereUAre search engine." It also mentions a beta (no longer available), "multi-language support, real-time indexing, and improved spam control." One part of the spam control is that at the end of each search result, Gigablast now has a link labeled "[report as spam]." Click on that link to report an entry as spam. The Gigablast site does not have the 10 billion claim on it, although it does continue to have the [report as spam] links. The HereUAre site does have the 10 billion claim and the spam reporting. It also makes it sound as if the search technology is its own, with no mention of Gigablast. I was also surprised that I found no mention of HereUAre, the Gigablast 10 billion, or the spam report at other search engine news sites. So, I'm posting what I've found out, and in the interest of sharing information, here is a copy of MSN's cached copy of the press release.
Although the site was available in April, according to some, presumably in experimental form, today the Government Search tab has been linked on the main Gigablast page. The government search covers U.S. state, federal, and military sites but it is broader than just a top level domain limit.
Gigablast adds two new tabs: Travel and Blog Search. These new specialized search offerings are available from the main page and do not just restrict to specific portions of the main Gigablast database. Some unique items may be retrieved in each of these specialized databases that may not show up in the main database. The exact scope of each collection is not yet explained. The scope may seem a bit strange, as when the blog search includes some non-blog pages.
Gigablast announces the addition of related pages results on searches. These appear in a yellow box above the regular results and are supposed to include "highly relevant search results which do not necessarily contain the searchers query terms." The list of related pages can be expanded by clicking on the "more" link in the yellow box. While the initial suggestions for several searches I tried all seem to contain my search terms, expanding the list found others that did not. The exact method used to find these related pages is not specified, but it provides another way to search laterally and to expand a search beyond what other search engines may provide.
From 640 million pages up to 1,014,363,952, Gigablast continues to grow and push at increasing its database size and scope.
Gigablast announces the availability of XML search feeds. This is likely most useful when combined with their Site Search or Custom Topic Search options. For the XML wizards, see the full instructions on how to create and customize these feeds.
More innovation from Gigablast! Create your own mini-Gigablast search box that just searches sites (up to 200) that you specify. The sites can include a path to limit the search to just one directory tree of a specific site while including the full results from other sites. See the Custom Topic Search page for full details.
Gigablast grows yet again, with over 640 million pages claimed as its new size. While it is still significantly smaller than the several billion pages at Google and Yahoo! Search, it is great to see continued growth and activity at this less well-known search engine.
Sometime recently, Gigablast finally made the switch to a default AND operation, joining the default function of all the major search engines. Entering more than one word will automatically search for all the query terms. Gigablast review and the search engine features chart have been updated.
Gigablast announces a modified logo and the addition of a new slogan, "Information Acceleration." Its database has been updated and is now at about 250 million records. It can also now default to an AND search, like all the major search engines. However, to get it to do that, you have to append &rat=1 to the end of the URL after a search. While the technique is impractical for most searchers, we can hope that the default AND will become an option within a preference page and/or an advanced search choice sometime in the future. Hopefully, the advanced search page will also have a section for displaying meta tags in the results set as well.
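For anyone who would rather script the workaround than retype it, here is a minimal sketch of appending the flag to a results URL. Only the &rat=1 flag comes from the announcement. The helper name and the sample URL (including its path and q parameter) are hypothetical and used purely for illustration.

```python
def force_and_search(results_url: str) -> str:
    """Append the rat=1 flag to a results URL so the search defaults to AND."""
    # Use "&" when the URL already has a query string, "?" otherwise.
    separator = "&" if "?" in results_url else "?"
    return f"{results_url}{separator}rat=1"


# Hypothetical results URL; the real path and parameter names may differ.
example_url = "http://www.gigablast.com/search?q=wind+energy"
print(force_and_search(example_url))
```

The helper only edits the URL string. It does not contact Gigablast, so whether the flag is honored still depends on the search engine itself.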
Joining the other major search engines, Gigablast now has a spell checker which makes suggestions for correct (or just alternate) spellings of unusual query terms. Like most other search engines, these suggestions are displayed as "Did you mean..."
Matt Wells also offers a fascinating exploration of the spelling suggestions at other search engines in his announcement. Searching 'dooty' at several search engines he found no consistency in suggestions:
- AlltheWeb - booty
- Altavista - dhooti
- Gigablast - door
- Google - doody
- MS Word - Doty
- Teoma - doty
Sometime in the next year, Gigablast also plans to have hardware that can handle a 5 billion document database that can still serve hundreds of queries per second. And the add URL page is back up again.
Sometime between June and October, Gigablast took away the option to sort by date on the advanced search form. Since Gigablast was the only search engine to offer that option, it is a shame to see it disappear. At least Gigablast still reports the date indexed and the last modified date stamp as of the last crawl. Also, the Add URL page is "temporarily disabled."
Gigablast makes another bold move -- indexing, searching, and displaying of all generic meta tags. Previously, some search engines would index words in a meta keyword or meta description tag. Some would use the meta description tag for the summary in the search engine results list. But none of the major search engines ever indexed any other meta tags. Now, Gigablast claims to be indexing all "generic" meta tags.
In addition, Gigablast can display the meta tags in the results list. Doing this requires adding commands to the URL of the results list. At the end of the URL, add &dt= followed by the word(s) for the meta tags, followed by a colon, and then a number to represent how many characters from each meta tag should be displayed.
So, for example, adding &dt=keywords+author+generator+description:30 will display the meta tag content for the meta keywords, meta author, meta generator, and meta description tags for any records retrieved. Use a + between meta tag words.
It seems that this "generic" meta tag approach excludes more complex meta tags like Dublin Core, which use a syntax like DC.Creator. The dot syntax will not work for the display command, although Gigablast does index some of the content of these tags.
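The &dt= command described above is easy to mistype, so here is a rough sketch of building it programmatically. The format (tag names joined by +, then a colon and a character count) follows the description in this post. The function name and the sample results URL are hypothetical, and rejecting dotted names simply mirrors my observation that the dot syntax does not work for display.

```python
from typing import Sequence


def add_meta_display(results_url: str, tags: Sequence[str], max_chars: int) -> str:
    """Append a dt= display command for the given meta tags to a results URL."""
    if not tags:
        raise ValueError("at least one meta tag name is required")
    for tag in tags:
        # Dotted names such as DC.Creator did not work for display in my tests.
        if "." in tag:
            raise ValueError(f"dotted meta tag names are not supported: {tag}")
    # Join tag names with +, then add the colon and the character limit.
    dt_value = "+".join(tags) + f":{max_chars}"
    separator = "&" if "?" in results_url else "?"
    return f"{results_url}{separator}dt={dt_value}"


# Hypothetical results URL; the real path and parameter names may differ.
url = "http://www.gigablast.com/search?q=library"
print(add_meta_display(url, ["keywords", "author", "generator", "description"], 30))
```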
These are great new features, but the display is difficult to get. I hope that the advanced search page soon includes options for displaying various meta tags and that a field search for meta tags will be added as well. Let's hope that unscrupulous Web sites do not abuse these meta tag abilities.
Meta tags, originally introduced to the search space by AltaVista as a way to get better descriptions of a Web page, were terribly abused by search engine spammers. Only the meta keywords and meta description tags were indexed, and only by some search engines. Now, Gigablast is introducing new meta tags and is daring to try again. This time, they are "geotags," meta tags for identifying the location of a Web site. According to the information page, the following meta tags are now indexed and supported at Gigablast:
The full Boolean capabilities of Gigablast announced on Monday don't always seem to work right. The - is working more accurately than either NOR or AND NOT today. I am hoping it is a momentary glitch since I just updated the Gigablast review, the search feature chart, and the search engines by search feature page.
It may be a small search engine, but Gigablast keeps on innovating. Matt Wells announced today the support for Boolean operators at Gigablast: AND, OR, AND NOT, and OR NOT. It is also supposed to support nesting. It is available from the main page search box. Operators should be in all upper case.
Three days after adding PDFs to Gigablast, Matt Wells announces the addition of PostScript (.ps), PowerPoint (.ppt), Excel (.xls), and Microsoft Word (.doc) files. Search syntax is similar to that for PDFs, using type: as in
type:pdf for Adobe Acrobat PDFs
type:doc for Microsoft Word documents
type:ppt for PowerPoint presentations
type:xls for Excel spreadsheets
type:ps for PostScript files
type:text for ASCII text files
type:html for HTML Web pages
These all seem to work, except for the PowerPoint one. I could not find any results for that search, but I expect that will be fixed soon. All of these file types also have [cached] copies, which are HTML or ASCII versions of the formatted files. This is a very useful addition. None of these file types or commands are listed on either the advanced search page or the help page.
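Since none of these commands are documented on the help page, a small lookup table can serve as a reminder. The sketch below just composes a query string from the type: values listed above. The function and table names are hypothetical, and nothing here talks to Gigablast itself.

```python
# File types from the announcement, keyed by the value used after "type:".
SUPPORTED_TYPES = {
    "pdf": "Adobe Acrobat PDFs",
    "doc": "Microsoft Word documents",
    "ppt": "PowerPoint presentations",
    "xls": "Excel spreadsheets",
    "ps": "PostScript files",
    "text": "ASCII text files",
    "html": "HTML Web pages",
}


def limit_to_type(query: str, file_type: str) -> str:
    """Return the query with a type: limit appended for a known file type."""
    key = file_type.lower()
    if key not in SUPPORTED_TYPES:
        raise ValueError(f"unknown file type: {file_type}")
    return f"{query} type:{key}"


print(limit_to_type("annual report", "xls"))
```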
Matt Wells of Gigablast announces that "Gigablast now indexes PDF documents." To limit a search to PDF files, Gigablast uses a different command than the other search engines: type:pdf rather than the more standard 'filetype:'.
To exclude PDF files, add type:text to a search. Matt also says that Gigablast "will support other file types in the future." Gigablast review updated.
But remember, Gigablast defaults to OR, so a search like nutrition type:pdf is actually looking for any page with 'nutrition' OR any PDF file. The nutrition search finds zero results with both. To force it to work as expected, remember to add the + symbol.
The search results display gives a big PDF logo in front of all the PDF files, but most do not include extracts. That makes it hard to determine what the file is about since many PDF file names are not very helpful. On the plus side, Gigablast is the only search engine other than Google that includes an HTML version of the PDF. Click on the [cached] link after any PDF to see the HTML version used for indexing.
It is great to see this included on Gigablast, especially for the cache availability. But in several quick searches, most of the PDFs are fairly short ones. I found few from .gov sites as well. So the underlying database needs to expand, but this is a great start.
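The + workaround for the default OR mentioned above can be sketched in a few lines. This assumes the + goes directly in front of each term, which was the usual convention at search engines of the time. The helper name is hypothetical, and terms that already start with + or - are left alone.

```python
def require_all_terms(query: str) -> str:
    """Prefix each whitespace-separated term with + so every term is required."""
    terms = []
    for term in query.split():
        # Leave terms alone if they already carry a + or - prefix.
        if term.startswith(("+", "-")):
            terms.append(term)
        else:
            terms.append("+" + term)
    return " ".join(terms)


print(require_all_terms("nutrition type:pdf"))
```

The sketch splits on spaces only, so a quoted phrase would need to be handled separately.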
A few minor updates at Gigablast announced today include better keyword highlighting. The default of an OR on multi-word searches remains, unlike all major search engines and hearkening back to search engines of the late 1990s. However, they now put a teal bar at the top of search engine results pages where the default OR was used, which links to an explanation and states
"The results below may not have all your query terms, but may be relevant. Try generalizing your query. [Info]"
Sorry, but I think Gigablast just needs to default to AND like most people expect and the major search engines all do. I find the default OR frustrating enough that I will skip a try at Gigablast just for that reason sometimes. Also, Gigablast announced that "When returning a page of search results Gigablast lets you know how long ago that page was cached by displaying a small message at the bottom of that page." However, you only see that if someone else has done that same search recently. These are different from the dates at the top of the cached page, and Gigablast still does a far better job than any other search engine at honestly stating when they crawled a Web page and the date reported at that time.
Matt Wells, creator of Gigablast, announces the release of Gigablast 2.0, which is supposed to double the speed of query responses, increase the importance of phrase matching in the relevance ranking, and start a full update to the database.
Gigablast has moved beyond being just a one person operation. They now have a management team page with a new Chief Management Officer and Chief Scientist. I hope this means they have some funding and can expand the Gigablast database to include more pages and to refresh them much more frequently.
New search engine Gigablast is starting to expand. Yesterday they launched a Swedish/Scandinavian version at gigablast.nu. It looks like the same database with the addition of a Swedish language limit. The design of the site is much nicer than gigablast.com but the advanced search does not have all the same options. They are accepting advertising which may well help to support continued development of Gigablast. More information is available on their About page.
Gigablast may be around for a while if it can make a go at offering site search to paying customers. The announcement appeared on their site today stating that "for a teeny fraction of the other guys' prices you can have an account on Gigablast.com that can support millions of web pages." Of course, the monthly price for a million page site is still US$2,500.
GigaBlast launched in beta today. While much smaller than the recently launched Openfind, it offers some nice advantages. It includes cached copies of the pages it indexes, like Google. It includes an advanced search, date sorting, field searching, and excellent reporting of both the date spidered and the last modified date. It does lack full Boolean, truncation, and other advanced search features. See the Search Engine Showdown review for more on its search features.