August 2003 Archive
Well, it looks like it took AlltheWeb announcing a larger size than Google to get Google to finally update its claim to 3.3 billion. I don't know if this is related or not, but I have suddenly found a few hits that Google labels as "Supplemental Result" right before the cached link, as in the last record on this search. I'm not sure what this "Supplemental Result" is supposed to be, but the URL is a dead link. I certainly hope Google did not just boost its numbers by adding a bunch of dead links. I rather doubt this is the case, and another such search found a record with the same label, but that one is not a dead link.
Wait . . . a little more searching turns up an answer on Google's How to Interpret your Search Results help page where it says that
"Google augments results for difficult queries by searching a supplemental collection of more web pages. Results from this index are marked in green as 'Supplemental.'" I am assuming that this is new, in part because doing a site search at Google for "supplemental" results in zero hits, even though they have a help page with that answer. But I'd like to see more description somewhere of what this supplemental collection consists of.
Overture announces that its AlltheWeb search engine now has a database of "approximately 3.2 billion." The AlltheWeb home page says 3,151,743,117 is the current number, although it is actually a bit higher. That trumps Google's claim of 3,083,324,652, which has been up since last November. Of course, Google probably has more than that by now, and I expect they'll boost that number on their home page soon. The exact number is less important than recognizing that AlltheWeb has been able to get its database to about the same size as Google's.
It should be a benefit to searchers to have two very large databases. And it does seem to show a commitment on Overture's part to continual improvement of the AlltheWeb underlying database. I have not yet run detailed comparisons, but I expect that both Google and AlltheWeb will find some pages that the other does not have.
This season seems to be the summer of toolbars. HotBot, Infospace, and Ask Jeeves all launch toolbars. Lycos launches its Sidesearch. Google upgrades its toolbar. So now it's AltaVista's turn, and the AltaVista toolbar has some nice features. It does require IE 5+ and Windows 95+. Like Google's, it installs within the IE browser under the other bars. It has the popular AltaVista translation option right on the toolbar and is highly customizable. It has site search and direct access to AltaVista's Web, news, images, audio, and video databases. It also has dictionary, calculator, time, conversion, weather, and other popular information options. It even includes a pop-up blocker, like Google's Toolbar 2.0.
Three days after adding PDFs to Gigablast, Matt Wells announces the addition of PostScript (.ps), PowerPoint (.ppt), Excel (.xls), and Microsoft Word (.doc) files. The search syntax is the same as for PDFs, using
type: as in
type:pdf for Adobe Acrobat PDFs
type:doc for Microsoft Word documents
type:ppt for PowerPoint presentations
type:xls for Excel spreadsheets
type:ps for PostScript files
type:text for ASCII text files
type:html for HTML Web pages
These all seem to work, except for the PowerPoint one. I could not find any results for that search, but I expect that will be fixed soon. All of these file types also have [cached] copies, which are HTML or ASCII versions of the formatted files. This is a very useful addition. None of these file types or commands are listed on either the advanced search page or the help page.
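As a rough illustration of the type: syntax listed above, the following Python sketch assembles a query string restricted to one file type. The function name gigablast_type_query and the FILE_TYPES table are hypothetical helpers made up for this example. They only build text to paste into the search box and do not contact Gigablast in any way.

```python
# Hypothetical helper: builds a Gigablast-style query string by hand.
# The type: values are the ones listed in the post above.
# Nothing here talks to Gigablast; it only formats text.
FILE_TYPES = {
    "pdf": "Adobe Acrobat PDFs",
    "doc": "Microsoft Word documents",
    "ppt": "PowerPoint presentations",
    "xls": "Excel spreadsheets",
    "ps": "PostScript files",
    "text": "ASCII text files",
    "html": "HTML Web pages",
}


def gigablast_type_query(terms, file_type):
    """Return the search terms followed by a type: restriction."""
    if file_type not in FILE_TYPES:
        raise ValueError(f"unknown file type: {file_type!r}")
    return f"{terms} type:{file_type}"


print(gigablast_type_query("nutrition", "xls"))
```

Rejecting unknown values is a design choice for the sketch only, since a mistyped type: value in a real search would presumably just return nothing useful.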
Back in May, Google's intitle: and inurl: were not working properly, as I posted earlier. Well, they now seem to be working again. A search that combines a general query term with these field searches, like "market research" intitle:tourism, now works. I've updated my Google Inconsistencies page to note that the problem has been fixed, but I added another report of a strange result for the simple query of 'cameras.'
LookSmart results are now on Lycos, as announced in July. Trying to figure out which results are from which database at Lycos is now rather confusing. It appears that ads come from a combination of Overture and Lycos' own AdBuyer. The first 2-4 numbered "sponsored link" listings are from Overture. The rest of the first six numbered listings are from Lycos' own AdBuyer program. On the right margin in colored boxes are four more AdBuyer ads.
The Web Results section may contain hits from three different databases. For common queries, the first listing is a collection of Lycos Network content, which may contain links to special partners (advertisers). Then, for about 50,000 keywords, there are ten listings from LookSmart. These are called "commercial listings" and are basically ads. Then after that come the results from the FAST Web search database.
Also, the advanced search seems to be broken today. I could not get any results from it. When that is fixed, it may go directly to the FAST results without all the other links.
The "Fast Forward" function has been renamed "Sidesearch," and Lycos has released "Sidesearch" as a separate "Search Comparison Tool." Like the toolbars all the other search engines are now offering, Sidesearch installs in IE 5.01 or higher on computers running Windows 98 or higher. What it does is run a Lycos search in the left sidebar when you search at another search engine like Google. It's an interesting way to try to move users to Lycos, but it has to be installed first.
The Lycos review has been updated.
Matt Wells of Gigablast announces that "Gigablast now indexes PDF documents." To limit a search to PDF files, Gigablast uses a different command than the other search engines:
type:pdf rather than the more standard 'filetype:'.
To exclude PDF files, add
type:text to a search. Matt also says that Gigablast "will support other file types in the future." Gigablast review updated.
But remember, Gigablast defaults to OR, so a search like
nutrition type:pdf is actually looking for any page with 'nutrition' OR any PDF file. The nutrition search finds zero results with both. To force it to work as expected, remember to add the + symbol.
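Because the default is OR, every term that must be present needs its own + prefix. The sketch below, in Python, shows one way to add the prefix mechanically. It assumes that the + goes in front of every term, including the type: restriction, which the announcement does not spell out. The function name require_all is hypothetical, and the code only rewrites the query text.

```python
def require_all(query):
    """Prefix every whitespace-separated term with + unless it already has one.

    Assumption: + is accepted in front of type: as well as ordinary words.
    Quoted phrases are not handled; each word is treated separately.
    """
    return " ".join(
        term if term.startswith("+") else "+" + term
        for term in query.split()
    )


print(require_all("nutrition type:pdf"))
```

Terms that already carry a + are left alone, so running the helper twice on the same query does not double the prefix.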
The search results display gives a big PDF logo in front of all the PDF files, but most do not include extracts. That makes it hard to determine what the file is about since many PDF file names are not very helpful. On the plus side, Gigablast is the only search engine other than Google that includes an HTML version of the PDF. Click on the [cached] link after any PDF to see the HTML version used for indexing.
It is great to see this included on Gigablast, especially for the cache availability. But in several quick searches, most of the PDFs are fairly short ones. I found few from .gov sites as well. So the underlying database needs to expand, but this is a great start.
Following in the footsteps of AlltheWeb, Google now has a built-in calculator function. It lets you use numbers or the words for numbers in mathematical equations, unit conversions, and physical constants. Only a brief description of the functions is available on the calculator section of the help page. One "Easter Egg" in the calculator comes up when searching answer to life the universe and everything, where it displays '42,' the answer from Douglas Adams' The Hitchhiker's Guide to the Galaxy.
Google has added an alert service for its news databases. Google News Alerts is in beta and is also listed on the Google Labs page. With the demise of other free alert services, especially Northern Light's current news alerts, this is a great addition for anyone who wants to keep up with the latest news. Just be careful not to choose search terms that will return too many hits. The default "once a day" option should help if you do, but be careful with the "as it happens" choice.
Google has introduced a new operator, the tilde ~, for searching for synonyms. Place it immediately before the search term for which you want Google to look for synonyms, with no space in between. For example, a search on query ~analysis finds matches with query statistics and query analyzer. A brief entry about ~ is available on their help page.
Using some of the technology behind the Google Sets, the ~ seems to include plural and singular forms as well as synonyms. Use the - operator to get a sense of what synonyms have been searched, as in ~hiking -hiking. Some of the automatically generated terms may not be helpful, but when you are not aware of the vocabulary in a field, this could be quite helpful.
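The ~term -term trick above is easy to generate for any word. This small Python sketch builds that probe query as plain text. The function name synonym_probe is hypothetical, and the code does not send anything to Google.

```python
def synonym_probe(term):
    """Build a query that asks for synonyms of term while excluding term itself.

    The ~ and - are attached directly to the word, with no space,
    as described in the post.
    """
    return f"~{term} -{term}"


print(synonym_probe("hiking"))
```

Pasting the printed query into the search box should surface pages that match only the automatically chosen synonyms, which gives a sense of the vocabulary Google has added.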
Google review updated. I put the synonym operator under the Truncation section, since that is at least one use that can be made of it.