October 2003 Archive
Sometime between June and October, Gigablast took away the option to sort by date on the advanced search form. Gigablast was the only search engine to offer that option, so it is a shame to see it disappear. At least Gigablast still reports the date indexed and the last modified date stamp as of the last crawl. Also, the Add URL page is "temporarily disabled."
With Amazon's launch of a searchable database of the full text of over 120,000 books, it comes as no surprise that Google is also in talks with publishers to do something similar. Publishers Weekly reports that Google "has reached agreements that allow it to enter as many as 60,000 titles in its database and also presented extensive mock-ups to publishers of how book-relevant searches will look."
On top of talking to publishers, Google is also working with OCLC to include a subset of OCLC's WorldCat database of library holdings in regular Google results. According to an Information Today NewsBreak, these results could start appearing at Google in November. This is part of the Open WorldCat Project.
How Google will implement either initiative, if at all, will be interesting to see. If none of the other search engines do something similar, then Google will have a unique component to its database with library holdings records and/or full-text search access of books.
Yesterday, Amazon introduced a major new searchable database called Search Inside the Book. It covers the full text of over 120,000 books, and the pages that match an Amazon book search query can be displayed. From a matching page, you can then view two pages forward and back.
This is a significant resource for many information needs, from finding books in a local library collection that have specific information to finding references and quotations. However, the Authors Guild has raised concerns about this new database possibly infringing on authors' rights.
Also, for the information professional, it takes a bit of work to use this search feature. It is designed to help sell more books and does not have a separate search form. These "inside the book" matches are displayed after title and keyword matches. Using the Amazon Books advanced search can help, or just try a search on very specific words.
About.com's owner PRIMEDIA announces that it has entered into a four-year agreement with Google to place Google AdWords ads on the About.com meta sites. In addition, as part of the deal, Google is buying About's Sprinks (the current pay-per-click ad network running ads on the About.com sites). The Google ads are not yet appearing on About, but if the deal cuts down on the pop-up ads and the very heavy advertising that currently appears on About.com sites, it will be a welcome relief for anyone who tries to view the quality text content on those sites. This deal also shows Google's continued movement into being an advertising network.
A new search engine from Australia, Mooter Search, is available in beta, although the site has gone down a few times recently. Mooter divides results into clusters with a diagram. Clicking the "Moot Quicker!" button gives the clusters on the left and search results on the right. The underlying database is not clearly identified, but according to Pandia, Mooter uses a combination of meta search and some of its own crawling.
Google has moved one of its Google Labs projects into the mainstream. The Google Glossary function is now available directly from Google in two ways using "define." Enter a search that starts with "define" and the first Google Glossary result shows at the top. For example,
define environmental protection agency. To see all the definitions, use "define:" as in
define:environmental protection agency. For phrases, it makes no difference whether quotations are used or not. This can work well for acronyms, too.
Note that the definitions found come from an automatic pattern recognition program that tries to identify definitions on Web pages. Many of these are inaccurate and some are just plain wrong. Use this for getting a sense of common definitions on the Web, not for a definitive answer, unless you trust the originating Web site.
Further integrating Teoma search features, Ask Jeeves now has an advanced search page. It has all the same features as the Teoma advanced search page except for the ability to get more than 10 results at a time and to open results in a new browser window. However, both of these options are now available at Ask Jeeves via the "Your Settings" preferences page. In addition, an adult content filter is available there which is not available from Teoma. No link to the advanced search page is available from the main Ask Jeeves page at this point, but there is a link on every results page.
Google Alert announces new delivery options. "Results can now be delivered as email, HTML, RSS 1.0, RSS 2.0 or TrackBack feeds." It also now includes direct links to Google's cache.
Gigablast makes another bold move -- indexing, searching, and displaying of all generic meta tags. Previously, some search engines would index words in a meta keyword or meta description tag. Some would use the meta description tag for the summary in the search engine results list. But none of the major search engines ever indexed any other meta tags. Now, Gigablast claims to be indexing all "generic" meta tags.
In addition, Gigablast can display the meta tags in the results list. Doing this requires adding commands to the URL of the results list. At the end of the URL, add a
&dt= followed by the word(s) for the meta tags, followed by a colon, and then a number to represent how many characters from each meta tag should be displayed.
So, for example, adding
&dt=keywords+author+generator+description:30, will display the meta tag content for meta keywords, meta author, meta generator, and meta description tags for any records retrieved. Use a + between meta tag words.
It seems that this "generic" meta tag approach excludes more complex meta tags like Dublin Core which use a syntax like
DC.Creator. The dot syntax will not work for the display command, although Gigablast does index some of the content of these tags.
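The display command described above can be assembled programmatically. The following is only a minimal sketch: the function name `add_meta_display` and the base results URL are hypothetical, and only the `&dt=` suffix format (tag names joined by +, then a colon and a character count) comes from the description above.

```python
from urllib.parse import quote_plus


def add_meta_display(results_url, tags, max_chars):
    """Append the &dt= display command to a results-list URL.

    tags      -- meta tag names to show, e.g. ["keywords", "author"]
    max_chars -- how many characters of each meta tag to display
    """
    if not tags:
        raise ValueError("at least one meta tag name is required")
    if any("." in tag for tag in tags):
        # Dotted names such as DC.Creator are reported not to work with &dt=.
        raise ValueError("dotted meta tag names are not supported by &dt=")
    return f"{results_url}&dt={'+'.join(tags)}:{max_chars}"


# Hypothetical results URL; the real address and query parameter may differ.
base = "http://www.gigablast.com/search?q=" + quote_plus("search engines")
print(add_meta_display(base, ["keywords", "author", "generator", "description"], 30))
```

Rejecting dotted names up front simply mirrors the observation that the dot syntax does not work for the display command; it says nothing about what Gigablast indexes.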
These are great new features, but the display is difficult to get. I hope that the advanced search page soon includes options for displaying various meta tags and that a field search for meta tags will be added as well. Let's hope that unscrupulous Web sites do not abuse these meta tag abilities.
Gary Stock, founder of Google Whacking, has posted information about recent strange problems at Google. He has dubbed these GoogleNACK (as in Negative ACKnowledgements) and offers detailed examples. Seth Finkelstein postulates that the malfunction is related to Google's spam defenses. As of today, some of these searches are fixed, but others like
keyboard bracelet and
motorcycle candle fail with "Results 1 - 19 of about 48,600" and "Results 1 - 69 of about 64,000" respectively.
In another unrelated (I assume) peculiarity, a search on Google for pages only on Google's own Web site (using site:www.google.com) and searching for the word "google" finds several results that are on completely different hosts. Reported on Slashdot on Oct. 6, the inconsistent results continue. As of Oct. 11, a search on
site:www.google.com google should only find pages at Google. Yet with the number of hits set to 100, some records come up from adobe.com, digits.com, osdn.com, and even washington.edu.
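For anyone who wants to check a saved list of result URLs for this kind of stray record, a small script can flag every hit whose host falls outside the site: limit. This is a rough sketch only: the function name `off_site_results` and the sample URLs are made up for illustration, and the matching rule (exact host or a subdomain of it) is an assumption about what a site: limit should mean, not a statement of how Google implements it.

```python
from urllib.parse import urlsplit


def off_site_results(result_urls, site):
    """Return the result URLs whose host does not fall under `site`."""
    site = site.lower()
    stray = []
    for url in result_urls:
        host = (urlsplit(url).hostname or "").lower()
        # Assumption: a hit is on-site if the host equals the site
        # or is a subdomain of it.
        if host != site and not host.endswith("." + site):
            stray.append(url)
    return stray


# Invented sample URLs, standing in for a copied results list.
results = [
    "http://www.google.com/about.html",
    "http://www.adobe.com/products/",
    "http://www.washington.edu/",
]
for url in off_site_results(results, "www.google.com"):
    print("off-site:", url)
```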
The Google Inconsistencies page has been updated with these problems.
Yesterday, LookSmart announced that their results will no longer show up on MSN Search after Jan. 15, 2004. According to a SearchDay report, MSN will move up Inktomi results (which already display when no matches in LookSmart are found or when you use the advanced search) when LookSmart results are gone. Eventually, Microsoft plans to also replace Inktomi with its own custom-built search product, but that product is not ready yet.
So will LookSmart survive? Today, LookSmart announced that it is getting into the ad bidding engine business with what it calls a "bid for placement" program. This will compete with pay-per-click ad bidding programs at Google and Overture. LookSmart continues to have several partners including CNET, Road Runner, InfoSpace, LookSmart.com, Cox Internet, and others, but MSN accounted for over half of its income from such sources. LookSmart also still has the WiseNut search engine, but there has not been much publicly visible development of WiseNut and it tends to have quite old data.
Google announces that AOL has agreed to continue using both the Google Web database and Google's ads. Called a "multi-year alliance," this renews the AOL deal that started in May 2002 when AOL announced a switch from Overture ads and an Inktomi Web database to Google for both. The Google Web database did not go live on AOL until July 31, 2002, so it has only been a bit over a year since AOL switched to Google.
With the renewal come several changes to AOL Search:
- Addition of the Google Images database (with "strict" filtering on) as a separate tab
- New People Search tab for AOL members for searching AOL Chat Rooms, Message Boards, Home Pages and Groups
- Local searching capabilities for AOL members
- Popular Searches, in a box listed as "Hot Searches"
The directory continues to be the Open Directory. The Google Web database has the English language limit turned on by default. No advanced search page is available nor any ability to change defaults. Two pages per site are shown, but without the indentation available at Google.
AOL offers no compelling reason for professional searchers to go to their site rather than direct to Google. Of more interest is the AOL Hometown page which has a separate search interface to AOL "Journals" (i.e., blogs) and member home pages. While some of the member home pages show up in the regular AOL Search, Google, and other search engines, the journal pages and some of the member home pages do not. So you may find additional content via AOL Hometown that is not elsewhere searchable.
Now AltaVista, AlltheWeb, Inktomi, and Overture are all owned by Yahoo! The press release quotes CEO Terry Semel, "We are excited to combine the two companies to build the largest position in the rapidly growing Internet advertising market." While the ad market is what pays for the search engines, the real question is which of these search engines will continue and at what sites? For now, AltaVista and AlltheWeb continue to be available at their historic locations, and they may share the same underlying database very soon. Already, AlltheWeb has lost a few search features like the URL Investigator no longer displaying the number for the links, but overall both still work with all of their old search features. Inktomi still is the back-end search engine at MSN Search and remains available at HotBot.
Usually, search engines will replace all punctuation marks with a space when they index Web pages. And if you use a punctuation mark between words in a query, the search becomes a phrase search. In other words, a search on
import-export is the same as
"import export". However, Google has a couple exceptions to this rule for two characters: the ampersand & and the underscore _. Both can be searched by themselves or as part of a character string. In other words, a search on
adv_search gets different results than
"adv search" and
&tc differs from
tc. And for programmers, while it would not search # or + in most cases, it does search
c# and
c++. It does not, however, differentiate
c and
c+. Other punctuation marks may change the sorting of results. So Google treats some punctuation marks differently, and that treatment has changed over time as well.
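The general rule and the Google exceptions can be illustrated with a toy query normalizer. This is a sketch under stated assumptions, not how any search engine actually parses queries: the function name `normalize_query` and its `kept` parameter are hypothetical, only letters and digits count as word characters, and phrases that are already quoted are not handled.

```python
def normalize_query(query, kept=""):
    """Replace punctuation inside each term with spaces.

    A term that splits into several words becomes a quoted phrase.
    Characters listed in `kept` are left searchable (for the Google
    exceptions described above, pass kept="&_").
    """
    terms = []
    for token in query.split():
        cleaned = "".join(
            ch if ch.isalnum() or ch in kept else " " for ch in token
        )
        words = cleaned.split()
        if len(words) > 1:
            terms.append('"' + " ".join(words) + '"')
        elif words:
            terms.append(words[0])
    return " ".join(terms)


print(normalize_query("import-export"))          # general rule
print(normalize_query("adv_search"))             # general rule
print(normalize_query("adv_search", kept="&_"))  # with the two exceptions
print(normalize_query("&tc", kept="&_"))         # with the two exceptions
```

Under the general rule, import-export and adv_search both turn into phrase searches; with the ampersand and underscore kept, adv_search and &tc pass through unchanged, which is why they can return different results from their punctuation-free forms.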
I've updated the "unique" section of my Google Review. I also updated the site search page by finally removing the defunct Northern Light search box and the defunct xrefer search box. I also updated the Reference Search Tools page.