September 2002 Archive
The Google dance appears to have begun yesterday and there is much weeping and gnashing of teeth in the optimization community. The Webmaster World forum thread discussing the update already has over 430 posts since it started yesterday morning. What is the Google dance? It occurs when Google is launching a new database, first on www2.google.com or www3.google.com and then eventually on the main site. It can take several days for the whole dance to finish. So right now on the main www.google.com, the bulk of their database appears to have come from an early August crawl. The database on www2 is from a late August to early September crawl. So searchers take note. Try www2 for the more current records, but expect changes over the next few days from what you got at Google earlier this week.
So why all the frantic discussion in the forums? It seems that Google may have made a more significant change than usual to their relevance ranking algorithms. According to a related Webmaster World thread, the changes have moved Microsoft out of the top spot for a phrase search on "go to hell" and perhaps have increased the importance of anchor text from the Open Directory. Again, the point for searchers is that the results will likely change compared to what you have seen. Whether the relevance ranking is better or worse remains to be seen, but it will probably depend greatly on the search terms.
Gigablast may be around for a while if it can make a go at offering site search to paying customers. The announcement appeared on their site today stating that "for a teeny fraction of the other guys' prices you can have an account on Gigablast.com that can support millions of web pages." Of course, a million-page site still costs US$2,500 per month.
I've made minor updates to several reviews including HotBot, Lycos, MSN, Fast Search, and Inktomi. Changes include updates on which Inktomi features work at HotBot and MSN, and a note on how to get the MSN advanced search to work without a search term. (Thanks, Gary, for that tip.) I've also added the Position Tech Inktomi search to the Inktomi review.
Daypop, the recent news and Weblog search engine, appears to be back up after being down for several weeks. The front page still says that it is out of disk space, but it is working again. The Top 40 and Top News are not yet functioning, but the search engine is. For more on Daypop and blogging, see my article in the latest issue of ONLINE: "The Blog Realm: News Sources, Searching with Daypop, and Content Management." ONLINE 26(5): 70-72, Sept.-Oct. 2002.
While Google gets lots of press for the relaunch of its news search, AlltheWeb has been busy the past few weeks. As of yesterday, they have started rolling out a dynamic keyword-in-context (KWIC) extract in the results list. Available at Lycos for a while now, this feature is finally making its way into the AlltheWeb results. This is the kind of display Google usually provides, where the extract contains the actual search terms along with some of the surrounding text.
In addition, AlltheWeb has a new field search of site: which is more precise and easier to remember than their older url.domain:. I've updated the AlltheWeb Review and the example in the Fields section notes that site:www.total.com finds different results than
Give it a try, but expect that there may be changes to the way it works over the next few weeks.
Google News has greatly expanded its number of news sources (to "approximately 4,000") and the depth of its archive. It also has a newly redesigned look and has finally added the News Tab on the main page and on search results pages. According to the About page, "Google News continuously crawls more than 4,000 news sources from around the world. This number will continue to grow as we develop the service further" and it now "includes articles that appeared within the past 30 days." There is still no advanced search, although Tara points out that adding &num=100 to the end of a results URL will give 100 results at a time. Even easier, just change your regular Google preferences to default to 100, and you don't even need to add the special code.
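For anyone who wants to script the &num=100 tip, here is a minimal Python sketch. The function name and the example results URL are my own assumptions for illustration, not anything documented by Google; the only detail taken from the tip is the num=100 parameter itself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_result_count(results_url: str, count: int = 100) -> str:
    """Return results_url with its num parameter set to count."""
    parts = urlsplit(results_url)
    # Drop any existing num value so the parameter is not duplicated.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "num"
    ]
    params.append(("num", str(count)))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical results URL, used only to show the shape of the change.
example = "http://news.google.com/news?q=search+engines"
print(with_result_count(example))
```

Adding the parameter by hand in the address bar does the same thing; the sketch just avoids ending up with two num values when one is already present.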
I can't say I'm impressed with the "Google News is highly unusual in that it offers a news service compiled solely by computer algorithms without human intervention" boast or the lack of a list of those 4,000 sources. However, the results are certainly much broader than what was offered before.
Whether it is a temporary glitch or a permanent change, Yahoo! is not giving "Research Documents" as another category in their search results today. Previously, the "Research Documents" link showed up on a search results page after "Web Pages" and "News." The link was to full-text articles from divine, Inc.'s Northern Light Special Collections. A Yahoo! Help file still describes them, but the link is gone today.
I just noticed today that WiseNut no longer displays a number in the upper left corner. Formerly, WiseNut posted "1,571,413,207 Web pages and counting!" there. Now that they have finally launched a fresher database (as I posted earlier), apparently they are either no longer counting or, more likely, it is a smaller overall database. Strangely enough, the old number is still up on their corporate contact page.
I am finally starting the slow process of updating the site design for Search Engine Showdown. At this point, part of the site has been converted, and over the next few weeks, I hope to finish the rest. In addition to the redesign, news and updates from the past few months are finally being posted. See in particular the new reviews for Gigablast and Openfind, an updated search engine features chart and the search engines by features page. There is a new news archive which includes subject access to news postings (at least for those since about May 2002).
I know there is a lot more to be done, but if you have any opinions about the new design or other comments about the site, email me at email@example.com. Oh, and if you want to link to a particular news post, use the [link to this story] link at the end for a story-specific URL.
At some unknown and unannounced point in the past month, WiseNut finally refreshed its database. For most of the past year, WiseNut had almost no new content from any later than July 2001. By this past July, it was a year-old database. Now, it has launched a new database (even though it still claims the same 1.5 billion pages) which appears to be primarily from May 2002. So while it is still not very up-to-date, it is much fresher than it used to be.
Gary Price points out that some changes are going on at the Google News search. Search engines like to experiment by giving one out of, say, a thousand queries the experimental interface or results and then gauging users' reactions. That makes it hard for the rest of us to see the details of the experiment unless someone grabs a quick screen shot. Just earlier this week on Yahoo! I noticed that the "Web Pages" link was not highlighted unless you clicked on one of the other links first. And the Powered by Google had moved way down to the bottom. Was this the beginning of a change to another search engine or an attempt to lessen the amount they pay to Google? Or was it just Yahoo! experimenting with a different approach? Time may or may not tell.
Interesting article on FAST from the unrelated but similarly-named Fast Company magazine.
AlltheWeb has extended the deadline for the AlltheWeb Alchemist Contest to Oct. 31, 2002.
I have been disappointed for a while that the Wayback Machine from the Internet Archive has had no new pages included for most of this year. In today's TVC News, Gary Price reports that he got a reply from them saying that "they are about six to eight months behind in adding data to the archive. But they expect to make pages from the first half of 2002 available during the next four weeks."
People posting in the WebMaster World forums report that both HotBot UK and HotBot Germany have abandoned Inktomi and moved to Fast. Considering Lycos' stake in FAST, it is surprising this did not happen sooner, but it has not yet changed at the U.S. HotBot.com. Unfortunately, the new underlying data at these HotBots has not fixed some problems, such as putting a 'see results from this site only' link even when there is only one hit from that site.
AltaVista issued a press release about their search engine being blocked in China. It includes several ways for Chinese users to get around the block by going to other AltaVista sites.
Danny Sullivan reports on Inktomi's new 'conceptual search' which Danny prefers to call anti-proximity. The idea is that for a single-term search, Inktomi's ranking will prefer uses of the term by itself rather than in common phrases. For example, a search on 'york' or 'mexico' will push pages to the top where those terms are used by themselves rather than in other common phrases like 'new york' or 'new mexico.' It's an interesting approach that other search engines may wish to consider.
True search engines do not always like meta search engines that seem to freeload on their hard-built databases and yet contribute no cash to the process. Google usually blocks meta search engines from retrieving Google's results, but now they have reached an agreement with InfoSpace to include regular search results and ads from Google's AdWords database on their meta search engines including Dogpile, Excite, WebCrawler, MetaCrawler, and InfoSpace. See the InfoSpace press release or the one from Google.
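To make the anti-proximity idea a bit more concrete, here is a toy Python sketch. It is emphatically not Inktomi's algorithm, which has not been published; the phrase list, the function names, and the scoring rule (standalone uses of the term count, uses inside a listed common phrase do not) are all my own assumptions for illustration.

```python
import re

# Hypothetical table of common phrases that contain a query term.
COMMON_PHRASES = {"york": ["new york"], "mexico": ["new mexico"]}


def standalone_score(term: str, text: str) -> int:
    """Count uses of term that are not part of a listed common phrase."""
    lowered = text.lower()
    total = len(re.findall(rf"\b{re.escape(term)}\b", lowered))
    in_phrases = sum(
        len(re.findall(rf"\b{re.escape(phrase)}\b", lowered))
        for phrase in COMMON_PHRASES.get(term, [])
    )
    return total - in_phrases


def rank(term: str, pages: list[str]) -> list[str]:
    """Order pages so those using the term by itself come first."""
    return sorted(pages, key=lambda page: standalone_score(term, page), reverse=True)


pages = [
    "New York hotels and New York tours",
    "York Minster is in York, England",
]
print(rank("york", pages))
```

With these two sample pages, the one about York, England sorts ahead of the one about New York, which is the behavior described for a search on 'york'. A real engine would presumably blend a signal like this with its other ranking factors rather than use it alone.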
Pandia reports and translates some interesting experimental work FAST is doing on the display of results. It involves "technology that recognizes person names and geographical locations in all types of documents." You can see a screen shot in the original Norwegian article at Digi.no.