Site Updates Category Archive
I know it has been a long time (several years) since I have done much with SearchEngineShowdown. Now that a very busy few years are winding down, I will try to post more frequently and start updating much of the old content on the site. For readers still with me, thanks for your patience!
For some time now I have been speaking and writing about ways of speed searching and search switching. Somehow, I've neglected to add the links to my site. So I'm fixing that tonight, before my presentation on Wed. at CIL 2008. The new Search Switching page includes sections for Search Switching Between Web Search Engines, Geographic Search Switching, Book Search Switching, and other options including another link to my Bookmarklets page (with its search transfer bookmarklets).
See also my article, "Speed Searching," in the March 2008 issue of Online (available for fee at ITI's InfoCentral or free from many library databases such as AccessMyLibrary) and my article, "Switching Your Search Engines," from the May 2007 issue of Online (available from many library databases including AccessMyLibrary).
In recent months I have been speaking and writing about some of the language search and translation features of the search engines. Which search engine has the most language limits? Which online translator has the most language pairs? And which ones offer translated search? (Exalead, Yahoo!, and Google, respectively.) So I've added a new Language Search Tools page, with links to pages about Language Limits, Online Translation, and Translated Search. I have also finally made a major update to my Search Engines by Search Features page and linked the language page from there.
For many searchers, especially those of us in the middle of a fairly mono-lingual part of the U.S., the language tools may have little appeal, but even in the midst of Montana, I still find times that I come across a non-English site, email, or term that can benefit from the use of these tools.
I've updated my search bookmarklets page due to changes from some of the search engines including Gigablast's new interface, finally changing msn.com to live.com in the code, and the addition of several new links:
- An animated .gif of the search transfer bookmarklets
- Moving the search transfer bookmarklets to the top
- A new bookmarklet for numbering Yahoo! results
- Updating the bookmarklet for numbering Google results
- Gigablast and Exalead search box links
I've made a fairly major redesign to several core aspects of the site, including this blog. It will be a rather slow process before I get all the pages fixed, especially since I'll be updating much of the content as well. So be patient, and if you see any egregious errors (or especially display problems), please let me know with the feedback form! Thanks for your patience. Once I've processed more of the update, I should be posting more regularly again.
Over the next month, I'm planning lots of updates to the site both on the backend and for the content. For anyone tracking the RSS feed, please note a change in the feed address. The index.rdf will eventually go away. The new address is at feeds.feedburner.com/sesnews. I'll eventually have a redirect for the file. If you notice any broken links, please let me know!
I have updated the search bookmarklets page. It includes several fixes to the older bookmarklets and the addition of several select and transfer bookmarklets, so that Ask Jeeves, Exalead, Gigablast, Google, MSN Search, Teoma, and Yahoo! are all included. I also added a Yahoo! site search bookmarklet and one for numbering MSN results. For those new to bookmarklets, the page includes instructions and a demonstration of how to set up and use the bookmarklets.
I don't know how recently this changed, but Yahoo! used to not search the stop words within a phrase search. In other words, searching for "difference in principle" would find matches with "difference of principle." While this is a good thing for phrase searching, it breaks several neat tricks you could do with Yahoo! In particular, the Yahoo! hack to get it to search for a wildcard word within a phrase search no longer works. This also breaks Tara's YNAPS -- Yahoo Non-API Proximity Search.
Google went through this same process a few years ago. However, they began allowing the use of an asterisk * to be a wildcard word in a phrase when they started searching stop words in a phrase. I would love to see Yahoo! do the same (and for that matter allow the * to function as a regular truncation symbol)! Until that happens, for proximity searching, we now only have Exalead with its NEAR operator and the unofficial GAPS. Yahoo! Search review updated.
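To make the difference concrete, here is a minimal Python sketch of phrase matching with and without stop-word handling. It is only an illustration, not how Yahoo! or Google actually implement it: the stop-word list, the function names, and the assumption that an ignored stop word matches any single word in that position are all mine.

```python
import re

# Assumed, illustrative stop-word list; real engines use their own lists.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_matches(phrase, document, ignore_stop_words):
    """Return True if the phrase occurs in the document as consecutive words.

    With ignore_stop_words=True, a stop word in the phrase matches any single
    word in that position (the old behavior described above). An asterisk in
    the phrase always matches any single word (the wildcard-word behavior).
    """
    pattern = phrase.lower().split()
    words = tokenize(document)
    for start in range(len(words) - len(pattern) + 1):
        window = words[start:start + len(pattern)]
        if all(
            p == "*" or (ignore_stop_words and p in STOP_WORDS) or p == w
            for p, w in zip(pattern, window)
        ):
            return True
    return False
```

With stop words ignored, the phrase difference in principle matches a page containing "difference of principle"; once stop words are searched, only an explicit wildcard such as difference * principle still finds it.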
With MSN's new database and search engine, I've finally updated the MSN Search review to reflect the changes. I also left up the old review of MSN Search when it was using the Inktomi database from Yahoo! for the sake of comparison.
I posted a new showdown based on how well the search engines handle very long words. The long word showdown found that Gigablast ranked best for long word searching since it could handle a query with a word 1,896 characters long. Google can't handle a 155 character query that both MSN and Yahoo! can find. Who will ever search a word that long and who cares? Probably only search geeks, but hey, I was curious about it.
Although I've had a review of Yahoo! as a directory for several years, now that Yahoo! has launched its own search engine, I've made a first attempt at a review of its search features. Since it is fairly new, I expect to see the features change over the next few months, but at least I have something up that seems accurate as of today. A few notes about the current version of Yahoo! Search and items highlighted in the review:
- The Yahoo! database appears to primarily be an Inktomi-like database, but there are significant differences from other Inktomi-based search engines like MSN Search and HotBot.
- Both cached copies of pages and HTML versions of PDF and other file types are available
- Only the first 500 KB of a document are indexed, which is better than Google's 101 KB but still short of the full document indexing that has been available at AlltheWeb
- Full Boolean searching using AND, OR, NOT, and parentheses for nesting seems to work
- Field searches are available with intitle:, inurl:, site:, link:, hostname:, and url:
- The new search engine database is available on the main Yahoo! site and directly at search.yahoo.com.
Usually, search engines will replace all punctuation marks with a space when they index Web pages. And if you use a punctuation mark between words in a query, the search becomes a phrase search. In other words, a search on
import-export is the same as
"import export". However, Google has a couple exceptions to this rule for two characters: the ampersand & and the underscore _. Both can be searched by themselves or as part of a character string. In other words, a search on
adv_search gets different results than
"adv search" and
&tc differs from
tc. And for programmers, while it would not search # or + in most cases, it does
c. It does not, however, differentiate
c and both
c+. Other punctuation marks may change the sorting of results. So Google does some different treatment of punctuation marks, and it has changed over time as well.
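The general rule and the two exceptions can be sketched in a few lines of Python. This is only a rough model under my own assumptions (the function name and the keep parameter are hypothetical, and real engines do far more than this), but it shows why some queries collapse to the same terms and others do not.

```python
def to_terms(query, keep=""):
    """Split a query into lowercase terms, treating punctuation as spaces.

    Characters listed in keep are preserved as part of a term instead of
    being replaced by a space. Passing keep="&_" models the two exceptions
    described above; the default models the usual rule.
    """
    chars = []
    for ch in query:
        if ch.isalnum() or ch in keep:
            chars.append(ch.lower())
        else:
            chars.append(" ")
    return "".join(chars).split()


# Usual rule: the hyphenated query and the quoted phrase give the same terms.
same = to_terms("import-export") == to_terms('"import export"')

# With the two exceptions kept, these queries no longer collapse together.
different = to_terms("adv_search", keep="&_") != to_terms('"adv search"', keep="&_")
```

The sketch ignores the fact that adjacent terms produced from punctuation are then searched as a phrase; it only models which characters survive into the terms.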
I've updated the "unique" section of my Google Review. I also updated the site search page by finally removing the defunct Northern Light search box and the defunct xrefer search box. I also updated the Reference Search Tools page.
For more than a month now, the intitle: and inurl: field searches have been broken. I first heard of this on May 27, 2003. The advantage of intitle: and inurl: over the advanced search page Occurrences section or the allintitle: and allinurl: field searches was that they applied to only a single term and could be combined with other search terms that would look through the whole record. So now, searchers cannot do a search that looks for one word in the title and another in the body. A search that attempts this, such as "market research" intitle:tourism, retrieves many results that do not include 'tourism' in the title.
At first I thought this was a temporary glitch from the strange May update, but it has persisted through the June update and has continued for some time. Hopefully it will be corrected sometime soon. I've updated the Google Inconsistencies page with this problem and several other long-term problems.
In addition, I updated several parts of the Google Review, including the addition of several language limits added in early 2002 that I had missed: Croatian, Indonesian, Serbian, Slovak, and Slovenian.
I have posted new results from a comparison of the freshness of the search engines' underlying databases. With data from May 17, there is a wider spread of crawling this time. AltaVista had the most recent page, from the same day as the comparison, followed by Inktomi and AlltheWeb. Google's most recent page was two days old. And although AlltheWeb still has a few extremely old pages and even Google has some stragglers from over five months ago, most of the records show that the crawling was primarily from roughly one month before the comparison. Note that the results used for this comparison all linked to pages with the current day's date.
I finally got several pages updated on the site, including a revised HotBot Review reflecting the current version, in particular how it handles Inktomi. I also updated the search engine chart and the search engines by features pages, removing Openfind and NLResearch and updating the HotBot and MSN lines.
I've finally updated the listings on my Other Internet Search Tools page which covers searchable sources for articles, forums, email lists, blogs, etc. Two new pages linked from there are the Reference Search Tools covering just a few selected free online reference tools and the Archives page with sources for cached copies of Web pages and other ways to find old or dead pages.
As of Feb. 28, 2003, Northern Light Current News has stopped being updated at both northernlight.com and nlresearch.com. I suppose it is not too surprising, since divine, Inc. (owner of Northern Light) filed for bankruptcy last week. Northern Light Current News search was a great resource because rather than searching and crawling news Web sites, it had access to actual wire news feeds. Oh well, another great resource appears to be headed for the dust bin.
Googlert and SearchAlert.net are two new free services that offer email alerts when new search engine results are available. Googlert was launched in January, but I'm not sure when SearchAlert.net started. Googlert works only on Google and does require registration for a free Google API key. SearchAlert.net says that it "continually monitors the big Web search engines" but does not specify which ones. Alerts page updated with both of these.
AlltheWeb has added full Boolean searching from the advanced search page. After selecting the "boolean expression" drop down option, you can now use and, or, andnot, and parentheses for nesting. In addition, they have introduced a 'rank' operator, language detection, and new search tools.
First of all, the full Boolean searching with all three operators and nesting is a welcome addition. But it takes a bit to find where to use it. You have to use the Advanced Search and then select "boolean expression" from the drop down menu. Note that the NOT operation uses 'andnot' with no space. The old +, -, and the parentheses for an OR do not work when "boolean expression" is chosen, and the Boolean operators will not work unless it is, so be careful. The 'rank' operator also only works with "boolean expression" chosen and is supposed to boost results that contain the specific keyword. So a search such as
term1 or term2 rank term3 should change the ranking so that those records with term3 score higher, although it is still not required. But the 'rank' operator sometimes does strange things, so be wary of the results.
AlltheWeb now tries to identify searchers based on their IP address. It will then default to the main language or languages of that country plus English. This will appear on the simple search form as the default language limit with an "Any Language" option as well to the left. To change the default, just click on the language to go directly to the language section of the preferences pages.
AlltheWeb also introduced a variety of quick links, bookmark shortcuts, and search options for Internet Explorer, Netscape, Opera, and Mac OS / Sherlock from their Search Tools page. You can use these tricks to search AlltheWeb directly from the address box, by highlighting a term on a Web page and then clicking a bookmark, and more search shortcuts.
AltaVista's wild card or truncation symbol, the * or asterisk, has expanded so that it covers an unlimited number of characters. It used to only represent 0-5 extra characters and a double asterisk (**) had to be used for unlimited. Also, in addition to using the < for a before operation, the > works for after. This all probably happened sometime last year, but I have finally noticed and documented it now. Search Engine Feature Chart, Search Engines by Feature Page, and the AltaVista Review have all been updated.
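The old and new truncation rules are easy to compare with regular expressions. The following Python sketch is just my own illustration of the two behaviors (the function names and the unlimited_star flag are hypothetical, and it only handles an asterisk at the end of a term):

```python
import re


def truncation_regex(term, unlimited_star):
    """Translate a term with a trailing asterisk into a compiled regex.

    unlimited_star=False models the old rule: * stands for 0-5 extra
    characters and ** for any number. unlimited_star=True models the
    newer rule, where a single * already covers any number of characters.
    """
    if term.endswith("**"):
        stem, suffix = term[:-2], r"\w*"
    elif term.endswith("*"):
        stem = term[:-1]
        suffix = r"\w*" if unlimited_star else r"\w{0,5}"
    else:
        stem, suffix = term, ""
    return re.compile(re.escape(stem) + suffix, re.IGNORECASE)


def matches(term, word, unlimited_star):
    """Return True if the whole word is matched by the truncated term."""
    return truncation_regex(term, unlimited_star).fullmatch(word) is not None
```

Under the old rule, librar* would match "library" but miss "librarianship" (seven extra characters); under the new rule it matches both.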
I've updated the Search Engine Showdown Current Awareness page to reflect the death of the Northern Light alerts and to add several other alert services.
I've finally updated my Relative Size Showdown along with the related Change Over Time and Total Size Estimate analyses. Despite the efforts of several competitors, Google stayed solidly in the lead and for the first time since I've been doing these comparisons, Google ranked first on every one of the 25 searches. Even so, AlltheWeb grew significantly since last March and certainly narrowed the gap, while the oft-forgotten AltaVista also made a major size increase and pulled into third place. These comparisons looked at the results from 25 small searches where I could verify the results.
At least all the main search engines found more results than they did last March, except for WiseNut and the nearly dead Northern Light (via NLResearch). This was also the first comparison to include Gigablast. I neglected OpenFind since the results were too inconsistent and still full of errors. I should also note that I am just finally publishing these results, but the data is from Dec. 31, 2002. There has been much change since then with a new Google database and further updates at AlltheWeb and others. Still, I hope it is useful as one snapshot in time of how the sizes of the search engines' databases compare with each other.
So how old are those search engine databases? I've posted a new Freshness Showdown. Google, MSN Search, HotBot, and AltaVista all had pages indexed in the last day while AlltheWeb and GigaBlast had pages from 5 and 6 days back, respectively. The comparison looked at URLs that are all updated daily, so if the search engines recrawled those URLs every day, their copies would carry the current day's date. The analysis notes the most recent pages at each search engine, a rough idea of the general date of most of the rest of the records, and the oldest page found. On the other side, several search engines had some very old portions of their databases dating back six months, a year, and more.
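For anyone curious how such a summary can be tallied, here is a small Python sketch. It is not the actual procedure behind the showdown, just one plausible way to do it: the function name is hypothetical, the dates would come from the dated pages found in each engine's results, and using the median as the "general date" is my assumption.

```python
from datetime import date
from statistics import median


def freshness_summary(comparison_date, page_dates):
    """Summarize the age in days of the dated pages found at one engine.

    page_dates holds the date shown on each indexed copy of a page that
    changes daily, so the age is the gap to the comparison date.
    """
    ages = sorted((comparison_date - d).days for d in page_dates)
    return {
        "newest_age_days": ages[0],
        "typical_age_days": median(ages),
        "oldest_age_days": ages[-1],
    }


# Made-up sample dates for a single engine, for illustration only.
sample = [
    date(2003, 5, 17),
    date(2003, 5, 15),
    date(2003, 4, 17),
    date(2003, 4, 16),
    date(2002, 12, 1),
]
summary = freshness_summary(date(2003, 5, 17), sample)
```

A newest age of zero means the engine had a copy from the day of the comparison; a large oldest age points to the stale portions of the database.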
I've made minor updates to several reviews including HotBot, Lycos, MSN, Fast Search, and Inktomi. Changes include updates on which Inktomi features work at HotBot and MSN, and a note on how to get the MSN advanced search to work without a search term. (Thanks, Gary, for that tip.) I've also added the Position Tech Inktomi search to the Inktomi review.
I am finally starting the slow process of updating the site design for Search Engine Showdown. At this point, part of the site has been converted, and over the next few weeks, I hope to finish the rest. In addition to the redesign, news and updates from the past few months are finally being posted. See in particular the new reviews for Gigablast and Openfind, an updated search engine features chart and the search engines by features page. There is a new news archive which includes subject access to news postings (at least for those since about May 2002).
I know there is a lot more to be done, but if you have any opinions about the new design or other comments about the site, email me at email@example.com. Oh, and if you want to link to a particular news post, use the [link to this story] link at the end for a story-specific URL.
After more than two years, I have finally updated my Overlap and Unique Hits reports using data from the early March size comparison. These reports show how much overlap there is between the search engines and which search engines found Web pages that none of the others found.
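The idea behind overlap and unique hits is plain set arithmetic. As a hedged sketch (the function names are hypothetical and the input is assumed to be a set of result URLs per engine, not the actual data or method behind the reports):

```python
def overlap(results, engine_a, engine_b):
    """Return the URLs that both engines found."""
    return results[engine_a] & results[engine_b]


def unique_hits(results):
    """Map each engine to the URLs that none of the other engines found."""
    unique = {}
    for engine, urls in results.items():
        others = set().union(
            *(found for name, found in results.items() if name != engine)
        )
        unique[engine] = urls - others
    return unique


# Made-up result sets for three unnamed engines, for illustration only.
results = {
    "A": {"u1", "u2", "u3"},
    "B": {"u2", "u3", "u4"},
    "C": {"u3", "u5"},
}
shared_ab = overlap(results, "A", "B")
only = unique_hits(results)
```

In this toy data each engine has exactly one page that no other engine found, which is the kind of figure the Unique Hits report tracks.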
The list of dead and dying search engines on my reviews page has been expanded to include dead search engines for which I never wrote a review: Magellan, WebCrawler, and WebTop.
My column for the May issue of ONLINE is now online: "Dead Search Engines." Online 26(3): 62-64, May-June 2002.
I have finally posted results from my comparisons of the freshness of the search engines. These analyses look at the age of the Web pages indexed by the search engines using pages that are changed every day. The most recent freshness showdown uses data from April 4, but I also posted older comparisons from March 7, 2002 and Aug. 13, 2001. Statistics page updated with link. See also my article "Freshness Issue and Complexities with Web Search Engines." ONLINE 25(6): 66-68, Nov.-Dec. 2001. The Teoma Review was also updated.
The Direct Hit Review and the Dead Search Engines section of the reviews page updated to reflect the death of Direct Hit. My May Internet Search Engine Update for ONLINE is now available on their Web site.
Several new additions to the Search Engine Statistics section. I have updated my Relative Size Showdown and the Total Size Estimate analyses with data from March 4-6, 2002. Using 25 search terms, and verifying the actual number of hits available for the largest search engines, Google has maintained a solid first place, followed by WiseNut and then AllTheWeb. I also updated the Database Change Over Time page which compares the same searches run on the search engines at various times. In addition, I have posted two new pages on Google: the Google Database Components page, which compares the components of the Google Web database based on the statistics analysis, and one on Google's Unindexed URLs, which has an explanation and example of Google's barely-indexed URLs. Google Review also updated.
Several minor updates on the site. The Search Engine Feature Chart and Search Engines by Feature Page have additional information on AltaVista proximity and truncation and AllTheWeb's size and date limits. The AltaVista Review and AllTheWeb Review were also updated. Also noted the 101K limitation on Google's full text indexing and updated some URLs in the Google Review.
I have updated the Search Engine Feature Chart, the Search Engines by Feature Page, the Excite Review, the Northern Light Review, and other pages to reflect the loss of Excite and Northern Light as general search engines.
Several site updates today. On the Inktomi review, I added a list of former clients that no longer use Inktomi, which now includes Anzwers, which has switched to Yahoo Australia/New Zealand. References to GoTo have been changed to its new name of Overture. I added new information about advanced proximity operators to the AltaVista Review and clarified placement of Overture listings. On the HotBot Review, I updated information about truncation. The Google Review was updated with its new file types and an updated description of its databases. The Reviews page was updated to note that NBCi Live Directory is defunct.
Several site updates. I have separated the Fast Review from the AllTheWeb Review, and added the new AllTheWeb features. The WiseNut Review and other mentions have been updated with the new feature and changed usage from WISEnut to the new WiseNut. WiseNut has also wisely dropped the difficult to read white on red for the WiseGuide categories. It is now white on black.
I have updated my Relative Size Showdown and the Total Size Estimate analyses with data from Aug. 14, 2001. Using 25 search terms, and verifying the actual number of hits available for the largest search engines, Google has maintained a solid first place, followed by Fast's All the Web and then the new WiseNut. I also updated the Database Change Over Time page which compares the same 8 searches run on the five largest search engines at various times from May 1999 to Aug. 2001.
Added a WISEnut review and have added the new WISEnut to the Search Engine Feature Chart, Search Engines by Feature Page, and the Review Page. Those pages were also updated by removing NBCi (now just serving GoTo results) and iWon (primarily serving GoTo results). Excite review updated to note site clustering. Updated case sensitive searching section on AltaVista Review and the Search Engines by Feature Page. Fast Review updated with information about using parentheses for OR:
Added a new Teoma Review and updated the Search Engine Feature Chart, Search Engines by Feature Page, and the Review Page. I also gave a full update to the Search Engines by Feature Page to correct links to MSN, add information about the MSN family filter, and remove Magellan and WebCrawler notes. MSN Search Review also updated with note about the adult filter which is always on.
Updated and expanded Inktomi Review with links to the new MSN Search review, the updated changes in Inktomi over time chart, and various other links and comments.
Sometime recently, Excite finally pulled the plug on Magellan, one of the earliest search engines. Going to the Magellan site (which was always located at www.mckinley.com) now brings up an Excite screen. WebCrawler still has separate logos and surrounding text, but it now appears to use exactly the same database and search features as Excite. Therefore, both Magellan and WebCrawler have been removed from the Search Engine Feature Chart. I have added a full review of MSN Search and added it to the chart. I have also updated the chart to reflect the lack of functioning truncation at HotBot and have added a "site" notation under sorting for those sites that cluster by site by default.
My article in Online on title searching is now available, as is a new Title Showdown page. I have also updated the Search Engines by Search Features page and the Lycos Review. The updated Lycos review includes a brief section about their new(?) translation capability from Systran and the shortened versions of the field search commands.
I have updated my Relative Size Showdown and the Total Size Estimate analyses with data from Apr. 7, 2001. Using 25 search terms, and verifying the actual number of hits available for the largest search engines, Google has pulled into a solid first place, followed by Fast's All the Web and MSN Search's Inktomi. I also updated the Database Change Over Time page which compares the same 8 searches run on the five largest search engines at various times from May 1999 to April 2001.
The Showdown Inktomi Review page has been updated for a few changes, including MSN Search as the new Inktomi favorite.
Internet Search Engine Update from the March issue of Online available at the Online Inc. Web site.
The latest issue of Showdown News has been sent out to subscribers. The Web edition is available under the archive listing.
I've finally updated my Google review as well as the feature chart, and search engines by features pages to reflect Google's new allintitle and allinurl field searches available in their advanced search and via the
allinurl: syntax. Just note that you cannot yet combine any other search term with the field search statements.
I have updated my Database Change Over Time with data from Nov. 29, 2000. It compares the same 8 searches run on the five largest search engines at various times from May 1999 to Nov. 2000. Most show an increase, except AltaVista, which finds fewer hits than it did in July.
The iWon Review has been expanded and fully updated. The search engine chart and search engines by feature page have been updated with the addition of iWon. The search engine chart was also modified to include a separate column for default Boolean operation. The search engines by feature page was also updated to include entries for word stemming and single character truncation.
Several site updates today including the Directory Showdown, with size figures; a fully updated Yahoo! Review including information on its automatic truncation for search terms with five or more characters; LookSmart Review with updated numbers; Open Directory Review with updated numbers; revision of the Snap Review to reflect its new name of NBCi; and updated links on the Reviews Page.
I have updated my Relative Size Showdown and the Total Size Estimate analyses with data from Oct. 9, 2000. Using 25 new search terms, and verifying the actual number of hits available for the five largest search engines as measured by the last Size Showdown, Fast's All the Web found the most with Google in a close second. Northern Light moved up to pass both iWon and AltaVista. See also, Fast's Press Release.
Today, Google has upped their claimed database size on their main Web page to 1,247,340,000 Web pages. Unfortunately, it does not state how many more fully indexed Web pages are in the index. It used to be 560 million. A quick comparison of Google results from today to how they performed on the Oct. 9 showdown does not show any significant change in results.
The latest issue of Showdown News has been sent out to subscribers. The Web edition is available under the archive listing.
I have finally updated my Relative Size Showdown and the Total Size Estimate analyses. Using 33 search terms, and verifying the actual number of hits available, iWon (using an Inktomi GEN3 database) and Google certainly found the most. The half billion record databases can find more, but they did not find that many more hits than some of the others.
Note that only the iWon Advanced Search (which does not cluster results) was used. The basic iWon search will only display one page per Web site with no option for seeing the others.
New Brief iWon Review added with this information and a list of some of the databases available on iWon.