May 2006 Archive
I am sorry to see that Daypop is down again. The error page delivered, dated 5/11/06, gives the following message:
Daypop down until further notice... Sorry for the inconvenience. After adding a bunch of submitted sites, Daypop no longer has enough memory to calculate the Top 40 and other Top pages. If there's no simple fix, Daypop won't be back up until a new search/analysis engine is in place. A new engine will take at least a month to get online.
I've made a fairly major redesign to several core aspects of the site, including this blog. It will be a rather slow process before I get all the pages fixed, especially since I'll be updating much of the content as well. So be patient, and if you see any egregious errors (or especially display problems), please let me know with the feedback form! Thanks for your patience. Once I've processed more of the update, I should be posting more regularly again.
I am not sure when this happened, but Yahoo! now again appears to support the wildcard-word-within-a-phrase search technique, this time with an asterisk. Just use an asterisk * within a phrase search to match on any one word in that position. So, for example, to find "addictive semiconscious vice of biblioscopy" when you are not sure of the second word, search "addictive * vice of biblioscopy". Multiple asterisks can be used, as in "addictive * * of biblioscopy". Up to March 2005, Yahoo! used to support this by using a stop word like "a" in a phrase instead of the asterisk, but then it stopped working. I'm not sure when the asterisk started working, but now it does. I hope it is not just a short-term experiment.
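For anyone who wants to see the one-word-per-asterisk behavior spelled out, here is a minimal Python sketch. It only illustrates the matching rule described in this post and is not how Yahoo! implements it; the function names phrase_to_regex and phrase_matches are my own, and treating a "word" as a run of letters or digits is an assumption.

```python
import re

def phrase_to_regex(phrase):
    """Build a regex from a phrase query (hypothetical helper).

    Each * stands for exactly one word (assumed: a run of word characters);
    every other token must match literally, ignoring case.
    """
    parts = []
    for token in phrase.split():
        if token == "*":
            parts.append(r"\w+")             # wildcard: any single word
        else:
            parts.append(re.escape(token))   # literal word
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

def phrase_matches(phrase, text):
    """Return True if the wildcard phrase occurs somewhere in text."""
    return phrase_to_regex(phrase).search(text) is not None

text = "the addictive semiconscious vice of biblioscopy"
print(phrase_matches("addictive * vice of biblioscopy", text))
print(phrase_matches("addictive * * of biblioscopy", text))
print(phrase_matches("addictive * of biblioscopy", text))
```

The first two queries match the sample text. The third does not, because a single asterisk cannot stand in for the two words "semiconscious vice".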
Bill Tancer from HitWise has several fascinating posts derived from their analysis of Internet traffic patterns. He has one on The top 20 most visited Google sites along with the relative percentage of traffic to each. Due to the interest from that post, he followed up with a similar one for MSN and Yahoo! and then compared each of those three within specific categories. At Google, their Web database got about 80% of the traffic among those top 20 Google properties for the week in question, while image search had about 10%. That left only 10% for the remaining properties. Google Directory had more traffic than Google Local. Many other interesting points can be seen in these charts and graphs.
ZDNet reports on Yahoo!'s analyst presentation. In particular, they reproduce slide 12 of the 188-slide show [12.2MB PDF], which presents an increase in the average number of words per query over the past few years. The slide gives the following:
- 1998: 1.2 words
- 2004: 2.5 words
- 2006: 3.3 words
The article also mentions that "Academic studies show user satisfaction also increases as search query length grows" and even links to the study that makes that claim. Of course, the longer the query grows the closer it will get to zero results, so that is only true to a point.
Matt goes on at some length (about 2,000 words) to explain recent changes to Google's crawling and indexing process and the Bigdaddy roll-out earlier this year in the Indexing timeline post on his blog. The comments get even longer, but it is an interesting read which explains, in part at least, why some very old supplemental records have hung around in the Google database for so long.
A (very) long discussion thread at WebMasterWorld, Pages Dropping Out of Big Daddy Index, explains an inconsistency with how the site: search at Google works at the moment. For some background, Google has a large database of "supplemental results" which typically only show up in search results when the total number of results is less than some Google-only-knows number. These supplemental results (tagged as such after the file size) are updated much less frequently and are often duplicates or dead links. However, sites with large numbers of pages find that some of those pages end up in Google's supplemental database. This thread discusses how the site: search is failing to bring up some results from the supplemental index even though the pages might be found by a keyword search.
Tara reports on her experiments in Two New Google Operators and Limited Google Clustering, based on a report from India about using a type: prefix that would return a response about the category the search term falls in, along with a source citation. The examples are interesting, but as of a week later, it does not work for me. Presumably, it was another short-lived user interface (UI) experiment.
In the same post, Tara mentions two other posts about another UI test. This one has some "refine results" suggestions at the top of the results page, a feature other search engines have had for years. I wonder if Google will roll this one out or not?
With a "Yes, we are still all about search" title, Google announces four new products that are supposed to "enhance and improve the search experience for our users." Try them out to see if you agree.
- Google Co-op is another foray into social networking and collaborative searching.
- Google Desktop 4 is yet another update to their desktop search with an emphasis on many new "Google Gadgets."
- Google Notebook (which is not even in beta yet -- it is due out next week) sounds like another bookmarking and clipping application similar to many others out there.
- Google Trends is initially the most interesting to me. It lets users search on a selection of Google search queries. You can compare search queries (separate them with a comma) or just see a graph of the search volume on a single query. Unusual queries give a "do not have enough search volume to show graphs" response.
The Depth and Breadth of Google Scholar: An Empirical Study from the April 2006 issue of Portal should be available to anyone on a campus with a Muse subscription.
For advanced search geeks, if you've not looked at the Google Hacking Database from "I'm j0hnny. I hack stuff," you are missing a fascinating collection of advanced search tricks. Bear in mind that many of these tricks are designed to find passwords and cracks, but the techniques are well worth perusing anyway.
It looks like both Alexa and A9 have switched from using an abbreviated Google Web database to using MSN's (although it is labeled Live.com, which is more of a different front end to the older MSN Search database than a different underlying database). At the moment, there is no longer any image search at A9 (the previous one was from Google). Nor do I see Google text ads on Amazon anymore.