Search Engine Showdown
 
 


Updated Cache Sources

A recent posting at ResourceShelf introduced two new sources of cached Web pages and reminded me to update my list of sources for archived/cached pages. I've also added several sources that I had run across over the last few months but never added to that page, including Alexa, Healia, and WebCite, along with the two mentioned by ResourceShelf: DiplomacyMonitor and ZoomInfo. I've moved IncyWincy to the former sources at the bottom, since I can no longer find cached links there. That makes at least 14 sources for finding copies of old pages.

The main Web search engines and the Wayback Machine continue to have the easiest interfaces and the largest collections of old, archived pages. Of the additions, Alexa has the broadest content coverage but shows no crawl date. Healia focuses on consumer health pages, while DiplomacyMonitor has only the past 90 days of diplomatic and trade documents. WebCite is probably the hardest to use, even though it may have multiple cached copies of some pages. Access is by URL only, and the scope is limited to sources cited in journals from the roughly 100 members who use WebCite. Still, it is an interesting collection of documents that have been cited. There is no easy way to find what is included (and the exact URL cited must be used), but you can try browsing a collection of these documents via Web searches. Today, Live had the most results (it reports over 100,000), followed by 400+ at Yahoo!, and only 46 (oops, now it changed to 104!) at Google.
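Since the Wayback Machine, like WebCite, is reached by the address of the page you want, a lookup link can be put together with a few lines of code. The following is only a minimal sketch: it assumes the Wayback Machine's public address pattern of web.archive.org/web/ followed by a timestamp (or an asterisk) and the original URL, which is not described in this post, and the function name wayback_url is made up for illustration.

```python
from datetime import date
from typing import Optional


def wayback_url(page_url: str, when: Optional[date] = None) -> str:
    """Build a Wayback Machine address for page_url (hypothetical helper).

    Assumes the public pattern web.archive.org/web/<timestamp>/<url>.
    """
    base = "https://web.archive.org/web"
    if when is None:
        # An asterisk in place of the timestamp asks for the listing
        # of all archived copies of the page.
        return f"{base}/*/{page_url}"
    # A YYYYMMDD timestamp asks for the copy archived closest to that date.
    return f"{base}/{when.strftime('%Y%m%d')}/{page_url}"


# List every archived copy of a page.
print(wayback_url("http://www.example.com/"))
# Ask for the copy nearest to a given date.
print(wayback_url("http://www.example.com/", date(2007, 10, 27)))
```

As with WebCite, the exact URL matters here: a different host name or a missing path can lead to a different set of archived copies, or none at all.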

ZoomInfo also makes it difficult to find a specific page. Cached links appear in the people search results for associated pages. The cached links do show the date the page was cached, but no word search or direct URL access is available. You have to know which person might be mentioned on a specific page, search for that person, and hope that the page is listed under the Web References section.

By Greg R. Notess. Dated Oct 27, 2007 in Search Features

