AlltheWeb Category Archive
Yahoo!-owned AlltheWeb had its LiveSearch experiment running for over a year, displaying suggested search terms as you typed. Now, the LiveSearch address just redirects to Yahoo! I assume that this is because Yahoo!'s new search assist feature gives a similar experience.
Karen posted about this on Dec. 13, and it is still redirecting, so it seems likely that it is gone for good. AlltheWeb itself continues to work, but remember that it is just using some portion of the Yahoo! database.
The AlltheWeb site is still up and looks quite similar, but Yahoo! has changed the underlying database and removed many of the advanced features that helped make AlltheWeb such a great search engine. The advanced search page has lost or changed the following options:
- FTP database is gone
- Field searches (drop down) for 'in the URL,' 'in the host name,' and 'in the link to URL'
- Boolean operators must now be in UPPER CASE (and NOT replaces andnot)
- Afrikaans, Basque, Byelorussian, Faeroese, Frisian, Galician, Indonesian, Latin, Malay, Serbian, Swahili, Ukrainian, Vietnamese, and Welsh language limits (although a Farsi limit was gained)
- Media type inclusions
- IP range limit
- Flash indexing and limit
- Size limit
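The Boolean operator change in the list above means that queries written in the old lowercase style need adjusting. Here is a minimal Python sketch of that conversion. The function and table names are hypothetical, and it assumes that andnot maps to a bare NOT, as the list describes. It splits on whitespace only, so it would also rewrite an "and" inside a quoted phrase.

```python
# Old lowercase operators mapped to the newer upper-case forms.
# Assumption: "andnot" becomes a bare NOT, as the change list describes.
OPERATOR_MAP = {"and": "AND", "or": "OR", "andnot": "NOT"}


def convert_query(old_query):
    """Rewrite old-style lowercase Boolean operators in upper case."""
    tokens = old_query.split()
    # Tokens that are not operators (including ones with parentheses
    # attached, such as "(cats") pass through unchanged.
    return " ".join(OPERATOR_MAP.get(token, token) for token in tokens)


print(convert_query("(cats or dogs) andnot birds"))
```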
One small bit of good news is that for the moment at least, Lycos continues to provide access to the old AlltheWeb database (the FAST Web Search database). How much longer this will continue is unknown, and Lycos lacks most of the options on their advanced search form that AlltheWeb offered, but at least the database is still accessible a bit longer.
Gary Price noticed that HotBot has dropped one of its databases. Originally labeled FAST and then changed to Lycos, the underlying database was basically the same as the one at AlltheWeb. It is interesting that HotBot, which is owned by Lycos, dropped the database that still powers Lycos. This is likely connected with Yahoo! talking about abandoning the AlltheWeb database, which is a real shame.
AlltheWeb and AltaVista have been owned by Yahoo! since it bought Overture. For now, AltaVista and AlltheWeb continue to be available at their historic locations and have separate databases for their Web search results. However, since at least sometime in November, AltaVista and AlltheWeb seem to have merged their image, news, and video databases. Both continue to have differences in the search interfaces and search features, but the content appears to be the same. The audio searches are getting more similar and perhaps share a portion of their database, but the Web results are still quite different.
Now AltaVista, AlltheWeb, Inktomi, and Overture are all owned by Yahoo! The press release quotes CEO Terry Semel, "We are excited to combine the two companies to build the largest position in the rapidly growing Internet advertising market." While the ad market is what pays for the search engines, the real question is which of these search engines will continue and at what sites? For now, AltaVista and AlltheWeb continue to be available at their historic locations, and they may share the same underlying database very soon. Already, AlltheWeb has lost a few search features like the URL Investigator no longer displaying the number for the links, but overall both still work with all of their old search features. Inktomi still is the back-end search engine at MSN Search and remains available at HotBot.
Overture announces that its AlltheWeb search engine now has a database of "approximately 3.2 billion." The AlltheWeb home page says 3,151,743,117 is the current number, although it is actually a bit higher. That trumps Google's claim, up since last November, of 3,083,324,652. Of course, Google probably has more than that by now, and I expect they'll boost that number on their home page soon. The exact number is less important than recognizing that AlltheWeb has been able to get its database to about the same size as Google's.
It should be a benefit to searchers to have two very large databases. And it does seem to show a commitment on Overture's part to continual improvement of the AlltheWeb underlying database. I have not yet run detailed comparisons, but I expect that both Google and AlltheWeb will find some pages that the other does not have.
Finally, another search engine offers searchable access to about as many file types beyond PDFs as Google does. AlltheWeb has expanded from indexing just PDF, Microsoft Word, and Flash files (beyond the HTML of most Web pages) to include Rich Text Format (rtf), PowerPoint (ppt), Excel spreadsheets (xls), PostScript (ps), and even WordPerfect (wpd) and StarOffice (sdd and sdc) files. There may be more besides these. To limit to one of these new file types, use the filetype: command followed by the name of the file format. Others use the extensions, so note the difference with the command at AlltheWeb.
Here are the new filetypes that have worked for me:
Yahoo! announces today that they are acquiring Overture, known for its highly profitable ads, ranked by the highest bidder. And Overture earlier this year bought up AltaVista and AlltheWeb. At a price of approximately $1.63 billion in cash and stock, Yahoo! expects to close the deal by the fourth quarter of 2003.
So Yahoo! will own the Inktomi, AltaVista, and AlltheWeb and FAST Web Search properties, three of the major Web search engines. Yet currently Yahoo! still uses Google for the majority of its search results. That should be changing sometime soon, but whether they will combine the three or use only one, and what will happen with the AltaVista and AlltheWeb search sites and advanced capabilities and syntax, no one is saying.
And who's left outside of Google and the Yahoo! group with their own custom-built databases? Ask Jeeves' Teoma, LookSmart's struggling WiseNut, and the newcomer (from last summer) Gigablast. Well, the consolidation predicted to happen about five years ago is finally occurring. Let's hope that search will still continue to improve, expand, and offer even more options and resources.
It appears that Google's spider is not only checking robots.txt files, it is also indexing and even caching some of them. Try a search on allinurl:robots.txt to see some examples, or see the cached copy of the Salon.com file.
It would be interesting to know why they are doing this. Other search engines, like AlltheWeb, will index robots.txt files that do not follow the protocol, as in the search for disallow user-agent url.all:robots.txt. (The results either have the robots.txt file not located in the root directory or the filename is not all lower case.) But with Google not only indexing the content of the files but also saving cached versions, this opens up some interesting applications: searching for sites that exclude specific bots, and tracking changes in a robots.txt file for a specific site by comparing the cached version to the current version.
How long this may remain available will depend on whether this was intentional on Google's part or simply a mistake. Since some of the KWIC extracts (snippets) show some code such as
that are not actually in the original files, I suspect that it may be either a mistake or that it just still has some bugs that need to be worked out.
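The two applications mentioned above, finding files that exclude a specific bot and comparing a cached copy to the current one, can be sketched in Python with the standard library. This is only an illustration: the two robots.txt texts and the bot name ExampleBot are made up, the function names are hypothetical, and retrieving a real cached copy from Google is not shown.

```python
import difflib
from urllib.robotparser import RobotFileParser

# Hypothetical file contents standing in for a cached copy and a live copy.
CACHED_COPY = """User-agent: *
Disallow: /private/
"""
LIVE_COPY = """User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def excludes_bot(robots_text, bot, path="/"):
    """Return True if this robots.txt text blocks `bot` from `path`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch(bot, path)


def robots_changes(old_text, new_text):
    """Return only the lines added ("+ ") or removed ("- ") between versions."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


print(excludes_bot(LIVE_COPY, "ExampleBot"))
for change in robots_changes(CACHED_COPY, LIVE_COPY):
    print(change)
```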
On top of all the continuing confusion of acquisitions and ownership changes in the search engine field comes this one. Earlier this year Overture bought up AltaVista and the FAST Web Search business including AlltheWeb. That left FAST Search and Transfer with the FAST enterprise search business but not the public Web searching business. Now Overture is selling the AltaVista enterprise search portion of AltaVista to FAST. Confused? Try the FAST press release. But basically, Overture now owns the AltaVista and AlltheWeb and FAST public Web search engines. FAST Search & Transfer has the FAST and AltaVista enterprise search engines (for site search, intranet search, etc.).
AlltheWeb has added suggested spelling corrections. See for example a search for assumpion. One nice touch is that suggestions appear for languages other than English, such as German and French, as well as for names such as vivenddi.
Overture announces that it has completed the acquisition of the Web search portion of FAST. So Overture now owns AlltheWeb. There is still only the press release about it on the Overture site and no detailed product information.
It is good to see AlltheWeb still innovating despite being acquired by Overture. New today are several small but useful features:
- Dictionary look-up: just like at Google, search words on a results page after the number of results are hot links to a dictionary look-up at Dictionary.com
- New search spy: See the last 10 queries (presumably censored)
- New shortcut keys can be turned on for use at AlltheWeb for going back to the home page, switching to the multimedia databases, and more.
- Calculator: this may have been around for a while, but try a math query like 13*944 to get an answer and a list of functions.
In addition to the other changes at AlltheWeb, the directory depth limit and the home page limit are now gone. It seemed that they originally added those limits to help get the HotBot account. Now that search feature is also gone at HotBot for their FAST database and their Inktomi database.
AlltheWeb has a brand new look. There are both cosmetic and content changes. First, the cosmetic changes:
- Banner ads are gone
- More readable results pages
- Advanced search has a new Boolean box
- New color palette
- "Streamline user interface"
- Four font size change icons are gone
- New slogan: Find it all
Even more exciting is their somewhat hidden new search feature, AlltheWeb URL Investigator. This is invoked if you search for a URL. Enter a URL, and the results page will include some if not all of the following information about that URL. See an example for loc.gov.
- FAST Facts including page language, size, and last update date
- The record and a link for the page itself
- The number of pages that link to the URL
- The number of pages that contain the term
- The number of pages at the site
- An Easywhois link to domain ownership information
- A link to the Wayback Machine copies of how the page used to look
- Subdomains at the site, if any
- Open Directory category the page is in, if any
A few other changes to note:
- There is now an easier switch for the offensive content filter. Just click On or Off in the upper right-hand corner to make the change.
- Also, the FAST Topics option is gone from the preferences settings. After months of waiting for FAST Topics to reappear, it looks like they have given up on that initiative.
- The count on the home page went from 2,112,188,990 yesterday to 2,147,483,647 today.
Another week, another acquisition. Last week, Overture announced plans to buy AltaVista. This week it announces the planned acquisition of the Web Search Unit of Fast Search & Transfer which includes AlltheWeb, FAST Web Search, and the FAST PartnerSite paid inclusion program. Purchase price is $70 million plus a performance-based cash incentive payment of up to $30 million over three years. See also the FAST press release.
Bear in mind that, at least for now, AlltheWeb and AltaVista continue to have their own separate databases and their own unique search features. How this all will change in the future remains to be seen.
AlltheWeb has added full Boolean searching from the advanced search page. After selecting the "boolean expression" drop down option, you can now use and, or, andnot, and parentheses for nesting. In addition, they have introduced a 'rank' operator, language detection, and new search tools.
First of all, the full Boolean searching with all three operators and nesting is a welcome addition. But it takes a bit to find where to use it. You have to use the Advanced Search and then select "boolean expression" from the drop down menu. Note that the NOT operation uses 'andnot' with no space. The old +, -, and the parentheses for an OR do not work when "boolean expression" is chosen, and the Boolean operators will not work unless it is, so be careful. The 'rank' operator also only works with "boolean expression" chosen and is supposed to boost results that contain the specific keyword. So a search such as term1 or term2 rank term3 should change the ranking so that those records with term3 score higher, although it is still not required. But the 'rank' operator sometimes does strange things, so be wary of the results.
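To make the intended behavior of 'rank' concrete, here is a toy Python model of a query like term1 or term2 rank term3. It is only a sketch of the idea described above (the rank term boosts a page but is not required to match). The function name, the sample pages, and the one-point boost are all hypothetical, and AlltheWeb's actual ranking is not known.

```python
def search_with_rank(pages, any_of, rank_term):
    """Model `a or b rank c`: match on a OR b, then boost pages containing c."""
    hits = []
    for name, text in pages.items():
        words = set(text.lower().split())
        if not words & set(any_of):
            continue  # fails the Boolean part, so the rank term alone is not enough
        boost = 1 if rank_term in words else 0
        hits.append((boost, name))
    # Boosted pages sort first; ties fall back to name order for a stable demo.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [name for _, name in hits]


# Hypothetical pages: only a.html and b.html satisfy "cats or dogs".
pages = {
    "a.html": "cats and dogs",
    "b.html": "dogs with collars",
    "c.html": "collars only",
}
print(search_with_rank(pages, ["cats", "dogs"], "collars"))
```

In this model, c.html never appears even though it contains the rank term, while b.html moves ahead of a.html because it contains both a required term and the rank term.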
AlltheWeb now tries to identify searchers based on their IP address. It will then default to the main language or languages of that country plus English. This will appear on the simple search form as the default language limit with an "Any Language" option as well to the left. To change the default, just click on the language to go directly to the language section of the preferences pages.
AlltheWeb also introduced a variety of quick links, bookmark shortcuts, and search options for Internet Explorer, Netscape, Opera, and Mac OS / Sherlock from their Search Tools page. You can use these tricks to search AlltheWeb directly from the address box, by highlighting a term on a Web page and then clicking a bookmark, and more search shortcuts.
FAST announces an agreement with Espotting (a paid ranking search engine like Overture). FAST will provide the general search engine results at Espotting's site after the paid ranking results. Also, the press release notes that "Espotting will provide their top three paid listings on AlltheWeb's European search results pages." So European searches will get Espotting "Sponsored Results" while the rest of the world will continue to see "Sponsored Results" from Overture. These are separate from the regular search results at AlltheWeb. Apparently, the searcher location will be determined by the top level domain origin of the searcher, so it will not always guess right.
In addition to the file type limit on their advanced search page for PDF files, Flash files, and Microsoft Word files there is now syntax which can be used directly in the search box:
filetype:doc for MS Word files as opposed to AlltheWeb's new filetype:msword. AlltheWeb review, search feature chart, and search engines by feature have all been updated. Thanks to Gary Price for this find.
AlltheWeb is now indexing Microsoft Word files. The advanced search form has also added Microsoft Word as a File Format limit. It looks like about 1.3 million Word files (those with a .doc extension) are included. Thanks to Gary Price for catching this one.
AlltheWeb has a new Halloween skin and has announced that it is now fully XHTML and CSS compliant. The Halloween and other skins can be seen and installed in their skins gallery but do require 5.0 and later browsers (i.e. not Netscape 4.7). Beyond design changes, there is little of interest to searchers here, but note that "these new standards provide a future opportunity for AlltheWeb users to perform their searches on platforms such as mobile phones, personal digital assistants (PDAs), among others."
While Google gets lots of press for its relaunch of its news search, AlltheWeb has been busy the past few weeks. As of yesterday, they have started rolling out a dynamic keyword-in-context (KWIC) extract in the results list. Available at Lycos for a while now, this feature is finally making its way into the AlltheWeb results. This is the kind of display Google usually provides, where the extract contains the actual search terms along with some of the surrounding text.
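For readers unfamiliar with the idea, a keyword-in-context extract can be sketched in a few lines of Python. This is a simplified illustration, not how AlltheWeb or Google build their extracts. The function name and the 30-character context width are assumptions.

```python
def kwic_extract(text, term, width=30):
    """Return the first occurrence of `term` with up to `width` characters
    of surrounding text on each side, or None if the term is absent."""
    pos = text.lower().find(term.lower())
    if pos == -1:
        return None
    start = max(0, pos - width)
    end = min(len(text), pos + len(term) + width)
    # Mark where the extract was cut out of a longer page.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


page = ("AlltheWeb has started rolling out a dynamic extract in the results "
        "list, so each hit shows the search terms with surrounding text.")
print(kwic_extract(page, "dynamic", width=15))
```

A static description, by contrast, would show the same text for a page no matter what was searched.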
In addition, AlltheWeb has a new field search of site: which is more precise and easier to remember than their older url.domain:. I've updated the AlltheWeb Review, and the example in the Fields section notes that site:www.total.com finds different results than
Give it a try, but expect that there may be changes to the way it works over the next few weeks.
AlltheWeb has extended the deadline for the AlltheWeb Alchemist Contest to Oct. 31, 2002.
Like Google's programming contest from earlier this year, AlltheWeb now is having an AlltheWeb Alchemist contest offering prizes to the best designs received.
FAST announced today their support of the FTC advisory about the disclosure of paid listings. AlltheWeb now has a special page offering information about their results.
AlltheWeb announces their AlltheWeb Alchemist tool for customizing the design and the look and feel of AlltheWeb through cascading style sheets (CSS).
FAST announced today that their major growth drive over the past few months has pushed their database at AlltheWeb over the 2 billion mark. Their number 2,095,568,809 is a bit higher than Google's advertised number of 2,073,418,204, but then Google's number includes their unindexed URLs as well. Some preliminary comparisons of actual search results show the two fairly even with the difference often found in the number and kind of duplicates in each.
It looks like AllTheWeb now has fully indexed PDF files in its index. The PDF files are usually identified with a [.pdf] designator after the title. While no direct limit is available at this time, you can add url.all:pdf to a search (or use the advanced search with pdf in the "must include" word filter with "in the URL" selected) to see some examples. Note that unlike Google's PDFs, AllTheWeb indexes the full file. Google tends to stop indexing at about 120K. So while a phrase search on "truck struck the cherry picker basket" finds no hits at Google, AllTheWeb finds three hits including one PDF (even though that PDF is at Google, the phrase occurs after the indexing stops).
AllTheWeb has launched a new design. The search features are all pretty much the same, but there is easier access to the tabs to move between databases (now at top instead of bottom), and for good (more income) or ill (more ads) they now include Overture ad results at top under the Sponsored Search Listings heading. In addition, the page size designation that had disappeared from the results list for a while is now back.
AllTheWeb has quietly introduced an IP range limit on its Advanced Search Page. See their help file for the details of the syntax, and let me know what interesting uses you can find for it. It could be useful in trying to identify other Web sites operated by the same company or hosted on the same shared host.
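The idea behind an IP range limit (finding which hosts sit inside a block of addresses) can be illustrated with Python's standard ipaddress module. This sketch does not use AlltheWeb's own syntax, which is in their help file. The host names and addresses are hypothetical placeholders drawn from the reserved documentation range, and the function name is made up.

```python
import ipaddress


def sites_in_range(site_ips, first_ip, last_ip):
    """Return the host names whose address falls within [first_ip, last_ip]."""
    low = ipaddress.ip_address(first_ip)
    high = ipaddress.ip_address(last_ip)
    return sorted(
        host for host, ip in site_ips.items()
        if low <= ipaddress.ip_address(ip) <= high
    )


# Hypothetical host-to-address table.
site_ips = {
    "www.example.com": "192.0.2.10",
    "shop.example.net": "192.0.2.200",
    "www.example.org": "198.51.100.7",
}
print(sites_in_range(site_ips, "192.0.2.0", "192.0.2.255"))
```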
Today, AllTheWeb unveiled several changes from Fast Search. They added a news database to their Web, image, multimedia, and FTP databases. The searchable news database covers more than 3,000 Web news sources. Up to two headlines will be displayed with a typical AllTheWeb search, and the results will note the day and time the news page was last crawled. Pages will only remain in the index for five days, but the news crawl is continuous and fully updates all pages every two hours. AllTheWeb has also added dynamic clustering known as Fast Topics. These appear at the top of the results and divide the top 200 hits into folders using the Open Directory vocabulary or automatically generated terms if no Open Directory term seems to match. On the simple search, AllTheWeb now has dynamic query modification that will automatically detect phrases and use antiphrasing to remove phrases like "where can i find information on." All of these new features along with many others can be turned on or off using the newly improved customization capabilities. One additional and unique new feature on the advanced search page is the ability to limit by page size.
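Antiphrasing, as described above, amounts to stripping filler wording from a question so that only the real search terms remain. Here is a rough Python sketch of the idea. Only the first phrase in the list comes from the announcement; the other entries, the function name, and the leading-phrase-only matching are assumptions for illustration, and the query is lowercased as a side effect.

```python
# Only the first phrase comes from the announcement; the rest are hypothetical.
ANTIPHRASES = [
    "where can i find information on",
    "tell me about",
    "what is",
]


def strip_antiphrases(query):
    """Remove one leading filler phrase so only the search terms remain."""
    cleaned = query.lower().strip()
    for phrase in ANTIPHRASES:
        if cleaned.startswith(phrase + " "):
            cleaned = cleaned[len(phrase):].strip()
            break
    return cleaned


print(strip_antiphrases("Where can I find information on solar eclipses"))
```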
Fast has abandoned its no-ads approach at AlltheWeb. Banner ads are now showing up on results pages.
AllTheWeb.com has a new look and has updated its interface, introduced some new features, and expanded its customization capabilities. The default drop down list is now for language limits, rather than for choosing AND, OR, or phrase searching, although this can be changed back with the customization form. There is now a check box for phrase searching, and some unusual field searches have been added. The AllTheWeb name (with an updated logo) is now more prominent.
Fast's multimedia databases, which have been available at Lycos, are now also available at All the Web at multimedia.alltheweb.com. This Fast Multimedia search includes images, video, audio, and combined databases. Their press release claims that it is the largest multimedia database.