HereUAre, Gigablast, 10 Billion, and Spam
Ever heard of HereUAre, which has "Over 10 billion pages indexed"? Try a search and you may recognize the results as coming from Gigablast. So what's the connection? The answer leads to a rather strange story about a vanished press release that I've been researching on and off for the past month or so. Here's the story.
While updating my site a while back, I came across a page that linked to a June 19, 2006 press release from Gigablast about a database size increase to 10 billion pages and a new "report as spam" feature. The linked page (beta.gigablast.com/prnew.html) was no longer live. The only cached copy I could find, dated Sept. 10, 2006, was at MSN Search. (On Oct. 8, no cached copy was available at Google, Yahoo!, Ask, or the Wayback Machine.) Fortunately, when I came across it, I FURLed the MSN Search cached copy. Since FURL saves a copy of the page, I still have the text of the press release. I'm glad I did: when I checked today, I could not find a cached copy of the page, or even a link to it, at Live or any of the other main search engines.
To summarize the release, Gigablast now has a database with over 10 billion pages, and it is here that the release refers to the "HereUAre search engine." It also mentions a beta site (no longer available) and "multi-language support, real-time indexing, and improved spam control." One part of the spam control is that at the end of each search result, Gigablast now has a link labeled "[report as spam]." Click that link to report an entry as spam. The Gigablast site does not make the 10 billion claim, although it still has the [report as spam] links. The HereUAre site does have both the 10 billion claim and the spam reporting. It also makes the search technology sound as if it were its own, with no mention of Gigablast. I was also surprised to find no mention of HereUAre, the Gigablast 10 billion figure, or the spam reporting on other search engine news sites. So I'm posting what I've found out and, in the interest of sharing information, the text of MSN's cached copy of the press release.
Here's the actual text:
Gigablast Surpasses 10 Billion Pages
ALBUQUERQUE, N.M., Jun. 19, 2006 -- Gigablast Inc., an internet search engine corporation, has created a new search engine with more than ten billion searchable pages (the HereUAre search engine). "As far as search engines giving official size reports, we are the largest in the world. But overall we are probably the third or fourth largest," claims Matt Wells, founder & CEO of Gigablast, Inc. "We provide deep coverage of the web and are constantly improving ourselves to give the highest quality search results. It is an ongoing and ever-evolving algorithm to stay on top of things, but we are driven to be the best," Wells continues.
The new search engine can be accessed at http://beta.gigablast.com/ and features other improvements as well, such as multi-language support, real-time indexing and improved spam control. The multi-language support allows Gigablast to index exotic languages like Chinese and Japanese. The real-time indexing, which Gigablast supported several years ago, is back. Running on a significantly larger network of computers, documents can be indexed in real-time at relatively higher rates. Users can add their documents instantly using the add url interface at http://beta.gigablast.com/addurl. The improved spam controls come in the form of new algorithms trained to seek and destroy those documents you often see in the search results that have nothing valuable to offer. And Gigablast now allows its users to report search results as spam by clicking on a simple link next to each search result which says "report as spam". The Gigablast team is constantly reviewing such reports and can remove the offending spam pages in under a minute.
Gigablast continues to be advertisement-free. It makes money from selling search feeds and from licensing its search software, which Wells says specializes in creating very large indexes. "When it comes to searching and indexing billions of pages, Gigablast requires a tiny fraction of the hardware required by our competitors. Our ten billion page index uses only a couple of hundred single-CPU servers while our largest competitors use hundreds of thousands. That really tells you something about our level of engineering innovation," says Wells.