Search Engine Showdown
 

Search News

A Comic View of Google

Know anyone who thinks of Google this way? From today's Pearls Before Swine:


Dated Oct 21, 2009 in Google - [#permalink]

Google's Skimpy Site Results

Google has been making plenty of subtle and experimental changes to its results display lately, such as the jump to links and enhanced page links, and it even announced those changes (which does not always happen). Even so, I don't think I'd previously seen this particular approach to (what should I call them?) site links, or subsite links, or indented results from the same site. It seems related to, but not the same as, the contextual show more results change from July.

You can see it in the screen shot below or try it yourself. For the past two days, I've seen the same results for this search on both IE and Firefox, and on at least three different computers. What's different from the usual indented display?


Dated Oct 1, 2009 in Google - [#permalink]

Google, reCAPTCHA, and the Internet Archive

Google has announced that it is acquiring reCAPTCHA. reCAPTCHA is a clever use of scanned, poorly OCRed text as a CAPTCHA that prevents bots from spamming forms and at the same time helps improve OCR (Optical Character Recognition, the process of taking a scanned image of a page of text and converting it into searchable text).

I had always liked the idea of reCAPTCHA, especially since it was reputedly helping the Internet Archive with its book scanning. Unlike Google Books, the Internet Archive makes its scans open to everyone and focuses on works that are clearly out of copyright.

However, with the Google announcement, I saw very little mention of how this might affect the Internet Archive. I assumed that Google would switch the underlying reCAPTCHA data from the Internet Archive to the Google Books project, which is not open. It remains to be seen how willing, if at all, Google will be to let other search engines use the searchable data from all its scanning.

Then I was even more surprised to read at reddit that the Internet Archive had never received any correction data from reCAPTCHA. "I don't expect to get any data from the reCaptcha project, since we've asked several times and received no response."

Just another example of a great-sounding project that failed to deliver the results it implied. I'm sure Google will make sure it helps their scanning and OCR projects, but I, for one, am no longer interested in using it.

Dated Sep 17, 2009 in Book Search | Google - [#permalink]

Google Search Focus

Sometimes I find the Google blog posts to be long-winded, high on hype, and low on information value. Yesterday's post about Google Search Quality started out in a similar vein, but it quickly improved and contains a number of interesting points about how Google handles searches and ranking. And for all those who like to say, "Just make it more like Google" and expect that to be a simple fix, please note how Google describes its hard work on search quality: "more than one thousand programmer/scientist years have gone directly into their development."

Several extracts that I found of interest include:

  • Ranking algorithms include many aspects beyond PageRank:
    • language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes)
    • query models (how people use language today)
    • time models (some queries are best answered with a 30-minute-old page, and some are better answered with a page that has stood the test of time)
    • personalized models (not all people want the same thing)
  • Evaluation includes automated evaluations every minute (to make sure nothing goes wrong)
  • Change Frequency: "In 2007, we launched more than 450 new improvements"

While these points do not, perhaps, have any direct bearing on how we can better use Google, they do help to inform us about why results and processing change from one day to the next.

Dated May 21, 2008 in Google - [#permalink]

Major Expansion at Google Translate

Earlier this month Google expanded the number of languages available in Google Translate. While the press release and most other coverage talked about ten new languages, the number of language pairs (from language X to language Y) increased far more substantially. Previously, Yahoo! Babel Fish had the most with 38 pairs. Google not only upped the number of possible languages, but every language listed can now be translated to every other. So depending on how you count, Google Translate now has over 500 language pairs available! That's a major increase. As Google Operating System notes, the total varies depending on how you count Chinese. Only one choice is given for input of "Chinese," but Google Translate seems to accept either the Simplified or Traditional version. Output can specify either Simplified or Traditional. So, if you count both versions of Chinese as one language, Google Translate can machine translate 506 language pairs. If you count them as two, it would be 552. And do note that you can input either version of Chinese characters and have it translated to the other.
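The arithmetic behind those two totals can be checked with a short sketch. It assumes each direction counts as a separate pair, so n languages give n × (n − 1) pairs. The counts of 23 and 24 languages are inferred from the totals rather than taken from Google's announcement, and the function name is only for illustration.

```python
def ordered_pairs(language_count: int) -> int:
    """Count (source, target) pairs when every language translates to every other.

    Assumption: X -> Y and Y -> X are counted as two separate pairs.
    """
    return language_count * (language_count - 1)


# Inferred counts: 23 languages if Chinese is one language,
# 24 if Simplified and Traditional are counted separately.
chinese_as_one = ordered_pairs(23)
chinese_as_two = ordered_pairs(24)

print(chinese_as_one, chinese_as_two)
```

Counting this way, the difference between the two totals comes entirely from treating Simplified and Traditional Chinese as one language or two.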

Also note that Google has not only expanded its machine translation abilities but has augmented its Translated Search as well. Translated Search (also available on the Language Tools page as "Search Across Languages") will translate the query words and then display results in both the original language and in translation. The following ten languages have been added, along with the ability to machine translate query words and pages between any of the possible language pairs.

  • Bulgarian
  • Croatian
  • Czech
  • Danish
  • Finnish
  • Hindi
  • Norwegian
  • Polish
  • Romanian
  • Swedish

Presumably, Google has been able to make such a major expansion of language translation pairs available by using statistical machine translation developed in-house. This process is described in their FAQ: "we feed the computer billions of words of text, both monolingual text in the target language, and aligned text consisting of examples of human translations between the languages. We then apply statistical learning techniques to build a translation model." Moving to this approach certainly seems to have allowed such a major expansion. Bear in mind that all of this automatic translation is prone to error, although it should give some rough sense of the underlying meaning. I've updated my Online Translation and Translated Search pages with the new languages.

Dated May 18, 2008 in Google | Search Features - [#permalink]

