September 2009 Archive
Google, reCAPTCHA, and the Internet Archive
Google has announced that it is acquiring reCAPTCHA. reCAPTCHA cleverly uses scanned text that OCR software read poorly as a CAPTCHA: it keeps bots from spamming forms and, at the same time, helps improve OCR (Optical Character Recognition, the process of taking a scanned image of a page of text and converting it into searchable text).
I had always liked the idea of reCAPTCHA, especially since it was reputedly helping the Internet Archive with its book scanning. Unlike Google Books, the Internet Archive makes its scans open to everyone and focuses on works that are clearly out of copyright.
However, in the coverage of the Google announcement, I saw very little mention of how this might affect the Internet Archive. I assumed that Google would switch reCAPTCHA's underlying data from the Internet Archive to the Google Books project. Google Books is not open, and it remains to be seen how willing, if at all, Google will be to let other search engines use the searchable data from all its scanning.
Then I was even more surprised to read on reddit that the Internet Archive had never received any correction data from reCAPTCHA: "I don't expect to get any data from the reCaptcha project, since we've asked several times and received no response."
It is just another example of a great-sounding project that failed to deliver the results it implied. I'm sure Google will put it to work on its own scanning and OCR projects, but I, for one, am no longer interested in using it.
