Open Access News
News from the open access movement
Google creates and searches OCR'd editions of scanned texts

Google is stepping up its use of OCR'd editions of image scans in its search index. From its October 30 announcement:
Here's the example from the first paragraph of the announcement:
Comment. Google has been OCR'ing its scanned books from the start (December 2004), in order to make them searchable. But it didn't release HTML editions until July 2007, presumably to prevent easy indexing by rival search engines. When it released the HTML editions, it said its purpose was to help visually impaired users, whose reading software doesn't work on images. That was a good reason, but I never understood how it overcame Google's famous reluctance to share its work with rivals. As I wrote at the time:
I have a similar mix of appreciation and puzzlement today. But in addition to wondering why Google relaxed its grip on a competitive advantage, I'm also wondering whether this has any connection to the new settlement with book publishers. Today's announcement is not about book texts, but the HTML editions are based on technology Google developed for its book scanning program.