Open Access News

News from the open access movement

Wednesday, July 04, 2007

Plain-text access to Google-scanned PD books

Bethany Poole, Greater access to public domain works for all users, Inside Google Book Search, July 3, 2007.  Excerpt:

Today we launched a new feature for Book Search to help more people access the world's great public domain works. Whenever you find an out-of-copyright book in our index, you'll see a "View plain text" link, which lets anyone access the text layer of the book. As Dr. T.V. Raman explains on the main Google blog, this opens the book to adaptive technologies such as screen readers and Braille display, allowing visually impaired users to read these books just as easily as users with sight.

This is an exciting step for us in our mission to organize the world's information and make it universally accessible and useful. To learn more about Google's efforts to make books and other digitized content more accessible to everyone, check out Dr. Raman's full post.


  • Access for the visually impaired is important and long overdue.  But the new plain-text layer also provides access for cutting and pasting, text-mining, and other forms of processing.  Making these books accessible as texts, and not merely as images, is a breakthrough for all users.
  • Google always had plain text behind the scenes for searching.  That is, it had to perform OCR on the scanned images in order to build the search index, which was the raison d'être of the project.  I've been assuming that Google didn't want to provide OA to the text versions because it didn't want to provide crawlable texts for rival search engines to index.  If so, then what's changed?  Are the newly accessible texts inferior to the versions Google uses to build its index?  (If so, how?)  Or has Google decided, like the OCA, that it doesn't need to be the exclusive indexer of the books it digitizes?