Open Access News

News from the open access movement

Wednesday, July 04, 2007

Plain-text access to Google-scanned PD books

Bethany Poole, Greater access to public domain works for all users, Inside Google Book Search, July 3, 2007. Excerpt:

Today we launched a new feature for Book Search to help more people access the world's great public domain works. Whenever you find an out-of-copyright book in our index, you'll see a "View plain text" link, which lets anyone access the text layer of the book. As Dr. T.V. Raman explains on the main Google blog, this opens the book to adaptive technologies such as screen readers and Braille display, allowing visually impaired users to read these books just as easily as users with sight.

This is an exciting step for us in our mission to organize the world's information and make it universally accessible and useful. To learn more about Google's efforts to make books and other digitized content more accessible to everyone, check out Dr. Raman's full post.

Comments.

Access for the visually impaired is important and long overdue. But the new plain-text layer also provides access for cutting and pasting, text-mining, and other forms of processing. Making these books accessible as texts, and not merely as images, is a breakthrough for all users.
Google always had plain text behind the scenes for searching. That is, it had to perform OCR on the scanned images in order to build the search index which was the raison d'être of the project. I've been assuming that Google didn't want to provide OA to the text versions because it didn't want to provide crawlable texts for rival search engines to index. If so, then what's changed? Are the newly accessible texts inferior to the versions Google uses to build its index? (If so, how?) Or has Google decided, like the OCA, that it doesn't need to be the exclusive indexer of the books it digitizes?

Posted by Peter Suber at 7/04/2007 11:31:00 AM.

