Open Access News

News from the open access movement

Friday, August 17, 2007

Brewster Kahle on the OCA, Google, book-scanning, and libraries

Andrew Richard Albanese, Scan this book! Library Journal, August 17, 2007.

….To merge the texts and traditions of our print past and our web future, explains visionary technologist Brewster Kahle, represents a truly historic moment for our culture….

Despite all the librarians who eagerly identify themselves as book lovers, it's hard not to notice that books have had, well, a rather rocky start to the Internet Age. In the first iteration of a World Wide Web, they remained all but hidden on library shelves, and, unsurprisingly, circulation numbers dipped. That led some to surmise that the book was languishing in the throes of obsolescence. But as search technology improved and books became more discoverable through online library catalogs and keyword searches on the wider web, circulation surged back, by double-digit margins in many libraries. Overnight, books that went untouched for years were getting into patrons' hands again. Almost any librarian today will tell you their book circulation is going strong. The question now, however, is where is it going? …

Against this backdrop, in 2005 the forward-thinking Kahle launched the Open Content Alliance (OCA). An alternative scan plan to Google's controversial library project, Kahle's vision of putting books online embraces the values of openness central to librarianship and vital to the work of libraries….

OCA now counts 40 members and “regional scanning centers” in six cities scanning up to 12,000 books a month, over four million pages. For 10¢ a page, Kahle says, the OCA can now bring public domain books and other materials online, nondestructively, and offer them to the world. And unlike Google's plan, there are no restrictions on public domain books scanned by OCA members. Users are not forced to use proprietary interfaces; OCA scans are not hidden from rival search engines….

LJ recently visited with Kahle in San Francisco to get his take on the challenges of getting books on the web, the progress of the Open Content Alliance, and the intertwined future of books and libraries….

You've been critical of Google's library partnerships. What is Google doing right and/or wrong?

Two problems: one is perpetual restrictions on the public domain. Another is that these negotiations are all going on in secret. It shouldn't take a subpoena to get information from a librarian. But in this new world order, both perpetual restrictions and gag orders are being put in place on libraries by a corporate enterprise. The idea of making all books accessible online in new and different ways is all good news. But if you do this in a way that the materials that have been housed in libraries for centuries are made available only through one corporate interface, that is an Orwellian future.

Are you surprised to see libraries signing up with Google under restrictive terms?

I'm not surprised that a corporation wants to be the only place someone can get information, and I was not terribly surprised that some libraries went forward with this before they understood how they could do it on their own and how much it would cost to do it for themselves, not only to do the digitization but also to create services around these collections. I was surprised to see more libraries jumping on the Google bandwagon after demonstrating how libraries can do this and after actually doing it with the Open Content Alliance….

If libraries had the organization and the will, could we scan our collections ourselves, without such restrictions?

Yes. We've achieved mass digitization at 10¢ a page, on average about $30 a book. That includes high-resolution color imaging, optical character recognition, and compression and packaging into PDFs. And all of it open, meaning you can download and use these books in bulk. Take a million-book library, which is larger than most libraries in the world. What would it cost to make a million-book library online? At 10¢ a page, 300 pages in a book, it would price out at about $30 million, costs that could be spread out over many institutions. If the library market in the United States is about $12 billion a year, $3 billion to $4 billion of which goes to publishers' products, $30 million is about one percent of one year's budget. We can do this.
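
For readers who want to check the arithmetic in Kahle's answer, here is a minimal sketch in Python. The per-page cost, page count, library size, and budget figures are the ones quoted above; the function and variable names are illustrative assumptions, not part of any OCA tool.

```python
# Figures quoted in the interview; names are illustrative, not from any OCA software.
CENTS_PER_PAGE = 10            # scanning cost per page, in cents
PAGES_PER_BOOK = 300           # average book length assumed in the interview
BOOKS_IN_LIBRARY = 1_000_000   # the hypothetical million-book library


def scanning_cost_dollars(books, pages_per_book=PAGES_PER_BOOK,
                          cents_per_page=CENTS_PER_PAGE):
    """Return the total scanning cost in whole dollars (integer arithmetic)."""
    return books * pages_per_book * cents_per_page // 100


def share_of_budget_percent(cost_dollars, annual_budget_dollars):
    """Return the cost as a percentage of an annual budget."""
    return 100 * cost_dollars / annual_budget_dollars


cost_per_book = scanning_cost_dollars(1)                 # 30 dollars
cost_per_library = scanning_cost_dollars(BOOKS_IN_LIBRARY)  # 30,000,000 dollars

# Compare the library-wide cost against the budget figures mentioned.
for budget in (3_000_000_000, 4_000_000_000, 12_000_000_000):
    share = share_of_budget_percent(cost_per_library, budget)
    print(f"${cost_per_library:,} is {share:.2f}% of ${budget:,}")
```

By this arithmetic, the "about one percent" figure corresponds to the $3 billion estimate of annual spending on publishers' products; measured against the full $12 billion library market, $30 million comes to 0.25 percent.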

Google and OCA would seem to be natural allies. Why hasn't that alliance happened?

With the OCA, we originally tried to figure out whether to put in some restrictions in such a way that Google would come onboard. We found that when we put some restrictions in, the commercial guys just wanted even more. The public domain is small enough as it stands, we thought, let's not clobber it again as it goes digital. Let's let people use the public domain for whatever.

Microsoft was involved with OCA but hasn't it since launched its own, more restrictive book project?

At the OCA launch, Microsoft committed to scanning a lot of books under OCA principles, but it changed after a year of scanning. It put in more restrictions that make it incompatible with OCA, such as it doesn't want its books surfaced in other commercial services. We're sad to see Microsoft putting more restrictions on its scans. But this is a reaction to the growing environment: if Google would take its restrictions off the public domain, I'm sure Microsoft would follow.

Google's pitch to libraries can be awfully attractive, and it is so ubiquitous. How does the OCA compete for library partners?

Revolutions aren't started by majorities. They come from leaders who see things that need to be done. Boston Public Library, for example, has been courted by Google, but it has said it is going to remain open. The Library of Congress also announced it is going to work with the Open Content Alliance. That's what it takes….

You mention digital rights–managed (DRM) interfaces. How much of a stumbling block is DRM for books online?

DRM used to be called copy protection, and it didn't work for the software industry, it's not working for music, and it won't work for books. It is a bad idea that contributes to the demise of an industry. In the software industry, it was a complete failure….I hope the book industry doesn't feel it needs to have centralized copy protection schemes. It's a trap.

How challenging are copyright issues, such as orphan works, in getting our literary past online and accessible?

It's bad out there….We're showing the results of decades of successful lobbyists with very narrow interests hijacking the information age….

As a “digital librarian” and an Internet pioneer, how do you view the library system?

I see the library system in this country as a $12 billion industry dedicated to preservation and access of materials that are not mediated through a corporate experience. You don't have to sign a nondisclosure form to come up with a new idea in a library. In libraries, materials are preserved in original form, uncensored. The alternative is that the materials people learn from are forever mediated by a relatively small number of commercial companies in terms of selection and presentation. This is one of the biggest issues facing libraries in the future: what services will they perform, and what services will be performed by companies or by nonprofits acting like companies. If all content is moderated by a few companies in the digital world, we'll have a giant bookstore rather than a library system….

Posted by Peter Suber at 8/17/2007 11:43:00 AM.