Open Access News

News from the open access movement


Wednesday, January 02, 2008

Comparative book-scanning

Beth Ashmore and Jill E. Grogg, The Race to the Shelf Continues:  The Open Content Alliance and Amazon.com, Searcher, January 2, 2008.  Excerpt:

Internet giants such as Google, Yahoo!, Microsoft, and Amazon are in the middle of nothing short of a modern-day space race: Who can scan the most and the best books in alliance with the biggest and brightest libraries in the U.S. — nay, the world! — while simultaneously providing print on demand, “find in a library,” and “buy the book” links as well? ...

[T]he Open Content Alliance, or OCA, is giving Google a run for its money. OCA comes armed with an open access philosophy and its own impressive stable of partners, including Yahoo! and, at least initially, Microsoft. Amazon, the dark horse in the race, as scanning and making books available for free online would seem antithetical to its book-selling roots, has gotten into the act, offering to partner with libraries to help scan and sell rare and hard-to-find books from library collections. Under Amazon’s model, the libraries retain their own digital copies along with a portion of any print-on-demand profits. Ultimately, librarians now have choices when it comes to large-scale digitization partnerships....

OCA’s approach to this process has two major differences that set it apart from the Google Book Search Library Project: no scanning of in-copyright materials from library collections (at least not yet) and open access is the guiding principle — meaning that even Google itself could (and does) crawl titles from the OCA repository....

A relatively new and unique [OCA] partner is the Biodiversity Heritage Library, a cooperative project of the American Museum of Natural History, Harvard University Botany Libraries, Ernst Mayr Library of the Museum of Comparative Zoology, Missouri Botanical Garden, Natural History Museum–London, The New York Botanical Garden, Royal Botanic Gardens in Kew, and Smithsonian Institution Libraries. Kahle is particularly proud of this partnership as it represents a trend that could see other disciplines banding together to bring a wealth of knowledge on a particular topic to the open access world. As Kahle explains: “This is a whole branch of science deciding to go open … it is a massive program to digitize tens of millions of pages, basically all of the literature about species. This is important to have in the open because it can be repatriated to the developing countries that actually have these organisms, as well as making it possible to do data mining research on it … It is a commitment of the major natural history museums, natural history libraries and botanical gardens to go and make the information about species public.”

So, why would a librarian choose to go with the OCA over the other partners currently available? Two words: open access....

Another selling point of OCA is its affiliation with the Internet Archive....

OCA may not have the speed or financial resources of Google Book Search to whisk away a library’s holdings and scan them. Nor can OCA scan collections for free, like Google, and we all know how seductive free can be to budget-stretched libraries. OCA is a decidedly community-based effort. It represents a model for the future of digitization efforts that appears viable, provided libraries can cover the associated costs....

Kahle even sees a future for OCA in copyrighted works: “Our approach at the Internet Archive is to start with out of copyright and then move into orphan works, then out-of-print and then in-print. I’m hoping that by the time we get to in-print commercial publishers, we’ll have moved along to help promote their books online and allow them to be downloaded.” ...

In the end, Kahle believes that the OCA’s survival and attraction may lie in its ability to provide the service layers that users require. “This is public domain material. Have the public domain material stay in the public domain and have organizations compete on the service layers. This is the architecture of the World Wide Web.” ...

In 2006, Amazon’s Back in Print initiative demonstrated how rights owners of out-of-print titles could get their titles available through print on demand (POD) via Amazon’s BookSurge division, acquired in 2005. However, it was not until Amazon announced that it would be working through its BookSurge division with Kirtas Technologies and libraries to identify these out-of-print, out-of-copyright titles and add them to BookSurge’s POD service that the library community became active partners....

Linda Becker, vice president for sales and marketing at Kirtas Technologies, Inc., further explains Kirtas’ role: “Customers have two choices. One, they could send us their books and we can digitize the books for them and put them on Amazon. This is what we are doing for New York Botanical Gardens and Cincinnati Public Library. Or, they could purchase a system to digitize materials themselves and send us the work. Then, we do the backend work to get it ready for print on demand and we send it on to Amazon.” This second option is the method by which Emory University, University of Maine, and Toronto Public Library are participating. Becker notes that the project was launched as a pilot in June 2007 with the five libraries mentioned above, but Kirtas is currently talking to approximately 20 more libraries.

In either option, the library is in control of what gets scanned. Beidler points out that the libraries maintain complete control and ownership of the entire process and also the end files that result from the digitization. ...[L]ibraries put these titles in the Amazon POD program and those books are then available for Amazon customers to purchase directly through Amazon....

Libraries can choose to participate in either the POD or SearchInside! the Book programs on a title-by-title basis, but, according to Beidler, the most common scenario is for participating libraries to place titles in both of these Amazon-provided programs....

In the Amazon/BookSurge/Kirtas model, the libraries function as the publishers, so they create an imprint of sorts that identifies the contributing library as the owner of the material. This means that the library carries the burden for copyright compliance, making sure that the library either owns the copyright or the material is in the public domain. The library also sets the list price for a given title, which varies based on its value, meaning its size, rarity, and other criteria....

Beidler said that no one in this partnership has stipulated that titles must be rare, but many librarians choose to digitize those materials first, as these are the most difficult to access and at the highest risk for damage and deterioration....

Which Project to Pick? ...

Financial concerns certainly must be considered, but there are also some weighty philosophical issues that emerge. The titles included in the Google Book Search program are unavailable to other Web services. Is this a real problem or does Google’s search engine supremacy make this a nonissue? Does OCA have a sustainable model of open access in place and can it continue to scale? Would selling print-on-demand copies of your rare books through Amazon make your digitization project financially feasible? And what do we do about copyright? Some libraries have taken a stance of sorts on these types of issues, as reported in an Oct. 22, 2007, New York Times article, “Libraries Shun Deals to Place Books on Web.” In this article, the author explains the resistance of some libraries, such as the Boston Public Library and the Smithsonian Libraries, to sign up with Google....