Sunday, July 27, 2008

More on Google v. OCA

Jean-Claude Guédon, Who Will Digitize the World's Books? New York Review of Books, August 14, 2008. A letter to the editor in response to Robert Darnton, The Library in the New Age (June 12, 2008). Excerpt:

...Robert Darnton extols the value of Google's project to digitize the collections of major research libraries. As he puts it, it is a way to make "all book learning available to all people." While there is much truth in this statement, there are some important considerations about the Google project that should be raised....

[I]t is important to clarify what Google is offering: it is not a digital text that the library will be able to share unconditionally with others. In its contracts with the nineteen libraries now in its consortium, Google has stipulated that the "Universal Digital Copy" of digitized books it provides must be protected from non-Google Web software; and that the number of downloads from texts digitized by Google will be limited. Only Google can aggregate collections of different libraries in order to create the larger digital database that is the most valuable part of the consortium project.

Put another way, Google has strictly limited the "computational potential" of digitized books....

With Google's digitization, for example, it is possible to conduct advanced text mining within a single library's collection; but only Google can provide access through its own Web site to the entire pool of scanned books in the nineteen libraries with which it now has contracts....

It appears that Google is striving to become the main dispenser of algorithmic power over digital books....To give a single company such a grip on the collective memory of the world, its analysis, and even its meaning is frightening to say the least.

Dozens of libraries have understood the danger of the Google Book maneuver and have joined the Open Content Alliance (OCA)...[which] seeks to promote large-scale digitization, but it does so without putting shackles on the participating libraries. Alas, the OCA has nothing like Google's deep pockets, and the recent withdrawal of Microsoft from the alliance makes the OCA's position even more difficult.

But there may be some hope in this situation. Since many different groups have an interest in the free availability of digital texts, the process of digitization itself could be distributed among a wide variety of libraries and other independent groups, much in the way of contributions to Wikipedia and Project Gutenberg. Digitization clubs could emerge not only in public libraries but in schools and museums. In short, mass digitization projects should be designed in ways that are not dependent on market-based corporations or on government subsidies, but can nevertheless profit from forms of support from either kind of institution.

Libraries can have a very important part in promoting these projects and enforcing the standards that must accompany them. In so doing, they would be acting as institutional citizens of the digital document age, and not as grateful (and somewhat passive) consumers of Google's apparent largesse.

From Robert Darnton's response:

...I share Jean-Claude Guédon's worry about the danger of one company monopolizing the "computational potential" of digitized texts, and I agree that the Open Content Alliance is a good thing. But is it an adequate alternative to Google? Grassroots digitizing may help a thousand flowers bloom....But we need to search, mine, and study millions of volumes from all the collections of our research libraries.

Libraries have accumulated those volumes at great cost over many generations, but they have kept most of them within their walls. Digital technology now makes it possible for this common intellectual heritage to come within the range of the common man and woman. Yet corporate interests, flawed copyright laws, unfair restrictions on fair use, and many other obstacles block the public's access to this public good. By removing those obstacles, the United States Congress can clear the way for a new phase in the democratization of knowledge. For my part, I think congressional action is required to align the digital landscape with the public good.