Friday, June 08, 2007

Lorcan Dempsey on the CIC-Google deal

Lorcan Dempsey, Systemic change: CIC and Google, Lorcan Dempsey's weblog, June 6, 2007.  (Thanks to Charles Bailey.)  Excerpt:

Today Google and CIC announce an agreement to digitize ten million volumes across the CIC libraries....The CIC announcement is interesting for several reasons:

  • It is a shared effort across a major group of libraries with significant collections. There appears to be strong CIC institutional commitment. Of course, CIC has a history of collaboratively sourced activities and this 'pooling' model makes increasing sense given the necessary policy and service challenges that need to be addressed....For some things, scale matters.
  • The libraries have a shared approach to managing the digital copies based on shared infrastructure at the University of Michigan, and serving them up to their user communities....
  • Google recently advertised for somebody to work on collection development and we seem to be seeing a stronger focus in this area. Collecting areas of importance within each library [pdf] have been identified for attention.

This initiative in turn prompts some more general thoughts about access:

  • One of the most valuable features of the Google initiative is that it digitizes book content, allowing fine-grained discovery over topics, people, places and so on. Of course this presents interesting questions about indexing, retrieval, ranking, and presentation but the advantage of having this access seems clear. It drives use and sales, and it supports enquiry. Without it, the book literature is less accessible than the web literature.
  • However, as we are beginning to see on Google Book Search, we are really going beyond 'retrieval as we have known it' in significant ways. Google is mining its assembled resources - in Scholar, in web pages, in books - to create relationships between items and to identify people and places. So we are seeing related editions pulled together, items associated with reviews, items associated with items to which they refer, and so on. As the mass of material grows and as approaches are refined this service will get better. And it will get better in ways that are very difficult for other parties to emulate.
  • Currently this material is made available within the Google destination site. Google is an advertising engine and its approach depends on aggregating attention for adverts. This approach may be difficult to deploy within a more 'data services' approach where others - especially the partners - have remixable access to content and services. However, the 'utility' value of this resource will be diminished if it is not made available in this way....(See the related discussion about the search API.)
  • This type of access seems especially important for the partner libraries. In the early days of this activity there was some discussion of the types of services which would be built on top of the digitized books by the libraries. However, it is difficult, and maybe not very sensible, for the libraries to individually invest in some types of service development. An important factor here is that they cannot benefit from the network effects that arise in larger collections and so are limited in the range of service that they could individually develop. This points again to issues of collaborative sourcing.

For me, the CIC announcement moves the conversation about mass digitization to another level. The Google relationship with libraries has seemed like an interesting initiative. But it now seems plausible to think that we are looking at systemic change in how we engage with particular classes of material. Which in turn will cause us to look at the way in which the systemwide library resource is organized. It touches on so much.

  • Disclosure, discovery, delivery....
  • Collective collection....
  • Copyright....
  • Knowledge organization....
  • Preservation....