Open Access News

News from the open access movement


Monday, October 13, 2008

More on the consortial repository of Google-scanned books

Jeffrey Young, University Libraries in Google Project to Offer Backup Digital Library, Chronicle of Higher Education, October 13, 2008. Excerpt:

A group of major universities has been quietly working for the past two years to build one of the largest online collections of books ever assembled, by pooling the millions of volumes that Google has scanned in its partnership with university libraries.

One of the most important functions of the project, say its leaders, who plan to unveil the giant library today, is to create a stable backup of the digital books should Google go bankrupt or lose interest in the book-searching business.

The project is called HathiTrust, and so far it consists of the members of the Committee on Institutional Cooperation, a consortium of the 11 universities in the Big Ten Conference and the University of Chicago, and the 11 campuses in the University of California system. The University of Virginia is joining the project, it will be announced today, and officials hope to bring in other colleges as well.

All of the member universities participate in Google's ambitious effort to work with major libraries and with publishers to scan all the world's books....[E]ach library gets a digital copy of each of its scanned volumes....

Each university library originally planned to manage the digital copies of the scanned books on its own, but through HathiTrust, library officials are now working together to create a shared online collection....

Already HathiTrust contains the full text of more than two million books scanned by Google.

But there is an important catch. Because most of the millions of books are still under copyright protection, the libraries cannot offer the full text of the books to people off their campuses, though they can reveal details like how many pages of a given volume contain any passage that a user searches for.

Google follows a similar policy for books it scans, allowing only brief sections of copyrighted works to be displayed in search results. Even so, publishing groups have sued Google for making digital copies of books available without their permission....

Only about 16 percent of the books in HathiTrust, or about 327,000 volumes, are out of copyright so that their full text can be delivered to all readers....

[John P. Wilkin, an associate university librarian for the University of Michigan and executive director of HathiTrust] said a search engine will be added to the project's home page soon, and that members are quickly working to "ingest" their digital books into the shared library....

Google has refused to release...details [of which books it has scanned], but HathiTrust publishes online a list, updated daily, of what is in its collection.

The librarians plan to work together to create new services to search and display the digital books that Google might not provide for its copies....

So why call the project "Hathi" (pronounced hah-TEE), the Hindi word for elephant? "The name resonated really well because elephants remember, elephants are large, and elephants are strong," said Bradley C. Wheeler, chief information officer at the Indiana University system....

PS:  For background, see our past post on the HathiTrust.

Update (10/13/08).  Also see press releases from the HathiTrust, Indiana University, and the University of Michigan.

Update (10/24/08).  Also see Adam Hodgkin's comments.  Excerpt:

...Google Book Search is still the elephant in the library, but the existence of this consortium shows two things. Three years ago major libraries were saying that they could never do the kind of thing which GBS contemplates. Now several of them are collaborating in a much more ambitious project than anybody would have dreamed of in 2002. When we have a universal library in the 'computing cloud' there will be not one, but many literary digital platforms. There will be a whole herd of digital literary elephants kicking around. There will be a lot of platforms to choose from, partly because there is a lot to be done and different ways of doing it. The second and immensely encouraging feature of this new consortium is that it is obviously condoned if not encouraged by Google. The members of the consortium are almost all working with Google and it is to be concluded that Google is keen to see 'collaborator/competitors' in the digital book space that Google has pioneered. Good for Google and good for all of us. The universal library will be open because there will be a herd of elephants. Google may be the dominant male, but not a monopoly.