Open Access News

News from the open access movement

Thursday, August 09, 2007

Interview with Open Library's Aaron Swartz

Scott McLemee, Open Library, Inside Higher Ed, August 8, 2007.  Excerpt:

Open Library is a new online tool for finding information about books – even (perhaps especially) for titles that are out-of-print, scarce, or likely to find one reader per decade, if even that....If a text is available in digital format, there is a link to it. Citations and excerpts from reviews will be available. Likewise, cross-references to other works on related topics. A user of Open Library can see the cover of the book and, in some cases, search the contents....

The basic framework is being established by my appallingly accomplished young friend Aaron Swartz — who, at the age of 21, has already helped create RSS (that was in his early teens), published a couple of computer-science papers, and developed Infogami, a system enabling his digitally clueless elders to set up their own websites. He studied sociology as an undergraduate at Stanford University, presumably in his spare time....

Q: How is Open Library funded? Are you working on it full time? And how many people are involved in the project?

A: It’s currently being funded by the Internet Archive, with the help of some state and federal library grants. We have some volunteers, but also about 5 people working full-time (a couple programmers, a designer, and a product manager).

Q: What will Open Library offer that you can’t already find online? What was missing from the existing array of online book-data resources – WorldCat, Google Books, Amazon, etc. – that makes it worthwhile to create a new one?

A: ...I’m often looking for interesting books on an obscure topic. I can look on Amazon, but its coverage of out-of-print books is pretty poor. (In my experience, most of the really interesting books are out of print.) I can search an academic library or WorldCat, but the quality of data is pretty weak — you can get basic bibliographic info, but no reviews and weak search and a painful interface and most require a subscription.

So I wanted to build a site where one could more easily find those hidden great books, by combining all the data we have on them in one place and letting the people who love them go back and annotate and highlight them.

Q: With any Web 2.0 project, the question of safeguards comes up. Are any built in? I mean, to keep people from going through and systematically attributing the complete works of Shakespeare to Francis Bacon, or whatever.

A: Our plan is to leave it open and then lock things down as need be....

Q: Will you be asking permission before incorporating data from, say, an academic library’s online catalog?

A: Yes, we’re talking to the academic libraries to make deals on how to import their catalogs. Our main pitch so far has been that this is an opportunity to contribute to a public commons — contribute your library catalog to the public, and not only make it available to interested library users everywhere, but also contribute to a system where you’ll get back everyone else’s work, just like libraries have done with RLG.

Q: Open Library will also serve as a central directory for books available in digital formats. Some such material is freely available to everyone (e.g., the Project Gutenberg editions). And some of it has more limited access. Will you link to the latter? And do you have a policy or opinion about dealing with Google Books?

A: Yes, we hope to link to everything interesting — free or not, although obviously we prefer free and can do more with it. We’re planning to link to Google Books and we’re hoping we can get copies of their public domain books.

Q: Do you have a long-term plan to make digitizing books part of the Open Library project? Or does it make more sense to leave that kind of initiative to others?

A: The Internet Archive has a big book digitization project....We hope Open Library can raise money to increase their scanning....

Q: So what is your sense of the master plan for this project? The future course of development?

A: We’re taking it step by step. Our first goal is to get catalog information for every book — a big project in itself....

After that, we want to work on improving the book-reading interface for books that we have scans of. We’re hoping to make the scanned text into a wiki as well, so that people can fix typos and correct errors in our processing (OCR) of the scan....One idea is a “Scan this book” button on every out-of-copyright book, where for $50 to $100, we’ll page the book from a library, deliver it to the scanners, and then email you a PDF of the book and put the full text online, with a little nameplate thanking you for funding it.

And then, of course, we want to expand beyond just books. We’re eager to do the same thing with journal articles: one open site where we list every journal article, all the journal articles by a particular author, sorts by subject and topic, the abstracts and references, and links to places where you can find a full text copy....