Thursday, March 22, 2007

Beyond OA for reading to OA for processing

Tim O'Reilly, "How Google Books is Changing Academic History," O'Reilly Radar, March 22, 2007. Excerpt:

Peter Brantley writes in email: "a Berkeley grad student disses the experience of the Berkeley library system and lauds Google." ...

It's important to remember, though, that finding and reading out-of-print books is just the beginning of the benefits of digitization. (That's why it's important for at least the out-of-copyright books to be available in more open formats.) Last year, Gregory Crane asked "What Can You Do With a Million Books?," and pointed out that things get most interesting when you can compute against this corpus of books. Computing doesn't just mean measuring or counting (though those things may also be useful). It may mean reshaping in creative, unexpected ways.

At O'Reilly, we've done things like create automated content statistics and extract just the examples so they could be used for code search -- both by us and by other code search engines. We're all just taking baby steps, though.

The clearest example I've yet seen of the possibilities of using digital technology to breathe new life into old material remains David Rumsey's work with maps. Once he'd digitized his collection of 30,000 old maps, he was able to do things like georectify them, mapping them to a consistent size and coordinate space so that maps from different eras could be overlaid on each other, creating timelines showing the evolution of cities and landscapes. This is an awesome demonstration of why access to otherwise unavailable materials (the creative commons Lessig talks about) leads to the creation of new value.

Bringing this thought round full circle, academic historians have long been immersed in this kind of creative re-use, but as Jo Guldi wrote in the blog post that I quoted from above, their work is being turbocharged by online access and book search.