Tuesday, January 22, 2008

Richard Poynder interviews Peter Murray-Rust

In the latest installment of his Open Access Interviews, Richard Poynder interviews Peter Murray-Rust (January 21, 2008). This is another superb, wide-ranging Poynder interview, covering the importance of separate treatment of open data (OD) and OA for texts, the benefits of OD for research, technical and legal barriers to text- and data-mining, publishers who claim copyright on data, licensing OD, the distinction between price barriers and permission barriers, the difficulty of determining publisher policies on OD and OA, the need for a central organization to pursue OD, and the deep connections between OD and open source software. From the introduction:

Peter Murray-Rust is a committed advocate of Open Access (OA). He is, however, a disappointed one. He is disappointed not because so few researchers are willing to self-archive their scholarly papers on the Web, not because it is proving so hard to persuade funders and research institutions to introduce Open Access mandates, but because of a failing he sees within the movement itself. Out of his disappointment, however, has come a new movement: the Open Data movement.

As a Reader in molecular informatics at the University of Cambridge Murray-Rust is interested in scholarly papers less for their textual content, more for the raw data contained within them — the graphs and tables, the molecular structures, the spectral and crystallography data, the photographs of proteins, and all the other factual information that litters science papers.

As such, much of Murray-Rust's time is spent not on reading the scholarly literature, but mining it — using various software tools to automatically extract the "embedded data" contained in the tables, the charts, and the images in science papers, and capturing the "supplemental information" that invariably accompanies the papers. After aggregating all these data Murray-Rust will compare them, input them into programs, use them to create predictive models, and reuse them for a variety of different purposes.

In short, Murray-Rust is working at the frontline of what has been dubbed Science 2.0, an online interactive environment where a great deal of the information used is more likely to have been discovered, aggregated and distributed by software and machines than it is by humans; an environment where data are constantly used and reused — pumped through new tools like RSS feeds, and displayed in mashups, wikis, and the various other tools developing around Open Notebook Science.

Murray-Rust's ultimate goal is to create and exploit what he calls the chemical semantic web — a web that would assume most scientific information was unencumbered by proprietary interests, and able to be freely shared and exchanged.

In practice, however, mining the scholarly literature remains a difficult and risky activity, explains Murray-Rust — not so much because the technology is still in its infancy, but because scholarly publishers routinely appropriate the content of research papers, and then lock it up behind financial firewalls and prohibit its reuse.

Assuming that the Open Access movement was committed to removing these barriers, Murray-Rust became an OA advocate. After all, as leading OA advocate Peter Suber puts it, Open Access implies scholarly literature that is "digital, online, free of charge, and free of most copyright and licensing restrictions". That, says Murray-Rust, is what is needed to build the semantic web.

But while the definition of Open Access agreed at the launch of the 2001 Budapest Open Access Initiative (BOAI) states that any paper made Open Access must be free of copyright and licensing restrictions, Murray-Rust discovered that in most cases publishers and authors still fail to provide the necessary permissions when making papers Open Access. Where a paper is flagged as being Open Access, reuse is often prohibited. And even where there is no specific prohibition, usage conditions are frequently not specified, effectively placing the paper into licensing limbo....

Further limiting what he can do, adds Murray-Rust, traditional subscription publishers like the American Chemical Society and Wiley explicitly forbid text mining of papers they publish. At the same time these publishers insist that authors not only sign over the copyright in the paper, but also ownership of the supplemental data, despite the fact that factual data is not subject to copyright.

After failing to persuade Open Access advocates to hear his concerns, Murray-Rust began to direct his energies to what he calls the Open Data movement, for which he is now a leading advocate. While he remains an advocate for OA, he explains, he has come to believe that the issue of Open Data needs to be addressed separately. For where the Open Access movement is concerned only with ensuring that scholarly papers are human readable, the Open Data movement requires that they are also machine readable. And since Open Data implies reuse, it is vital that licences are provided that specifically permit this....