Wednesday, September 12, 2007

Permission to harvest data from online files

Peter Murray-Rust, Nature: How much content can our robots access? A Scientist and the Web, September 12, 2007. Excerpt:

In this blog (Copyrighted Data: replies [1], Wiley and eMolecules: unacceptable; an explanation would be welcome [2]), and elsewhere, we have been discussing the “copyright” of factual information, or “data”. In [2] I ask a major publisher whether copyright applies to some or all of the factual scientific record they publish. So far I have had no reply. Here I ask another, Nature, who - at least through Timo Hannay - have been very helpful in discussing aspects of publication (most other publishers have been silent).

The issue arises in “supplemental data” or “supporting information” which is the factual record of the experiment - increasingly required as proof of correctness. Some major publishers (Royal Soc Chemistry, Int. Union of Crystallography, Nature) do not claim copyright over this; others such as American Chemical Society and Angewandte Chemie (Wiley) appear to do so, though I haven’t had a definitive public statement from either....

Our vision for the future is that a large part of published scientific data could be made directly machine-understandable, if the publishers collaborate in this....

So I am going to ask Nature what I can do and what I can’t. What my robots can do and what they can’t. If the answer is not “YES” to a question it is “NO” - there can be no “middle ground” for robots. If you don’t know then the answer is NO. If I have to ask for permission the answer is NO....

PMR elaborates in a follow-up post, showing the kinds of data and images he'd like to be able to harvest and re-use.