Wednesday, August 01, 2007

Overcoming hypopublication

Peter Murray-Rust, Cyberscience: Changing the business model for access to data, A Scientist and the Web, July 31, 2007. Excerpt:

I have been reviewing the availability of Open Data for cyberscience - concentrating recently on crystallography and chemical spectra as examples. I’ll propose a new business model here, still very ill-formed and I welcome comments. It applies particularly to disciplines where the data are collected in a fragmented manner rather than being coordinated as in, for example, survey of the earth or sky. I call this fragmentation “hypopublication”. However the Internet has the power to pull together this fragmentation if the following conditions are met:

  • the data are fully Open and exposed. There must be no cost, no impediment to access, no registration (even if free), no forms to fill in.
  • the data must conform to a published standard and the software to manage that standard must be Openly available (almost necessarily Open Source). The metadata should be Open.
  • the exposing sites must be robot-friendly (and in return the robots should be courteous).

Such a state nearly exists in modern crystallography….

[C]ouldn’t this be a model for all of science? As I have posted recently I’m going to write to the editors of Elsevier’s Tetrahedron suggesting that they make all their crystallographic data available Openly. They agree it’s not their copyright, so it’s just a question of how to do it - files on a website shouldn’t be a major expense.

And funders should encourage this. If you are urging authors and journals to publish Open full-text, please extend this to data. Yes, there are some technical difficulties in some cases such as metadata, complexity and size but they probably aren’t too scary. And in any case the community will help work out how to use them.
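
Illustration (not part of Murray-Rust's post): the third condition above, robot-friendly sites met by courteous robots, is usually expressed in practice through a site's robots.txt file. The sketch below is one minimal way a data-harvesting robot might honor it, using Python's standard urllib.robotparser. The function name plan_courteous_fetches, the user-agent string, the example.org URLs, and the one-second default delay are all assumptions made for illustration; nothing here describes software used by any crystallography site or by the author.

```python
from urllib.robotparser import RobotFileParser


def plan_courteous_fetches(robots_txt, user_agent, urls, default_delay=1.0):
    """Return (url, delay_seconds) pairs for the URLs a robot may fetch.

    robots_txt is the text of the site's robots.txt. In a real robot it
    would be downloaded first; here it is passed in so that the sketch
    needs no network access. (Hypothetical helper, for illustration only.)
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Use the site's Crawl-delay if it states one, else a polite default.
    delay = parser.crawl_delay(user_agent)
    if delay is None:
        delay = default_delay

    # Keep only the URLs that the site allows this robot to fetch.
    return [(url, float(delay)) for url in urls
            if parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt for an Open Data site.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""

EXAMPLE_URLS = [
    "https://data.example.org/cif/1.cif",
    "https://data.example.org/private/draft.cif",
]

plan = plan_courteous_fetches(EXAMPLE_ROBOTS_TXT, "opendata-robot", EXAMPLE_URLS)
```

A robot following this plan would wait the stated number of seconds between requests and would skip the disallowed path altogether. The site's side of the bargain is the mirror image: publish a robots.txt that leaves the data directories open.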