Open Access News

News from the open access movement


Sunday, June 22, 2008

CrystalEye, "an exemplar of open data"

Peter Murray-Rust, CrystalEye - an example of a data repository, petermr’s blog, June 22, 2008.
I shall be writing a number of posts about (chemical) crystallography - which may be of wider interest to those interested in data quality assessment, robotic harvesting, robotic calculation, hyperlinking, repositories and the free access to scientific data. I’ll start by talking about CrystalEye - what it is and where it may be going.

We are generally interested in the area of data-driven, or data-enabled science in the scientific “long-tail”. Can machines extract useful information from the heterogeneous mass of data that increases daily? And - because we are chemists - we have chosen to do this in chemistry, although it has serious problems of restrictive access to data. The area which has turned out to be most fruitful has been chemical crystallography - the determination of the structures of “small molecules” by diffraction methods. In this we pay great tribute to the International Union of Crystallography which is probably unsurpassed in its commitment to data quality and data preservation. ...

The basic questions included:
  • Can machines aggregate enough public data to be useful? ... The answer is definitely yes ...
  • Is the data of high enough quality to do useful work with? This is difficult to answer ...
  • Is the data of scientific value? ... [T]he answer is generally “yes”. ...
We chose to expose the aggregated data to the world as “Open Data” since we feel it is fundamentally Open. ...

The initial reason for exposing CrystalEye was (a) because Nick has created a valuable resource in its own right (b) as an exemplar of Open Data. We are happy for anyone to do whatever they wish (subject to acknowledging us) but we make no claims for the data or its value. ...

We also see CrystalEye as a starting point for the Departmental or domain repository for chemistry, and perhaps more widely for long-tail scientific data. ...

Finally there is the emerging concern over whether crystallographic data (a) should be and (b) is free and Open. There is no technical reason against this - the costs are so marginal that they are negligible. It’s simply a question of allowing or requiring another piece of supplemental information. ...