Thursday, March 06, 2008

OA repositories for scientific data

Peter Murray-Rust, Repositories for Scientific Data (at OR08), A Scientist and the Web, March 4, 2008. An abstract of a forthcoming keynote address at Open Repositories 2008 (April 1-4, 2008, Southampton).

Scientists are producing data at an ever increasing rate (“the data deluge”) due to automated instruments, image capture and simulation tools. This holds the promise of “data-driven science” where scientific discovery can be made by linking or mining existing data. The reality is, unfortunately, that almost all this data is lost. Although some publishers welcome data as an adjunct to “fulltext”, many do not and most do not have the domain expertise to store and curate the data. And although “big science” (such as high energy physics, geospatial imaging, genomics and structural biology) can often provide domain repositories (e.g. in bioinformatics) most science (“the long tail”) cannot.

There is an urgent need to address this problem. Current Institutional Repositories (IRs) are geared to storing and disseminating scholarly manuscripts and while some are prepared to accept other digital artefacts the practice is fragmented and does not scale. We need to define “Data Repositories” (DRs) which serve the interests of the scientists directly. This is highly domain-dependent and there is no one-size-fits-all solution. ...