Tuesday, April 28, 2009

Defining open data

Peter Murray-Rust, BioIT in Boston: What is Open?, A Scientist and the Web, April 27, 2009.

... [T]he fundamental postulate of Openness is: ANY barrier to access and re-use, however small and seemingly trivial, COMPLETELY destroys public semantic data. ...

Why am I so insistent on this? I’ll leave the moral and ethical arguments aside here and concentrate on the technical aspects. ...

There are many data resources which are described as “Open” but they fail in one or more aspects. The commonest failures are:

  • To expose only part of the data. A database system with a query interface is normally not Open Data even if individual items can be downloaded without barrier. It is generally impossible to extract the whole work ...
  • To limit the amount downloaded ...
  • To forbid re-use ...
  • To require access through specific technology. A search form limits the access.
  • To require any form of signin, even if free. Robots are illiterate in this aspect.
  • To restrict purpose of re-use. Thus CC-NC (“no commercial reuse”) is NOT OKF-compliant.
  • To fail to provide a clear statement that the data are open and comply with the Open Knowledge definition. It’s almost universal that data are NOT labelled as Open. This is easy to fix – just add the OKF’s tags.
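
The failures listed above can be read as a checklist. The following is only an illustrative sketch in Python, not anything from the quoted post or from OKF tooling: the `DataResource` record, its field names, and the `openness_failures` and `is_open` helpers are all hypothetical, with one boolean per listed failure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataResource:
    """Hypothetical description of a published data resource (illustrative only)."""
    whole_dataset_downloadable: bool    # the whole work can be extracted, not just single items
    download_amount_limited: bool       # a cap on how much may be downloaded
    reuse_forbidden: bool               # re-use is not permitted at all
    requires_specific_technology: bool  # e.g. reachable only through a search form
    requires_signin: bool               # any sign-in, even a free one
    reuse_purpose_restricted: bool      # e.g. a non-commercial (CC-NC) clause
    labelled_as_open: bool              # carries a clear statement that the data are open


def openness_failures(resource: DataResource) -> List[str]:
    """Return the listed failures that apply to this resource, in list order."""
    checks = [
        (not resource.whole_dataset_downloadable, "exposes only part of the data"),
        (resource.download_amount_limited, "limits the amount downloaded"),
        (resource.reuse_forbidden, "forbids re-use"),
        (resource.requires_specific_technology, "requires access through specific technology"),
        (resource.requires_signin, "requires sign-in"),
        (resource.reuse_purpose_restricted, "restricts the purpose of re-use"),
        (not resource.labelled_as_open, "lacks a clear statement of openness"),
    ]
    return [reason for failed, reason in checks if failed]


def is_open(resource: DataResource) -> bool:
    """A resource counts as open here only if none of the listed failures apply."""
    return not openness_failures(resource)


# A resource that is downloadable in bulk and labelled, but needs a free
# sign-in and carries a non-commercial clause.
example = DataResource(
    whole_dataset_downloadable=True,
    download_amount_limited=False,
    reuse_forbidden=False,
    requires_specific_technology=False,
    requires_signin=True,
    reuse_purpose_restricted=True,
    labelled_as_open=True,
)

for reason in openness_failures(example):
    print("Not open:", reason)
```

The check is deliberately all-or-nothing, mirroring the postulate that any single barrier is enough to disqualify a resource.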

So the message is simple, though it will take time to spread: Use the OKF definition for all your data and tag it as such.

Update. See also Where do we get Semantic Open Data?