Monday, December 17, 2007

Science Commons protocol for open data

Announcing the Protocol for Implementing Open Access Data, Science Commons blog, December 16, 2007. 

Today, in conjunction with the Creative Commons 5th Birthday celebration, Science Commons announces the Protocol for Implementing Open Access Data (“the Protocol”).

The Protocol is a method for ensuring that scientific databases can be legally integrated with one another. The Protocol is built on the public domain status of data in many countries (including the United States) and provides legal certainty to both data deposit and data use. The Protocol is not a license or legal tool in itself, but instead a methodology for a) creating such legal tools and b) marking data already in the public domain for machine-assisted discovery.

We built the Protocol after a year-long process of meetings and consultations with a broad set of stakeholders, including representatives of the geospatial and biodiversity science communities. We solicited input from international representatives from China, Uganda, Brazil, Japan, France, the Netherlands, Germany, Italy, the United Kingdom, Colombia, Peru, Belgium, Catalonia and Spain.

We expect to convert this work into a working group with founding members from our existing communities of practice. However, the world is moving very quickly in terms of data production, and as such we created the Protocol as a guide and as a tool to bring together the existing data licensing regimes into a single space.

As part of that decision, Science Commons has worked with data licensing thought leaders and is pleased to announce partnerships with Jordan Hatcher, the lawyer behind the Open Database License; Talis, the company behind the Open Database License process; and the Open Knowledge Foundation, creators of the Open Knowledge Definition.

Jordan has drafted the Open Data Commons Public Domain Dedication and License - the first legal tool to fully implement the Protocol. It is available at his Web site. This draft is remarkable not just for the Public Domain Dedication but for the encoding of scholarly and scientific norms into a standalone, non-legal document. This is a key element of the Protocol and a major milestone in the fight for Open Access data. Talis, a company with a strong history in the open science data movement, played a key role in birthing Jordan’s work, and we’re pleased to work with them as well.

We are also pleased to announce that the Open Knowledge Foundation has certified the Protocol as conforming to the Open Knowledge Definition. We think it’s important to avoid legal fragmentation at the early stages, and that one way to avoid that fragmentation is to work with the existing thought leaders like the OKF.

We will be launching a wiki for comments on the Protocol soon, and will announce a strategy for versioning the Protocol in 2008.

From the protocol itself:

This memo provides information for the Internet community interested in distributing data or databases under an “open access” structure. There are several definitions of “open” and “open access” on the Internet, including the Open Knowledge Definition and the Budapest Declaration on Open Access; the protocol laid out herein is intended to conform to the Open Knowledge Definition and extend the ideas of the Budapest Declaration to data and databases....

This memo...will be submitted to the World Wide Web Consortium for consideration....

The motivation behind this memorandum is interoperability of scientific data....


  • This protocol is much needed and well conceived.  It's a very good sign that so many key stakeholders are already part of the process and support it.  (There's little chance of interoperable data without cooperating stakeholders.)  Kudos to John Wilbanks and Science Commons for this feat of coordination and problem-solving.
  • It's also persuasive.  For example, I've been thinking that any open data standard would probably have to require attribution, if only to recruit participating researchers.  But the protocol cogently argues that such a requirement would result in "attribution stacking" (e.g. crediting "40,000 data depositors in the event of a query across 40,000 data sets") and violate the "principle of low transaction costs."  It's equally cogent in arguing that open data should not limit re-use with "share-alike" and similar contractual restrictions.

Update.  Here's a related announcement from Jordan Hatcher, mentioned in the Science Commons post above:  "We’ve created a site [a blog, Open Data Commons] solely for the Open Data Commons project."  From the inaugural post on the new blog:

The new Open Data Commons set of legal tools is now available for comment. There are two documents for you to review:

Public Domain Dedication & Licence (PDDL)

Community Norms

We’ve created a FAQ for some of the initial questions here. A FAQ addressing some of the in-depth legal issues of the PDDL will be forthcoming.

The current draft PDDL is compliant with the newly released Science Commons draft protocol for the “Open Access Data Mark” and with the Open Knowledge Foundation’s Open Definition....

Update. Also see John Wilbanks' blog post on "the personal story that led us [at Science Commons] to the position that we reached."

Update (12/20/07). Also see the new Database Protocol FAQ.