Open Access News

News from the open access movement

Tuesday, May 13, 2008

How to free your facts

Donna Wentworth, How to free your facts, Science Commons blog, May 12th, 2008.

... [W]e’re getting more emails with questions about how best to share collections of factual data. One of the most common questions: How do I mark my data explicitly as “open access” and free for anyone to use?

In general, we encourage you to choose waivers, like the Open Data Commons Public Domain Dedication and License (ODC-PDDL) or the Creative Commons CC0 waiver, rather than licenses such as CC-BY or the FDL.

The issues surrounding how to treat factual data are complex. To help bring more clarity for those of you exploring your options, here’s a short overview of the reasons why we generally advise using waivers, prepared by Science Commons Counsel Thinh Nguyen.

Facts are (and should be) free
There is a long tradition in science and law of recognizing basic facts and ideas as existing in the public domain of open discourse. At Science Commons we summarize that by saying “facts are free.”

... When Congress wrote the Copyright Act, it made sure to spell out that facts cannot be subject to copyright. ...

And there are good reasons for this. Imagine if you couldn’t reference physical constants — like the height of Mount Everest — without permission. ... We all need access to a basic pool of ideas and concepts in order to have any kind of meaningful discourse. So copyright is supposed to protect creative expression–the unique and individual ways we express ourselves–but not the invariant concepts and ideas that we need to think and carry on a conversation.

Licensing facts can cause legal uncertainty and confusion
So why is it that increasingly, especially online, there is talk about licensing factual data–assertions of rights and obligations over assertions of facts? Part of the answer is that as facts get represented in formats that look more like computer code, the impulse is to treat them like any other computer code. And that means putting a license on them. Part of the answer is that the law is still struggling with how to treat databases, and in some countries, database rights have expanded (particularly in Europe under the database directive). Other countries have loosened copyright standards to allow purely factual databases to be protected. (For a more detailed discussion of these issues, see the Science Commons paper, Freedom to Research: Keeping Scientific Data Open, Accessible, and Interoperable [PDF].)

But even if you could find a legal angle from which to impose licensing or contractual controls over factual data, why would you want to? ...

Attribution for facts can add complexity and hamper reuse
Many people cite the desire to receive attribution. In scientific papers, we have a tradition of citing sources for facts and ideas. But those traditions evolved over hundreds of years. There’s a lot of discretion and judgment that goes into deciding whom to cite and when. ... But what happens to common sense when you convert that requirement into a legal requirement? ...

Imposing licensing on data creates all kinds of unanticipated problems. If you have a database with thousands or hundreds of thousands of facts, does each fact have to come with its own attribution and licensing data? How do we aggregate and recombine such data? ... In the future, will every database need its own database of attribution? ...

This problem, which we call “attribution stacking,” can saddle science with an unbearable administrative burden. It could shut down present and future sites that aggregate and federate data from many different sources. ...

The solution: use a waiver for factual data, not a license or contract
... We think the best answer is to go back to what scientists themselves have been doing for centuries: giving attribution without legal requirements. We think Congress got it right when it excluded facts and ideas from copyright protection. And we think it should stay that way, even when those facts happen to get incorporated into databases. ...