Open Access News

News from the open access movement


Sunday, May 25, 2008

Green OA and open data

On Thursday, Peter Murray-Rust posted some thoughts on open data in chemistry.  When I blogged them, I added this comment:

I follow and agree with all of this, with one exception:  [PMR said,] "Green Open Access is irrelevant to Open Data (I think it makes it harder, others disagree)."  I don't understand the claim or the argument, but I  imagine we'll hear more in time.... 

Today Peter responded to my comment in a blog post.  (Thanks, Peter.)  Excerpt:

Green Open Access describes a process - primarily of an author self-archiving her “paper” to an Institutional repository or their own web page. There are mechanisms for indexing repositories....

Green Open Access results in the full-text (versions may vary) of a paper being publicly visible, indefinitely, without price barriers. There are no default permissions - Green does not per se remove any permission barriers. In particular GOA does not actively support the extraction of data (of course an author may be permitted by some publishers to allow data extraction)....

GreenOA does not, in general, say anything about copyright or licences. The paper may or may not carry a publisher’s copyright, an author’s copyright and (frequently) none. There is almost never a formal licence. There is almost always no formal statement of policy for re-use....

There is no explicit mention in the GreenOA upload model for items other than the “full-text”. The repositories may provide such support but - at least in the early days - the focus was completely on full-text only....

I hope we can all agree on these and I’ll start making my argument here....

So by default GreenOA items are designed to be human-visible but without any support for Data, in any of upload, legal access and technical access. The primary goal of Stevan Harnad - expressed frequently to me and others - is that we should strive for 100% GOA compliance and that discussions on Open Data, licences and other matters are a distraction and are harmful to the GOA process. I suspect that many others do not take such a strong position. However if Open Data is irrelevant or inimical to GOA then it is hard to see GOA as supportive of Open Data.

However my main argument is that lack of support for Open Data in GOA is potentially harmful to the Open Data movement. Let’s assume that Stevan’s approach succeeds and we get 100% of papers in repositories through University mandates, funders et al. (I’ll exclude chemistry from the argument). GOA will encourage the deposition of full-text only.

So a GreenOA paper may often be a cut-down, impoverished version of what is available - for a price - on the publisher’s website. It may, and usually will, lack the supporting information (supplemental data). It will probably not reproduce any permissions that the publisher actually allows. So - if we concern ourselves with matters other than human eyeballs and fulltext - it is almost certainly a poorer resource than the one on the publisher site....

So my major concern is that GreenOA will lead to substandard processes for publishing scientific data. I’d be happy to find Repositories that insist on data upload. I doubt they are common.

So here is a challenge to the community: How many instances are there of crystallographic data (CIF) self-archived with GreenOA papers? It’s allowed to archive the data. There are enough publishers (Wiley, Elsevier, Springer) who allow GreenOA. If no-one can find examples then again I would justify the use of “irrelevant”....

Many funders (Wellcome, and we heard from Robert Kiley 8 other major UK medical funders) require ultra-strong-OA for their archival. Because they care about data. And several publishers (PLoS, BMC) also insist on CC-BY. This is, of course, great for scientific data. But it’s a long way from GreenOA.

Comments 

  • First, I generally agree with PMR's opening characterization of green OA.  I'd only add that we should distinguish green OA itself from the strategy proposal (which I do not endorse) to slow down on the pursuit of open data until we succeed with open texts.  As usual, I think we should proceed on all fronts at once.  I generally agree as well with PMR's understanding of the state of open data in OA repositories.  But in describing this state, I'd put the accent in a different place. 
  • It's true that most OA repositories today are optimized for texts and not optimized for data.  It's also true that few institutions (universities, funders, publishers) encourage or require the deposit of data files in repositories.  Finally, it's true that most OA repositories will accept data files, even if few researchers are depositing data files.  With this background, my response reduces to two quick points:
    1. First, it doesn't follow that green OA is "irrelevant" for open data, merely that we are under-using the opportunities it provides for open data.  We shouldn't confuse researcher practices or institutional policies with repository capacities or green OA.  If under-using an opportunity made it irrelevant, then conservation would be irrelevant to climate change and green OA would be irrelevant even to text files. 
    2. Second, we have a long way to go to make most repositories as useful for data files as they are for text files.  But it doesn't follow that green OA is irrelevant or harmful for open data, merely that its capacity to help users do useful work with OA data files must continue evolving.
  • There are many projects trying, in many different ways, to make green OA even more relevant and useful for data than it is now, e.g. by increasing data deposits in repositories and allowing fuller use of data already on deposit.  For example, see ASSDA (from ANU), CESSDA (from NSD), Commons of Geographic Data (from the U of Maine), DANS (from the Royal Netherlands Academy of Arts and Sciences), LEAP (from AHDS), LinkingOpenData (from W3C), Pangaea (from a coalition of German research institutions), and StORe (from JISC).