Open Access News

News from the open access movement

Wednesday, October 17, 2007

Does green OA support text- and data-mining?

Stevan Harnad, How Green Open Access Supports Text- and Data-Mining, Open Access Archivangelism, October 16, 2007.  This is a response to Peter Murray-Rust's post from earlier the same day, Why Green Open Access does not support text- and data-mining, which you should read first.  (I blogged PMR's post yesterday but without an excerpt.)  Excerpt:

Summary: Data-mining robots like SciBorg can harvest Green OA full-texts, self-archived in their authors’ Institutional Repositories (IRs) and “repurpose” them for better functionality. The postprint is the author’s own refereed, revised final draft. Green journal publishers endorse author posting of postprints in their own IR, free for all. The author can certainly revise that draft further, making additional corrections, updates and enhancements, including marking it up in XML and adding comments. Those corrections need not be done by the author’s own hands: They could be done by a graduate student, a collaborator, a secretary, or a hired hand. The author could also have SciBorg “repurpose” his postprint -- under one trivial condition, easily fulfilled, which is that the locus of the enhanced postprint, the URL from which users must download it, remains the author’s own IR, not a 3rd-party website. It is not only unnecessary but would be highly inimical to the progress of Green OA mandates to insist instead that the Green publisher’s endorsement to self-archive the postprint in the author’s IR is “not enough” for full-blooded OA -- that the author must also successfully negotiate with the publisher the retention of the right to assign to 3rd-party harvesters like SciBorg the right to publish a “derivative work” derived from the author’s postprint.