Friday, May 18, 2007

Publishers doubt the OA impact advantage

The Publishing Research Consortium has released a new report, "Do Open Access Articles Have Greater Citation Impact? A critical review of the literature," May 17, 2007.  The authors are Iain D. Craig (Wiley-Blackwell), Andrew M. Plume (Elsevier), Marie E. McVeigh (Thomson Scientific), James Pringle (Thomson Scientific), and Mayur Amin (Elsevier).  See the summary paper and press release, May 17, 2007.  From the Executive Overview (in the summary paper):

  1. The last few years have seen the emergence of several Open Access options in scholarly communication which can broadly be grouped into two areas referred to as ‘Gold’ and ‘Green’ Open Access (OA). In this article we review the literature examining the relationship between OA status and citation counts of scholarly articles, and take no position on the relative value or sustainability of these communication models.
  2. Early studies showed a correlation between the free online availability or OA status of articles and higher citation counts.
  3. The authors of many of these studies implied that this correlation was causal, without due consideration of potential confounding factors.
  4. More recent investigations have applied sophisticated bibliometric methods to dissect the nature of the relationship between article OA status and citations.
  5. Three non-exclusive postulates have been proposed to account for the observed citation differences between OA and non-OA articles: an Open Access postulate, a Selection Bias postulate, and an Early View postulate.
    • The Open Access (OA) postulate suggests that authors are more likely to read, and thus cite, articles that are made available in an OA model.
    • The Selection Bias (SB) postulate suggests that the most prominent (and thus most citable) authors are more likely to make their articles available in an OA model, and that they are more likely to do so with their most important (and thus most citable) articles.
    • The Early View (EV) postulate relates only to articles posted before final journal publication, and suggests that the period between the early posting of an article (either pre-print or post-print) and the appearance of the cognate published journal article allows for earlier accrual of citations. Failing to account for this effect must necessarily give a biased result.
  6. The most rigorous study to date, conducted in the field of condensed matter physics, showed that after controlling for a clearly demonstrated Early View postulate, the remaining difference in citation counts between OA and non-OA articles is explained by the Selection Bias postulate. No evidence was found to support the OA postulate per se; i.e. article OA status alone has little or no effect on citations.
  7. As citation practices vary widely by discipline, further studies using a similarly rigorous approach are required to determine the generality of this finding in other fields of research. Such studies must account for the heterogeneous distribution of citations across any group of articles and establish the date of earliest availability of each article in the study, as citation accumulation is time sensitive.


  1. I'm not ready to evaluate this study.  But I'd point out that any attempt to identify the "most rigorous" studies to date would have to include Chawki Hajjem, Stevan Harnad, and Yves Gingras, "Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact," IEEE Data Engineering Bulletin, December 2005, and Gunther Eysenbach, "Citation Advantage of Open Access Articles," PLoS Biology, May 2006.  Their results confirm the OA impact advantage and disentangle it from other interpretations of the OA-citation correlation.  To decide for yourself, see Steve Hitchcock's comprehensive annotated bibliography of the studies.
  2. This excerpt from the study's conclusion shows a strained attempt to deny the role of OA in citation impact:

    "Assuming that citation differences are due solely to the free availability of an article implies that many scholars working in a given discipline are currently totally unaware of important, relevant literature in their field and are unable to read and cite it. This further suggests that authors will limit their citations to those works that are readily available in favour of citations to works that are of the highest relevance. This view of citation behaviour dismisses any contributing role from long-established and robust means of scientific and scholarly communication – namely, all mechanisms of peer communication, the influence and availability of cited references, and the inherent value a given researcher will place on the content of a paper, independently of the mechanism by which it might have been retrieved."

    The authors bend over backwards to reject the more natural interpretation here:  that scholars fail to cite important, relevant literature when they don't know about it or can't access it.  It's just sloppy to say that this interpretation "dismisses any contributing role" of peer communication, existing citations, and the researcher's own estimation of a paper's quality and relevance.  All these factors undoubtedly play a part.  But after they've had their effect, we shouldn't be surprised to see that good relevant literature that is easier to find and retrieve is cited more often than good relevant literature that is harder to find and retrieve.  Or, if a careful study concluded that this view is false, then one might expect it to be more careful in summarizing the reasons why.