Peter Suber, Open access, impact, and demand

This editorial was originally published in BMJ, May 14, 2005. I've made this open-access copy because the BMJ version was initially embargoed and accessible only to subscribers. The editorial comments on Jonathan Wren's article, Open access and openly accessible, from BMJ for April 12, 2005.
Open access, impact, and demand: why some authors self-archive their articles
by Peter Suber
Before Jonathan Wren's study came out (BMJ, April 12, 2005) we knew that open-access (OA) copies of scientific journal articles published in non-OA journals were a fairly small subset of the overall journal literature. Wren studied just which subset it was, and found that papers from high-impact journals were more likely to have free online copies at other locations around the web than papers from low-impact journals.
To show why this matters, and why it's puzzling, let me back up and say more about what we knew before he went to work. We knew that some scientists deposited copies of their published articles in OA repositories, a process called self-archiving. We knew that about 80% of subscription-based journals allowed their authors to do so. Hence, we knew that self-archiving was compatible with copyright and with publication in a non-OA journal. We knew that it took an author about 10 minutes to self-archive one paper. We knew that the OA archives where authors deposited articles were "interoperable", which means that they conformed to a common standard allowing users to search them all at once, as if they comprised one grand, virtual archive. We knew that there were many effective cross-archive search tools to take advantage of this interoperability. We also knew that Google, Yahoo and other mainstream or non-academic search engines were indexing these archives. We knew that there were more than 400 standard-compliant archives around the world, with new ones launched every week. We knew that, because of their wider reach and increased visibility, OA articles were cited 50-300% more often than non-OA articles from the same journal and year, although we still don't know how many authors and journals realize this. We knew, in other words, that self-archiving was a small investment for authors with a large payoff.
We knew that the practice of self-archiving was catching on. But we also knew that proponents of OA were frustrated with the slow rate of its growth. We knew that most publishing scientists were not opposed to OA but didn't know much about it or its benefits. We knew that OA proponents wanted more authors to understand that self-archiving was quick, easy, lawful, and beneficial. Meantime, authors who did practice self-archiving were steadily creating a critical mass of peer-reviewed, OA research literature.
Wren's result matters because it gives us some insight into the motivation of authors who self-archive. They already have comparatively large audiences for their articles in high-impact journals. They might be seeking even larger audiences (OA articles reach a much larger set of readers than any priced journal, in print or online). They might be showing off, posting copies to display their success in having been accepted by a prestigious journal. They might be practicing what media scholars call "push", affirmatively bringing their work to the attention of those who might not know about it, even though the pushees already had free online access to it. These are all different ways of saying that self-archiving authors were advertising themselves and their work. This is not a cynical diagnosis. On the contrary, this kind of notice can advance research in the author's niche and advance the author's career.
It's possible that many of these free online copies were posted by readers, not authors, though Wren has no data on this. For convenience, I'll assume that reader posting was the exception rather than the rule, but I concede that this might oversimplify the analysis.
What's puzzling is that authors who publish in low-impact journals turn to OA at lower rates. It seems that they have the same interest in enlarging their audience and impact as authors who publish in high-impact journals, if not more. One possibility is that they are not proud of where they published and fear that the advertisement would be double-edged.
Another possibility is that more high-impact journals than low-impact journals give authors permission for self-archiving. Wren didn't investigate this possibility, but he did name the 13 journals he chose to study. I looked up their self-archiving policies and found that the high-impact journals on his list *were* more likely to permit self-archiving than the low-impact journals. However, most of the high-impact journals did not permit archiving the published PDF and Wren's study was limited to free online PDFs. Hence, this alluring alternative explanation largely disappears and we're back to the puzzle.
Wren made another, even more enigmatic discovery. Articles from OA journals were just about as likely to have free copies elsewhere online as articles from non-OA journals. What's puzzling is that authors would provide OA for articles that were already OA. One possibility is that this is still self-advertising. Authors may put copies where they are more likely to be seen, even if existing copies sufficed for readers who ran searches or knew where to look. Another possibility is that the free online copies were posted by readers, not by authors. When I've found readers copying and reposting my own articles, some tell me that they want more assured access, not knowing how long the originals would remain freely available.
Some OA journals deposit their own articles in OA repositories in order to assure their long-term preservation and accessibility. But Wren's study included no journals with such policies.
Wren's data show a steady upward trajectory over the past decade for OA copies of journal articles retrievable by Google searches, his most encouraging result. This suggests that author self-archiving is increasing, reader reposting is increasing, or linkrot is making older copies less visible, and most likely some of each.
One way that Wren summarizes his conclusion needs some elaboration. He says, "Decentralised sharing of scientific reprints through the internet creates a degree of de facto open access that, though highly incomplete in its coverage, is none the less biased towards publications of higher popular demand." This is accurate but may leave the impression that most high-demand articles are OA somewhere, when all we know so far is that most OA articles in the set he studied were high-demand. It's possible that the vast majority of high-demand articles are not yet OA, and indeed this seems likely. Most publishing scientists do not yet self-archive their work and their reasons seem entirely unrelated to the demand, impact, or quality of their work. For example, they know too little about self-archiving or believe they are too busy.
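The difference between "most OA articles are high-demand" and "most high-demand articles are OA" can be made concrete with a small calculation. The sketch below is only an illustration: every count in it is invented for the purpose and none is taken from Wren's study. It shows how the first statement can hold while the second fails.

```python
# Hypothetical counts, invented purely for illustration.
# They are NOT Wren's data and do not describe any real literature.
high_demand_total = 1000  # high-demand articles in an imagined literature
low_demand_total = 9000   # low-demand articles in the same literature
high_demand_oa = 150      # high-demand articles with a free online copy
low_demand_oa = 50        # low-demand articles with a free online copy


def share(part: int, whole: int) -> float:
    """Return part / whole as a fraction between 0 and 1."""
    return part / whole


oa_total = high_demand_oa + low_demand_oa

# Of the articles that are OA, what fraction are high-demand?
# This is the direction a study of existing free copies can measure.
p_high_given_oa = share(high_demand_oa, oa_total)

# Of the articles that are high-demand, what fraction are OA?
# This is the direction the summary might be misread as describing.
p_oa_given_high = share(high_demand_oa, high_demand_total)

print(f"Share of OA copies that are high-demand: {p_high_given_oa:.0%}")
print(f"Share of high-demand articles that are OA: {p_oa_given_high:.0%}")
```

With these made-up counts, three quarters of the OA copies are high-demand, yet only 15% of the high-demand articles are OA. A collection can be strongly biased towards high-demand work and still leave most of that work behind a subscription barrier.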
This is important because we ought to use Wren's results to understand why authors self-archive and how to appeal to authors who don't. One lesson is that existing OA is demand-driven to some degree. But this doesn't mean there is little or no unmet demand. On the contrary, unmet demand may be the norm, just as the sale of food is demand-driven while the unmet demand exists in catastrophic proportions.

Peter Suber
Open Access Project Director, Public Knowledge
Research Professor of Philosophy, Earlham College
Senior Researcher, SPARC
peters@earlham.edu

This work is licensed under a Creative Commons Attribution 2.5 License.