Open Access News

News from the open access movement

Thursday, July 17, 2008

Online researchers have access to more articles but cite fewer

James Evans has an article in the July 18 issue of Science Magazine, showing that when researchers have access to more papers, they cite fewer of them in their own work. The July 18 issue isn't yet online, but here's an article from today's Economist about Evans' research. (Thanks to Heather Joseph.) Excerpt:

...[James Evans] found that as more journals become available online, fewer articles are being cited in the reference lists of the research papers published within them. Moreover, those articles that do get a mention tend to have been recently published themselves....

[Evans used Thomson citation data on] 6,000 of the most prominent academic journals, some going back to 1945. By cross-referring these to a database called Fulltext Sources Online, he was able to work out when each of these journals became available on the web -- and whether a journal had posted back-issues electronically as well. The result was a set of 34m research papers, which he was able to mine in search of his answers.

For each research paper he looked at, he calculated the average age of the articles cited as references. He then calculated, for each of those cited articles, the number of back-issues of the journal it had been published in which were available on the web at the time when it was cited, and averaged that too. Finally, he looked for correlations between the two averages.

What he discovered was that, for every additional year of back-issues of a journal available online, the average age of the articles cited from that journal fell by a month. He also found a fall, once a journal was online, in the number of papers in it that got any citations at all. Indeed, he predicts that for the average journal today, five extra years' worth of online availability will cause a precipitous drop in the number of articles receiving one or more citations -- from 600 to 200 a year....

Why this should be so remains unclear. It does not seem to have anything to do with economics. The same effect applied whether or not a journal had to be paid for. One explanation could be that indexing works by titles and authors alone, as happened with printed journals, forced readers to cast at least a cursory glance at work not immediately related to their own -- or even that the mere act of flicking through a paper volume may have thrown up unexpected gems. This may have led people to make broader comparisons and to integrate more past results into their research....
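
To make the method described in the excerpt concrete, here is a minimal sketch in Python. It follows the three steps the Economist describes: average the age of each paper's references, average the years of back-issues online for those references, then look for a correlation between the two averages. The records, field names, and helper functions are invented for illustration. They are not Evans' data or code, and his actual statistical model is presumably more elaborate than a simple correlation.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy data. Each citing paper has a publication year and a list of
# references given as (cited_year, years_of_back_issues_online_when_cited).
papers = [
    {"year": 2000, "refs": [(1990, 0), (1994, 0)]},
    {"year": 2002, "refs": [(1996, 2), (1998, 4)]},
    {"year": 2004, "refs": [(2001, 5), (2003, 7)]},
]

def paper_averages(paper):
    """Return (average years online, average reference age) for one paper."""
    ages = [paper["year"] - cited_year for cited_year, _ in paper["refs"]]
    online = [years_online for _, years_online in paper["refs"]]
    return mean(online), mean(ages)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def slope(xs, ys):
    """Least-squares slope: change in y per one-unit change in x."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    return cov / var_x

# One (online availability, reference age) point per citing paper.
points = [paper_averages(p) for p in papers]
avg_online = [online for online, _ in points]
avg_age = [age for _, age in points]

correlation = pearson(avg_online, avg_age)
years_per_online_year = slope(avg_online, avg_age)
```

In this toy data set the slope is negative, meaning that the average reference age falls as more back-issues are online. The finding reported in the excerpt corresponds to a slope of roughly minus one month (about -0.08 years) per additional year of online back-issues.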

Also see the press release and video from the NSF, which funded Evans' work. From the press release:

..."More is available," Evans said, "but less is sampled, and what is sampled is more recent and located in the most prominent journals."

Evans's research also found that this trend was not evenly distributed across academic disciplines. Scientists and scholars in the life sciences showed the greatest propensity for referencing fewer articles, but the trend is less noticeable in business and legal scholarship. Social scientists and scholars in the humanities are more likely to cite newer works than other disciplines.

So what is it about doing research online versus in a bricks-and-mortar library that changes the literature review so critical to research? Evans has identified a few possible explanations. Studies into how research is conducted show that people browse and peruse material in a library, but they tend to search for articles online. Online searches tend to organize results by date and relevance, which allows scholars and scientists to pick recent research from the most high profile journals. Some search tools like Google factor the frequency with which other users select an item during similar searches to determine relevance. Online, researchers are also more likely to follow hyper-linked references and links to similar work within an online archive. Because of this, as more scholars choose to read and reference a given article, future researchers more quickly follow.

Does this phenomenon spell the end of the literature review? Evans doesn't think so, but he does believe that it makes scholars and scientists more likely to come to a consensus and establish a conventional wisdom on a given topic faster. "Online access facilitates a convergence on what science is picked up and built upon in subsequent research." The danger in this, he believes, is that if new productive ideas and theories aren't picked up quickly by the research community, they may fade before their useful impact is evaluated. "It's like new movies. If movies don't get watched the first weekend, they're dropped silently," Evans said....

Comments

It's hard to say much based on a newspaper summary and a press release. But at first glance, Evans' results conflict with the many studies showing that OA articles are cited significantly more often than non-OA articles. These studies differ from one another on how to explain the correlation between OA and increased citation counts, but they agree on the correlation. However, there may be ways to reconcile the two sets of results. For example, authors may cite fewer articles when they have more to choose from, but they may still cite OA articles relatively more often than TA articles. Or the average number of citations per article may decline with the growth of the total number of articles accessible to authors, but OA articles might bring the average up, and TA articles might bring it down. Or the multiplication of ejournals may be narrowing the scope of the average paper, and therefore shortening the average reference list, but citations may be growing overall and the citations of OA articles may be growing faster than the citations of TA articles. (On the other side, the Economist said that "the same effect applied whether or not a journal had to be paid for" -- though without specifying exactly which effect.)

Evans' results also appear to conflict with a recent study by Arthur Eger, Database statistics applied to investigate the effects of electronic information services on publication of academic research -- a comparative study covering Austria, Germany and Switzerland, GMS Medizin - Bibliothek - Information, June 26, 2008. Eger found that "a larger content offering coincides with a dramatic increase in Full Text Article requests, and an increase in Full Text Article requests, after about 2 years, coincides with increased article publication." If Evans is right that "less is sampled", then the two studies are definitely incompatible. But if we look only at Evans' conclusions about citations, the two studies may be compatible. Evans is saying that access to more literature reduces the number of different sources one cites, and Eger is saying that it increases ("dramatically" increases) the number of articles one requests or samples. Researchers may be viewing more articles but citing fewer. Are they using their enhanced access to browse neighboring topics? Are they exploring serendipitous discoveries, only some of which turn out to be citable? Does their wider reading help them zero in on citable research?

Update (7/18/08). Evans' paper is now online: Electronic Publication and the Narrowing of Science and Scholarship, Science, July 18, 2008. Only this abstract and the supporting online material are free online:

Online journals promise to serve more information to more dispersed audiences and are more efficiently searched and recalled. But because they are used differently than print -- scientists and scholars tend to search electronically and follow hyperlinks rather than browse or peruse -- electronically available journals may portend an ironic change for science. Using a database of 34 million articles, their citations (1945 to 2005), and online availability (1998 to 2005), I show that as more journal issues came online, the articles referenced tended to be more recent, fewer journals and articles were cited, and more of those citations were to fewer journals and articles. The forced browsing of print archives may have stretched scientists and scholars to anchor findings deeply into past and present scholarship. Searching online is more efficient and following hyperlinks quickly puts researchers in touch with prevailing opinion, but this may accelerate consensus and narrow the range of findings and ideas built upon.

Update (7/18/08). Brandon Keim's blog post on the article at Wired Science has triggered a discussion in the comment section.

Update (7/18/08). Also see Lila Guterman, Access to Online Journals Reduces Breadth of Citations, Study Finds, Chronicle of Higher Education, July 18, 2008 (accessible only to subscribers). Excerpt:

...Mr. Evans's results puzzle Carol Tenopir, a professor of information sciences at the University of Tennessee at Knoxville. Along with Donald W. King, a research professor at the School of Information and Library Science at the University of North Carolina at Chapel Hill, she has been studying scholars' reading habits since 1977.

"We found exactly the opposite" of Mr. Evans's results, she said. After 20 years of holding steady, the number of older articles that researchers read has increased in the past 10 years. So has the number of journals from which researchers read at least one article.

She suggested that citations lag behind reading by several years and that, because many journals put their older files online only recently, Mr. Evans may find a change in the trend if he looks again in a few years....

Update (7/19/08). Also see Bill Hooker's analysis. Excerpt:

...It's potentially worrisome if more citations are going to fewer journals, but once again I see no more reason to attribute that to increasing online availability than to attribute it to the sharply rising cost of scientific journals in any form. It's well documented that as journal prices have continued to rise, researchers and institutions have had to cut back on the number of subscriptions they take. It is not difficult to imagine that "long tail" and "preferential attachment" phenomena (see, for instance, Evans' own references 14 - 18...) would drive the concentration of likely subscriptions towards a pool of "must have" journals. Indeed, publishers actively promote the concept of such a pool and compete strongly to be seen to be part of it.

Finally, and to me most importantly, Evans seems to me to gloss over the question of what proportion of the online archives are freely available, and what effect that has on the phenomenon he is attempting to model....

I take issue with [Evans' conclusion that OA and TA show a similar effect]. On one of three [of Evans' own] measures they have the opposite effect, and on the other two measures commercial access has by far the stronger effect.

What this suggests to me is that the driving force in Evans' suggested "narrow[ing of] the range of findings and ideas built upon" is not online access per se but in fact commercial access, with its attendant question of who can afford to read what. Evans' own data indicate that if the online access in question is free of charge, the apparent narrowing effect is significantly reduced or even reversed. Moreover, the commercially available corpus is and has always been much larger than the freely available body of knowledge (for instance, DOAJ currently lists around 3500 journals, approximately 10-15% of the total number of scholarly journals). This indicates that if all of the online access that went into Evans' model had been free all along, the anti-narrowing effect of Open Access would be considerably amplified....

Indeed, I would suggest that if the entire body of scholarly literature were Openly available, so that every researcher could read everything they could find and programmers were free to build search algorithms over a comprehensive database to help the researchers do that finding, then in fact the opposite effect would obtain....

In support of this assertion, consider the expanding body of literature on the Open Access "citation advantage" -- studies which show that the likelihood of a given paper being cited is increased up to several hundred percent if the paper is OA rather than commercially available. There is some controversy over that literature, but it stands in direct contrast to the idea that online access of any kind tends to narrow citation reach.

There are more data in Evans' paper that speak to the free-vs-commercial issue, and some of those data show free access having a stronger "narrowing" effect than commercial access. I'd go through it in detail, but I am probably already pushing the limits of fair use so I'll have to refer you to the published article -- in particular, Figure 2 panels A and B. My response is much the same, that the apparent effect suffers from a loading in "favour" of commercial access, because of the wildly disparate sizes of the two different bodies of online literature.

Update (7/19/08). Bora Zivkovic has collected a good number of early comments on the paper.

Update (1/5/09). Also see the January 2, 2009, letter to the editor of Science by Yves Gingras, Vincent Larivière, and Eric Archambault. Excerpt:

J. A. Evans's report, "Electronic Publication and the Narrowing of Science and Scholarship" (18 July, p. 395) suggests that (i) the average age of citations to scientific papers dropped over the years as more electronic papers became accessible and (ii) the citations are concentrated on a smaller proportion of papers and journals. Such conclusions are not warranted by Evans's data.

To measure the evolution of the average (or median) age of the references contained in papers, one has to look at all the references in all published papers and observe the evolution of their age over time. As we have shown using Thomson Reuters's Web of Science data for the period 1900 to 2004 (for a total of 500 million references in 25 million papers), the average (and median) age of all references began to decrease in 1945 but has increased steadily since the mid-1960s. This trend is visible in all sciences, including the social sciences and the humanities....The median age of references in fields of science and engineering moved from 4.5 years in 1955 to more than 7 years in 2004, and in medical sciences it increased from 4.5 to 5.5 during the same period....In fact, Evans's conclusions only reflect a transient phenomenon related to recent access to online publications and to the fact that the method used does not take into account time delays between citation year and publication year. Our data also show that in disciplines in which online access has been available the longest (such as nuclear physics and astrophysics), the age of references declines for a number of years in the 1990s but then increases from 2000 to 2007, the last available year of our data set. We have also measured the concentration of citations (and journals) by three different methods, including the one used by Evans. All three measures clearly show that concentration is in fact declining for papers as well as for journals....Although many factors affect citation practices, two things are clear: Researchers are increasingly relying on older science, and citations are increasingly dispersed across a larger proportion of papers and journals.
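
The measurement the letter describes, taking all references in all papers and tracking their age by year, can be sketched in a few lines of Python. The reference pairs below are invented toy values, not Web of Science data, and the function name is hypothetical. The sketch shows only the median-age calculation, not the three concentration measures, which the letter does not spell out.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy data: one (citing_year, cited_year) pair per reference.
references = [
    (1955, 1950), (1955, 1951), (1955, 1953),
    (2004, 1990), (2004, 1997), (2004, 2001),
]

def median_reference_age_by_year(refs):
    """Group reference ages by the citing paper's year and take the median."""
    ages = defaultdict(list)
    for citing_year, cited_year in refs:
        ages[citing_year].append(citing_year - cited_year)
    return {year: median(year_ages) for year, year_ages in sorted(ages.items())}

median_ages = median_reference_age_by_year(references)
```

A rising median from one citing year to the next is the pattern the letter reports for the period since the mid-1960s.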

Posted by Peter Suber at 7/17/2008 05:46:00 PM.