Open Access News

News from the open access movement


Monday, December 17, 2007

Why we need OA to citation data

Mike Rossner, Heather Van Epps, and Emma Hill, Show me the data, Journal of Cell Biology, December 17, 2007.  An editorial.  Excerpt:

The integrity of data, and transparency about their acquisition, are vital to science. The impact factor data that are gathered and sold by Thomson Scientific (formerly the Institute for Scientific Information, or ISI) have a strong influence on the scientific community, affecting decisions on where to publish, whom to promote or hire, the success of grant applications, and even salary bonuses. Yet...to our knowledge, no one has independently audited the underlying data to validate their reliability....

Thomson Scientific makes its data for individual journals available for purchase. With the aim of dissecting the data to determine which topics were being highly cited and which were not, we decided to buy the data for our three journals [at Rockefeller University Press] (The Journal of Experimental Medicine, The Journal of Cell Biology, and The Journal of General Physiology) and for some of our direct competitor journals. Our intention was not to question the integrity of their data.

When we examined the data in the Thomson Scientific database, two things quickly became evident: first, there were numerous incorrect article-type designations. Many articles that we consider "front matter" were included in the denominator. This was true for all the journals we examined. Second, the numbers did not add up. The total number of citations for each journal was substantially fewer than the number published on the Thomson Scientific, Journal Citation Reports (JCR) website (subscription required). The difference in citation numbers was as high as 19% for a given journal, and the impact factor rankings of several journals were affected when the calculation was done using the purchased data (data not shown due to restrictions of the license agreement with Thomson Scientific)....

When queried about the discrepancy, Thomson Scientific explained that they have two separate databases—one for their "Research Group" and one used for the published impact factors (the JCR). We had been sold the database from the "Research Group", which has fewer citations in it because the data have been vetted for erroneous records. "The JCR staff matches citations to journal titles, whereas the Research Services Group matches citations to individual articles", explained a Thomson Scientific representative. "Because some cited references are in error in terms of volume or page number, name of first author, and other data, these are missed by the Research Services Group."

When we requested the database used to calculate the published impact factors (i.e., including the erroneous records), Thomson Scientific sent us a second database. But these data still did not match the published impact factor data. This database appeared to have been assembled in an ad hoc manner to create a facsimile of the published data that might appease us. It did not....

It became clear that Thomson Scientific could not or (for some as yet unexplained reason) would not sell us the data used to calculate their published impact factor. If an author is unable to produce original data to verify a figure in one of our papers, we revoke the acceptance of the paper....

Just as scientists would not accept the findings in a scientific paper without seeing the primary data, so should they not rely on Thomson Scientific's impact factor, which is based on hidden data. As more publication and citation data become available to the public through services like PubMed, PubMed Central, and Google Scholar®, we hope that people will begin to develop their own metrics for assessing scientific quality rather than rely on an ill-defined and manifestly unscientific number.

Update.  Also see Stevan Harnad's comments.  Excerpt:

Rossner et al. are quite right, and the optimal, inevitable solution is at hand:

  1. All research institutions and research funders will mandate that all research journal articles published by their staff must be self-archived in their Open Access (OA) Institutional Repository.
  2. This will allow scientometric search engines such as Citebase (and others) to harvest their metadata, including their reference lists, and to calculate open, transparent research impact metrics....

Update. Also see Thomson's response to the editorial.

Update (1/10/08). Rossner, Van Epps, and Hill have written a second editorial in response to Thomson's response.