Open Access News

News from the open access movement


Friday, January 23, 2009

When depositing articles in a repository, include metadata about their cited references

Stevan Harnad, The fundamental importance of capturing cited-reference metadata in Institutional Repository deposits, Open Access Archivangelism, January 22, 2009. Excerpt:

On 22-Jan-09...Francis Jayakanth wrote on the eprints-tech list:

"Till recently, we used to include references for all the uploads that are happening into our repository....Our experience has been that when the references are copied and pasted...from the PDF file, invariably non-ascii characters found in almost every reference. Correcting the non-ascii characters takes considerable amount of time. Also, as to be expected, the references from difference publishers are in different styles, which may not make reference linking straight forward. Both these factors forced us take a decision to do away with uploading of references, henceforth...."

The items in an article's reference list are among the most important of metadata, second only to the equivalent information about the article itself....If each Institutional Repository (IR) has those canonical metadata for every one of its deposited articles as well as for every article cited by every one of its deposited articles, that creates the glue for distributed reference interlinking and metric analysis of the entire distributed OA corpus webwide, as well as a means of triangulating institutional affiliations and even name disambiguation.

Yes, there are some technical problems to be solved in order to capture all references, such as they are, filtering out noise, but those technical problems are well worth solving (and sharing the solution) for the great benefits they will bestow....
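One part of the noise problem, the stray non-ASCII characters that turn up when references are pasted from a PDF, can probably be automated to a fair degree. The Python sketch below is only an illustration under stated assumptions, not the method used by any repository or tool mentioned in this post. The function names `clean_reference` and `match_key` and the `PDF_PUNCTUATION` table are hypothetical, and the table covers only a few common cases. It tidies a pasted reference and derives a crude, style-insensitive key that could help compare references formatted in different publisher styles.

```python
import re
import unicodedata

# Hypothetical table of punctuation that often appears in text copied from PDFs.
# It is deliberately small; a real deployment would likely need more entries.
PDF_PUNCTUATION = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u00ad": "",                   # soft hyphen
}


def clean_reference(raw: str) -> str:
    """Tidy a pasted reference while keeping legitimate accented letters."""
    # NFKC folds compatibility characters (ligatures such as U+FB01,
    # no-break spaces) into their ordinary equivalents.
    text = unicodedata.normalize("NFKC", raw)
    text = text.translate(str.maketrans(PDF_PUNCTUATION))
    # Collapse runs of whitespace, including line breaks left by the PDF layout.
    return re.sub(r"\s+", " ", text).strip()


def match_key(reference: str) -> str:
    """Build a rough comparison key: no accents, no punctuation, lowercase."""
    decomposed = unicodedata.normalize("NFKD", clean_reference(reference))
    # Dropping non-ASCII bytes after NFKD removes the combining accent marks.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", " ", ascii_only.lower()).strip()
```

The cleaned string is what one would store and display, since it keeps accented author names intact; the key is meant only for matching. Two references to the same work in different citation styles will still differ in field order and abbreviations, so a key like this is at best a first filter before proper reference parsing.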

(Roman Chyla has replied to eprints-tech with one potential solution:

"The technical solution has been there for quite some time, look at citeseer where all the references are extracted automatically (the code of the citeseer, the old version, was available upon request - I don't know if that is the case now, but it was in the past). That would be the right way to go, imo....")