Open Access News

News from the open access movement


Monday, April 14, 2008

How to text-mine PubMedCentral

The ChemSpider blog has a post, dated April 6, responding to Peter Murray-Rust's question about whether one can conduct automated information extraction from PubMed Central.

Having blogged on this before, I think it important to emphasise that you CAN spider PubMed Central. They even have their own utilities designed specifically for the mass downloading of articles in the form of an OAI feed. What you cannot do is spider the article URLs directly (you must use the XML) because this is forbidden in robots.txt and you will be blocked on this basis.

PubMed Central is one of the most innovative and open chemistry resources on the web with fantastic metadata and article retrieval tool sets designed to facilitate (not prevent) the spread of chemical information at no cost.
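
For readers who want to try the OAI route the ChemSpider post describes, here is a minimal Python sketch of a harvester built on the standard OAI-PMH `ListRecords` verb. It is an illustration under stated assumptions, not PMC's own tooling: the `BASE_URL` value is an assumed endpoint that should be checked against PMC's current documentation, the function names are made up for this example, and `oai_dc` is used as the metadata format only because every OAI-PMH repository is required to support it (PMC documents other prefixes for article XML).

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# ASSUMPTION: placeholder endpoint; confirm the current address in PMC's docs.
BASE_URL = "https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # Per OAI-PMH, a resumptionToken must be the only other argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        header.findtext(OAI_NS + "identifier")
        for header in root.iter(OAI_NS + "header")
    ]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    has_token = token_el is not None and token_el.text
    return identifiers, (token_el.text.strip() if has_token else None)


def harvest(base_url=BASE_URL, max_pages=1, delay_seconds=1.0):
    """Yield record identifiers, following resumption tokens politely."""
    token = None
    for _ in range(max_pages):
        url = list_records_url(base_url, resumption_token=token)
        with urllib.request.urlopen(url) as response:
            identifiers, token = parse_page(response.read())
        yield from identifiers
        if not token:
            break
        time.sleep(delay_seconds)  # pause between pages to limit server load
```

Calling `harvest(max_pages=2)` would request at most two pages of records from the feed. The point of the design is that every request goes to the OAI endpoint and returns XML, so nothing here touches the article URLs that robots.txt puts off limits.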