Monday, July 18, 2005

Open XML architecture for chemical data

Peter S. Murray-Rust, John B.O. Mitchell, and Henry S. Rzepa, Communication and re-use of chemical information in bioscience, BMC Bioinformatics, July 18, 2005.
Abstract: The current methods of publishing chemical information in bioscience articles are analysed. Using 3 papers as use-cases, it is shown that conventional methods using human procedures, including cut-and-paste, are time-consuming and introduce errors. The meaning of chemical terms and the identity of compounds are often ambiguous. Valuable experimental data such as spectra and computational results are almost always omitted. We describe an Open XML architecture as proof-of-concept which addresses these concerns. Compounds are identified through explicit connection tables or links to persistent Open resources such as PubChem. It is argued that if publishers adopt these tools and protocols, then the quality and quantity of chemical information available to bioscientists will increase and the authors, publishers and readers will find the process cost-effective.
Also see the BMC press release on this article. Excerpt: 'A commentary article published today in the Open Access journal BMC Bioinformatics argues that it is time chemistry followed in the footsteps of bioinformatics and structural biology and moved towards the creation of an open semantic web facilitating access to chemical information. In the article, Peter Murray-Rust, from the University of Cambridge, UK, and John Mitchell and Henry Rzepa from Imperial College London, UK, argue using three case studies that conventional methods such as cutting-and-pasting chemical information are time-consuming and introduce errors. The authors argue in favour of an open XML architecture linking to connection tables or open databases such as PubChem, to identify chemical compounds mentioned in the biomedical literature. This comes as additional support for open chemical databases like the NIH's PubChem, which is currently at the centre of a legal battle between the NIH and the American Chemical Society (ACS). The ACS runs the very lucrative Chemical Abstracts Service and is directly threatened by public databases.'