Thursday, May 15, 2008

Comments on WiChempedia and Chempedia

Peter Murray-Rust, Chemical compounds in Wikipedia, petermr’s blog, May 15, 2008.

... Recently two derivative works of [Wikipedia] compounds were announced: [WiChempedia and Building Chempedia].

This post is primarily to welcome these developments and add some general comments.

  • The style of the two sites is different and they appear to be completely independent. They are somewhat complementary ...
  • I think both sites use the WP title and URL as the primary identifier in WP. WP also has a set of numeric identifiers which I think represents the internal WP uniquification system. This may matter at some time as WP entries can be deleted or moved while the identifiers are sacrosanct.
  • Both sites have a search capability (I have not compared them). I may have missed it but there was no clear way to download results.
  • It is not clear what the ingestion strategy is for either site. ...
  • I am not clear what data transformation (if any) is carried out automatically by the ingest process. ... An ingestion program either has to deal with all lexical variants (quite a problem) or simply ingest the string. ... Scientific units are not always easy to extract.
  • Does either site have an RSS feed for new entries? ...
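
To make the lexical-variant point above concrete, here is a minimal sketch of what one ingestion step might look like. It is not taken from either site or from the quoted post; the function name `parse_temperature` and the regular expression are hypothetical, and the sketch assumes a property value arrives as a free-text string such as a melting point. It normalises a few common spellings and otherwise returns nothing, so the caller can simply keep the raw string.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a signed decimal number, an optional degree marker,
# then a one-letter temperature unit. Real infobox data has far more variants.
_TEMP = re.compile(
    r"(?P<value>[-+\u2212]?\d+(?:\.\d+)?)\s*(?:\u00b0|\u00ba|deg\.?)?\s*(?P<unit>[CFK])\b",
    re.IGNORECASE,
)


def parse_temperature(raw: str) -> Optional[Tuple[float, str]]:
    """Return (value, unit) for the first temperature found, or None."""
    match = _TEMP.search(raw)
    if match is None:
        return None  # unrecognised variant: keep the raw string as-is
    # Replace the typographic minus sign (U+2212) with an ASCII hyphen.
    value = float(match.group("value").replace("\u2212", "-"))
    return value, match.group("unit").upper()
```

Even this narrow case needs special handling for the typographic minus sign and for several ways of writing "degrees", which suggests why ingesting the string unchanged is the tempting alternative.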

Our own work on collections of common compounds using RDF is progressing well though it has been technically harder than we thought mainly due to variability in data input. ... We shall, of course, make our results freely and Openly available, modulo the difficult issues which have been raised about data sharing and re-use.