Open Access News

News from the open access movement


Thursday, March 16, 2006

Why we need RDF for useful OA to data

Eric K. Neumann, RDF — The Web’s Missing Link, Bio IT World, March 14, 2006. Excerpt:
You’d be hard-pressed to carry out any research project today without using the Web’s linked nature. The Web satisfies two large needs of science: as a resource to large and diverse data sets and as the primary communication system for scientific publishing and searching research discoveries. However, along with its increasing importance in R&D, its simplicity as linked pages based on HTML has also constrained its ability to more intelligently assist scientists in searching, sharing, and annotating data. Using HTML, data can certainly be pointed to via a URL, but its structure depends on externally defined formats. Even the use of XML doesn’t remedy this problem, as witnessed in the long process of defining document type definitions (DTDs): Without developing a parser for a predefined DTD or XML schema, no applications will be able to understand how you represent your data. Counter to the nature of the Web is the practice of defining data in one monolithic structure. Where does the data about a given gene end, and where does the pathway it is involved in begin? At the splice variant form, the modified protein level, or the complex it’s part of? The goal to connect complex information is not being advanced by quibbling over the boundary positions between biological, chemical, and medical objects. Must the parsers be updated each time there is a new innovation in the science? How should we “link in” new data, annotations, and external references? ... RDF is a W3C specification that provides the missing link required to do for data what HTML did for pages. RDF is central to the Semantic Web and is about linking data. It allows people to treat each data element more like a linkable document, which can be linked to any other data element.
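
To make the excerpt's last point concrete, here is a minimal sketch of the idea behind RDF: every statement is a (subject, predicate, object) triple, and every data element is named by a URI so that any other statement can link to it. This is an illustration, not Neumann's example and not a real RDF library. The `TripleStore` class, the `example.org` URIs, and the predicate names are all hypothetical, invented here to mirror the gene, splice variant, protein, and pathway chain the excerpt mentions.

```python
from collections import defaultdict

# Hypothetical namespace and identifiers, invented for illustration only.
EX = "http://example.org/"
GENE = EX + "gene/G1"
VARIANT = EX + "variant/G1-v2"
PROTEIN = EX + "protein/P1"
PATHWAY = EX + "pathway/W1"


class TripleStore:
    """A toy in-memory set of (subject, predicate, object) statements."""

    def __init__(self):
        self.triples = set()
        self._by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Record one statement; no schema has to be declared first."""
        self.triples.add((subject, predicate, obj))
        self._by_subject[subject].add((predicate, obj))

    def objects(self, subject, predicate):
        """Return every object linked from subject by predicate, sorted."""
        pairs = self._by_subject.get(subject, ())
        return sorted(o for (p, o) in pairs if p == predicate)

    def reachable(self, start):
        """Follow links outward from start and return every node reached."""
        seen = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            for _, obj in self._by_subject.get(node, ()):
                stack.append(obj)
        return seen


store = TripleStore()

# The gene, its variant, the protein, and the pathway are separate,
# individually addressable elements rather than one monolithic record.
store.add(GENE, EX + "hasSpliceVariant", VARIANT)
store.add(VARIANT, EX + "encodes", PROTEIN)
store.add(PROTEIN, EX + "participatesIn", PATHWAY)

# "Linking in" a new annotation later is just one more statement;
# nothing that reads the existing triples has to be rewritten.
store.add(GENE, EX + "annotation", "hypothetical curator note")

print(store.objects(GENE, EX + "hasSpliceVariant"))
print(PATHWAY in store.reachable(GENE))
```

Because no element owns the others, the question of where the gene's data ends and the pathway's begins does not need an answer: a program simply follows the links as far as it needs. A real application would use an RDF toolkit and shared vocabularies rather than this toy store.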