Open Access News

News from the open access movement

Wednesday, June 06, 2007

Carl Lagoze and Herbert Van de Sompel, Compound Information Objects: The OAI-ORE Perspective, Open Archives Initiative, May 28, 2007.

Compound information objects are aggregations of distinct information units that when combined form a logical whole. Some examples of these are a digitized book that is an aggregation of chapters, where each chapter is an aggregation of scanned pages; a music album that is the aggregation of several audio tracks; an image object that is the aggregation of a high quality master, a medium quality derivative and a low quality thumbnail; a scholarly publication that is an aggregation of text and supporting materials such as datasets, software tools, and video recordings of an experiment....
Several information systems, such as repository and content management systems, provide architectural support for storage of, identification of, and access to compound objects and their aggregated information units, or components....In most systems, the components of an object may vary according to semantic type (article, book, video, dataset, etc.), media type (text, image, audio, video, mixed), and media format (PDF, XML, MP3, etc.). Depending on the system, components can themselves be compound objects, allowing recursive containment of compound objects. Also, components may vary in network location....
Unfortunately, the manner in which information systems publish compound objects to the web is frequently less-than-perfect and, without commonly accepted standards, ad hoc. In many cases, advanced functionality provided by individual information systems is lost when publishing compound objects to the web. Frequently the exposure to the web is targeted towards human users rather than machine agents. The structure of the compound object is embedded in "splash" pages, user interface "widgets" and the like. This approach can leave the essential structure of compound objects opaque to machine-based applications such as crawlers, search engines, and networked desktop applications....
The absence of these standards affects the functionality of a number of existing and possible web services and applications. Crawler-based search engines might be more useful if the granularity of their result sets corresponded to compound objects (a book or chapter, in this example) rather than individual resources (single pages). The ranking algorithms of these search engines might improve if the links among the components of a compound object were treated differently than links to the object as a whole, or if the number of in-links to the various component resources was accumulated to the level of the compound object instead of counted separately. Citation analysis systems would also benefit from a mechanism for citing the compound object itself, rather than arbitrary parts of the object....
A core goal of OAI-ORE (Object Reuse and Exchange) is to develop standardized, interoperable, and machine-readable mechanisms to express compound object information on the web. The OAI-ORE standards will make it possible for web clients and applications to reconstruct the logical boundaries of compound objects, the relationships among their internal components, and their relationships to the other resources in the web information space. This will provide the foundation for the development of value-adding services for analysis, reuse, and re-composition of compound objects, especially in the areas of e-Science, e-Scholarship, and scholarly communication, which are the target applications of ORE....
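To make the excerpt's ideas of recursive containment and of accumulating in-links at the level of the compound object concrete, here is a minimal Python sketch. It is not the OAI-ORE data model or any real repository API: the `Component` class, the `aggregate_inlinks` function, the example.org URIs, and the link counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Component:
    """Hypothetical unit of a compound object; it may itself contain parts."""
    uri: str
    semantic_type: str = ""
    media_format: str = ""
    parts: List["Component"] = field(default_factory=list)

    def walk(self) -> Iterator["Component"]:
        """Yield this component, then every nested component, depth-first."""
        yield self
        for part in self.parts:
            yield from part.walk()


def aggregate_inlinks(obj: Component, inlinks: Dict[str, int]) -> int:
    """Sum in-link counts over the object and all of its nested components."""
    return sum(inlinks.get(c.uri, 0) for c in obj.walk())


# A digitized book: chapters aggregate scanned pages (illustrative URIs).
chapter1 = Component("http://example.org/book/ch1", "chapter", parts=[
    Component("http://example.org/book/ch1/p1", "page", "image/jpeg"),
    Component("http://example.org/book/ch1/p2", "page", "image/jpeg"),
])
chapter2 = Component("http://example.org/book/ch2", "chapter", parts=[
    Component("http://example.org/book/ch2/p1", "page", "image/jpeg"),
])
book = Component("http://example.org/book", "book", parts=[chapter1, chapter2])

# Made-up per-resource in-link counts, as a crawler might record them.
inlinks = {
    "http://example.org/book": 3,
    "http://example.org/book/ch1/p1": 5,
    "http://example.org/book/ch1/p2": 2,
    "http://example.org/book/ch2/p1": 1,
    "http://example.org/unrelated": 10,
}

book_total = aggregate_inlinks(book, inlinks)
chapter1_total = aggregate_inlinks(chapter1, inlinks)
```

Because a crawler can only build such a tree if it knows which resources belong together, the sketch assumes the object boundaries are already known; exposing those boundaries in machine-readable form is the gap the excerpt says ORE aims to fill.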

Posted by Peter Suber at 6/06/2007 12:52:00 PM.