Open Access News

News from the open access movement


Sunday, January 20, 2008

Addressing the limitations on online data access

Rick Luce, Learning from E-Databases in an E-Data World, Educause Review, February 2008. Excerpt:

...Massive amounts of data produced on a daily basis require more-sophisticated management solutions than are available in today’s database environments; the use of the Internet as an enabling infrastructure for scientific exchange has created new demands for data accessibility as well....The limitations of the current database environment will be increasingly magnified in an era of e-Research and e-Science....

Imagine trying to support collaborative e-Science projects without large-scale, automated data processing. In an era when we’d like the data to speak to other data, a large number of scientific databases aren’t equipped with programming interfaces enabling software developers to query those databases from within their own programs and systems.

Public access to these interfaces is rarely provided....

Self-described XML files that could be readily harvested would solve many of these problems....

Financial and political issues drive the most controversial dimension, that of ubiquitous access to data and databases. It seems obvious that free access for all to scientific data and databases would be beneficial, but the reality is more complex. Data curation with highly qualified staff is costly, and as a result, sustainability and financial issues arise. Most funding agencies do not provide long-term support for data curation, so alternative funding models are required. Depending on the funding model selected, different trade-offs result.

Some important databases are cost prohibitive and not widely available (e.g., Chemical Abstracts). Others are freely accessible through a web interface, although downloading is not permitted. Some providers block requests from entire domains when they suspect someone is attempting to “steal” data using automated data parsing from a web interface.

Licensing conditions of “free” licenses may impose considerable obstacles—for example, when database providers demand that the origin of the data be transparent to the user. Another licensing problem is data redistribution, which may not be permitted. The newest wrinkle is the demand that any publication making use of the database in any way must grant coauthorship to the database. Clearly, a universal legal framework for database interoperability is overdue....
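
To make the excerpt's point about "self-described XML files that could be readily harvested" concrete, here is a minimal sketch in Python. Everything specific in it is an assumption made for illustration: the element names (`dataset`, `record`, `field`), the `name`, `unit`, and `type` attributes, and the `harvest` function are hypothetical and do not come from Luce's article or from any real database. The idea is only that when each value carries its own name, unit, and type, a harvesting program can read the file without a separate, database-specific interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical self-describing record: every field states its own name,
# unit, and type, so no out-of-band schema is needed to interpret it.
SAMPLE = """<dataset name="example-measurements">
  <record id="r1">
    <field name="temperature" unit="K" type="float">293.15</field>
    <field name="sample" type="string">A-17</field>
  </record>
  <record id="r2">
    <field name="temperature" unit="K" type="float">301.4</field>
    <field name="sample" type="string">B-02</field>
  </record>
</dataset>"""

# Map the (assumed) type labels to Python conversion functions.
CASTS = {"float": float, "int": int, "string": str}


def harvest(xml_text):
    """Return one dict per <record>, keyed by each field's declared name."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter("record"):
        row = {"id": record.get("id")}
        for field in record.iter("field"):
            # Unknown or missing type labels fall back to plain strings.
            cast = CASTS.get(field.get("type", "string"), str)
            row[field.get("name")] = {
                "value": cast(field.text.strip()),
                "unit": field.get("unit"),  # None when no unit is declared
            }
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in harvest(SAMPLE):
        print(row)
```

A real harvester would fetch such files over the network and would have to respect each provider's licensing and redistribution terms, which the excerpt identifies as a separate obstacle.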