Open Access News

News from the open access movement


Wednesday, October 22, 2008

Data handling in different repository software

Dorothea Salo, Content, presentation, and behavior, Caveat Lector, October 20, 2008.

... DSpace and EPrints make certain assumptions about the files they take in. Key for our purposes is that they assume that all they have to do to mediate between a file and its end-user is serve it up in response to a request. Ask, give, end of story. ...

For a dataset, this ask-and-give assumption is pure disaster. Hardly anybody wants a whole dataset boiled down into a single file. Hardly anybody creates a dataset that way. Sure, they’ll tell you they just have the one spreadsheet, but that doesn’t count the data dictionary and the lab notebooks and the field notes and the et cetera. What’s more, datasets don’t want to be treated as unitary objects; ask-and-fetch just doesn’t work. Query, slice-and-dice, facet, analyze, number-crunch, mash up—that’s what people want to do with a dataset. They want it to have an API.

And all DSpace and EPrints can do is say “durrr, here’s a file.” ...

Les Carr, Data Access in Repositories - Don't Overlook What We Already Have!, RepositoryMan, October 21, 2008.

Dorothea Salo's latest blog entry takes EPrints and DSpace to task for not being able to help users analyse (query, slice-and-dice, facet, analyse, number-crunch, mash-up) data files.

You can already do that, at least you can in Microsoft Excel anyway. As an example, I chose a data file that is already in the MINDS repository (DSpace) and one that is in my school repository (EPrints) and created a new spreadsheet on my desktop that referenced data ranges in both of the archived data sets. ...

It is an interesting issue, to think what the data-oriented functions are that a repository can provide. However, we should not overlook the functions that we already have! ...
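
To make the contrast concrete, here is a minimal sketch of the client-side approach Carr describes: two files are taken whole from two repositories (the "ask, give" model Salo criticizes), and all of the querying and combining happens on the user's own machine. It is only an illustration under stated assumptions. The file contents, column names, and the helpers `load_rows`, `select`, and `join_on` are hypothetical and do not come from either post or from DSpace or EPrints.

```python
import csv
import io

# Hypothetical stand-ins for two data files, each served whole by a repository.
# In practice each string would be the body of an HTTP response for one file.
DSPACE_FILE = "site,year,rainfall_mm\nA,2006,810\nA,2007,765\nB,2006,640\n"
EPRINTS_FILE = "site,year,yield_t\nA,2006,3.1\nA,2007,2.8\nB,2006,2.2\n"


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def select(rows, **criteria):
    """Keep only the rows whose columns equal the given values (a simple query)."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]


def join_on(left, right, keys):
    """Combine rows from two datasets that share the same values for `keys`."""
    index = {tuple(r[k] for k in keys): r for r in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            joined.append({**row, **match})
    return joined


rainfall = load_rows(DSPACE_FILE)
yields = load_rows(EPRINTS_FILE)

# The slicing and mashing up happen here, on the client, after both
# files have been downloaded in full.
site_a = select(join_on(rainfall, yields, ["site", "year"]), site="A")
for row in site_a:
    print(row["year"], row["rainfall_mm"], row["yield_t"])
```

In this sketch the repositories only ever hand over complete files. A repository-side API of the kind Salo asks for would instead let the user request just the rows for one site without downloading everything first.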