Open Access News

News from the open access movement

Saturday, August 02, 2008

This table is to accompany an article in the August issue of SOAN, which I just mailed. But I hope it will also be useful in its own right. (SOAN uses plain text and doesn't support tables.)

	Gratis OA (removing price barriers)	Libre OA (removing both price and permission barriers)
Green OA (through repositories)	1	2
Gold OA (through journals)	3	4

Some observations:

In April 2008, Stevan Harnad and I proposed some terms to describe two kinds of free online access: the kind which removes price barriers alone and the kind which removes price barriers and at least some permission barriers as well. The distinction is fundamental and widely-recognized, but we saw right away that our terms (weak OA and strong OA) were ill-chosen and we stopped using them. However, all of us who work for OA and talk about OA still need vocabulary to describe this basic distinction. The most neutral and descriptive terms I've been able to find so far are "gratis OA" and "libre OA", and I've decided to use them myself until I find better ones. This choice of terms is personal and provisional. But to make it more effective, I wanted to explain it in public.
"Gratis" and "libre" may not be familiar terms in the domain of scholarly communication and OA. But in the neighboring domain of free and open source software, they exactly express the distinction I have in mind.
The main point of this table is to show that the gratis/libre distinction is not synonymous with the green/gold distinction. The green/gold distinction is about venues. The gratis/libre distinction is about user rights or freedoms.
All four cells of the table are non-empty. Green OA can be gratis or libre, and gold OA can be gratis or libre.
Libre OA includes or presupposes gratis OA. But neither green nor gold OA presupposes the other, although they are entirely compatible and much literature is both.
All four cells can contain peer-reviewed literature. None of these parameters is about bypassing peer review.
Because there are many different permission barriers to remove, there are many different degrees or kinds of libre OA. Gratis OA is just one thing, but libre OA is a range of things.
The BBB definition describes one kind or subset of libre OA. But not all libre OA is BBB OA.
I'm not proposing a change in the BBB definition, and I haven't retreated an inch in my support for it. I'm simply proposing vocabulary to help us talk unambiguously about two species of free online access.

This blog post is just a sketch. For more detail, see the full SOAN article.

Posted by Peter Suber at 8/02/2008 11:31:00 AM.

August SOAN

I just mailed the August issue of the SPARC Open Access Newsletter. This issue takes a close look at two species of free online access and proposes some vocabulary to help us talk about them unambiguously. The round-up section briefly notes 115 OA developments from July.

Labels: Hot

Posted by Peter Suber at 8/02/2008 11:31:00 AM.

Access to Haitian law

Haïti-Archives-Technologie : Numériser deux siècles de législation haïtienne, Haïti Press Network, July 22, 2008. Read it in the original French or Google's English. Thanks to Laurie Taylor, who provides this English summary:

... Digitizing Haitian law is a major project with great significance because like all democratic societies, access to the law and legal information is necessary for the public to be involved in the democratic process. ... The Haitian law digitization project will present a complete inventory of Haitian law from 1804 within a clear and ease to use database so that lawyers and the general public will have equal access to the law.

The first part of the project is focusing on more recent legal documents, with all documents added eventually. This is a wonderful project and the University of Florida Digital Library Center is excited to be able to contribute to it through the Digital Library of the Caribbean. The Digital Library of the Caribbean (dLOC) includes many partners with projects focused on preservation and access for cultural heritage materials and contemporary needs, including many law projects. The Haitian law project embodies the core goals of preservation and access, and the ideal goals of digitization - presenting and preserving the past as it bears on the present through rare cultural materials that relate to current needs and future desires, and presenting them in ways that make them more useful and usable than in their original form. To see actual Haitian law documents, check in the Digital Library of the Caribbean and for more on Haitian law, see "Researching Haitian Law" by Marisol Florén-Romero from FIU (another dLOC partner).

Posted by Gavin Baker at 8/02/2008 12:57:00 AM.

On the IR at the University of Brasília

UNB: Maior acesso à produção científica, Universia Brasil, July 28, 2008. Read it in the original Portuguese or Google's English.

Posted by Gavin Baker at 8/02/2008 12:39:00 AM.

Two articles on OERs

The Community College Consortium for Open Educational Resources blog lists two articles on OERs published in the current issue of the New England Journal of Higher Education. (Thanks to Open Education News.) The articles are:

Judy Baker, "A Culture of Shared Knowledge"
Nicole Allen, "Digital Textbooks: A Student Perspective"

Comment. I can't find these articles in the journal's table of contents (and neither can a commenter on the CCCOER blog), and the blog doesn't link to self-archived versions of the articles. But the issue does list other articles on OERs, such as "Opening Universities in a Digital Era" by John Palfrey and Urs Gasser, and "The OpenCourseWare Story: New England Roots, Global Reach" by Stephen Carson.

Posted by Gavin Baker at 8/02/2008 12:19:00 AM.

Friday, August 01, 2008

Videos of presentations on campus OA policies

The videos from the SPARC-ACRL Forum on the Harvard open access policy (Anaheim, June 28-29, 2008) are now available. The presenters are:

John Ober, California Digital Library
Stuart M. Shieber, Harvard University
Kevin L. Smith, Duke University
Catherine Candee, University of California

Posted by Gavin Baker at 8/01/2008 11:58:00 PM.

CAUT advises authors to retain copyright

The Canadian Association of University Teachers (CAUT) recommends that scholarly authors retain copyright. (Thanks to Heather Morrison.) From the July 2008 CAUT Intellectual Property Advisory:

The purpose of this advisory is to assist academic staff in retaining copyright ownership in the articles they publish in journals. Without copyright ownership, academic staff can lose control of their own work and may no longer be entitled to email it to students and colleagues, post it on a personal or course web page, place it in an institutional repository, publish it in an open access journal or include it in a subsequent compilation....

The publication agreement between the journal and the author is the key document in ensuring that academic staff can take full advantage of new forms of scholarly communication. These agreements are always negotiable, so it is critical that academic staff read them carefully and, if necessary, amend their terms to ensure that journals receive only the minimum rights that are actually required to publish the work. Typically this is a simple statement of permission to publish, not a full transfer of copyright.

The SPARC Canadian Author Addendum

The Canadian Association of Research Libraries (CARL) and SPARC (the Scholarly Publishing and Academic Resources Coalition) have created an Author Addendum that amends publishing agreements in such a way that authors retain key rights to the journal articles they publish. The Author Addendum is attached to this advisory as Appendix A....

Conclusion

Journals require only your permission to publish an article, not a wholesale transfer of the full copyright interest. To promote scholarly communication, autonomy, integrity and academic freedom, and education and research activities more generally, it is important for academic staff to retain copyright in their journal articles.

Comment

This is good advice. It's not true, however, that retaining copyright is necessary to publish in an OA journal. (Some OA journals let authors retain copyright and some don't; but either way, clearly, you can publish in them by signing their publication agreement.) Nor is retaining rights usually necessary for self-archiving: about two-thirds of TA journals allow postprint archiving without modifying the standard copyright agreement. But it's important to change the custom of routinely giving publishers all rights, and therefore the OA decision. Authors should give publishers only what they need for publishing and retain the rest. This will let them self-archive even when they publish at the ungreen one-third of journals. It will also insure them against publisher decisions to rescind the permission to self-archive or to restrict it by prohibiting deposit in certain repositories, imposing fees or embargoes on self-archiving, or limiting re-use rights.
But even when authors do retain key rights, that only secures permission for OA, not OA itself. CAUT should also recommend that faculty self-archive their peer-reviewed manuscripts and/or submit their work to peer-reviewed OA journals.

Update. Also see Gavin Baker's comment. Excerpt:

...[F]or journals that do ask for exclusive copyright, the problem isn't that the author is giving the journal too many rights....Rather, the problem is that the author isn't keeping enough rights. If we were discussing a tangible object, then the preceding two sentences would be semantically identical, but copyright is an intangible: the author can give away rights and keep them at the same time. This point isn't always made clear.

The ideal approach, then, gives the broadest rights to both the journal and the author. Most important here is the author....Ideally, the author should end up with a set of rights as broad as copyright itself: either copyright itself, or a non-exclusive, royalty-free, irrevocable license to do anything with the work (including to sub-license it)....

Posted by Peter Suber at 8/01/2008 11:51:00 PM.

Interview with Tony Hey

Jon Udell has interviewed Tony Hey, the VP of Microsoft's External Research Division, July 31, 2008. (Thanks to Charles Bailey.) Excerpt from the transcript:

TH: ...Another area of focus is education and scholarly communication. We'll be unveiling plugins for our tools that make them more useful for scientists to do what they want to do.

JU: The NLM add-in for Word is an obvious example. Are there others?

TH: Yes, we'll announce a Creative Commons plug-in. Many people use Word, PowerPoint, and Excel, and are happy to share their documents. We'd like to give them a plug-in that will help them attach Creative Commons licenses to those documents.

We'll also have a research repository. At the university, I was supposed to monitor the output of my faculty -- 200 academics and 500 post-docs and grad students. What we did was insist on keeping a digital copy of not only publications, but also presentations at conferences, research reports, videos, data...

JU: ...especially data. That's a huge new area.

TH: It is in my view, yes....[W]hat is the role of the library? My view is very much the MIT DSPACE view that's been promoted. The role of a research library in a university is to be the guardian of the intellectual output of the university. And that needn't just be research, it can be teaching materials.

So we've used SQL Server, and the Entity Framework -- a bit like the RDF model of Tim Berners-Lee and friends -- to capture some semantic knowledge. So it tells you this is a presentation, Tony Hey gave it, the local organizers were so and so, it was done on this date, and so on.

JU: There's also the general notion of wrapping services around raw data sets. I've talked with Timo Hannay at Nature about how often, nowadays, somebody winds up publishing a paper as a "fig leaf of analysis" to cover what's really the publication of some data set.

TH: Timo and I absolutely agree on this. Research repositories which contain text and also data are going to be increasingly important....

PS: For background, see this week's announcement from Tony Hey's division of a set of free software tools to support scholarly communication and OA. Also see Richard Poynder's December 2006 interview with Tony Hey, which focused on Tony's commitment to OA and how Microsoft could support and advance it.

Posted by Peter Suber at 8/01/2008 11:27:00 AM.

Stanford School of Humanities and Science is considering an OA mandate

SPARC posted some new details (July 16, 2008) on the OA mandate at the Stanford School of Education. Excerpt:

...[John] Willinsky, who joined the faculty at Stanford in September, was prepared for a heated debate, but instead encountered enthusiasm to move ahead. After a one-hour discussion, the motion was approved unanimously at the retreat attended by most of the 50-member faculty.

"It really does signal a change in people's understanding, awareness, and sensitivity to the issue because it was such an easy sell," says Willinsky, a long-time advocate of open access....

Once the Stanford University School of Education faculty approved the policy, it was then sent to the university's general counsel for review. Willinsky consulted closely with the Harvard Law School to craft the policy, author's addendum and assemble a packet of supporting documents for the university. The general counsel gave the go ahead for the policy in late June. The repository is now in place and Willinsky is helping work out the details to implement the open access policy.

The move by the School of Education has triggered interest elsewhere on campus. The School of Humanities and Science has expressed an interest in pursuing an open access policy and Willinsky hopes there will be others at Stanford.

Labels: Hot

Posted by Peter Suber at 8/01/2008 11:02:00 AM.

Another society launches an OA journal

Plant Genome is a new peer-reviewed OA journal from the Crop Science Society of America. The inaugural issue (July 2008) is now online.

Posted by Peter Suber at 8/01/2008 10:07:00 AM.

OAD list of implementation resources for the NIH policy

The Open Access Directory (OAD) list of Implementation resources for the NIH policy is now open for community editing. The list started life as a short article in the April 2008 issue of SOAN.

OAD is a wiki, and you can help keep its lists comprehensive, accurate, and up to date.

Posted by Peter Suber at 8/01/2008 08:56:00 AM.

Thursday, July 31, 2008

SPARC releases "teaser cards" with pro-OA messages

SPARC has released a set of teaser cards as part of its student-oriented The Right to Research campaign. From the description:

Eye-catching and inexpensive to distribute, our new Open Access teaser cards are designed to grab student attention where they roam. Order copies or print your own, tear apart, and place this guerrilla piece strategically around campus - in library carrels, around the coffee shop, or around the department. ...

The teaser cards are 2" x 2" each and come in sets of 6. The six messages are:

Access to scholarly journals can cost as much as a car, every year. Your library can't afford it.
The article you couldn't read might have earned your paper an A+. But you'll never know.
While researching the newest cancer treatments for a family member, you can't get past the abstracts.
Your research will continue after graduation -- the same time your library card expires.
Our taxes funded the research you need. But you can't read it.
The journal you need right now is at the library -- 100 miles away.

On the reverse side, each card includes the message You don't have access, with the URL of The Right to Research Web site. Professionally printed sets of the card are available for purchase from the SPARC site.

Disclosure: I am a paid consultant for SPARC, including work on The Right to Research campaign.

Posted by Gavin Baker at 7/31/2008 11:58:00 PM.

Videos of keynotes from iCommons iSummit

The keynote addresses from the iCommons iSummit (Sapporo, July 29-August 1, 2008) are available in streaming video:

Here's a list of streaming times in GMT:

30 July: GMT 00:00-01:30 and GMT 07:00-08:00
31 July: GMT 01:00-02:00 and GMT 08:00-09:00
1 August: GMT 01:00-02:00 and GMT 05:40-06:20

Posted by Gavin Baker at 7/31/2008 11:54:00 PM.

Fedora 3.0 released

Fedora Commons released version 3.0 of its Fedora repository software on July 30. (Thanks to Charles Bailey.) See the press release for a detailed list of new features.

Posted by Gavin Baker at 7/31/2008 11:32:00 PM.

Dutch university launches temporary OA journal fund

The Delft University of Technology has launched a fund to help faculty pay publication fees at fee-based OA journals. From its announcement (July 1, 2008):

TU Delft Library is supporting the world-wide propagation of knowledge via open access publication. An Open Access incentive fund has been set up to help with the financial side of Open Access publication....

Researchers will be able to apply to the fund for help with financing OA publication costs (OA author's fees) until the end of 2008. All applications for a refund of the author's fee for OA publications in peer-reviewed journals will be honoured until the fund runs dry. Our website provides more information on this subject and an order form for requesting a refund of the author's fee....

From the fund page (English version):

The funding of Open Access publishing is primarily the responsibility of the faculty or research group to which a researcher belongs. If your faculty will not pay the OA author fee, then help is available from a (temporary) Open Access Publishing Fund. Researchers can, at least until the end of 2008, utilise this fund to finance OA publishing costs. For the sake of clarity: this money is intended only for the funding of 'open access publications with peer review' from 1 April 2008 and is not intended for the "normal" publication costs. Applications meeting the defined criteria will be met as long as the fund is not exhausted.

Comment. I applaud this support for OA. But I'll add that any university willing to pay these fees should also be willing to adopt a policy to encourage or require OA archiving for the research output of the institution. The two strategies are compatible and complementary. Delft signed the Berlin Declaration, runs an institutional repository, and has hosted a useful wiki devoted to OA since April 2007. But I don't believe it has yet adopted a strong OA policy for its own research output, for example, as Harvard and 20+ other universities around the world have done.

Posted by Peter Suber at 7/31/2008 05:27:00 PM.

Depositing chemical data in OA institutional repositories

J. Downing, Peter Murray-Rust, and six co-authors, SPECTRa: The Deposition and Validation of Primary Chemistry Research Data in Digital Repositories, Journal of Chemical Information and Modeling, July 29, 2008. The July 29 issue of the journal isn't online yet, so I'm linking to the abstract at PubMed:

Abstract: The SPECTRa (Submission, Preservation and Exposure of Chemistry Teaching and Research Data) project has investigated the practices of chemists in archiving and disseminating primary chemical data from academic research laboratories. To redress the loss of the large amount of data never archived or disseminated, we have developed software for data publication into departmental and institutional Open Access digital repositories (DSpace). Data adhering to standard formats in selected disciplines (crystallography, NMR, computational chemistry) is transformed to XML (CML, Chemical Markup Language) which provides added validation. Context-specific chemical metadata and persistent Handle identifiers are added to enable long-term data reuse. It was found essential to provide an embargo mechanism, and policies for operating this and other processes are presented.

Posted by Peter Suber at 7/31/2008 04:43:00 PM.

More on OA, downloads, and citations

Philip M. Davis and four co-authors, Open access publishing, article downloads, and citations: randomised controlled trial, BMJ, July 31, 2008. Abstract:

Objective. To measure the effect of free access to the scientific literature on article downloads and citations.

Design. Randomised controlled trial.

Setting. 11 journals published by the American Physiological Society.

Participants. 1619 research articles and reviews.

Main outcome measures. Article readership (measured as downloads of full text, PDFs, and abstracts) and number of unique visitors (internet protocol addresses). Citations to articles were gathered from the Institute for Scientific Information after one year.

Interventions. Random assignment on online publication of articles published in 11 scientific journals to open access (treatment) or subscription access (control).

Results. Articles assigned to open access were associated with 89% more full text downloads (95% confidence interval 76% to 103%), 42% more PDF downloads (32% to 52%), and 23% more unique visitors (16% to 30%), but 24% fewer abstract downloads (-29% to -19%) than subscription access articles in the first six months after publication. Open access articles were no more likely to be cited than subscription access articles in the first year after publication. Fifty nine per cent of open access articles (146 of 247) were cited nine to 12 months after publication compared with 63% (859 of 1372) of subscription access articles. Logistic and negative binomial regression analysis of article citation counts confirmed no citation advantage for open access articles.

Conclusions. Open access publishing may reach more readers than subscription access publishing. No evidence was found of a citation advantage for open access articles in the first year after publication. The citation advantage from open access reported widely in the literature may be an artefact of other causes.
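As a quick check on the arithmetic in the abstract, the citation counts it quotes can be turned back into percentages. This is only a sketch that uses the four counts given above; the variable and function names are illustrative and not taken from the paper.

```python
# Counts quoted in the abstract: articles cited nine to 12 months after publication
oa_cited, oa_total = 146, 247        # open access (treatment) arm
sub_cited, sub_total = 859, 1372     # subscription access (control) arm


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


oa_rate = percent(oa_cited, oa_total)
sub_rate = percent(sub_cited, sub_total)

print(f"OA articles cited: {oa_rate:.1f}%")
print(f"Subscription articles cited: {sub_rate:.1f}%")
print(f"Articles in both arms: {oa_total + sub_total}")
```

The two rates round to the 59% and 63% reported in the abstract, and the two arms add up to the 1619 articles listed under Participants.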

The same issue of BMJ contains an editorial by Fiona Godlee on the study, but only the first 1.5 paragraphs are free online for non-subscribers.

Update. Also see Stevan Harnad's comment:

...To show that the OA advantage is an artefact of self-selection bias (or any other factor), you first have to produce the OA advantage and then show that it is eliminated by eliminating self-selection bias (or any other artefact).

This is not what Davis et al did. They simply showed that they could detect no OA advantage one year after publication in their sample. This is not surprising, since most other studies don't detect an OA advantage one year after publication either. It is too early.

To draw any conclusions at all from such a 1-year study, the authors would have had to do a control condition, in which they managed to find a sufficient number of self-selected self-archived OA articles (from the same journals, for the same year) that do show the OA advantage, whereas their randomized OA articles do not. In the absence of that control condition, the finding that no OA advantage is detected in the first year for this particular sample of journals and articles is completely uninformative.

The authors did find a download advantage within the first year, as other studies have found. This early download advantage for OA articles has also been found to be correlated with a citation advantage 18 months or more later. The authors try to argue that this correlation would not hold in their case, but they give no evidence (because they hurried to publish their study, originally intended to run four years, three years too early)....[PS: Omitting 18 specific bullet points.]

Update. Also see Gunther Eysenbach's comment:

Today, Davis et al. have published a paper containing preliminary results from their Open Access RCT....[The paper shows] a significant increase in access and use of Open Access articles compared to non-OA articles....Davis et al. failed to show a citation advantage after 9-12 months, from which they conclude that "the citation advantage from open access reported widely in the literature may be an artifact of other causes." Jumping to these conclusions after only 9-12 months is actually quite outrageous and the fact that the BMJ publishes "negative" results of an ongoing trial before it is even "completed" is deeply disturbing....

[T]o conclude or even imply that any citation advantage is an "artifact" after looking only at citations that occur within the same year of the cited article (9-12 months after publication) is as interesting and valid as doing a RCT on the effectiveness of a painkiller and comparing the pain between control and intervention patients after only one minute, concluding that the painkiller doesn't work if there is no statistically significant difference between the groups after 60 seconds....

Davis says there were only 20 self-archived articles in his total sample, which is a suspiciously low self-archiving rate of only 1.2% (with an unreported contamination rate in his control group), while my PNAS sample had 10.6% of all articles in the control group self-archived. What Davis et al. unfortunately fail to report is when were the searches for self-archiving done? The low self-archiving rate suggests to me that this was perhaps only tested once right after publication, rather than continuously after publication? ...

Update (9/3/08). For more comments pro and con, also see Tracey Caldwell's article in Information World Review, September 3, 2008.

Posted by Peter Suber at 7/31/2008 04:13:00 PM.

More on the OA mandates at Harvard

The Harvard Office of Scholarly Communication has added two new pages on the OA mandate at the Harvard Faculty of Arts and Sciences:

FAQ for Publishers
Harvard Open-Access Policy Summaries (on the FAS and Law School OA mandates, clearly preparing for more to come)

Posted by Peter Suber at 7/31/2008 02:53:00 PM.

Open data at ESOF 2008

Alma Swan has blogged some notes on her open-data session at the EuroScience Open Forum 2008 (Barcelona, July 18-22, 2008). Excerpt:

...I organised a session on open research data. The session reflected three perspectives - those of a researcher, a science publisher and a research funder.

Representing research, Peter Murray-Rust spoke about the ways in which data contained within the body of scientific articles can be mined and mashed by clever software (some of it developed by his doctoral students) to create new understandings and knowledge. He thanked the publishers who permit this and help to make it possible, but not all of them do. Peter spoke not from slides but using a series of web pages to illustrate his points, so the most useful link to his material is his recent article on the topic of Open Data in Nature Precedings.

Philip Campbell, editor-in-chief of Nature, gave a publisher's perspective on data, emphasising that Nature aims to assist the sharing of data wherever possible. He explained Nature's considerable efforts to help the development of Open Data over the last four years and gave examples of how Nature editors deal with scientists who do not comply with Nature's requirement for them to make supporting data openly available when they submit their articles. Philip also touched on the logistic and technical issues that publishers have to deal with, some of which are challenging.

Finally, Max Voegler from the German research funder DFG (Deutsche Forschungsgemeinschaft) gave a funder's view on data. He explained why the DFG thinks sharing data is important, and covered issues such as ownership of data, giving due recognition for data and what long-term views on data must take into account. The last topic here involves the issues of funding, where data will be collected and who will be responsible for looking after them. A sustainable future for data - and who knows when a particular dataset might be required again? - is not a simple matter and funders need to think and plan carefully to ensure that the best systems are in place to ensure data are curated and archived optimally.

Posted by Peter Suber at 7/31/2008 12:41:00 PM.

Notes on Dorothea Salo in Edinburgh

Stuart Macdonald has blogged some notes on Dorothea Salo's keynote address at The Repository Fringe (Edinburgh, July 31 - August 1, 2008). Excerpt:

Dorothea Salo's keynote speech provided a controversial overview of why IRs are dead and need to be reinvented!

She claims "we built it but they didn't come!" and that we (as in the repository community) ignored or didn't quite comprehend the world that academics are immersed in - 'their narrow field of battle'. Or their 'paranoid' (her words not mine) and legitimate questions such as plagiarism of their academic works, is this the institution acting as big brother?, what is authoritative version?, will my publisher be happy? etc.

So, as she states - it is not as simple to say - yes, lets have open access. For example, the software platform that repositories are based on don't have download statistics nor versioning; they won't let you edit your metadata, there are no facilities to digitise analogue material, can't stream videos etc etc. She painfully admits "I helped to kill the IR - it is dead! - so lets mourn the death of IR".

However she sees the shape of opportunity in the ashes. Repository software made the same bad assumptions as we did; workflows that don't work for born digital materials; protocols that don't do enough; there's services that could/should be offered but aren't; there's a stunning amounts of redundant effort aimed at redressing these problems. What we should be doing is putting effort into better software and better services before the web whizzes past us as we try to catch up! Currently the 'Institutional Repository' is not mashable....

She asks us, the repository community to 'take one step back - then two steps beyond' - beyond the idealism and the 'green OA'. Our experience is now telling us that peer-reviewed research is not all we care about, that useful research products happen long before publication and as such open access is a by-product not an end product. She wants us to look beyond the silos of digital resources and do a good job with the 'stuff', not to be too obsessed by where it goes, to be profligate with our 'stuff' - mash it up, expose it, manage it, mainstream it - no matter where it eventually ends up.

She highlighted the fact that self archiving doesn't have a management component, she's 'tired at watching good code fly past' i.e open utilities that could be utilised within the repository environment but aren't....

For example, regarding harvesting - the content is out there - just have to get our hands on it; lets have more APIs, allow programmers to be more flexible, lets learn from and invest in relations with commercial services and disciplinary repositories....

Posted by Peter Suber at 7/31/2008 12:38:00 PM.

How to hide OA content from search engines

The experienced folks at SHERPA have compiled a list of Ways to snatch Defeat from the Jaws of Victory. (Thanks to Peter Millington.) Excerpt:

You may have set up your repository and filled it with interesting papers, but it is still possible to screw things up technically so that search engines and harvesters cannot index your material. Here are some common gotchas:

Require all visitors to have a username and password

Harvesters and crawlers will be locked out, and a lot of end users will give up and go away. It is reasonable to require a username and password for depositing items, but not for just searching and reading.

Do not have a 'Browse' interface with hyperlinks between pages

Search engine crawlers will never index past your first page. Button-style controls cannot normally be followed.

Set a 'robots.txt' file and/or use 'robots' meta tags in HTML headers that prevent search engine crawling

Google, Yahoo!, etc., may find your pages, but if you tell them not to index them or to follow the links, they won't.

Restrict access to embargoed and/or other (selected) full texts

Search engines and harvesters may index the metadata pages, but not the full texts of the relevant items.

Accept poor quality or restrictive PDF files

Some PDF-making software packages (usually free, cheap, or esoteric) generate poor quality PDF files that sometimes cannot be read properly by harvesting and indexing programs. However, you can still cause problems even with high-end software if you use it to restrict the functionality of the PDF file - e.g. preventing copy-and-paste. It may not be possible to index such files.

Hide your OAI Base URL

If harvesters cannot find your OAI Base URL, they cannot harvest your data. Good places to give the OAI Base URL are on your repository's 'About' page or home page. Also, register it with OpenDOAR and ROAR.

Have awkward URLs

Many harvesters and firewalls will spit out or block:

Numeric URLs - e.g. http://130.226.203.32/

URLs that use 'https:' instead of 'http:'

URLs that include unusual port numbers e.g. :47231

Stick to 'http:' and alphabetical URLs. It should be possible to avoid using port numbers in URLs.
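Several of the gotchas above can be checked mechanically. What follows is only a minimal sketch in Python, not part of the SHERPA list: the function names, the example hosts, and the "ExampleBot" user agent are hypothetical, and treating ports 80 and 443 as the only usual ones is an assumption. It flags the three "awkward URL" features named in the list and uses the standard library's robots.txt parser to test whether a given crawler would be allowed to fetch a page.

```python
import ipaddress
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def awkward_url_reasons(url):
    """Return which 'awkward URL' gotchas from the list apply to url."""
    parts = urlsplit(url)
    reasons = []
    host = parts.hostname or ""
    try:
        # Succeeds only when the host is a bare IP address.
        ipaddress.ip_address(host)
        reasons.append("numeric host")
    except ValueError:
        pass
    if parts.scheme == "https":
        reasons.append("https scheme")
    # Assumption: only the default web ports count as usual.
    if parts.port is not None and parts.port not in (80, 443):
        reasons.append("unusual port")
    return reasons


def crawler_allowed(robots_txt, user_agent, url):
    """Check a robots.txt body (given as a string) for one crawler and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical repository URLs used purely for illustration.
    for candidate in (
        "http://130.226.203.32/",
        "https://repository.example.org:47231/oai",
        "http://repository.example.org/view/123",
    ):
        print(candidate, awkward_url_reasons(candidate))

    blocking_rules = "User-agent: *\nDisallow: /\n"
    print(crawler_allowed(blocking_rules, "ExampleBot",
                          "http://repository.example.org/view/123"))
```

A check like this covers only the URL and robots.txt items. Login walls, missing browse links, and restrictive PDFs still have to be tested by visiting the repository as an anonymous user or crawler would.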

Posted by Peter Suber at 7/31/2008 12:05:00 PM.

Open data policy from the Geochemical Society

Geochemical Society policy on geochemical databases, a policy statement on open data from the Geochemical Society. The policy was adopted November 27, 2007. It may not count as news, but the previously undated document was only dated today. Excerpt:

...Open access to data, especially those collected with public funds, is already mandated by some divisions of funding agencies, and is talked about by politicians. But, regardless of outside pressures, we, as the Geochemical Society, need to consider whether having centralized databases is in the best interest of our profession, i.e. do databases lead to good science? Stated differently, are there examples of studies where the compilation of a large amount of [open] data has resulted in good science and has moved the field forward? There are many....

This policy is intended for observational data (experimental and analytical) that are collected on samples....Data generated in the laboratory merit inclusion into a database after the manuscript in which the data are presented is accepted for publication (normally after peer review)....

[T]he primary role of the databases is to make the data more readily available to the scientist in an open access format. Data mining tools are in development to allow for related datasets to be discovered. Such tools are becoming critical because the amount of data is increasing and multidisciplinary studies are becoming more common....

In order to create a working environment in the geosciences as a whole that allows scientists more comprehensive views of Earth processes, interoperability between widely varying types of datasets will be an essential goal....

The GS has no method of enforcement; the policy the GS adopts can be an advisory only. It is therefore important that the officers of the GS set the example in abiding by the GS database and data publication policy. Enforcement of such a policy lies with funding agencies first and publishers second. The GS can encourage. The GS is prepared to take the lead in having an international committee that acts as a liaison with the funding agencies....

These guidelines are some of the practical consequences of the policy as well as some "best practices".

Databases housing geochemical information should be available to the community at large (open access)

The metadata, which include sample or experiment description as well as analytical results on standards, are as important as the data. It is the metadata that allow comparison with other labs and use of the data by other studies. Published papers should have a consistent location for the metadata such as an appendix....

After final acceptance of a manuscript for publication, any new data that it contains should be submitted for entry into an established database, if an appropriate database exists. This can be enforced by editors and funding agencies.

In order for published data to be recognized and cited as a publication (instead of citation to the database), it is important that it is linked to a single identifier: the publication. Separate digital object identifiers (DOI) for data will erode the importance of the publication....

Posted by Peter Suber at 7/31/2008 11:58:00 AM.

Wednesday, July 30, 2008

Nature launches its manuscript deposit service

The Nature Publishing Group has launched the manuscript deposit service it announced on July 8. From today's press release:

Nature Publishing Group (NPG) today launches the first phase of its Manuscript Deposition Service. The free service will help authors fulfil funder and institutional mandates for public access.

From today, the NPG Manuscript Deposition Service will be available to authors publishing original research articles in Nature and the Nature research journals. NPG expects to be able to announce the availability of the service for many of its society and academic journals, and for the clinical research section of Nature Clinical Practice Cardiovascular Medicine, shortly.

NPG's Manuscript Deposition service will deposit authors' accepted manuscripts with PubMed Central (PMC) and UK PubMed Central (UKPMC). The service is open to authors whose funders have an agreement with PMC or UKPMC to deal with authors' manuscripts from publishers. PubMed Central will accept manuscripts deposited by NPG where the author is funded by or employed by the National Institutes of Health (NIH) or Howard Hughes Medical Institute (HHMI). UK PubMed Central has agreed to accept deposits from NPG from authors funded by any of its Funders Group: Arthritis Research Campaign, Biotechnology and Biological Sciences Research Council, British Heart Foundation, Cancer Research UK, Chief Scientist Office (Scotland), Department of Health, Medical Research Council, and the Wellcome Trust....

NPG hopes to extend the service to other archives and repositories in the future, to help more of its authors comply with institutional and funder mandates. This will include institutional repositories. NPG is proud to be the first commercial subscription publisher to announce a commitment to deposit in institutional repositories, and is currently in the early stages of the institutional repositories phase of this project.

NPG's License to Publish encourages authors of original research articles to self-archive the accepted version of their manuscript in PubMed Central or other appropriate funding body's archive, their institution's repositories and, if they wish, on their personal websites. In all cases, the manuscript can be made publicly accessible six months after publication. NPG does not require authors of original research articles to transfer copyright. NPG's policies are explained in detail [here].

PS: See my comments on the first announcement of the service earlier this month.

Update (9/18/08). Also see the editorial, Open access archiving, in Nature Cell Biology, September 2008 (accessible only to subscribers).

Posted by Peter Suber at 7/30/2008 10:50:00 PM.

Spanish column on OA

Urbano Fra Paleo, Ciencia y acceso al conocimiento, El País, June 25, 2008. (Thanks to kaosenlared.net.) Read it in the original Spanish or Google's English.

Posted by Gavin Baker at 7/30/2008 05:06:00 PM.

OA to veterinary science from Spain, Portugal, and Latin America

ReviVec: Red y Portal Iberoamericano de Revistas Científicas de Veterinaria de Libre Acceso [Ibero-American Network and Portal of Open Access Veterinary Scientific Journals] offers access to journals from Spain, Portugal, and Latin America. The site opened in 2008. (Thanks to Accesso.com.)

Posted by Gavin Baker at 7/30/2008 03:26:00 PM.

Proposal for research network on open learning

The Open University and Carnegie Mellon University have posted a proposal for a research network on the design and use of open educational resources. (Thanks to OERderves.)

Posted by Gavin Baker at 7/30/2008 02:24:00 PM.

Will France's HAL deter university OA mandates?

Stevan Harnad, 50th Green OA Self-Archiving Mandate Worldwide: France's ANR/SHS, Open Access Archivangelism, July 29, 2008. Excerpt:

...[The] Green OA self-archiving mandate [at the ANR/SHS is] France's first funder mandate, its second mandate, and the world's 50th....

Note that the situation in France with central repositories is very different from the case of NIH's PMC repository: France's HAL is a national central repository where (in principle) (1) all French research output --from every field, and every institution-- can be deposited and (again, in principle) (2) every French institution (or department or funder) can have its own interface and "look" in HAL, a "virtual" Institutional Repository (IR), saving it the necessity of creating an IR of its own if it does not feel it needs to.

The crucial underlying question -- and several OA advocates in France are raising the question, notably Hélène Bosc -- is whether the probability of adopting institutional OA mandates in France is increased or decreased by the HAL option: Are universities more inclined to adopt a window on HAL, and to mandate central deposit of all their institutional research output, or would they be more inclined to mandate deposit in their own autonomous university IRs, which they manage and control?

Again, the SWORD protocol for automatic import and export between IRs and CRs is pertinent, because then it doesn't matter which way institutions prefer to do it.

PS: Also see my post yesterday on the new ANR OA mandate.

Posted by Peter Suber at 7/30/2008 10:33:00 AM.

Tuesday, July 29, 2008

Blog notes on librarians and social studies conference

Jane Secker, Supporting researchers in the social sciences, Social Software, libraries & distance learners, July 29, 2008. Blog notes on Supporting Researchers in the Social Sciences (July 24-25, 2008, Belfast).

... Paula Divine spoke about the service funded by the ESRC called ARK (Access, Research and Knowledge). They host some fascinating data, for example CAIN - the Conflict Archive on the Internet - which is material on the troubles in Northern Ireland since 1968. They also have a lot of survey data from Northern Ireland. ....

The final speaker of the day was Niamh Brennan from Trinity College Dublin who spoke about open access repositories and the extensive work that is going on in Ireland in this area. The breadth of her talk was quite incredible, starting with the 1847 famine in Ireland and the work of the Statistical and Social Inquiry Society of Ireland to tackle this issue. She brought it right up to date, citing how the journal of this society is now open access. She had some insights such as less than half of NHS funded research reports are available to those who work in the NHS! And how important it is that research gets to policy makers, using an example of how the cure for scurvy (citrus fruit) took over 150 years to become Naval policy. ...

Posted by Gavin Baker at 7/29/2008 06:54:00 PM.

Report on Oxford data repository study

Oxford University has released its report, dated July 25, on Scoping Digital Repository Services for Research Data Management. (Thanks to Charles Bailey.) From the report's conclusions:

The priorities of the project for the next months include the following deliverables: a consultation exercise with support services available in Oxford, the organization of a second workshop and the production of a set of recommendations for digital repository services for research data. ...

See also our previous coverage of the project's plan and blog.

Posted by Gavin Baker at 7/29/2008 06:46:00 PM.

South African presence in WorldWideScience

Eve Gray, Open access repositories begin to reap benefits for South African science as CSIR research goes global, Gray Area, July 29, 2008.

There are interesting signs of an increase in the momentum of change in research communications in South Africa. ...

The latest move has been the announcement in Seoul, Korea of the creation of a global science gateway, WorldWideScience.org ... The good news is that this time there is a good South African presence through the participation of the [Council for Scientific and Industrial Research]'s Research Space repository and the African journals from 24 countries that appear as a result of African Journals Online (AJOL) ...

The CSIR puts South Africa on the map with its participation and its presence on the Executive Board of the [WorldWideScience] Alliance, while the 24 African countries that have journals in the AJOL service give Africa a much stronger presence than it would have otherwise. ...

Posted by Gavin Baker at 7/29/2008 06:19:00 PM.

More on access to space data

Joanne Irene Gabrynowicz, The Law Behind the NOAA Open Letter to Google Lunar X PRIZE Participants, Res Communis, July 28, 2008.

Res Communis received 10,000+ hits for its post, NOAA Open Letter to Google Lunar X PRIZE Participants. There is enormous interest in the fact that, as the letter says, "if your [X Prize] team is based wholly or partially in the USA, you may need to apply for a license from the National Oceanic and Atmospheric Administration (NOAA)." Along with the hits, Res Communis received numerous comments about why this is the case. ...

The primary reasons behind the law are to advance the principle of open access to data by implementing the nondiscriminatory access policy ...

The nondiscriminatory access policy began in 1972 with the launch of the Earth Resources Technology Satellite, later renamed Landsat 1. The policy was formulated to ensure open access to sensed data and to assuage the concerns of the rest of the world that the satellite would be used against them in the form of economic or other espionage. Not all nations agreed that openness of information was a good idea and others feared the satellites being used against them. The nondiscriminatory access policy stated that access to imagery would be available to all, on a nondiscriminatory basis, and any nation could directly download the data, if they also implemented the nondiscriminatory access policy. Canada was the first to do so, followed by numerous nations since then.

Over the years, the policy evolved and has been adopted by all remote sensing nations and is, arguably, the most important part of the U.N. The Principles Relating to Remote Sensing of the Earth from Outer Space. The nondiscriminatory access policy still applies to the Landsat satellites and a modified version can apply to non-federal, civil satellites. ...

Comment. I'm unfamiliar with this area of policy, but the description here seems to contrast with the new repository of data from Indian space exploration, which will offer 18 months of discriminatory access limited to Indian researchers before opening to use by other nationals.

Posted by Gavin Baker at 7/29/2008 06:06:00 PM.

DSpace and Fedora to collaborate

DSpace Foundation and Fedora Commons Form Working Collaboration, press release, July 29, 2008.

Today two of the largest providers of open source software for managing and providing access to digital content, the DSpace Foundation and Fedora Commons, announced plans to combine strengths to work on joint initiatives ...

The collaboration is expected to benefit over 500 organizations from around the world who are currently using either DSpace ... or Fedora ... open source software to create repositories for a wide variety of purposes. ...

The decision to collaborate came out of meetings held this spring where members of DSpace and Fedora Commons communities discussed multiple dimensions of cooperation and collaboration between the two organizations. ...

In the spirit of advancing open source software, Fedora Commons and DSpace will look at ways to leverage and incubate ideas, community and culture to:

1. Provide the best technology and services to open source repository framework communities.

2. Evaluate and synchronize, where possible, both organizations' technology roadmaps to enable convergence and interoperability of key architectural components.

3. Demonstrate how the DSpace and Fedora open source repository frameworks offer a unique value proposition compared to proprietary solutions.

The announcement came on the heels of an event sponsored by the Joint Information Systems Committee's (JISC) Common Repository Interface Group (CRIG) held at the Library of Congress. The event, known as "RepoCamp," was a forum where developers gathered to discuss innovative approaches to improving interoperability and web-orientation for digital repositories. ...

Posted by Gavin Baker at 7/29/2008 06:03:00 PM.

Growth in Hindawi journals' Impact Factors

Hindawi, a publisher of OA journals, announced on July 28 that

Hindawi Publishing Corporation has once again seen solid growth in the Impact Factors of its journals, according to the 2007 Journal Citation Report. Hindawi has nine journals that were included in the previous Journal Citation Report, and the average Impact Factors of these journals rose by more than 14%. In addition, five of Hindawi's journals received Impact Factors for the first time this year. ...

In addition to the fourteen Hindawi journals that currently have Impact Factors, three more titles are scheduled to receive their first Impact Factor next year, and many more titles are currently under review.

See also our previous coverage of the 2007 IFs for PLoS and BMC journals.

Posted by Gavin Baker at 7/29/2008 05:57:00 PM.

Milestone for Hertfordshire IR

Gill Hall announced on July 29 the deposit of the 2,000th article in the University of Hertfordshire Research Archive.

Posted by Gavin Baker at 7/29/2008 05:50:00 PM.

New OA journal of podiatry

The Journal of Foot and Ankle Research is a new, peer-reviewed OA journal published by BioMed Central. See the July 28 announcement. It's the official journal of the Australasian Podiatry Council and the (UK) Society of Chiropodists and Podiatrists. See the inaugural editorial. Authors retain copyright to their work, and articles are released under the Creative Commons Attribution License.

Posted by Gavin Baker at 7/29/2008 05:34:00 PM.

OA mandate at France's ANR

The Humanities and Social Sciences branch of France's Agence Nationale de la recherche (ANR) has adopted an OA mandate, requiring its grantees to deposit their peer-reviewed manuscripts in HAL-SHS, the humanities and social sciences section of HAL. (Thanks to Stevan Harnad.)

Read the French original (in ROARMAP) or Google's English.

In November 2007, ANR adopted a policy to encourage OA archiving, and the new policy strengthens it by requiring project managers to ensure that it is done.

Labels: Hot

Posted by Peter Suber at 7/29/2008 05:27:00 PM.

NIH Deputy Director on the NIH policy

Norka Ruiz Bravo, Publication of NIH-funded Research in PubMed Central, ASCB Newsletter, July 2008. (ASCB = American Society for Cell Biology.) Ruiz Bravo is the Deputy Director of the NIH Office of Extramural Research. Excerpt:

Public access of peer-reviewed papers resulting from National Institutes of Health (NIH)-supported research is a public good. Recent legislation requiring that all peer-reviewed manuscripts arising from NIH funds be made publicly available on PubMed Central (PMC), NIH's digital archive of full-text, peer-reviewed manuscripts and articles, ensures that these papers are archived and are available freely to the public now and in the future. The policy will allow the NIH to monitor, mine, and develop its portfolio of NIH-funded research more effectively. It will also help NIH-supported research become more prominent, integrated, and accessible. This makes it easier for all scientists to pursue NIH's research priority areas competitively....

The ASCB played an important pioneering role in the development of PMC. Molecular Biology of the Cell, along with PNAS: Proceedings of the National Academy of Sciences, were the sole journals available in PMC when the system first went public in February 2000. Members of ASCB actively supported the creation of the archive and currently serve on the PMC Advisory Committee....

Today, PMC is thriving, having grown into an archive containing over one and a half million research manuscripts and articles that are integrated with other scientific resources, such as PubChem and Genbank....

Posted by Peter Suber at 7/29/2008 03:11:00 PM.

GoogleMaps extension to OpenDOAR

SHERPA has launched a Google Maps extension to OpenDOAR. From today's announcement:

...Just run any search of the directory, and then change the output format from "Summaries" to "Google Map".

Here are a few examples:

Repositories in Japan

Repositories with Spanish language material

United States repositories holding theses & dissertations

Keyword search for "Nottingham"

Repositories using CONTENTdm software

If you click on one of the place markers, a bubble will pop up listing the repositories at that location with links to the repositories themselves and their institutions' home pages. For performance reasons, the amount of information in the bubbles is limited, but bearing this in mind, we would like to know what data you would like to see there....

This development was inspired by Stuart Lewis's Repository66 demonstrator, which mashes up OpenDOAR and ROAR data.

Comment. Just two days ago, the DOAJ added a page of statistics by country. Now it's easier than ever to track the spread of OA journals by country and the spread of OA repositories by country.

Posted by Peter Suber at 7/29/2008 02:59:00 PM.

OA policy at a Boston U research center

Boston University's Superfund Basic Research Program is providing OA to its research results. (Thanks to Donna Wentworth.) Excerpt:

The use of open source wiki software encourages communication and collaboration on research, both externally with the public and internally within a research group. BU SBRP is developing an internal wiki for research collaborations.

RSS (Really Simple Syndication) allows subscribers to automatically see the latest products of research, including new publications, events, and research tools freely accessible through a web site.

Emerging permissionless licensing systems allow researchers to choose the terms under which they want to share their work; these include Creative Commons Licenses and the General Public License.

Finally, open access journals are those which make content available to everyone, without requiring a subscription. With the emergence of web-based publishing, this model can make research more easily available to more researchers in more locations. A list of open access journals can be found at DOAJ.

As the products of research come in many forms, these tools can be used in different ways. Statistical techniques and computer code for modeling environmental exposures and health outcomes can be licensed through permissionless systems, written in open source languages and fully commented, and shared through a wiki. Laboratory methods and synthetic data created to test different techniques can also be shared, updated, modified by individual researchers or collaboratively, and discussed through wiki software. Published articles can be made accessible through Open Access on-line journals.

Our ultimate goal is to provide a compelling model for sharing scientific findings, analytical tools, data, and research methods developed by research programs. With the help of alternative licenses for scientific work and web-based technologies that promote information sharing and collaboration, we are making these research results freely available to anyone who can use them. We hope these new technologies will enable research to be shared in a more open, accessible way, ensuring that it will be used widely and effectively for the general welfare.

Comment. I applaud what the SBRP is doing. But I have lots of questions. Is it putting all its peer-reviewed research articles on its wiki? (Some publishers who have no problem with depositing postprints in repositories do have problems with depositing postprints in wikis.) Either way, does it require this kind of OA archiving? Merely encourage it? Does it have an OAI-compliant OA repository in addition to its wiki? Are these questions (largely) moot because all its articles are published in OA journals? Does it require submission to OA journals? Encourage it? Does it pay processing fees at fee-based OA journals?

Update (7/31/08). Raphael Adamek from the Boston SBRP has answered my questions and allowed me to post his answers. (Thanks, Raphael.)

Our current policy is to strongly encourage our researchers to submit to OA journals. With the development of the NIH Public Access Policy we have also developed a centralized method of submitting articles to PubMed Central for our researchers. While still under development, we hope to create an internal archive that will be OAI-compliant while also fulfilling the requirements of the NIH Public Access Policy. We also pay the processing fees for our researchers to submit publications to fee based OA journals.

Additionally, we have also advocated for other Superfund Basic Research Programs to adopt similar Open Access Policies for their institutions.

Posted by Peter Suber at 7/29/2008 02:17:00 PM.

Librarians can improve Google indexing of repositories

Wouter Gerritsma, Google and the academic Deep Web, Wouter on the Web, July 28, 2008. Comments on Kat Hagedorn and Joshua Santelli, Google Still Not Indexing Hidden Web URLs, D-Lib Magazine, July/August 2008 (blogged here on July 16). Excerpt:

...The article by Hagedorn and Santelli shows convincingly that Google still has not indexed all information that is contained in OAISTER, the second largest archive of open access article information. Only Scientific Commons is more comprehensive. They tested this with the Google Research API using the University Research Program for Google Search. They only checked whether the URL was present. This approach only partially reveals some information on depth of the Academic Deep Web. But those are staggering figures already. But reality bites even more.

A short while ago I taught a Web Search class for colleagues at the University Library at Leiden. For the purpose of demonstrating what the Deep or Invisible Web actually constitutes I used an example from their own repository. It was a thesis on Cannabis from last year and deposited as one huge PDF of 14 MB. Using Google you can find the metadata record. With Google Scholar as well. However, if you try to search for a quite specific sentence on the beginning pages of the actual PDF file Google gives not the sought after thesis....

Interestingly, you are able to find parts of the thesis in Google Scholar, eg chapter 2, chapter 3 etc. But those are the parts of the thesis contained in different chapters that have been published elsewhere in scholarly journals. Unfortunately, none of these parts in Google Scholar refers back to the original thesis that is in Open Access or have been posted as OA journal article pre-prints in the Leiden repository....

Is Google to blame for this incomplete indexing of repositories? Hagedorn and Santelli point the finger to Google indeed. However, John Wilkin, a colleague of them, doesn't agree. Just as Lorcan Dempsey didn't. And neither do I.

[Librarians can do more.]...We have to bring the information out there. Open Access plays an important role in this new task. But that task doesn't stop at making it simply available on the Web.

Making it available is only a first, essential step. Making it rank well is a second, perhaps even more important step. So as librarians we have to become SEO experts. I have mentioned this here before, as well as at my Dutch blog.

So what to do about this chosen example from the Leiden repository. Well there is actually a slew of measures that should be taken. First of course is to divide the complete thesis in parts, at chapter level. Albeit publishers give permission only to publish articles, of which most theses in the beta sciences exists in the Netherlands, when the thesis is published as a whole. On the other hand, nearly 95% of the publishers allow publication of pre-prints and peer reviewed post prints. The so called Romeo green road. So it is up to the repository managers, preferably with the consent from the PhD candidate, to tear up the thesis in its parts (the chapters, which are the pre-print or post-prints of articles) and archive the thesis on chapter level as well. This makes the record for this thesis with a number of links to far more digestible chunks of information better palatable for the search engine spiders and crawlers. The record for the thesis thus contains links to the individual chapters deposited elsewhere in the repository....

Comment. The guide for repository managers Google and I put together in 2005, How to facilitate Google crawling, is now three and a half years old, an epoch in internet time. I wouldn't be surprised if many of the listed suggestions were out of date and many new and valuable suggestions simply not listed. If there are newer or more useful guides for repository managers, not limited to Google crawling, please let me know. I'll blog them here. Or if there are many of them, perhaps we could start a list at OAD.

Update (7/31/08). Here's another guide: Ways to snatch Defeat from the Jaws of Victory from SHERPA.

Posted by Peter Suber at 7/29/2008 01:33:00 PM.

William Patry on the NIH policy and copyright

William Patry, Open Access and the NIH, The Patry Copyright Blog, July 28, 2008. Patry is the Senior Copyright Counsel at Google, and formerly copyright counsel to the U.S. House of Representatives Judiciary Committee.

This excerpt picks up after Patry reviews (1) the 1978 deliberations in Congress on whether to make research by government-funded scientists uncopyrightable, as Congress had already done for government-employed scientists, (2) the NIH OA policy, and (3) the APA's short-lived deposit fee for NIH-funded authors.

...STM publishers are claiming that the... [NIH policy has implications for] copyright law, and are attempting to have the Judiciary Committee intervene on their behalf. The claim that the NIH policy raises copyright issues is absurd. First, the policy does not reach the journal at all; only individual articles. Publishers' investment is thus left untouched entirely. Publishers did not invest a dime in the individual articles, and thus have no investment to complain about. They still have a 12 month window of exclusivity for the articles, which is quite long enough to ensure that their only investment - in the journal - is protected. As reviewed at the beginning of this posting, Congress could have chosen to deny all protection to STM articles funded in whole or in part by the government. It is surprising by taking a less extreme, balanced approach, Congress is now being attacked by those who contributed nothing financially to the creation of the works.

Patry may initially have missed the fact that the NIH policy applies to peer-reviewed manuscripts (as opposed to unrefereed preprints), but a reader pointed that out in the comment section. Patry replies that it doesn't change his conclusion:

...[This argument] would be stronger, IMHO, if (1) the STM publisher paid for the peer review, and (2) actually edited the article. But even then, why do you disregard the money that NIH has sunk in? Why shouldn't you flip the analysis and require the publisher, as a condition of publishing the article and charging for the journal, to reimburse NIH for some of NIH's expenses? STM publishers seem quite exercised over articles they pay nothing for being made available to the public, but apparently have no qualms about making money off of research funded by the public. Their moral outrage and accounting seems curiously unidirectional.

Posted by Peter Suber at 7/29/2008 01:23:00 PM.

Data access and curation in Australia

Margaret Henty and three co-authors, Investigating Data Management Practices in Australian Universities, Australian Partnership for Sustainable Repositories, July 2008.

Universities around the world are experiencing an increasing emphasis on the need for effective data management and stewardship to underpin the changing research environment, as research becomes more dependent on data in digital form and computers and networks proliferate. Data is valuable from the moment of creation, not to mention expensive to collect, so there is no point in duplicating its collection. It might also be unique, representing a snapshot in time or space and therefore impossible to replicate. Data can be re-used, sometimes for purposes not originally dreamt of, and it can be re-analysed, either to check original results or to take advantage of new analytical techniques. There is increasing pressure to ensure that data should not go to waste, and for universities to develop the infrastructure needed to care for this invaluable resource....

[T]hree Australian universities decided to investigate the needs of their own communities. The initiative came from The University of Queensland (UQ) and was taken up by The University of Melbourne (UM) and the Queensland University of Technology (QUT)....All three surveys were run in late 2007....

At a time when researchers are being encouraged to make their data available to others, it is pleasing to see that over three-fifths of respondents are willing to share their data, whether "openly" (8.6%), "via negotiated access" (44.0%), "only after the formal end of a project" (6.4%) or "only some years after the end of a project" (2.3%). In addition to these, a small proportion (0.8%) provides access through the Australian Social Science Data Archive, IATSIS or some other data archive. Some respondents pointed out that, in some cases, it is necessary for data to be made available together with journal publication, and it is likely that this is a trend which will grow.

About two-fifths of the respondents say that their data is never made available, for unexplained reasons (19.0%) or because of privacy or confidentiality issues (17.6%). About one-quarter of this group indicated that they would be willing to make their data available if "an easy mechanism" was available to do so....

The possibility of an easier mechanism to allow data deposit and access was welcomed by some, as in the following comments:

I readily share data with colleagues or students working on same or related project on an informal case by case basis. I would like to have access to an area where I could put data files for access and download by colleagues. Sending large files via email is not really possible, and sending data on CDs is also very time-consuming, especially with large files. I would very much welcome a solution to this problem that doesn't cost an arm and a leg to the researcher or school.

Currently this is achieved through project www sites and some formal international data repositories but having an easy to use infrastructure to deal with this would be excellent and I believe would represent a high value intellectual asset.

I have done this but there are no easy ways of doing it. I would very much like the Uni to offer such a service

A key aspect to [Research Centre] operations are electronic links to other organisations to facilitate customer access, manage data and integrate equipment. A standard framework for such access would facilitate such links.

Access is rather ad-hoc and depends on the instruments used. Would be preferable to have a central repository....

Posted by Peter Suber at 7/29/2008 01:01:00 PM.

More sections in the PALINET OA cluster

Walt Crawford has added three sections to the cluster of pages on OA he's building at the PALINET wiki:

Posted by Peter Suber at 7/29/2008 12:43:00 PM.

Free Microsoft tools for scholarly communication

Microsoft Research Unveils Free Software Tools to Help Scholars and Researchers Share Knowledge, a press release from Microsoft, July 28, 2008. Excerpt:

At the ninth annual Microsoft Research Faculty Summit today...Tony Hey, corporate vice president of Microsoft's External Research Division,...announced a set of free software tools aimed at allowing researchers to seamlessly publish, preserve and share data throughout the entire scholarly communication life cycle....

In the area of scholarly communication, Hey said, "Collecting and analyzing data, authoring, publishing, and preserving information are all essential components of the everyday work of researchers – with collaboration and search and discovery at the heart of the entire process. We're supporting that scholarly communication life cycle with free software tools...."

Microsoft researchers partnered with academia throughout the development of these tools to obtain input on the application of technology to the needs of the academic community, while Microsoft product groups submitted feedback on how the company's technology could optimally address the entire research process. The collective efforts resulted in the first wave of many tools designed to support academics across the scholarly communication life cycle.

The following tools are freely available now:

Add-ins. The Article Authoring Add-in for Word 2007 enables metadata to be captured at the authoring stage to preserve document structure and semantic information throughout the publishing process, which is essential for enabling search, discovery and analysis in subsequent stages of the life cycle. The Creative Commons Add-in for Office 2007 allows authors to embed Creative Commons licenses directly into an Office document (Word, Excel or PowerPoint) by linking to the Creative Commons site via a Web service.

The Microsoft e-Journal Service [alpha version]. This offering provides a hosted, full-service solution that facilitates easy self-publishing of online-only journals to facilitate the availability of conference proceedings and small and medium-sized journals.

Research Output Repository Platform [slides, forum, about]. This platform helps capture and leverage semantic relationships among academic objects – such as papers, lectures, presentations and video – to greatly facilitate access to these items in exciting new ways.

The Research Information Centre [forthcoming]. In close partnership with the British Library, this collaborative workspace will be hosted via Microsoft Office SharePoint Server 2007 and will allow researchers to collaborate throughout the entire research project workflow, from seeking research funding to searching and collecting information, as well as managing data, papers and other research objects throughout the research process....

Comments

This is for real. Don't mistake the Microsoft research division, which doesn't sell anything, for the Microsoft product divisions. Tony Hey believes in open access and open data, and is putting Microsoft resources behind them. For background, see Richard Poynder's interview with Tony Hey (December 2006), and my previous post on the Microsoft repository platform (March 2008).
The new tools are free of charge. The announcement doesn't say they will ever be open source, but Microsoft encourages open-source tools in the open chemistry projects it funds. So it's possible.
The authoring add-in should help publishers (including OA publishers) reduce costs, at least if they want to provide XML, and it should help them decide to use XML. The repository platform and e-journal service are even more direct contributions to OA. I don't know much about the e-journal service, apart from a swarm of great ideas raised at a Microsoft brainstorming meeting in November 2005. And I don't know much about the repository platform except that it will be interoperable, play well with Microsoft tools like SQL Server Express, use semantic processing to create arbitrary relationships between resources, and serve as a back end compatible with DSpace and EPrints front ends. I look forward to user reviews.

Update. Also see Peter Monaghan's story in the Chronicle of Higher Education, July 31, 2008. Excerpt:

For example, the Article Authoring Add-in for Word 2007...allows users to create documents in the widely used format developed by the National Library of Medicine's free digital archive of peer-reviewed biomedical and life-sciences journal literature, PubMed Central. But users will also be able to shape the software to suit other formats because the code for the tool is openly accessible and freely adaptable.

The products, initially aimed at scientists, also seek to make it easier for authors and editors to electronically embed into papers details about the research process and its results, such as bibliographies and key phrases. The goal, Microsoft officials said, is to help readers who conduct searches in electronic databases find relevant articles more easily.

The new tools will enable a more dynamic way of discovering and exploring links within enormous and hard-to-search bodies of research, the officials said.

"We've never before addressed what we could put around Office, Excel, SharePoint, and our other programs to make them more useful for science," said Tony Hey, corporate vice president of Microsoft's external-research division. "For example, Word was not tailored for scientific papers. But we decided to see, Can we make it more useful in that way?" ...

Such developments [OA mandates at funders and universities] have increasingly raised concerns about copyrights and fair reuse of archived materials. So to help authors, publishers, and databases embed information about copyrights and licenses in Microsoft Office documents, the company released another free product, called the Creative Commons Add-in for Office 2007.

Mr. Hey says he believes that Microsoft's business goals and academe's needs are in harmony when it comes to research and publishing. Scholarly institutions will happily pay fees, he said, to have companies like his provide products that relieve universities and their faculty members of tasks like managing large databases. After all, he said, scholars are more interested in doing actual research. Mr. Hey, who directed Britain's national e-Science Programme from 2001 to 2005, said that during recent decades he had seen "generations of research scientists sacrificed to being the computer-science techie for their group."

Labels: Hot

Posted by Peter Suber at 7/29/2008 12:20:00 PM.

Monday, July 28, 2008

OA to improve university rankings

Magda R. Brox, Mejora de 'webs' para escalar posiciones [Improve Web sites to increase rankings], El País, July 4, 2008. Read it in the original Spanish or Google's English.

Comment. The column mentions the rankings by Spain's Consejo Superior de Investigaciones Científicas (CSIC), which we blogged previously.

Posted by Gavin Baker at 7/28/2008 09:10:00 PM.

New partners for Flickr Commons

The George Eastman House and the Bibliothèque de Toulouse have joined Flickr Commons and will provide OA to some of their images there. (Thanks to Boing Boing.)

The Biblioteca de Arte-Fundação Calouste Gulbenkian is also providing access to part of its collection on Flickr, though not as part of Flickr's The Commons project. The images are available under the Creative Commons Attribution-NonCommercial-NoDerivatives license (images in the Commons are in the public domain). (Thanks to Patrick Peccatte.)

Update. See also this article on the Toulouse library's participation from Livres Hebdo (in French). (Thanks to pintiniblog.)

Posted by Gavin Baker at 7/28/2008 08:56:00 PM.

Proposal for open science blog carnival

Plausible Accuracy has two posts proposing a blog carnival on open science:

Posted by Gavin Baker at 7/28/2008 08:41:00 PM.

Cost-benefit analysis of digitizing dissertations

Mary Piorun and Lisa A. Palmer, Digitizing Dissertations for an Institutional Repository: A Process and Cost Analysis, Journal of the Medical Library Association, July 2008. Abstract:

Objective: This paper describes the Lamar Soutter Library's process and costs associated with digitizing 300 doctoral dissertations for a newly implemented institutional repository at the University of Massachusetts Medical School.

Methodology: Project tasks included identifying metadata elements, obtaining and tracking permissions, converting the dissertations to an electronic format, and coordinating workflow between library departments. Each dissertation was scanned, reviewed for quality control, enhanced with a table of contents, processed through an optical character recognition function, and added to the institutional repository.

Results: Three hundred and twenty dissertations were digitized and added to the repository for a cost of $23,562, or $0.28 per page. Seventy-four percent of the authors who were contacted (n = 282) granted permission to digitize their dissertations. Processing time per title was 170 minutes, for a total processing time of 906 hours. In the first 17 months, full-text dissertations in the collection were downloaded 17,555 times.

Conclusion: Locally digitizing dissertations or other scholarly works for inclusion in institutional repositories can be cost effective, especially if small, defined projects are chosen. A successful project serves as an excellent recruitment strategy for the institutional repository and helps libraries build new relationships. Challenges include workflow, cost, policy development, and copyright permissions.
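The figures in the abstract can be cross-checked with a little arithmetic. The following is a minimal Python sketch, not anything from the paper: the function name and the derived quantities (cost per title, implied page count, pages per title) are my own illustration, and the page count assumes that the $0.28 figure is simply total cost divided by total pages.

```python
def derived_figures(titles, total_cost, cost_per_page, minutes_per_title):
    """Derive rough per-title figures from the totals quoted in the abstract.

    All derived values are illustrative; the paper reports only the inputs.
    """
    implied_pages = total_cost / cost_per_page  # assumes cost/page = total cost / total pages
    return {
        "cost_per_title": total_cost / titles,
        "implied_pages": implied_pages,
        "pages_per_title": implied_pages / titles,
        "total_hours": titles * minutes_per_title / 60,
    }


# Inputs as quoted in the Results section: 320 titles, $23,562 total,
# $0.28 per page, 170 minutes of processing per title.
figures = derived_figures(320, 23_562, 0.28, 170)
for name, value in figures.items():
    print(f"{name}: {value:,.2f}")
```

On these inputs the processing time works out to a little under 907 hours, which is consistent with the 906 hours reported.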

Posted by Gavin Baker at 7/28/2008 08:35:00 PM.

Ship it or share it

John Wilbanks, "Ship it or share it", john wilbanks' blog, July 24, 2008.

... While we were talking over dinner [Bryan Kirschner] came back repeatedly to the idea that if you're not going to ship code, you should share the code.

This is an idea that could really benefit the science community. So much work gets left behind on the laboratory equivalent of the cutting room floor that the adoption of this piece of open source philosophy would be welcome.

But, as I get tired of saying, science is a lot more complicated. It takes some work to make the stuff on the cutting room floor useful for other people, whether it's data, or lab protocols, or DNA vectors. Some of that work becomes part of the lab's institutional memory and finds its way into other projects at other times. Ship it or share it is going to have a hard road to hoe before it becomes a widely accepted policy.

I would however love to see this become a piece of open notebook science. ...

The issue of how to cite and what citations mean in such an environment is an interesting one, however – you don't get credit for musing about science, you get credit for proving stuff. We need to have more ways to measure the genealogy of ideas than simple systems based on antique systems of citation, too.

Posted by Gavin Baker at 7/28/2008 07:58:00 PM.

Web searching with CC licenses

Francis Deblauwe, Web Search & CC Licenses, iCommons.org, June 9, 2008.

How many web search engines actually allow to filter for Creative Commons-licensed materials? How many websites allow you to search for CC-licensed images only? How about videos and audio materials? ...

The major, well-known sites that allow CC filtering are Yahoo! and the photo sharing site Flickr. Google basically offers the same convenience but uses plainer wording such as "free to use or share." A few websites use Google's or Yahoo!'s search technology and rights filtering, e.g., AOL, Go.com. Blip.tv and SpinXpress offer CC filtering for videos and images. Surprisingly, the Internet Archive does not provide CC-licensing as a criterion in searches. After some digging, I found that they offer a cumbersome workaround though. ...

All in all, adoption of CC filtering on search websites is far from widespread. We still have a lot of work to do! In the meantime, the Creative Commons website offers a good search feature that leverages the best of the search site breed.

Posted by Gavin Baker at 7/28/2008 07:54:00 PM.

New IR from Algoma U.

Wishart Library Launches Two New Online Databases, Wishart Library, July 25, 2008. (Thanks to SooNews.ca.)

The Arthur A. Wishart Library at Algoma University is making it easier for the public to access archival resources and student and faculty-driven research with the launch of two new online databases. Both of the searchable databases can be accessed via the Wishart Library website ...

The library has launched two online databases to provide this access. The first holds records for the university's archives and special collections.

"We're bringing some significant regional collections such as Glady McNeice's research on the Ermatinger family, the archives of labour leader, politician and author John Ferris, and the St. Mary's River Marine Society collection to greater public and scholarly notice", said Hernden.

The second database is an institutional repository called DigitalAlgoma designed to hold new student and faculty driven research, often referred to as "born-digital" research. Included in the DigitalAlgoma archives are over two hundred honours theses from past Algoma University psychology and computer science students. ...

Posted by Gavin Baker at 7/28/2008 07:37:00 PM.

Announcing a new OA journal of historical digital entertainment

Henry Lowood, Historical Studies of Digital Entertainment Media, How They Got Game, July 21, 2008. (Thanks to Kotaku.)

The How They Got Game project is pleased to announce that we will be starting up a new journal, with the title Historical Studies of Digital Entertainment Media. ... We have been working with a group of authors for the first issue, which we hope will be published Winter 2009. The theme for this first issue will be "Digital Games: Historical and Preservation Studies." ...

We will be using the Open Journal System of the Public Knowledge Project ... We intend that authors will retain all rights to their contributions. ...

Posted by Gavin Baker at 7/28/2008 07:25:00 PM.

More on the GAO and Public.Resource.Org

Open Access to Compiled Federal Legislative Histories: Coming Soon?, Legal Sources Subject to Open, June 27, 2008.

In recent posts, we've spoken highly of Carl Malamud's efforts to provide public access to government produced legal information. That trend continues here. The United States Government Accountability Office (GAO) produces compiled legislative histories for laws passed by Congress. The GAO has a current contract with Thomson West, whereby the publisher scans the thousands of pages produced and sells access to the information afterwards with the goal of turning a profit. The GAO can access the documents for internal use only, but that free access does not appear to extend to Congress or other governing bodies.

Not long ago, the GAO provided Mr. Malamud digitized copies of a number of histories from the 67th & 68th Congress, as well as from a representative sample of histories from well-known legislation passed since then. These useful and interesting documents have been uploaded to http://bulk.resource.org for public viewing. In another forward-thinking move, Malamud proposed that, after the materials had been given to Thomson West to produce their commercial project, the same documents be used to develop an open access version at no cost to the GAO, other than the original person-hours required to produce the documents. The proposal included a similar arrangement where an outside entity, in this case the highly respected Internet Archive, scan the documents and that the GAO would be provided a digital copy of the scanned material that would be accessible to students, legal professions, and the public at large.

As my Advanced Legal Research class is learning as we speak, finding federal legislative histories can be a difficult row to hoe and a badge of honor for law clerks, first year attorneys, or others new to the legal profession. Having open access to compilations produced by experienced, knowledgeable GAO staff may not make the research easy, but it would be a tremendous leg up. We'll be watching these developments with keen interest and keeping our fingers crossed.

Posted by Gavin Baker at 7/28/2008 07:20:00 PM.

New OA journal of African studies

Ufahamu: A Journal of African Studies is a new OA journal published by the University of California, Los Angeles' James S. Coleman African Studies Center. The journal was established in 1970.

Posted by Gavin Baker at 7/28/2008 07:13:00 PM.

2 new OA journals from Armenian Academy of Sciences

Two new Open Access Journals from the National Academy of Sciences of the Republic of Armenia, EIFL, July 24, 2008.

Fundamental Scientific Library of the National Academy of Sciences of the Republic of Armenia launched two Open Access journals: Armenian Journal of Mathematics and Armenian Journal of Physics ...

The Open Society Institute Assistance Foundation in Armenia has granted the Fundamental Scientific Library of the National Academy of Sciences with a one year grant to develop an Open Access based scholarly communication system for the national academic community.

The Fundamental Scientific Library of the National Academy of Sciences is the main and largest repository of scholarly publication in the Republic of Armenia. The National Academy of Sciences houses 29 research institutions and is an overall leading producer of scientific content. The National Academy of Sciences publishes 13 peer-reviewed journals with an international reputation. ...

Posted by Gavin Baker at 7/28/2008 07:03:00 PM.

Review of NIH policy and OA

Aaron Welborn, Open or Shut? The Question of Public Access, Off The Shelf, Spring 2008.

... The NIH is the single largest funder of biomedical research in the country. Its $28 billion budget, mostly doled out in the form of grants to medical schools and universities, accounts for nearly one-third of all federal money spent on research every year. ...

Given the amount of tax dollars invested in NIH research, one might assume that the results of NIH-funded studies would already be public information, with or without the new law. Yet that is not generally the case. ...

The situation has provoked some universities, faculty members, libraries, and consumer groups to speak out and demand less restricted access to research that exists for the common good. ...

At the heart of the matter are difficult questions about the economics of research and scholarly publishing:

Who should control the channels of communication that connect the teaching and research community?

What are the costs and benefits of making the system more "open"?

And how are changes in the way people do research already affecting publishers, research institutions, and – of course – libraries? ...

Posted by Gavin Baker at 7/28/2008 05:59:00 PM.

More on the DINI interface to Sherpa/RoMEO

Frank Scholze, Internationalisation of information services for publishers' open access policies: the DINI multilingual integration layer, Philosophy, Ethics, and Humanities in Medicine, July 28, 2008. (Thanks to Michael Schwartz.)

Abstract (provisional): It is essential for the strategy of open access self-archiving that scientific authors are given comprehensive information on publisher copyright policies. DINI, the German Initiative for Networked Information, has developed a German (and potentially multilingual) interface to the English SHERPA / RoMEO service to provide additional information on German publishers' open access policies. As a next step, this interface was enhanced to an integration layer combining different sources on publisher copyright policies. This integration layer can be used in many different contexts. Together with the SHERPA / RoMEO team, DINI aims to build an international support structure for open access information.

Posted by Peter Suber at 7/28/2008 01:35:00 PM.

Journal accommodates the NIH policy

Judith E. Deutsch, JNPT Complies with NIH Open Access Policy, Journal of Neurologic Physical Therapy, June 2008. Not even an abstract is free online, at least so far.

Comment. Just for the record, the NIH policy regulates grantees, not journals or publishers. The question isn't whether a journal complies with a policy which only binds other players, but whether it is willing to publish work by NIH-funded authors. I suppose this is what JNPT means. The question isn't even whether a journal lets authors comply with the NIH policy, since journals are not in a position to allow or disallow it. If a journal offers to publish work by NIH-funded authors but insists that they depart somehow from the NIH policy, then those authors are contractually bound to decline the offer and look for another publisher.

Posted by Peter Suber at 7/28/2008 11:52:00 AM.

TA society journal supports the NIH policy

Paul D. Shepard, Schizophrenia Bulletin and the Revised NIH Public Access Policy, Schizophrenia Bulletin, July 21, 2008. SB is an official journal of the Schizophrenia International Research Society, published by Oxford University Press.

This excerpt picks up after Shepard has summarized the NIH policy and explained that Oxford will automatically deposit SB articles by NIH-funded authors in PMC:

...As awareness of the revised (ie, mandatory) NIH Public Access Policy has grown, several contributors to Schizophrenia Bulletin have noticed that although articles are made freely available on the journal's Web site 12 months after publication, the same material has yet to appear in PubMed Central. The editorial office is working with representatives at Oxford University Press to ensure that this issue is resolved....Until such time as Oxford and PubMed Central reach an agreement regarding the format of publisher-supplied manuscripts, authors and PIs of sponsored research articles appearing in Schizophrenia Bulletin are strongly encouraged to deposit the final, peer-reviewed copy of their manuscripts in PubMed Central as soon as they appear in the Advance Access area of the journal's Web site....

Despite a sluggish start to a process that is now mandated by law, there is widespread support among research institutions and academic libraries for the revised NIH Public Access Policy. Patients, family members, and the grassroots organizations supporting them will enjoy unprecedented access to the primary scientific literature while researchers will benefit from the increased visibility of their work. Archiving the written results of NIH-funded biomedical research, estimated at upwards of 80 000 new manuscripts yearly, in a permanent, universally accessible and searchable archive is also likely to create significant new opportunities for the development of value-added services based on data mining technologies. While the advantages of increased public access to publicly funded science seem incontrovertible, NIH-funded scientists will, at least in the short term, bear the brunt of the effort needed to make the plan work. By permitting publishers to directly deposit material into PubMed Central while insisting that authors and investigators remain active participants in the process, the NIH has created a system in which journals can partner with authors in ushering in this new era of information sharing. We sincerely hope that our colleagues will regard this as an opportunity rather than an annoyance.

Posted by Peter Suber at 7/28/2008 11:24:00 AM.

More on open research and national security

Jeffrey Brainard, Untying the Secret Strings That Bind Research, Chronicle of Higher Education, August 1, 2008 (accessible only to subscribers). Excerpt:

Federal agencies, caught up in the zeal for national security, have been pressing universities to hide research results, even when they come from unclassified projects. The requests are coming more and more often from agencies financing the work, despite a longstanding presidential order that such findings be open and public.

However, the Defense Department – led by Robert M. Gates, a former president of Texas A&M University – has offered an olive branch, releasing a memorandum [in June 2008] that supports the freedom to publish unclassified results of "fundamental" research....

Jacques S. Gansler, a professor in the School of Public Policy at the University of Maryland at College Park...[who] worked for the Pentagon in the late 1990s, supervising research projects, [said] the memo "sends a pretty strong message to other agencies" to back off the restrictive language, he said. "It's an important statement" and "very positive," he said.

A strong message is sorely needed, according to the Association of American Universities and the Council on Governmental Relations. The two groups just issued a report that found 180 instances of restrictive language in research awards to 20 research institutions in 2007. That was up from the 138 found at the same institutions during the association's last such survey, conducted in 2003 and 2004....

The report says that college officials accepted most of these restrictions "with hesitation after protracted negotiations." In only 16 of the 180 instances, colleges rejected the federal money rather than accept the limitations.

The Defense Department memo makes it clear that the agency is asking universities for too much. It came from John Y. Young, the under secretary for acquisition, technology, and logistics, who oversees Pentagon research. It reiterates a 1985 order issued by President Ronald Reagan that fundamental research should generally not be classified. The Bush administration has endorsed that order, but apparently agencies have been paying scant attention....

Mr. Gansler, in "Science and Security in the Post 9/11 World," a report he helped write last year for the National Research Council, argued that restrictions on unclassified work have created more harm than good. Openness, he said, has helped keep basic research at American universities at the cutting edge, actually benefiting America's national security....

PS: I can't find the Young memo online. But if anyone else does, please drop me a line.

Posted by Peter Suber at 7/28/2008 10:59:00 AM.

Sunday, July 27, 2008

More on Google v. OCA

Jean-Claude Guédon, Who Will Digitize the World's Books? New York Review of Books, August 14, 2008. A letter to the editor in response to Robert Darnton, The Library in the New Age (June 12, 2008). Excerpt:

...Robert Darnton extols the value of Google's project to digitize the collections of major research libraries. As he puts it, it is a way to make "all book learning available to all people." While there is much truth in this statement, there are some important considerations about the Google project that should be raised....

[I]t is important to clarify what Google is offering: it is not a digital text that the library will be able to share unconditionally with others. In its contracts with the nineteen libraries now in its consortium, Google has stipulated that the "Universal Digital Copy" of digitized books it provides must be protected from non-Google Web software; and that the number of downloads from texts digitized by Google will be limited. Only Google can aggregate collections of different libraries in order to create the larger digital database that is the most valuable part of the consortium project.

Put another way, Google has strictly limited the "computational potential" of digitized books....

With Google's digitization, for example, it is possible to conduct advanced text mining within a single library's collection; but only Google can provide access through its own Web site to the entire pool of scanned books in the nineteen libraries with which it now has contracts....

It appears that Google is striving to become the main dispenser of algorithmic power over digital books....To give a single company such a grip on the collective memory of the world, its analysis, and even its meaning is frightening to say the least.

Dozens of libraries have understood the danger of the Google Book maneuver and have joined the Open Content Alliance (OCA)...[which] seeks to promote large-scale digitization, but it does so without putting shackles on the participating libraries. Alas, the OCA has nothing like Google's deep pockets, and the recent withdrawal of Microsoft from the alliance makes the OCA's position even more difficult.

But there may be some hope in this situation. Since many different groups have an interest in the free availability of digital texts, the process of digitization itself could be distributed among a wide variety of libraries and other independent groups, much in the way of contributions to Wikipedia and Project Gutenberg. Digitization clubs could emerge not only in public libraries but in schools and museums. In short, mass digitization projects should be designed in ways that are not dependent on market-based corporations or on government subsidies, but can nevertheless profit from forms of support from either kind of institution.

Libraries can have a very important part in promoting these projects and enforcing the standards that must accompany them. In so doing, they would be acting as institutional citizens of the digital document age, and not as grateful (and somewhat passive) consumers of Google's apparent largesse.

From Robert Darnton's response:

...I share Jean-Claude Guédon's worry about the danger of one company monopolizing the "computational potential" of digitized texts, and I agree that the Open Content Alliance is a good thing. But is it an adequate alternative to Google? Grassroots digitizing may help a thousand flowers bloom....But we need to search, mine, and study millions of volumes from all the collections of our research libraries.

Libraries have accumulated those volumes at great cost over many generations, but they have kept most of them within their walls. Digital technology now makes it possible for this common intellectual heritage to come within the range of the common man and woman. Yet corporate interests, flawed copyright laws, unfair restrictions on fair use, and many other obstacles block the public's access to this public good. By removing those obstacles, the United States Congress can clear the way for a new phase in the democratization of knowledge. For my part, I think congressional action is required to align the digital landscape with the public good.

Posted by Peter Suber at 7/27/2008 08:52:00 PM.

More evidence that mandates work

Alma Swan shares some preliminary findings on repository deposits in a message posted to the AmSci OA Forum, July 27, 2008. Excerpt:

...I have the results of a small survey of European repository managers waiting for analysis. I have now carried out a v. quick analysis of the relevant questions. The total number of responses from institutional repository managers was 42...

The survey had a question asking how easy it had been to collect content into the repository. I used the answers from that and cross-analysed those against the answers to three other questions:
- who deposits the items in the repository
- who creates the metadata
- what kind of open access policy the institution has.

I apologise for having to describe the results in words only. Normally I would do this graphically too, to make eyeballing easy, but I can't do that here [in a listserv post]....:

1. Comparing the difficulty of collecting content for IRs with (a) no institutional policy, (b) encouragement only and (c) mandatory deposit:

1a. Repositories with no institutional policy for the repository:
- Repositories finding it very easy or fairly easy to collect content: 1
- Repositories finding it very difficult or fairly difficult to collect content: 6
- Repositories finding it possible but not easy to collect content: 8

1b. Repositories with an institutional policy encouraging authors to make their work open access:
- Repositories finding it very easy or fairly easy to collect content: 2
- Repositories finding it very difficult or fairly difficult to collect content: 3
- Repositories finding it possible but not easy to collect content: 9

1c. Repositories with a mandatory institutional policy on open access:
- Repositories finding it very easy or fairly easy to collect content: 5
- Repositories finding it very difficult or fairly difficult to collect content: 0
- Repositories finding it possible but not easy to collect content: 1

Conclusion: The IRs with mandated deposit have the least difficulty collecting content.
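[Not part of Swan's message: since she notes that she could not show the results graphically, here is a minimal sketch that turns the counts in 1a-1c into shares per policy group. The group labels and the short category names ("easy", "possible", "difficult") are shorthand introduced here for illustration, and the groups are small, so the percentages are only a rough aid to eyeballing.]

```python
# Counts transcribed from items 1a-1c above.
# Keys are shorthand labels assumed for this sketch, not Swan's wording.
counts = {
    "no policy":     {"easy": 1, "possible": 8, "difficult": 6},
    "encouragement": {"easy": 2, "possible": 9, "difficult": 3},
    "mandate":       {"easy": 5, "possible": 1, "difficult": 0},
}


def shares(row):
    """Convert one group's raw counts into fractions of that group's total."""
    total = sum(row.values())
    return {category: n / total for category, n in row.items()}


# Fraction of repositories in each category, per policy group.
table = {policy: shares(row) for policy, row in counts.items()}

# Print one line per policy group with its size and percentage breakdown.
for policy, row in table.items():
    n = sum(counts[policy].values())
    cells = "  ".join(f"{category}: {share:.0%}" for category, share in row.items())
    print(f"{policy:<14} n={n:<3} {cells}")
```

[The three groups add up to 35 of the 42 responses mentioned earlier in the message; the message does not say how the remaining respondents answered.]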

2. Comparing the difficulty of collecting content for IRs that have deposit by (a) authors only, (b) librarians only (c) both: ...

Conclusion: The IRs with author-only deposit have the least difficulty collecting content.

3. Comparing the difficulty of collecting metadata for IRs with author deposit vs. librarian deposit: ...

Conclusion: The IRs with author deposit have the least difficulty collecting metadata.

Now that I've been reminded of this survey I shall put out another call for responses. It was specifically designed to compare European repository experiences with the largish study carried out by Charles Bailey and colleagues on US repositories and published by the ARL (SPEC Kit 292). What I haven't done is delve into the registries of repositories, find the biggest ones and ask the managers how things get into their database....

I also encourage anyone interested in this topic to read the very informative and insightful paper on the topic by Les Carr and Tim Brody, Size isn't everything: sustainable repositories as evidenced by sustainable deposit profiles....

Posted by Peter Suber at 7/27/2008 03:19:00 PM.

Interview with Heather Joseph

Mary Page and Bonnie Parks, An Interview with Heather Joseph, Serials Review, June 2008. Only this abstract is free for non-subscribers, at least so far:

Abstract: Heather Joseph talks about her career with SPARC and BioOne. She discusses the NIH mandate that NIH-funded research will be deposited into PubMed Central, and she shares her views on some of the controversial issues the mandate has raised about copyright, peer review, and embargo periods. She also addresses the recent decision by the Harvard faculty to make their scholarly output accessible through the university's institutional repository, and she suggests ways that librarians can help their faculties prepare for open access.

From the body of the interview:

HJ: ...I think that one central contribution of SPARC has been raising the profile of scholarly communications issues. The issue has expanded from a library-centric issue of "journals cost too much" into a conversation about leveraging new opportunities to expand the scope of dissemination of the results of science, research, and scholarship. It involves not just the library community, but also researchers and the academy, as well as national and international policymakers. The issue of access to and use of scholarly output has become one of great public interest....

MP: Let's talk about the recent National Institutes of Health mandate that all NIH-funded research will be deposited into PubMed Central. Is this a defining moment for the open access movement?

HJ: I think it is. The debate and discussion on this particular policy played out very publicly over the past several years. Interest wasn't just limited to the academy and trade publications....The fact that Congress signed this policy into law imbues a significance that cannot be overstated. This isn't simply an interesting proposal by one special interest group: this is a fully vetted, thoroughly discussed policy that is now the law of the land. Certainly that represents a watershed moment for open access, and it has implications far beyond a single government agency in the United States....

MP: How would you respond to a researcher who deposits his work in his institutional repository where it will be freely available? Why does he have to deposit it into PubMed Central as well?

HJ: First of all, I applaud researchers who take the critical step of ensuring broad access to their work by placing it in an institutional repository....Additionally, I'd expect over time that institutional repository managers will work out some system or systems with the NIH to either allow PubMed Central to sweep local repositories to automatically harvest approved, NIH-funded manuscripts, or for the local repository to automatically upload such manuscripts directly to PubMed Central....

MP: Do you think the NIH mandate will become a model for other government funding agencies? If yes, have you begun working on that effort?

HJ: ...It's entirely probable that other United States agencies, and agencies abroad, will follow suit. I wouldn't necessarily expect perfect clones of the NIH policy; there should and will be differences that reflect the unique nature of different disciplines. But I do think that basic premise of open access to the results of research will be a path that the majority of agencies that invest in research will pursue, as they increasingly recognize that broad access provides a greater return on their investment....

Posted by Peter Suber at 7/27/2008 03:07:00 PM.

OA journal statistics by country

The DOAJ has added a page of statistics by country. For each of 90 countries, you can quickly see how many OA journals it had in the DOAJ in any of the past seven years (for example, Japan in 2004 = 72) and how many it added that year (Japan in 2004 = 47).

Each number links to a list. For example, here's the list of 47 Japanese OA journals added to the DOAJ during 2004.

PS: This will be very useful for tracking the growth of OA journals over time and their spread to different countries.

Posted by Peter Suber at 7/27/2008 03:01:00 PM.

A taxonomy for the openness of data

Melanie Dulong de Rosnay, Check Your Data Freedom: A Taxonomy to Assess Life Science Database Openness, a preprint, self-archived July 17, 2008.

Abstract: Molecular biology data are subject to terms of use that vary widely between databases and curating institutions. This research presents a taxonomy of contractual and technical restrictions applicable to databases in life science. It builds upon research led by Science Commons demonstrating why open data and the freedom to integrate facilitate innovation and how this openness can be achieved. The taxonomy describes technical and legal restrictions applicable to life science databases, and its metadata have been used to assess terms of use of databases hosted by Life Science Resource Name (LSRN) Schema. While a few public domain policies are standardized, most terms of use are not harmonized, difficult to understand and impose controls that prevent others from effectively reusing data. Identifying a small number of restrictions allows one to quickly appreciate which databases are open. A checklist for data openness is proposed in order to assist database curators who wish to make their data more open to make sure they do so.

PS: See our two previous posts on de Rosnay's research: (1) Ethan Zuckerman's summary, and (2) Shirley Fung's associated web site.

Posted by Peter Suber at 7/27/2008 02:52:00 PM.