Open Access News

News from the open access movement

Saturday, February 04, 2006

Matt Villano, Next-Gen Libraries, Campus Technology, February 1, 2006. Profiles of notable digital libraries at several colleges and universities, including Earlham College. Villano touches on metadata issues, including the OAI-PMH.

(PS: I've been on the Earlham faculty since 1982 but can take no credit for its exemplary digital library --which frees me to express my pride at seeing its achievement recognized.)

Posted by Peter Suber at 2/04/2006 01:49:00 PM.

Measuring OA progress

Jan Velterop, Sizing up opponents, The Parachute, February 4, 2006. Excerpt:

A slight sense of despondency overcame me when I saw in a number of recent posts on various discussion fora about open access, that the fallacy of the number of journals being a measure of size (of activity or the amount of article published in a certain area) is alive and well. The fallacious argument is used by members of the pro-open-access camp as well those from anti-open-access circles. The pros are saying "look how many open access journals there are!" and the antis "look how few open access journals!", either of them proving or disproving exactly nothing....Even if journals were more uniform in size, counting open access journals to establish how much peer-reviewed material is available with open access is flawed. It is with very good reason that the Bethesda Statement says "Open access is a property of individual works, not necessarily journals or publishers." Some BioMed Central journals have non-open-access articles and an increasing number of journals will publish open access material (e.g. Springer's 1250 odd titles and a growing selection of OUP's and Blackwell's titles, among others). Number of articles is a better measure than numbers of journals, but what seems more important to me is the number of opportunities that authors have to publish with open access. They have grown dramatically over the last year.

And in another post, The joys of choice, half an hour later:

In the previous post I questioned the validity of "number of journals" as proof for the amount of publishing activity in open access. A follow-up question I have is this: why is all this a priori "proof" necessary in the first place? What "proof" is needed to show that open access articles are accessible to more people? It is in the very concept! What "proof" is needed to demonstrate that paying an amount upfront for the service of publishing is worse, or better, for its economic sustainability than paying for subscriptions? The proof of that particular pudding is simply in de eating. Any choices between open access and non-open access will be made by those who actually have the choice: authors and their (financial) backers. The latter (the backers) can even impose that choice. Publishers can't -- and shouldn't. The only thing to do for publishers -- be they societies or independent outfits -- is to offer the choice.

Posted by Peter Suber at 2/04/2006 12:58:00 PM.

More free content in Highbeam

Highbeam Research, which searches free and priced literature, has added 1.5 million articles to the free side of the service. From the press release (January 30, 2006):

HighBeam Research...announced today a series of upgrades to its collection of content on the HighBeam Research Engine, offering more than 1.5 million full-text articles to individual researchers for free and adding extensive premium reference and news articles from Knight Ridder, Oxford University Press and The Washington Post. Full-text articles from more than 200 sources are now offered for free, permanently, to all HighBeam users, even those who are not registered. For the free offering, HighBeam selected BusinessWire, Financial Management, Science News, USA Today magazine and many other sources from its HighBeam Library -- a collection of more than 35 million articles from more than 3,000 business, trade, academic, special interest and general interest publications. As part of HighBeam's new slate of research services, free access to The Oxford Pocket Dictionary of Current English and The Oxford Pocket Thesaurus of Current English will soon be available through the HighBeam Reference area of the research engine. Additionally, HighBeam has begun spidering and linking to free publications on the open web, covering areas in which its users have indicated an interest. Among the more than 100 publications and nearly 100,000 articles indexed by the HighBeam Library database are American History, CIO, Financial Advisor, Inc. and Wine Enthusiast.

Posted by Peter Suber at 2/04/2006 09:41:00 AM.

Balancing openness and security in biomedicine

Institute of Medicine and National Research Council, Globalization, Biosecurity, and the Future of the Life Sciences, National Academies Press, 2006. NAP books like this are OA for reading, but do not support cutting and pasting, making it difficult to quote an excerpt. So here's an excerpt from the summary by Research Research:

A new report from the National Research Council favors the "free and open" exchange of information in the life sciences, but also recommends the development of explicit global codes of ethics as well as the creation of "authorities" around the world to monitor and respond to potential misuse of biomedical capabilities. The report emphasizes the importance of ensuring that the results of fundamental research remain unrestricted, unless issues of national security require that they be classified, and it also stresses that any biosecurity regulations should be scientifically sound and non-obstructive to the progress of biology.

Posted by Peter Suber at 2/04/2006 09:20:00 AM.

NASA withdraws from OA Spaceline database

NASA and the NLM have collaborated since 1993 on the OA Spaceline database on research in the space life sciences. But budget cuts at NASA are forcing it to withdraw from the partnership.

Posted by Peter Suber at 2/04/2006 09:08:00 AM.

Friday, February 03, 2006

Fedora v. 2.1 now available

Fedora has released version 2.1. From the site:

Fedora 2.1 is a very significant release of Fedora since it introduces the new Fedora security architecture (with pluggable authentication and XACML-based policy enforcement), the Fedora Service Framework, and many other new features. The introduction of the new security architecture resulted in a significant amount of code refactoring, therefore, Fedora 2.1 has been tested with an extensive suite of new JUnit tests. The codebase and supporting libraries have also been updated making Fedora 2.1 compatible with Java 1.5. Some of the system documentation is still evolving. Most documentation is complete, but several documents are still being enhanced and improved. To identify documents that will continue to evolve, please look for an icon that says "Draft Doc" under the document title. As revisions are made, we will publish the most updated versions of these documents on the Fedora web site and send alerts to the Fedora Users mail list. This release of Fedora was also tested against a large production testbed collection, where bulk loading was performed on for a collection of 2 million digital objects with approximately 160M triples in the Kowari-based Resource Index. Also, coinciding with this release is the publication of the Community-Developed Tools page on the Fedora web site. This is a clearinghouse for community-developed, open-source applications, tools, and services that work with Fedora repositories.

Posted by Peter Suber at 2/03/2006 02:43:00 PM.

OA research resources from the Wellcome Trust

The Wellcome Trust has made a web page of the Wellcome-funded research resources that support biomedical research. Some of the resources are physical (e.g. tissue specimens) and some digital. Apparently the digital resources are all OA.

Posted by Peter Suber at 2/03/2006 11:56:00 AM.

OA to weather data: US v. Europe

An anonymous blogger on Stormtrack calls on Europe to provide OA to its weather data, roughly as the US does.

Posted by Peter Suber at 2/03/2006 10:45:00 AM.

University-based OA journals

Wayne Johnston spoke yesterday on Publishing Open Access Journals At Your University at the meeting of the Ontario Library Association. The presentation isn't online yet but Jen ("Canuck Librarian") has blogged some notes about it. Excerpt:

This session unfortunately had low attendance, but I guess that means the rest of us had a better opportunity to interact with the speaker....Wayne talked a bit about why open access is a good idea, and necessary (freely available, immediately available, complete, readers can use/copy/disseminate as long as integrity preserved and cited). He also talked a fair bit about the Budapest Open Access Initiative as well as some other people and orgs such as SPARC, Public Library of Science and Peter Suber. Most of us know that the cost of scholarly publication is skyrocketing, and libraries have to pay outrageous amounts for journals, especially in areas such as the sciences. Open Access journals helps provide a place for researchers to exchange ideas freely - much more easily than with commercial publications. There are two concerns with open access e-journals: cost and prestige. I think there must be alternative ways to funding - we have to use our imagination and design a system that works. As for prestige, that is something that will change over time. Currently there are about 300 e-journals hosted using OJS, many (or most?) are refereed journals and go through the same or similar process of submission and review as commercial publications. As more and more faculty and researchers start to use these e-journals, their acceptance will grow. (a little side note: at dinner tonight we talked a bit about ISI and got into impact factors; how often do you think that happens!?!) And just think about this. If the price of publication and dissemination of research in Canada is so high, libraries and researchers are struggling with costs, what about other areas of the world? Wayne mentioned how researchers all over the world have to publish in journals from the "North" and so find the costs too exorbitant, and their institutions have a hard time subscribing to them all. How is that any good for progress?
They have to pay to get it published and then pay to get access to it, it seems. So open access journals are certainly the way of the future (and present) in my lowly opinion, but we have to keep working on it. FYI OJS is only one journal management system, there are 28 more listed on SPARC.

Posted by Peter Suber at 2/03/2006 10:28:00 AM.

Intermediary copying and the Google Library project

Paul Ganley, Google Book Search: Fair Use, Fair Dealing and the Case for Intermediary Copying, a preprint, self-archived January 13, 2006. (Thanks to William Walsh.)

Abstract: This article examines the legality of Google's Library Project under U.S. and U.K. copyright law. The Library Project provides a useful example of the divergence in approach to copyright exceptions in these two jurisdictions. In particular, whilst Google's plans have generated a great deal of controversy, it at least has an arguable case under U.S. law that its use is fair use. No analogous argument can be made under U.K law. The main purpose of this article is to highlight this distinction and to suggest that U.K copyright law is failing to adequately account for transformations in the mode and manner in which individuals interact with information. Following a brief introduction in part 1, part 2 begins by explaining how Google's 'Book Search' program (formerly 'Google Print') operates and briefly describing the two lawsuits issued against the 'Library Project' aspect of the service by the Author's Guild and a number of prominent publishers. Part 3 offers a preliminary assessment of whether Google's activities are lawful under U.S. copyright law. In attempting to answer this question, Google's case is presented with a 'positive spin'; not in an effort to predict the outcome of any future trial, but rather to illustrate that Google can reasonably argue that its use is a privileged one. Part 4 then considers how Google would fare under U.K. law. The conclusion, unlike the U.S. law analysis, suggests that Google would have little chance of success if its case was being heard in the U.K. Part 5 asks whether this is a desirable result and concludes that, given recent advances in the technological landscape, it is not. This conclusion is based, in particular, on Professor Zittrain's concept of 'generativity' and Professor Frischmann's economic theory of infrastructure. Finally, Part 6 describes how U.K. copyright law could accommodate uses such as Google's within its existing scheme of exceptions. 
A specific defence for 'intermediary' copying premised on the 'temporary copies' exception recently enacted as section 28A of the Copyright, Designs and Patents Act 1988 is outlined, and alternatively a new defence of 'fair dealing for informational purposes' is proposed.

Posted by Peter Suber at 2/03/2006 09:46:00 AM.

TOCs by RSS for libraries

New JISC project will feed tables of contents into library catalogues. A press release dated today. Excerpt:

A new JISC project is developing an RSS news feed service that will automatically feed publisher and e-journal information into library catalogues. Led by the publisher Emerald, and supported by library supplier Talis, the project will finish in July 2006. The University of Derby is working as the test bed and evaluation partner. The open source software developed by the TOCRoSS project will be freely available to further and higher education establishments, publishers, and library management systems developers. With TOCRoSS in place, e-journal table of content data will be fed automatically into library catalogues without the need for cataloguing, classification or data entry. This will improve the accuracy of records, save time for library staff and deliver a more integrated service to library users. It will be of particular value to academic libraries, where students often choose search engines such as Google over the library catalogue for tracking down articles and information. TOCRoSS will also deliver benefits to the publishing community, improvements in the dissemination of data and information to libraries and will lead to a corresponding improvement in the value and management of e-journals and other resources. Paul Evans, head of web services at Emerald Group Publishing, explains the value that the project will bring to both library staff and users: "Cataloguing journals and articles has, up until now, been considered a 'luxury' in resource terms by many academic institutions. Once TOCRoSS is complete, cataloguing journal articles becomes automatic and the commonly asked question, 'Why can't I find any articles on the OPAC?' becomes obsolete." When the project is complete, the TOCRoSS software will be available with an Open Source license, making it possible for publishers and library management system suppliers to use it freely with their products and services.
A demonstrator service is a key deliverable of the project and is expected to be up and running by June 2006.

(PS: It appears that the RSS feeds will not be available to individuals. If not, why not?)

Posted by Peter Suber at 2/03/2006 09:26:00 AM.

NSF endorses OA to data

If you remember, last September, the U.S. National Science Foundation (NSF) released version 4.0 of its report, NSF's Cyberinfrastructure Vision For 21st Century Discovery, September 26, 2005. The report outlined the agency's vision for cyberinfrastructure and sought public comment.

Now the agency has released version 5.0 of the report (January 20, 2006). It doesn't discuss OA to literature but strongly endorses OA to data. Excerpt:

At the international level, a number of nations and international organizations have already recognized the broad societal, economic, and scientific benefits that result from open access to science and engineering digital data. In 2004 more than thirty nations, including the United States, declared their joint commitment to work toward the establishment of common access regimes for digital research data generated through public funding. Since the international exchange of scientific data, information and knowledge promises to significantly increase the scope and scale of research and its corresponding impact, these nations are working together to define the implementation steps necessary to enable the global science and engineering system. The U.S. community is engaged through the National Committee on Data for Science and Technology (CODATA). CODATA is working with its international partners, including the International Council for Science (ICSU), the International Council for Scientific and Technical Information (ICTSI), the World Data Centers (WDCs) and others, to create a Global Information Commons for Science. As currently conceived, this online "open-access knowledge space" will: promote the promise of easy access to and use of scientific data and information; promote wider adoption of successful methods and models for providing open availability on a sustainable basis; facilitate reuse of publicly-funded scientific data and information, as well as cooperative sharing of research materials and tools among researchers; and, encourage and coordinate the efforts of many stakeholders in the world's diverse science and engineering community to achieve these objectives....
At the institutional level, colleges and universities are developing approaches to digital data archiving, curation, and analysis. They are sharing best practices to develop digital libraries that collect, preserve, index and share research and education material produced by faculty and other individuals within their organizations. The technological implementations of these systems are often open-source and support interoperability among their adopters. University-based research libraries and research librarians are positioned to make significant contributions in this area, where standard mechanisms for access and maintenance of scientific digital data may be derived from existing library standards developed for print material. These efforts are particularly important to NSF as the agency considers the implications of not just making all data generated with NSF funding broadly accessible, but of also promoting the responsible organization and management of these data such that they are widely usable....
Motivated by a vision in which science and engineering digital data are routinely deposited in well-documented form, are regularly and easily consulted and analyzed by specialists and nonspecialists alike, are openly accessible while suitably protected, and are reliably preserved, NSF's five-year goal is twofold: [1] To catalyze the development of a system of science and engineering data collections that is open, extensible and evolvable; and [2] To support development of a new generation of tools and services facilitating data mining, integration, analysis, and visualization essential to turning data into new knowledge and understanding....The agency will also develop a suite of coherent data policies that emphasize open access and effective organization and management of digital data, while respecting the data needs and requirements within science and engineering domains....
The following principles will guide the agency's FY 2006 through FY 2010 investments....Science and engineering data generated with NSF funding will be readily accessible and easily usable, and will be appropriately, responsibly and reliably preserved....
Data tools created and distributed through these projects will include not only access and ease-of-use tools, but tools to assist with data input, tools that maintain or enforce formatting standards, and tools that make it easy to include or create metadata in real time. Clearinghouses and registries from which all metadata, ontology, and markup language standards are provided, publicized, and disseminated must be developed and supported, together with the tools for their implementation. Data accessibility and usability will also be improved with the support of means for automating cross-ontology translation....
Through a suite of coherent policies designed to recognize different data needs and requirements within communities, NSF will promote open access to well-managed data recognizing that this is essential to continued U.S. leadership in science and engineering....In addition to addressing the technological challenges inherent in the creation of a national data framework, NSF's data policies will be redesigned to overcome existing sociological and cultural barriers to data sharing and access. Two actions are critical. NSF will conduct an inventory of existing policies, to bring them into accord across programs and to ensure coherence. This will lead to the development of a suite of harmonized policy statements supporting data open access and usability. NSF's actions will promote a change in culture such that the collection and deposition of all appropriate digital data and associated metadata become a matter of routine for investigators in all fields. This change will be encouraged through an NSF-wide requirement for data management plans in all proposals. These plans will be considered in the merit review process, and will be actively monitored post-award....
[M]any large science and engineering projects are international in scope, where national laws and international agreements directly affect data access and sharing practices. Differences arise over privacy and confidentiality, from cultural attitudes to ownership and use, in attitudes to intellectual property protection and its limits and exceptions, and because of national security concerns. Means by which to find common ground within the international community must continue to be explored.

Posted by Peter Suber at 2/03/2006 07:58:00 AM.

Thursday, February 02, 2006

Editorial in a new OA journal

Ge Wang, Message from the Editor-in-Chief, International Journal of Biomedical Imaging, 2006. An editorial. Excerpt:

[T]his journal will be published using an Open Access publishing model, which means that accepted papers will be freely and immediately available on the journal's website without any access barriers. In addition, a print edition will be made available at a minimal cost. It is our belief that the Open Access model will become a major force in scientific publishing in the near future, and we hope that IJBI will be a leading journal in this new movement....We are confident that our journal will soon establish itself as a reputable vehicle in the field, due to the advantages of its Open Access publishing model, comprehensive coverage, novel features, and high academic standards.

Posted by Peter Suber at 2/02/2006 10:36:00 PM.

Wanted: contractor to manage UK version of PMC

The Wellcome Trust is looking to hire "a single contractor to host and manage a suite of applications to provide a UK version of PubMed Central (UKPMC)." From the site:

The aim of this initiative is to create a stable, permanent and free-to-access digital archive of the full-text, peer-reviewed research publications (and datasets) that arise from the research funded by the UKPMC Implementation Group. UKPMC will be fully searchable and provide context-sensitive links to other online resources, such as gene and chemical compound databases.

Also see the UKPMC requirements summary and the outline procurement timetable.

Posted by Peter Suber at 2/02/2006 04:08:00 PM.

Open Access RNAi Program?

Open Biosystems has launched an Open Access RNAi Program. From today's press release:

Open Biosystems, Inc., focused on the commercialization of leading-edge life science research tools for drug discovery, announced today its Open Access RNAi Program. Participation in this program allows all research laboratories within an institution access to its choice of Open Biosystems' portfolio of genome-wide RNAi resources. The Open Access RNAi Program, unique in this field, supports Open Biosystems' vision of supporting basic and medical research and reinforces its commitment to expanding access to research reagents for the life sciences community....The Open Biosystems Open Access RNAi Program gives entire academic systems, including multiple campuses, access to the company's advanced shRNA libraries, priority technical support, continued access to all extensions of existing libraries as well as library upgrades. In this manner, Open Biosystems will support university customers' research without imposing heavy financial burdens on individual labs. The program is customizable and can be tailored to fit the diverse needs of research institutions worldwide.

(PS: Is this really a free program, or does it use the term "open access" to mean "pay and then you'll have access"? Unfortunately, I can't tell because the program link is currently dead.)

Posted by Peter Suber at 2/02/2006 01:16:00 PM.

February SOAN

I just mailed the February issue of the SPARC Open Access Newsletter. In this issue, I list six things that publishing scholars need to know about OA, and take a close look at how Google AdSense ads can help OA journals without creating conflicts of interest. The Top Stories section takes a brief look at new OA policy proposals from three continents, new books on OA, new developments among OA repositories and OA hybrid journals, and a little more news on the CURES Act.

Posted by Peter Suber at 2/02/2006 12:06:00 PM.

Free publishing platform for OA journals

Scholarly Exchange has announced a free journal publishing platform for OA journals. From today's announcement:

Combining Open Journal Systems public-domain software with complete hosting and support, this new service offers scholars unrivaled freedom and flexibility to produce academic journals - and at a price that fosters the open access model. Scholarly Exchange offsets its costs by contextually appropriate on-screen advertising, supplied by such sources as Google and Yahoo. Revenues exceeding the basic support threshold of $1500 yearly are shared with each journal, to help defray editorial costs. Journals that prefer an advertising-free environment may pre-pay the technology cost and enjoy the same platform benefits. The 501(c)3 public charity serves as a facilitator for open access journals rather than a publisher. SE provides options with the platform for OAI harvesting and LOCKSS compliance. The SE website offers information resources to help with archiving, ultra-low-cost tagging/data conversion, and short-run/print-on-demand services commercially available....Participating journals retain all rights to their metadata and content and may charge submission or publication fees to help defray editorial costs. Scholarly Exchange plays no role in the creation of the information or its ultimate ownership, only in the sharing of the highest quality of scholarship as determined by the scholars who produce it. The free-platform service begins February 15, 2006.

(PS: This looks like an excellent idea and I'd like to hear from journals that try it out. By chance, I have an article on using Google ads to support OA journals in the February issue of SOAN, due out later this morning.)

Posted by Peter Suber at 2/02/2006 08:50:00 AM.

Wednesday, February 01, 2006

US Copyright Office guidelines on orphan works

The U.S. Copyright Office has released its guidelines on orphan works (January 31, 2006). Excerpt:

This Report addresses the issue of "orphan works," a term used to describe the situation where the owner of a copyrighted work cannot be identified and located by someone who wishes to make use of the work in a manner that requires permission of the copyright owner. Even where the user has made a reasonably diligent effort to find the owner, if the owner is not found, the user faces uncertainty -- she cannot determine whether or under what conditions the owner would permit use....We recommend that the orphan works issue be addressed by an amendment to the Copyright Act's remedies section.
Recommended Statutory Language....[W]here the infringer: (1) prior to the commencement of the infringement, performed a good faith, reasonably diligent search to locate the owner of the infringed copyright and the infringer did not locate that owner, and (2) throughout the course of the infringement, provided attribution to the author and copyright owner of the work, if possible and as appropriate under the circumstances, the remedies for the infringement shall be limited as set forth [below]....[N]o award for monetary damages...shall be made other than an order requiring the infringer to pay reasonable compensation for the use of the infringed work; provided, however, that where the infringement is performed without any purpose of direct or indirect commercial advantage, such as through the sale of copies or phonorecords of the infringed work, and the infringer ceases the infringement expeditiously after receiving notice of the claim for infringement, no award of monetary relief shall be made....[I]n the case where the infringer has prepared or commenced preparation of a derivative work that recasts, transforms or adapts the infringed work with a significant amount of the infringer's expression, any injunctive or equitable relief granted by the court shall not restrain the infringer's continued preparation and use of the derivative work, provided that the infringer makes payment of reasonable compensation to the copyright owner for such preparation and ongoing use and provides attribution to the author and copyright owner in a manner determined by the court as reasonable under the circumstances; and in all other cases, the court may impose injunctive relief to prevent or restrain the infringement in its entirety, but the relief shall to the extent practicable account for any harm that the relief would cause the infringer due to the infringer's reliance on this section in making the infringing use.

Comment. This is a good solution to a serious problem. I'm suspending judgment on whether there are better solutions. Bottom line: if you want to copy more than fair use allows, and you cannot find the copyright holder even after a diligent effort, then (if this proposal is adopted) you may proceed to do the copying at a lower risk than under current law. If the copyright holder shows up and sues you, you will minimize your damages if you are careful to attribute the work to its author and copyright holder, if you can. You won't have to pay any monetary damages if you made no commercial use of the copies and stop making or using the copies when asked.

Posted by Peter Suber at 2/01/2006 10:32:00 PM.

Molecular Systems Biology -- gathering momentum?

Molecular Systems Biology is an Open Access, author pays model journal, jointly published by Nature Publishing Group and the European Molecular Biology Organization (EMBO). The journal went live in March 2005 with three research articles and a review paper already solicited, peer-reviewed, edited, formatted and ready to go. Subsequent months have seen a steady trickle of research articles until January 2006. Four research articles were published last month, fully 25% of the articles published over the first 11 months of the journal. Is this a bump, a new plateau, or the beginning of a sustained upsurge? Time will tell.

Posted by George Porter at 2/01/2006 02:26:00 PM.

Interview with David Lipman

Juan Carlos Perez, Searching for Answers: NCBI's David Lipman, Bio-IT World, February 1, 2006. Excerpt:

David Lipman [is the] director of the National Center for Biotechnology Information, part of the National Library of Medicine, which hosts various life sciences databases, including the PubMed scientific literature database....
Some people view Google, etc., as increasingly useful research tools in the life sciences, joining PubMed and other established services. What is your opinion?
Lipman: People sometimes have this a bit backwards. The primary point is that there is a huge amount of information on the Web. The best way to make the Web more useful for research and education is to get more information out there. What we find on our site is that the more good-quality content that we add to a resource, the more it is being used. If you have more gene sequencing data, you have more searching. If you have more full text in PubMed Central, you have more searching. While I want to give MSN and Yahoo and Google a lot of credit for doing a good job, the reality is that if there wasn't content there, it wouldn't matter how good those search engines were. And if the search engines were a little less good than they were, you would still be getting to a lot of good content... And if those search engines got better, it wouldn't make as much of a difference as if there was more good peer-reviewed medical content. That would make a bigger impact....
Are you talking about identifying search patterns from users so you can surface links they might be interested in?
If you did a search on, say, Alzheimer's disease, and you find a PubMed record that looks interesting to you -- right now you go to that page and we may have done an incredible amount of work to link that up to all kinds of things -- to information about genes that might be involved, to genetic versions of it, to related articles. And the reality is you have to poke around to see that -- it's sort of hidden in a way. What we'll do instead is, using computational tools, poke around for you and then put them right on the record....The bottom line for us is discovery. We want people to make discoveries....
Is there a specific initiative tied to these improvements?
Yes. I call it the Discovery Initiative. It's something new. It's been percolating. Last summer, I went out to visit Google and Amazon's A9 [search engine] and the folks from Microsoft's MSN came to visit. I also went up to Boston to meet with folks from the major hospitals there and MIT and Harvard. We've really been giving this a lot of thought. In many ways, we have had great success. Lots of people use the site and 2.25 terabytes of data are downloaded from our site every day. And yet I find it very frustrating because we've connected up the scientific information in very precise and powerful ways: a protein structure to the chemical it's bound to, to genetic data, you name it. All that is connected up. And yet very few of our users do more than very simple things with our site....We want them to find answers to questions they didn't even know they had. They can do that. People who are really experts at using this site can make discoveries almost on demand....We've always been working on making incremental improvements, but now I'm talking about something that is a more powerful shift. I'm hoping that without the users having to think about having to learn anything different, they'll be able to take advantage of databases they didn't know about and [make] connections between things they don't have to ask for anymore. This is a new phase.

Posted by Peter Suber at 2/01/2006 12:45:00 PM.

Berlin 4 details finally available

The Berlin 4 conference, Open Access - From Promise to Practice (Potsdam-Golm, March 29-31), now has a web page.

Normally I log conference details to my conference page without blogging them. But because this is such an important event for OA and the details are coming out so close to the deadline, I want to do all I can to let everyone learn the details in time to make a decision.

Also see the preliminary program. If you plan to attend, make sure to register before March 22, 2006.

Posted by Peter Suber at 2/01/2006 11:48:00 AM.

Despite expectations, search engines not biased toward popular sites

Santo Fortunato and three co-authors, The egalitarian effect of search engines, a preprint self-archived November 1, 2005. Also see the authors' lay summary of their results, Googlearchy or Googlocracy, in the February 2006 issue of IEEE Spectrum.

Abstract: Search engines have become key media for our scientific, economic, and social activities by enabling people to access information on the Web in spite of its size and complexity. On the down side, search engines bias the traffic of users according to their page-ranking strategies, and some have argued that they create a vicious cycle that amplifies the dominance of established and already popular sites. We show that, contrary to these prior claims and our own intuition, the use of search engines actually has an egalitarian effect. We reconcile theoretical arguments with empirical evidence showing that the combination of retrieval by search engines and search behavior by users mitigates the attraction of popular pages, directing more traffic toward less popular sites, even in comparison to what would be expected from users randomly surfing the Web.

Comment. This study supports the hope that if you can get your work online and indexed by the major search engines, then users will be able to find it, even if there are already many other, older works online on the same topic with more incoming links. Of course the best way to get your work online and indexed by the major search engines is to make it OA. BTW, the effect observed by the authors depends on users running specific rather than general searches and should improve over time as more users become intelligent searchers.

Posted by Peter Suber at 2/01/2006 10:01:00 AM.

Tuesday, January 31, 2006

OA is the future

Kuan-Teh Jeang, Open Access And Public Archiving: The Future Of Scientific Publishing? NIH Catalyst, Jan-Feb, 2006 (accessible only to NIH employees). (Thanks to Jennifer Heffelfinger.) Excerpt:

Traditional journals, like print journalism, remain the dominant force at the moment. However, slowly but surely, the open-access web and electronically based upstarts are gaining traction. Indeed, a senior science writer at the New York Times recently told me --when asked how the Times sees its free web-based competitors-- "We're running scared!"...Now, the pervasiveness of the Internet offers the potential for numerous additional communities --within or outside academia, in rich and in poor nations-- to access previously guarded knowledge. Such access is in keeping not only with the concept that publicly funded science should be shared without charge, but also with the tradition long embraced by scientists that access to large databases such as the genomes of animals and plants and archives like PubMed should be free and public. Nonetheless, broad acceptance of open-access publishing is at a tipping point. Several factors may yet influence its success or failure. The first is the economics of publishing for a wide audience. The web promises to be a low-cost venue that can reach, with unparalleled rapidity, large numbers of geographically dispersed and economically disparate parties. Contrast this availability with the rising cost of the traditional print model, which threatens affordability by even the best-funded libraries in wealthy nations. For example, United Kingdom statistics show that between 1998 and 2003, the average subscription price of academic journals rose by 58 percent while retail prices increased by only 11 percent. A second factor is public demand in developed and developing worlds. The view that at-large access to scientific data is not needed because of lack of public interest is incongruent with empirical experience. Existing numbers indicate that only one-third of the users of PubMed are academicians and researchers, whereas two-thirds are the "public" --clearly not indifferent. 
As science moves increasingly toward globalization, access models that transcend professional classifications, national boundaries, and accidents of birth are timely and necessary.... Currently, NIH, the Howard Hughes Medical Institute, the United Kingdom's Wellcome Trust, Germany's Max-Planck Society and Deutsche Forschungsgemeinschaft, and France's CNRS and INSERM have all encouraged their funded researchers to deposit peer-reviewed articles into publicly accessible repositories. The two major publishers of open-access journals --Public Library of Science (PLoS) and Biomed Central-- have also adopted policies of directly and immediately depositing their published works into PubMed Central....I have an interest in the evolution of scientific publishing. Twelve years ago I helped start a traditional journal, the Journal of Biomedical Science, which I edited for more than 10 years. Two years ago, I left that project to found Retrovirology, an exclusively web-based open-access journal. Although I have an abiding loyalty to my scientific societies and feel that they deserve continuing revenue streams, my personal read of the winds of change is that open-access publishing and publicly accessible digital repositories like PubMed Central may well be the dominant future players....Based on the acceptance that Retrovirology has gained within my scientific community, it seems to me that scientists do look beyond the cover of a journal to recognize the value of open accessibility to their work. Our journal caters to a relatively small cohort of retrovirologists, but it is accessed steadily 1,000 times each day, 30,000 times each month. These numbers are disproportionate to our known academic audience and suggest that a significant percentage of our readers are members of the public who value and trust our content. Public access, public trust, and public archives --are these not the wave of the future of scientific publishing?
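
As a rough check on the price figures in the excerpt above: a 58 percent rise in journal prices against an 11 percent rise in retail prices works out to a real (inflation-adjusted) increase of roughly 42 percent, if we assume both figures cover the same 1998-2003 period and treat the retail figure as the general inflation rate. The function name below is only illustrative, and the sketch takes the two percentages at face value.

```python
def real_increase(nominal_rise, inflation):
    """Return a price rise net of general inflation, as a fraction.

    Both arguments are fractions (0.58 means 58 percent).
    """
    return (1 + nominal_rise) / (1 + inflation) - 1


# Figures quoted in the excerpt (UK, 1998-2003).
journal_rise = 0.58   # average academic journal subscription price
retail_rise = 0.11    # retail prices over the same period

real = real_increase(journal_rise, retail_rise)
print(f"Real increase in journal prices: {real:.1%}")  # about 42.3%
```

Dividing the two growth factors, rather than subtracting 11 from 58, is the usual way to deflate a price series. The simple difference (47 points) slightly overstates the real increase.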

Posted by Peter Suber at 1/31/2006 12:54:00 PM.

Cross-searching OA collections and ISI's Web of Science

Dongmei Cao has been blogging tips on how to use the CrossSearch feature in ISI's Web of Knowledge to search OA collections in biology (e.g. PubMed, Agricola), physics and astronomy (arXiv, ADS), and math and computer science (arXiv).

Posted by Peter Suber at 1/31/2006 10:51:00 AM.

More on the British Library digitization project

Helen Beckett, Preserving our digital heritage, ComputerWeekly, January 31, 2006. A detailed look at the British Library project to digitize its holdings, some of which is funded and assisted by Microsoft as part of the Open Content Alliance. The article focuses more on digital preservation than OA.

Posted by Peter Suber at 1/31/2006 10:30:00 AM.

More on the NIH policy and CURES Act

The Winter 2006 issue of ARL's Federal Relations and Information Policy is now online. Section V.B is on the NIH public-access policy and V.C is on the CURES Act. Excerpt:

Based on a review of statistics detailing grantee deposit rates, the NIH Public Access Working Group, comprised of key stakeholders including members of the library community, recommended that researchers be required to deposit articles in PMC in lieu of the current policy which is voluntary. Ann Wolpert, Director of Libraries, MIT, is a member of the Working Group. The library community strongly supports this recommendation. ARL will continue to monitor the NIH policy and work with others in the community, SPARC and the Alliance for Taxpayer Access (ATA) in particular, on this evolving policy....
Introduced on December 14, 2005, by Senators Joe Lieberman (D-CT) and Thad Cochran (R-MS), the bipartisan "American Center for Cures Act of 2005" would expedite the development of new therapies and cures for life-threatening diseases. One provision in the bill calls for free public access to articles stemming from research funded by agencies of the Department of Health and Human Services (DHHS), including NIH, the Centers for Disease Control and Prevention, and the Agency for Healthcare Research and Quality. Under the proposed legislation, articles published in a peer-reviewed journal would be required to be made publicly available within 6 months via NIH's PubMed Central online digital archive. The library associations note that although some final electronic manuscripts are made available on PubMed Central, many are not -- and delays in posting research on PubMed sometimes thwart public access to important articles for up to a year. The library announcement is available at www.librarycopyrightalliance.org. ARL will promote the public access provision in the CURES legislation.

Posted by Peter Suber at 1/31/2006 09:12:00 AM.

Monday, January 30, 2006

Beginner's introduction to OAI and OAI-PMH

Philip Hunter, OAI and OAI-PMH for absolute beginners: a non-technical introduction, a PPT presentation at the CERN workshop on Innovations in Scholarly Communication (OAI4) (Geneva, October 20-22, 2005). Self-archived January 30, 2006. Abstract:

1. Coverage:
- Overview of key Open Archives Initiative (OAI) concepts.
- Development of the OAI Protocol for Metadata Harvesting (OAI-PMH).
- Non-technical introduction to main underlying technical ideas.
- Some considerations regarding implementation of OAI-PMH, with particular focus on harvesting issues.
For those who would like an introduction to, or revision of, the main concepts associated with OAI then this session will provide an ideal foundation for the rest of the OAI4 workshop.
2. Audience:
Decision-makers, Managers, Technical staff with no previous OAI-PMH knowledge. This is a tutorial for those who may not themselves do hands-on technical implementation, but might make or advise on decisions whether or not to implement particular solutions. They may have staff who are implementers, or may work with them. Technical staff are likely to prefer the technical tutorials, but may want to attend this one if they are at the very early stage of simply requiring background information.
3. At the end of the tutorial participants will have gained knowledge of:
- the background of the OAI as an initiative;
- how the OAI-PMH developed;
- the uses and functions of OAI-PMH;
- the vocabulary used in discussing OAI;
- problems and issues in harvesting metadata;
- some basic non-technical issues in implementing OAI-PMH;
- some of the technical support/tools available;
- sources of further information in all of these areas.
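
For readers who want to see what "harvesting" looks like in practice, here is a minimal sketch of the two halves of an OAI-PMH exchange: building a request URL and pulling titles out of the XML that comes back. It is not taken from Hunter's presentation. The base URL and the sample response are made up for illustration. Only the protocol pieces (the verb and metadataPrefix arguments, the ListRecords verb, the oai_dc format and the Dublin Core namespace) come from the OAI-PMH specification. A real harvester would also need to fetch the URL and handle resumption tokens and error responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Dublin Core element namespace used by the oai_dc metadata format.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL: base URL plus verb and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


def extract_titles(xml_text):
    """Return the text of every dc:title element in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(f"{{{DC_NS}}}title")]


# Hypothetical repository address, not a real archive.
url = build_request(
    "http://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
)
print(url)

# Hand-written stand-in for what a repository might return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(extract_titles(SAMPLE_RESPONSE))
```

The point of the sketch is that a harvester needs nothing from a repository except its OAI base URL. Everything else is fixed by the protocol.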

Posted by Peter Suber at 1/30/2006 03:43:00 PM.

More on OA to theses and dissertations

Diane Le Hénaff and Catherine Thiolon, Gérer et diffuser les thèses électroniques : un choix politique pour un enjeu scientifique, Documentaliste, October 2005. Only an abstract (in French) is free online for non-subscribers, at least so far. Here is Erik Arfeuille's translation of the title and abstract:

Managing and disseminating electronic theses: policy decisions for scientific stakes. Now that the concept of open archives has been accepted by the scientific community, open access to theses has become a major preoccupation for institutes of higher education and research. Disseminating electronic theses is a key concern in providing visibility for and access to scientific documents that, although not published, have been validated. Following a review of the techniques used to deposit, process and disseminate theses, this article describes STAR, the French plan for depositing, publicizing and archiving this type of record, and insists on the scientific issues of a national policy on electronic dissemination of theses.

Posted by Peter Suber at 1/30/2006 03:37:00 PM.

Two more OA repositories indexed by Thomson's Web Citation Index

Ulrich Herb has announced on SOAF that

[b]oth OA repositories (SciDok the institutional repository at Saarland University, and PsyDok the disciplinary repository for psychological OA content) at Saarland University and State Library (SULB, Germany) will be added to Thomson's Web Citation Index and Current Web Contents.

Posted by Peter Suber at 1/30/2006 01:53:00 PM.

Profile of the Sudan Archive Project

Sudan archiving project turns dry-as-dust documents into bits for easy access, Balancing Act News Update, Issue no. 290. An unsigned news story. (Thanks to Eve Gray.) Excerpt:

Archives conjure up images of rows of shelves with documents gathering dust. And this is precisely the problem a Sudan archiving project had when it wanted to digitalise the archives. In Lokichoggio in northern Kenya, just over the border from southern Sudan, Daniel Large remembers that dust was the main problem: "Scanning old documents was difficult among the dust to the point where it clogged the scanner", writes Isabelle Gross. Large is the project manager of the Sudan Open Archive Pilot Project -- a scheme that aims to digitally preserve the documents left by various humanitarian organisations in Sudan and to make them accessible to the public via a website. As Large explains, back in 1989 UNICEF -Operation Lifeline Sudan was only meant to last for three months, but in reality humanitarian work went on for more than 15 years, involving more than a dozen of other NGOs working under UNICEF's umbrella....In Daniel's opinion, once preserved, these [documents] will help reveal such things as the history of constraints to aid operations, the evolution of the conflict and changing conditions in locations throughout Southern Sudan over these years. He adds that there is lots of documentation currently scattered across many locations in Kenya and Sudan. Some of these are in a vulnerable condition, including some documents produced after the peace agreement in 1972. The project began after a meeting in Amsterdam between UNICEF and the members of the Rift Valley Institute. The idea was to turn the documents left by UNICEF and other NGOs into usable resources for NGO field practitioners, and more generally, for the Sudanese people, giving them the opportunity to access contemporary and historical knowledge about their country. Large throws his spotlight on the general problem of archives in the humanitarian sector.
In his view, just as information is fundamental to the effectiveness and impact of emergency response, knowledge of the history of aid and development operations is important to programme design, implementation and evaluation. Documents from the past can have a practical purpose in the present, but only if they can be readily accessed. The widespread "amnesia" resulting from emergency response and development is mostly a result of a lack of institutional memory and of high staff turnover....According to Daniel Large, the fact that the [Greenstone] software is open source and free will ensure cheap and easy archive accessibility among NGO practitioners and more generally between people in Sudan....Furthermore the concept behind the project is highly transferable. Unfortunately many countries around the world such as East Timor or Afghanistan have been in turmoil over the past few years. These events have resulted in a loss of documented history for the people and the NGO working there. Short- or long-term history suggests a choice of how anybody can write about the history of its own country. How, for instance, can one dispute the position of the frontier when one can only rely on human memory? Although human memories fade over 50 years, properly archived and accessible documentation could give a more concrete guide. In the meantime the experience of the many NGO that have been involved in aid work has made Sudan's history more accessible to the world. Balancing Act will publish the URL to the Sudan Open Archive Pilot Project as soon as it becomes available.

Posted by Peter Suber at 1/30/2006 01:30:00 PM.

Proposed new OA journal of anthropology

Matthew Wolf-Meyer has written a proposal for a new OA journal of anthropology, After Culture: Emergent Anthropologies. (Thanks to Kambiz Kamrani on Anthropology.net.) From the proposal:

The purpose of After Culture: Emergent Anthropologies (hereafter AC:EA) is to allow an international group of graduate students to work alongside a similarly international group of faculty in the production of a journal that publishes primarily graduate student scholarship. Graduate students will produce peer-reviews as well as deliberate on and implement editorial policy, all with the support of a diverse group of faculty (who will participate in the peer-review process and provide advice on editorial policy)....[I]f given AAA approval, the journal will be published by the University of California Press and made available through AnthroSource....The total cost of one year's worth of publications (2 issues, 200 pages) is approximately $3200 (based on University of California Press figures and including the costs of formatting, online storage and publicity). I have approached the president of the National Association of Student Anthropologists (NASA) about the possibility of aligning AC:EA with NASA, thereby becoming the official publication of NASA. I am currently discussing the possibility of funds being provided by the Society of Cultural Anthropology and the American Ethnological Society. The AAA journals will soon be receiving payments from the institutional subscription fees based upon how often the journal and its contents are cited and accessed through AnthroSource. It is hoped that with its provocative title and contemporary content that AC:EA will receive enough funds to cover some of its publishing costs. AC:EA may also attempt to solicit funds through the AAA meeting registration process (similar to the donations that people make for childcare).

Posted by Peter Suber at 1/30/2006 10:26:00 AM.

Counting the OA journals

Heather Morrison, Trends in refereed journals / open and toll access, Imaginary Journal of Poetic Economics, January 29, 2006. Excerpt:

Data on scholarly, peer-reviewed journals from three sources is presented and analyzed. Ulrich's reports 1,253 scholarly, peer-reviewed open access journals, about 5% of the journals in this category. The number of new journal start-ups recorded in Ulrich's since 2001 appears to be fairly steady since 2001, both for all scholarly, peer-reviewed journals, and for open access scholarly peer-reviewed journals. The largest number of open access peer-reviewed journal start-ups recorded was in 2004, the last year for which data is likely complete, with a total of 99. DOAJ includes a total of 2,009 open access journals as of today. One possible source of the discrepancy in numbers could be an English-language bias in Ulrich's; of the academic journals listed in Ulrich's, almost 90% are in the English language, while DOAJ appears to represent a much broader linguistic spectrum. Both DOAJ and Ulrich's list considerably fewer open access journals than are found in Jan Szczepanski's list, over 4,700 journals as of December 2005. There are several possible reasons for this. One is that many academic journals are not necessarily peer-reviewed; for example, only about 40% of the journals listed as academic / scholarly in Ulrich's are peer-reviewed. If we assume that 40% of the journals in Jan Szczepanski's list are peer-reviewed, the total would be 1,880 - very close to the DOAJ figure of 2,009. There are reasons to think that all available figures for open access journals are underestimates. Jan Szczepanski's, the longest list available, for example, focuses on social sciences, humanities, and math; it is also primarily the work of one individual working on a volunteer basis.
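
The arithmetic in the excerpt is easy to reproduce. The sketch below uses only the figures quoted above and follows the excerpt's own assumption that the 40% peer-reviewed share from Ulrich's also applies to Szczepanski's list. The variable names are mine. Since the excerpt gives "over 4,700" and "about 5%", the results are approximations, not exact counts.

```python
# Figures quoted in the excerpt.
ulrichs_oa_journals = 1253        # peer-reviewed OA journals in Ulrich's
ulrichs_oa_share = 0.05           # "about 5%" of peer-reviewed journals
doaj_journals = 2009              # OA journals in DOAJ
szczepanski_journals = 4700       # "over 4,700", so a lower bound
peer_reviewed_share = 0.40        # share assumed to carry over from Ulrich's

# Estimated peer-reviewed journals in Szczepanski's list.
estimated_peer_reviewed = round(szczepanski_journals * peer_reviewed_share)

# How far that estimate falls short of the DOAJ count.
gap = doaj_journals - estimated_peer_reviewed
relative_gap = gap / doaj_journals

# Total peer-reviewed journals implied by the "about 5%" figure.
implied_total = round(ulrichs_oa_journals / ulrichs_oa_share)

print(f"Estimated peer-reviewed journals: {estimated_peer_reviewed}")  # 1880
print(f"Shortfall against DOAJ: {gap} ({relative_gap:.1%})")
print(f"Implied peer-reviewed journals in Ulrich's: about {implied_total}")
```

On these numbers the estimate falls about 6% short of the DOAJ count, which is the sense in which the two are "very close".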

Posted by Peter Suber at 1/30/2006 09:26:00 AM.

Profile of David Goodman

Heather Morrison, David Goodman: Ardent Open Access Advocate, OA Librarian, January 29, 2006. Another installment in Heather's celebration of librarians who fight for OA. Excerpt:

Dr. David Goodman of the Palmer School of Library and Information Studies, Long Island University, has been an ardent advocate of open access for many years....Many of us know David through his long-standing, and much appreciated, contributions to the Liblicense discussion list...as well as his contributions to the SPARC Open Access Forum....Some of David's formal writings can be found in David's E-LIS or D-LIST. David edited a November 2004 special issue of Serials Review on Open Access - his own The Criteria for Open Access is highly recommended as a well-balanced overview, particularly for the novice advocate. Through his long-standing involvement with the renowned and highly innovative Charleston conference, David succeeded in developing an OA-centric 2004 conference theme. Like others in the open access movement, however, many of David's best works are not the formal publications. When we are working in an arena like open access, timing is critical - policymakers are discussing the issues now, not next year; publishers are reviewing their procedures now, too. Rather than waiting for the formal publication process, with its academic rewards but inevitable delay, David often shares his knowledge, opinions, and even preliminary data with us right away.

Posted by Peter Suber at 1/30/2006 09:05:00 AM.

Sunday, January 29, 2006

Lorcan Dempsey on institutional repositories

Lorcan Dempsey, Networkflows, Lorcan Dempsey's blog, January 28, 2006. Excerpt:

As more of our working, learning and playing lives moves onto the network we need better workflow support. One can state one of the major challenges facing libraries in these terms. Historically, users have built their workflow around the services the library provides. As we move forward, the reverse will increasingly be the case. On the network, the library needs to build its services around its users' work- and learn-flows (networkflows). This may provide one way of thinking about institutional repositories. I tend to think about institutional repositories as ways of automating particular processes in support of particular institutional goals. Now, one of the discussion points around institutional repositories is about which goals they support: open access, curation of institutional intellectual assets, reputation management. And which processes? Over time, it is clear that what we now call institutional repositories will be part of wider research process support. What is currently the institutional repository will be a component of the workflow/curation/disclosure apparatus that develops to support research activities....
A couple of interesting recent indications of direction....I have mentioned before the impact of research assessment on local infrastructure, particularly in the UK and Australia where there is a need to record and report research outputs. Some support issues are discussed in an interesting White Paper by Les Carr and John MacColl. This is one output of the IRRA project, which has also released some software which extends eprints.org and Dspace to provide better workflow support for the Research Assessment Exercise.

Posted by Peter Suber at 1/29/2006 09:27:00 AM.

OA to medical books

Dean Giustini, Open access to the digital medical atheneum - work in progress, UBC Google Scholar Blog, January 28, 2006. Excerpt:

Open access to high-quality, digitized versions of the most influential medical books in history is improving, all the time. The National Library of Medicine's History of Medicine Division and the British Library have notable digitization projects worth exploring. NLM's amazing historical collections examine various facets of medical history, and include Islamic manuscripts, searchable images and even the Vesalius De humani corporis fabrica. The NLM version of the anatomy classic includes audio commentary, and online magnifying and "page turning" technology. Google's Book Search is typical of current digitization efforts - it's very much a work in progress. The great medical texts of history - such as Harvey's Circulation of the Blood - are not yet digitized but others mention Harvey's landmark book or are translations. Text versions are available on Bartleby's as are writings by Lister and even Pasteur. Try an advanced search on the Web for specific digital versions. Googling for medical texts in the digital atheneum is getting easier. But first, if you can, browse specific portals such as MLA's and the AAHM. Two of Canada's best collections in the history of medicine are located at the UBC Woodward Library and McGill's Osler Library of the History of Medicine. Sir William Osler was a bibliophile and gave a collection of 8,000 medical books to McGill. It will take time to view Osler's complete collection online. Digitization is hard on books, and some texts will likely never be digitized. At present, however, search for static images using Google's Image search, view progress at the Gutenberg project and the Internet Archive. For a good starting point, browse sites selected by McGill's librarians and search for history papers on PubMed, the IndexCat or Google scholar.

Update. Klaus Graf points out by email that the Bibliothèque interuniversitaire de médecine (BIUM) lists over 400 OA medical books, mostly in French, and over 3,000 OA medical texts of all kinds, in many languages.

Posted by Peter Suber at 1/29/2006 09:03:00 AM.

Comparing ROAR and OpenDOAR

Stevan Harnad, ROAR to DOAR, Open Access Archivangelism, January 29, 2006. Comparing the strengths and weaknesses of the Registry of Open Access Repositories (ROAR) and the Directory of Open Access Repositories (OpenDOAR). Excerpt:

DOAR is mostly just re-doing, funded, what Tim [Brody] had already done, unfunded (with ROAR). DOAR so far covers about 3/5 of the archives in ROAR and 1/2 the number in OAIster, and does not yet measure or provide a way to display the time-course of their growth in contents or number, as ROAR does. (DOAR will need Tim's Celestial to do that.) However, DOAR does provide an OAI Base URL in what looks (to my eyes: DOAR does not yet give tallies) to be a much larger proportion of archives than ROAR does (c. 80%), and this is presumably because DOAR has directly contacted each archive individually for which the OAI Base URL was missing. (This is...perhaps too much to expect from an unfunded doctoral student, primarily working on his thesis! The solution of course is for archives to expose their own OAI Base URLs for harvesters to pick up automatically, and this will of course be the ultimate outcome. For now, there is no Registry that all archives use or aspire to be covered by. If DOAR incorporates all of the useful features of ROAR (especially celestial), and adds value, it may succeed in becoming that Registry. So far, ROAR's periodic calls to Archives to register have insufficient success. Most of ROAR's new archives for the past year or more have been hand-imported by me and Tim! At least DOAR will be funded to do that thankless task, from now on!) The second potentially useful feature of DOAR is that it seems to classify separately the different content types and (I think -- I'm not sure) that DOAR has checked that those are all full-texts (rather than just bibliographic metadata: DOAR will have to make this more explicit in their documentation)....
Right now, the DOAR entry for an archive looks a lot like a library card catalogue entry for a journal or a book (perhaps by analogy with DOAJ) or even a collection. This does not quite make sense to me, since users do not consult or use individual online institutional archives as they do for individual books or journals or collections. For one thing, most of the archives will be university IRs. Most universities produce contents of all of the types listed, and in all of the subjects listed; and rarely will any user want all/only, say, articles on subject X from individual institution Y: They will instead use an OAI harvester and service-provider like OAIster or citebase or citeseer or even google scholar, that searches across all institutions on that subject, or even all subjects.

Update. Also see the discussion thread on Harnad's comments in the AmSci OA Forum.

Posted by Peter Suber at 1/29/2006 08:31:00 AM.