Open Access News

News from the open access movement

Saturday, January 14, 2006

New tool makes OA data more useful

M.L. Baker, Gene Mining Strikes Gold, ExtremeNano, January 1, 2006. Excerpt:
There's a lot of scientific data going to waste. Much of it has been painstakingly gathered through timely and costly experiments and is freely available in public databases. But researchers have been hard-pressed to use existing data to ask new questions, because they lack reliable descriptions and computational tools. Now, scientists at Harvard and Stanford have created a software application that overcomes some of these barriers. The program, called Genotext, trolled through publicly available data and came back with genes implicated in aging, leukemia and injury, as described this month in the journal Nature Biotechnology. The program automatically analyzes text descriptions of different experiments. It then identifies which genes were turned on or off, up or down, in various diseases or environmental conditions. That's no easy task, since a single experiment can collect millions of data points and descriptions of very similar experiments can vary widely. "This is a real advance," said John Wilbanks, head of Science Commons, a nonprofit group dedicated to helping scientists find productive ways to share data. "The use of annotation and knowledge to understand functional relationships between genes is where the field has to go."...Scientific journals often require researchers to deposit their microarray data in publicly available databases. Though data formats to describe which genes are turned up to what level are fairly standard, the same can't be said for descriptions of the conditions and tissues in which the genes are measured. That makes it difficult to compare experiments that probe how the environment might change gene activity or how gene activity differs between sickness and health. "We've all agreed on how to represent the genes, but we haven't agreed on how to represent what we actually did in the experiments," [Atul] Butte [Stanford bioinformatics specialist and the study's lead author] said. 
That's one of the problems that Butte, along with Isaac Kohane, a bioinformaticist at Harvard, set out to solve by creating Genotext....Though Genotext is available for free over the Web, researchers need some programming experience to use it. Plans to create a user-friendly query interface are underway.
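The article describes Genotext's approach only at a high level: normalize free-text experiment descriptions, then aggregate which genes were turned up or down across experiments. As a purely illustrative sketch (not Genotext itself, whose internals are not described here), that idea might look like the following; the synonym table, gene names, and data are invented for the example, and a real system would use a curated ontology rather than substring matching.

```python
# Toy sketch: map free-text experiment annotations to controlled
# condition terms, then tally how often each gene is reported
# up- (+1) or down- (-1) regulated under each condition.
from collections import defaultdict

# Hypothetical synonym table; a real system would use a curated ontology.
CONDITION_SYNONYMS = {
    "aging": {"aging", "aged tissue", "senescence", "old vs young"},
    "leukemia": {"leukemia", "leukaemia", "AML", "acute myeloid leukemia"},
}

def normalize_condition(free_text):
    """Map a free-text annotation to a controlled term, or None."""
    text = free_text.lower()
    for term, synonyms in CONDITION_SYNONYMS.items():
        if any(s.lower() in text for s in synonyms):
            return term
    return None

def tally(experiments):
    """experiments: iterable of (annotation, gene, direction) tuples,
    direction being +1 (up) or -1 (down). Returns {(condition, gene): net}."""
    net = defaultdict(int)
    for annotation, gene, direction in experiments:
        condition = normalize_condition(annotation)
        if condition is not None:
            net[(condition, gene)] += direction
    return dict(net)

calls = [
    ("old vs young mouse liver", "FOXO3", +1),
    ("Aged tissue, human", "FOXO3", +1),
    ("AML bone marrow", "FLT3", +1),
    ("unrelated heat-shock study", "HSP70", +1),
]
print(tally(calls))  # {('aging', 'FOXO3'): 2, ('leukemia', 'FLT3'): 1}
```

The hard part the article points to is exactly the normalization step: two annotations for "very similar experiments" can vary widely, so the synonym lookup is where real tools invest their effort.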

More on the author and publisher suits against Google

Mike, How Jealousy Could Destroy The Internet, techdirt, January 13, 2006. Excerpt:
Jealousy is a very powerful emotion. It can make for some great stories in books or movies, but it has no place in the board room -- yet, that's where it is these days. Fred Wilson pointed this out quite clearly last week, when he correctly said that all this talk by the Baby Bells that Google, Microsoft, Vonage and other successful web companies should pay the telcos extra was simple jealousy. Wilson tells the Bells to "dream on," and while we hope he's right, he may be underestimating the destructive power of jealousy. And, it's not just the Baby Bells who are acting this way -- but plenty of online businesses. If they keep it up, they're going to destroy a good thing, just because they can't stand the thought of someone else being successful....Considering the powerful position Google has these days online, it's no surprise that Google is often at the center of this jealousy. It's the main company the Baby Bells want to pay up. It also explains the Google Print controversy -- as authors and book publishers are upset, even though Google is making their content more useful by making it searchable. The latest case of Google jealousy comes courtesy of online publishers, with a Business Week writer suggesting the idea of having major publishers completely cut their content off from Google. It is, as the cliche goes, cutting off your nose to spite your face. People are so upset that Google is successful, they don't seem to notice that it's helped make them more successful too. Google is successful because it's adding value, not just for users, but to the sites it directs its traffic. Cutting Google off makes the content less useful and serves no purpose other than giving in to destructive jealousy. Part of the power of the internet is that it has been able to avoid most of this, by making it easy to link, embed, modify and copy -- to offer new and different ways to manipulate and view the information that's out there. 
It's that openness that makes the whole thing valuable. Throwing up walls, tollbooths and blockades for the sake of jealousy harms the entire system. Those who support these moves claim they do so to try to maximize their own profit -- trying to take a piece of the cut from these other services that make their product valuable. However, that's a short-sighted view. If they want to do that, they should make their products more valuable on their own without cutting off others. The "profit" they believe they're maximizing, they're actually shrinking by cutting off the added value that others provide. The more this jealousy continues, the more harm it does to the overall value presented by the internet.

Quaero update

James Niccolai, Europe's 'Google killer' goes into hiding, Info World, January 13, 2006. Excerpt:
A project to develop advanced multimedia search technologies led by France's Thomson SA has gone into hiding in the face of intense publicity this week that it is building a "Google killer" that will help to improve Europe's standing in the high-tech world. The project, called Quaero, found itself in the spotlight following remarks last week by French President Jacques Chirac in a speech laying out his agenda for France in 2006. "We must take up the challenge posed by the American giants Google and Yahoo," Chirac said, discussing the importance of technology to Europe's economy. "For that, we will launch a European search engine, Quaero." His remarks prompted some commentators to describe Quaero as Europe's next Airbus SA, the aircraft maker that competes with The Boeing Co. in a contest symbolizing the economic rivalry between Europe and the U.S. There was talk of a coming out party next month where Quaero's goals would be described in more detail, although a spokeswoman for the project said no event has been planned. The scrutiny was apparently too much for Thomson's chairman, Frank Dangeard, who imposed a "news blackout" Thursday on Thomson's media staff and ordered the project's Web site to be taken offline. "There's been a lot of noise and our chairman decided we should stop making any comments until a more official press event," said Thomson spokesman Philippe Paban...."Probably politically what's behind it is an uncomfortable feeling of having all access to knowledge and information filtered or provided through a search engine that (comes from) abroad," said Alex Waibel, director of the InterACT Center at Germany's University of Karlsruhe, which is developing Quaero's speech and language processing technologies. "Having said that," he added, "there's also a wish to make search, in a way, much richer, and in particular that involves multimedia and multilingual information."

A primer on OA journals

Mark Funk, Open Access – A Primer, undated. (Thanks to the Krafty Librarian.)

Comment. This is OA-friendly and well-done. I hate to pick nits but there are a few small ways in which Mark could improve the document. A date would help, since the OA scene is always changing. The primer focuses on OA journals and neglects OA archives, but it admits this upfront. (For a two-sided introduction, see my OA Overview.) The only factual mistake is the claim that most OA journals use the "author pays" model. The Kaufman-Wills report of October 2005 showed that only 47% of OA journals charged author-side fees -- fewer than half, and a smaller percentage than among the subscription-based journals charging author-side fees. In any case, this model should not be called "author pays" since, as Mark acknowledges, authors rarely pay out of pocket.

More STM publisher blogs

Richard Akerman has taken up the challenge to find more blogs by STM publishers. His list includes Nascent, Action Potential, and Free Association, all from Nature, as well as TheScientist blog, the Library Journal TechBlog, Lost Boy by Leigh Dodds at Ingenta, and From the Hart by Factiva CEO Clare Hart.

The OCA work agenda for 2006

The Open Content Alliance has released its Work Agenda for 2006. It has launched six working groups to select works for digitization, solicit new contributing institutions, and advise on preservation, book formatting, workflow, and data transfer to the Internet Archive. Each of the groups welcomes public comments and suggestions. Excerpt on the rest of the agenda:
Our focus has been on the OCA's first year. A key milestone will be a public event we are planning for October 2006 to demonstrate the power of collaborative and open efforts to build joint collections. That focus informs the agenda for the coming year. The OCA will initially concentrate on digitally reformatted monographs and serials which represent diverse times, regions and subjects which are in the public domain or available under a Creative Commons license. In other words, the OCA is initially interested in the broad range of digitized documents that are in our libraries and archives. For an October 2006 event, we would like to focus on materials that reflect the history, people, culture, and ecology of North America. This decision is in part a practical one. It establishes essential priorities for the OCA while emphasizing collection depth as a means of encouraging the development of value-added services. It also reflects the general orientation of the initial collections that have been offered to the OCA. (At this stage, OCA is not harvesting metadata.)... The Internet Archive is continuing its role in administering the Open Content Alliance (OCA), but Rick Prelinger, interim OCA director, will unfortunately follow through with his plan to return to the world of moving images. He will help with recruiting OCA staff, and any suggestions for a great Executive Director would be most welcome. The Alfred P. Sloan Foundation has indicated that it may initially help support this position.

Librarians in the age of digitization and OA

Barbara Quint, The Home Guard, The Searcher, January 2006. Excerpt:
What would you do if you had a personal home library numbering in the thousands or even hundreds of thousands of books? Hire a librarian, right?! Well, that’s just what every Web user has as the mammoth book digitization projects by Google, the Open Content Alliance (OCA), Microsoft, Yahoo!, et al., open up their public domain collections. Project Gutenberg has offered tens of thousands of such texts for years. The U.S. Government Printing Office continues to load documents born in public domain, promising eternal archives for them. The open access movement has put masses of scholarly content, similar to what one would expect to find in an academic library’s periodical collection, into the line of sight of Yahoo! Search, Google Scholar, Scirus, and other free Web search engines. And that’s only the material that resembles the traditional content formats that people expect librarians to handle — books and magazines. Then there’s all the content out there on the open Web from authoritative or semi-authoritative or hit-or-miss Web sites. How is a user to tell the wheat from the chaff, the plums from the prunes, the true from the false? Hire an information professional, right?! Well, we know they need us, but do they?...If we information professionals, we librarians, want to serve users, we have to bring our services to where and when the user needs us....Let’s start with three basic principles and one overall goal. Principle One: Our solutions operate on Web time and in Web calendars, i.e., 24/7/365 (366 in leap year). Principle Two: Our solutions conserve our time, energy, and expertise by solving problems as Web-wide as possible. Principle Three: Moving a vendor to provide a solution constitutes a successful solution for us. Goal One: We need to get credit for our solutions, if only in order to get enough influence and resources to make more. Time to roll up our virtual sleeves and get to work.

Washington University joins the OCA

Answering the Tribune's December editorial

Mary M. Case, Health information, Chicago Tribune, December 30, 2005. A letter to the editor.
This is regarding your Dec. 19 editorial "To your e-health." The lack of competition in scientific publishing leads to extraordinarily high prices of research journals. The high prices mean that only health-care professionals and researchers affiliated with well-to-do institutions are able to obtain access to the vast array of relevant published research results. The average citizen faces significant barriers. Parents of children with rare genetic diseases who are active and engaged advocates for their children's health find themselves sneaking into research libraries, hiring students in large medical schools to go to the stacks for them or "borrowing" others' IDs and passwords to search electronic databases--all to read the results of research that is funded with their taxpayer dollars. The National Institutes of Health understands that not only does it have the responsibility of distributing billions of dollars in federal funds to support research, it is incumbent upon it to make sure that the results of that research are widely available to scientists, physicians and the public. Over decades, publishers have clearly demonstrated that their mission to disseminate information is not as important as their opportunity to make money. Through PubMed Central, the NIH is providing the trusted, integrated database that researchers have demanded and the public deserves. For the parents of sick children, the DC Principles Coalition proposal of linking to publisher Web sites, rather than depositing articles in PubMed Central, is one more version of the run-around. It is time for the publishers to stop protecting their own financial wealth and start focusing on our citizens' health.

(PS: An excellent letter! Also see my 12/19/05 response to the Tribune editorial.)

Friday, January 13, 2006

Another journal policy on NIH-funded authors

Judith Gedney Baggs, Open access, Research in Nursing and Health, January 10, 2006. An editorial. Not even an abstract is free online for non-subscribers, at least so far. Excerpt:
A new term related to publication has appeared on the horizon of nursing researchers: open access. What is it? Open access, in principle, means publication in a form that allows anyone to have access to the material, so that people are not constrained by having to use a library or to pay for a subscription to a journal....Why would open access be appealing? There are a number of reasons. Librarians, who are troubled by ever-increasing costs for journal subscriptions, believe this would be a good solution. People in rural areas, both in the US and abroad, who do not have access to a good library, would be able to access research that they currently cannot. The National Institutes of Health (NIH) and the U.S. Congress like the idea because it seems reasonable that research that was supported by public funding (e.g., NIH funding) should be accessible to members of the public without their having to pay an additional fee. Open access has the potential to expand the visibility and impact of research by increasing the number of people who can read about it. Why not have open access? Journal publishers make their living from subscription fees....While most peer reviewers are unpaid, there is a complex system supporting editors, editorial boards, and management of the peer review process that is costly....In light of this discussion, and to be open with our authors, I want to share Wiley’s policy related to open access. The entire policy is available [here] at the bottom of the For Authors page. Wiley, publisher of Research in Nursing & Health, has agreed to deposit the article, in its final form, in PMC at the time of publication, with the stipulation that it be made available for public access 12 months later. This will be done for any article with an NIH grant mentioned in any part of the manuscript. Authors may request that it not be posted.
With regard to posting manuscripts in an internal website, Wiley’s copyright transfer agreement allows posting unfinished versions of the manuscript on such sites. The final version can be posted to an "electronic reserve room" at their own institution that is for student use. As researchers desiring to publish, it behooves us all to be aware of the policy of any journal we submit our work to with regard to open access. I understand the logic of open access, but I also appreciate the manuscript review process. I have enormous faith in and respect for the reviewers for this journal in assuring that what is published is the best, and, although the charges for some publications are outrageously high, I would not want to see an end to private and societal publication and the peer review process.

Comment. It's one thing for a journal to try to protect its revenue stream, although there's no evidence to date that OA archiving jeopardizes that revenue. But it's quite another to imply that OA is about bypassing peer review when it's about removing access barriers to peer-reviewed literature.

Freeing users to use OA literature

Valerie, Why on earth? The Return of Lady GovDocs, January 12, 2006. Excerpt:
there are some things i just don't get about my library....part of the process for 'acquiring' or providing access to 'open access' or no-fee materials is...contacting the person responsible for the site & getting written notice from them that a license isn't required. (that's right - it's not enough that they're just putting it out there...our legal counsel needs to know that they're not going to sue us for LINKING TO THE SITE.) grr.

Comment. University lawyers are paid to make sure that we err on the side of caution, which in the case of fair-use judgment calls often means that we err on the side of non-use. The hard way to fix this problem is to reform copyright law. The easy way is to make sure that OA content carries some kind of label that it's OA, even if the label isn't as formal as a CC license.

More on the new BMJ access policy

Fiona Godlee, Swept along by the tide, BMJ, January 14, 2006. A short elaboration on the new BMJ access policy. Excerpt:
One unwelcome change for some readers has been the closure of access to the BMJ's non-research articles, which up until now were free for the first week of publication. The change was necessary to maintain subscription revenues. The peer reviewed research articles remain open access (free from the day of publication on as well as being on PubMed Central), and the whole journal remains free to most countries in the developing world (those on the HINARI list). Non-research articles become free to all after a year of publication. It is always hard to be asked to pay for something that has been free, but we hope that those readers who don't get the BMJ free through their institution will see enough value in it to pay £20/$37/€30 for a year's full online access.

Trove of OA from American Museum of Natural History

The American Museum of Natural History has launched an institutional repository through which it's providing OA for its past and present scientific publications. The program includes its journal, Bulletin of the American Museum of Natural History, which is now OA up to its most recent issue from 2005, and its monograph series, Anthropological Papers of the American Museum of Natural History, which is now OA up to what seems to be the most recently published volume in 2002.

DOAJ reaches milestone, plans changes

This morning the DOAJ reached the major milestone of listing 2,000 OA journals. From today's press release:
As of today the Directory of Open Access Journals (DOAJ) contains 2000 open access journals, i.e. quality controlled scientific and scholarly electronic journals that are freely available on the web. The goal of the Directory of Open Access Journals is still to increase the visibility and accessibility of open access scholarly journals, and thereby promote their increased usage and impact. The directory aims to comprehensively cover all open access scholarly journals that use an appropriate quality control system. Journals in all languages and subject areas will be included in the DOAJ. The selection criteria have been updated based on feedback from users to be more understandable.

The database records are freely available for reuse in library catalogues and other services and can be harvested by using the OAI-PMH, thereby increasing the visibility of the open access journals....New titles are added frequently, and to ensure that the holding information is correct you have to update your records regularly. We also have to remove titles from DOAJ if they no longer live up to the selection criteria; e.g., during the last 6 months of 2005, 50 titles were removed. We are working with publishers of hybrid journals (subscription-based journals where authors/institutions can, for a publication charge, publish articles in open access) in order to include even these articles in the DOAJ. It is our intention to be able to inform about this in the near future.

Feedback from the community tells us that the DOAJ is an important service. In order to be able to maintain and further develop the service, we have decided to launch a Donation Programme that makes it possible for all users/institutions to contribute to the continued maintenance and development of DOAJ....DOAJ is or has been supported by the Information Program of the Open Society Institute, along with SPARC (The Scholarly Publishing and Academic Resources Coalition), SPARC Europe, BIBSAM, the Royal Library of Sweden and Axiell.
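Since the press release notes that DOAJ records can be harvested with OAI-PMH, here is a minimal sketch of what that involves. The base URL is an assumption (check DOAJ's site for its current endpoint), and the response below is a canned sample so the snippet runs offline; a real harvester would fetch the URL over HTTP and follow resumptionToken paging.

```python
# Sketch of an OAI-PMH harvest: build a ListRecords request URL and
# parse identifier/title pairs out of an oai_dc response document.
import urllib.parse
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords URL; use the resumption token for later pages."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
    return base_url + "?" + urllib.parse.urlencode(params)

def parse_records(xml_text):
    """Yield (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    for record in root.iter(OAI + "record"):
        header = record.find(OAI + "header")
        identifier = header.findtext(OAI + "identifier")
        title = record.findtext(".//" + DC + "title")
        yield identifier, title

# Canned sample response (invented record) so the example runs offline.
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:doaj.org:0000-0000</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example Open Access Journal</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://www.doaj.org/oai"))
print(list(parse_records(SAMPLE)))
# [('oai:doaj.org:0000-0000', 'Example Open Access Journal')]
```

This is the mechanism that lets library catalogues and aggregators pick up DOAJ's holdings automatically rather than re-keying them.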

IEEE provides OA to editorials and book reviews

The IEEE is now offering free online access to the "non-indexed, ancillary content (often called ephemera) from IEEE publications" such as editorials and book reviews. (Thanks to ResourceShelf.)

New business models for open content

Intelligent Television is working on the economics of open content -- not at all limited to television. (Thanks to Open Business.) From the site:
With the support of the Hewlett Foundation in 2005 and 2006, Intelligent Television is bringing together business and industry leaders and culture and education stewards to explore new business collaborations between libraries, museums, archives, universities and commercial media and technology enterprises. Intelligent Television is also commissioning and publishing a working paper on the economics of open content, a vital subject; publishing a report, based on a summary of its public-private meetings and drawing on this working paper, highlighting the emerging economic relationships in this field; and developing and producing two new models for commercial-noncommercial media collaborations around cultural heritage and educational materials. Intelligent Television’s Open Production Initiatives serve as one sort of new model for the distribution of open content and open educational content in particular to the broader interested public—a model based in video and film media, produced in the best traditions of documentary television, and meant to be distributed in various complementary ways. The two Open Production Initiatives for this project are being developed in association with Columbia University Center for New Media Teaching and Learning and the Massachusetts Institute of Technology Open Courseware project.

Blogs by STM publishers

Rafael Sidi, STM Publishers and Blogs, Really Simple Sidi, January 11, 2006. (Thanks to Issues in Scholarly Communication.) Excerpt:
I am kind of surprised that we haven't seen any major STM (Elsevier, Thomson, Wiley, Springer, IEEE etc) publishing companies' senior execs embracing blogging and officially blogging. Here is what David Weinberger said in "Talking from the inside out: The rise of Employee Bloggers" (pdf) a white paper by Edelman and Intelliseek:
"Many corporations are afraid of Weblogs because they are afraid of the sound of the human voice. But that voice -- the unfiltered sound of an actual person writing about what she cares about, sounding like herself -- is actually the most important way of connecting with customers and partners."
If you see one STM publishing exec official blog, let me know.

(PS: Here are two. Jan Velterop is the Director of Open Access at Springer and writes a blog called The Parachute. Chris Leonard writes a blog called Computing Chris, and wrote it while he was a Publishing Editor for Elsevier, though he's now left the company. I hope his departure was not blog-related.)

Jonathan Band updates his Google Library analysis

Jonathan Band, The Google Library Project: The Copyright Debate, American Library Association Office for Information Technology Policy, January 2006. Updating and extending his earlier pieces (this from 9/05 and this from 10/05). Excerpt:
The Google Library Project has provoked newspaper editorials, public debates, and two lawsuits. Much of the press coverage, however, confuses the facts, and the opposing sides to the controversy often talk past each other without engaging directly. This paper will attempt to set forth the facts and review the arguments in a systematic manner.

(PS: This is the most comprehensive defense to date of the legality of Google's opt-out Library project.)

Technical criteria for OA repository software

Andy Powell, Notes about possible technical criteria for evaluating institutional repository (IR) software, UKOLN, December 2005. Excerpt:
This document attempts to identify some of the technical criteria that might be used to evaluate the different institutional repository (IR) software platform options, particularly in terms of the ‘machine’ interfaces that the repository offers. The list of issues is not intended to be exhaustive, and the approach is based on the assumption that other, non-technical, criteria such as usability and configurability have already received detailed consideration in other documents....Three of the most popular IR software platforms are DSpace, EPrints and Fedora (though there are others of course). Trying to compare these three is a little like comparing apples and oranges. DSpace is a Java-servlet application that runs under Apache Tomcat. EPrints is written in Perl and typically runs under Apache, using mod_perl to improve performance. Both applications provide the basis for an IR ‘out of the box’, including an end-user Web interface and so on. Both offer similar functionality to the end-user. Fedora on the other hand is more like a software toolkit. It provides the underlying IR framework, but requires custom development of a user-interface, either by layering an existing suite of user-interface tools on top of the Fedora APIs, or by building from scratch. Any decision about which IR software platform to choose must be based not only on the technical and functional capabilities of the system but also on best fit with organisational IT strategy and the availability of local software development effort. However, as a way of helping with that decision making process, it may be sensible to ask the developers of these software platforms to respond to the issues raised in the sections below. Some potential questions are suggested in each section.

Teaching students that not everything is OA

Marylaine Block, Information Literacy: Food for Thought, January 13, 2006. Good teaching exercises for students who "believe everything they need to know is available for free with a simple Google search -- and, if they don't find it there, that it doesn't exist at all."

Comment. I wholeheartedly endorse these teaching exercises. But I have a two-sided response to the belief that if it isn't online [or free online], then it doesn't exist [or isn't worth reading]. On the one hand, on most topics today it's wishful thinking and may remain so for a long time. Don't let students indulge in it and don't fail to teach them what else exists and how to find it. On the other, we should work on making this belief true tomorrow, not just criticize it for being false today. Don't expect students to overlook the spectacular convenience of free online access to scholarship and information. For research authors as well as research readers, it's better to move peer-reviewed research literature into this basket than to keep blaming students for looking first in the basket closest to them.

Thursday, January 12, 2006

The UK text mining center

Julie Nightingale, Digging for data that can change our world, The Guardian, January 10, 2006. Excerpt:
Research tools able to swiftly analyse masses of data could soon bring about advances that scientists up to now can only dream of...Scientific research is being added to at an alarming rate: the Human Genome Project alone is generating enough documentation to "sink battleships". So it's not surprising that academics seeking data to support a new hypothesis are getting swamped with information overload. As data banks build up worldwide, and access gets easier through technology, it has become easier to overlook vital facts and figures that could bring about groundbreaking discoveries. The [UK] government's response has been to set up the National Centre for Text Mining, the world's first centre devoted to developing tools that can systematically analyse multiple research papers, abstracts and other documents, and then swiftly determine what they contain. Text mining uses artificial intelligence techniques to look in texts for entities (a quality or characteristic, such as a date or job title) and concepts (the relationship between two genes, for example)....Initially, the centre is focusing on bioscience and biomedical texts to meet the increasing need for automated ways to interrogate, extract and manage textual information now flooding out of large-scale bio-projects....Text-mining tools in use include Cafetiere, an information extraction tool that annotates text with information about entities and the relationships between them. Termine, a tool for handling terminology, is being re-engineered by the centre so that it can deal with large volumes of data. The centre...will act as a repository for such tools, as well as developing its own. One key task will be plugging the number of different tools for different tasks into one coherent framework. "This infrastructure will allow many people's tools to work together in a mix and match way, the mix of which will depend on the intended application," says Barker.

More on science as collateral damage in the war on music copying

Pierre Baruch, Franck Laloë, and Françoise Praderie, La science, c'est aussi de la culture, Le Monde, January 12, 2006 (in French). (Thanks to Stevan Harnad.) French copyright reforms designed to crack down on file-sharing in music will inadvertently harm science; exceptions in the existing law for research and education are not enforced; and media discussion focuses on music to exclusion of other affected areas of culture.

Research-sharing limited less by DNA patents, more by Bayh-Dole Act

David Epstein, Good Business, Inside Higher Ed, January 12, 2006. Excerpt:
Tales about business interests in technology impeding the flow of academic information linger in the minds of many researchers like horror stories. But in most cases involving DNA patents, licensing concerns have not restricted sharing among colleagues in academe. A study conducted by LeRoy Walters, professor of bioethics at Georgetown University, and six colleagues — from academe and from private industry — found that, even when universities grant exclusive licensing rights to companies, they insist on the right to share technology for academic research. “The licensing of DNA patents by U.S. academic institutions: an empirical survey,” published this month in Nature Biotechnology, gathered data from 19 technology transfer offices at leading research institutions, some of which are among the most prolific DNA patent holders in the country. All of those respondents, according to the paper, generally retain the freedom to share technology for research purposes. The paper suggests that 1999 guidelines by the National Institutes of Health, which urge grant and contract recipients to share “research tools with all biomedical researchers who request them,” set the tone for academic cooperation, and are widely considered by academic researchers to be stipulations of receiving grants. “It was almost like a gentleman’s agreement when it became clear NIH wanted people to share,” Walters said....Rebecca Eisenberg, a patent law professor at the University of Michigan who specializes in biomedical research, said that...while some things are getting better, data hoarding between colleges and companies is still prevalent since the Bayh-Dole Act of 1980, which allowed colleges and companies to gain exclusive rights to government funded research. “It made companies more reluctant to allow universities to use information freely, because they view them as competitors,” Eisenberg said. “If you’re going to have a mixed system of public and private science, this is going to happen.”

Review of EconPapers

Péter Jacsó reviews EconPapers in the January issue of Gale's Reference Reviews. Excerpt:
[I]n my 2004 review I criticized the meager coverage by EconLit of working papers. The good news is that in 2004 and 2005, the publisher of EconLit added records for 59,000 working papers from Research Papers in Economics (RePEc), the outstanding open-access database specializing in economics. There have been several applications developed for processing various subsets of the RePEc database. Others, such as the IDEAS database maintained by Christian Zimmermann at the Department of Economics at the University of Connecticut, process the whole RePEc data set. Also processing the entire data set is the EconPapers database, which I review here. EconPapers (and the RePEc source file) is one of the best examples of successful large-scale collaborative projects among scientists, researchers and their institutions. It has close to 358,000 records for working papers (170,000 items from 1,500 series), journal articles (185,000 items from more than 400 journals), books (600), book chapters (1,020) and computer programs (1,300). Although EconPapers has only about half as many records as EconLit, it makes up for it by the rich content of the individual records. More than one-third of the journal article records and more than two-thirds of the working paper records have abstracts. The majority of the working papers are linked and available online free of charge. Seventy-six percent of the journal article records have links to the full text of the source documents.
Although these are not open-access documents, many users will have free access to them by virtue of subscriptions by their libraries....EconPapers is yet another worthy and impressive implementation of the excellent Research Papers in Economics (RePEc) database, proving the viability of efficient collaboration among researchers in providing open access to the full-text, or at least to the rich metadata, of their papers to users who otherwise would not have access to traditional indexing/abstracting tools, let alone to full-text journal archives.

Update on Gallica

Nate Anderson, France pushes creation of European Google killer, ars technica, January 11, 2006. Excerpt:
[T]he French have organized several initiatives designed to one-up the Yanks. You'll remember, of course, the digitization project undertaken by the French National Library which was designed to counter Google's own plan to index millions of English-language books. The project, dubbed Gallica, is great if you want to access manuscript images of Proust's À la recherche du temps perdu from the comfort of your living room, but not for much else. Gallica has only 80,000 images online so far, and none of these are searchable by content. While the idea has merit and may turn into an incredible resource, its current incarnation leaves much to be desired and has basically failed to enhance Europe's reputation as a digital pioneer.

Update (1/13/06). Klaus Graf writes to say that Anderson is wrong on every point. According to Gallica's page of Documents Available Online, "Today, this digital library includes more than 75,000 volumes of digitized texts, 70,000 still images, and 30 hours of sound recordings....About 1,250 works in text format have been placed online...." And from a January 10 story in PC Inpact (translated from the French): "As part of a mass digitization effort, it is already estimated that between 50,000 and 60,000 works will have been processed by the end of 2006, says Jean-Noël Jeanneney, President of the Bibliothèque nationale de France." (Thanks, Klaus.)

Advice for the new Blackwell journals

John Blossom, Journal Publishers Huddle Under the Wings of Blackwell, ContentBlogger, January 11, 2006. Excerpt:
It's not the best of times for independent scholarly journal publishers, a fact that keeps them moving towards distributors with more marketing and distribution savvy. Blackwell has announced that it will begin 2006 with 39 new publishing partnerships and 59 journal titles added to its list of more than 600 society publications. Not a bad short-term solution for journals challenged by open access publishing and lacking the marketing muscle to distinguish themselves via online search solutions....Blackwell offers a quality publishing solution for journals that provides cost-effective technology and marketing infrastructure that can help them to be more effective independent publishers. But in spite of its Synergy online search interface it's still a heavily print-oriented marketing solution....It's important for independent publishers to weigh their options for improving online and print marketing through partnership very carefully, favoring options that will carry them aggressively into online revenue streams as more of their audiences make the shift to online as a primary consumption channel. For many publishers the move to Blackwell will be a positive experience in the short run, but it's a move that won't eliminate the need to consider long-term marketing solutions carefully.

Criteria for OA government info

Kristin R. Eschenfelder and Clark A. Miller, What Public Information Should Government Agencies Publish? A Comparison of Controversial Web-Based Government Information, a preprint self-archived January 11, 2006.
Abstract: This paper develops a framework to assess the public information provided on program level government agency Websites. The framework incorporates three views of government information obligations stemming from different assumptions about citizen roles in a democracy: the private citizen view, the attentive citizen view, and the deliberative citizen view. The framework is employed to assess state Websites containing controversial policy information about chronic wasting disease, a disease affecting deer and elk in numerous U.S. states and Canada. Using the framework as a guide, the paper considers what information agencies should provide given the three different views of government information obligations. The paper then outlines the costs and benefits of fulfilling each view of government information obligations, including issues of limited resources, perceived openness and credibility, press coverage, and policy making control.

Chile asks WIPO to protect the public domain

William New, Chile Urges WIPO To Act To Protect Public Domain, IPWatch, January 12, 2006. Excerpt:
The government of Chile this week submitted a proposal to an upcoming meeting of the World Intellectual Property Organisation’s new committee on the development agenda that calls for positive steps to protect information in the public domain. The first meeting of the new Provisional Committee on Proposals Related to a WIPO Development Agenda will be held in Geneva on 20-24 February. The committee created by the WIPO General Assembly in October reflects a compromise extension of discussions over a proposal to expand WIPO’s focus on developing countries’ needs (IPW, 3 October 2005). The original development agenda proposal was put forward at the 2004 General Assembly by Argentina and Brazil, supported by 12 other Friends of Development. Subsequent proposals have followed. In its proposal, Chile highlights the benefits to society of a rich base of freely available public information. The public domain is of “crucial importance” to researchers, academics, educators, artists, authors and enterprises, as well as all varieties of institutions, it said. Developing countries in particular have raised concern that WIPO’s emphasis on the protection of rights, rather than the protection of public knowledge, may reduce their ability to innovate since most rights belong to developed countries. The proposal, obtained by Intellectual Property Watch, mentions a series of previous documents negotiated by governments in various bodies such as the United Nations Educational, Scientific and Cultural Organisation, and the UN World Summit on the Information Society. Chile calls for an analysis of the implications and benefits of a substantive and accessible public domain, and elaboration of proposals and models for the protection and identification of and access to the contents of the public domain. It further calls for protection of the public domain to be considered in the making of policy at WIPO.

Xerox joins the OCA

Wednesday, January 11, 2006

Authors protest OA book plan at Memorial University

Memorial University of Newfoundland is planning to digitize the works in its library and put them on the web for free online access. When the works are under copyright, it will proceed only with the copyright holder's consent. Canadian authors are protesting anyway. Excerpt from a CBC News story yesterday:
Newfoundland and Labrador writers are fighting a plan to make their work available on the internet for free. Memorial University wants to make much of its library holdings available to the public over the web. However, the association that represents writers in Newfoundland and Labrador says the program could make it harder for its members to sell books. "It's very simple. If you make a work you own it and you should be paid – you should be remunerated for it," said Allison Dyer, president of the Writers' Alliance of Newfoundland and Labrador. Memorial University says the idea behind the project is to make Newfoundland's culture and heritage freely available to everyone. Richard Ellis, Memorial's university librarian, says the university will not post anything on the web without first negotiating for permission. "It ought not to interfere with the livelihood of those people who make a living from publishing whether it be the publishers or the authors," Ellis said. The plan, Ellis noted, is only in preliminary stages. Memorial hopes to begin by posting the library's Newfoundland Studies collection.

Comment. There has to be more to this controversy than we've heard so far. Do the authors understand that the university will respect the decisions of copyright holders? Do they believe the university is making this assurance in bad faith? Have they transferred copyright to publishers and fear that publishers will consent against author wishes? Are they trying to block OA to books in the public domain on a theory that copyright is eternal? I'll post more as I learn more.

Bibliography of OA in Hungary

Tibor Koltay and Erika Tóth have compiled a Bibliography of Open Access in Hungary (in Hungarian). From Koltay's announcement:
Due to the relatively low number of original articles, the bibliography also contains data of abstracts and news-items related to OA, published in Hungarian library and information science periodicals. The bibliography will be constantly updated. Its goal is to raise awareness of OA among Hungarian information professionals and through them among researchers.

Five new OA repositories in Australia

Five Australian universities have launched OA repositories, all using the ProQuest/Bepress DigitalCommons service. (Thanks to Arthur Sale.)

21 more Oxford Open journals from OUP

Oxford University Press has added 21 journals to its Oxford Open program, exactly doubling the number of participating journals.

Library automation tools and IRs

Mark Chillingworth, Library automation market is tracking big IT vendors, Information World Review, January 11, 2006. Excerpt:
Pressure from university IT departments is driving the development and adoption of library automation (LA) tools, and fuelling the current spate of mergers and acquisitions in the LA market....Pressures closer to home are also driving the adoption and development of LA tools. “There will be more integration with e-learning and institutional repositories (IR), which will bring in a lot more publishing and workflow technologies to the library,” said [Rein van Charldorp, managing director of OCLC PICA]. Institutional repositories pose a challenge, he added. “Building the IR is easy, getting the information in and out is much harder. To keep up with this pace you have to invest,” he said.

Comment. I don't see the problem getting content out of an OA repository, if this means finding and downloading it rather than removing it. What will library automation tools do to make discovery and downloading easier? As for getting material in, will the tools streamline or even automate the deposit process? Will they change the culture of inertia? I really can't tell what van Charldorp has in mind.

More on the CURES Act

CURES Act Would Push NIH, Library Journal, January 11, 2006. A short, unsigned note.
The battle for free public access to government-funded research may heat up after Sens. Joe Lieberman (D-CT) and Thad Cochran (R-MS) introduced legislation to establish the American Center for Cures within the National Institutes of Health (NIH). Included in that bill, known as the CURES act, is an aggressive provision to help make taxpayer-funded biomedical research available to all potential users. Although Congress directed the NIH to draft a policy to achieve that goal in 2005, what resulted was a weak policy that simply requested NIH-funded research be deposited into PubMed Central within a year after publication. A provision of the CURES Act, however, if passed, would require research funded by a number of government agencies be made available within six months. In addition, the law would set penalties for non-compliance. SPARC director Heather Joseph said that library groups were "gratified" to see that Congress took universal access to research into account.

Comment. Two quick notes: (1) Why is it "aggressive" to give taxpayers access to the research for which they've already paid? The six-month embargo is a compromise with the public interest that makes the policy even less aggressive. (2) Congress directed the NIH to adopt an OA mandate in mid-2004. See my procedural history of the NIH policy.

Chinese ban on Wikipedia in its third month

Geoffrey York, Chinese ban on Wikipedia prevents research, users say, Globe and Mail, January 10, 2006. Excerpt:
Chinese students and intellectuals are expressing outrage at Beijing's decision to prohibit access to Wikipedia, the fast-growing on-line encyclopedia that has become a basic resource for many in China. Wikipedia, which offers more than 2.2 million articles in 100 languages, has emerged as an important source of scholarly knowledge in China and many other countries. But its stubborn neutrality and independence on political issues such as Tibet and Taiwan has repeatedly drawn the wrath of the Communist authorities. The latest blocking of the website, the third shutdown of the site in China in the past two years, has now continued for more than 10 weeks [starting October 19, 2005] without any explanation and without any indication whether the ban is temporary or permanent. "What idiots these officials are!" said one message on a Chinese site. "They are killing our culture with censorship."

Review of Google Scholar

Rita Vine, Google Scholar, Journal of the Medical Library Association, January 2006. A review. Excerpt:
Although Google Scholar covers a great range of topical areas, it appears to be strongest in the sciences, particularly medicine, and secondarily in the social sciences. The company claims to have full-text content from all major publishers except Elsevier and the American Chemical Society, as well as hosting services such as Highwire and Ingenta. Much of Google Scholar's index derives from a crawl of full-text journal content provided by both commercial and open source publishers. Specialized bibliographic databases like OCLC's Open WorldCat and the National Library of Medicine's PubMed are also crawled. Since 2003, Google has entered into numerous individual agreements with publishers to index full-text content not otherwise accessible via the open Web. Although Google does not divulge the number or names of publishers that have entered into crawling or indexing agreements with the company, it is easy to see why publishers would be eager to boost their content's visibility through a powerhouse like Google....The inadequacies of Google Scholar have already been well documented in reviews. These reviews focused on three major weaknesses of the tool: lack of sufficient advanced search features, lack of transparency of the database content, and uneven coverage of the database. Henderson's review of Google Scholar demonstrated its significant limitations for clinician use. Tests conducted by Jacso showed that Google Scholar typically crawled only a subset of the full available content of individual journals or databases. In February 2005, Vine discovered that Google Scholar was almost a full year behind indexing PubMed records and concluded that “no serious researcher interested in current medical information or practice excellence should rely on Google Scholar for up to date information”. With a simple, basic search interface and only minimal advanced search features, Google Scholar lacks almost every important feature of MEDLINE. 
It does not map to Medical Subject Headings (MeSH); does not permit nested Boolean searching; lacks essential features like explosions, subheadings, or publication-type limits; and offers searchers no ability to benefit from the extraordinary indexing that the National Library of Medicine provides. Google Scholar's closest free Web competitor, the quasi-scientific search tool Scirus from Elsevier, crawls a defined subset of free Web pages plus full-text content from Elsevier journals, patents, preprints, and more. Unlike Google Scholar, the Scirus project team is quick, even eager, to disclose the content of the Scirus database in its "About Us" section, and regularly feeds new partner content into the database....Google Scholar has some great features. Its "cited by" feature, which links a result to other items in the Google Scholar database that reference the item, is a quick way to find citations. Although it is not comprehensive, no other citation-linking tool in the marketplace is....Cyber sleuths can also use Google Scholar to find a free Web version of an article that might have started out behind a publisher's authentication firewall but has been downloaded by someone and then put on a public Web server.

Update. Dean Giustini has written some comments on Vine's review.

ALPSP survey on self-archiving and journal cancellations

The ALPSP has launched a Library Survey on Self-Archiving and Journal Cancellation. From the introduction:
As you may be aware, some publishers are becoming concerned that if self-archiving of postprints, or even preprints, of journal articles becomes sufficiently widespread, this may lead to a decline in usage at journals’ own websites, and that this in turn may lead to cancellations. In order to understand whether or not our fears are well-founded, we would like to understand more about the process by which you make the decision to cancel journals, what the crucial factors are, and how you would rank them in importance, both now and in the future.

OA advocate Les Carr has criticized the survey for question-begging wording on some questions, confusing structure, and a scope limited to librarians, who are only part of the journal-cancellation process. Today I realized that I'd been following the controversy but hadn't yet blogged the survey itself. Sorry for the delay.

What is commercial use?

Mia Garlick, Discussion Draft - NonCommercial Guidelines, Creative Commons blog, January 10, 2006. Those who use CC licenses that prohibit commercial use will be interested in the new draft guidelines on what counts as commercial and non-commercial use.

OA law review from Utrecht

The Utrecht Law Review is a new peer-reviewed, open-access journal now in its second issue. Excerpt from today's press release:
Utrecht Law Review is an Open Access journal offering an international platform for cross-border legal research. It is a good practice of electronic publishing developed by the DARE project [PS: in English] ‘Truth or DARE’ [PS: Dutch only], to show legal scholars the added value of depositing publications in digital repositories. The aim of the ‘Truth or DARE’ project was to establish a number of good electronic publishing practices for Dutch legal researchers. Specifically, it entails publications by legal scholars in digital repositories, resulting in added value for legal-academic communication as well as optimal user-friendliness for academics. The project focused mainly on added scholarly value, communication / information, the supply process, copyright and visibility, and was intended to find the most effective method of creating commitment among the target group of authors....The Editorial Board of the Utrecht Law Review has committed itself until December 2007 to publish in Open Access and deposit the publications in the digital repository of Utrecht University. In the meantime a sustainable business model for the journal is being investigated.

New members for the OCA

Simon Fraser University, the University of North Carolina at Chapel Hill and its School of Information and Library Science, and Washington University have joined the Open Content Alliance.

Open source, open content, open access in developing countries

Segun Oni and Bolaji Onibudo, Open source: The future of IT in Nigeria, Vanguard, January 11, 2006. Excerpt:
Nigeria has to move away from the Get Rich at the Expense of the Poor syndrome that has plagued many western corporations who continually siphon wealth from Nigeria to their countries. We want to promote home grown software built by Nigerians for Nigerians with wealth creation remaining within the shores of Nigeria....But, as William Gibson reminds us, the future is here, it’s just not well-distributed yet. The answer to our problems is not to redistribute wealth, it’s to redistribute the future. In very practical terms, that’s what open source is about....When intellectual problems become distributed, the search for solutions becomes collaborative and the research agenda is driven not by multinational shareholders but by the passions of the participants, you get not just better results, you get different results. The South-South scientific coalition is a sign that a few countries at least - namely Brazil, South Africa and India - get this. They’re working together, trying to educate more local scientists and allying themselves with open and non-commercial approaches, like the open access movement in scientific publishing (which demands that scientific papers be made freely available online, not published in expensive, limited-circulation hardcopy journals), precisely because they recognize that this makes possible a different kind of science. It makes possible a scientific research agenda based on what their people need, not on what will make Monsanto the most money....[T]here’s something very wrong with a world in which crops, energy systems, essential drugs, access to information, methods for providing clean water, and so on are priced outside the reach of billions simply because of the legacy of past development patterns. They are proprietary knowledge. The greatest strength of the open source model is that it is explicitly non-proprietary. 
It is a direct antidote to legacy ownership of key ideas, because the core concept is that no one should own core concepts. No corporation, no nation, no person can claim ownership over the core concepts in an open source project in order to demand royalties or restrict its use. No one using OS-built medicines, for example, would ever die of AIDS because some Big Pharma executive in New York or Berlin decided that distributing cheap drugs was too great a risk to their patents.

Tuesday, January 10, 2006

Engineering Scholarly Communication blog

OA can reduce one kind of scientific misconduct

In an article for BBC News on the Hwang Woo-suk stem cell scandal, Paul Rincon and Jonathan Amos digress from the problem of fraudulent data to the problem of plagiarism, which allows them to make a point about OA.
Some scientists say that the "open access" business model for journals - where scientific papers are free for all to read in a web-based database - could be beneficial for picking up plagiarism and possibly other forms of misconduct. A great many scientific journals are subscription-based, so that readers have to pay to view research. "We think it would be harder for people to plagiarise work once you can do extensive word searches and access more material free on the internet. You'll be able to spot where someone has lifted their work much more easily," says Robert Terry, senior policy adviser at the UK medical charity, the Wellcome Trust.
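To make Terry's point concrete: once full text is openly searchable, lifted passages can be flagged by comparing overlapping word sequences ("shingles") between two documents. Here is a minimal sketch of that kind of check; the function names are illustrative, not from any actual plagiarism-detection tool.

```python
def shingles(text, n=5):
    """Break a text into its set of overlapping n-word sequences."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(candidate, corpus_doc, n=5):
    """Fraction of the candidate's shingles that also appear in corpus_doc.

    A score near 1.0 suggests large verbatim copying; a score near 0.0
    suggests no shared phrasing at the chosen shingle length.
    """
    cand = shingles(candidate, n)
    if not cand:
        return 0.0
    return len(cand & shingles(corpus_doc, n)) / len(cand)
```

Real systems add hashing and indexing so one paper can be compared against millions, but the underlying test is this kind of phrase-level overlap, which only works when the reference corpus is openly crawlable.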

More on India's OA Traditional Knowledge Digital Library

Gaurie Mishra, New-age library set to protect age-old yoga knowledge, Business Standard, January 10, 2006. (Thanks to Subbiah Arunachalam.) Excerpt:
In response to more than 1,000 patents of yoga postures and Ayurvedic medicines in the US and Europe, the health ministry is preparing a 30-million-page digital library [Traditional Knowledge Digital Library] documenting the country’s traditional knowledge. “The 30-million-page digital library is being made to ensure that Ayurvedic medicines, yoga asanas, Unani and Sidha are not patented anywhere else in the world,” an official said. The traditional knowledge digital library (TKDL), which will be ready by December 2006, is being prepared at an estimated cost of Rs 10 crore....Of the 30 million pages, 10 million have already been digitised and 54,000 formulations of Ayurveda and 45,000 formulations of Unani have been placed in the library. Next on the agenda is digitisation of 10,000 formulations of Sidha and 1,500 yoga postures.

OA for commercial and military security

How can commercial and military security specialists make use of OA? Here's a passage from a speech to be given later this month by Robert David Steele Vivas, CEO of and advocate for open source intelligence.
Fifth, in close consultation with the United Nations, Chambers of Commerce, and selected universities and publishers, OSIS-X [Open Source Information System -- External] is now prepared to fully integrate both copyrighted and Open Access content, apply semantic web, synthetic information, and Open Hypertextdocument System (OHS) technologies, in order to create the distributed World Brain that is able to deliver both synthetic distilled answers that are free of copyright restrictions, and also all copyrighted footnotes on a micro-cash reimbursable basis at the paragraph level. This has the added advantage of providing a spam-free information environment in which all participants are authenticated legitimate individuals and all information is of known validated provenance.

OA biomedical journals, esp. in India

Jitendra Narayan Dash and D.K. Ahuja, Open Access of Biomedical Journals: A Revolution in Global Health Information Flow, Library and Information Professionals, January 4, 2006. A paper presented at the conference of the Indian Association of Special Libraries and Information Centres at the Indian Institute of Technology, Madras, December 27, 2005.
Abstract: Information exchange is critical for the development of health systems, but many factors inhibit the free flow of information. Open Access (OA) may play an important role in the free flow of information. Various initiatives have been taken by OSI, WHO, NIH, etc., and some publishers, such as PLoS and BMC, have developed OA journals providing unrestricted access to their publications. The OA movement has received good responses from authors and funding agencies. Funding is a major obstacle for OA; the ‘author pays’ model may have some weak points, but it is not a new phenomenon. The OA movement will succeed through the effective participation of institutions, coordinating organizations, authors, publishers and libraries.

More evidence that OA increases submissions

Sara Schroter, Importance of free access to research articles on decision to submit to the BMJ: a survey of authors, BMJ, January 9, 2006. Abstract:
Objectives. To determine whether free access to research articles on is an important factor in authors’ decisions on whether to submit to the BMJ, whether the introduction of access controls to part of the BMJ’s content has influenced authors’ perceptions of the journal, and whether the introduction of further access controls would influence authors’ perceptions.

Design. Cross sectional electronic survey.

Participants. Authors of research articles published in the BMJ.

Results. 211/415 (51%) of eligible authors responded. Three quarters (159/211) said the fact that all readers would have free access to their paper on was very important or important to their decision to submit to the BMJ. Over half (111/211) said closure of free access to research articles would make them slightly less likely to submit research articles to the BMJ in the future, 14% (29/211) said they would be much less likely to submit, and 34% (71/211) said it would not influence their decision. Authors were equally divided in their opinion as to whether the closure of access to parts of the journal since January 2005 had affected their view of the BMJ; 40% (84/211) said it had, 38% (80/211) said it had not. In contrast, 67% (141/211) said their view of the BMJ would change if it closed access to research articles. Authors’ comments largely focused on disappointment with such a regressive step in the era of open access publishing, loss of a unique feature of the BMJ, a perceived reduction in the journal’s usefulness as a resource and global influence, restricted readership, reduced attractiveness as a place to publish, and the negative impact on the journal’s image.

Conclusions. Authors value free access to research articles and consider this an important factor in deciding whether to submit to the BMJ. Closing access to research articles would have a negative effect on authors’ perceptions of the journal and their likeliness to submit.

What does and doesn't work to fill OA repositories

Dorothea Salo, A messy metaphor, Caveat Lector, January 9, 2006. Excerpt:
There’s a call out for strategies for attracting content to institutional repositories. I thought about answering it, but “strategy” is such a businesslike, buttoned-up word...I don’t have a meticulously-planned capital-S strategy on a pretty Gantt chart with milestones. I don’t even have a minuscule-s strategy scrawled on a cocktail napkin. I can suggest some capital-S strategies that don’t actually work, though. Limit your repository to peer-reviewed material, and don’t forget to sneer actively at everything else your faculty produce. Play copyright cop, or content cop, or all kinds of other kinds of cop. (No, I don’t actually recommend ignoring copyright, though I wish I could. I’m just saying that copyright ally is a far more pleasant and useful role than copyright cop.) Make everybody sign licenses and memoranda of understanding and any other bits of paper you can shove in front of them. Talk at any opportunity about Dublin Core and OAI-PMH and DSpace over Tomcat on OSX. Mm-hm. That’ll bring ’em runnin’. My strategy? I throw handfuls of spaghetti at the wall and see what sticks. Honestly. Messy metaphor though it is, that’s my strategy. I have put an incredible amount of effort in the six months I’ve been employed into strategies that have gone absolutely nowhere. Formal lectures? No good. Trying to weasel into faculty meetings? Practically impossible. Presence on the library’s home page? Totally useless (though there being no tooltip to explain the acronym doesn’t help). Has anyone actually read my painstakingly-composed propaganda pages? I wonder. What’s worked? Informal contact. Sure, I have to make ten or twenty informal contacts for every one that actually turns into content—but that’s still a better track record than most of my other attempts. If you’re a repository-rat, carry your card with you everywhere and give it out at the least opportunity, along with the fifteen-second version of what the repository’s about. 
I may have hooked somebody today at lunch, a completely unplanned contact. Another tactic I’ve had decent success with is paying attention to events of scholarly interest happening on campus. Contact the organizers, ask if there will be any print or multimedia results of the event, and ask whether you may archive them. I’m running a 50% success rate on this right now (not including the event I’m currently pursuing for which the jury is still out). That’s huge, in repository-land. The next handful of spaghetti I throw at the wall will include involvement with campus tech-training sessions for faculty, flyer distribution, and perhaps trying to chase down some campus webmasters. Oh, and theses and dissertations, of course. What will stick? I’ve no notion. Part of the frustration of being a repository-rat is that repositories’ tipping point is largely outside my control. I can’t do much to shove the CURES Act along. I can’t singlehandedly wrench the entire faculty into supporting open access; in fact, I expect a battle royale from MPOW’s book-smellers over ETDs, even though our proposal actually splits the difference. Patience. Patience and spaghetti. Those are the strategies that work.

Two-layered wikis for OA government info

Christian Wagner and three co-authors, Building Semantic Webs for e-government with Wiki technology, Electronic Government, 3, 1, (2006).
E-government webs are among the largest webs in existence, based on the size, number of users and number of information providers. Thus, creating a Semantic Web infrastructure to meaningfully organise e-government webs is highly desirable. At the same time, the complexity of the existing e-government implementations also challenges the feasibility of Semantic Web creation. We therefore propose the design of a two-layer semantic Wiki web, which consists of a content Wiki, largely identical to the traditional web, and a semantic layer, also maintained within the Wiki, that describes semantic relationships. This architectural design promises several advantages that enable incremental growth, collaborative development by a large community of non-technical users and the ability to continually grow the content layer without the immediate overhead of parallel maintenance of the semantic layer. This paper explains current challenges to the development of a Semantic Web, identifies Wiki advantages, illustrates a potential solution and summarises major directions for further research.
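The two-layer design the abstract describes can be pictured as a content store of wiki pages plus a separately maintained store of semantic relationships, each editable on its own. A minimal sketch in Python (the class, page titles, and relation names here are invented for illustration, not taken from the paper):

```python
# Minimal sketch of a two-layer semantic wiki: a content layer of pages plus
# a semantic layer of (subject, relation, object) triples. The two layers are
# maintained independently, so the content can keep growing without the
# immediate overhead of updating the semantics in parallel.

class TwoLayerWiki:
    def __init__(self):
        self.content = {}     # page title -> wiki text (content layer)
        self.triples = set()  # (subject, relation, object) (semantic layer)

    def edit_page(self, title, text):
        """Edit the content layer; the semantic layer is untouched."""
        self.content[title] = text

    def annotate(self, subject, relation, obj):
        """Add a semantic relationship between two pages (semantic layer)."""
        self.triples.add((subject, relation, obj))

    def related(self, subject, relation=None):
        """Query the semantic layer for pages linked from `subject`."""
        return [o for (s, r, o) in self.triples
                if s == subject and (relation is None or r == relation)]

wiki = TwoLayerWiki()
wiki.edit_page("Toxic Release Inventory", "EPA database of chemical releases...")
wiki.edit_page("EPCRA", "The 1986 right-to-know act...")
wiki.annotate("Toxic Release Inventory", "authorized_by", "EPCRA")
print(wiki.related("Toxic Release Inventory"))  # ['EPCRA']
```

The point of the split is visible in the API: `edit_page` never touches `triples`, so a community can grow the content layer first and let domain experts fill in the semantic layer later.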

Also see Joab Jackson's news story today, Researchers Recommend Wikis for Government Information, Red Orbit, January 9, 2006. Excerpt:

E-government researchers have suggested that collaborative Wiki software may be the best avenue for getting public information to the citizenry. They advocate building two-layered Web pages, with the agency providing a base layer of information and interactive pages layered on top that domain experts, volunteers and others could use to annotate and link the data....Through a Google search, the research group found there are 368 million Web pages under the federal .gov domain alone. The authors propose using Wiki software to ease the burden of handling all this material. Their idea is this: In addition to standard Web pages, a second interactive layer could be added to allow outside parties to add contextual information and pull together disparate strands of data....The two-layer design owes a debt to database design, the authors concede. The database itself holds the raw data, while additional indexes are placed over top to parse the data in various ways. The agencies would "rely on a community of users to maintain the semantic relationships in the form of a Wiki web," according to the paper....Communities of interest, such as domain experts from different agencies, can get together to explain the data and how it could be used. "You really need to involve communities of interest. It takes a community to make sense out of the content," Davis said....SICoP uses the Wiki to organize meeting materials and documents, Niemann said. A Wiki was also used in developing the second draft of the Federal Enterprise Architecture's Data Reference Model. In both of those project pages, the numbers in purple are links to other static Web pages or sub-parts of the same page. SICoP is looking at other ways and possible pilots to repurpose Wiki and other online content through the use of additional semantic layers. Niemann also points to a pilot of a medical search engine developed by SemanTx Life Sciences Inc. of Waltham, Mass. 
Here, you type in a question and the system "builds an ontology so you can see if that is really what you mean and then uses the ontology to structure the answers to your question," he explains.

Making text and data visible together

Leigh Dodds, The Modern Palimpsest, Lost Boy, December 16, 2005. (Thanks to Richard Ackerman.) Excerpt:
The following is a brief summary of a talk I gave recently at the Ingenta Publisher Forum on the 28th November. The slides are available as a Powerpoint presentation. In the presentation I tried to highlight some of the possibilities that could become available if academic publishers begin to share more metadata about the content they publish, ideally by engaging with the scientific community to expose "raw" data and results. The conceit around which I hung the presentation was the suggestion that the scientific paper is the modern equivalent of a palimpsest: a scroll or manuscript that has been written on, had its text scraped off, and then reused....A great deal of progress has been made in extracting the original texts from these works....The underlying text is known as the scriptio inferior, and may actually be more valuable than the more visible content. I likened the process of authoring a scientific paper to that of the creation of a palimpsest. Starting from original research results and working through the synthesis of a cogent explanation of the results or discovery, at each step the content becomes more abstracted from the original results, the previous work being "lost" to the reader. Data is presented in pre-analysed forms and is not amenable to reuse. Like the palimpsest, the raw data has not really been lost; it's just not (easily) accessible to the reader. If the scriptio inferior, the underlying data, were made available to the reader, then a lot of interesting possibilities arise....In my presentation I tried to stick to a pragmatic and practical line and demonstrate the possibilities by referring to actual examples. I ended up pointing to three:...iSpecies is a nice example of a science "mashup" that illustrates an alternative search interface for finding related content. I used the false results that can appear when performing simple keyword searches to reinforce the need for standard identifiers.
(The need for a common, scoped identifier for authors is a particular hobby horse of mine.) I also showed the excellent HubMed as an example of how an alternative user interface can be better than the original, and also how content can be "enriched" by mixing in other sources. The "terms" feature, which dynamically links keywords in an abstract through to a number of data sources, demonstrates this very well. I used the fact that material can be sourced from user-contributed sources such as Wikipedia to promote the idea that content needn't be fixed at the point of publication but can be annotated after the fact.
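The "terms" enrichment Dodds describes, turning keywords in an abstract into links out to other data sources, amounts to a dictionary lookup applied over the text. A toy sketch (the term list and target URLs are invented for illustration; this is not HubMed's actual implementation):

```python
import re

# Hypothetical keyword -> data-source mapping. A real term list would be far
# larger and could be drawn from user-contributed sources such as Wikipedia.
TERM_LINKS = {
    "palimpsest": "https://en.wikipedia.org/wiki/Palimpsest",
    "microarray": "https://en.wikipedia.org/wiki/DNA_microarray",
}

def enrich(abstract):
    """Wrap known terms in an abstract with HTML links to data sources."""
    def link(match):
        word = match.group(0)
        url = TERM_LINKS[word.lower()]
        return f'<a href="{url}">{word}</a>'
    # One pattern matching any known term, case-insensitively.
    pattern = re.compile("|".join(re.escape(t) for t in TERM_LINKS),
                         re.IGNORECASE)
    return pattern.sub(link, abstract)

print(enrich("Raw microarray data is the scriptio inferior of the paper."))
```

Because the enrichment happens at display time rather than at publication, the term list can keep growing after the article is published, which is exactly the "annotated after the fact" idea in the excerpt.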

More on the CURES Act

Battle for free access to government-funded research introduced, again, LANL Research Library News, January 9, 2006. Excerpt:
The battle for free public access to government-funded research will rear its head again in 2006. And this time, it's Congressional. On December 14, 2005, Senators Joe Lieberman (D-CT) and Thad Cochran (R-MS) introduced legislation to establish the American Center for Cures within the National Institutes of Health (NIH). Included in that bill, known as the CURES Act, is an aggressive provision to help make taxpayer-funded biomedical research available to all potential users. Although Congress directed the NIH to draft a policy to achieve that goal in 2005, it proved to be a controversial issue. What resulted was a weak policy that simply requested that NIH-funded research be deposited into PubMed Central within a year after publication. A provision of the CURES Act, however, if passed, would require that research funded by a number of government agencies be made available within six months. SPARC director Heather Joseph told the LJ Academic Newswire that library groups were "gratified" to see that Congress took universal access to research into account when crafting this bill. "The aim of the bill is to speed cures by removing barriers," Joseph noted. "One of those barriers is access to research." Joseph said the bill's provisions are both broader and more stringent than the current NIH policy—the debate over which was the source of much consternation in 2005. Under the CURES Act, deposit of research articles funded in part or whole by government agencies within the Department of Health and Human Services (DHHS), including NIH, the Centers for Disease Control and Prevention, and the Agency for Healthcare Research and Quality, would be required, as opposed to requested, within at most six months. In addition, the law would set penalties for non-compliance. While the legislation mentions PubMed Central as a repository, Joseph said the bill does apparently leave the door open for deposit in any publicly accessible repository.
Joseph said it was unclear when Congress would begin to consider the bill in earnest. While publishers will likely find much to oppose in the bill, a coalition of library groups issued a statement praising the bill and promising support. The coalition is made up of the American Association of Law Libraries, the American Library Association, the Association of Research Libraries, the Medical Library Association, and the Special Libraries Association.

OA and libraries

Charles W. Bailey, Jr., Open Access and Libraries, a preprint of Charles' chapter in the forthcoming anthology, Electronic Resources Librarians: The Human Element of the Digital Information Age (ed. Mark Jacobs, Haworth Press, 2006). Excerpt:
The open access movement has gained considerable traction in the last six years. It has become the most successful scholarly publishing reform movement in modern times, and it has begun to transform the scholarly communication system. Understandably, it has been met with hostility and skepticism by traditional publishers; however, a growing number of them are overcoming their initial reactions, and they are testing whether open access offers them a viable business model. Open access has struck a sympathetic chord in the library community, which has long suffered the debilitating effects of the serials crisis; however, libraries have been somewhat cautious in their embrace of open access, uncertain about its destabilizing effects on the scholarly publishing system and its ultimate impact on their budget and operations. A growing number of scholars, especially in STM disciplines that have been hard hit by high serials prices, have either become open access advocates or have been swayed by its arguments; however, disciplines that are less dependent on journal literature have shown less enthusiasm and many scholars still have concerns about credibility issues associated with new digital publishing efforts and have not yet seen that the benefits outweigh the risks and costs in terms of time and effort (e.g., to create and deposit e-prints). Primarily as a result of the open access movement, there is now a rare opportunity to truly transform the scholarly communication system. There has not been such an opportunity in living memory, and, if it is not seized, it is unclear if there will be another one in our lifetimes. If you want change, now is the time to act. Action does not require total agreement with the open access movement's beliefs and proposals, but it requires an active engagement with them. The movement is not monolithic, but diverse. Not closed, but participatory. Not dogmatic, but argumentative as it vigorously debates its future.
It can be influenced by new voices and perspectives. The open access movement is not the only potential solution to the serious problems that libraries face in the conventional scholarly communication system, but it is a very important one, and it does not require that other strategies be abandoned. The voice of libraries needs to be heard more strongly in it.

Monday, January 09, 2006

French anthology on OA archiving

Christine Aubry and Joanna Janik (eds.), Les Archives Ouvertes : enjeux et pratiques. Guide à l’usage des professionnels de l’information, Ouvrages ADBS, 2005 (332 pages - ISBN 2-84365-079-8). An anthology of essays on OA archiving. See the table of contents.

Also see Yves Desrichard's review of the book, Les archives ouvertes : enjeux et pratiques, Bulletin des Bibliothèques de France, 50, 6 (2005). Stevan Harnad has translated this excerpt of the review into English:

The chief merit of this work is its survey of the basic principles of this "Open Access" of which "open archives" are but one component. In this regard it is not to slight the other contributors to note that, if one could read but one single article, it would have to be that of Hélène Bosc, of the National Institute of Research in Agronomy in Tours. With her "Open Archives: A Fifteen-Year History", she provides not a mere constipated chronology, but a veritable "Survival Manual" for the librarian or documentalist wishing either to get involved in projects based on these concepts or merely to keep informed.

Hélène Bosc has self-archived her essay, Archives Ouvertes : quinze ans d'histoire. Here's the English edition of her abstract:

Both the idea and the benefits of providing Open Access to scientific publications are now coming to be understood by a growing number of researchers who publish in new open access journals. However, the potential of the complementary strategy of self-archiving in Open Archives -- which could provide immediate open access to all scientific articles -- remains under-utilized. This chapter describes the various attempts in the last fifteen years to generalize the self-archiving practice first adopted by physicists, who have been providing open access to their work since 1991. The benefits of self-archiving and open archives are explained. The adoption of the practice of self-archiving by researchers and their institutions is still too slow. The reasons for this delay are discussed. The official policies adopted by various institutions following the Berlin 3 meeting in Southampton (U.K.) now make it more likely that the practice of self-archiving will spread considerably in 2005. The open access thereby provided will increase the scientific impact of research worldwide.

Jean-Michel Salaün has also self-archived his essay, Libre accès aux ressources scientifiques et place des bibliothèques. Here's the English-language abstract:

Free access to scientific resources and the role of libraries. In the field of scientific publishing, the movement for open archives has led to changes which the author of the article analyses from four points of view. The first highlights the relationship between libraries and publishing and its destabilisation due to digitization. A more historical approach identifies the contribution of three parallel movements: the development of the web, the saturation of the publishing industry and international scientific politics. An approach per discipline gives an overview of ongoing evolution. Finally an economic approach puts the accent on added value from the various players, its remuneration and the limits of the consideration afforded to available documents.

Strategies for populating IRs

Kathleen Shearer of the Canadian Association of Research Libraries (CARL) is collecting strategies for populating institutional repositories. If you have ideas to share, please contact her.

Blackwell's expertise on OA

Blackwell issued a press release today on its journal program at the start of 2006. It contains this sentence:
Blackwell also provides expert guidance on a range of publishing issues, such as peer review, open access, and readership.

Comment. Blackwell has a commendable self-archiving policy and author-choice OA model called Online Open. But before journals consider it an expert on OA they should consider the hasty generalization of its President Bob Campbell ("The OA model is not secure financially, it isn't delivering a stable platform and I don't think it's sustainable" --November 2005) and the uninformed comments of its CEO Rene Olivieri ("the economics [OA] employs is more in the Marx and Lenin mould than in the neoclassical tradition recognised by most economists today" --August 2005).

Geist: We'll get the opposite of what we need

Michael Geist, Tech laws we need, Toronto Star, January 9, 2006. Excerpt:

The international treaty we need: The World Intellectual Property Organization (WIPO) will continue discussions on its "development agenda" in the coming year with the desperate need for acceleration on an Access to Knowledge Treaty. These discussions mark a turning point for the organization, as dozens of developing countries challenge the status quo on intellectual property protections that limit access to medicines and cutting-edge research.

The international treaty we will get: Under intense pressure from the United States, WIPO will seek to put the finishing touches on a Broadcasting Treaty that provides significant new rights to broadcasters and webcasters. Although there is little evidence that the treaty is needed (a slimmer treaty on piracy would address most legitimate concerns), supporters hope to close out 2006 with a near-final text in hand.

Report on the Madras IR workshop

A. Amudhavalli, Building archives for the digital era, The Hindu, January 9, 2006. Excerpt:
Open Access and Institutional Repositories complement each other. They are carefully built databases that open the gates of knowledge for academics, researchers and students....Open Access (OA) and Institutional Repositories (IR) have gained prominence in this sphere since 2004. It is estimated that in the next 10 years almost all the academic institutions are likely to be running an IR. Library and Information Science professionals have often requested MALA [Madras Library Association] to conduct a training programme to help build an IR. The Computer Society of India (CSI) recently came forward to collaborate in this endeavour, drawing upon the skills of software professionals engaged in similar activities. MALA and CSI took the process forward with a three-day workshop on 'Open Access and IR' to familiarise, train and equip professionals, using the Dspace platform....OA and IR are not synonyms, though the underlying philosophies are common. An IR is a set of services that an institution offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It emphasises organisational commitment to the stewardship of digital materials, including long-term preservation, organisation, access and distribution. IR also represents e-prints archives, as digital archives of the research output created by faculty, scholars and students and make it accessible to end-users on the Internet....By facilitating interoperability, the Open Archives movement has accelerated the deconstruction of the traditional scholarly publishing model and increased the potential for institutional repositories....Significantly, [DSpace] also seeks to build a repository system that can support a federation of institutional repositories. In other words, DSpace is a digital repository system that captures, stores, indexes, preserves, and redistributes an organisation's research data. 
To support this goal, the DSpace project is exploring related issues including access control, rights management, versioning, retrieval, faculty receptivity, community feedback and flexible publishing capabilities....M.G. Sreekumar, Librarian, Indian Institute of Management Kozhikode, and his team, supported by K.T. Anuradha of the Indian Institute of Science, Bangalore, guided the [Madras] workshop. The beneficiaries were the 45 principal stakeholders of IR, including librarians, information, computer and software professionals from colleges, universities, R and D laboratories, CSIR units, corporate sector, hospitals such as Apollo and Sankara Nethralaya, Tamil Nadu Dr. M.G.R. Medical University, teaching faculty in the disciplines, business firms, publishing sector and the media. The participants got an overview of the role of subject descriptors (metadata) in describing digital objects for the IRs. The Dublin Core (DC) metadata standard, recommended by the WWW Consortium (W3C) for describing web objects, and metadata standards such as 'METS' and 'MODS' were explained. Participants included Prof. Nirmala Prasad, principal, MOP Vaishnav College, H.R. Mohan, chairman, Division VIII, and Chennai Chapter, CSI, Prof. S. Narayanan, Dean, Academic and Chair-Library Committee, IIT-Madras, Prof. S. Parthasarathy, senior member, MALA, and R. Seshadri and R. Samyuktha, vice-presidents, MALA.
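The Dublin Core descriptions covered at the workshop are simple element–value pairs attached to a digital object. A minimal record for a hypothetical e-print deposit, built with Python's standard library (the titles, names, and handle URL below are invented for illustration):

```python
import xml.etree.ElementTree as ET

# Dublin Core element-set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"

# A minimal Dublin Core description of a hypothetical e-print.
record = {
    "title": "Open Access and Institutional Repositories: A Case Study",
    "creator": "A. Librarian",
    "date": "2006-01-09",
    "type": "Text",
    "identifier": "http://repository.example.edu/handle/123/456",
}

# Serialise the record as namespaced XML elements under a <metadata> root.
root = ET.Element("metadata")
for element, value in record.items():
    child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
    child.text = value

xml = ET.tostring(root, encoding="unicode")
print(xml)
```

Records of roughly this shape are what repository software such as DSpace exposes for harvesting, which is how metadata-level interoperability between repositories is achieved.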

OA at India's MEDLARS Centre

Naina Pandita, Indian MEDLARS Centre and open access, a PPT presentation at the 93rd Indian Science Congress, Special Session on Open Access (Hyderabad, January 6, 2006).

NEJM perspective on medical search and OA

Robert Steinbrook, Searching for the Right Search — Reaching the Medical Literature, New England Journal of Medicine, January 5, 2006 (accessible only to subscribers). An editorial on the NIH public-access policy, among other topics. I'll try to post an excerpt after I can gain access to the full text. (Thanks to medpundit.)

Update. Here's an excerpt from the text:

Web-based search engines are transforming our use of the medical literature....Although we continue to read the print issues of journals and to browse current issues online, we are now using links from Google...and other search engines, as well as citation links in other articles, to gain direct access to the articles we want....“What readers see now are articles,” [John] Sack [Director of Highwire Press] said recently. “They don’t see articles bound in the context of issues or in the context of well-known journals. This has been happening for a while, but it has been greatly accelerated by the Internet and by Google and other search engines that are indexing everything that is out there.”...The rapid changes are illustrated by data compiled by HighWire Press. In June 2005, Google provided the majority (56.4 percent) of the referrals from search engines to articles in HighWire-hosted journals...PubMed accounted for 8.7 percent, Google Scholar 3.7 percent, and Yahoo 3.4 percent....When he first saw similar data earlier in the year, Sack recalled, he was “surprised that Google had greatly surpassed PubMed and that a new product such as Google Scholar had approached half of PubMed’s referrals within a few months.”....Because of the limits of other online sources, central electronic repositories of journals and articles serve a critical archival function, according to Dr. David Lipman, the director of the National Center for Biotechnology Information at the National Library of Medicine, home to PubMed and PubMed Central....Central repositories can also store supplemental data and may permit more detailed searches and a greater ability to retrieve and manipulate the underlying information than is possible with papers that may be archived in different formats at different sites. “Biomedical research has changed,” noted Lipman. “Every paper has more and more data. People are not just reading these papers.
Researchers want to compute on the underlying data.” The NIH is seeking to expand public access to the research it sponsors and to increase the usefulness of PubMed Central. As of May 2, 2005, the NIH has asked the investigators it supports to submit voluntarily to PubMed Central an electronic copy of any scientific report, on acceptance for publication....However, the initial response to the voluntary policy has been slow. With 100 percent participation, about 5500 peer-reviewed manuscripts that have been accepted but not yet published — equivalent to about 10 percent of the articles indexed monthly by PubMed — would be submitted to PubMed Central each month, according to Lipman. As of July 9, 2005, 340 such unpublished manuscripts (or about 165 per month) had been submitted — a participation rate of only 3 percent....In December 2005, Senators Joseph Lieberman (D-Conn.) and Thad Cochran (R-Miss.) introduced legislation that would require the public posting of all NIH-funded peer-reviewed manuscripts at PubMed Central within six months of their publication. Failure to comply could result in the loss of public funding for federal employees or grantees....Search engines and the Internet are not only changing the medical literature. They are also challenging the traditional economics of scholarly publishing and fueling heated debate about the extent to which the biomedical literature should be accessible online and available without charge to the user.

Sunday, January 08, 2006

Notes on Alma Swan's presentation yesterday

Ed at NewsFromBlore has blogged some notes on Alma Swan's presentation yesterday at the Indian Institute of Science (IISc) in Bangalore. Excerpt:
It is surprising that lack of access to scientific information is regarded as such an obstacle in a well-funded and respected university in a developed western country. The reasons cited for this are: 1) The high cost associated with maintaining an up to date library of journals in educational and research institutions such as Nottingham University. 2) The huge and growing number of scientific journals that are available. This not only increases the cost of maintaining the library of journals. It also means that searching for a specific paper or topic across all the possible journals it could be in is difficult and time consuming. 3) Copyright claims of printed journals can be restrictive and may stifle the ease with which papers can be distributed amongst the wider community....A survey quoted by Dr Swan estimates that on average across all natural science subjects, research papers receive 50% greater exposure in terms of readers if they are [deposited] in an Open Access repository than if they are published in a traditional journal. This is a conservative estimate. The average increase in exposure for Open Access Physics papers alone is 250%!...The advantages that OA brings to scientists that Dr. Swan described are broad and far reaching. As authors, scientists can vastly increase the circulation of their paper, encouraging feedback and peer review from a broader section of the scientific community. As researchers, scientists can have free and instantaneous access to the material they need, when they need it. As teachers, scientists will have direct access to teaching material and will not have to be concerned with copyright issues associated with duplication of the material. So what are the reasons for not using OA, and why does only 15% of the Indian scientific community currently use it? Several preconceptions about Open Access are shared by community members.
In some cases it is feared that submitting a paper to an online repository will be difficult and time consuming. According to Dr. Swan this is not a real concern at all. With only basic computer skills a user may submit an article in only a few minutes. In addition, a survey carried out amongst the scientific community found that an overwhelming percentage of users found the system “very easy” to use....One way that institutions can increase the use of OA is to take the decision out of the hands of individual researchers and make OA publishing compulsory. Dr Swan provided data which showed only a small number of researchers worldwide would actively oppose such a decision by their institute. In Asia, there seemed to be almost no opposition to OA adoption.

More on the withdrawal of OA environmental data

OMBWatch issued a new report last month, Dismantling the Public's Right to Know: EPA's Systematic Weakening of the Toxic Release Inventory, December 1, 2005. (Thanks to Free Government Information.) Also see the press release. From the executive summary:
Under the Bush administration, the Environmental Protection Agency (EPA) is slowly dismantling its flagship [open access] environmental information tool --the Toxic Release Inventory (TRI)....The TRI tracks the amount and types of toxic chemicals released into the environment, stored at facilities, or transferred between facilities. The program’s authority comes from the Emergency Planning and Community Right-to-Know Act (EPCRA), enacted in 1986....The primary purpose of the TRI is to allow citizens access to information on chemical hazards in their communities....The program has been protected and improved over the last 15 years, since it was put in place during the Reagan administration....Unfortunately, the program’s success has made it a target for those who seek to reduce corporate oversight and accountability. The easy access to pollution information provided by TRI has empowered citizens to push for improvements, and facilities have acted to reduce releases. Since facilities began reporting in 1988, there has been a nearly 60 percent reduction in total releases of the 299 core chemicals that the program began tracking. This is a significant drop, one that was fueled by merely making information publicly available. As new chemicals have been added to the TRI program, those releases have also dropped. This year, EPA reported a 42 percent reduction in releases and disposal of the more than 650 chemicals now tracked under TRI over the 6 years between 1998 and 2003. TRI is EPA’s premier database of environmental information, and it demonstrates the power that information holds to promote change that benefits everyone’s environment, health and safety....Despite the program’s positive impacts, the TRI is under attack from the very agency administering this success story. EPA’s recent actions and stated plans are geared to downgrade and weaken the TRI program.
These actions represent a recent and definitive shift in EPA’s approach to TRI and are largely a result of the current administration’s political priorities --corporations first, communities last.