Open Access News

News from the open access movement

Saturday, March 11, 2006

Publisher division deepening on Google book-scanning

VNU Staff, Publishers call for library digitisation boycott at Book Fair, Information World Review, March 10, 2006. Excerpt:
Google used the London Book Fair (LBF) as a platform to reach out to a wary book trade, as it revealed plans to expand its controversial Library Project to include European libraries. On the eve of the fair, Nigel Newton, c.e.o. of Bloomsbury, which includes Who's Who publisher A&C Black, called on the industry to boycott Google's search engine "until it desists from its present misguided mission in the world of books". He described Google as "a false prophet" engaged in "acts of 'kleptomania'". But Jens Redmer, director of Google Book Search in Europe, told IWR sister title The Bookseller that Google is talking to further library partners in all major European countries. Google is currently working with only four American libraries and the Bodleian in Oxford. Redmer added that in-copyright works would not be scanned in Europe, where copyright laws are "significantly different" to the US.

Meanwhile, publishers involved in Google Book Search reported increased backlist sales at an LBF session hosted by the search engine....Blackwell's has put 5,000 titles into Google Book Search and has had 57,344 "buy this book" click-throughs. "The high rate of 'buy this book' clicks is translating into small sales for our deep backlist," said book sales director Ed Crutchley at a Monday session. HarperCollins US, a business and humanities publisher, has put 6,000 titles into the programme. Group president Brian Murray said Google has delivered over six million page views in 16 months. Murray added that, although they had not seen much income from related advertising, the initiative was more about marketing. "It drives highly-qualified traffic to our site. The results suggest it leads to book purchase and intuitively we believe this."

PS: Some publishers are acting on a faith-based fear of harm and some are acting on an evidence-based record of benefit.

Software for tagging eprints in Eprints archives

New Connotea software supports institutional repositories, a press release from the Nature Publishing Group, March 10, 2006. Excerpt:
Nature Publishing Group (NPG) has released new software which enables institutional repositories running EPrints to integrate with the social bookmarking services Connotea and This latest innovation allows content within institutional repositories to be bookmarked, tagged, and linked to related content. The work behind this development was funded by the Joint Information Systems Committee (JISC) as part of their PALS Metadata and Interoperability Projects 2 program. Once installed in a repository, the software will enable users to bookmark documents in that repository using their Connotea or account, assigning their own tags without leaving the web page. They can also see what tags have already been assigned to the document they are viewing in the repository and click on links to related content, either within the same repository or elsewhere on the web. If bookmarked in Connotea, the bibliographic metadata for the institutional repository item can be automatically imported. Connotea already does this for items bookmarked from several other sources, including Nature, PubMed, Science, Blackwell Synergy, Wiley Interscience and Amazon. Recognizing the importance of the content within institutional repositories, this new functionality will allow such content to be integrated and linked with the wider scientific literature.

OA legal scholarship at Lewis & Clark

Yesterday the Law Library at Lewis & Clark Law School launched a web site on Open Access Legal Scholarship at Lewis & Clark. From the site:
Today we introduce the latest addition to the law library web site - Open Access Legal Scholarship - a resource for those who are interested in learning more about open access scholarship and publishing, in both law and other fields.  It has been created by Professor Joseph Miller in conjunction with the Lewis and Clark Law School 2006 Spring Symposium, Open Access Publishing and the Future of Legal Scholarship [PS: held on March 10].  Open Access has been briefly described as “the electronic publication of scholarly work that is available for free without copyright constraints other than attribution.” Paul George, Members’ Briefing: The Future Gate to Scholarly Legal Information, AALL Spectrum (April 2005). See the Open Access introduction for an expanded discussion.  Our own law reviews have provided open access to their most recent issues, with Environmental Law publishing the full-text of Symposium: Ballot Measure 37: The Redrafting of Oregon’s Landscape (v.36, n.1 2006), and Lewis & Clark Law Review with Paper Symposium: Federalism After Gonzales v. Raich (v. 9, n.4 2006).  Included in our new Open Access Legal Scholarship section are:  [1] An Introduction to Open Access, [2] Blogs, [3] Core Documents, [4] Open Access Journals, [5] Projects and Gateways, [6] Self-Archive Repositories

Google working with publishers on paid-access plan for scanned books

Kimberly Maul, Publishers to Control Paid-Access Books Available Through Google, The Book Standard, March 10, 2006. Excerpt:
In an attempt to work with publishers and others opposed to the Google Book Search project, Google today announced its first plan for publishers to provide --for a price-- the full text of books online. Through the agreement, publishers can decide to have the full text of books available through Google’s program while the publisher still has control over the price—which they can change whenever they want. Google will take a portion of the profit, similar to an ad-revenue share model. “Virtually every partner we have spoken to has been extremely enthusiastic,” said Google executive Jim Gerber, Publisher’s Marketplace reported.

Comment. This development isn't directly related to OA, so I won't be covering it in depth. But it may increase the number of publishers willing to let Google digitize their books and, therefore, enlarge the corpus of book literature indexed for free, full-text searching. It won't improve our access for reading, but it will improve our access for searching.

More on libraries as publishers

Sarah E. Thomas, Publishing Solutions for Contemporary Scholars: The Library as Innovator and Partner, a PPT presentation at the 8th International Bielefeld Conference, 2006. Self-archived March 9, 2006.
Abstract: What can an academic library contribute to scholarly publishing? The Cornell University Library has engaged in a number of activities in the publishing realm that aim at increasing affordable, effective, widespread, and durable access to research. Cornell's Center for Innovative Publishing operates the arXiv, an e-print service for physicists, computer scientists, mathematicians, and others; Project Euclid, a journal hosting service for over 40 titles in math and statistics; and is developing, with Pennsylvania State University, DPubS, an open source publications management software. Cornell's DCAPS, or Digital Consulting & Production Service, assists in the transition of print to electronic through its digitization, metadata production, and consulting service. Digital publications are preserved according to a well-developed policy for digital archiving, ensuring ongoing access to information across time. The Cornell University Library's Center for Innovative Publishing is one manifestation of publishing activity undertaken by academic libraries as part of a movement to increase access to scholarship in an affordable manner, to ensure the ongoing availability of scholarly information in a way that is consistent with the traditional library role of preserving the record of our civilization from generation to generation, and which seeks to apply innovative techniques in the management and delivery of information to scholars.

Update. A new version of this article was self-archived on March 23, 2007.

LIS journals and permission for self-archiving

Anita Coleman, Self-Archiving and the Copyright Transfer Agreements of ISI-Ranked Library and Information Science Journals, a preprint. Self-archived March 10, 2006.
Abstract: This paper has been accepted for publication in the Journal of the American Society for Information Science and Technology. A study of Thomson-Scientific ISI ranked Library and Information Science (LIS) journals (n=52) is reported. The study examined the stances of publishers as expressed in the Copyright Transfer Agreements (CTAs) of the journals, towards self-archiving, the practice of depositing digital copies of one's works in an OAI-compliant open access repository. 62% (32) do not make their CTAs available on the open web; 38% (20) do. Of the 38% that do make CTAs available, two are open access journals. Of the 62% that do not have a publicly available CTA, 40% are silent about self-archiving. Even among the 20 journal CTAs publicly available there is a high level of ambiguity. Closer examination, augmented by publisher policy documents on copyright, self-archiving, and instructions to authors, reveals that only five (10% of the ISI-ranked LIS journals in the study) actually prohibit self-archiving by publisher rule. Copyright is a moving target but publishers appear to be acknowledging that copyright and open access can co-exist in scholarly journal publishing. The ambivalence of LIS journal publishers provides unique opportunities to members of the community. Authors can self-archive in open access archives. A society-led global scholarly communication consortium can engage in the strategic building of the LIS information commons. Aggregating OAI-compliant archives and developing disciplinary-specific library services for an LIS commons has the potential to increase the field's research impact and visibility. It may also ameliorate its own scholarly communication and publishing systems and serve as a model for others.

PS: This is an updated version of an article archived in January, blogged here 1/24/06.

Friday, March 10, 2006

India's Knowledge Commission launches its web site

India's National Knowledge Commission, launched in June 2005, now has its own web site. The site has separate pages for each of the NKC's five "focus areas": access to knowledge, knowledge concepts, knowledge creation, knowledge application, and knowledge services. The access to knowledge page doesn't mention OA, but comes close in these statements of commitment:
Information networks and a culture of information-sharing are required in sectors like education, health, agriculture, business, R&D, food distribution, disaster management, security, etc....National web-based portals need to be established as one-stop comprehensive sources of information on issues like water, sanitation, health, education, housing, nutrition, employment, etc. Technology and the Internet also have an important role in making the recently legislated Right to Information Act more effective in its implementation.

Comment. I have a few recommendations for the NKC suggestion box: (1) support a network of OA institutional repositories at India's universities and research centers, (2) require recipients of publicly-funded research grants to deposit their peer-reviewed manuscripts in these repositories, and (3) encourage these institutions to adopt their own local policies requiring researchers to deposit their research output in them, whether it is publicly-funded or not.

The case for using IRs for more than research eprints

Dorothea Salo, What's an IR for? Caveat Lector, March 8, 2006. Excerpt:
Arthur Sale’s risk assessment for institutional repositories is every bit as good as everyone says it is. Should be in every repository-rat’s documents drawer. In it, however, we find repeated the assertion that an IR should limit itself strictly to the peer-reviewed research literature of its target population. I still think that’s deeply wrong, but it’s up to me to defend my belief. The cited concern is cost. Further details are sketchy, but the general idea seems to be that doing “digital-library stuff,” whatever that is, requires a lot of technical jiggery-pokery that costs a lot of money, and loading that into an IR’s budget makes the IR look cost-ineffective, which creates the impression that OA is cost-ineffective....To put it briefly: if what you want is Greenstone, don’t use DSpace....Still, it does not follow that an IR is intrinsically poorly-suited to every conceivable digital-library need beyond archiving peer-reviewed research. To be a good fit with an IR, a project should consist of individual, self-sufficient pieces of work that don’t really need to be seen next to each other or manipulated during viewing by the patron....To me it seems absurd and arrogant to forbid a library that’s undertaken an IR project to use it for purposes that otherwise make sense but don’t consist of peer-reviewed literature....[T]he alternative --I speak frankly-- is an empty repository. It’s dead simple to set up an empty repository. A lot of people have. An empty repository strikes me as far more likely to be accused of misallocation of resources, fold, and threaten OA by folding, than a repository that has made itself useful in other ways besides holding on to peer-reviewed research....[T]he only way we get [to better times] is by enduring the current grim times long enough. Which means we can’t --absolutely cannot-- sit around with our IR doors barred to everything but peer-reviewed research while we wait for mandates that may never come.

More on OA to data

Glyn Moody, The Dream of Open Data, Open..., March 9, 2006. Excerpt:
Today's Guardian has a fine piece by Charles Arthur and Michael Cross about making data paid for by the UK public freely accessible by them. [PS: see my blogged excerpt.] But it goes beyond merely detailing the problem, and represents the launch of a campaign called "Free Our Data". It's particularly good news that the unnecessary hoarding of data is being addressed by a high-profile title like the Guardian, since a few people in the UK Government might actually read it. It is rather ironic that at a time when nobody outside Redmond disputes the power of open source, and when open access is almost at the tipping point, open data remains something of a distant dream. Indeed, it is striking how advanced the genomics community is in this respect. As I discovered when I wrote Digital Code of Life, most scientists in this field have been routinely making their data freely available since 1996, when the Bermuda Principles were drawn up. The first of these stated:
It was agreed that all human genomic sequence information, generated by centres funded for large-scale human sequencing, should be freely available and in the public domain in order to encourage research and development and to maximise its benefit to society.

The same should really be true for all kinds of large-scale data that require governmental-scale gathering operations. Since they cannot be feasibly gathered by private companies, such data ends up as a government monopoly. But trying to exploit that monopoly by crudely over-charging for the data is counter-productive, as the Guardian article quantifies. Let's hope the campaign gathers some momentum - I'll certainly be doing my bit.

Richard Poynder interview with Michael Hart

Richard Poynder has posted his interview with Michael Hart, founder of Project Gutenberg. This is the first installment of The Basement Interviews, Poynder's blog-based OA book of interviews with leaders of many related openness initiatives. Excerpt:
Immediately seeing the potential of the network as a revolutionary new medium for distributing information, Hart was soon typing in entire books, including the Bible, all of Shakespeare, and Alice in Wonderland. Thus was born Project Gutenberg — a project that rapidly turned into an ambitious scheme to make electronic copies of 10,000 out-of-copyright books freely available on the Internet. Hart's mission: "to break down the bars of ignorance and illiteracy." In retrospect Project Gutenberg was both prescient and revolutionary. In effect, Hart had become the first "information provider" twenty years before Tim Berners-Lee invented the Web, and at a time when there were, says Hart, just 100 people on the network....Since then the number of volunteers has grown from tens, to hundreds, to thousands, and today Project Gutenberg offers over 17,000 e-texts, all of which can be freely downloaded in a wide variety of formats. In addition, there are now national Project Gutenbergs in Australia, Germany, Portugal, Canada and the Philippines, and plans are under way to create local projects in Africa, Asia, and other regions too. New obstacles were to arise, however: while copyright had always posed a challenge for Hart, the 1998 Sonny Bono Copyright Term Extension Act — extending US copyright by a further 20 years — removed one million potential eBooks from the public domain in one fell swoop. With copyright now averaging 95.5 years, and creators no longer needing to register their copyright, Hart began to fear that the public domain could disappear altogether, undermining the raison d'être of what by then had become his life's mission....For Hart the stakes are high, since he views Project Gutenberg as more than just the first and largest distributor of public domain eBooks. 
In addition, he argues, it is a primitive example of a "replicator" (a reference to a Star Trek machine envisaged as being capable of copying any inanimate matter by rearranging subatomic particles), and therefore also a "lever to the Neo-Industrial Revolution."

RP:...How much is being lost to the public domain as a result of the Sonny Bono Act?

MH: The answer is about $10 trillion. This is based on a calculation of just one cent per book per lifetime, and assumes only about 15% of the world are readers, and only one million books are being lost to the public domain....
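Hart's figure checks out as back-of-the-envelope arithmetic. A minimal sketch, assuming a world population of roughly 6.5 billion (a figure for 2006 that is not stated in the interview; the other numbers are his):

```python
# Rough check of Hart's $10 trillion estimate.
world_population = 6.5e9   # people (assumed, ~2006; not from the interview)
reader_share = 0.15        # "only about 15% of the world are readers"
books_lost = 1e6           # one million books lost to the public domain
value_per_book = 0.01      # one cent per book per reader, per lifetime

readers = world_population * reader_share          # ~975 million readers
total_value = readers * books_lost * value_per_book

print(f"${total_value:.2e}")  # prints $9.75e+12, i.e. on the order of $10 trillion
```

The product of roughly a billion readers, a million books, and a cent per book-per-reader lands at about $10^{13}$, matching Hart's round figure.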

RP:...I'm going to trust your math! But what you are talking about, surely, is a product of an information explosion, not of any changes to the law?

MH: Right. But what we face is a situation in which the rise in the percentage of copyrighted information due to growth is being compounded by the constant extension of copyright terms. Put these two elements together and you end up with a situation in which well over 90% of all copyrights ever granted are still in force. In other words, if you combine the information explosion with the fact that the average copyright term has risen from about 15 years to around 95 years then the public domain — which was about 50% of everything ever written a century ago — will fall to around 0.00001% or less just a century into the future....

RP: What's the end game?

MH: The future mission is to create 10,000,000 eBooks and translate them into 100 different languages.

RP: Effectively, you plan to make every book in the public domain available as an e-text?

MH: Right. Once Project Gutenberg has a million items to offer, it should be an easier task to add the remaining 9 million items that it is estimated will become available in the public domain between 2010 and 2020. Then in its final stages Project Gutenberg will focus on collecting materials from all 100 languages, and disseminating them in other languages. So the eventual aim is to be able to offer 10 million eBooks in 100 languages to as many readers as possible.

Does OA depend on findability or vice versa?

Dean Giustini, Open access is impossible without findability, OA Librarian, March 9, 2006. Excerpt:
Open access (OA) advocates like Peter Suber and my colleagues here at OA Librarian do a marvelous job of documenting the progress of the OA movement. In a post-OA world, however, what about findability? What about the search side of the equation? Without search engines like Google, for example, what happens to easy findability? The problem is likely to be exacerbated as the web scales in size and complexity.  Authority destabilizes in open access models. I am thinking in terms of authority files in catalogues but also with respect to authoritative information. I grew up in a small suburb of Calgary, Alberta where authority was never questioned, where the World Book Encyclopedia was "what was right". For all its limitations, at least a ten year old could find the World Book confidently at the local public library. Can that same ten year old trust Wikipedia?  OA librarians need to spend time and intellectual energy thinking about OA advocacy beyond free information for all. Dismantling paid search, for example. Advocating for OpenSearch, as in PubMed, but not just in medicine. Finally, the future of open access models on the web must be flexible enough to accommodate new means of findability - i.e. algorithms, tagging, folksonomies, social software - but continue to build on the tried-and-true tenets of library science.

Comment. Thanks for the plug. I have a couple of nits to pick, however. (1) OA will enhance findability by making content open for indexing by all comers, from the established players to newcomers with innovative ideas. It's true that the adequacy of search is challenged by the rapid growth of the web, but it's also true that OA is a necessary condition for the adequacy of search in a rapidly growing web. OA does not depend on findability, if only because it always brings findability with it. It's more true to say that findability depends on OA. (2) Why does "authority destabilize in open access models"? I think Dean is mixing up OA and peer-review reform, which are independent projects. Authority may destabilize for OA resources that bypass peer review, or experiment with less rigorous or more fallible vetting models, like Wikipedia. But there's nothing intrinsic to OA that calls for abandoning or weakening peer review. On the contrary, all the major OA declarations agree on the importance of peer review. Because the rigor of peer review does not depend on the medium or price of a publication, an OA journal or encyclopedia can acquire the same kind and level of authority as the best non-OA resources. Good examples are PLoS Biology and the Stanford Encyclopedia of Philosophy. In short, Wikipedia is not the poster-child of OA! It mixes OA with a communal-review model that is not at all typical of the journal literature central to the OA movement.

Meeting on mass digitization

The meeting today and tomorrow in Ann Arbor, Scholarship and Libraries in Transition: A Dialogue about the Impacts of Mass Digitization Projects, will be webcast for those who can't attend. You can also follow the proceedings through the conference blog.

Publishers who resist Google indexing shouldn't pretend to speak for authors

Tom Evslin, John Battelle’s The Search and Google Book Search, Fractals of Change, March 7, 2006. Evslin interviews John Battelle. (Thanks to Ray Corrigan.) Excerpt:
While I was writing a review (to appear soon) of John Battelle’s prescient book The Search, I noticed something on the copyright page. Here it is:
The scanning, uploading, and distribution of this book via the Internet or via any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of the author's rights is appreciated.

This warning seems directly aimed at Google Book Search, a project which intends to scan the collections of some of the world’s great libraries and make them searchable online. Now you can find similar language on the copyright page of lots of books, but John Battelle is a known strong supporter of the value of having almost everything searchable, as anyone who reads either his book or his blog knows. So I emailed John and asked him about the apparent contradiction. He said the decision was the publisher’s (Penguin) decision to make but “I totally disagree with it.” Of course, at the time he signed his contract with Penguin, no one knew that this issue would exist. He readily agreed to talk to me about it.

Q: “Why didn’t Penguin want your book to be in Google Book Search?”

John: “They’re suing Google over Book Search. They’re part of the Publisher’s Association suit.”

Q: “What are they afraid of?”

John: “They’re afraid of the future. Afraid of what they don’t know…. It’s very irritating to me.” [...]

Q: “How do you think this issue will be resolved in the far future - not the lawsuit but the underlying issue?”

John: “Publishers should be service providers and let authors make these decisions.” He went on to say that, now that the Internet and fast computers exist, you don’t need to make decisions like this en masse; you don’t need huge corporate entities making a one-size-fits-all decision; authors themselves can choose from a myriad of options. This is one of the principles of his newly formed Federated Media.

Comment. (1) Battelle's solution is the simplest and easiest. Let authors decide. (2) At least publishers who make this decision without consulting authors, and over the dissent of authors, should not pretend to speak for authors. As Evslin points out later in the interview, "the last sentence of Penguin’s prohibition – 'Your support of the author’s rights is appreciated.' – seem particularly hypocritical." (3) Penguin has let Lawrence Lessig provide open access to the entire text of Free Culture under a CC license. Why can't it take the much smaller step of letting John Battelle let Google make his book searchable and discoverable?

Open courseware comes to the Open University

Alexandra Smith, OU to bring all course content online, The Guardian, March 10, 2006. Excerpt:
The Open University will become the first institution in the UK to put all its course materials online later this year, giving all students and teachers free access to study notes and reading lists. The university will select educational resources from all levels from access to postgraduate study and from a full range of subject themes, including arts and history, business and management, languages and science and nature. The material will be free to teachers and students studying in the UK and abroad, with the project following a long partnership with the BBC, which broadcasts the university's television programs....The university's vice-chancellor, Brenda Gourley, said the project would not only benefit the students studying at the university, but also students in countries where they were unable to access textbooks or quality course material. Prof Gourley said: "The philosophy of open access and sharing knowledge is a wonderful fit with the founding principles of the Open University and with the university's very strong commitment to opening up educational access and widening participation....Prof Gourley said the Open University would be the first in the UK to offer open content material on the internet, following the lead of several US institutions...."[Open courseware] is definitely a movement that is really going to change universities," Prof Gourley said....The £5.65m project will be partly funded by a US$4.45m (£2.56m) grant from the William and Flora Hewlett Foundation in the US. The Open University has more than 210,000 students studying courses this year, with around 40,000 studying outside the UK. The online project will start in October.

Update. The first edition of the Guardian story, quoted above, was in error to report that "all" OU's course materials would be part of the new project. The Guardian has since rewritten its story. Also see the OU press release. (Thanks to Marc Eisenstadt.)

Thursday, March 09, 2006

More on strengthening the NIH policy

Rick Weiss, Government Health Researchers Pressed to Share Data at No Charge, Washington Post, March 10, 2006. Excerpt:
Political momentum is growing for a change in federal policy that would require government-funded health researchers to make the results of their work freely available on the Internet. Advocates say taxpayers should not have to pay hundreds of dollars for subscriptions to scientific journals to see the results of research they already have paid for. Many journals charge $35 or more just to see one article -- a cost that can snowball as patients seek the latest information about their illnesses. Publishers have successfully fought the "public access" movement for years, saying the approach threatens their subscription base and would undercut their roles as peer reviewers and archivists of scientific knowledge. But the battle lines shifted last month when a National Institutes of Health report revealed that a compromise policy enacted last spring -- in which NIH-funded scientists were encouraged but not required to post their findings on the Internet -- has been a flop. Less than 4 percent filled out the online form to make their results available for public viewing.

Now a key federal advisory committee has recommended that scientists who receive NIH grants be required to post their results within six months of publication. And the Senate is considering legislation that would mandate such disclosures for an even broader array of federally funded scientists...."We think it is too early to jump into a mandatory system," said James Pringle of the Publishing Research Consortium, a loose-knit group created to fight the public-access movement. It is not just profit-hungry publishers who object to mandatory public access, opponents emphasize. Some nonprofit scientific and professional societies fear that without the income they receive from their research journals they will no longer be able to finance their educational and training programs. "We make money off our journals, but it all goes back to enhance publishing and to enhance the needs of our scientific community," said Martin Frank, executive director of the Bethesda-based American Physiological Society, which publishes 14 journals. The society runs an award-winning mentoring program for minority scientists and educational programs for elementary schools and high schools....

A National Library of Medicine working group concluded last November that scientists are well aware of the voluntary program -- the NIH has sent multiple e-mails to grant recipients, published a pamphlet and posted details on the Web -- and that the online submission system works well. The best way to boost compliance, a majority of the group concluded, is to make it mandatory. In February, the library's Board of Regents made a formal recommendation to Zerhouni that grant recipients be required to post their papers within six months after publication -- with some "flexibility" for infrequently published journals that might be hurt by free access to their contents within six months. Ruiz Bravo said the agency is considering the recommendation, but the publishing consortium is fighting back with data of its own. The group recently commissioned a survey of 1,128 scientists. It concluded that although 85 percent of scientists "have heard of" NIH's public access effort, only 18 percent know "a lot" or "quite a lot" about it. That suggests NIH could still do more to promote the voluntary policy, Pringle said....

Public-access advocates say opponents are simply stalling. "It has to be mandatory," said Rich Roberts, chief scientific officer of New England BioLabs in Ipswich, Mass. -- one of many who think that most scientists will not get around to posting their work unless they are told they must. Some in Congress appear to agree. After years of asking NIH to encourage public access, Sens. Joseph I. Lieberman (D-Conn.) and Thad Cochran (R-Miss.) upped the ante in December by introducing the American Center for Cures Act. It would require recipients of grants -- not only from the NIH but also from the Centers for Disease Control and Prevention and the Agency for Healthcare Research and Quality -- to post their final manuscripts within six months after publication, or risk losing funding. That is an option that makes publishers cringe. But it could get worse. A spokesman for Sen. John Cornyn (R-Tex.) said last week that the senator has been mulling over broader language that would compel public disclosure of research findings from an even greater number of federal agencies, including the Environmental Protection Agency and the National Oceanic and Atmospheric Administration. With that option looming, the National Library of Medicine's recommendation -- which applies only to NIH-funded research -- could start to look good to publishers.

Update. Rick Weiss' story made it to the news blog of the Chronicle of Higher Education, where it will be seen by academics who missed it in the Post.

More on the PRC study of the NIH compliance rate

Susan Morrissey, NIH Public Access Policy Is Having Little Impact, Chemical & Engineering News, March 9, 2006. Excerpt:
Although about 85% of NIH-funded researchers say they have heard about NIH’s policy on public access to research articles, only 18% of them report knowing specific details, according to a study by the Publishing Research Consortium (PRC), an international group of publishers and scientific societies. The survey of 1,128 journal authors was conducted in January. It focuses on how well authors who publish in the life sciences and medical journals understand NIH’s public-access policy. That policy, issued in May 2005, asks NIH-funded researchers to voluntarily post their manuscripts on PubMed Central, the agency’s online database, within one year of publication. The survey results also indicate that a lack of understanding about the policy has resulted in low submission rates: 24% of the NIH-funded authors surveyed reported that they have submitted a full manuscript to PubMed Central. Another 43% said they intend to do so in the future. Only 3% said they don’t plan to post manuscripts on the database. “As publishers, we are committed to working with NIH in improving dissemination of and enhancing access to scientific and medical research,” said PRC Chairman Robert Campbell in a statement, adding that the publishing consortium will work with NIH to facilitate author compliance.

PS: See my blogged comment on this study from 3/2/06.

Calling for OA to publicly-funded geospatial data in the UK

Charles Arthur and Michael Cross, Give us back our crown jewels, The Guardian, March 9, 2006. (Thanks to Glyn Moody.) Excerpt:
Imagine you had bought this newspaper for a friend. Imagine you asked them to tell you what's in the TV listings - and they demanded cash before they would tell you. Outrageous? Certainly. Yet that is what a number of government agencies are doing with the data that we, as taxpayers, pay to have collected on our behalf. You have to pay to get a useful version of that data. Think of Ordnance Survey's (OS) mapping data: useful to any business that wanted to provide a service in the UK, yet out of reach of startup companies without deep pockets. This situation prevails across a number of government agencies. Its effects are all bad. It stifles innovation, enterprise and the creativity that should be the lifeblood of new business. And that is why Guardian Technology today launches a campaign - Free Our Data. The aim is simple: to persuade the government to abandon copyright on essential national data, making it freely available to anyone, while keeping the crucial task of collecting that data in the hands of taxpayer-funded agencies. One government makes the data it collects available free to all: the United States. It is no accident that it is also the country that has seen the rise of multiple mapping services (such as Google Maps, Microsoft's MapPoint and Yahoo Maps) and other services - "mashups" - that mesh government-generated data with information created by the companies. The US takes the attitude that data collected using taxpayers' money should be provided to taxpayers free. And a detailed study shows that the UK's closed attitude to its data means we lose out on commercial opportunities, and even hold back scientific research in fields such as climate change....

In a seminal piece of research into the real cost of charging for access to public data, the late Peter Weiss, of the US National Weather Service, compared open and closed economic models for public sector data. His paper, Borders in Cyberspace: Conflicting Public Sector Information Policies and their Economic Impact, is online. He quoted a 2000 study for the European Commission carried out by Pira International, which noted that "the concept of commercial companies being able to acquire, at very low cost, quantities of public sector information and resell it for a variety of unregulated purposes to make a profit is one that policymakers in the EU find uncomfortable." But why? Pira pointed out that the US's approach brings enormous economic benefits. The US and EU are comparable in size and population; but while the EU spent €9.5bn (£6.51bn) on gathering public sector data, and collected €68bn selling and licensing it, the US spent €19bn - twice as much - and realised €750bn - over 10 times more. Weiss pointed out: "Governments realise two kinds of financial gain when they drop charges: higher indirect tax revenue from higher sales of the products that incorporate the ... information; and higher income tax revenue and lower social welfare payments from net gains in employment."

The Office of Fair Trading is preparing a report on public sector information, due this summer, which will "look at whether or not the way in which public sector information holders (PSIH) supply information works well for businesses. It will examine whether PSIHs have an unfair advantage selling on information in competition with companies who are reliant on the PSIH for that raw data in the first place." Though it may already be shooting for an open goal, we urge the OFT to compare the UK with the US; read Peter Weiss's paper; and then, finally, to free our data.

The case for OA, especially in South Africa

Allison Moller, The case for open access publishing, with special reference to open access journals and their prospects in South Africa, Master's thesis, Department of Library and Information Science, University of the Western Cape (South Africa), 2006. Abstract:

Open access publishing is an initiative that aims to provide universal, unrestricted free access to full-text scholarly materials via the Internet. This presents a radically different approach to the dissemination of research articles that has traditionally been controlled by the publishing enterprise that regulates access by means of subscriptions and licences fees levied on users, predominantly academic libraries.

In presenting the case for open access publishing, the thesis explores the contemporary research environment, changing modes of knowledge production, the problems associated with the existing academic journal system, and the subsequent growth of the open access movement as an intervention to reclaim scientific communication. It highlights the ways in which open access better answers the requirements of researchers, funders, governments, and society more broadly. Free access to publicly funded scientific research is more democratic and is necessary for knowledge dissemination and production in a knowledge economy, particularly for developing countries such as South Africa. Attention is drawn to the ways that open access intersects with the ethical norms guiding the practice of research, with the idea of information as a public good, and with other parallel initiatives that resist the enclosure of knowledge through excessive copyright legislation.

The study also closely interrogates the economic viability of open access journals, and shows how the ‘author pays’ model represents a reasonable approach, but by no means the only one available to publishers considering the transition to open access. Sections are also devoted to examining the impact potential of open access articles and the ways in which open access journals can achieve greater permanence.

The main research question centres on the feasibility of open access journals becoming widespread within the South African research system. The study presents the findings of an investigation undertaken to assess the current awareness, concerns and depth of support for open access amongst South African stakeholders. Separate questionnaires were distributed to government departments, research councils, research funders, research managers within universities and a sample of published authors from biomedical fields.

The conclusion recommends proactive engagement by faculty librarians and organized advocacy on the part of LIASA to promote the cause of open access within South Africa. It further calls for government to mandate open access to publicly funded research as a more democratic, cost-effective and strategic intervention to promote South African science. The gains to be won are particularly relevant for present challenges: training a new generation of researchers and scholars, and stimulating knowledge production and its subsequent application to solve the nation’s developmental needs.

Another defense of Google Library

Victor Keegan, To scan or not to scan, The Guardian, March 8, 2006. Excerpt:

The University of Michigan used to keep its library under lock and key. Students were allowed in once a week, but needed the librarian's permission before they could touch a book. Now, things are different. The university has given Google co-founder Larry Page (a Michigan alumnus) permission to digitise every one of its 7m volumes, making them available through Google Book Search to anyone in the world with an internet connection. Other institutions including Oxford University’s Bodleian Library and the Library of Congress are also involved in the exercise, which has mind-boggling implications for access to knowledge for everyone from Alaska to deepest Africa.

Who could possibly object to this? Publishers, of course.

Like the music industry when it was first faced with digital downloads, they are ordering the waves of technological progress to go away and not disturb their cosy world. Publishers in the US are suing Google over copyright; in his World Book Day address Bloomsbury's Nigel Newton even called for a boycott of Google's search engine....Would publishers object if Google's project led to an increase, rather than a decrease, in book purchases? I think not. There are already signs in America that Google Book Search is leading to a strong rise in demand for out-of-print books (although unless traditional publishers get their acts together the fruits of this boom may go to the new breed of print-on-demand publishers). I would be amazed if the same did not happen to books in copyright. So let American publishers sue to find out what "fair use" means. Doubtless the case will go to appeal; by the time it ends they, like music publishers before them, may experience a surge in demand for their books, especially those not readily available in bookshops. If, for example, someone searching on Google on a subject in which they are interested unexpectedly comes across a relevant book, reads a bit and orders a copy, one more book is sold, providing income to publisher and author and revenue for Google from contextual advertising. The search engine has undoubtedly been arrogant in having an opt-out rather than opt-in policy for the authors of the books it has scanned, but there is a strong public interest in bringing the millions of books lying fallow in libraries to the world's attention. A colleague was delighted to find that his copyrighted but out-of-print book was featured on Google Book Search, even though no one had asked his permission. Humankind is the winner.

More on Quaero

Jennifer Abramsohn, Europe's Quaero Project Aims to Challenge Google, Deutsche Welle, March 9, 2006. Excerpt:
Some Europeans are concerned about US hegemony in the worldwide information market. Now France -- and maybe Germany -- aims to develop a Eurocentric alternative to the dominant Internet search engine, Google....few people doing a search on Google or one of its competitors (Yahoo, MSN, and Alta Vista make up most of the remaining 10 percent of the market) give much thought to the fact that they are using the services of a for-profit US company....But for Wolfgang Sander-Beuermann, head of the search-engine research lab at the University of Hanover and part of a growing cadre of European Google-skeptics, the situation is downright dangerous. "Google is on the way to becoming the most global media power that ever existed on earth, and the potential for misusing it is so enormous it cannot be accepted," said Sander-Beuermann, who also founded the nonprofit Association for the Promotion of Search Engine Technology and Free Access to Knowledge (SuMa)....Another man on the case is French President Jacques Chirac, who is pushing an initiative for France, together with Germany and perhaps other EU nations, to develop a European search engine to compete with the American behemoth. "We must take up the challenge presented by American giants like Google and Yahoo. There is the threat that tomorrow, what is not available online will be invisible to the world," Chirac said in a presidential address to the nation at New Year's. His project, spurred by the French government and headed by an Internet information company called Exalead, runs under the name Quaero (Latin for "I seek"). Germany's only link to Quaero at present is an agreement between a German meta-search engine and Exalead. But several technology and information companies are considering the project.
They include Siemens, Bertelsmann, and public broadcaster ARD (which Deutsche Welle and this Web site are part of)....Hendrik Speck is a lecturer in computer science at the University of Kaiserslautern, and is a strong Quaero activist. He seconds Sander-Beuermann’s concerns, and cites the example of searching for the keyword "Troy" on Google. "We all know that keyword has a rich cultural history. But at Google, most of the first results are a description of a second-class Hollywood production featuring a third-rate actor," Speck said, referring to the movie starring Brad Pitt....It is not yet clear how much European investors will ultimately be willing to spend, but their quest to build a rival to Google is likely to be very costly. Google spent $7.5 million (6.3 million euros) on one lab with 10 students in its development stages, "which is more than some of our universities have over here," Speck said. Meanwhile, the US search engine behemoth continues to spend $400 million annually on research and development.

What counts as an OA journal?

Jan Velterop, What is an OA Journal? The Parachute, March 8, 2006. Excerpt:
"Currently, the ISI Web of Knowledge includes 298 Open Access journals", according to Thomson Scientific. We also have the Directory of Open Access Journals (DOAJ), reporting (March 8, 2006) that it includes 2089 OA journals.  What, however, are 'Open Access Journals'? Do they exist? What's the definition? Journals that publish OA articles, or journals that publish only OA articles? Same question with regard to Open Access Publishers.

What does exist is publishers who publish journals in which open access articles appear. Not necessarily all the articles in a journal and not necessarily all the journals in a publisher's portfolio. Why the distinction? Well, by focussing on exclusively OA journals or OA publishers one risks overlooking - no, one overlooks - all the open access articles that are published in journals that are not exclusively open access. This was already foreseen in the Bethesda Statement on Open Access Publishing, in which the definition of open access carries the following rider: "Open access is a property of individual works, not necessarily journals or publishers."

There is a fundamental issue here. Thinking in terms of 'journals' can be rather misleading, simply because of their extreme variability....It can mislead to notions such as 'OA journals are less/more prestigious than non-OA journals', or 'one is used less/more than the other'....Journals are 'tags', 'labels', classifying, organising, tools. Lumping them and counting them and averaging them is fine as long as we realise that what we are concocting is a potage that may actually obfuscate rather than elucidate what the situation is regarding the constituent 'molecules' of scientific discourse: the articles.

Comment. Good point. BioMed Central's journals, for example, are unmistakably OA, but some of them include non-OA commissioned content, like review articles, alongside OA research articles. One property of OA journals is that they provide OA to their OA articles themselves and don't merely permit authors to do it through OA archiving. But that doesn't settle the question whether a certain portion of a journal's articles must be OA for the journal itself to be considered OA. It would be tempting to conclude that "full OA journals" and "hybrid OA journals" differ only in degree, not in kind. But that's not quite accurate either, since there's an important difference, in kind, between journals that let authors choose between OA and TA and journals that have already decided to make all their articles (of a certain kind) OA.

Report on ERIC Users Group Meeting

The ERIC Users Information Exchange has posted a report on the ERIC Users Group Meeting at the ALA Midwinter 2006 meeting.

PubChem keeps growing

Increasing the diffusion rate of scientific knowledge

Walt Warnick, Global Discovery: Increasing the Pace of Knowledge Diffusion to Increase the Pace of Science, a talk at the AAAS annual meeting, February 16–20, 2006. Warnick is the Director of the US Department of Energy's Office of Scientific and Technical Information. Excerpt:
Science is all about the flow of knowledge....According to the National Science Foundation, there are over 2.5 million research workers worldwide, with more than 1.2 million in the U.S. alone. If we look at all the articles, reports, emails and conversations that pass between them, we could count billions of knowledge transactions every year. This incredible diffusion of knowledge is the very fabric of science. Given that the diffusion of knowledge is central to science, it behooves us to see if we can accelerate it. We note that diffusion takes time. Sometimes it takes a long time. Every diffusion process has a speed. Our thesis is that speeding up diffusion will accelerate the advancement of science....Currently it is difficult for researchers, who primarily track journals within their specific discipline, to hear about discoveries made in distant scientific communities. In fact, diffusion across distant communities can take years. In contrast, within an individual scientific community, internal communication systems are normally quicker. These include journals, conferences, email groups, and other outlets that ease communication. Many communities use related methods and concepts: mathematics, instrumentation, and computer applications. Thus there is significant potential for diffusion ACROSS communities, including very distant communities. We see this as an opportunity....Diffusion to distant communities takes a long time because it often proceeds sequentially, typically spreading from the community of origin (A) to a neighbor (B), then to community (C), a neighbor of B, and so on. This happens because neighboring communities are in fairly close contact. Science will progress faster if this diffusion lag time is diminished. The concept of global discovery is to transform this sequential diffusion process into a parallel process....We are particularly interested in recent work that applies models of disease dynamics to the spread of scientific ideas.
The spread of new ideas in science is mathematically similar to the spread of disease, even though one produces positive results, the other negative. Our goal is to foster epidemics of new knowledge....Looking at these models has led us to focus on a parameter called the contact rate. In the disease model, this is the rate at which people come into contact with a person who has the disease. Increasing the contact rate speeds up the spread of the disease....To [increase the contact rate for knowledge] we must reduce a huge gap in how the Internet works today....Analysts estimate that perhaps 99 percent of all the Web-accessible scientific documents are in deep Web databases. Because these documents are not accessible to search engines and robots, this creates a huge gap in knowledge searchability. The problem of accessing all this deep Web science mirrors the problem of diffusion across distant communities. This is because many of the deep Web databases are maintained within specific communities, including specialized journals, scientific societies, university departments, or with individual researchers. Within each community the deep Web document repositories are typically well known. But they are hard for a scientist in a distant community to find. Worse, once found, each repository must be searched sequentially, making widespread search prohibitively difficult....We have begun to close this gap and solve the sequential search problem. Conceptually the solution is simple. It is simultaneous deep Web search with integrated ranking of results. All it takes is virtual aggregation or federation of diverse deep Web databases. The federated databases are searched in parallel, not sequentially. This greatly increases the contact rate across distant communities, speeding up the diffusion of new knowledge. We call this result Global Discovery. It means making each original discovery globally available. 
Federated deep Web search transforms local discovery into global discovery. While the concept is simple, making it a reality is not. The current challenge of metasearch is that the number of databases that can be searched simultaneously is limited. That's a tough problem to solve, and one that we're working on....When trying to integrate information from diverse sources, it is important to avoid adding burdens to information owners. The history of information management has seen a number of instances where seemingly promising efforts to integrate information have been hampered because too few information owners signed on: Government Information Locator System (GILS), Open Archives Initiative (OAI), Institutional Repositories, and others. While DOE adopted the protocols advanced by these efforts, too often few other information owners did so. Our view is that these efforts stumbled because they placed demands on information owners who did not enjoy the benefits. In contrast, we believe that those who seek to integrate information from diverse sources need to bear the burdens themselves.
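Warnick's disease-dynamics analogy can be sketched in a few lines of code. The model below is a hypothetical illustration, not taken from the talk: a simple well-mixed susceptible-infected model in which "infection" stands for exposure to a new result, and the `contact_rate`, `transmission`, and population figures are invented parameters. The point it demonstrates is the one the talk makes: raising the contact rate (as federated search aims to do) makes the idea saturate the community much faster.

```python
# A sketch of the epidemic model of knowledge diffusion: each person
# who is "aware" of an idea contacts `contact_rate` others per step,
# and a contact with an unaware person transmits the idea with
# probability `transmission`. All parameter values are illustrative.

def spread(contact_rate, population=10_000, seed_aware=10,
           transmission=0.5, steps=50):
    """Return the fraction of the population aware of the idea
    at each step, using a well-mixed susceptible-infected model."""
    s = population - seed_aware   # susceptible (unaware)
    i = seed_aware                # "infected" (aware)
    history = []
    for _ in range(steps):
        # expected new exposures this step; capped so s never goes negative
        new = min(s, transmission * contact_rate * i * s / population)
        s -= new
        i += new
        history.append(i / population)
    return history

low = spread(contact_rate=1.0)    # sequential, community-by-community diffusion
high = spread(contact_rate=3.0)   # higher contact rate, e.g. via federated search
```

Comparing `low[10]` and `high[10]` shows the tripled contact rate reaching most of the population while the low-rate run is still in single-digit percentages, which is the "parallel rather than sequential diffusion" claim in miniature.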

Profile of SPARC

Heather Joseph, The Scholarly Publishing and Academic Resources Coalition: An evolving agenda, C&RL News, February 2006. Heather Joseph is the executive director of SPARC. Excerpt:
SPARC is, first and foremost, a strategic organization, and its agenda and programs have therefore evolved over its lifetime. In this article, I will sketch SPARC’s past accomplishments, outline ways that the organization has evolved, and share a sense of the direction SPARC will move in during the coming year....SPARC was created by the Association of Research Libraries in 1998, to serve as a catalyst for action and to reduce barriers to the access and use of information. As an alliance of more than 200 academic and research libraries, SPARC’s mission is to correct imbalances in the scholarly publishing system that have driven up the cost of scholarly journals and diminished the community’s ability to access information. At the core of our mission is the belief that these imbalances inhibit the advancement of scholarship and are at odds with fundamental needs of scholars and the academic enterprise. Since 2002, SPARC’s highest priority and most visible activity has centered on advancing the goal of open access to scholarly literature, and this will continue to be our main focus....

To achieve its mission, SPARC’s activities center around three program areas: educating stakeholders on issues in scholarly communication, advocating policy changes that support the potential of digital systems to advance scholarly communication, and incubating market-based initiatives that demonstrate business and publishing models that advance changes benefitting scholarship and the academy....SPARC’s campaign promoting awareness and adoption of open access has been, and will continue to be, the most visible of our educational projects. In addition to using traditional print channels, SPARC supports a series of rich Web-based resources articulating the benefits of open access. The “SPARC Open Access Newsletter” and the “Open Access News Blog,” both created and edited by Peter Suber, are vibrant channels that provide information on open access activities worldwide. Updates appear in the blog on a daily basis, and these are supplemented by Suber’s thoughtful and thorough analyses in the monthly newsletter.

SPARC’s advocacy program has risen to the forefront of our activities. Initially the program focused on outreach targeted at stakeholder groups internal to scholarly communication (faculty and editorial boards), along with communications and public relations activities. During the past two years, however, it has been greatly expanded to include an extremely active public policy focus. SPARC has gained national and international attention with its active advocacy work for open access. SPARC has been outspoken in support of policies related to public access to federally funded research results, in particular the recently implemented NIH Public Access Policy.... The focus on public access to federally funded research led SPARC to spearhead the formation of the Alliance for Taxpayer Access, a unique alliance of leading library groups, public interest organizations, and patient advocacy groups....SPARC will actively seek to back initiatives that explicitly recognize that dissemination is an essential, inseparable component of the scientific research process and that address questions of access to data as well as to the primary literature.

Update. Also see Heather's presentation at the UBC Library / SLAIS Colloquium, University of British Columbia Library, SPARC Futures: an evolving agenda.

Wednesday, March 08, 2006

How well do search engines index the OA repositories?

Frank McCown and three co-authors, Search Engine Coverage of the OAI-PMH Corpus, IEEE Internet Computing, March/April 2006.
Abstract: The major search engines are competing to index as much of the Web as possible. Having indexed much of the surface Web, search engines are now using a variety of approaches to index the deep Web. At the same time, institutional repositories and digital libraries are adopting the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to expose their holdings, some of which are indexed by search engines and some of which are not. To determine how much of the current OAI-PMH corpus search engines index, we harvested nearly 10M records from 776 OAI-PMH repositories. From these records we extracted 3.3M unique resource identifiers and then conducted searches on samples from this collection. Of this OAI-PMH corpus, Yahoo indexed 65%, followed by Google (44%) and MSN (7%). Twenty-one percent of the resources were not indexed by any of the three search engines.
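The harvesting step behind a study like this can be sketched roughly as follows. This is a hypothetical illustration, not the authors' actual pipeline: the repository, record, and URL below are invented, and a real harvester would page through `<base-url>?verb=ListRecords&metadataPrefix=oai_dc` over HTTP, following `resumptionToken` elements. The sketch parses one OAI-PMH ListRecords response with Python's standard library and extracts the resource identifiers (URLs) one could then sample and query against the search engines.

```python
# Parse an OAI-PMH ListRecords page and pull out, for each record,
# the OAI identifier and the dc:identifier (often the resource URL).
import xml.etree.ElementTree as ET

# A tiny, invented ListRecords response for illustration.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1234</identifier>
        <datestamp>2006-03-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample eprint</dc:title>
          <dc:identifier>http://example.org/eprints/1234</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def extract_resources(xml_text):
    """Return (oai-identifier, resource-url) pairs from one ListRecords page."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        oai_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
        url = record.findtext(".//dc:identifier", namespaces=NS)
        pairs.append((oai_id, url))
    return pairs

pairs = extract_resources(SAMPLE_RESPONSE)
# → [('oai:example.org:1234', 'http://example.org/eprints/1234')]
```

Repeating this over 776 repositories and deduplicating the extracted URLs is what yields a corpus of unique resource identifiers whose search-engine coverage can then be sampled.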

U of Tennessee libraries launch all-OA academic press

Newfound Press is a new digital imprint from the University of Tennessee Libraries. All its publications will be OA. From the site:
Today’s scholarly publishing environment presents a strategic opportunity for academic libraries to expand their role in the publications process. Universities are both creators and consumers in the information economy. A digital library press offers the potential for making scholarly and specialized resources widely available at a reasonable cost. The University of Tennessee Libraries is developing a framework to make scholarly and specialized works available worldwide. Newfound Press, the University Libraries digital imprint, advances the community of learning by experimenting with effective and open systems of scholarly communication. Drawing on the resources that the university has invested in digital library development, Newfound Press collaborates with authors and researchers to bring new forms of publication to an expanding scholarly universe. We consider manuscripts in all disciplines, encompassing scientific research, humanistic scholarship, and artistic creation.

It will publish OA journals as well as OA books and OA multimedia scholarship. It asks only for non-exclusive rights from authors and offers CC licenses as an option. It works in partnership with the University of Tennessee institutional repository. And it links to Open Access News from the front page, from the phrase, "Open Access. What's in it for us?"

PS: Kudos to the UT librarians! I wish this enterprise well and hope other universities consider testing the same waters.

Update. Also see Scott Teague's article about the launch in the Daily Beacon Online.

More on Sun's open-source education initiative

Darryl K. Taft, Sun to Open Source Education, ExtremeNano, March 7, 2006. Excerpt:
Sun Microsystems is taking a cue from its successes with open source to help shape the future of education and bridge the digital divide, according to the company's chief executive, Scott McNealy. In a speech at Sun's WWERC (Worldwide Education and Research Conference) here [in NY] on March 7, McNealy said Sun has spun out its GELC (Global Education and Learning Community) effort into a nonprofit organization aimed at delivering self-paced, Web-based, free and open content --including curriculum, resources and assessment-- for the K-12 segment. Or, as McNealy put it, GELC is "open-sourcing education." McNealy said, "[The] opportunity here is to apply all the community development to textbooks, curriculum and assessment for K-12. So with the help of some folks at Sun we created the GELC, with 2,700 members worldwide and 370-plus projects."

From a Sun press release (March 7, 2006):

Sun broke new ground in free and open-source computing with the creation of this nonprofit, which aims to meet the needs of students by sharing best practices globally. The group named an executive director at the conference, Dr. Barbara "Bobbi" Kurshan, formerly President of Educorp Consultants Corporation, and co-CEO of Core Learning Group, Private Equity Fund. The director will lead an advisory board with representatives from nearly every continent to extend the vision for this group. The GELC Executive Director directs all activities of the GELC, including managing the various working groups, monitoring technical developments, overseeing the education community process, managing the creation of GELC specifications and representing the GELC to external organizations.

Wiley on the RCUK policy and Commons debate

In the press release accompanying its third-quarter revenue report, John Wiley & Sons made a point of saying:
In December, the U.K. Parliament conducted a debate on the Science and Technology Select Committee's report on scientific publications, and reiterated its position that the government should not intervene in the market nor fund institutional repositories.

Comment. Wiley must think this news is relevant to the value of its stock. But if so, then it should be careful about drawing attention to it and then misreporting it. For in fact, the U.K. Parliament did not oppose the funding of institutional repositories. Some members did and some members didn't; there was no vote or other resolution. See the transcript of the December debate. Moreover, funding institutional repositories is less important than mandating that publicly-funded researchers deposit their peer-reviewed manuscripts in them. That is still the policy proposed by the RCUK.

More on the Digital Universe

Web site aims to be research 'storehouse', eSchool News, March 7, 2006. An unsigned news story. Excerpt:
A new internet research tool called Digital Universe aspires to be a more authoritative version of Wikipedia. If successful, it could provide scholars and students with one more option for finding accurate, reliable information online. Skeptics, however, predict that Digital Universe is too ambitious for long-term success....It's a lofty ambition --the internet equivalent of the Public Broadcasting Service, its founders say, a user-supported resource that pays top academics to create authoritative maps, articles, and links to third-party content related to virtually any scholarly topic. But the vast scope of the project hasn't stopped former high-flying Silicon Valley entrepreneur Joe Firmage from building Digital Universe, a commercial-free internet research clearinghouse four years in the making....A pilot version that debuted in January includes 50 or so portals, or entry points, on topics such as technology, the Earth, and the solar system. Firmage says it will mushroom to at least 500 portals by next year and 10,000 by 2011. Clicking on the Earth portal, for example, presents the visitor with links, reportedly vetted by experts for accuracy, to related articles, images, lists of frequently asked questions, and other resources from sites such as NASA and the University of Hawaii's department of geology and geophysics. The Earth portal is also a jumping-off point to sub-portals on topics such as the atmosphere and hydrosphere, which in turn provide links to vetted content and further sub-portals.
The approach is designed to give visitors a graphical means to find topics and understand how they are related to subjects in another category....Firmage and his backers say Digital Universe's biggest asset is the trust readers will feel knowing that every link, graphic, and article has been vetted by an army of academics....The site has been under construction since 2002 by Scotts Valley, Calif.-based ManyOne Networks, a 56-employee company that has received about $10 million in financing from Firmage and angel investors. ManyOne Networks has been recruiting professors to become "stewards" of each portal and building offerings such as eMail services to generate revenue. Digital Universe seeks to improve on the ground broken by Wikipedia, the online encyclopedia that allows anyone to contribute and edit articles. Wikipedia's volunteer model offers an impressive body of content, boasting 1 million articles in English on everything from art deco to nuclear physics. But Wikipedia's open system also has led to the publication of fraudulent articles, and authors sometimes have undisclosed conflicts of interest, critics have charged. Instead of relying on anonymous volunteers, Digital Universe will pay experts, mostly academics, to write encyclopedia articles and to round up outside video, audio, online chats, and other resources. Firmage has pledged that access to basic content on Digital Universe will remain free forever and that it will never include ads. To fund the venture, the site will sell monthly subscriptions that let visitors get additional content and features, many of them offered by for-profit third parties, such as film producers, game makers, map providers, and book publishers. "Imagine how many people would be interested in subscribing for $7.95 per month to get all those additional activities," Firmage said. He predicts the site will have at least 10 million paying subscribers within seven years. 
(At the end of February, it was reported, Digital Universe had more than 10,000 subscribers.)...Academics and others contributing content will get 25 percent of the proceeds, but the money isn't the only motivation for participating, said Peter Saundry, a physicist with the nonpartisan National Council for Science and the Environment. He heads the group responsible for Digital Universe's environmental portal. "At every scientific meeting you ever go to on any subject, one thing you hear is the general public doesn't understand what we're doing," Saundry said. "This now is a tool for the scientific community to [help inform the public]."

An OA guide to performing abortions

Alex Steffen, Open Access and Reproductive Rights, WorldChanging, March 7, 2006. Excerpt:
Yesterday, South Dakota banned abortion. Normally, we'd steer clear of a hot-button topic like abortion, but this law has also triggered a small firestorm around the blog of a woman named Molly, who last week put issues of open access to scientific knowledge in sharp relief by publishing a guide to setting up a cheap, safe, mobile abortion clinic for use in places where abortion has been criminalized....Like other principles of free expression, open access to scientific knowledge often seems absurdly removed from our lives. Molly shows just how tangible such knowledge can be. Science is, above all else, a moral commitment to openly and freely discussing the actual functioning of the universe (and of our bodies). People went to the stake to make science a going concern. Not all that long ago, the information Molly is sharing would have made her a criminal in many countries, just as sharing information on contraception, or evolution, or the fact that the Earth moves around the Sun all once made scientists criminals. What knowledge now being acquired will politicians take it on themselves to criminalize in the future?

New way to browse and search arXiv

Xstructure is a new way to browse and search arXiv. (Thanks to Richard Akerman.) From the site:
Among the features of this service are: [1] Automated generation of a hierarchical classification scheme for the papers. The scheme results from classification of the papers in the arXiv database, performed by the EqRank algorithm. The only input for the classification is the citation graph. The number of levels in the hierarchy and the number of clusters are determined by the algorithm. Generally, there are no external parameters (e.g., a preset list of clusters) in the algorithm. The algorithm creates the classification scheme and indexes the papers by the created classification; [2] The classification is used to index the new papers. We plan to rebuild the classification scheme regularly. In this way, we will take into account that the appearance of new papers may lead to the emergence of new themes. Detection of new themes is one of our objectives; [3] A number of extra attributes (e.g. Theme Name, Authority and Reference Articles, etc.) for the elements (themes) of the classification (see Help); [4] Accessibility of the classification in response to search requests via display options, e.g., display as Tree of Themes, and Reference (Citation) Tree. At the moment, the service is available only for the hep-th sector of arXiv. Hopefully, it will be extended to cover a number of other sectors.
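The site describes only the inputs and outputs of EqRank, not the algorithm itself. As a rough illustration of the core idea (deriving paper groupings from nothing but the citation graph), here is a toy stand-in that clusters papers into connected components with union-find. This is not EqRank, which builds a multi-level hierarchy, and the paper IDs are invented:

```python
# Toy illustration of clustering papers using only a citation graph,
# the sole input Xstructure's classification uses. Papers are grouped
# into connected components via union-find; the real EqRank algorithm
# is more sophisticated and produces a hierarchy of themes.

def cluster_by_citations(citations):
    """citations: list of (citing, cited) pairs. Returns {root: set_of_papers}."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for citing, cited in citations:
        union(citing, cited)

    clusters = {}
    for paper in list(parent):
        clusters.setdefault(find(paper), set()).add(paper)
    return clusters

# Two disjoint citation threads yield two clusters ("themes"):
edges = [("A", "B"), ("B", "C"), ("X", "Y")]
themes = cluster_by_citations(edges)
print(sorted(sorted(t) for t in themes.values()))  # [['A', 'B', 'C'], ['X', 'Y']]
```

Rebuilding the classification as new papers arrive, as the site plans to do, would simply mean re-running such a procedure on the enlarged citation graph.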

USACM weighs in on DRM and fair use

USACM, the U.S. Public Policy Committee of the ACM (Association for Computing Machinery), has released its Policy Recommendations on Digital Rights Management (February 2006). (Thanks to Ed Felten via Ray Corrigan.) Excerpt:
Copyright Balance: Because lawful use (including fair use) of copyrighted works is in the public’s best interest, a person wishing to make lawful use of copyrighted material should not be prevented from doing so. As such, DRM systems should be mechanisms for reinforcing existing legal constraints on behavior (arising from copyright law or by reasonable contract), not mechanisms for creating new legal constraints. Appropriate technical and/or legal safeguards should be in place to preserve lawful uses in cases where DRM systems cannot distinguish lawful uses from infringing uses....

Research and Public Discourse: DRM systems and policies should not interfere with legitimate research, with discourse about research results, or with other matters of public concern. Laws and regulations concerning DRM should contain explicit provisions to protect this principle.

Targeted Policies: Public policies meant to reinforce copyright should be limited to applications where copyright interests are actually at stake. Laws and regulations concerning DRM should have limited scope, applying only where there is a realistic risk of copyright infringement.

Comment. Hear, hear. If copyright law is really law, and not just a plea to content companies, then fair use should supersede DRM, not vice versa.

OA and grey literature

Marcus A. Banks, Towards a Continuum of Scholarship: The Eventual Collapse of the Distinction Between Grey and non-Grey Literature. In Dominic Farace (ed.), Proceedings GL7: Seventh International Conference on Grey Literature, Nancy, France, 2005.
Abstract: This paper argues that the distinction between grey and non-grey (or white) literature will become less relevant over time, as online discovery options proliferate. In the meantime, the political success of the open access publishing movement has valuable lessons for proponents of increasing access to grey literature.

Italian health institute signs the Berlin Declaration

Italy's Istituto Superiore di Sanità (National Institute of Health) has signed the Berlin Declaration on Open Access to Knowledge.

Comment. The ISS is a public agency, and this signature could be a sign that it will look for ways to assure OA for publicly-funded medical research.

Tuesday, March 07, 2006

More on EThOS

Jill Russell, EThOS: progress towards an electronic thesis service for the UK, Serials, March 2006.
Abstract: The EThOS (Electronic Theses Online Service) project is building on previous e-thesis initiatives, and co-ordinating the work of some of the key players in the UK to develop a service for finding, accessing and archiving digital copies of doctoral theses produced in UK higher education institutions. Key issues for the project are the development of a sound financial basis for a successful service, the provision of advice needed by authors and university staff on handling intellectual property rights, and the protection of legitimate needs for confidentiality. EThOS will also establish workable and standards-based procedures for populating e-thesis repositories with current and retrospectively acquired digital versions of theses and associated metadata. These developments must also fit with universities' own internal administrative arrangements and regulations. The project aims to deliver an e-thesis infrastructure that is both technically and financially sustainable, together with a full supporting toolkit of guidance, standards and procedures.

OA to publicly-funded research in Germany

The Deutsche Forschungsgemeinschaft (German Research Foundation, DFG) has adopted Open Access Guidelines that encourage grantees to provide OA to DFG-funded research. (Thanks to Ingegerd Rabow.) From the press release (January 30, 2006):
In 2003 the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) signed the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. The DFG supports the culture of open access. Unhindered access to publications increases the distribution of scientific knowledge, thereby enhancing the authors' visibility and contributing to their reputations.

The DFG has now tied open access into its funding policy. During their meetings in January 2006, the DFG’s Senate and Joint Committee recommended encouraging funded scientists to also digitally publish their results and make them available via open access. In order to put secondary publications (i.e. self-archived publications by which the authors provide their scientific work on the internet for free following conventional publication) on the proper legal footing, scientists involved in DFG-funded projects are also requested to reserve the exploitation rights. Recommendations are currently being integrated into the usage guidelines, which form an integral part of every approval. They are worded as follows:

"The DFG expects the research results funded by it to be published and to be made available, where possible, digitally and on the internet via open access. To achieve this, the contributions involved should either be published in discipline-specific or institutional electronic archives (repositories), or directly in referenced or recognised open access journals, in addition to conventional publishing. When entering into publishing contracts scientists participating in DFG-funded projects should, as far as possible, permanently reserve a non-exclusive right of exploitation for electronic publication of their research results for the purpose of open access. Here, discipline-specific delay periods of generally 6-12 months can be agreed upon, before which publication of previously published research results in discipline-specific or institutional electronic archives may be prohibited. Please ensure that a note indicating support of the project by the DFG is included in the publication."

The revised usage guidelines are expected to be available in April 2006.

Comment. It's a breakthrough for the DFG to incorporate its commitment to OA into its funding policy. On the other hand, it's ironic that it has proposed a policy like the NIH's, which merely encourages grantees to make their work OA, at a time when the NIH is documenting that mere encouragement does not work. I hope the DFG will read the NIH's January 2006 report to Congress, in which it shows that only 3.8% of its grantees have complied with its request or encouragement in the first eight months under the policy. The most effective way to assure OA to the results of DFG-funded research is to mandate it. Two good examples are the draft RCUK policy, not yet adopted, and the Wellcome Trust policy, in effect since October 2005.

More on the pricing crisis

Lindsey Franco, Smathers Libraries may cancel journal subscriptions, Alligator Online, March 6, 2006. (Thanks to Gavin Baker.) Excerpt:
While students and faculty await a reconstructed, renovated, Starbucks-filled Library West, they may get a surprise they're not quite waiting for - a cutback of $750,000 worth of academic journals and databases. Smathers Libraries, the [University of Florida] library system, plans to discontinue some academic journals and database subscriptions this summer to make up for the expected $750,000 deficit in the library's 2007 budget. The anticipated shortfall would occur because publishers plan to charge the library higher subscription costs to cover rising inflation, said John Ingram, the associate director of collections at the library....He said the library's flat budget will not be able to cover the increase, which will lead to a deficit of about $750,000. The library has until the end of July to cancel journal subscriptions, he said....He said the library plans to cut both print journals and online journals and databases, and they will be split evenly. "We are making the cuts across the board - being very evenhanded about it," Landor said....The library informed faculty of the expected changes Thursday with a flier in their mailboxes titled "Smathers Libraries Cancellation List Project," which outlined the plan for the next fiscal year. The document is available on the libraries' Web site [here].

Update. See Gavin Baker's 3/7/06 letter to the editor. Excerpt:

Monday's article on possible cuts to library collections describes a danger that many universities across the country face: the rising costs of academic journals. As president of Florida Free Culture, I can point out the role an inefficient copyright system plays in creating these artificially inflated costs. As a Student Senator-elect, I can say to my future Senate colleagues: It's time for students to make their voices heard on this important issue. There is no good reason to justify the costs of subscriptions to academic journals and repositories. The cause is copyright overprotection by universities and researchers, and the solution is open licensing. The research published in academic journals is overwhelmingly paid for by universities, grants and awards. Researchers are not usually paid for their journal publications; rather, faculty must "publish or perish," where publication in the most prestigious journals results in professional accolades and promotions. The costs of academic publishing are all in the editing, review of submissions and distribution. With online journals, distribution costs drop almost to zero. So why did Smathers Libraries spend $3.4 million on e-journal subscriptions last year? The reason is the outdated model of the academic publishing industry. The copyrights inherent in the journal articles are used as a barrier, and only those who pay the toll are allowed access to the journals' contents. This toll pays the costs of review and editing - and provides a nice bonus for the publishers. However, an alternative exists: open-access publishing. The open-access model pays for editorial costs through institutional subsidies by a journal's sponsoring university or professional society, or by charging the researcher's sponsor a fee to process the submission. Then, an open copyright license is used to ensure free, open access to the research online, via nonprofit repositories or self-archiving by the researcher, university or research sponsor. 
Alternatively, the journals themselves are left intact, but the cost of subscriptions to repositories is reduced. Authors still give journals an exclusive license to their work, for access to which journals charge for subscription, but the exclusive license lasts a limited time, such as six months. Six months after the original publication, the author deposits a copy of their article in an open access repository, such as the National Institutes of Health's PubMed Central. Open access not only cuts unnecessary overhead costs, but also ensures that everyone worldwide has access to the results of that research, even if their university can't afford journal subscriptions. In the case of medical research, this can literally mean the difference between life and death; doctors in poor countries may be unable to afford access to the latest findings. UF should investigate how it can promote open access to cut costs for our libraries and fulfill our commitment to the global community. I call for the creation of an ad hoc Joint Committee on Open Access, comprising members of the Student Senate, Faculty Senate and UF administration. Open access is a hot issue in academia. If UF wants to be a Top 10 public research university, here is an opportunity for us to lead.

Journal of Cardiothoracic Surgery -- new OA journal

Journal of Cardiothoracic Surgery is the 86th independent, Open Access journal hosted by BioMed Central. The title does a remarkably effective job of communicating the focus of the journal.

Journal of Cardiothoracic Surgery - Fulltext v1+ (2006+); ISSN: 1749-8090.

An OA journal starts charging processing fees

Brian G. Forde and Michael R. Roberts, Plant Methods moves to fund open-access publishing, Plant Methods, March 3, 2006. An editorial. Abstract (provisional):
As an Open Access journal dedicated to promoting technological innovation in plant biology, Plant Methods occupies a unique niche amongst plant journals. To fund its open access policy, and to enable it to continue to serve the plant sciences community, Plant Methods will be introducing an article processing charge (APC) from March 1st 2006. This charge will cover the costs of making the article freely and universally accessible online and the costs involved in its inclusion in PubMed and its archiving in public repositories. In some circumstances, waivers of the APC may be granted and authors whose institutions are BioMed Central members will incur no, or reduced, charges.

(PS: PM is an OA journal from BioMed Central making the transition from an APC-free business model to an APC model. If this is surprising, remember that not all BMC journals charge APCs. Some are subsidized by the organizations standing behind them and some take six months or so after launch before introducing APCs. Currently, 23 of BMC's OA journals do not charge APCs. Among OA journals overall, fewer than half charge APCs.)

New alert service

The portal for OA science produced and hosted by the US federal government has a sophisticated new alert service.

More OA books coming

Anthony Pesce, Libraries getting the digital treatment, Daily Bruin, March 7, 2006. Excerpt:
When Laura Willeford went to Powell Library to research a paper last week, she left feeling frustrated. Willeford, a first-year undeclared student, said she had trouble finding useful sources for her art history paper....With groups like the University of California, the Open Content Alliance and Google continually digitizing new books, soon students like Willeford will be able to complete most of their research online at any time of the day. The goal of these programs is to create an online searchable database of non-copyrighted books that students can read from the comfort of their dorm rooms....The California Digital Library will be able to digitize between 18,000 and 80,000 books this year, and those numbers will only grow in the future, [Daniel] Greenstein said.

OA developments in Germany

Hendrik Bunke, Open Access allerorten (Open Access everywhere),, February 21, 2006. Pointing out a few recent OA developments in Germany, including a DINI Certificate for Bremen's E-LIS server.

Launch of OSGEO for open source and open data in geoscience

Geospatial research groups from around the world have launched the Open Source Geospatial Foundation (OSGEO). From today's press release:
The open source geospatial community today announced the formation of the Open Source Geospatial Foundation, a not-for-profit organization whose mission is to support and promote the collaborative development of open geospatial technologies and data. The foundation was formed in February to provide financial, organizational and legal support to the broader open source geospatial community. It will also serve as an independent legal entity to which community members can contribute code, funding and other resources, secure in the knowledge that their contributions will be maintained for public benefit....The foundation will not require OSGEO software projects to be licensed under any one particular open source license, but will require that all OSGEO software be released under an open source license approved by the Open Source Initiative (OSI). The long term goal is to encourage licenses that allow the different foundation projects to work better together and permit code exchange among them. The foundation will implement contribution and intellectual property policies designed to avoid the inclusion of proprietary or patented code in OSGEO projects. Foundation projects are focused on interoperability - both with one another at the library level, and with other proprietary and open source projects through the use of open standards. The foundation will also be pursuing goals beyond software development, such as promoting more open access to government-produced spatial data, which is a major problem outside of North America.

Survey of OA at Thessaloniki

Valentina Comba, Open Access: a new opportunity for scholarly communication. In Proceedings of the Workshop on Open Access, Thessaloniki (Greece), 2005. Self-archived, March 6, 2006. A slide presentation.

More on how OA will affect libraries

Michael J. Giarlo, The Impact of Open Access on Academic Libraries, an undated preprint. Excerpt:
[A]ll flavors and forms of open access impact the roles filled by academic libraries, but it is worth noting that these may vary. For instance, while the green model of open access will undoubtedly benefit scholars by globally providing scholarly material at no cost, with no access restrictions, other benefits such as budget relief may not be realized (Crawford, 2005b). In fact, it may strain budgets that are already being stretched by commercial journals. The scope of this paper is limited to academic libraries....It is not the intention of the author to paint a simple, rosy picture of the issues surrounding open access, nor to advocate a radical, wholesale shift thereto. Rather, it is suggested only that the issues surrounding open access be brought out into the open and discussed. While there are reasons academic libraries might be cautious about modifying the ways they support scholarly communication, there are myriad reasons to consider how they might best serve their communities with open access....[T]his is no longer a subject to be read about and debated; open access has arrived and is being rapidly adopted....There are numerous ways in which open access might impact an academic library, broken into the following categories in this paper: economic, technological, collection development & management, and the very roles that academic libraries play. Each of these impacts will be discussed in turn. There are impacts other than those examined in this paper, such as those concerning reference services, information literacy, and peer evaluation, but research in these areas was light at the time references were gathered....Academic libraries are positioned to be at the forefront of the open access revolution, but it is altogether possible that they will allow themselves to be left behind.
They stand to gain much by investigating potential new roles they might play in the transforming landscape of scholarly communication, but first they must consider the many ways in which they may be affected by open access, weighing significant costs against significant benefits and always with their communities' best interests in mind.

Encouraging OA archiving at Minho

I posted this important item to SOAF last week and then forgot to blog it.

The University of Minho was the first university anywhere to mandate OA to its research output. To achieve compliance, Minho has (among other things) adopted a system of financial incentives. Minho's Eloy Rodrigues describes how they work:

Following the adoption of the Minho university policy on open access, according to the second paragraph of that policy, in 2005 the Rector established a financial supplement for departments and research centers, as a reward for their implementation of the policy, and established criteria for awarding the financial supplement. The financial supplement was devised as a way to reinforce the self-archiving mandate. The reward was distributed through the research centers/departments, and not directly to the individual researchers.

To stimulate the early adoption of the self-archiving practice, the reward was distributed according to the number of documents archived in three phases: 42% of the reward according to the number of documents self-archived up to April 2005, 33% according to the number of documents archived between May and August, and 25% according to the number of documents archived from September to December. In each of the phases, the total amount that each research center/department received was calculated as a function of:

  1. Type of documents self-archived (peer-reviewed journal articles = 1; peer-reviewed/accepted conference papers = 0.5; other documents = 0.1);
  2. Date of publication (2004 and 2005 = 1; prior to 2004 = 0.3);
  3. Research Center/Department self-archiving policy. Departments that adopted a self-archiving policy based on the model of the university policy (basically, self-archive and make the work OA whenever possible, restrict access or add only metadata if needed, and treat the IR as the official registry of the research output, from which all publication lists should be extracted) = 1; Departments without such a policy = 0.3.

The result of this policy was that, from January 1 to December 31, 2005, 2,813 documents were deposited in our IR: nearly 41% and 40% were, respectively, journal articles and conference papers, and more than 19% were other types of documents (book chapters, books, working papers, etc.).
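The weighting scheme Rodrigues describes amounts to a simple product of three factors per document. A minimal sketch of the arithmetic (the function and the sample deposits are illustrative, not Minho's actual accounting; the weights are those quoted above):

```python
# Toy calculation of a department's self-archiving score under the
# Minho reward scheme described above. The weights come from the text;
# the function names and sample documents are invented for illustration.

TYPE_WEIGHT = {"journal_article": 1.0, "conference_paper": 0.5, "other": 0.1}

def document_score(doc_type, pub_year, dept_has_policy):
    """Score one deposit: type weight x date weight x policy weight."""
    date_weight = 1.0 if pub_year >= 2004 else 0.3
    policy_weight = 1.0 if dept_has_policy else 0.3
    return TYPE_WEIGHT[doc_type] * date_weight * policy_weight

# A department with a model self-archiving policy deposits three documents:
deposits = [
    ("journal_article", 2005),   # 1.0 x 1.0 x 1.0 = 1.0
    ("conference_paper", 2004),  # 0.5 x 1.0 x 1.0 = 0.5
    ("other", 2001),             # 0.1 x 0.3 x 1.0 = 0.03
]
total = sum(document_score(t, y, dept_has_policy=True) for t, y in deposits)
print(round(total, 2))  # 1.53
```

Each phase's pot (42%, 33%, 25% of the annual reward) would then be divided among departments in proportion to totals like this one.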

Comment. Currently, five universities or departments worldwide mandate OA to their research output. All have good compliance records, but none achieves compliance by cracking the whip. They use a wide range of kinder and gentler methods, among which the financial incentives at Minho are apparently unique. What I like about them is that they are directed to departments and research centers, not to individual faculty. They're not direct incentives to deposit eprints in the Minho OA repository; they're incentives for departments to create their own incentives or to facilitate deposits through education and assistance.

The Basement Interviews

Richard Poynder has interviewed eleven leaders of related openness movements --open access, open source, open spectrum, etc.-- and in a new blog posting describes his frustrating attempt to publish the interviews as a book. He's decided to serialize them on his blog instead, starting in the next few days. Something to look forward to.

Risk analysis for launching an OA repository

Arthur Sale, Generic Risk Analysis - Open Access for your institution, Technical Report, School of Computing, University of Tasmania, March 7, 2006. For universities considering an institutional repository and a strong policy to fill it, a succinct list of the risks, an evaluation of their probability and severity, analysis of the issues they raise, and advice for minimizing them.
Abstract: This is a generic risk analysis for any institution (university, research organization, etc) contemplating the installation of an Open Access Repository. It covers the major risks identified by experienced repository operators. The key risks are 2.1b, 2.3 and 2.8, and actions are recommended to reduce these risks to 'low' (scoring 6 or below out of a possible 25). If these actions are taken, establishing an Open Access Repository is truly a low risk operation. This analysis does not specifically address benefits of an Open Access Repository, which can be found elsewhere. It is assumed that your institution has made an in-principle decision, at least. Thanks to all the members of the international community of Open Access who contributed to this document.
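The scoring mentioned in the abstract (6 or below out of a possible 25) is consistent with the common risk-matrix convention of multiplying likelihood by severity, each rated on a 1-5 scale. A minimal sketch under that assumption; the report itself may define the scales differently, and the sample risks below are invented, not taken from the report:

```python
# Risk scoring on the common 5x5 matrix: score = likelihood x severity,
# each rated 1 (lowest) to 5 (highest), for a maximum of 25. The
# threshold of 6 matches the report's 'low' band; the named risks
# here are illustrative only.

def risk_score(likelihood, severity):
    if not (1 <= likelihood <= 5 and 1 <= severity <= 5):
        raise ValueError("ratings must be between 1 and 5")
    return likelihood * severity

def is_low(score, threshold=6):
    return score <= threshold

risks = {
    "faculty do not deposit": (3, 4),  # 12: needs mitigating action
    "server outage": (2, 2),           # 4: already low
}
for name, (likelihood, severity) in risks.items():
    score = risk_score(likelihood, severity)
    print(f"{name}: {score} ({'low' if is_low(score) else 'needs action'})")
```

Under this scheme, the recommended mitigations would aim to pull each key risk's likelihood or severity down until the product falls to 6 or below.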

Comment. Very useful, especially for administrators who expect or demand this kind of risk analysis. For other kinds of administrators, the BOAI Self-Archiving FAQ answers more objections and gives more detail. Using the two documents together should be very effective.

More on how disciplinary differences affect OA

Valentina Comba and Marialaura Vignocchi, New Trends in Scholarly Communication: how do Authors of different research communities consider OA? In Proceedings Satellite Meeting n.17: Open access: the option for the future!?, Oslo, 2005. Self-archived on March 6, 2006.
Abstract: At the time of the Budapest Declaration, self-archiving supporters looked like a revolutionary, "anti-commercial publishers" movement. Today, after some years' debate (and technological innovation in research and scientific e-publishing), antagonist positions are able to compromise and consider the tradeoffs. What is really changing in the Authors' attitude towards institutional or disciplinary repositories, and peer-reviewed open access journals? Many recent papers have investigated these topics. From these sources we can note that Biomedical Authors behave differently from Physicists, Astronomers and Mathematicians, who have been using open archives for a long time. Therefore we intend to analyze these different trends in the diverse communities. Several aspects also deserve careful attention: the role of new OA journals in evaluation processes (i.e. their impact and citations), implementation and maintenance costs of institutional repositories, and the evolution of bibliometric indicators. We intend also to discuss the role of libraries in service innovation and e-publishing promotion. The main areas where a key role may be played are: institutional repository management and users' training, the promotion of OA journals, and information about evaluation methods (both qualitative and quantitative). We think that the transition towards new communication models may be a great opportunity that libraries have to be ready to support.

Monday, March 06, 2006

New Eprints wiki

Eprints has launched a wiki. See today's announcement from Eprints developer Christopher Gutteridge:
I'm hoping that with the improved wiki software people will find it easier to add their own tips and guides etc. Maybe there should be a "requests for information" node which you can add to, or, if you know an answer, create a new page from. As ever, the tech list should remain the place for discussion, but if you happen to have an answer or tip that you feel should be preserved, put it on the wiki instead, and just reply to the list with the URL. The reason for this is that a wiki is a much better store of actual knowledge than a mailing list archive. Please *don't* move over the patches and scripts - I'm planning to set up a separate site for those.

UQaM to sign the Berlin Declaration

The Université du Québec à Montréal will soon sign the Berlin Declaration on Open Access to Knowledge. The decision was made on December 20, 2005, and the signing ceremony will take place on March 20, 2006. UQaM is on schedule to be the first university in North America to sign the Berlin Declaration.

More on the ODF Alliance

Dan Carnevale, New Consortium Will Press Agencies to Adopt Open Standard for Saving Digital Documents, Chronicle of Higher Education, March 6, 2006 (accessible only to subscribers). Excerpt:
Over three dozen businesses, universities, and other organizations have formed a consortium to persuade government agencies to adopt an open-standards format for storing digital documents. Currently, almost everybody in the public sector stores documents using Microsoft Word. But members of the new consortium, called the OpenDocument Format Alliance, are afraid that, years from now, such proprietary software won't be supported by anyone, and that people will have difficulty gaining access to old government records. Open standards are those that can be used free of charge by anyone building hardware or software. Any application based on a given standard, for instance, can open documents stored by any other application using the same standard. This gives software developers an incentive to keep making programs compatible with the standard as they evolve over time. Kenneth Wasch is president of the Software & Information Industry Association, the lead organization for the new alliance. He said that using the OpenDocument Format would better ensure that documents, spreadsheets, and other digital material would remain easily accessible, because the software based on the format is not dependent on one company's support. "The risk is much smaller because it's supported by a whole range" of companies, Mr. Wasch said. "The industry is embracing openness as never before." The alliance is an international organization with about 40 members, including IBM, Sun Microsystems, and the American Library Association. Universities involved include the Indian Institute of Technology and the Technical University of Denmark. Since the announcement was made early Friday, more organizations have expressed interest in joining, Mr. Wasch said....Patrice McDermott, deputy director of government relations for the American Library Association, said government agencies are spending millions of dollars now to convert old documents to make them independent of any software platform.
Adopting the OpenDocument Format standard would save government money in the long run, she said. "We think for access over time to government information, it's critical," Ms. McDermott said. "It's very dangerous for government to be using proprietary software."

Anonymizing data files for OA

Bruce A. Beckwith and three co-authors, Development and evaluation of an open source software tool for deidentification of pathology reports, BMC Medical Informatics and Decision Making, March 6, 2006. Abstract (provisional):
Background. Electronic medical records, including pathology reports, are often used for research purposes. Currently, there are few programs freely available to remove identifiers while leaving the remainder of the pathology report text intact. Our goal was to produce an open source, Health Insurance Portability and Accountability Act (HIPAA) compliant, deidentification tool tailored for pathology reports. We designed a three-step process for removing potential identifiers. The first step is to look for identifiers known to be associated with the patient, such as name, medical record number, pathology accession number, etc. Next, a series of pattern matches look for predictable patterns likely to represent identifying data, such as dates, accession numbers, and addresses, as well as patient, institution, and physician names. Finally, individual words are compared with a database of proper names and geographic locations. Pathology reports from three institutions were used to design and test the algorithms. The software was improved iteratively on training sets until it exhibited good performance. 1800 new pathology reports were then processed. Each report was reviewed manually before and after deidentification to catalog all identifiers and note those that were not removed.

Results. 1254 (69.7 %) of 1800 pathology reports contained identifiers in the body of the report. 3439 (98.3%) of 3499 unique identifiers in the test set were removed. Only 19 HIPAA-specified identifiers (mainly consult accession numbers and misspelled names) were missed. Of 41 non-HIPAA identifiers missed, the majority were partial institutional addresses and ages. Outside consultation case reports typically contain numerous identifiers and were the most challenging to deidentify comprehensively. There was variation in performance among reports from the three institutions, highlighting the need for site-specific customization, which is easily accomplished with our tool.

Conclusions. We have demonstrated that it is possible to create an open-source deidentification program which performs well on free-text pathology reports.

Comment. What's the OA connection? If researchers in medicine and the social sciences can anonymize their data files, then they can deposit them in OA repositories without violating the privacy of patients or research subjects. An open-source tool that does most of the work, and only needs fine-tuning for specific data formats, should remove the privacy roadblock barring OA to mountains of useful and reusable research data.
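The paper's three-step process (strip known patient identifiers, pattern-match predictable identifier formats, then check individual words against name and place databases) can be sketched roughly as follows. This is a minimal illustration, not the authors' actual tool; the function names, regex patterns, and word lists are invented for the example.

```python
import re

# Hypothetical stand-ins for the proper-name and geographic databases
# described in the paper.
KNOWN_NAMES = {"smith", "jones"}
KNOWN_PLACES = {"boston", "salem"}

def deidentify(report: str, patient_identifiers: list[str]) -> str:
    # Step 1: remove identifiers known to be associated with the patient
    # (name, medical record number, accession number, ...).
    for ident in patient_identifiers:
        report = re.sub(re.escape(ident), "[REMOVED]", report,
                        flags=re.IGNORECASE)

    # Step 2: pattern-match predictable identifier formats; the formats
    # below (US-style dates, a letter-plus-digits accession style) are
    # illustrative only.
    report = re.sub(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", "[DATE]", report)
    report = re.sub(r"\b[A-Z]\d{2}-\d{4,6}\b", "[ACCESSION]", report)

    # Step 3: compare individual words against the name and place databases.
    cleaned = []
    for word in report.split():
        bare = word.strip(".,;:").lower()
        cleaned.append("[NAME]" if bare in KNOWN_NAMES | KNOWN_PLACES
                       else word)
    return " ".join(cleaned)

result = deidentify(
    "Seen 3/11/2006, case S06-12345, patient Smith of Boston.",
    ["Smith"])
print(result)
```

A real tool would need site-specific patterns, which is exactly the customization need the authors report across their three institutions.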

Sunday, March 05, 2006

Self-archiving by French mathematicians

Anna Wojciechowska, Analyse d'usage des archives ouvertes dans le domaine des mathématiques et l'informatique, a preprint self-archived February 23, 2006. In French, but Anna Wojciechowska wrote an abstract in English for this blog posting:
Analysis of the use of open archives in mathematics and computer science. This study analyzes the self-archiving activities of part of the mathematics and computer science community in France. The questionnaire was sent to members of this community by several libraries of the National Group of Libraries in Mathematics, with the aim of determining the use of institutional open archives in France, in particular Hal (the French counterpart of ArXiv: all recent documents are transferred to ArXiv automatically).

The objectives of this analysis are: [1] to understand the position of the researchers in relation to open archives, [2] to examine the modalities of depositing files in institutional open archives, and [3] to detect and evaluate the practices used by the researchers.

Conclusion: [1] Almost half of the survey participants say they know the term "open archives". [2] The majority of researchers find the articles they need (or their references) in libraries, but online journals with full-text access are often consulted. [3] Researchers use ArXiv to find electronic preprints and Google to find the full text of articles. [4] Sources of freely accessible online articles, such as open access journals, are still not very well known. [5] More than 80% of respondents find the articles they need without any difficulty; almost 60% ask for a librarian's help only sometimes, and 35% do not need any help. [6] The full-text articles consulted most often (at least once per week) were published during the last ten years. [7] Almost 50% of researchers publish 2-3 articles per year. [8] The majority put a copy of their articles on personal websites, and 28% have done so for at least 5 years. [9] Researchers deposit far more articles on their personal webpages (63%) than in Hal (12%) or ArXiv (16%). [10] Those who have already deposited publications in Hal or ArXiv find them easy to use and say they needed less than 30 minutes for the first deposit and less than 15 minutes for subsequent deposits. [11] The majority do not read the contracts they sign with publishers and do not know that these contracts can be negotiated. [12] The difficulties in developing open archives are not technical but social. [13] The utility of institutional open archives is not yet well understood. [14] It seems essential to publicize the legal aspects of scientific publication more broadly and to encourage researchers to check the contracts they sign with publishers.

New OA journal on risk analysis

Project Risk and Decision Analysis is a new peer-reviewed, OA journal from the Intaver Institute. From yesterday's press release:
Intaver Institute, developers of the project risk management software RiskyProject and inventors of the project risk analysis process known as Event Chain Methodology, today announced the launch of “Project Risk and Decision Analysis”, a new online open access journal. "Our new online journal is a unique publication. It focuses on risk and decision analysis processes and methodology, including quantitative methods and the psychology of judgment and decision making in project management,” said Ken McKinley, business development manager of Intaver Institute Inc. "We currently have posted eight new white papers and articles related to project risk management methods and tools."

Comment. All the articles published by this journal are unsigned white papers from the Intaver Institute, and peer review is apparently done in-house by Intaver employees. This doesn't reflect on the quality of the research, which might be impeccable. But it's probably more accurate to say that the institute is providing OA to its research than to say that it's running a peer-reviewed journal in the usual sense.

QUT launches OA law project

Queensland University of Technology (QUT), the first university anywhere to adopt an OA mandate, has launched an Open Access to Knowledge (OAK) Law project. From the site:
In today’s ever-changing world, open access to knowledge is increasingly important, as both an economic and social force. This project aims to ensure that people can legally and efficiently share knowledge across domains and across the world. This will be of significance to everyday citizens through to top-end researchers. The project will develop legal protocols for managing copyright issues in an open access environment and investigate provision and implementation of a rights expression language for implementing such protocols at a technical level. At both levels, legal and technical, the project will integrate with existing open access repositories. The significance of the project is that it will provide a vital infrastructure to the open access landscape that does not adequately exist at the moment.

Quoting Tom Cochran, QUT's Deputy Vice-Chancellor:

It is often observed that the law and law making lags behind technology. Nowhere is this more apparent than in the maze of rights management and rights assertion surrounding access to content which is increasingly in digital form, often exclusively. Queensland University of Technology is pleased to be the supporting institution for this project, to systematically develop more assured processes, understandings and protocols to assist the smooth flow of knowledge and information for a variety of communities in future years. In doing so it particularly recognizes the public policy imperative of freeing up access to publicly funded research and its outputs.

Also see the press release announcing the project (February 21, 2006):

A new "open access to knowledge" project hosted by the Queensland University of Technology aims to ensure that anyone can legally share knowledge across the world, whether they be everyday citizens or top-end researchers. The QUT team, led by the head of the School of Law, Professor Brian Fitzgerald, is embarking on a $1.3 million, two-year project to develop legal protocols for managing copyright issues in an open access environment. The Open Access to Knowledge (OAK) Law Project, when complete, would provide legal protocols that would free up the national and international research environment and remove barriers to reusing and remixing information, Professor Fitzgerald said...."If researchers know they can safeguard their work with OAK Law protocols they will be more comfortable with making it available online and thus increase the stock of knowledge available to everyone," Prof Fitzgerald said. "The OAK Law protocols will benefit everyone from school students to Nobel Prize winners who can go online, do a Google search, find relevant research and use it without fear of being sued for copyright infringements. The project will work with cutting edge research repositories in marine science and medical research to ensure that all Australians have the right to access and, where permitted, reuse high quality research data in their daily lives."

New OA journal on wildlife biology

Wildlife Biology in Practice is a new "open access scientific, peer-reviewed, multi-disciplinary journal devoted to the rapid dissemination of wildlife research. It publishes research papers, review papers, discussion papers, [and] methodological papers." It's published by the Portuguese Wildlife Society, Minho University, and Portugal's Fundação para a Ciência e Tecnologia. (Thanks to Marcus Zillman.)