Open Access News

News from the open access movement

Saturday, October 08, 2005

Commercial v. non-commercial search engines

Bettina Fabos, The Commercial Search Engine Industry and Alternatives to the Oligopoly, MOKK (Media Research Center at the Department of Sociology and Communications of the Budapest University of Technology and Economics), n.d.

Abstract: This essay details the search engine industry’s transformation into an advertising oligopoly. It discusses how librarians, educators, archivists, activists, and citizens, many of whom are the guardians of indispensable noncommercial websites and portals, can band together against a sea of advertising interests and powerful and increasingly overwhelming online marketing strategies.

From the body of the paper:

Google, Yahoo and MSN...are skewing the nature of all online information in favor of commercial enterprise, and will have enormous impact and power over the direction of information access and, indeed, democratic discourse, in the years to come....Despite the considerable implications of search engine commercialization for knowledge access, the topic has not gained much attention in academic and library spheres....If we want to go beyond a mainstream, commercialized, sponsored online information repository we need to turn to a different structure that offers a more inclusive, democratic information environment. As it turns out, there is hope (although it comes with acronyms that are a lot harder to remember than catchy commercial search engine names like Yahoo! and Google). Numerous computer scientists and digital librarians have been developing open source technology, such as the Open Access Initiative for Metadata Harvesting Protocol (OAI-PMH), iVia, and Data Fountains, that offer (and enhance) a user’s ability to search across multiple (that is, thousands of) subject gateways. These digital repository harvesting services imitate the functions and interface of a search engine, but they can be moulded to search in specific academic areas. In other words, one can create completely noncommercial searching environments that offer the scope and feel of a search engine.
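
(PS: For readers who haven't seen OAI-PMH harvesting up close, here is a minimal sketch of how a harvester might build a request to a repository. The ListRecords verb and the oai_dc metadata prefix are part of the protocol itself. The repository address, the set name, and the helper function are my own assumptions for illustration, not anything from Fabos's paper.)

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    # 'verb' and 'metadataPrefix' are required arguments for ListRecords.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # 'set' and 'from' are optional arguments for selective harvesting.
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# The base URL and set name below are made up for the example.
url = list_records_url(
    "https://repository.example.org/oai",
    set_spec="physics",
    from_date="2005-01-01",
)
print(url)
```

A harvesting service would send requests like this to many repositories and index the metadata records that come back, which is how one search box can cover thousands of noncommercial collections.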

Making information scarce instead of open

David Bollier, Herman Daly on the Commonwealth of Nature and Knowledge, On the Commons, October 3, 2005. Excerpt:
Why do economists insist on treating information and creative works as scarce – while making the opposite mistake with respect to the depletable services of nature, which they treat as limitless by pricing at zero? Last week, in the inaugural presentation of the new Forum on Society Wealth lecture series at UMass, Amherst, economist Herman Daly tried to shed some light on these paradoxes....In the information commons...intellectual property law is used to make an essentially limitless resource – knowledge – scarce. The over-propertization of knowledge can have lots of unfortunate effects, from preventing universal access and benefit to inhibiting the development of new knowledge. Economists see the imposition of artificial scarcity on knowledge (via copyright and trademark law) as a necessary condition for enabling market exchange. But the upshot, said Daly, is that “we mistakenly think that scarcity increases public wealth.” In fact, its chief result is the creation of private wealth.

Daly lamented the fact that economics deals mostly with the allocation of a resource among competing users, but fails to deal with issues of scale and just distribution. Economists don’t really address the appropriate physical size of the economy relative to the ecosystem – and thus they ignore the environmental sustainability of the economy. Similarly, economists don’t trouble themselves with the issue of who gets property rights in the first place -- and therefore, whether the distribution of market results are legitimate and just. Neither of these problems – sustainability and just distribution – can be solved from within the market paradigm, Daly warned. They require pressure from outside of the market, from civil society and governments.

New OA journal on communication and action

Systems, Signs & Actions: An International Journal on Communication, Information Technology and Work is a new peer-reviewed, open-access journal sponsored by Linköping and Aarhus Universities. For the topic and scope of the journal, see the editorial by Peter Bøgh Andersen and Göran Goldkuhl in the inaugural issue. (Thanks to Marcus Zillman.)

Comparing Google Scholar, PubMed, and Scirus

Dean Giustini and Eugene Barsky, A look at Google Scholar, PubMed, and Scirus: comparisons and recommendations, a preprint forthcoming from the Journal of the Canadian Health Libraries Association. Excerpt:
In summary, information professionals have no choice but to recommend Google Scholar under certain conditions and caveats. Librarians should be prepared to teach GS and PubMed side by side and answer questions about it, especially how it compares to commercial tools like OVID. Clearly, GS provides an easy means to access the health literature. Health librarians should not dismiss it outright, especially for simple browsing, known-item searching, and linking to free materials on the open Web. Where literature reviews are required, i.e., grants, clinical trials, or systematic reviews, health librarians will continue to recommend MEDLINE, Cochrane (with Google for grey literature), and other trusted sources. Finally, clinical queries must be answered by replacing requests in context. Health professionals already search Google and will continue to use it (responsibly, one hopes) to satisfy their basic information needs.

New OA journal of palliative care

The Indian Journal of Palliative Care is a new peer-reviewed, open-access journal from MedKnow Publications. (Thanks to D.K. Sahu.) From the announcement:
The Indian Journal of Palliative Care is an interdisciplinary, peer reviewed journal published biannually. The journal welcomes contributions on clinical research, psycho social, ethical and spiritual issues related to palliative care. The website of the journal allows immediate open access to articles published in the journal. The journal is published by Medknow Publications and similar to all other journals published by Medknow the open access is without article submission, processing or publication fee. Articles could be submitted in the following sections: original articles, review articles, clinical guidelines, case reports, case discussions, narratives, reports on important meetings, book reviews, short reports and letters to the editor. Electronic submission of articles via email is welcomed.

Friday, October 07, 2005

New & forthcoming journals from BMC

Journal of Ethnobiology and Ethnomedicine launched in August at BioMed Central. It is not yet mirrored at PubMed Central, but will be shortly.

Journal of Ethnobiology and Ethnomedicine - Fulltext v1+ (2005+); ISSN: 1746-4269.

In addition, I've listed eight forthcoming titles from BioMed Central:

Biological Knowledge; ISSN: 1745-4743.

Biology Direct; ISSN: 1745-6150.

Diagnostic Pathology; ISSN: 1746-1596.

International Breastfeeding Journal; ISSN: 1746-4358.

Journal of Biomedical Discovery and Collaboration; ISSN: 1747-5333.

Philosophy, Ethics, and Humanities in Medicine; ISSN: 1747-5341.

Substance Abuse Treatment, Prevention, and Policy; ISSN: 1747-597X.

Synthetic and Systems Biology; ISSN: 1747-8332.

Two geology journals providing free access

Bulletin of Geosciences, from the Czech Geological Survey, and Revista Mexicana de Ciencias Geologicas, hosted by Universidad Nacional Autonoma de Mexico (UNAM), provide free online access to current geological research. RMCG has a focus on the IberoAmerican region. Bulletin of Geosciences focuses on the geology of the Czech Republic.

Bulletin of Geosciences - Fulltext v77+ (2002+); ISSN: 1214-1119.

Revista Mexicana de Ciencias Geologicas - Fulltext v3(2)+ (1979+); ISSN: 1026-8774.

SciELO releases two more Open Access journals

SciELO (Scientific Electronic Library Online) continues to promote the dissemination of Latin American research. The newest journals to launch Open Access editions, both previously print journals of limited circulation, are Tempo Social and Agora: Estudos em Teoria Psicanalitica.

Tempo Social - Fulltext v16(2)+ (November 2004+); Print ISSN: 0103-2070.

Agora: Estudos em Teoria Psicanalitica - Fulltext v7(2)+ (July/December 2004+); Print ISSN: 1516-1498.

Useful data from OA repositories

Two recent sources have separately described a promising way to use OpenURL link resolvers and OA repositories to help libraries generate data on the journals that their users cite, search, read, and publish in. The first is the August 22 open letter from several UK scholars (Tim Berners-Lee et al.) rebutting the ALPSP objections to the draft RCUK policy. The second is an October 7 AmSci posting by Tim Brody.

We know that OA archiving can improve an article's citation impact, which can improve the impact factor of the publishing journal. What's new is that if we draw the right kinds of data from OA repository traffic, then libraries will be able to make more intelligent subscription decisions, not just more intelligent cancellation decisions. Journals can help generate these benefits by encouraging authors to deposit their work in OA repositories, not just on personal web sites. Finally, the data could help measure impact at the article level, which helps authors and readers more than measurements at the journal level.

Excerpt from the open letter:

"[P]ublishers and institutional repositories can and will easily work out a collaborative system of pooled usage statistics, all credited to the publisher's official version....The easiest thing in the world for Institutional Repositories (IRs) to provide to publishers (along with the link from the self-archived supplement in the IR to the official journal version on the publisher's website -- something that is already dictated by good scholarly practice) is the IR download statistics for the self-archived version of each article. These can be pooled with the download statistics for the official journal version and all of it (rightly) credited to the article itself....All these statistics and benefits are there to be shared between publishers, librarians and research institutions in a cooperative, collaborative atmosphere that welcomes the benefits of self-archiving to research and that works to establish a system that shares them among the interested parties. Collaboration on the sharing of the benefits of self-archiving is what learned societies should be setting up meetings to do -- rather than just trying to delay and oppose what is so obviously a substantial and certain benefit to research, researchers, their institutions and funders, as well as a considerable potential benefit to journals, publishers and libraries....Librarians' decisions about which journals to renew or cancel take into account a variety of comparative measures, citation statistics being one of them (footnote 2). Self-archiving has now been analysed extensively and shown to increase journal article citations substantially in field after field; so journals carrying self-archived articles will have higher impact factors, and will hence perform better under this measure in competing for their share of libraries' serials budgets...

Excerpt from Tim Brody's posting:

I can have a lot of fun with hypothetical scenarios, in particular how open access could provide more in depth usage and impact analytical tools for librarians. The situation now is that journal usage metrics are being used because it provides easy access to comparative institution-specific information for user's online 'reading'. (Although is there evidence to show cancellation based on usage stats?) So let us suppose that an institution's authors are self-archiving 90%+ their own material into the institutional archive. The librarian can then discover which journals their authors are : (1) publishing in, (2) editing, (3) citing. An OpenURL resolver is used that points users automatically at the journal version (where subscribed), or an author self-archived version (where available). That resolver will provide the institutional manager with information on usage, journal interconnectedness etc. The resolver naturally aggregates open access and subscription content usage. Perhaps attempts by users to access an unsubscribed journal will be drawn to the librarian's attention. The institutional archive records logs of who accesses its content, and provides those logs to a 3rd party service that aggregates data across commercial and open access sources. The service then provides summary reports, as a comparative 'web impact' metric for journals (/authors/institutions/publishers).
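
(PS: Brody's scenario is easy to picture in code. What follows is only a toy sketch under my own assumptions, not a real OpenURL resolver or any existing library: the class, its fields, and the example URLs are all hypothetical. It sends users to the journal version where the library subscribes, falls back to a self-archived copy where one exists, and keeps the counts a librarian would want to see. A second small function pools download counts from a publisher and a repository, crediting both to the same article.)

```python
from collections import Counter


class Resolver:
    """Toy link resolver (hypothetical, for illustration only)."""

    def __init__(self, subscribed_journals, self_archived):
        self.subscribed = set(subscribed_journals)
        # Maps an article id to the URL of its self-archived copy.
        self.self_archived = dict(self_archived)
        # Every request, whether it ends at subscription or OA content.
        self.usage = Counter()
        # Requests for journals the library does not subscribe to.
        self.unsubscribed_requests = Counter()

    def resolve(self, article_id, journal, publisher_url):
        self.usage[journal] += 1
        if journal in self.subscribed:
            return publisher_url
        self.unsubscribed_requests[journal] += 1
        # No subscription: use the self-archived copy if there is one.
        return self.self_archived.get(article_id)


def pool_downloads(publisher_counts, repository_counts):
    """Add per-article download counts from both sources together."""
    pooled = Counter(publisher_counts)
    pooled.update(repository_counts)
    return dict(pooled)


resolver = Resolver(
    subscribed_journals={"Journal A"},
    self_archived={"art-2": "https://repository.example.org/art-2"},
)
print(resolver.resolve("art-1", "Journal A", "https://publisher.example.com/art-1"))
print(resolver.resolve("art-2", "Journal B", "https://publisher.example.com/art-2"))
print(resolver.unsubscribed_requests.most_common())
print(pool_downloads({"art-1": 10}, {"art-1": 3, "art-2": 2}))
```

The unsubscribed-request counter is the piece that would support more intelligent subscription decisions, since it shows demand for journals the library doesn't yet take.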

The last-mile problem for knowledge

AskPhilosophers is a new Q&A site for philosophy. Readers send in philosophical questions, a moderator screens out the cranks and pornographers, and philosophers from a hand-picked panel answer them. The questions are sorted into 20 categories or sub-topics, which users can follow individually or in a mix. All the content is OA. The site even has an RSS feed.

Comment. It's simple but it works. I like it, and not just because it's in my own field. I like it because it's the most promising format I've seen for solving what could be called the last-mile problem for knowledge. Lots of dedicated researchers do lots of difficult research, which is then written up, vetted, published, and disseminated. This knowledge makes it from the ether to the mind of the researcher, then to paper or disk, then to a publisher, and then to the library or internet. But it very rarely jumps the last gap to the curious person who wants to know what it's all about. There are lots of reasons, including the cost of access and the scarcity of time. But an important part of the problem is that this knowledge is usually intelligible only to specialists, excluding both lay readers and professional researchers from other fields. What we badly need is a service to connect these large bodies of knowledge to curious minds --to solve the last-mile problem. Research publications typically leave the gap unbridged. Listservs either let in too much spam at the query end or let out too much self-righteous gas at the answer end. General Q&A sites don't attract enough knowledgeable people as question-answerers. And lay summaries of cutting-edge research don't necessarily answer the questions that curious minds want to ask. What I like about AskPhilosophers is that it's question-driven, professionally staffed, and moderates both the input and the output. Serious questions will get through and, when they do, they will receive serious answers. I'd love to see every discipline set up something similar. Right now, I'd spend a lot of my time at AskGeologists, AskMathematicians, and AskNutritionists. Scholars: start your engines.

New OA research repository at University College London

From a University College London press release, dated today:
A new web database for entering and viewing UCL academic staff research publications details online is now available to view on the UCL website. The database [is] named MyOPIA (MySQL Online Publications Index Administration)....Research publications data has been formally collated across UCL since 1997 and MyOPIA allows both academic staff publications from before this date and those published while employed outside of UCL to be added to the system. This will allow academic staff to have a complete personal listing of all their publications on MyOPIA, which is also accessible to the public....MyOPIA is also connected to UCL’s Eprints system, managed by UCL Library Services, which allows researchers to submit full-text copies of their research papers to an online archive. Once submitted to Eprints, the papers will automatically appear in the research publications database.

More on the threat to OA in Canada

Nadya Bell, Amendments to copyright law could cost universities, Manitoban Online, October 7, 2005. Excerpt:
Universities could have to pay for students and professors to use free Internet sites if an amendment to the copyright act passes in the House of Commons. Bill C-60 is intended to adapt Canadian copyright to the Internet and regulate things like music sharing and website use. Under the proposed bill, Internet services that would be free to use at home would require copyright royalties to be used in the classroom or for homework. Opposition MPs and education advocates are calling on the government to allow schoolteachers and professors an exemption from copyright restrictions....“We’re already paying a lot of money to copyright,” said [Steve] Wills [manager of legal affairs for the Association of Universities and Colleges of Canada]. “Adding to the fees would be particularly galling in the case of publicly-available material on the Internet.”...Wills said in one case a professor was quoted $66 per minute for a video clip he wished to use in class, but under American law a professor would have free access to the same material.

More on the Rowman & Littlefield opt-out

Brad Hill has an interesting new detail on the Rowman & Littlefield opt-out decision: the publisher "is withdrawing from Google Print For Publishers, over the Google Print For Libraries policy."

(PS: Remember that Google Print for Publishers seeks publisher permission first while Google Print for Libraries doesn't.)

OA digital texts talking to each other

Gregory Crane, Reading in the Age of Google, Humanities, September/October 2005. Crane is the editor in chief of the OA Perseus Digital Library. Excerpt:
[I]n the Phaedrus, Plato has Socrates commenting that written words are like statues that may imitate life but have no life of their own. He stresses the inert quality of written language: "Writing says one single thing --whatever it may be-- the very same thing forever."...Twenty years ago, Marvin Minsky, a proponent of artificial intelligence, responded to this ancient challenge and imagined a time when people could not imagine a library in which the books did not talk to each other....Google, Amazon, and other companies mine data, analyzing our queries and making inferences about our goals, using as much information as they have to help us spend money. Many of these same techniques can, however, help us learn. In the Perseus Digital Library, we already have the beginnings of new reading environments that help us understand complex documents in a variety of languages. For people studying Plato, for example, the Perseus Digital Library can assemble a range of materials relevant to the Phaedrus, including a Greek text, English translation, and a list of documents that comment on the opening section of the dialogue. The reader can customize the display by explicitly asking for original source text in Greek, choosing a translation font, and making other decisions about what should be displayed. The reader can ask a question about a particular Greek word, and the system can personalize its response: it recognizes that the reader is looking at a dialogue of Plato and highlights all citations to Plato in the online lexicon entry. These electronic actions are simple in nature but profound in their implications. Many different books are, in effect, having a conversation among themselves and deciding how best to serve the human reader.

Jacob Neusner opts out

Vincent Kiernan, Academic Press and Prolific Author Tell Google to Remove Their Books From Its Scanning Project, Chronicle of Higher Education, October 7, 2005 (accessible only to subscribers). Excerpt:
A well-known scholar and his publisher have demanded that Google withdraw his books from the digital archive that the Internet-search company is compiling from the holdings of five university and research libraries. "The basic problem is copyright violation," said Jacob Neusner, a research professor of theology at Bard College, who has written [or edited] more than 900 books...In an interview on Thursday, Mr. Neusner said that he had asked Google to remove his works from its Google Library project, but Google had insisted that he fill out a separate form for each of his books. That was wrong, said Mr. Neusner, because under copyright law it is Google's responsibility to seek permission to use a copyrighted work. So the Rowman & Littlefield Publishing Group, which has issued many of his books, took up the banner and has insisted that all of its works be removed from Google Library as well. Jed Lyons, president of Rowman & Littlefield, said that his company had not requested a royalty from Google for using the works. Nor will he. "We think it's unfair and arrogant and disrespectful of publishers' and authors' rights, and we don't want to do business with an organization that thumbs its nose at publishers and authors," he said. But Google, he said, is seeking to change his mind about withdrawing the works. "They're trying to convince us it's a mistake."

(PS: It's Jacob Neusner's loss. We know from the Authors Guild lawsuit that he's not the only author who would rather dig in his heels than find readers and buyers. If any of our searches would have pointed to his work, then it's also our loss.)

More on the OCA

Max Chafkin, Yahoo Takes Friendly Approach to Book Digitization, Sidesteps Google Uproar, The Book Standard, October 06, 2005. Excerpt:
The consortium, says OCA founder Brewster Kahle, could eventually scan millions of books. “We’ve been trying to digitize materials for years,” said Kahle, adding that publishers will eventually be invited to submit copyrighted works, which will be made available on a more limited basis. “The breakthrough is that we are doing this in the open --everybody’s shoulders drop, the lawyers go back to their cubicles, and we’re free to get things done.” The other breakthrough, say librarians for both universities, is scale. Prior to linking up with Kahle, the UC library system estimated the cost of scanning and archiving documents at $20 per page, while the University of Toronto, which has already scanned small portions of its collection, estimates that cost at $1. By contrast, Kahle’s technology costs only 10 cents per page. “It’s the production level,” says University of Toronto chief librarian Carole Moore. “Before, we weren’t doing it on any mass scale.” While the project was widely interpreted in initial reports as a rejoinder to the Google project, Google Director of Content Partnerships Jim Gerber said..., “I don’t think [the projects] are competitive at all,” said Gerber, adding that he sees OCA “as additive and beneficial” both to the publishing community and to Google’s mission of indexing the world’s information....While Yahoo’s actual dollar contribution to OCA pales in comparison to the tens of millions of dollars Google is poised to spend on its scanning project, the softball approach to digitization could have two potential benefits. First, by partnering with an innocuous project like the OCA, Yahoo can sit back and let Google slog through the legal muck of digitizing copyrighted books, something that both sides agree will eventually become the norm. Second, the fact that the OCA will allow anyone to host and index public-domain works may serve to undercut Google’s effort to protect --and profit from-- its digital copies.

OA to articles and data increases impact

Kristina Fister, At the frontier of biomedical publication: Chicago 2005, BMJ, October 8, 2005. Excerpt:
Last month the fifth congress on peer review and biomedical publication was held in Chicago. The presentations highlighted that we still have plenty of room to improve the quality of published research....Smaller journals may have to adopt other strategies to raise their impact factor. A study by Sahu and colleagues suggested that open access might be a powerful means for small journals to increase their visibility, citations, and consequently impact factor. Citations of articles published in the Journal of Postgraduate Medicine between 1990 and 1999 rose significantly after the journal went open access in 2001. Half of the articles were first cited only after open access was introduced....Apart from improving the quality of published literature, better reporting should speed up the advent of trial banks --open access electronic knowledge bases that can capture in detail aspects of trial design, execution, and results in a form that computers can understand. Decision support systems can then use these data more selectively, providing clinician friendly computer assistance for critical appraisal and evidence based practice. Sim reported that trialists found it easier to enter their data into the trial bank than to write a traditional research paper, and that readers found it easier to extract information about the trial --surely a sign that the days of journals reporting trials are numbered.

Bringing peer review to preprint archives

Marko A. Rodriguez, Johan Bollen, Herbert Van de Sompel, The Convergence of Digital-Libraries and the Peer-Review Process, a preprint forthcoming from the Journal of Information Science. (Thanks to Charles W. Bailey, Jr.)
Abstract: Pre-print repositories have seen a significant increase in use over the past fifteen years across multiple research domains. Researchers are beginning to develop applications capable of using these repositories to assist the scientific community above and beyond the pure dissemination of information. The contribution set forth by this paper emphasizes a deconstructed publication model in which the peer-review process is mediated by an OAI-PMH peer-review service. This peer-review service uses a social-network algorithm to determine potential reviewers for a submitted manuscript and for weighting the relative influence of each participating reviewer's evaluations. This paper also suggests a set of peer-review specific metadata tags that can accompany a pre-print's existing metadata record. The combinations of these contributions provide a unique repository-centric peer-review model that fits within the widely deployed OAI-PMH framework.

Cost-recovery instead of OA at GPO and LOC

The U.S. Government Printing Office (GPO) and the Library of Congress Cataloging Distribution Service (CDS) have decided not to allow open access to the latest edition of the Library of Congress Subject Headings (e-LCSH). From the announcement (October 5):
Note that the 28th edition (2005) of LCSH was distributed by FDLP in paper this year based on the current selection profiles. Please note that the Library of Congress Subject Headings is a CDS sale product for which costs must be recovered to sustain its continued availability. This electronic version is being made available to the Federal Depository Library Program with the condition that the files NOT be redistributed or made accessible outside the premises of participating FDLP libraries. If downloaded to a local server, the e-LCSH files must be placed on a location that is not accessible to Web crawlers or to users outside the premises of the FDLP library. To download the review copy of the e-LCSH, please go [here].

Thanks to James Jacobs at Free Government Information for the alert and also for this comment:

This is an excellent (though sad and ironic) example of the promise of digital information being crippled by contract for economic reasons. Where digital information holds the promise of being easily copied, re-distributed, and re-used, we see instead extreme restrictions being imposed on the information because "costs must be recovered". The restrictions bear repeating so that we can imagine the future of a world without digital deposit or a world with DRM locked down deposit or a world where use is limited not by copyright, but by contract: [1] Files may "NOT be redistributed", [2] Access only on "the premises", [3] Digital access hidden from web crawlers, [4] Digital access prohibited by users outside the library. The true "Luddites" are those that impose restrictions on access to government information rather than envisioning and enabling the possibilities created by digitization of information.

(PS: Of course costs must be recovered. But this is taxpayer-funded information. The cost-recovery model envisioned here is to charge taxpayers twice. This model is not only unfair to taxpayers, who have already paid once, but thwarts the public purpose in producing the information in the first place. Instead of being available to all taxpayers with a need to use it, the information will be available only to the subset who pass a means test.)

Lessig on the CC

Lawrence Lessig recaps the story of Creative Commons, and kicks off its fund-raising campaign, in a lengthy posting to the CC blog. Excerpt:
We stole the basic idea [for CC] from the Free Software Foundation -- give away free copyright licenses. Because copyright is property, the law requires that you get permission before you "use" a copyrighted work, unless that use is a "fair use." The particular kind of "use" that requires permission is any use within the reach of the exclusive rights that copyright grants. In the physical world, these "exclusive rights" leave lots unregulated by copyright. For example, in the real world, if you read a book, that's not a "fair use" of the book. It is an unregulated use of the book, as reading does not produce a copy (except in the brain, but don't tell the lawyers). But in cyberspace, there's no way to "use" a work without simultaneously making a "copy." In principle, and again, subject to fair use, any use of a work in cyberspace could be said to require permission first. And it is that feature (or bug, depending upon your perspective) that was the hook we used to get Creative Commons going.

ALPSP meeting with the RCUK

On September 16, the ALPSP met with representatives of the RCUK to discuss publisher objections to the draft OA policy. The ALPSP has publicly disclosed this much about the results of the meeting:
We are reassured that RCUK have agreed to explain to grant recipients why publishers might find it necessary to impose an embargo or time limit for deposit of articles in order to protect subscription and licence sales, and also to insist that such embargoes must be observed; we have offered to help with drafting the wording for this. We are also pleased to know that RCUK will be consulting publishers over the specification of the research which will be conducted over the next two years, to evaluate the likely effects of the policy (although papers arising from research funded after the beginning of 2006 are unlikely to have been published by the review date of 2008); we hope that the research will be sufficiently objective to ensure that publishers do provide data about the effects, if any, on downloads, subscription/licence sales, and other measures of journal sustainability. RCUK plan to hold a workshop for societies in the early part of next year, and ALPSP has offered to help in any way that might be required.

The ALPSP minutes of the meeting are available to members only.

(PS: It looks like the RCUK will not close the "copyright loophole" in the current draft, which allows publishers to impose embargoes. Instead, it may even let publishers reword it to suit themselves.)

Updated Dworaczek index

Marian Dworaczek has updated his Subject Index to Literature on Electronic Sources of Information. The October 1 edition indexes 2,157 separate works.

How good is Wikipedia?

There's an interesting new Slashdot thread on the quality of Wikipedia.

Thursday, October 06, 2005

Making ER materials OA for the public

Klaus Graf, Electronic Reserve and Open Access, Archivalia, October 7, 2005. Excerpt:
Copyright law requires that an ER must be restricted to students and staff. Even if the ER is in the same repository as the OA eprints (this is the case e.g. in Essen-Duisburg) web users without a specific account cannot view the course materials....ERs contain both copyrighted modern works and Public Domain (PD) materials which were scanned for classroom use....Administrators and staff of ERs should give the general public access to PD documents. Administrators should encourage staff members to do so, and inform them about the legal framework and copyright issues (e.g. in Germany a work is PD if the author is 70 years dead). Concerning the copyrighted material (modern articles and book chapters) there is also a way to support OA. When preparing a course ER scholars can ask the authors for permission to make the materials available freely. In the US it is likely that the rights holder is the publisher. If publishers agree with OA (a lot of them do so) there is no legal problem to put OA versions in the web. Administrators of OA repositories should allow moving stuff from ERs (i.e. from mostly non-affiliated authors) into the archive. In the case of an unified system one has only to set access rights for the public. What is the advantage for the authors if their works are put into the OA part of an ER? They don't have to scan the documents and upload them to the repository. A permission request to an author can educate that author about OA, who may then be interested to know more about OA. Sending some permission mails is not really a lot of work. Conclusion: Administrators and staff of ERs should support OA by asking for permission to make OA versions of ER materials available.

Survey of what libraries are doing with institutional repositories

Elizabeth Winter is running a survey on institutional repositories. From her request for responses:
My colleague, Tim Daniels, and I are conducting a survey of librarians on the subject of institutional repositories, and we would be grateful for your participation. **Your institution DOES NOT have to have an institutional repository in order for you to participate.** We hope to learn some specifics about what libraries are doing with institutional repositories, and will be incorporating the results of this survey into a presentation for a conference this fall. The survey will only take 5-10 minutes to complete, and will be available [online] until Wednesday, October 19th. Your participation is, of course, voluntary, and we are not collecting any information that will identify your responses with you personally. We will be glad to share the aggregate results with you (just send me an email if you're interested). If you have any questions, please contact me.

More on the RCUK policy

The Dangers of Open Access, RCUK Style, Research Fortnight, October 3, 2005. An unsigned comment, accessible only to subscribers, critical of the draft RCUK OA policy. For some quoted selections, and direct rebuttals, see Stevan Harnad's response.

More integrated OA databases coming

From an Indiana University press release, dated yesterday:
Medical scientists must sift through and analyze mammoth amounts of data to find ways to treat disease, and an Indiana University School of Informatics-led team has been assembled to help them develop new discoveries. The School has been awarded a two-year $500,000 grant from the National Institutes of Health to establish the Chemical Informatics and Cyberinfrastructure Collaboratory, and it brings together experts in informatics, medicine, computer science, chemistry, biology and from IU’s Pervasive Technology Labs (PTL). Chemical informatics is the application of computer technology to chemistry in all of its manifestations, particularly in the drug-manufacturing industry. The group seeks to devise an integrated cyberinfrastructure composed of diverse and easily expandable databases, simulation engines and discovery tools such as PubChem, the NIH’s small molecule chemical and biological database. They will use emerging high-capacity computer networks and data repositories and develop grid and Web technology for chemistry research.

More on ACS v. PubChem

Emma Marris, Chemical Reaction, Nature, October 6, 2005 (accessible only to subscribers). Excerpt:
The American Chemical Society (ACS) is the world’s largest scientific society....The society owes most of its wealth to its two ‘information services’ divisions — the publications arm and the Chemical Abstracts Service (CAS), a rich database of chemical information and literature. Together, in 2004, these divisions made about $340 million — 82% of the society’s revenue — and accounted for $300 million (74%) of its expenditure....Although the ACS is a non-profit organization, the information-services divisions are increasingly being run like businesses. Any net revenue is naturally fed back into the society’s other activities, but the business-like attitude is making some ACS members uneasy. A small but vocal group of critics fears that business priorities are supplanting the goal laid out in the society’s charter: “to encourage in the broadest and most liberal manner the advancement of chemistry and all its branches”....An ongoing dispute between the ACS and the US National Institutes of Health (NIH) reflects some of the problems. The NIH has recently unveiled a freely accessible database called PubChem, which provides information on the biological activity of small molecules. The ACS sees this as unfair competition to the fee-based CAS because it is taxpayer-funded, and the society wants the database restricted to molecules that have been screened by NIH centres. A few ACS members argue that the society is being unduly aggressive in protecting CAS and ought not to be challenging the scope of a database that could be a useful and free resource for chemists. For the record, Nature’s sister journal Nature Chemical Biology links all of its articles to PubChem. “I am growing increasingly upset with their direction,” says Chris Reed, an inorganic chemist at the University of California, Riverside, and one of the more outspoken critics of the ACS. 
“They have a culture of a for-profit corporation.”...Steve Heller, who lives in Silver Spring, Maryland, is part of an e-mail listserver community that is a source of lively discussion on this issue. Heller is a retired chemist and ACS member who also serves on an NIH advisory board on PubChem. “It seems as if those members of the ACS who see and know what is going on — and it is not a very large number — are very upset that the management and staff are taking a position without any consultation with the membership or discussion with experts in the field, and doing things that are not in the interest of their members, who want [PubChem] for free,” he says.

OCA gets it almost right

Preston Gralla, Yahoo Gets Book-Scanning Right...Almost, Networking Pipeline, October 5, 2005. Excerpt:
The Yahoo-led project to scan books and library material [called the Open Content Alliance or OCA] and make them available online is on target, unlike the wrong-headed Google initiative that will lead to massive copyright violations. Despite a few minor problems with the Yahoo program, Google should learn from its competitor and follow the same rules....There are only a few drawbacks to the plan. First is that the material will be made available in Adobe Acrobat format, rather than as text. Acrobat is a notoriously finicky format, and the Acrobat reader has probably crashed more computers than anything this side of Windows. It's big, it's ugly, and it's a resource hog. People should have the option of viewing in plain text. Second is that all the work in the archive, regardless of copyright, will be made fully available as Acrobat files, so it can be easily printed out. This is great for public domain works, but not so great for copyrighted works. Copyright holders justifiably won't want their entire works made available this way, and few will probably want to participate. Yahoo should have a two-tiered program --- snippets for copyrighted works; full online access for the rest.

Comment. Three quick replies. (1) Gralla is hasty to conclude that Google's opt-out policy violates copyright. See my defense of it from last week's issue of SOAN. (2) I wholeheartedly share Gralla's preference for plain text over PDF. The fact that Adobe is a partner in OCA doesn't mean that OCA has to lock up the content in this annoying format. Users should have a plain-text option. (3) Gralla may be right that the full-text or nothing plan will lead many copyright holders to choose nothing. But the solution isn't to limit copyright holders to snippets. OCA can enlarge the menu and offer copyright holders full-text, snippets, or nothing. Many publishers will choose full-text, just as many publishers are volunteering their books to the Google Publisher program.

Mandating OA: In what kind of repository?

Dorothea Salo, Heard 'Round The World, Caveat Lector, October 3, 2005. Excerpt:
For the first time, a certain class of researchers must provide open access to their research results as a condition of their grant. The huge UK funder Wellcome Trust made deposit in PubMed obligatory as of yesterday. We here in the States had a golden opportunity to fire the open-access shot heard ’round the world: the NIH chewed on policy for nearly a year. We backed down. Wellcome Trust didn’t. Good for Wellcome Trust, and I hope to see a troop of funders fall into line behind them. That said --you’d think this would help me and the repository I manage, but it doesn’t....The Wellcome Trust grant agreement mandates PubMed, not just open access. They don’t positively forbid grantees to deposit somewhere else, but they don’t consider that a substitute for PubMed deposit. So I’m out in the cold, basically. The deeper question is which repositories are trustworthy enough to be viable substitutes for PubMed. Wellcome Trust understandably and correctly doesn’t want researchers slapping their stuff on their own websites and calling that a repository. (Why not? Well, because real repositories make guarantees about bit preservation and URL non-breakage that ordinary websites don’t. 404s aren’t acceptable in this business.) Nor, sadly, are all actual repositories likely to make it, long-term, because not everyone who has opened a repository quite realizes what a commitment they should be making to it. The answer may lie in repository certification. It’s terribly hard for an entity like Wellcome Trust to define just now which repositories are acceptable for deposit. (Mandate software platform? Sure, but the software platform is only one small part of the story.) Once repositories can be certified as trustworthy under a central definition, it becomes easy. So as much as I disagree with parts of NARA-RLG’s recommendations, I’m very glad they exist. I want a piece of the mandated-OA action, I do, and certification seems likely to be my path to it.

CLA Info Commons Interest Group now online

Heather Morrison, Info website and listserv, Imaginary Journal of Poetic Economics, October 5, 2005. Excerpt:
The Canadian Library Association's newly formed Information Commons Interest Group's website is now live, and our listserv is up and running and open to all!...Projects in progress: [1] Copyright in Libraries: the Digital Conundrum (Proposal for CLA Preconference), [2] wiki setup, [3] Drafting response to SSHRC Consultation on Open Access.

More on the impact advantage of OA

Stevan Harnad, How to compare research impact of toll- vs. open-access research, Open Access Archivangelism, October 4, 2005. Excerpt:
[Sally Morris objected:] "The problem is, there is no evidence of correlation between citations and the return on research expenditure."

[Harnad replied:] Citations are one direct, face-valid measure of return on research expenditure. Research is funded in order to be applied and built-upon, i.e., to be used; citations are an index of that usage. Uncited, unused research may as well not have been conducted, and represents no return on the research investment. Whatever increases usage and citations, increases the return on the research investment. Any loss of such a potential increase is a loss of potential return on the research investment. Self-archiving increases citations 50%-250%. Hence the failure to self-archive loses 50%-250% of the potential return on the research investment....

[Morris:] "Clearly, we are a long way off being able to analyse whether or not self-archiving (or any other form of open access) does or does not contribute to these objective output measures."

[Harnad:] I thought the question was about whether citation counts are correlated with these measures. We already know that self-archiving is correlated with increased citation counts.

(PS: For the studies showing the correlation to which Harnad refers, see Steve Hitchcock's excellent bibliography.)

Open-source submission tool for ETD repositories

VALET is a new, open-source submission tool for Fedora-based ETD repositories. From the September 27 press release:
VTLS Inc. has been collaborating with the NDLTD project at Virginia Tech, the FEDORA Project, and the Australian Research Repositories Online to the World (ARROW) Project, led by Monash University in Australia, to develop VALET for ETDs [Electronic Theses and Dissertations]. This open-source product is simple, flexible, adaptable and easy to implement. A typical process allows for thesis submission by students, editing and approval by faculty, approval by the graduate school and final deposit into a FEDORA-based, institutional repository. The institution can configure the number of steps in the process and the details of each step. The software minimizes errors and offers instant, form-based validation. It also offers multi-level security for students, faculty and administration. When a thesis enters the repository, the software automatically creates standardized metadata. It is preconfigured to allow users to choose Dublin Core or ETD-MS, but can support other metadata standards or schemas, such as MARCXML. VALET helps streamline the submission process while increasing the quality of the final ETD resource. While the initial version supports submission via a Web interface, the next version will also support submission via e-mail. FEDORA is packaged with VALET.
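To give a feel for the kind of process the press release describes, here is a minimal sketch in Python of a configurable approval chain that validates a submission and then emits a small Dublin Core record on deposit. This is not VALET's code or API: the step names, the `validate` and `deposit` helpers, the choice of required fields, and the `record` wrapper element are all assumptions made up for illustration. Only the Dublin Core element namespace is taken from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical approval steps; an institution could configure its own list.
DEFAULT_STEPS = [
    "student_submission",
    "faculty_approval",
    "graduate_school_approval",
]

# Hypothetical set of fields a submission form would require.
REQUIRED_FIELDS = ("title", "creator", "date")


def validate(thesis):
    """Return the required fields that are missing or empty."""
    return [name for name in REQUIRED_FIELDS if not thesis.get(name)]


def deposit(thesis, approvals, steps=DEFAULT_STEPS):
    """Build a Dublin Core record once every configured step is approved."""
    missing = validate(thesis)
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    pending = [step for step in steps if step not in approvals]
    if pending:
        raise ValueError("pending steps: " + ", ".join(pending))
    # "record" is an assumed wrapper element, not part of Dublin Core.
    record = ET.Element("record")
    for name in REQUIRED_FIELDS:
        ET.SubElement(record, "{%s}%s" % (DC_NS, name)).text = thesis[name]
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    sample = {"title": "A Sample Thesis", "creator": "Doe, Jane", "date": "2005-10-06"}
    print(deposit(sample, approvals=set(DEFAULT_STEPS)))
```

Passing a shorter or longer `steps` list stands in for the configurable number of steps. A real repository would of course store the file and metadata rather than return a string.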

Scirus will index ETDs

From a Reed Elsevier press release, dated yesterday:
Elsevier today announced a landmark partnership between Scirus, its free science-specific search engine, and the Networked Digital Library of Theses and Dissertations (NDLTD) to add the extensive collection of [open-access] theses and dissertations of its member institutes to Scirus. In addition to indexing the content on, Scirus will power a search service on the repository's site. The service will ensure this content will be easier to find on both the NDLTD and Scirus websites. The launch of the service was announced this week in Sydney at NDLTD's annual conference ETD2005. "Until now, theses and dissertations have not been fully leveraged by postgraduate students and researchers in their work because these documents have been difficult to find and retrieve," said Deborah Kahn, an associate at Electronic Publishing Services, Ltd. To combat this trend, Scirus has indexed over 200,000 theses and dissertations, in more than twelve languages...."With its particular expertise in indexing for scientific and research content, Scirus is a logical partner for NDLTD," said Edward Fox, executive director of NDLTD and professor of computer science at Virginia Tech. "Building on their impressive history of providing scientists and students with the information they need for their research, Scirus now also supports NDLTD's goals of enhanced access to scholarship worldwide. We are looking forward to expanding our collection and partnership in the future."

Amherst joins ATA

The future of OA to biodiversity data

Roger Harris, To Be Free, or Not To Be, American Scientist, November-December, 2005. Excerpt:
Imagine walking into your downtown library and finding that you can't check out a book without paying a fee. What you took for granted as a free service, you now have to pay for. A similar situation may soon face biologists who study biodiversity, the variety and number of species....Today, biodiversity databases are growing and struggling for funds --which may come in the form of private investment that could transform what is now an open, public resource reliant on government and nonprofit funding....Biodiversity databases, each with its own way to codify, organize and search data, have proliferated as experts in various taxonomic groups have built catalogs to meet their specific needs. (An example is the well-known FishBase.) The Catalogue of Life Programme is the biggest and boldest attempt to integrate these databases. It is a joint agreement between Species 2000 (acting as a coordinating umbrella organization), the Global Biodiversity Information Facility (GBIF) and the Integrated Taxonomic Information Systems (ITIS). ITIS, the main U.S. contributor, is in turn a partnership of federal agencies and nonprofit organizations (themselves collaborations!) including NatureServe, the U.S. Geological Survey, the Smithsonian Institution and the National Biological Information Infrastructure. The organizational layers illustrate the complexity and cost of developing gigantic data sets as well as the extent of public-agency involvement....Stuart Pimm of the Nicholas School of the Environment and Earth Sciences at Duke University agrees: "So many of the data are collected by state and federal agencies, there is enormous public pressure to keep access open." A hint of private interest in the growing databases came in January 2004, when Thomson Scientific, the world's largest information corporation, acquired Biosis, known for indexing and abstracting life-sciences journals. 
Biosis managed the Zoological Record, whose computers had hosted the Species 2000 project to that point. With the acquisition, Thomson now hosted the Species 2000 database, an arrangement that continues. Although Jim Pringle, Thomson's vice president of development, says the company does not have definite plans to privatize biodiversity data, Thomson promptly applied to become a member of Species 2000. Frank Bisby, executive director of Species 2000, said the Species 2000 directors "took advice … and decided that [it] was not appropriate for a subsidiary of a major multinational."

More on ALPSP's objection to Google's opt-out policy

Somehow I forgot to blog Danny Sullivan's interview with Sally Morris on the ALPSP's objection to Google Library's opt-out policy. It appeared in SearchDay for August 30. Sally Morris is the Chief Executive of the ALPSP. (Thanks to Gary Price.) Excerpt:
The ALPSP put out a statement (PDF format) last week with this key highlight that caught my eye: "Google Print for Libraries is a very different matter. We firmly believe that, in cases where the works digitised are still in copyright, the law does not permit making a complete digital copy for such [indexing] purposes." I asked Morris: "Google...has indexed nearly 1,000 pages from the ALPSP web site. My assumption is that the ALPSP never overtly asked for these pages, all of which are copyrighted, to be digitized and included in Google. Despite this, I've never heard your organization complain about such indexing....In short, why is opt-out OK when it comes to web content but not OK when it comes to [other] published works?"

Morris replied: "[Y]ou're right, in principle Google should seek opt-in permission before indexing freely available web pages, too...However, I think the issue is much more acute where the content is not made freely available by its copyright owner - which is, of course, the case for all the in-copyright content Google are planning to digitise from libraries."

I wasn't convinced on the "freely available" front and sent this follow-up: "Why is publishing a book not making content freely available? If I go into a library, I've got plenty of content for free. That's exactly why Google has gone into the libraries....I don't know of any library being sued for allowing people to borrow books, which arguably goes directly to the potential earnings a publisher could make....In contrast, Google is not making the full text of books available as a library does. If anything, libraries are far greater infringers than Google and have been so longer. Why aren't libraries being targeted?"

Morris replied: "A published book is sold - to the individual or to the library. Lending it out does not contravene copyright. To my mind, making a digital copy of the whole thing does. We are not saying that increasing visibility via Google Print is a bad thing - I think those of our members who participate in the Google Print for Publishers program (or who otherwise allow Google to index their closed content) are generally pleased with the increased hits, though I'm less clear whether they are in fact seeing increased sales. All we're saying is that the method of achieving it seems to us clearly to break copyright laws - and we'd like to work with Google to find an acceptable way of getting publishers' opt-in."

[Sullivan again:] And I guess all I'm saying is that those publishers, if they try to push this angle with Google via a lawsuit, had better be prepared for explaining why they've never complained about having their web sites indexed by Google for years without permission. Moreover, woe to the publisher or member of a publishing group that is ever found during legal disclosure to have complained about not being indexed better on Google. You can't enjoy years of free traffic from a source, then suddenly decide that copyright law is now different just because the words appear in print, rather than on the web. One interesting solution will be to see if Google simply goes out and buys a copy of every book it wants to offer in its virtual library.

Wednesday, October 05, 2005

On the length of the UK term of copyright

Suw Charman, Should the term of copyright protection be extended or shortened in the UK? Open Rights Group, October 1, 2005. Blog notes on a panel discussion among Lawrence Lessig, John McVay, and Adam Singer, moderated by John Howkins. (Thanks to QuickLinks.)

Searching Medline via PubMed

E. Motschall and Y. Falck-Ytter, Searching the MEDLINE Literature Database through PubMed: A Short Guide, Onkologie, September 2005.
Abstract: The Medline database from the National Library of Medicine (NLM) contains more than 12 million bibliographic citations from over 4,600 international biomedical journals. One of the interfaces for searching Medline is PubMed, provided by the NLM for free access via the Internet. Also searchable with the PubMed interface are non-Medline citations, i.e. articles supplied by publishers to the NLM. Direct access to an electronic full text version is also possible if the article is available from a publisher or institution participating in Linkout. Some publishers provide free access to their journals. Other journals require an online license and are fee based. The following example demonstrates some of the most important search functions in PubMed. We will start out with a fast and simple approach without the use of specific searching techniques and then continue with a more sophisticated search that requires the knowledge of Medline search functions. This example will show how the application of Medline search tools and how the use of the controlled vocabulary of ‘Medical Subject Headings’ (MeSH) will influence the results in comparison with the fast and simple approach. Let’s try to find the best evidence to answer the following question: Is a 30-year-old man with typical acid reflux symptoms for many years (gastroesophageal reflux disease, GERD) more likely to develop esophageal cancer than people without reflux symptoms? This question can be split into several components: - a patient with reflux symptoms (GERD), - esophageal cancer: etiology, risk, - study design for etiology studies: cohort studies, case-control studies.
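The contrast between the two approaches in the abstract can be sketched in a few lines of Python. The example below builds a plain keyword query and a MeSH-tagged query for the reflux question, then wraps each in a URL for the `esearch` endpoint of NCBI's E-utilities. It is not taken from the guide: the helper name `esearch_url`, the particular MeSH headings, and the way the three components are combined with AND are assumptions chosen for illustration. The sketch only constructs URLs and sends no request.

```python
from urllib.parse import urlencode

# Search endpoint of NCBI's E-utilities.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=20):
    """Return an esearch URL for a PubMed query (hypothetical helper)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


# Fast and simple approach: free-text keywords with no field tags.
simple_term = "reflux esophageal cancer"

# More sophisticated approach: one MeSH-tagged group per component of the
# question. The headings are assumed examples, not the guide's own strategy.
patient = '"Gastroesophageal Reflux"[MeSH]'
outcome = '"Esophageal Neoplasms"[MeSH]'
design = '("Cohort Studies"[MeSH] OR "Case-Control Studies"[MeSH])'
mesh_term = " AND ".join([patient, outcome, design])

simple_url = esearch_url(simple_term)
mesh_url = esearch_url(mesh_term)

if __name__ == "__main__":
    print(simple_url)
    print(mesh_url)
```

Keeping one variable per component mirrors the way the abstract splits the clinical question into patient, outcome, and study design, and makes it easy to swap a heading without touching the rest of the query.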

Another publisher on Google Print

Karen Christensen, Google and the library, another installment in the EPS debate on Google Print, October 4, 2005. Karen Christensen is CEO of the Berkshire Publishing Group. Excerpt:
Our problem is that the guys and girls at Google don’t really get books. They want to believe that books are just primitive webpages, simply more information to be organized for the benefit of everyone....But websites are built for the web. Books were not written for Google Library. Forcing publishers and authors to opt-out, instead of opt-in, is not fair. It’s coercive....Librarians, unfortunately, don’t understand the rights of the creators and producers of books. Most librarians do not understand the work and expense, the expertise and talent, involved in creating the publications they buy. And quite a few believe that information should be free --unless it is only available through them. Besides that, Google has an unhealthy fascination for librarians: they are (rightly) terrified by the fact that students go to Google instead of to them, but they can’t take their eyes off it. Google is taking advantage of librarians by making them partners in a process that undermines the sources of information and knowledge that their institutions and communities depend on. As a result, authors and publishers can easily be made to look obstructive and mean-spirited....It’s a good thing the Google lawsuit isn’t going to be decided by a public referendum, because we authors would lose hands down. I’ve taken to asking people whether, if it were possible, they would be happy if they knew Google was going to scan, store, and index copies of all their personal photographs and diaries, photos of the interior of their house and their closets, all without permission? (And use that content to make money.) Our challenge is to show people just what it takes to create and publish a book and that intellectual creation merits every bit as much protection as physical property. And we need to talk about this in simple terms. 
When Google says it will take and hold and use content that does not belong to them, without asking permission, they are coming awfully close to breaking their own rule, “Don’t be evil.”

Comment. Three quick replies. (1) For a defense of Google Library's opt-out policy, see my article in this week's SOAN. (2) The analogy to personal photos and diaries is very bad. We don't make them hoping to bring them to the largest possible audience. We don't hope to make money from them. We don't welcome free advertising for their contents. But book authors do all of these things. (3) Publishers who don't want to look "obstructive and mean-spirited" should stop using the false and grasping comparison of intellectual property to physical property. Physical property doesn't enter the public domain after a fixed term of years, and non-owners have no fair-use rights over it. Intellectual property is only quasi-property that every country on Earth treats very differently from physical property.

More on applying trade embargoes to science

John Miller, US societies reverse rules on Iranians, TheScientist, October 4, 2005. Excerpt:
Two American academic societies have reversed their policies toward Iranian scientists. One, the American Institute of Aeronautics and Astronautics (AIAA), has decided to no longer prohibit Iranian authors from publishing in its journals, while the American Concrete Institute (ACI) has decided to install a new ban barring Iranian students from taking part in an annual engineering competition they routinely enter each year....AIAA enacted the ban because the board feared it might be violating US embargo law....In September 2003, the U.S. Treasury Department's Office of Foreign Assets Control (OFAC) ruled that the little-known embargo law prohibited the Institute of Electrical and Electronics Engineers (IEEE) from editing manuscripts from all embargoed countries, leaving it with no choice but to publish them unedited or reject them. Subsequently, a few other scientific societies stopped publishing Iranians after the IEEE decision out of fear that OFAC would also charge them with a crime. A consortium of academic publishers and societies led by the Association of American Publishers pressed OFAC for over a year to drop the embargo, but to no avail. Last October, the consortium sued the agency, and last December, OFAC reversed its decision, granting a general license to all US publishers to edit and publish material from embargoed nations. Marc Brodsky, CEO of the American Institute of Physics and a central figure in the publishers' lawsuit, told The Scientist that OFAC's reversal means AIAA never needed to ban Iranian articles....Last January, the American Concrete Institute (ACI) decided to ban Iranian students from taking part in its annual international student engineering competition after OFAC ruled that certification courses ACI had been offering to Iranian professionals were illegal, because they provided a service. 
Students from Cuba and Sudan were also banned from the competition, although none have entered the contest, according to William Tolley, ACI's executive vice president. Tolley told The Scientist that since his organization had never known it was violating OFAC's rules until the agency began investigating, it decided to temporarily exclude Iranian students from the competition while it asked OFAC whether their participation was legal. However, Tolley said he has written to OFAC four times, most recently in September, and he still doesn't have an answer.

OA to GIS data helps in Katrina rescue

Dibya Sarkar, GIS aids the Coast Guard, Federal Computer Week, October 3, 2005. (Thanks to Patrice McDermott.) Excerpt:
In helping the Coast Guard with their rescue operations in Mississippi after Hurricane Katrina, Talbot Brooks had to calculate – in minutes – the coordinates of people who were stranded on rooftops or elsewhere. Brooks, director of Delta State University’s Center for Interdisciplinary Geospatial Information Technologies in Cleveland, Miss., assembled geographic information system experts to help with the rescue and recovery efforts. For the Coast Guard, his team translated more than 100 street addresses into degrees, minutes and seconds required for helicopter and land-based rescues. For example, Brooks recalled one phone call handed to him at the emergency operations center. A young man said he had just spoken to his mother, who was trapped in the water treatment facility in Waveland, Miss. “And people were shouting, ‘Hurry she’s dying, she’s dying,’” Brooks said. “And that’s all the information we had.” Brooks said they used a combination of vector data and aerial photographs of the area before the hurricane. They found the water treatment plant in the imagery and gave the latitude and longitude to the Coast Guard within seven minutes, he said....GIS professionals from various Mississippi universities and the Urban and Regional Information Systems Association’s GISCorps volunteered to help Brooks with numerous tasks at the state’s emergency operations center in Jackson County. The experts also created a missing persons database and plotted their last known locations on a map. At last count, they produced more than 400 search and rescue maps for first responders showing the last known location of more than 10,000 missing persons. Additionally, they collected and mapped vital data such as power outages; cell tower and coverage areas; location of hazardous material, such as underground gas storage; location of wells, electrical substations and other critical infrastructure; and locations of shelter, food and water distribution points and capacities.
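The last step in that translation, turning a decimal latitude and longitude into degrees, minutes and seconds, is simple arithmetic and can be sketched in Python. This is only an illustration under stated assumptions: it presumes some geocoding step has already turned the street address into decimal degrees, the function names `to_dms` and `format_position` are invented here, and the sample coordinates in the usage line are arbitrary values, not the location of any site mentioned in the article.

```python
def to_dms(decimal_degrees):
    """Split a decimal-degree value into (degrees, minutes, seconds).

    The sign is dropped; the hemisphere is handled by the caller.
    """
    magnitude = abs(decimal_degrees)
    degrees = int(magnitude)
    minutes_full = (magnitude - degrees) * 60
    minutes = int(minutes_full)
    seconds = round((minutes_full - minutes) * 60, 2)
    # Rounding can push seconds to 60.0; carry into minutes and degrees.
    if seconds >= 60:
        seconds -= 60
        minutes += 1
    if minutes >= 60:
        minutes -= 60
        degrees += 1
    return degrees, minutes, seconds


def format_position(lat, lon):
    """Format a latitude/longitude pair as a DMS string with hemispheres."""
    parts = []
    for value, positive, negative in ((lat, "N", "S"), (lon, "E", "W")):
        d, m, s = to_dms(value)
        hemisphere = positive if value >= 0 else negative
        parts.append("%d°%02d'%05.2f\"%s" % (d, m, s, hemisphere))
    return " ".join(parts)


if __name__ == "__main__":
    # Arbitrary sample coordinates chosen for easy arithmetic.
    print(format_position(30.5, -89.25))
```

The carry step matters in practice: without it, a value a hair below a whole degree could be reported with 60 seconds, which is not a valid reading.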

Warning to libraries in the Google Library project

Public Interest Watch issued a press release on September 22 (reissued October 3) warning that Google might censor the books it scans in its Library project. Excerpt:
Public Interest Watch ("PIW") today warned Harvard University, Stanford University, the University of Michigan, and the New York Public Library against turning over their research collections to Google, Inc. for inclusion in its Internet databases. PIW demanded that each institution receive an iron-clad guarantee from Google that the full contents of its collections would be used in its entirety for public benefit and would remain completely uncensored. PIW Executive Director Lewis Fein commented, "Google stands to make a fortune from this partnership, but what's in it for the taxpayers who have subsidized these collections? And what business does a university or a library have in affiliating with an $84 billion for-profit company that openly and willingly censors data to protect its market share?" Researchers at Harvard University conducted a study that revealed that Google routinely censors its databases. Since 2004, Google has cooperated with the Chinese government to censor the data it allows Chinese consumers to see. In return, Google enjoys broad, lucrative access to the Chinese marketplace. Google also censors data in France and Germany and, in 2002, it blocked access to Internet sites critical of the Church of Scientology rather than face the expense and distraction of a potentially frivolous lawsuit. Fein noted, "All Google cares about is making a profit. Faced with the choice of defending free and open access to a book or manuscript, or censoring it to avoid conflict, controversy, or expense, Google has demonstrated that censorship is its preferred option."

Comment. On its web site, PIW describes its mission as "Keeping an eye on the self-appointed guardians of the public interest." OK. But PIW is another self-appointed guardian of the public interest, and it could use less caffeine. The PIW warning is an overreaction. I'm familiar with cases of Google capitulation to the Chinese government and the Church of Scientology and I join PIW in deploring them. But PIW cites no evidence that Google has censored passages from the books in the Library project. Moreover, PIW makes inconsistent recommendations about the risk of book censorship. Libraries can't both (1) withhold their books from Google indexing and (2) demand guarantees of uncensored indexing. PIW also seems unaware that Google's contract with the participating libraries gives the libraries a role in deciding what content Google will index (Par. 2.1) and the right to back out if Google's indexing doesn't live up to the agreed-upon guidelines (Par. 2.4).

OA policies on the horizon in China

Richard Poynder, China Mulls Open Access, Open and Shut, October 4, 2005. Excerpt:
[T]he Chinese Academy of Sciences (CAS) has also begun mulling over the question of open access (OA). "According to my contact in China," says Jan Velterop, director of open access at the STM journal publisher Springer, "the Chinese Academy of Sciences is now in the process of organising a group of prominent scientists to issue an open call to Chinese funding agencies, and research and educational institutes, to promote open access." To this end, adds Velterop, the Academy is currently working on a draft document for scientists to review. It is also in the early stages of developing institutional operating policy guidelines for CAS to enable it to support open access. These developments come in the wake of an international meeting held at the Beijing-based CAS in June....[S]ays Key Perspective's Alma Swan, who gave a presentation at the Beijing meeting: "The amount of Chinese science being published is growing rapidly but much of it remains largely invisible to the rest of the world. Although some of the best is published in 'western' journals — there has been a 1500% increase in the number of Chinese articles indexed by ISI over the last 20 years — an enormous amount of Chinese research is tucked away in Chinese journals that are hard to get at. There are 2000 Chinese university journals, for example, and the vast majority of those are not indexed by any of the major indexing services. Chinese science is hiding its light under the proverbial bushel."...As one of the delegates who attended the Beijing meeting — speaking on condition of anonymity — put it: "Whatever the outcome of the current initiatives in the UK and China, those commercial publishers and learned societies who continue to resist open access are holding their fingers in a dyke that will, sooner or later, inevitably burst. The only issue for them now, therefore, is whether they learn to swim in the open waters, or choose to drown."

More on academic blogging

Henry Farrell, The Blogosphere as a Carnival of Ideas, Chronicle of Higher Education, October 7, 2005 (accessible only to subscribers). Excerpt:
While blogging has real intellectual payoffs, it is not conventional academic writing and shouldn't be an academic's main focus if he or she wants to get tenure. But to dismiss blogging as a bad idea altogether is to make an enormous mistake. Academic bloggers differ in their goals. Some are blogging to get personal or professional grievances off their chests or to pursue nonacademic interests. Others, perhaps the majority, see blogging as an extension of their academic personas. Their blogs allow them not only to express personal views but also to debate ideas, swap views about their disciplines, and connect to a wider public. For these academics, blogging isn't a hobby; it's an integral part of their scholarly identity. They may very well be the wave of the future. Look at what's happening in the disciplines of law and philosophy....In both of those disciplines, those who don't either blog or read and comment on others' blogs are cutting themselves out of an increasingly important set of discussions....Academic blogs offer the kind of intellectual excitement and engagement that attracted many scholars to the academic life in the first place, but which often get lost in the hustle to secure positions, grants, and disciplinary recognition. Properly considered, the blogosphere represents the closest equivalent to the Republic of Letters that we have today....While blogging won't replace academic publishing, it builds a space for serious conversation around and between the more considered articles and monographs that we write....Once you get used to this rapid back-and-forth, it can be hard to return to the more leisurely pace of academic journals and presses.
In the words of the National University of Singapore philosophy professor and blogger John Holbo, the difference between academic publishing and blogging is reminiscent of "one of those Star Trek or Twilight Zone episodes where it turns out there is another species sharing the same space with us, but so sped up or slowed down in time, relatively, that contact is almost impossible."...Cross-blog conversations can turn the traditional hierarchies of the academy topsy-turvy....This openness can be discomfiting to those who are attached to established rankings and rituals -- but it also means that blogospheric conversations, when they're good, have a vigor and a liveliness that most academic discussion lacks....Most important, the scholarly blogosphere offers academics a place where they can reconnect with the public....Blogging democratizes the function of public intellectual.

New European signatories to the Berlin Declaration

The Ligue des Bibliothèques Européennes de Recherche (LIBER) and the Danish Royal Library have signed the Berlin Declaration on Open Access to Knowledge.

The latest on ERIC

Paula J. Hane, Update on ERIC, Newslink, October 2005. Excerpt:
The venerable [open-access] ERIC database has been undergoing an extensive restructuring and modernization program. The ERIC database had been compiled by 16 subject-specific clearinghouses, but the clearinghouse contracts expired in December 2003 and a complete re-engineering began. (See our NewsBreak [from April 2003].) In March 2004, the U.S. Department of Education awarded a contract for the new ERIC system to Computer Sciences Corporation (CSC) of Rockville, Md. CSC launched a new database interface on Sept. 1, 2004. On Oct. 1, 2004, more than 107,000 full-text non-journal documents (issued 1993–2004), previously available through fee-based services only, were made available for free. In the future, the collection may include other electronic resources such as audio and video materials. The Education Resources Information Center (ERIC), sponsored by the Institute of Education Sciences (IES) of the U.S. Department of Education, is responsible for the database of journal and non-journal education literature. The ERIC online system now provides the public with a centralized Web site for searching the ERIC bibliographic database of more than 1.1 million citations going back to 1966, the year ERIC began.

BMC highlights its most-downloaded articles

BioMed Central is marking its most-downloaded articles with a new "Highly accessed" icon. As the site puts it, this is "a useful (though by no means perfect) way to identify articles that may be of broad interest." Unlike the appearance on a Top 10 list, the highly-accessed icon will persist, allowing authors to cite it. See for example the most-viewed articles of the past 30 days and the most-viewed articles of all time. You can also see the most-viewed articles for each individual BMC journal, the most-viewed articles for each institutional member, and the access statistics for any BMC author.

Combining ETDs and eprints in one OA repository

Arthur Sale, Unifying ETD with open access repositories, published in Tony Cargnelutti (ed.), Proceedings 8th International Electronic Theses and Dissertations Symposium, Sydney, Australia, 2005. (Thanks to Klaus Graf.)
Abstract: The fundamental proposition of this paper is to argue that ETD collections should be unified with institutional open access repositories, where they sit alongside other forms of research output. This change from separated collections confers the following benefits: [1] Better accessibility and searchability, leading to greater impact and citation rates, [2] Better archival processes, [3] Value can be returned directly to the thesis authors to help in their research planning, [4] Easier provision of bibliometric information about access, [5] Lower operating costs through use of one software system, one data store, and reduced training, [6] Redundancy in Internet access paths. The paper examines each of these claims and substantiates them. The practice of separate ETD collections may have arisen from paper-based practice, and the paper argues that this should be challenged. An actual implementation of a unified ETD collection is also discussed, with actual data on its performance. Software was written that allowed the Australian Digital Theses Program (ADTP) to harvest research theses from a unified institutional repository running GNU Eprints. This software was tested with ADTP, and will be used by the University of Tasmania. The University of Melbourne has already adopted the software and serves up theses to ADTP from its own Eprints repository. The software is available to any university under a GNU open source license. This implementation has allowed the measurement of accesses and a range of useful data, which are available to the thesis author. Examples of this type of data and the information that can be derived from it will be discussed. Since some theses are available on the Internet via ADT, the UTas repository, and the ARROW Discovery Service, comparative information on searchability can be demonstrated.

Tuesday, October 04, 2005

Peer-reviewed OA books and journals built on blogging software

Andrew Doan, Melding Traditional Publishing Models with Innovation, MedRounds Blog, October 3, 2005. Excerpt:
What do you get when combining elements of traditional publishing and Internet innovation? [1] Professional Copy Editing, [2] Academic Editors, [3] Expert Authors, [4] Google’s Blogger Technology. The products produced are peer-reviewed academic journals and medical textbooks that are published with Blogger technology. The educational material is open-access and free to the 6.5 billion readers in the world. We refer to it as "academic publishing with Blogger". This is exactly what MedRounds has done, and MedRounds will continue to produce high-quality educational materials utilizing Internet technologies to meld text with sound and video.

MedRounds then announces the release of an OA textbook, Cataract Surgery for Greenhorns, and an OA journal, The Journal of Ocular Pathology --both peer-reviewed and both based on Blogger software. The journal charges no author-side fees.

(PS: If you're surprised to see blogging software turned to these purposes, have a look. You'll be more surprised to see that the fit is elegant and natural.)

More on the OCA and Google Library

Andrew Orlowski, Yahoo! follows Google into print minefield, The Register, October 4, 2005. Excerpt:
Unlike Google, Yahoo! has set off into the book scanning minefield without detonating any explosions....What [authors and publishers aggrieved with Google] really mean is that Google has got stuff, if not for free, then at a bargain price. Libraries had to pay for licenses or physical material: Google only pays for the scanning - which is an extremely good deal for Google. As Seth Finkelstein reminds us: "Consider that this is not Google contributing to culture. It's Google trying to supplant the publishers as the middleman business between authors and readers," he wrote. So what at first looks like a copyright issue on closer examination is really a compensation issue. Just as we've seen with music. There is no copyright crisis.

Librarians already make material available through computer networks, and so see computer networks as a good thing. Authors and publishers have no objection to their works being available digitally - to a wider audience - so long as it doesn't mean the wholesale destruction of value. There are few problems of this nature that can't be solved by the happy sound of coins rolling across a table. So at some stage the new would-be gatekeepers such as Google and Yahoo! will do the decent thing and pay for licenses to use the content, along with a coherent compensation model for the material. A digital license for all of us to use this material, would of course be better value - and avoid the risk of a Google tomorrow slipping into the role of a Time Warner today.

The case for allowing commercial use of OA content

Erik Möller, Are Creative Commons-NC Licenses Harmful? Podcasting News, October 4, 2005. Excerpt:
When the Creative Commons project published its first licenses in December 2002, it finally brought a sense of unity behind the free content movement....One particular licensing option, however, is a growing problem for the free content community. It is the allow non-commercial use only (-NC) option. The "non-commercial use only" variants of the Creative Commons licenses are non-free, and in some ways worse than traditional copyright law -- because it can be harder to move away from them once people have made the choice. There may be circumstances where -NC is the only (and therefore best) available option, but that number of circumstances should decrease as the business models around free content evolve. The key problems with -NC licenses are as follows: [1] They make your work incompatible with a growing body of free content, even if you do want to allow derivative works or combinations. [2] They may rule out other basic uses which you want to allow. [3] They support current, near-infinite copyright terms. [4] They are unlikely to increase the potential profit from your work, and a share-alike license serves the goal to protect your work from exploitation equally well.....As we will see, there are many desirable commercial uses....[I]f you choose an -NC license, your work will not be compatible with Wikipedia, Wikinews, Wikibooks, and similar free content projects. One reason for this is that licenses like Wikipedia's, the GNU Free Documentation License, work according to the copyleft (or..."share-alike") principle: You can make derivative works, but they have to be licensed under the same terms. You cannot make a derivative work through addition of -NC content, as you can no longer apply the (more liberal) "share-alike" license to the entire work. 
Even where the license allows it, marking up regions of content as non-commercial and consistently following these boundaries is almost impossible in a collaborative environment....The use of an -NC license is very rarely justifiable on economic or ideological grounds....Finally, if you must use such a license for one reason or another, please do add an additional notice specifying the term of copyright protection you desire for your work. Otherwise, traditional copyright law will apply, and commercial use will be forbidden long beyond your death.

U of California's participation in the OCA

Julie Strack, UC Will Put Vast Collection of American Literature Online, The Daily Californian, October 4, 2005. Excerpt:
UC has inked a deal to join a partnership of universities, technology companies and other organizations in the creation of an online, multimedia resource archive, offering the university's prized American literature collection. UC libraries will build a collection of thousands of out-of-copyright works, including work by Mark Twain and Jack London, to contribute to the effort. The collection will cover works written from the 1800s until the 1920s and will be composed entirely of books within the UC libraries. With the digital support of Yahoo Inc., which will provide its search technology to the project, the materials are scheduled to be made available beginning in the spring of 2006 on the Open Content Alliance Web site, the global consortium building the archive....The literature will be available for download free of charge, opening the door to convenient public access to the historical documents. "It is part of the mission of UC libraries to make knowledge available to the people of California and to the world at large," said Daniel Greenstein, university librarian and director of UC's California Digital Library, a digital library of resources including scholarly texts, images and manuscripts. "The inclusive approach of this program emphasizes open access to libraries at the University of California."...Ultimately, digitizing collections is the next step toward fulfilling the mission of archiving and providing access to information, and does not detract from the original print editions of materials, Greenstein said. "Creating online resources is another role that libraries will take on. The Berkeley libraries are more than just a place to access information-they house many cultural artifacts whose value does not diminish by creating online sources of information," Greenstein said. "Online publishing is just another way Berkeley can serve the university and the public."

Also see the U of California press release (October 3) about its OCA participation.

It's not all about Google

The press coverage of the EU's i2010 Digital Libraries project (launched September 30) and the Open Content Alliance (launched October 3) has triggered this brief rant from Gary Price on SearchDay:
I completely realize (aka not naive) that Google is Google and whatever they do gets most of the press attention. The word Google gets people's attention. However, digitization programs have been going on for years, long before Google was even around. Heck, Project Gutenberg was around before Larry and Sergey were even born. Why does the press seem to believe that every other project must "rival" Google Library (I know it sells papers, gets clicks, I'm not naive). Yes, Google Library is a massive and important undertaking but turning into a contest or war seems to make little sense. Other digitization programs (some large and some very small) working to digitize important materials are also crucial. Let the content, the quality of the digitization, the ease of access, be what really matters.

Google, Wikipedia, and plagiarism changing the culture of knowledge

Stefan Weber, Kommen nach den "science wars" die "reference wars"? Telepolis, September 29, 2005. How online plagiarism and the "Google-Wikipedia monopoly" are changing the culture of knowledge (in German). (Thanks to medinfo.)

Update on Google Scholar

Lois E. Smith, Google Scholar Open House, SSP News and Views, October 2, 2005. Excerpt:
On September 14, Google held the first Google Scholar Open House at corporate headquarters in Mountain View, California. About 50 people attended, representing large and small commercial and noncommercial publishers, university presses, and other providers of scholarly publishing content. The purpose of the open house, according to Google engineer Anurag Acharya, is to "start a conversation" between Google and content providers and to share information about the status of, changes to, and near-term plans for Google Scholar. The morning featured presentations by John Sack (Highwire), Mark Doyle (American Physical Society), Peter Binfield (Sage Publications), Gordon Tibbitts (Blackwell), and Ted Freeman (Allen Press). Speakers reported on significant increases in traffic to journal content after Google began crawling it. Google referrals have vastly exceeded those from any other search engines, they reported. Most of the presenters have seen increasing numbers of referrals from Google Scholar in the 10 months since it launched, though not nearly as many as from Google - at least not yet. Many of the morning presenters talked about challenges Google Scholar poses to publishers. These include lack of publisher branding within search results; the presence of myriad versions of the same work, some of which is free of charge versus by subscription or pay per view; and perceived bias toward more highly cited works in ranked search results because of the way Google Scholar's algorithm works. Some of these challenges also present opportunities for publishers, a few speakers noted. By working closely with Google, branding could improve, for example, making it more apparent to users which is the publisher's article of record. Doyle (APS) said one of the terms of its collaborative agreement with Google is that the APS-published article will appear first in the search results.
Binfield (Sage) remarked that the combination of author self-archiving, institutional repositories, and Google Scholar poses a threat to non-open access publishers. He said it is in the interest of publishers to work with Google to increase the use of paid-for content....Google has automated the citation extraction process, though issues such as wide variation in citation styles and the propagation of erroneous citing make this challenging. Google attempts to normalize citations and facilitate ranking by grouping different versions of the same work. Google Scholar has grown by 66% in the last six months, Acharya reported, though he did not say how large the index is. Roughly, its coverage by category (in order of size) is 22% medicine, 14% engineering, 13% biology, 13% sociology, 12% physics, 7% chemistry, and 5% business. Query traffic has increased by 200% in the last six months, with the largest source countries being the United States (50%), the United Kingdom, Australia, and Germany....Recent changes in Google Scholar include coverage of institutional repositories (about 325 libraries).
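(PS: The excerpt doesn't say how Google normalizes citations or decides that two records are versions of the same work. Purely as an illustration of the general idea, and not as a description of Google Scholar's method, here is a toy sketch that groups records under a normalized title-plus-year key. The record fields, sample titles and source labels are all hypothetical.)

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize_key(title: str, year: int) -> Tuple[str, int]:
    """Lowercase the title, drop punctuation, collapse whitespace, and pair it with the year."""
    words = re.sub(r"[^a-z0-9 ]", " ", title.lower()).split()
    return " ".join(words), year


def group_versions(records: List[dict]) -> Dict[Tuple[str, int], List[str]]:
    """Group record sources that share the same normalized (title, year) key."""
    groups: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for record in records:
        key = normalize_key(record["title"], record["year"])
        groups[key].append(record["source"])
    return dict(groups)


# Hypothetical records: a publisher copy and a repository copy of one article,
# plus an unrelated article.
records = [
    {"title": "Open Access: A Primer", "year": 2005, "source": "publisher"},
    {"title": "open access - a primer", "year": 2005, "source": "repository"},
    {"title": "Citation Styles Compared", "year": 2004, "source": "publisher"},
]

for key, sources in group_versions(records).items():
    print(key, sources)
```

A real system would have to cope with the problems the excerpt mentions, such as widely varying citation styles and propagated citation errors, which an exact-match key like this one cannot handle.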

The article concludes with a very useful summary of the Q&A. If you are a publisher with a question about Google Scholar, your question is probably on the list.

How much will Google Print help users?

Barbara Quint, Apology, Searcher, October 4, 2005. Excerpt:
I somehow got the false impression that Google was transmitting electronic copies of the books it was digitizing back to publishers participating in the Google Print for Publishers program....Well, back up the truck. Google does not give publishers digital copies of their books. The copies Google gives to the participating "G5" libraries are TIFF or JPEG files containing images of every page, not complete books in a convenient format such as PDF. As for the public domain books, which Google does allow readers to see cover to cover, all reading must be done while connected to Google. So the question arises: With the exception of public domain, e.g., pre-1920s books, how does Google Print contribute to the distribution of book literature? Insofar as a user finds an in-print book from a Google Print publisher, Google will provide links to online booksellers and publisher Web sites. But most of the books on library shelves are out of print, especially those taken from giant research libraries. Those online booksellers may help you find used copies and a connection to the OCLC Open WorldCat "Find in a Library" service could help too. But digitizing millions of out-of-print books might end up swamping the retrieval of in-print books that have a good chance of delivery. In any case, the Google Print delivery routes offend all three of the Web's iron laws of user-friendliness: They're not free; they're not fast; and they're not online. Add one more depressing note: They're not reliable....Who would think that one would ever have to prod Google into broadening its vision? Yet here it is....The only way to make the Google Print project work for publishers, libraries, authors, and, most importantly, the Web users of the world, is to guarantee that what people find online, they can fetch online. Delivery is the key. Otherwise, it could end up worse than when it started. 
End users searching for the books Google Print presents to them will find traditional sources — publishers and librarians — rejecting their requests. The matchless collections of the "G5" libraries are called matchless exactly because they have what others do not. OCLC's Open WorldCat will do the best it can, but in most cases Marian/Marion the Librarian will not help. One of the new offers to Google Print publishers made in August allows publishers to register the books they expect Google to find on the G5 library shelves and, when searchers find the books, connect users to publisher Web sites. Gosh, thanks! So when all those out-of-print book requests come in, the publishers get to tell users to go shinny up a pole. Are we having fun yet? I know it's the early days for the massive Google Print project, but it's never too early to do it right. Come on, Google. Give publishers and copyright holders e-books they can deliver. Change the world ... again. I'll gladly write an apology for being wrong about being wrong, if only you make it right.

New ATA members

Profile of AnthroSource

Susan Skomal, Transformation of a Scholarly Society Publishing Program, ARL Bimonthly Report 242, October 2005. A detailed profile of AnthroSource from the American Anthropological Association. Because access is limited to dues-paying AAA members, AnthroSource is not OA, but it provides inexpensive digital access to a wide range of scholarly resources in anthropology.

Second ALPSP publisher survey

ALPSP has launched its second Publisher Survey Questionnaire. Responses are due by October 15. John Cox and Laura Cox will collect and analyze the second survey data, just as they did for the ALPSP's first publisher survey in 2003.

Final version of Kaufman-Wills report now available

Cara Kaufman and Alma Wills, The Facts About Open Access, ALPSP, October 4, 2005. This is the final version of the "study of the financial and non-financial effects of alternative business models for scholarly journals," sponsored by the ALPSP, AAAS, and Highwire, and undertaken by the Kaufman-Wills Group. ALPSP is charging £55 for members and £95 for non-members. From the web site:
Discussion of Open Access tends to be strong on rhetoric but short on facts. The objective of this independent study was to determine the impact of open access on scholarly journals’ financial and non-financial factors and to establish a substantial body of data about different forms of Open Access publishing, and a baseline of comparison with traditional subscription publishing. In the first phase of the study, the researchers surveyed 495 journals from four groups: ALPSP member journals (128), AAMC member journals (34), a subset of journals hosted by HW (85) and 248 journals from the Directory of Open Access Journals (DOAJ). The survey consisted of 33 closed-ended and 5 open-ended questions and addressed the following major categories: [1] Demographic: Including type of publisher, location of publishing offices, subject area, type of content published, [2] Financial: Including revenue models, sources of financial support, percentage of total each revenue type represents, revenue trends and expectations, current surplus or deficit, [3] Non-financial: Including print format, copyediting policy, number of internal/external peer reviews, services offered to Authors, copyright and permissions policies, pre/post-publishing rights of authors. The open-ended questions asked for the respondent’s thoughts on the challenges and opportunities presented by open access, as well as the movement’s impact on their own journal or journals and all of scholarly publishing. In the second phase of the study in-depth interviews were conducted with 22 scholarly journal publishers of all types and sizes, representing more than 4,000 journals. The survey, completed in 2005, covers the full-spectrum of business models being used in scholarly publishing – from traditional access provided primarily via subscriptions (Subscription Access) through Delayed Open Access to Optional (author-side payment) and Full Open Access.

For those who can't afford the full, final version, see the first 32 pages of the full report, which are OA; Cara Kaufman's presentation of an earlier version (London Book Fair, 3/14/05); or Lila Guterman's summary (Chronicle of Higher Education, 3/15/05).

Update. Although the final version of the study was first posted to the ALPSP site on October 4, it was not officially announced until October 11. Here's the 10/11 press release.

Update. Although on October 4, the full report was available only to paying members or customers, it's now OA.

Monday, October 03, 2005

Network of OA databases

Consortium to improve access to molecular interaction data, Scientific Computing Newsline, September 2005. An unsigned news story. Excerpt:
The management teams of five major molecular interaction databases have signed an agreement to share curation and to exchange completed records in order to provide researchers with a network of stable, synchronised, and freely accessible databases. The activity will be managed through a consortium called the International Molecular Exchange (IMEx). IMEx will also act as a conduit for jointly capturing all published molecular interaction data - in a manner similar to the global collaborations for protein and DNA sequences. Databases participating in the consortium as of August 2005 are: [1] BIND, run by the Blueprint Initiative (in Singapore and Canada); [2] DIP, from the UCLA-DOE Institute for Genomics & Proteomics, Los Angeles, US; [3] IntAct, from the European Molecular Biology Laboratory at the European Bioinformatics Institute, UK; [4] MINT, run by the University of Rome, Italy; and [5] MPact, at MIPS/Institute for Bioinformatics in Munich, Germany. More are expected to join the consortium soon....Data exchange is expected to start in 2006.

Australian ETD presentations

The presentations from the 8th International Symposium on Electronic Theses & Dissertations (Sydney, September 28-30, 2005), are now online. (Thanks to Arthur Sale.)

October issue of First Monday

The October issue of First Monday is now online. Here are the OA-related articles.

More on OCA

Barbara Quint, Open Content Alliance Rises to the Challenge of Google Print, Information Today, October 3, 2005. Excerpt:
What a great idea!...The goal of the effort is to establish a flexible, open infrastructure for bringing large collections of digitized material into the open Web. Permanently archived digital content, which is selected for its value by librarians, should offer a new model for collaborative library collection building, according to one OCA member. While openness will characterize content in the program, the OCA will also adhere to protection of the rights of copyright holders....Even though Yahoo! Search has taken a leading role with the OCA, the fundamental principle behind the program is open accessibility. As material comes online, all search engines --and yes, that does include Google-- will have access to the repository....Experience has shown that the most stringent barriers to digitization often lie in the bureaucratic politics and complex legalities. The Open Content Alliance hopes to work through these problems and, according to Kahle, "establish mechanisms for sharing while meeting each institution’s responsibility in opening content." Kahle described the organization’s goals. "In essence, we want to get the rules right, to enable libraries to work with commercial sources, governments, etc., without having to hammer out separate agreements."...Initially, according to Kahle, the OCA content will be completely open access; it will be available to all, with no password required. The OCA may carry notices on specific requirements due to Creative Commons licensing, but it will not police compliance....When asked what will distinguish the OCA material from Internet Archive’s existing archives, e.g., the snapshots of the Web in the Wayback Machine, Kahle said that the "Open Content Alliance will be more library-like, as opposed to an archive. Content will be more curated, more vetted by library staff. 
The OCA is trying to kick off with an end-user focus, as opposed to where the collections come from, but how it will evolve, we don’t know yet."...I interviewed Carole Moore, chief librarian at the University of Toronto, and Daniel Greenstein, associate vice president and university librarian at the California Digital Library. Interestingly, both saluted Google Print for getting the ball rolling. Discussing mass digitization projects, Greenstein said: "Google kicked us into gear. They woke us up." Moore said: "It is an idea whose time has come. Before, when it came to digitizing books, the world was not ready, but the world has changed. Google can’t do it all. Other people have to contribute."...For Greenstein, probably the single most promising factor was that he now sees librarians tapping into collection budgets to fund digitization projects. Instead of treating digitization as an extra service that would probably be funded by grant money, librarians have begun seeing digitization and sharing with other institutions—and the world through the open Web—as a form of collaborative collection building.

More on Authors Guild v. Google

Barbara Quint, The Other Shoe Drops: Google Print Sued for Copyright Violation, Information Today, October 3, 2005. Excerpt:
"Does Google Library violate copyright?" by Peter Suber is the best article I have seen for rounding up coverage of the controversy. It was published in the SPARC Open Access Newsletter, issue 90. Though the title of the newsletter alone tips readers off to Suber’s general position on the issue even before they read his clearly pro-Google Print analysis, his sourcing is impeccable....I interviewed Paul Aiken, executive director of the Authors Guild. He said that the same board that unanimously agreed to sue Google also unanimously agreed that the Google Print goal was "a grand idea." Many of his remarks focused not on Google stopping its effort, but on the need for licensing and licenses that would accommodate most or all authors. He considered a favorable decision in their case as a step toward ensuring that licensing would go forward. Perhaps Aiken should be negotiating with the Open Content Alliance.

PLoS Pathogens added to PubMed Central

PLoS Pathogens, like its four predecessor Open Access journals from the Public Library of Science (PLoS), is now mirrored by PubMed Central.

PLoS Pathogens - Fulltext v1+ (2005+) PLoS | PubMed Central; Print ISSN: 1553-7366 | Online ISSN: 1553-7374.

More on the OCA

Elinor Mills, Yahoo to digitize public domain books, October 2, 2005. Excerpt:
"If we get this right so enough people want to participate in droves, we can have an interoperable, circulating library that is not only searchable on Yahoo but other search engines and downloadable on handhelds, even iPods," said Brewster Kahle, founder of the Internet Archive. The project, to be run by the newly formed Open Content Alliance (OCA), was designed to skirt copyright concerns that have plagued Google's Print Library Project since it was begun last year....The University of California's 10 campus libraries have about 33 million volumes, of which an estimated 15 percent are in the public domain, said Daniel Greenstein, associate vice provost and University Librarian of the California Digital Library. Greenstein said that contrary to publisher concerns that people will choose not to buy books if they can read or download them free online, the ability to easily find books on the Internet will broaden the public's exposure to them and is likely to increase, not decrease, sales. "There is good evidence to suggest that if people see (that a book) is (out) there, they will buy it. Print sales either increase or are unchanged," he said. "We haven't once seen data to suggest that open access, at least to published printed works, decreases sales."...By exposing more people to scholarly works, the OCA project could contribute to improved research and help reverse the trend among publishers of cutting back the number and print runs of books, said Lawrence Pitts, chairman of the University of California Academic Counsel Special Committee on Scholarly Communication. Rising prices on books from academic publishers has meant fewer purchases by universities, he said. For example, academic presses that used to print 12,000 copies of a book a few years ago are now printing as few as 250 copies, he said. 
"It is a terrible problem in the liberal arts, in particular, of getting a first book published, and that is often the ticket to being hired by a good university and getting tenure," Pitts said. "Data show that if you can put the material in an open access arena, the mention of the work doubles or quadruples because people out there in the world can find it better."

The OCA is appealing to publishers and other libraries, universities and archives worldwide to offer materials as well. "This is an international effort, not just domestic," said Dave Mandelbrot, Yahoo's vice president of search content. For example, "we would be very eager to integrate French content into the Open Content Alliance and are working with people in France to make that happen."...The OCA effort was applauded by publisher and author groups who have been critical of Google's effort, including the Association of Learned and Professional Society Publishers, the Text and Academic Authors Association, or TAAA, and the Authors Guild. "It is a wonderful idea. It does all the good things that the Google project was represented as doing, but it respects the copyright," said Richard Hull, executive director of the TAAA. "Sounds fine, but we would want to see the details, of course," said Paul Aiken, executive director of the Authors Guild. "We have absolutely no problem with digitization of public domain works. With copyright works, we want to make sure the people who actually have the rights are the ones granting the licenses. In most cases it would be the authors."

More on the OCA

Gary Price, A New Digital Library Alliance Makes its Debut, Search Engine Watch, October 3, 2005. Excerpt:
The OCA project differs from other digitization projects in that the database of scanned material will be available for anyone to use on any site. Yes, it's an open access database! You could even create a focused database (let's say one on American literature) and use it on your own web site. Without getting into legal "what if's," most of the material in the OCA will be available as full text. There are no limits on how much you can view or download for offline viewing or printing. Kahle said that in some cases you can find content via the Open Content Alliance, print it, and slap a cover on it. Sort of a, "make your own book" type of thing....The press release also contains a quote from Sally Morris, Chief Executive of the Association of Learned and Professional Society Publishers (ALPSP) [PS: and a critic of the Google Library project]...."We welcome the launch of the OCA because its approach respects the rights of publishers and other copyright owners....Many publishers already make some of their book and journal content freely available online, and the OCA's model of allowing rights holders to control which of their works are opened up, when, and where they are hosted may encourage others to do so."...I hope other organizations decide to join the OCA (it will be interesting to see how the opt-in approach is received) and that any legal issues that arise get resolved quickly. Of course, as a searcher, I have to be concerned with the already large Yahoo database getting even larger without people having the skills or the time to get what they need. I think the idea of allowing people to mine OCA content and create their own database is an exciting one. It's easy to think how something like this could be of value to the college academic or even an elementary school teacher. Of course, the OCA will have to make it easy for disparate groups to create their own tools, but this will be a business opportunity for those who can help create these types of tools.

Tara Calishain on OCA

Tara Calishain, Yahoo Announces Open Content Alliance, ResearchBuzz, October 3, 2005. Excerpt:
My first question was "What about all the public domain stuff that's already out there?" The home economics archive from Cornell? The Oak Knoll Press collection? The Math Book collection? And probably lots of other ones I don't know about? [Yahoo VP] David Mandelbrot said, "Part of this announcement is to encourage other organizations to participate in the OCA's efforts. We're trying to make a large collection available. We want to a variety of content and make it available in a way that's easily spiderable by search engines." Part of that effort is going to educating archivists and other collection-keepers about digitizing and making content available online in this more open, visible way. All kinds of content? I asked. Yes, they said. Even corporate archives? Yes, though the emphasis would be on culturally-important corporate archives, commercials and so forth. Even multimedia? Absolutely. Even newspapers and periodicals? Yes. Even small and self-publishers, like those who might use the services of Yes, said Mandelbrot. "We want to work with all kinds of commercial publishers; initially we're working with O'Reilly Media. But this project is open to all kinds of commercial publishers who want to participate, both small and large."...There WILL be a separate page on Yahoo to search the subset of content generated by the OCA. It's too early to tell how much you'll be able to break it down, whether you'll be able to create sets of books, etc. It depends on how much is indexed and what kind of material is indexed. There is no goal in that regard. "We don't have a specific number goal; we're focused on quality," said Mandelbrot. "Our general goal is to make this the richest library of cultural materials available online." It seems like some people are seeing this announcement as a slap at Google and stopping there, but I think there's much more to it. While there's no denying that the Search Engine Wars are back (whoopee!) 
and Yahoo and Google are not for an instant going to forget each other's existence, and the press release does exude the faintest whiff of Eau De Meowwwww, those are all small facets. This project is breathtaking in its scope. No matter what happens with the Google Print project, the OCA is building something complementary that could be tremendously useful without, perhaps, detracting from Google Print at all.

More on the OCA

Scott Carlson and Jeffrey Young, Yahoo Works With 2 Academic Libraries and Other Archives on Project to Digitize Collections, Chronicle of Higher Education, October 3, 2005 (accessible only to subscribers). Excerpt:
Leaders of the project stressed that no books that are under copyright will be scanned unless the copyright holders give explicit permission. In that way the project hopes to avoid the controversy raised by Google's plan to scan nearly every book at the library of the University of Michigan at Ann Arbor, even works under copyright. Publishers' and authors' groups have said that Google must obtain permission before scanning copyrighted books, even if it offers only short excerpts of their content, as it plans to do. In fact, one publishing group that has been critical of Google's project, the Association of Learned and Professional Society Publishers, has endorsed the Yahoo plan. In a press release, Sally Morris, chief executive of the association, said, "We welcome the launch of the OCA because its approach respects the rights of publishers and other copyright owners." That plan means the Open Content Alliance will be limited mostly to out-of-copyright works -- and to works by publishers who are willing to experiment with giving their content away online. The project will allow generous access to the materials it holds, however, in some cases even allowing users to download the full texts of books. Neither Yahoo nor any other group involved has been given exclusive rights to the content, according to the project's leaders. In fact, the books will be made available in ways that can be searched by other search engines, David Mandelbrot, Yahoo's vice president for search content, said in an interview Friday. The project is modeled on open-source software projects, in which volunteers extend and improve free software. "Open source was a fantastic success; they figured it out," Mr. Kahle said in an interview on Sunday. He hopes the Open Content Alliance "can do the same for open content. 
We would like to see the great wealth of our libraries get made much more available, where everybody is psyched and everybody knows their place and part....This is a stab at what different organizations should do and what if any restrictions should be made on what is out there." Daniel Greenstein, executive director of the California Digital Library, a project of the University of California system, said, "The focus of this thing is really open access."...Mr. Greenstein said..."One meaningful service for a library community is to build something which enables the libraries to identify instantly what's in there and what's not in there," and then add to the collection...."One of the interests of the group is exploring ways to get people to upload materials directly to the archive," he said...."We're trying to nail bringing public access to the public domain," said Mr. Kahle. "We want people to be able to do great things with the classics of humankind."

Dental journal offers OA to back run

Anthony J. Smith, Research Through the Years — the Foundation for Dentistry Today, Journal of Dental Research, 84, 10 (2005) p. 870. Excerpt:
Increasingly, researchers are dependent on electronic searching and accessing of the scientific literature. With the immense volume of literature now available, this technology provides us with powerful tools to "interrogate" the literature....effectively. However, the recent move to online publication of journals has meant that many of the archives of different journals are not readily available in electronic format, thus constraining the opportunities for this technology to be exploited. Recognizing these constraints, the IADR set up a "Legacy" task group to drive the digitization of the entire archive of the JDR, from its inception in 1919. This task has been facilitated by a grant from the Gies Foundation and reflects the continuing influence of William J. Gies on dental research, even today. The care and diligence of the Legacy task group and staff in the IADR Publications Department have allowed us to digitize the entire JDR archive, and this is now available for access by researchers. Importantly, our concordance with the DC Principles of Open Access means that this archive is available for free and open access to all. Current content will be released from access control 12 months after publication.

Launch of the Open Content Alliance

The Open Content Alliance (OCA) is a new coalition of for-profit and non-profit organizations administered by the Internet Archive and devoted to building "a permanent archive of multilingual digitized text and multimedia content." From the web site:
Content in the OCA archive will be accessible soon through this website and through Yahoo! The OCA will encourage the greatest possible degree of access to and reuse of collections in the archive, while respecting the content owners and contributors. Contributors to the OCA must agree to the principles set forth in the Call for Participation.

From the call for participation:

Please join an Open Content Alliance (OCA) made up of cultural, technology, nonprofit, and governmental organizations from around the world, which will offer broad, public access to a rich panorama of world culture by building a permanent archive of multilingual digitized text and multimedia content. By creating a growing archive of digital materials, the OCA will serve the combined interests of its contributors and the global community of Internet users. Contributors will donate collections, services, facilities, tools and/or funding to the OCA. The contributing organizations support the following principles: [1] The OCA will encourage the greatest possible degree of access to and reuse of collections in the archive, while respecting the rights of content owners and contributors. [2] Contributors will determine the terms and conditions under which their collections are distributed and how attribution should be made. [3] The OCA need not be obligated to accept all content that is offered to it and may give preference to that which can be made widely accessible. [4] The OCA will offer collection and item-level metadata of its hosted collections in a variety of formats. [5] The OCA welcomes efforts to create and offer tools (including finding aids, catalogs, and indexes) that will enhance the usability of the materials in the archive. [6] Copies of the OCA collections will reside in multiple archives internationally to ensure their long-term preservation and accessibility to all.

From Brewster Kahle's introduction to the OCA on the Yahoo blog:

[I]t is time to have more great material available on the Internet and to be able to have it be open and free. The opportunity before all of us is living up to the dream of the Library of Alexandria and then taking it a step further-- Universal access to all knowledge. Interestingly, it is now technically doable. Then the question became-- is it in the interest of enough people and institutions to get there? Some hang-ups have been around costs, rights, and guidelines for sharing. All of these things were worked out for their domains by Internet folks and open source folks in the last few decades. But how are we going build a system that has everything available to everyone?...To kick this off, Internet Archive will host the material and sometimes helps with digitization, Yahoo will index the content and is also funding the digitization of an initial corpus of American literature collection that the University of California system is selecting, Adobe and HP are helping with the processing software, University of Toronto and O'Reilly are adding books, Prelinger Archives and the National Archives of the UK are adding movies, etc. We hope to add more institutions and fine tune the principles of working together. Initial digitized material will be available by the end of the year. So the costs are mostly being borne by the host institutions based on their own fundraising or business models. The cost of digitization is sometimes offset by a different party (in the case of American Lit-- Yahoo!). We think this can scale to millions of books movies and audio recordings....To be clear, the public domain works in the Open Content Alliance can be "borrowed" in bulk for build navigation services, do research on, and the like. Bits and pieces of the public domain collections can be re-used and re-interpreted. 
If someone wants to print and binding a book and sell it on go nuts, if they want to make it into an audio book and post it on the web-- go for it (we will even supply the hosting for this), basically let’s have a blast building on the classics of humankind.

Also see Katie Hafner's story about the OCA in today's NYTimes, In Challenge to Google, Yahoo Will Scan Books. Excerpt:

An unusual alliance of corporations, nonprofit groups and universities plans to announce today an ambitious plan to digitize hundreds of thousands of books over the next several years and put them on the Internet, with the full text accessible to anyone. The effort is being led by Yahoo, which appears to be taking direct aim at a similar project announced by its archrival, Google, whose own program to create searchable digital copies of entire collections at leading research libraries has run into a series of challenges since it was announced nine months ago. The new project, called the Open Content Alliance, has the wide-ranging goal of digitizing historical works of fiction along with specialized technical papers. In addition to Yahoo, its members include the Internet Archive, the University of California, and the University of Toronto, as well as the National Archive in England and others....[T]he potential power of the new collaboration lies in the collective ability of many institutions to compare and cross-reference materials, said Daniel Greenstein, librarian for the California Digital Library at the University of California....In a departure from Google's approach, the Open Content Alliance will also make the books accessible to any search engine, including Google's. (Under Google's program, a digitized book would show up only through a Google search.) And by focusing at first on works that are in the public domain - such as thousands of volumes of early American fiction - the group is sidestepping the tricky question of copyright violation....When it comes to copyrighted materials, the newly formed group appears to be taking a more cautious approach by seeking permission from copyright holders and by making works available through a Creative Commons license, whereby the copyright holder stipulates how a work can be used. 
"Other projects talk about snippets," said Brewster Kahle, the founder of the Internet Archive, a nonprofit organization in San Francisco that is building a vast digital library. "We don't talk about snippets. We talk about books." Dr. Greenstein said that the University of California, which plans to contribute as much as $500,000 to the project in the first year, will scan 5,000 volumes of early American fiction at the outset, with the eventual goal of scanning another 5,000 to 15,000 volumes within the next year....Yahoo did not disclose the overall budget for the project, although its own contribution has been estimated at between $300,000 and $500,000 for the first year. Hewlett-Packard and Adobe Systems are contributing equipment to the project, and the Internet Archive will do the actual digitizing and archiving of the books. The Internet Archive has set up shop at the University of Toronto and has scanned some 2,000 volumes at a cost of about 10 cents a page....The new group is calling for others to join. And Mr. Kahle of the Internet Archive said he hoped to recruit Google. "The thing I want to have happen out of all this is have Google join in," he said. "I know we're dealing with archcompetitors, but if there's room for these guys to bend, by the time my kid goes to college, we could have a library system that is just astonishing."

Also see the OCA press release and news stories in the Washington Post, Associated Press, Search Engine Watch, RedNova, and PaidContent.

An OA register for animal names

Andrew Polaszek, A universal register for animal names, Nature, September 22, 2005. (Thanks to Donat Agosti.) Excerpt:
How can we maintain and continue to benefit from our planet’s biodiversity? A first step, the effective exchange of information about biodiversity, needs an efficient and stable means of naming species. For animals, this is achieved with the Linnaean system of binominal nomenclature, introduced in 1758, and a comprehensive set of rules administered by the International Commission on Zoological Nomenclature (ICZN). Although the Linnaean system and the ICZN code have been hugely successful, they are often perceived as failing to meet the needs of today’s biologists. To meet these new demands, we propose the creation of a mandatory web register for all new animal names, and the subsequent inclusion of all existing animal names and nomenclature in a single information system....We propose a register of new zoological names — ZooBank — to be established and administered by the ICZN, and bolstered by a mandatory requirement, in the next edition of the code, for the registration of new names. The register would be web-based and open-access, and would cover all taxonomic ranks relevant to the code.

Also see the technical paper which the Nature comment above merely summarizes.

Sunday, October 02, 2005

New OA journal on digital libraries

Digital Watch is a new peer-reviewed, open-access journal on digital libraries and contemporary culture. The inaugural issue is now online. DW is published by the University of Pittsburgh School of Information Sciences and edited by graduate students in the program. (Thanks to LIS News.)

Businesses recognizing costs of IP extremism

James Kanter, The Idea Economy: Battle over right to sell knowledge, International Herald Tribune, October 2, 2005. (Thanks to Manon Ress.) Excerpt:
Ideas that are free, widely available and instantly duplicated were impossible to contemplate in the days when copyright and patent law took root, a time when the expenses needed to print, distribute and sell a book or movie were considerable. Now, the information, entertainment and technology industries say they lose billions of sales to the free exchange of ideas. Incremental advances are stalled by endless lawsuits over inventions. Drug companies are on the defensive when they refuse to share their original research....The battles pit companies against companies, creators against distributors, almost everyone against the United States — and, some say, China against the rest of the world. "This is warfare," said Jerry Klein, a Silicon Valley entrepreneur. "It's a high-stakes intellectual battle, and it's very complicated." Companies, even those the size of Intel, could one day be blocked from marketing a particular product whose design is made up of hundreds of thousands of patents just because an opportunist has claimed ownership of a single patent, said Adam Jaffe, dean of arts and sciences at Brandeis University in Massachusetts and a patent expert. Some intellectuals say that the more such rights are expanded, the less good the public reaps, a benefit that government's protection of innovation once intended. And now some companies are starting to agree, arguing that the race for rights and royalties can actually harm competition. "In certain cases," said Elsa Lion, an analyst at the London research firm Ovum, "technology companies are beginning to realize they have more to gain by releasing patents to the general public than by hoarding licensing income." By giving away some of their knowledge, companies like IBM and Nokia are not just polishing their image among the Internet generation. They are also questioning a business strategy that has become a bedrock of contemporary capitalism: Whoever has the most patents wins.

October issue of SOAN

I just mailed the October issue of the SPARC Open Access Newsletter. This issue takes a close look at the OA policy from the Wellcome Trust, which took effect yesterday, and the Authors Guild lawsuit against the Google Library project. The Top Stories section takes a brief look at the new OECD report on scientific publishing, the CODATA proposal for a Global Information Commons for Science, Stevan Harnad's calculation of the monetary cost of lost citation impact, the new AZoM OA journal with a patented business model, the two Salvador declarations, and the continuing news and comment on the draft OA policy from RCUK.

More on authors v. Google

Mike Langberg, Google's libraries project facing writers' block, Mercury News, October 2, 2005. Excerpt:
The benefit to society in having a free, fully searchable collection of millions of books and journals is huge -- so huge that we should all root for Google to prevail. But the Mountain View search giant also needs to work harder at finding common ground with authors and book publishers who aren't happy about being Googled. It's the right thing to do. What's more, it might not be possible to complete the program with authors and publishers standing in the way....Google is getting value from the Library Project because the search pages would ultimately display ads along the side. Authors and publishers therefore deserve compensation, [Paul] Aiken [Executive Director of the Authors Guild] argues. I disagree, as do many legal experts. Copyright law supports so-called "transformative" efforts, where someone seeks to profit from fair use. A book review in this newspaper, after all, contributes to our goal of attracting readers who also look at our advertising.