Open Access News

News from the open access movement


Tuesday, August 21, 2007

More on Carl Malamud's campaign for OA to public domain information

Tim O’Reilly, Carl Malamud Takes on WestLaw, O’Reilly Radar, August 19, 2007.  Excerpt:

Carl Malamud has this funny idea that public domain information ought to be... well, public. He has a history of creating public access databases on the net when the provider of the data has failed to do so or has licensed its data only to a private company that provides it only for pay. His technique is to build a high-profile demonstration project with the intent of getting the actual holder of the public domain information (usually a government agency) to take over the job.

Carl's done this in the past with the SEC's Edgar database, with the Smithsonian, and with Congressional hearings. But now, he's set his eyes on the crown jewels of public data available for profit: the body of Federal case law that is the foundation of multi-billion dollar businesses such as WestLaw.

In a site that just went live tonight, Carl has begun publishing the full text of legal opinions, starting back in 1880, and outlined a process that will eventually lead to a full database of US case law. Carl writes:

1. The short-term goal is the creation of an unencumbered full-text repository of the Federal Reporter, the Federal Supplement, and the Federal Appendix.
2. The medium-term goal is the creation of an unencumbered full-text repository of all state and federal cases and codes.

This is clearly public data, but as Carl wrote in a letter to West Publishing that accompanies the first data release on his site, asking for clarification about what information West considers proprietary versus public domain....

In private email, Carl wrote:

The SEC database was fairly straightforward, taking a couple of years of hard work. But, getting patents online took 5 years of drawing lines in the sand and sending shots across the bow. Our line in the sand here is all state and federal cases and codes, and I guess our shot across the bow is publishing a 3.6 GB TIFF file and announcing our intention to systematically walk through the 5 million or so pages of federal case law.

That's a big challenge, but with computing power and storage getting ever cheaper, and with the dedication of volunteers like Carl, it does indeed seem like a possible project. (After all, when Carl pressured the SEC to put its Edgar database online in the early '90s, they said it would take years and millions of dollars. Carl did it in six weeks, and operated the database for two years before persuading the SEC to take it over.) ...

Update. Also see this comment by Susan Crawford:

Go, Carl.

Routing around traditional publishers who want to create friction (or barriers to entry) for online access to data isn't easy. This is the same extended tussle that ScienceCommons.org is engaged in. In the end, the gatekeepers should lose, particularly where the public benefits so far outweigh the private returns to the publishers. A cure for Parkinson's, made possible because scientists can easily share data across disease silos, or another royalty for Reed Elsevier? You be the judge.