Open Access News

News from the open access movement


Friday, May 09, 2008

OA meta-database

Infochimps.org is a database of databases. The site launched on March 5. From the site's description:

Exploring rich data is fun, but finding it, formatting it, and tagging it with metadata is drudge work barely fit for a trained chimp. And if you want to share a large dataset online, you face two troubling prospects: either a) that no one will find it, or b) that everyone will find it.

A central, community-driven repository solves these problems and presents amazing possibilities. Once we interconnect the datasets along concepts they share, instead of 100,000 datasets, there's just one. Study the physics of baseball by comparing the hourly weather during every single baseball game to game outcomes. Uncover political campaign irregularities by comparing neighborhood per-capita income, historical voter trends, and public campaign finance records. Plan real-estate decisions based on what news-and-other-media keywords rank highly in each area. Let's start building tools that make this way of thinking available to everychimp.

  • Open: No redistribution bureaucracy, no larcenous prices for government-generated data, no teaspoon-sized servings from a sumptuous buffet. Apart from prior restrictions attached by the data provider, everything produced by the infochimps.org project is and will remain open.
  • Descriptive: Not just numbers — fields describe the real-world concepts they embody. Why should ints and strings get all the glory? Your data should arrive knowing that it describes a 'location', or a 'money value', or a 'book', and that it has taken the form of a 'latitude-longitude pair', or '2004 US Dollars' or 'ISBN Number'.
  • Free: Free to download, free to share, free to use, free to redistribute. Just share, give credit where credit is due, and respect existing restrictions.
  • Universal: Stop parsing flat files. As infochimps like you help to convert and import well-structured files, we can bundle and serve them in universal formats like XML, YAML, JSON, and Excel- or SQL-ready CSV.
  • Verifiable: We list all contributors and sources. If you need to reach the origins - to verify information or to shower them with thanks - it's all right there.
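
To make the "Descriptive" and "Universal" points concrete, here is a minimal sketch of what a self-describing dataset could look like in Python. It is not Infochimps' actual schema or API: the Field class, the column names, and the sample row are all hypothetical, made up for illustration. The idea is only that each column carries both a real-world concept and the form it takes, and that the same records can be served as JSON or CSV.

```python
import csv
import io
import json
from dataclasses import dataclass, asdict


@dataclass
class Field:
    """Hypothetical field description; not an Infochimps structure."""
    name: str     # column name in the dataset
    concept: str  # real-world concept, e.g. 'location'
    form: str     # how the concept is represented, e.g. 'latitude-longitude pair'


# Made-up schema and data, for illustration only.
FIELDS = [
    Field("stadium", "location", "latitude-longitude pair"),
    Field("ticket_price", "money value", "2004 US Dollars"),
]
ROWS = [
    {"stadium": "30.2672,-97.7431", "ticket_price": "12.50"},
]


def to_json(fields, rows):
    """Serialize the field descriptions together with the rows."""
    payload = {"fields": [asdict(f) for f in fields], "rows": rows}
    return json.dumps(payload, indent=2)


def to_csv(fields, rows):
    """Serialize only the rows; the header line comes from the field names."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=[f.name for f in fields], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_json(FIELDS, ROWS))
    print(to_csv(FIELDS, ROWS))
```

Note that in this sketch the CSV output keeps the values but drops the concept and form metadata, while the JSON output carries both. A real service would have to decide how to ship the descriptions alongside formats that have no place for them.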