Open Access News

News from the open access movement


Thursday, January 05, 2006

What OA to data will make possible

Declan Butler, Mashups mix data into global service, Nature, January 5, 2006 (accessible only to subscribers). Excerpt:
Will 2006 be the year of the mashup? Originally used to describe the mixing together of musical tracks, the term now refers to websites that weave data from different sources into a new service. They are becoming increasingly popular, especially for plotting data on maps, covering anything from cafés offering wireless Internet access to traffic conditions. And advocates say they could fundamentally change many areas of science — if researchers can be persuaded to share their data. Some disciplines already have software that allows data from different sources to be combined seamlessly. For example, a bioinformatician can get a gene sequence from the GenBank database, its homologues using the BLAST alignment service, and the resulting protein structures from the Swiss-Model site in one step. And an astronomer can automatically collate all available data for an object, taken by different telescopes at various wavelengths, into one place, rather than having to check each source individually. So far, only researchers with advanced programming skills, working in fields organized enough to have data online and tagged appropriately, have been able to do this. But simpler computer languages and tools are helping....The biodiversity community is one group working to develop such services. To demonstrate the principle, Roderic Page of the University of Glasgow, UK, built what he describes as a “toy” — a mashup called Ispecies.org. If you type in a species name it builds a web page for it showing sequence data from GenBank, literature from Google Scholar and photos from a Yahoo image search. If you could pool data from every museum or lab in the world, “you could do amazing things”, says Page. Donat Agosti of the Natural History Museum in Bern, Switzerland, is working on this. He is one of the driving forces behind AntBase and AntWeb, which bring together data on some 12,000 ant species. 
He hopes that, as well as searching, people will reuse the data to create phylogenetic trees or models of geographic distribution. This would provide the means for a realtime, worldwide collaboration of systematicists, says Norman Johnson, an entomologist at Ohio State University in Columbus. “It has the potential to fundamentally change and improve the way that basic systematic research is conducted.” A major limiting factor is the availability of data in formats that computers can manipulate. To develop AntWeb further, Agosti aims to convert 4,000 papers into machine-readable online descriptions. Another problem is the reluctance of many labs and agencies to share data. But this is changing. A spokesman for the Global Health Atlas from the World Health Organization (WHO), for example, a huge infectious-disease database, says there are plans to make access easier. The Global Biodiversity Information Facility (GBIF) has linked up more than 80 million records in nearly 600 databases in 31 countries. And last month saw the launch of the International Neuroinformatics Coordinating Facility....Page and Agosti hope that researchers will soon become more enthusiastic about sharing. “Once scientists see the value of freeing-up data, mashups will explode,” says Page.
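
As a rough illustration of the pattern the excerpt describes for Ispecies.org (one species name goes in, results from several sources come back merged into one page), here is a minimal Python sketch. It is not Page's code. The three source functions are hypothetical stand-ins that return canned strings rather than calling GenBank, Google Scholar or Yahoo, and the names `build_species_page` and `Source` are invented for this example.

```python
from typing import Callable, Dict, List

# A "source" is any function that takes a species name and returns text items.
Source = Callable[[str], List[str]]


def build_species_page(name: str, sources: Dict[str, Source]) -> Dict[str, List[str]]:
    """Ask every source about one species and merge the answers into one record."""
    page: Dict[str, List[str]] = {}
    for label, fetch in sources.items():
        try:
            page[label] = fetch(name)
        except Exception:
            # One unavailable source should not break the whole page.
            page[label] = []
    return page


# Hypothetical stand-ins for real services (assumptions, no network access).
def fake_sequences(name: str) -> List[str]:
    return [f"sequence record for {name}"]


def fake_literature(name: str) -> List[str]:
    return [f"paper mentioning {name}"]


def fake_images(name: str) -> List[str]:
    raise ConnectionError("image service unavailable")


page = build_species_page(
    "Formica rufa",
    {
        "sequences": fake_sequences,
        "literature": fake_literature,
        "images": fake_images,
    },
)

for label, items in page.items():
    print(label, items)
```

The design choice worth noting is that each source sits behind the same small interface, so adding a museum or lab database means adding one function. That only works if the data are online and in a form a program can read, which is the limiting factor the excerpt points to.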