Open Access News

News from the open access movement


Sunday, September 07, 2008

Nature on Big Data

The September 3 issue of Nature is a special issue on Big Data.

Community cleverness required, editorial. Excerpt:
... Researchers need to be obliged to document and manage their data with as much professionalism as they devote to their experiments. And they should receive greater support in this endeavour than they are afforded at present. Those publicly funded databases that have taken on preservation responsibilities, such as GenBank and UniProt, are only a small part of the data landscape. Universities and funding agencies need to provide and support curation facilities, tools and training.

As is amply highlighted in this issue, all of these worthy aims require incentives. These include pressure from, and recognition through, journals. ...

The next Google. Excerpt:
Esther Dyson: I'm on the board of 23andMe of Mountain View, California, which makes genetic information accessible to its owners — and lets them share it for research if they want to. ...

As hundreds of thousands, and eventually millions, of people take part, the genetic information collected will enable us to know so much more through data mining, combined with analysis of the interactions of genes and other factors. We'll be able to pre-empt many diseases and treat others better. In addition, I hope this technology will change people's behaviour and encourage them to eat better and exercise more, because they'll have a better understanding of the impact of their behaviour on their health. ...

Joi Ito: The next big thing will come from connecting people and ideas together with a Google-like simplicity — making Wikipedia, Facebook and all sorts of other things completely seamless. ...

I think that a key part to it will be software that automatically gives attribution for the various parts of content we access and share. People want to share content with each other, but the infrastructure and legal framework makes it more difficult than it should be. Legal friction is holding back a lot of creativity. If you have software that works out who owns what for you and gives credit where it is due, and if it can support all different kinds of content, then you start to have a network that enables a great deal more creativity. ...

David Goldston, Data wrangling. Excerpt:
... Even without a [Bureau of Environmental Statistics], the US government releases a lot of environmental data. Much of this is information to determine compliance with regulations, but increasingly just making data available is seen as a way to encourage companies to clean up their operations. The model for such efforts is the Toxic Release Inventory (TRI), established by Congress in 1987, which requires companies to publicly report their annual emissions of certain chemicals. The TRI has resulted in substantial cutbacks in emissions as companies try to 'green' their reputations. ...

Data sharing by individual, non-governmental scientists has increasingly become a topic for public debate. Charging that a scientist has been unwilling to share data is a good way for politicians to raise suspicions about someone's work, especially when the work itself is too technical to be easily evaluated by laymen. But different fields have different mores about data sharing, and the issue is not clear-cut. ...

Mitch Waldrop, Wikiomics. Description:
Pioneering biologists are trying to use wiki-type web pages to manage and interpret data, reports Mitch Waldrop. But will the wider research community go along with the experiment?

Clifford Lynch, How do your data grow? Excerpt:
... Because digital data are so easily shared and replicated and so recombinable, they present tremendous reuse opportunities, accelerating investigations already under way and taking advantage of past investments in science. ...

Funders now rightly view data as assets that they are underwriting and so seek the greatest pay-off for their investments. They demand that researchers and host institutions document and implement data-management and data-sharing plans that address the full life cycle of data — including what happens after a grant finishes. Host universities thus find themselves with legal and ethical obligations to provide a legacy of faculty data. Publishers must also identify the most effective ways to connect publications with data and preserve the scientific record. ...