Monday, March 24, 2008

Challenges and rewards of data sharing

Heather Piwowar, Eating my own dogfood, Research Remix, March 21, 2008.

... Sharing data is indeed hard. Specifically:

  • time consuming
  • decision-intensive (where to put it? what to share? what format to share it in?)
  • scary (what if someone finds a mistake?)
  • embarrassing (the data isn't nearly as X as I wish I had the time to make it)

I also get to experience some of the first-hand benefits:

  • it forces additional organization
  • it helps me find my own data again later, from any computer!
  • it makes me feel proud to have made my science transparent (albeit after the fact, rather than as open notebook science)

I'm a firm believer in continual improvement. That means that I've shared my data now, in the best way that I have time for, rather than waiting until I can share it the way that I'd ideally like to. There are lots of things I'd like to improve:

  • Put it somewhere central and permanent (not clear where, for the esoteric dataset types that I have, but there are some neat possibilities)
  • Put it in a semantic format (!!!)
  • Document it better
  • Tag it so people can find it
  • ...

I'll keep exploring and implementing these things as I get a chance.

If you want to put your data up but have hesitations about it, I say do it to the best of your ability right now given your current constraints. It isn't perfect? I know, but perfect is the enemy of good enough. ...

See also this poster, Prevalence and Patterns of Microarray Data Sharing (Pacific Symposium on Biocomputing, Kohala, Hawaii, January 4-8, 2008), posted online March 20.