Open Access News

News from the open access movement


Sunday, February 17, 2008

Interview with Bill Hooker on OA and open science

Bora Zivkovic, Getting Publishing up to Speed: Interview with Bill Hooker, A Blog Around the Clock, February 14, 2008.
... How is a scientific paper going to look in 20 years from now?  How is that going to affect the way scientific research (and teaching) is done?

Over the next 20 years, the two most important things that will happen to the scientific paper are: universal adoption of Open Access, and the richly deserved death of the Portable Document Format.

Although it will do a number of wonderful things, Open Access won't dramatically change the way a paper looks, at least not in the next 20 years.  Both because researchers are a conservative bunch, and because the format has served well for a very long time, I would guess that papers will look something like they do now -- Intro/Methods/Results/Discussion -- for some decades yet.  The most important things that will change in a 20 year timeframe are the level of detail available with a single click, and the number of entities which can understand the paper. 

Right now, even if you can access a paper what you get is pre-digested in the form of a PDF file -- useless for anything except being read by humans (which, of course, is very useful indeed -- but nowhere near as useful as a paper could, and should, be).  If there is any supplementary data, which there usually isn't, it's another bloody PDF!  In 20 years, something like XML will provide a way to make papers a machine-readable platform for accessing data, not just a pixelated proxy for a hunk of dead tree.  Instead of photocopying that graph three times at 200% so as to be able to draw lines on it and estimate the underlying values, you'll be able to grab the raw data into your own favorite graphing application, so that you can re-work it and look at it from your own angle.  You'll be able to zoom in on that spectrum and see the fine details.  You'll be able to get an unretouched version of that photograph and do the Photoshop work yourself, so as to emphasize whatever you're interested in.  All of this will be possible, not by writing to the authors and waiting three months for an answer, but with a single click right from the paper itself. 

The other thing that this sort of markup will do is to greatly enhance the number and scope of research tasks that can be automated.  We already rely heavily on search and filtering interfaces (PubMed, Google, GenBank, and so on) to keep us afloat in a sea of information, and that situation is only going to intensify.  When machines can read papers, they will be able to do something no human can do: read every paper, and find connections among them all.  For a taste of what this might be like, check out iHOP, a text-mining navigation interface to the research literature.  Now imagine what iHOP could do if it could not just read text, but could place that text in context, and then again what it could do if it could access data as well as text.  (Note also that none of this makes sense without OA: good as it is, iHOP is currently crippled because it can only pull sentences from abstracts.  Imagine what it could do with the full text of all those papers!  To fully realize the power of machine readability requires that the entire knowledge base be Open Access.)

What that will mean for research is speed.  You can already see it happening in physics, where OA has been the de facto norm for more than a decade thanks to arXiv.  Brody et al. showed that, in the high-energy physics section, the time between deposit in arXiv and citation in another paper has been dropping steadily since the arrival of arXiv in 1991, and was cut roughly in half between 1999 and 2003.  That's the research cycle -- the uptake of published ideas in further work -- accelerating in real time.  Multiply that by the power of text- and data-mining, driven by the combination of OA and machine readability, and you get a tremendous acceleration in the rate of scientific progress. 

I'm not a teacher, so I'm hesitant to make predictions about that field -- but what is clear is that teachers and students will have much greater access to detailed information.  On that basis, I guess I'll venture one (hopeful) prediction: science teaching will focus more on primary sources, on the actual data rather than predigested information in textbooks.  Rather than trying to absorb a body of knowledge being handed down from on high, learning science will become much more like doing science, with students being asked to think, explore and experiment rather than simply memorize. ...