Open Access News

News from the open access movement

Monday, October 27, 2003

Text mining open-access literature

Mark Uehling, "Digging Into Digital Quarries," Bio-IT World, October 10, 2003. A very good survey of the prospects for text mining. Excerpt: "As nearly magical as all current and next-generation text-mining capabilities may seem, they are being applied to only a fraction of the most tantalizing text: the abstract. The full, unabridged text of scientific articles is almost always locked away from the clutching paws of software. Generating those abstracts is, by definition, an art. That means that far more unexplained connections could emerge from text-mining the entire mountain of life science data, not just the summit. Fortunately, even that is changing, thanks to new online reservoirs of insight such as the Public Library of Science and BioMed Central, where Matthew Cockerill is doing text mining with new software built into Oracle databases. Says Cockerill: 'The full text articles are locked in prisons in publishers' Web sites. We make our whole corpus of data available. People can download it and work with it whenever they want.'" (Thanks to Jason Bobe.)