Open Access News

News from the open access movement


Thursday, March 09, 2006

Increasing the diffusion rate of scientific knowledge

Walt Warnick, Global Discovery: Increasing the Pace of Knowledge Diffusion to Increase the Pace of Science, a talk at the AAAS annual meeting, February 16–20, 2006. Warnick is the Director of the US Department of Energy's Office of Scientific and Technical Information. Excerpt:
Science is all about the flow of knowledge....According to the National Science Foundation, there are over 2.5 million research workers worldwide, with more than 1.2 million in the U.S. alone. If we look at all the articles, reports, emails and conversations that pass between them, we could count billions of knowledge transactions every year. This incredible diffusion of knowledge is the very fabric of science. Given that the diffusion of knowledge is central to science, it behooves us to see if we can accelerate it. We note that diffusion takes time. Sometimes it takes a long time. Every diffusion process has a speed. Our thesis is that speeding up diffusion will accelerate the advancement of science....Currently it is difficult for researchers, who primarily track journals within their specific discipline, to hear about discoveries made in distant scientific communities. In fact, diffusion across distant communities can take years. In contrast, within an individual scientific community, internal communication systems are normally quicker. These include journals, conferences, email groups, and other outlets that ease communication. Many communities use related methods and concepts: mathematics, instrumentation, and computer applications. Thus there is significant potential for diffusion ACROSS communities, including very distant communities. We see this as an opportunity....Diffusion to distant communities takes a long time because it often proceeds sequentially, typically spreading from the community of origin (A) to a neighbor (B), then to community (C), a neighbor of B, and so on. This happens because neighboring communities are in fairly close contact. Science will progress faster if this diffusion lag time is diminished. The concept of global discovery is to transform this sequential diffusion process into a parallel process....We are particularly interested in recent work that applies models of disease dynamics to the spread of scientific ideas.
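To make the sequential-versus-parallel contrast concrete, here is a minimal sketch that is not from Warnick's talk. It assumes five hypothetical communities, A through E, and counts how many hand-offs a discovery made in A needs before each community hears of it. In the sequential case each community talks only to its immediate neighbors. In the parallel case a shared search layer is assumed to link every community to every other. The function name `hops_to_reach` and the community labels beyond A, B and C are illustrative assumptions.

```python
from collections import deque

def hops_to_reach(neighbors, origin):
    """Breadth-first search: hand-offs needed before each community hears the news."""
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist

# Hypothetical communities; only A, B and C are named in the excerpt.
communities = ["A", "B", "C", "D", "E"]

# Sequential diffusion: each community is in contact only with its immediate neighbors.
chain = {c: [] for c in communities}
for left, right in zip(communities, communities[1:]):
    chain[left].append(right)
    chain[right].append(left)

# Parallel diffusion (assumed model): every community can reach every other directly.
parallel = {c: [o for o in communities if o != c] for c in communities}

sequential_hops = hops_to_reach(chain, "A")
parallel_hops = hops_to_reach(parallel, "A")

print("sequential:", sequential_hops)
print("parallel:  ", parallel_hops)
```

In this toy model the most distant community waits four hand-offs in the chain but only one when every community is directly reachable, which is the lag the excerpt proposes to remove.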
The spread of new ideas in science is mathematically similar to the spread of disease, even though one produces positive results, the other negative. Our goal is to foster epidemics of new knowledge....Looking at these models has led us to focus on a parameter called the contact rate. In the disease model, this is the rate at which people come into contact with a person who has the disease. Increasing the contact rate speeds up the spread of the disease....To [increase the contact rate for knowledge] we must reduce a huge gap in how the Internet works today....Analysts estimate that perhaps 99 percent of all the Web-accessible scientific documents are in deep Web databases. Because these documents are not accessible to search engines and robots, this creates a huge gap in knowledge searchability. The problem of accessing all this deep Web science mirrors the problem of diffusion across distant communities. This is because many of the deep Web databases are maintained within specific communities, including specialized journals, scientific societies, university departments, or with individual researchers. Within each community the deep Web document repositories are typically well known. But they are hard for a scientist in a distant community to find. Worse, once found, each repository must be searched sequentially, making widespread search prohibitively difficult....We have begun to close this gap and solve the sequential search problem. Conceptually the solution is simple. It is simultaneous deep Web search with integrated ranking of results. All it takes is virtual aggregation or federation of diverse deep Web databases. The federated databases are searched in parallel, not sequentially. This greatly increases the contact rate across distant communities, speeding up the diffusion of new knowledge. We call this result Global Discovery. It means making each original discovery globally available. 
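The "simultaneous deep Web search with integrated ranking of results" described above might look roughly like the following sketch. It is an illustration only, not DOE's implementation: the three in-memory databases, their contents, the function names, and the scoring rule (count of query terms found in a title) are all assumptions made for the example. Real deep Web sources would be queried over the network through their own search interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for deep Web databases: name -> list of document titles.
DATABASES = {
    "physics_reports": ["Neutron diffusion in reactors", "Plasma instabilities"],
    "epidemiology_journal": ["Diffusion of influenza in cities", "Contact rate estimation"],
    "math_society": ["Diffusion equations and random walks"],
}

def search_one(name, documents, query):
    """Search one database; score is the number of query terms found in the title."""
    terms = query.lower().split()
    hits = []
    for title in documents:
        words = title.lower().split()
        score = sum(1 for term in terms if term in words)
        if score > 0:
            hits.append((score, name, title))
    return hits

def federated_search(query, databases=DATABASES):
    """Query every database in parallel, then merge the hits into one ranked list."""
    with ThreadPoolExecutor(max_workers=len(databases)) as pool:
        futures = [
            pool.submit(search_one, name, docs, query)
            for name, docs in databases.items()
        ]
        merged = [hit for future in futures for hit in future.result()]
    # Integrated ranking: highest score first, ties broken by source and title.
    return sorted(merged, key=lambda hit: (-hit[0], hit[1], hit[2]))

results = federated_search("diffusion contact rate")
for score, source, title in results:
    print(score, source, title)
```

The searcher does not need to know which community maintains which repository. One query reaches all of them at once, and the merged list ranks hits from different sources against each other.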
Federated deep Web search transforms local discovery into global discovery. While the concept is simple, making it a reality is not. The current challenge of metasearch is that the number of databases that can be searched simultaneously is limited. That's a tough problem to solve, and one that we're working on....When trying to integrate information from diverse sources, it is important to avoid adding burdens to information owners. The history of information management has seen a number of instances where seemingly promising efforts to integrate information have been hampered because too few information owners signed on: Government Information Locator System (GILS), Open Archive Initiative (OAI), Institutional Repositories, and others. While DOE adopted the protocols advanced by these efforts, too often few other information owners did so. Our view is that these efforts stumbled because they placed demands on the information owners who did not enjoy the benefits. In contrast, we believe that those who seek to integrate information from diverse sources need to bear the burdens themselves.