Open Access News: News from the open access movement
Dirk Lewandowski and Philipp Mayr, Exploring the Academic Invisible Web, a preprint self-archived May 17, 2006.
Abstract:
Purpose: To provide a critical review of Bergman’s 2001 study on the Deep Web. In addition, we bring a new concept into the discussion, the Academic Invisible Web (AIW). We define the Academic Invisible Web as consisting of all databases and collections relevant to academia but not searchable by the general-purpose internet search engines. Indexing this part of the Invisible Web is central to scientific search engines. We provide an overview of approaches followed thus far.
Design/methodology/approach: Discussion of measures and calculations, estimation based on informetric laws. Literature review on approaches for uncovering information from the Invisible Web.
Findings: Bergman’s size estimation of the Invisible Web is highly questionable. We demonstrate some major errors in the conceptual design of the Bergman paper. A new (raw) size estimation is given.
Research limitations/implications: The precision of our estimation is limited due to small sample size and lack of reliable data.
Practical implications: We can show that no single library alone will be able to index the Academic Invisible Web. We suggest collaboration to accomplish this task.
Originality/value: Provides library managers and those interested in developing academic search engines with data on the size and attributes of the Academic Invisible Web.

From the body of the article: Library collections and databases with millions of documents remain invisible to the eyes of users of general internet search engines. Furthermore, ongoing digitization projects are contributing to the continuous growth of the Invisible Web. Extant technical standards like Z39.50 or OAI-PMH (Open Archives Initiative – Protocol for Metadata Harvesting) are often not fully utilized, and consequently, valuable openly accessible collections, especially from libraries, remain invisible....

Comment. There's a lot here for friends of OA to think about.
One lesson is that an OA article can still be invisible in the relevant sense (not indexed by all or most search engines) if it has no incoming links, if it's in a file format most search engines ignore, or if it's in a relational database for which access requires filling out an interactive form. Most OA content is visible in this sense, but not all of it is. We can do better, both by making existing OA content more visible and (of course) by making more content OA. See my tips (co-written with Google) on how to facilitate Google-crawling of OA repositories and my tips on how to make visible OA content even more visible or discoverable.
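
For readers curious what "utilizing" OAI-PMH looks like from the harvester's side, here is a minimal Python sketch of how a harvester might ask a repository for its records and read back the identifiers. It is only an illustration, not anything from the Lewandowski and Mayr preprint: the function names are made up for this example, the repository address in the usage note is a placeholder, and the sketch only builds request URLs and parses a response that has already been fetched. The `ListRecords` verb, the `oai_dc` metadata prefix, the `resumptionToken` mechanism, and the XML namespace come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper for illustration).

    When continuing a partial list, the protocol expects the resumption
    token to be the only argument besides the verb.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the end of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token
```

A harvester would call `build_list_records_url("https://repository.example.org/oai")` (a placeholder address), fetch the result, pass the XML to `parse_list_records`, and repeat with the returned token until it comes back as `None`. A repository that answers such requests can be indexed from its metadata without a crawler ever having to fill out a search form.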