Abstract: The major search engines are competing to index as much of the Web as possible. Having indexed much of the surface Web, search engines are now using a variety of approaches to index the deep Web. At the same time, institutional repositories and digital libraries are adopting the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to expose their holdings, some of which are indexed by search engines and some of which are not. To determine how much of the current OAI-PMH corpus search engines index, we harvested nearly 10M records from 776 OAI-PMH repositories. From these records we extracted 3.3M unique resource identifiers and then conducted searches on samples from this collection. Of this OAI-PMH corpus, Yahoo indexed 65%, followed by Google (44%) and MSN (7%). Twenty-one percent of the resources were not indexed by any of the three search engines.
Posted by
Peter Suber at 3/08/2006 10:15:00 PM.
The open access movement:
Putting peer-reviewed scientific and scholarly literature
on the internet. Making it available free of charge and
free of most copyright and licensing restrictions.
Removing the barriers to serious research.
I recommend the OA tracking project (OATP) as the best way to stay on top of new OA developments. You can read the OATP feed on a blog-like web page or subscribe to it by RSS, email, or Twitter. You can also help build the feed by tagging new developments you encounter.