You may have set up your repository and filled it with interesting papers, but it is still possible to screw things up technically so that search engines and harvesters cannot index your material. Here are some common gotchas:
Require all visitors to have a username and password
Harvesters and crawlers will be locked out, and a lot of end users will give up and go away. It is reasonable to require a username and password for depositing items, but not for just searching and reading.
Have no 'Browse' interface with hyperlinks between pages
Search engine crawlers will never index past your first page. Button-style controls cannot normally be followed.
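To make the point about button-style controls concrete, here is a minimal sketch of how a simple crawler discovers pages: it collects the `href` targets of anchor tags and ignores everything else. The `LinkCollector` class, the `followable_links` helper, and the two sample HTML fragments are illustrative assumptions, not part of any particular crawler.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, roughly as a basic crawler would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only plain hyperlinks are recorded; forms and buttons are skipped.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def followable_links(html):
    """Return the list of link targets found in an HTML fragment."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical 'next page' controls on a browse page.
BROWSE_WITH_LINK = '<a href="/browse?page=2">Next</a>'
BROWSE_WITH_BUTTON = (
    '<form method="post" action="/browse">'
    '<button name="page" value="2">Next</button></form>'
)

print(followable_links(BROWSE_WITH_LINK))
print(followable_links(BROWSE_WITH_BUTTON))
```

The link version yields a second page to visit; the button version yields nothing, so a crawler of this kind stops at the first page.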
Set a 'robots.txt' file and/or use 'robots' meta tags in HTML headers that prevent search engine crawling
Google, Yahoo!, etc., may find your pages, but if you tell them not to index the pages or not to follow the links, they won't.
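One way to check what a 'robots.txt' file tells crawlers is Python's standard `urllib.robotparser` module. The sketch below parses the rules from a string so that it runs without network access; the `crawler_allowed` helper, the two sample rule sets, and the `repository.example.org` URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Blocks every crawler from the whole site: the gotcha described above.
BLOCK_EVERYTHING = """User-agent: *
Disallow: /
"""

# Blocks only a hypothetical admin area and leaves item pages crawlable.
BLOCK_ADMIN_ONLY = """User-agent: *
Disallow: /admin/
"""

item_url = "http://repository.example.org/handle/123"
print(crawler_allowed(BLOCK_EVERYTHING, "Googlebot", item_url))
print(crawler_allowed(BLOCK_ADMIN_ONLY, "Googlebot", item_url))
```

Running the same check against your repository's real 'robots.txt' content, with the URLs of a few item pages, is a quick way to confirm that they are not excluded by accident.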
Restrict access to embargoed and/or other (selected) full texts
Search engines and harvesters may index the metadata pages, but not the full texts of the relevant items.
Accept poor quality or restrictive PDF files
Some PDF-making software packages (usually free, cheap, or esoteric) generate poor quality PDF files that sometimes cannot be read properly by harvesting and indexing programs. However, you can still cause problems even with high-end software if you use it to restrict the functionality of the PDF file - e.g. preventing copy-and-paste. It may not be possible to index such files.
Hide your OAI Base URL
If harvesters cannot find your OAI Base URL, they cannot harvest your data. Good places to give the OAI Base URL are on your repository's 'About' page or home page. Also, register it with OpenDOAR and ROAR.
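For readers unfamiliar with how a harvester uses the Base URL: every OAI-PMH request is the Base URL plus a `verb` parameter and any further arguments. The sketch below only builds the request URLs; the `build_oai_request` helper and the `http://repository.example.org/oai` Base URL are hypothetical, while `Identify`, `ListRecords`, and the `oai_dc` metadata prefix come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def build_oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a Base URL, a verb, and arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


# Hypothetical Base URL, as it might be advertised on an 'About' page.
BASE_URL = "http://repository.example.org/oai"

# 'Identify' is the usual first request a harvester sends.
print(build_oai_request(BASE_URL, "Identify"))

# Harvesting Dublin Core records then uses 'ListRecords'.
print(build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

Opening the `Identify` URL for your own repository in a browser should return a short XML description of it; if it does not, the advertised Base URL is probably wrong.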
Have awkward URLs
Many harvesters and firewalls will reject or block:
Numeric URLs - e.g. http://126.96.36.199/
URLs that use 'https:' instead of 'http:'
URLs that include unusual port numbers e.g. :47231
Stick to 'http:' and alphabetical URLs. It should be possible to avoid using port numbers in URLs.
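The three URL patterns listed above can be checked mechanically. This is a rough sketch using only the standard library; the `awkward_url_reasons` helper, its reason strings, and the `repository.example.org` host are assumptions for illustration, and the checks simply mirror the list above.

```python
import ipaddress
from urllib.parse import urlsplit

# Ports that are implied by the scheme and need not appear in the URL.
DEFAULT_PORTS = {"http": 80, "https": 443}


def awkward_url_reasons(url):
    """Return a list of reasons why a URL matches the patterns listed above."""
    parts = urlsplit(url)
    reasons = []

    # Numeric URLs: the host is an IP address, not a name.
    try:
        ipaddress.ip_address(parts.hostname or "")
        reasons.append("numeric host")
    except ValueError:
        pass

    # URLs that use 'https:' instead of 'http:'.
    if parts.scheme == "https":
        reasons.append("https scheme")

    # URLs with an explicit port other than the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        reasons.append("unusual port")

    return reasons


print(awkward_url_reasons("http://126.96.36.199/"))
print(awkward_url_reasons("https://repository.example.org:47231/"))
print(awkward_url_reasons("http://repository.example.org/"))
```

An empty list means the URL avoids all three patterns.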
Peter Suber at 7/31/2008 12:05:00 PM.