Open Access News

News from the open access movement


Thursday, June 11, 2009

New tool for finding full-text papers

Kevin Davies, Got PubMed? Pubget Searches and Delivers Scientific PDFs, Bio-IT World, June 10, 2009.

Imagine a search tool for the life sciences literature that could, with one click, pull up a full-text PDF of any paper. That in essence is the attraction of Pubget, the first product of a small Cambridge, Mass. start-up. ...

The original Pubget product was developed by one of the three co-founders, a clinical pathologist at Beth Israel Hospital (Harvard Medical School) named Ramy Arnaout. He got his PhD in mathematical biology from Oxford, but was frustrated by the challenge of getting full-text PDF access to science journal articles -- even while working inside well-endowed institutions like Harvard and Oxford. ...

[Pubget president Ryan] Jones, who was previously with a start-up acquired by Microsoft enterprise search, says Pubget is built on three key components. “One is a search engine that has all the content that Medline or the NIH’s PubMed has in it – 20 million research documents. ... We took an initial data dump from PubMed, and now we’ve based direct connections to the publishers themselves, so as soon as research is available, we get that feed from the publisher.”

Second, Pubget built a ‘pathing engine’ that understands the location of the full-text PDFs across all 20,000 journal titles. “It knows exactly where on the web that full-text document lives,” says Jones. “We have crawlers that go out and understand at Nature or Cell or Science where those full-text documents live. ...”

The third component is what Jones calls “a credentials engine, which understands the credentials of the subscriptions you have based on where you are ...”
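As a rough illustration of how the three components Jones describes might fit together, here is a minimal Python sketch. It is not Pubget's implementation: the data, the function names, the example.org URLs, and the idea of keying subscriptions by institution name are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    pmid: str
    title: str
    authors: tuple
    journal: str
    open_access: bool = False


# Hypothetical sample data standing in for the search index.
ARTICLES = [
    Article("1001", "Viral entry mechanisms", ("Lee A",), "Journal A"),
    Article("1002", "Open methods in microscopy", ("Lee A", "Cho B"),
            "Journal B", open_access=True),
]

# Hypothetical "pathing" table: where each article's PDF lives.
PDF_PATHS = {
    "1001": "https://example.org/journal-a/1001.pdf",
    "1002": "https://example.org/journal-b/1002.pdf",
}

# Hypothetical "credentials" table: journals each institution subscribes to.
SUBSCRIPTIONS = {"Example University": {"Journal A"}}


def search_by_author(author):
    """Component 1 (search): find indexed articles by author name."""
    return [a for a in ARTICLES if author in a.authors]


def pdf_location(article):
    """Component 2 (pathing): look up the known PDF URL, if any."""
    return PDF_PATHS.get(article.pmid)


def can_access(article, institution):
    """Component 3 (credentials): free articles are open to everyone;
    others require a subscription held by the user's institution."""
    if article.open_access:
        return True
    return article.journal in SUBSCRIPTIONS.get(institution, set())


def full_text_results(author, institution=None):
    """Return PDF URLs the user can open directly, skipping abstracts."""
    urls = []
    for article in search_by_author(author):
        url = pdf_location(article)
        if url is not None and can_access(article, institution):
            urls.append(url)
    return urls
```

In this sketch, a user with no institution gets only the freely available PDF for an author search, while a user at the hypothetical "Example University" also gets the subscribed one.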

What this means is that when scientists use Pubget to search by author, for example, the results are delivered in the form of the full-text PDF, without having to navigate through abstracts or publishers’ electronic portals. “The end user sees us in two ways,” says Jones. “If they are not associated with a larger institution, we are the most thorough resource for free full-text documents. We not only have everything that’s in PubMed Central and the other free resources, but we spider the web for other full-text documents that happen to be out there. If you’re at an institution, we’re the fastest way to take advantage of the subscriptions your institution has provided for you.” ...

Pubget will in time make money in two ways. One will be the provision of premium services. The other will be by aggregating analytics about current life science search topics. “We can help vendors like Agilent or Bio-Rad understand what the community is searching on,” says Jones. “If you do a search on swine flu, and someone did a virus study and in the methods of that study cited a specific type of microscopy, we can present ads relevant to that.” Host institutions can decide if they want those ads presented or, for a fee, they can opt for “a closed, white label site.” ...