The Entrez Programming Utilities (EUtils) are REST-based web services provided by the NCBI. They provide programmatic access to the NCBI databases.
The "Practical Guide" on the NCBI Bookshelf includes a chapter on EUtils.
The NCBI Power Scripting class is excellent: well taught, with lots of time for hands-on examples. The classes are also free. If your organization will spring for plane fare and accommodations, you can simply register and come for the course; email the instructors for details.
PDFs of the slides for the Power Scripting class are available at: http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/schedule.html (follow the links for each section). The content is oddly organized, but easy enough to get to. Solutions to the exercises are linked from “Scripts” in the blue bar on the left side of the page. See Class handouts at ftp://ftp.ncbi.nih.gov/pub/PowerTools/eutils/Apr.2005/docs/
The following series of hacks has been contributed by nodalpoint readers. Enjoy.
In response to a post on nodalpoint about filtering PubMed results, Mark Johnson, a contractor with the NCBI, provided the following hacks via email. The content here is an edited version.
The original problem:
"I would like to limit searches to pubmed made through nodalpoint to journals that offer free online access. The reason for this is that an online community centered around the discussion of scientific literature without access to that literature is rather pointless."
Every article in PubMed that's available as free full text is explicitly annotated as such, using the PubMed filter “free full text”. For example, try:
The “sb” means “subset”; it's a nickname for the Entrez “Filter” field. For more on filter fields, see this post on hublog.
So the answer is simply to add “AND free full text[sb]” to any PubMed query. There are dozens of such filters that you can use to slice-n-dice output, not just in PubMed, but in all 29 Entrez databases.
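As a minimal sketch of the idea above, the following Python builds an esearch URL with the filter appended. The helper names are hypothetical, the base URL is assumed to be the HTTPS form of the host used in the examples on this page, and the original query is wrapped in parentheses (an addition here) so the filter applies to the whole query. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumption: HTTPS form of the EUtils base URL used elsewhere on this page.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def free_full_text_query(term):
    """Append the PubMed 'free full text' subset filter to a query.

    The parentheses are an addition so that the filter applies to the
    whole original query, not just its last clause.
    """
    return f"({term}) AND free full text[sb]"


def esearch_url(term, db="pubmed"):
    """Build (but do not fetch) an esearch.fcgi URL for the filtered query."""
    params = {"db": db, "term": free_full_text_query(term)}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url("cat vomit"))
```

The resulting URL can be pasted into a browser or fetched with any HTTP client to get the ID list.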
Once you have the list of IDs, you'll probably want the list of free full text URLs, right? Use “elink” with cmd=prlinks to get the XML that describes where the external resource is. Elink wants “db=pubmed”, “cmd=prlinks”, and either “id=(list of ids)” or {WebEnv, query_key}, and returns the list of “primary” links (meaning “links external to NCBI”). The XML that comes back from “cmd=prlinks” is pretty self-explanatory; see the example at the bottom.
When you do that, you'll notice that you sometimes get <Info>No primary links</Info> instead of an <ObjUrl> element. So what's up with that? I'm not certain, but it looks like documents that show up as free full text, but that have no “primary links”, are available from PubMed Central. I guess NCBI doesn't consider those to be “external” links because PMC is an NCBI resource. I have a question in about that, and I'll let you know if I'm wrong or if I find out more detail.
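The elink request described above could be assembled roughly as follows. This is only a sketch: the function name is hypothetical, the base URL is assumed to be the HTTPS form of the host used on this page, and the parameters are the ones quoted in the text (db, cmd, id, WebEnv, query_key). Note that current NCBI documentation names the source-database parameter dbfrom, so check that before relying on db.

```python
from urllib.parse import urlencode

# Assumption: HTTPS form of the EUtils base URL used elsewhere on this page.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_prlinks_url(ids=None, webenv=None, query_key=None):
    """Build (but do not fetch) an elink.fcgi URL asking for primary links.

    Pass either a list of PubMed IDs, or the WebEnv/query_key pair
    from an earlier search that stored its results on the server.
    """
    params = {"db": "pubmed", "cmd": "prlinks"}
    if ids:
        params["id"] = ",".join(str(i) for i in ids)
    elif webenv and query_key:
        params["WebEnv"] = webenv
        params["query_key"] = query_key
    else:
        raise ValueError("pass either ids or both webenv and query_key")
    # safe="," keeps the ID list readable instead of encoding commas as %2C
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params, safe=",")


if __name__ == "__main__":
    print(elink_prlinks_url(ids=[15072199, 11842593]))
```

Passing the ID list directly is simplest for short lists; the WebEnv/query_key form avoids very long URLs when a search returns many IDs.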
Tips:
Examples:
Example 1: Result id list from:
esearch.fcgi?db=pubmed&term=cat+vomit+AND+free+full+text[sb]:
15072199,11842593,10676902,10484390,10229366,9868268,9868267,9749634,9575955,5724965,5334966
Execute:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?&db=pubmed&cmd=prlinks&id=15072199...
Result:
<?xml version="1.0"?>
<!DOCTYPE eLinkResult PUBLIC "-//NLM//DTD eLinkResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eLink_020511.dtd">
<eLinkResult>
  <LinkSet>
    <DbFrom>PubMed</DbFrom>
    <IdUrlList>
      <IdUrlSet>
        <Id>15072199</Id>
        <Info>No primary links</Info>
      </IdUrlSet>
      <IdUrlSet>
        <Id>11842593</Id>
        <Info>No primary links</Info>
      </IdUrlSet>
      <IdUrlSet>
        <Id>10676902</Id>
        <ObjUrl>
          <Url>http://joi.jlc.jst.go.jp/JST.JSTAGE/jvms/62.113?from=PubMed</Url>
          <IconUrl>...egifs/http:--linkout.jstage.jst.go.jp-logo.gif</IconUrl>
          <SubjectType>publishers/providers</SubjectType>
          <Attribute>publisher of information in URL</Attribute>
          <Attribute>full-text online</Attribute>
          <Provider>
            <Name>J-STAGE, Japan Science and Technology Information Aggregator, Electronic</Name>
            <NameAbbr>JSTAGE</NameAbbr>
            <Id>3580</Id>
            <Url>http://www.jstage.jst.go.jp/</Url>
            <IconUrl>http://linkout.jstage.jst.go.jp/logo.gif</IconUrl>
          </Provider>
        </ObjUrl>
      </IdUrlSet>
      <IdUrlSet>
        <Id>10484390</Id>
        <ObjUrl>
          <Url>http://ajpgi.physiology.org/cgi/pmidlookup?view=long&pmid=10484390</Url>
          <IconUrl>...stanford.edu-icons-externalservices-pubmed-free-ajpgi-free.gif</IconUrl>
          <SubjectType>publishers/providers</SubjectType>
          <Attribute>publisher of information in URL</Attribute>
          <Attribute>full-text online</Attribute>
          <Provider>
            <Name>HighWire Press</Name>
            <NameAbbr>HighWire</NameAbbr>
            <Id>3051</Id>
            <Url>http://highwire.stanford.edu</Url>
            <IconUrl>...edu/icons/externalservices/pubmed/highwirepress.jpg</IconUrl>
          </Provider>
        </ObjUrl>
      </IdUrlSet>
      <IdUrlSet>
        <Id>10229366</Id>
        <ObjUrl>
          <Url>http://jas.fass.org/cgi/pmidlookup?view=reprint&pmid=10229366</Url>
          <IconUrl>...-icons-externalservices-pubmed-free-animalsci-free.gif</IconUrl>
          <SubjectType>publishers/providers</SubjectType>
          <Attribute>publisher of information in URL</Attribute>
          <Attribute>full-text PDF</Attribute>
          <Provider>
            <Name>HighWire Press</Name>
            <NameAbbr>HighWire</NameAbbr>
            <Id>3051</Id>
            <Url>http://highwire.stanford.edu</Url>
            <IconUrl>...edu/icons/externalservices/pubmed/highwirepress.jpg</IconUrl>
          </Provider>
        </ObjUrl>
      </IdUrlSet>
      <IdUrlSet>
        <Id>9868268</Id>
        <ObjUrl>
          <Url>http://nutrition.org/cgi/pmidlookup?view=long&pmid=9868268</Url>
          <IconUrl>...stanford.edu-icons-externalservices-pubmed-free-nutrition-free.gif</IconUrl>
          <SubjectType>publishers/providers</SubjectType>
          <Attribute>publisher of information in URL</Attribute>
          <Attribute>full-text online</Attribute>
          <Provider>
            <Name>HighWire Press</Name>
            <NameAbbr>HighWire</NameAbbr>
            <Id>3051</Id>
            <Url>http://highwire.stanford.edu</Url>
            <IconUrl>...edu/icons/externalservices/pubmed/highwirepress.jpg</IconUrl>
          </Provider>
        </ObjUrl>
      </IdUrlSet>
      <IdUrlSet>
        <Id>9868267</Id>
        <ObjUrl>
          <Url>http://nutrition.org/cgi/pmidlookup?view=long&pmid=9868267</Url>
          <IconUrl>...stanford.edu-icons-externalservices-pubmed-free-nutrition-free.gif</IconUrl>
          <SubjectType>publishers/providers</SubjectType>
          <Attribute>publisher of information in URL</Attribute>
          <Attribute>full-text online</Attribute>
          <Provider>
            <Name>HighWire Press</Name>
            <NameAbbr>HighWire</NameAbbr>
            <Id>3051</Id>
            <Url>http://highwire.stanford.edu</Url>
            <IconUrl>...edu/icons/externalservices/pubmed/highwirepress.jpg</IconUrl>
          </Provider>
        </ObjUrl>
      </IdUrlSet>
      <IdUrlSet>
        <Id>9749634</Id>
        <ObjUrl>
          <Url>http://www.ajtmh.org/cgi/pmidlookup?view=reprint&pmid=9749634</Url>
          <IconUrl>...stanford.edu-icons-externalservices-pubmed-free-tropmed-free.gif</IconUrl>
          <SubjectType>publishers/providers</SubjectType>
          <Attribute>publisher of information in URL</Attribute>
          <Attribute>full-text PDF</Attribute>
          <Provider>
            <Name>HighWire Press</Name>
            <NameAbbr>HighWire</NameAbbr>
            <Id>3051</Id>
            <Url>http://highwire.stanford.edu</Url>
            <IconUrl>...edu/icons/externalservices/pubmed/highwirepress.jpg</IconUrl>
          </Provider>
        </ObjUrl>
      </IdUrlSet>
      <IdUrlSet>
        <Id>9575955</Id>
        <ObjUrl>
          <Url>http://ajpregu.physiology.org/cgi/pmidlookup?view=long&pmid=9575955</Url>
          <IconUrl>...stanford.edu-icons-externalservices-pubmed-free-ajpregu-free.gif</IconUrl>
          <SubjectType>publishers/providers</SubjectType>
          <Attribute>publisher of information in URL</Attribute>
          <Attribute>full-text online</Attribute>
          <Provider>
            <Name>HighWire Press</Name>
            <NameAbbr>HighWire</NameAbbr>
            <Id>3051</Id>
            <Url>http://highwire.stanford.edu</Url>
            <IconUrl>...edu/icons/externalservices/pubmed/highwirepress.jpg</IconUrl>
          </Provider>
        </ObjUrl>
      </IdUrlSet>
      <IdUrlSet>
        <Id>5724965</Id>
        <Info>No primary links</Info>
      </IdUrlSet>
      <IdUrlSet>
        <Id>5334966</Id>
        <Info>No primary links</Info>
      </IdUrlSet>
    </IdUrlList>
  </LinkSet>
</eLinkResult>
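A result like the one above can be reduced to a simple ID-to-URL mapping. The sketch below uses only Python's standard library. The function name is hypothetical, and the sample string is a trimmed excerpt of the response shown above, not a complete elink reply. IDs that come back with only <Info>No primary links</Info> map to None.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the elink response shown above (two of the eleven IDs).
SAMPLE = """<eLinkResult>
  <LinkSet>
    <DbFrom>PubMed</DbFrom>
    <IdUrlList>
      <IdUrlSet>
        <Id>15072199</Id>
        <Info>No primary links</Info>
      </IdUrlSet>
      <IdUrlSet>
        <Id>10676902</Id>
        <ObjUrl>
          <Url>http://joi.jlc.jst.go.jp/JST.JSTAGE/jvms/62.113?from=PubMed</Url>
          <Provider>
            <NameAbbr>JSTAGE</NameAbbr>
            <Id>3580</Id>
          </Provider>
        </ObjUrl>
      </IdUrlSet>
    </IdUrlList>
  </LinkSet>
</eLinkResult>"""


def primary_links(xml_text):
    """Map each PubMed ID to its primary link URL, or None if it has none."""
    root = ET.fromstring(xml_text)
    links = {}
    for url_set in root.iter("IdUrlSet"):
        # "Id" matches only the direct child, not the provider's <Id>.
        pmid = url_set.findtext("Id")
        # findtext returns None when there is no <ObjUrl>/<Url> path.
        links[pmid] = url_set.findtext("ObjUrl/Url")
    return links


if __name__ == "__main__":
    for pmid, url in primary_links(SAMPLE).items():
        print(pmid, url or "no primary link")
```

Keeping the IDs with no link in the mapping, rather than dropping them, makes it easy to follow up on those separately, for instance to check whether they are in PubMed Central as suggested above.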
Need to write up some of the responses to this post: