PubMed: What is it? What's its future?
Lois Ann Colaianni
[SLIDE 1] PubMed: What is it? What's its future?
[SLIDE 2] There has been a great deal of excitement since the announcement on June 26, 1997, by the Vice President of the United States that the National Library of Medicine (NLM) would make Web-based searching of MEDLINE free. Usage has increased tenfold since that announcement. In January, over 37,000 users a day accessed PubMed. Thirty-five percent of the users are from outside the U.S.
The Library has two Web search interfaces to MEDLINE: Internet Grateful Med and PubMed. Today I will talk about PubMed: what it is and how it developed, its features, and its future.
First, what is PubMed, and how did it develop?
[SLIDE 3] In 1986 NLM completed a long-range plan for its programs and services, drawing on five panels of outside experts. One panel addressed the importance of obtaining factual information by accessing databases. The panel believed that in the future, instead of obtaining a citation or reference to an article, book, or report containing the information, users would access the information directly in factual or textual databases, in practice linked databases, in patient records, and so on. The emerging field of molecular biology was cited as an area where direct access to DNA and messenger RNA sequence data, restriction maps, chromosome libraries, protein sequences, and similar data would make a significant contribution.

The National Center for Biotechnology Information (NCBI) was established at NLM in 1988 as a direct outgrowth of this planning process. NCBI staff took a subset of MEDLINE in order to link citations to articles describing a DNA sequence to the actual sequence in GenBank. They mounted this subset as a relational database, linked to the DNA sequences in GenBank, with direct TCP/IP access over the Internet through Entrez, the client/server retrieval system they designed. It was fast, easy to use, and well received by scientists throughout the world, who searched the NCBI version of MEDLINE directly. Over time the NCBI subset of MEDLINE grew; the search interface became more sophisticated, adding a feature that identifies related references or sequences; and links from the subset expanded from GenBank to other electronic resources.
On a separate track, as part of its system reinvention activities, NLM has been looking for a retrieval system to replace its aging ELHILL software and to let the MEDLARS databases run on smaller but powerful computers. The reinvention team looked at Entrez and felt that it could be the foundation for the Library's new retrieval system, since the NCBI system had many of the hardware and software features NLM desired. NLM is now working to adapt this software for a wider range of users than the community of research scientists.
Geneticists need rapid access to sequence data. Users complained that citations appeared in MEDLINE several months after the articles were published, and asked whether the process could be speeded up so they could search more current citations and their links to the gene sequences. Several publishers were willing to help make this subset more current by sending citations and abstracts for their journals directly to NCBI, to be included in PubMed before the printed journal issues were distributed to subscribers. As a result, the MEDLINE subset on the NCBI system became more current than the version searchable using ELHILL. These publisher-supplied citations and abstracts, however, did not have MeSH indexing terms until the printed issues had been received and the articles indexed by NLM staff, International MEDLARS Centers, and contractors. This was the beginning of PREMEDLINE. PubMed now includes all of MEDLINE and PREMEDLINE. [SLIDE 4]
So what is PubMed? PubMed is a World Wide Web retrieval system developed by the National Center for Biotechnology Information at the National Library of Medicine. PubMed uses the Entrez software to search MEDLINE and to link to other Web sites. At present PubMed consists of MEDLINE and links to the full text of articles at publishers' Web sites, DNA and protein sequences, genome and chromosome mapping data, and 3-D protein structures.
PubMed is the National Library of Medicine's new retrieval system. The name also denotes a search interface and the database being searched. The Library may offer several search interfaces, including Internet Grateful Med, but all of them will access the PubMed system rather than ELHILL on a mainframe computer. PubMed is easy for novice users. It is being augmented with additional search features for the sophisticated command language searcher. And it is free.
Second, PubMed's features:
A new version of PubMed was released to the public on January 26, and I will use it as the basis for my description of its features.
This is the top part of the PubMed search screen that a user sees. [SLIDE 5] This is the screen for the basic user who is not an expert in MeSH or in the structure of MEDLINE.
I'd like to show you a search run in this basic mode and also one run in the advanced search mode.
The following is a real search received at a medical library in the U.S. The user wanted articles published in the last five years on using the tilt-table test to diagnose syncope (fainting).
In ELHILL, one would formulate the five-year search as follows: [SLIDE 6]
Search MEDLINE: *Syncope/diagnosis and Tilt-Table Test
Search MED93: *Syncope/diagnosis and Posture and 94 (yr)
Tilt-Table Test became a MeSH term in 1995; before then, one must search on POSTURE. To cover five years, the search must be run against both MEDLINE (1995-98) and MED93 (1993-94). The formulation retrieves 33 citations in MEDLINE and 7 citations in MED93.
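The bookkeeping a searcher does here (which file covers which years, and which indexing term applies before and after 1995) can be sketched in a few lines. This is a minimal illustration, assuming only the file boundaries and the 1995 MeSH date given above; the names `FILE_YEARS` and `plan_search` are hypothetical, and the code merely assembles the search strings shown on the slide without connecting to ELHILL.

```python
# Years each ELHILL file covered, as described in the talk: MEDLINE 1995-98, MED93 1993-94.
FILE_YEARS = {"MEDLINE": range(1995, 1999), "MED93": range(1993, 1995)}
# Tilt-Table Test entered MeSH in 1995; earlier citations must be found via Posture.
MESH_INTRODUCED = 1995


def plan_search(start_year, end_year):
    """Return (file, search string) pairs covering start_year..end_year (hypothetical helper)."""
    plan = []
    for file_name, covered in FILE_YEARS.items():
        years = [y for y in covered if start_year <= y <= end_year]
        if not years:
            continue
        # Each file here lies entirely on one side of 1995, so one term per file suffices.
        term = "Tilt-Table Test" if min(years) >= MESH_INTRODUCED else "Posture"
        base = f"*Syncope/diagnosis and {term}"
        if len(years) == len(covered):
            # The whole file is wanted, so no year qualifier is needed.
            plan.append((file_name, base))
        else:
            # Only part of the file is wanted: add the "YY (yr)" qualifier form shown on the slide.
            plan.extend((file_name, f"{base} and {y % 100:02d} (yr)") for y in years)
    return plan


for file_name, query in plan_search(1994, 1998):
    print(f"Search {file_name}: {query}")
```

For the five-year window 1994-98 this reproduces the two formulations above: the whole MEDLINE file with Tilt-Table Test, and MED93 restricted to 1994 with Posture.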
[SLIDE 7] Let's run this search using the simple search screen of PubMed. Novice users can type in one or two terms describing the topic on which they are seeking information. They can review the retrieved citations and view the abstracts to help identify relevant ones. If the user wants more citations, they can click on "See Related Articles". Using a relevancy ranking algorithm based on the frequency and location of words in the title, the abstract, and the assigned indexing terms, the system looks for additional articles that match the important characteristics of the one the user judged relevant. In fact, a user who already has a relevant citation can enter it and ask for related articles without entering any search terms.
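The talk describes the related-articles ranking only as using the frequency and location of words in the title, abstract, and indexing terms. The sketch below illustrates that idea with a plain weighted word-vector and cosine similarity; it is not NLM's actual algorithm. The field weights (`FIELD_WEIGHTS`), the record layout, and the function names are all hypothetical, and there is no stop-word removal or stemming.

```python
import math
import re
from collections import Counter

# Hypothetical weights: where a word occurs ("location") scales how much it counts.
FIELD_WEIGHTS = {"title": 3.0, "mesh": 2.0, "abstract": 1.0}


def term_vector(record):
    """Weighted word counts for one citation: frequency times field weight."""
    vec = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        for word in re.findall(r"[a-z0-9]+", record.get(field, "").lower()):
            vec[word] += weight
    return vec


def cosine(a, b):
    """Cosine similarity between two weighted word vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def related_articles(seed, candidates, top_n=5):
    """Rank candidates by similarity to a citation the user judged relevant."""
    seed_vec = term_vector(seed)
    scored = [
        (cosine(seed_vec, term_vector(c)), c["pmid"])
        for c in candidates
        if c["pmid"] != seed["pmid"]  # do not return the seed citation itself
    ]
    scored.sort(reverse=True)
    return [pmid for score, pmid in scored[:top_n] if score > 0]
```

Because the score uses title and abstract words as well as indexing terms, a candidate can rank highly even if it was indexed before a specific MeSH term existed, which is the benefit the talk points out for this feature.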
On this screen the user types tilt table diagnosis syncope, selects the 5-year date limit, and clicks the Search button.
Tilt table diagnosis syncope
[SLIDE 8] PubMed found 291 citations. The user can see how PubMed carried out the search by clicking on Details. [SLIDE 9]
Returning to the retrieval screen, you can see the first two citations retrieved. [SLIDE 10] Each citation indicates when no abstract is available. By clicking the box next to a citation, you can choose to see the abstract or order the article.
There are two other things to note on this slide. The first is the PMID and UI numbers at the end of each citation. Most of you are familiar with the MEDLINE unique identifier (UI). The PubMed identifier (PMID) is the number assigned when the citation is loaded into PubMed, often before the citation is indexed. NCBI staff use the PMID in a program for publishers that links each reference at the end of an article on a publisher's Web site to the corresponding abstract in MEDLINE. Because publishers must create these links before the MEDLINE UI is assigned, the links are maintained using the PMID.
The second item is [See Related Articles]. As I mentioned, the user can click on [See Related Articles], and the system will retrieve relevant citations from MEDLINE based on the words in the title, abstract, and indexing. The next slide shows the first few citations retrieved by clicking [See Related Articles]. [SLIDE 11]
I urge you to use this feature. Because the relevancy ranking considers words in the title and abstract as well as indexer-assigned terms, it can partly compensate for the specificity of indexing and for indexer inconsistency, and it can retrieve citations indexed before a MeSH term was introduced.
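Why the PMID, not the UI, has to carry the publisher links follows from the order in which the two numbers are assigned. The sketch below models that lifecycle under stated assumptions: the `Citation` class, the `index` and `link_reference` functions, the counter standing in for PMID assignment, and the sample UI value are all hypothetical. A record gets its PMID when loaded, stays without MeSH terms or a UI (the PREMEDLINE stage) until indexed, and keeps the same PMID afterward.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

# Hypothetical counter standing in for PMID assignment at load time.
_next_pmid = count(1)


@dataclass
class Citation:
    title: str
    pmid: int = field(default_factory=lambda: next(_next_pmid))  # assigned when loaded into PubMed
    ui: Optional[int] = None  # MEDLINE UI, assigned only after indexing
    mesh: List[str] = field(default_factory=list)  # MeSH terms, added by indexers

    @property
    def status(self) -> str:
        # Publisher-supplied records awaiting indexing correspond to PREMEDLINE.
        return "MEDLINE" if self.ui is not None else "PREMEDLINE"


def index(citation: Citation, ui: int, mesh_terms: List[str]) -> None:
    """Indexing adds MeSH terms and the UI; the PMID never changes."""
    citation.ui = ui
    citation.mesh = list(mesh_terms)


def link_reference(citation: Citation) -> int:
    """A publisher's reference link keys on the PMID, which exists before indexing."""
    return citation.pmid
```

A link created while the record is still in PREMEDLINE keeps pointing at the same citation after indexing, whereas a link keyed on the UI could not have been created at all at that stage.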
Note also the opportunity to order the documents for any of the citations. I will talk about who may use this feature later.
Also note the ability to select articles and then display the linked data. Links [SLIDE 12] go to a number of electronic resources.
I hope that you will explore these features of PubMed. Now I'd like to turn to the advanced search mode. There isn't time to cover advanced searching or command language searching adequately, but let me show you some of its features. [SLIDE 13] In advanced searching you can select which field of the MEDLINE record to search. [SLIDE 14] In this case, MeSH term as major topic is selected.

You can look up MeSH terms using the MeSH browser. For example, if you don't know the MeSH term for fainting, [SLIDE 15] you can enter fainting, and the system suggests the MeSH term syncope. [SLIDE 16] The bottom of the screen tells you that syncope is found in more than one MeSH tree. The MeSH trees [SLIDE 17] are displayed so that you can select the term in one or both trees, together with the terms under it. You can then either view a more detailed MeSH display or add the term to your query. [SLIDE 18] The detailed MeSH display shows the allowable subheadings, lets you restrict the search to citations in which the MeSH term has been designated a major topic, and lets you turn off the explode. [SLIDE 19] You can also search MeSH to select the best term. This display gives the number of citations with a particular main heading/subheading combination. The slide shows Tilt-Table Test and adjacent terms.
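The explode, major-topic, and tree-selection options can be illustrated with a toy vocabulary. In this minimal sketch, exploding a heading means matching it plus every heading whose tree number lies beneath it in the chosen tree(s). The miniature vocabulary and its tree numbers are made up for illustration and are not real MeSH tree numbers; `explode` and `matches` are hypothetical names.

```python
# Hypothetical miniature MeSH: heading -> tree numbers (made-up values, not real MeSH).
# "Syncope" sits in two trees, A and B, like the multi-tree case shown on the slides.
TREE_NUMBERS = {
    "Consciousness Disorders": ["A1"],
    "Syncope": ["A1.1", "B2.3"],
    "Syncope, Vasovagal": ["A1.1.1", "B2.3.1"],
    "Tilt-Table Test": ["C5.7"],
}


def explode(heading, trees=None):
    """Return the heading plus every heading below it in the selected tree(s)."""
    roots = [t for t in TREE_NUMBERS[heading] if trees is None or t[0] in trees]
    result = set()
    for other, numbers in TREE_NUMBERS.items():
        # A descendant's tree number extends the root's number after a dot.
        if any(n == r or n.startswith(r + ".") for n in numbers for r in roots):
            result.add(other)
    return sorted(result)


def matches(citation_mesh, heading, major_only=False, do_explode=True, trees=None):
    """citation_mesh: list of (heading, is_major_topic) pairs assigned by indexers."""
    wanted = set(explode(heading, trees)) if do_explode else {heading}
    return any(h in wanted and (major or not major_only) for h, major in citation_mesh)


cit = [("Syncope, Vasovagal", False), ("Tilt-Table Test", True)]
print(matches(cit, "Syncope"))                   # explode reaches the narrower term
print(matches(cit, "Syncope", do_explode=False))  # explode turned off
```

Turning off the explode or restricting to major topics narrows the retrieval, which is why the detailed MeSH display offers both switches.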
[SLIDE 20] Command language searching is best learned through practice. It is not learned as well by listening to a speaker, especially one who is not speaking your native language. NLM and the Regional Medical Libraries have developed a course that trains command language searchers to do sophisticated searches using PubMed. An updated training manual should soon be available for download from the NLM home page. Since searching is free, I urge you to try different things to see how to use PubMed. I also think it would be good to have someone trained in PubMed who knows ELHILL well and is fluent in Spanish. NLM would welcome a small number of librarians from Central and South America and the Caribbean who want to learn how to use PubMed and will then teach others in the region.
Searching PubMed requires command language users to rethink their approach to searching. This is especially true for those who are used to searching ELHILL. I liken it to the time when automatic transmissions became common in automobiles. Those who were used to manual clutches and liked deciding what gear to be in were not comfortable letting the automatic transmission decide. PubMed is like an automatic transmission: it does a lot of things for the searcher. The sophisticated searcher needs to understand what those things are and how to use them effectively. The searcher can also sometimes override the automatic searching and needs to know when and how to do so.
I want to mention the document delivery feature:
[SLIDE 21] Ordering documents: PubMed has a second way to order documents. As you know, MEDLINE provides a citation and, currently for 76% of citations, an abstract in English. How does a user get the full text? Librarians use their interlibrary loan arrangements to obtain copies of documents for their users, but PubMed is used by many more physicians and others who do not have ready access to medical libraries. For those in the U.S., PubMed has the Loansome Doc feature. To use it, the user must make an arrangement with a medical library; NLM and the National Network of Libraries of Medicine help users identify a medical library that will agree to serve them. The user retrieves one or more citations for which the full text is wanted and then uses Loansome Doc to order those documents from the medical library. NLM is planning a way for librarians and others outside the U.S. to use this feature.
[SLIDE 22] PubMed also has links to the full text of articles on the publishers' Web sites. NCBI staff have been working with publishers to encourage them to provide electronic links from the citation in MEDLINE to the full text of the article on the publisher's Web site. As of mid-February, NCBI had arrangements with 40 publishers covering 140 journals, and more are under discussion.
Plans for the future: [SLIDE 23]
MEDLINE is only one of 40 MEDLARS databases. What about the rest of them? Probably the most significant change ahead is that, over the next year or so, NLM is examining the possibility of moving the unique journal-article citations in many of these other databases into PubMed. This will benefit users by giving them one database to search. It will benefit NLM by eliminating duplicate citations in a number of derivative databases that must be maintained each year. Efforts are also being made to move the factual databases to the Web so they can be searched directly.
There are about 17 million citations in the MEDLARS databases. Of these, 9.3 million (55.5%) are in MEDLINE and its backfiles. About 5.4 million of the remaining citations are actually duplicates of MEDLINE citations that are of specific importance to a specialized database. Only around 2 million are unique or of unknown uniqueness.
Consolidating these records is not as trivial a task as it might seem. Additional data elements, such as Space Flight/Mission or Keywords, have been added to specialized databases for their own needs, and these must be added to the MEDLINE record. In some cases, the form of the data in a field of a specialized database differs from the form of the corresponding field in MEDLINE.
This will be a major initiative for the Library in the coming months. In the interim, ELHILL will remain available, especially for databases that are not accessible through PubMed.
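The consolidation problem described above can be sketched as a merge step. This minimal illustration assumes that a duplicate can be recognized by a shared MEDLINE UI (real duplicate detection is harder), that fields already present in the MEDLINE record take precedence, and that a caller-supplied converter reconciles fields whose form differs. The function name `consolidate`, the record layout, and the sample values are hypothetical.

```python
def consolidate(medline, specialized, converters=None):
    """Fold duplicate citations from a specialized database into MEDLINE records.

    medline, specialized: dicts mapping a citation key (assumed: the MEDLINE UI)
    to a dict of fields. converters: optional {field name: function} used to
    reconcile fields whose form differs from MEDLINE's.
    Returns (merged MEDLINE records, records with no MEDLINE duplicate).
    """
    converters = converters or {}
    merged = {key: dict(fields) for key, fields in medline.items()}  # copy; inputs untouched
    unique = {}
    for key, fields in specialized.items():
        target = merged.get(key)
        if target is None:
            # No MEDLINE duplicate: a candidate for loading as a unique citation.
            unique[key] = dict(fields)
            continue
        for name, value in fields.items():
            if name in target:
                continue  # assumption: the MEDLINE record's own value wins
            convert = converters.get(name)
            # Extra data elements (e.g. Space Flight/Mission, Keywords) are carried over.
            target[name] = convert(value) if convert else value
    return merged, unique


medline = {1: {"Title": "Bone loss in astronauts"}}
specialized = {
    1: {"Title": "Bone loss in astronauts", "Space Flight/Mission": "hypothetical mission", "Keywords": ["Microgravity"]},
    2: {"Title": "A citation found only in the specialized database"},
}
merged, unique = consolidate(medline, specialized, {"Keywords": lambda kws: [k.lower() for k in kws]})
```

Even this toy version shows where the real work lies: deciding which records are duplicates, which value wins when both sources have a field, and how to convert fields recorded in a different form.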
We will also be refining the command language searching capabilities of PubMed. It would be extremely helpful if you would search using PubMed and let us know how you like it and what you can and cannot do.
Thank you for inviting me to your regional congress to tell you about PubMed. I hope that you will use it and send NLM your comments. If there is time, I'd be pleased to try to answer your questions.