Text summarization as a decision support aid
© Workman et al.; licensee BioMed Central Ltd. 2012
Received: 18 October 2011
Accepted: 18 April 2012
Published: 23 May 2012
PubMed data potentially can provide decision support information, but PubMed was not exclusively designed to be a point-of-care tool. Natural language processing applications that summarize PubMed citations hold promise for extracting decision support information. The objective of this study was to evaluate the efficiency of a text summarization application called Semantic MEDLINE, enhanced with a novel dynamic summarization method, in identifying decision support data.
We downloaded PubMed citations addressing the prevention and drug treatment of four disease topics. We then processed the citations with Semantic MEDLINE, enhanced with the dynamic summarization method. We also processed the citations with a conventional summarization method, as well as with a baseline procedure. We evaluated the results using clinician-vetted reference standards built from recommendations in a commercial decision support product, DynaMed.
For the drug treatment data, Semantic MEDLINE enhanced with dynamic summarization achieved average recall and precision scores of 0.848 and 0.377, while conventional summarization produced 0.583 average recall and 0.712 average precision, and the baseline method yielded average recall and precision values of 0.252 and 0.277. For the prevention data, Semantic MEDLINE enhanced with dynamic summarization achieved average recall and precision scores of 0.655 and 0.329. The baseline technique resulted in recall and precision scores of 0.269 and 0.247. No conventional Semantic MEDLINE summarization method exists for prevention.
Semantic MEDLINE with dynamic summarization outperformed conventional summarization in terms of recall, and outperformed the baseline method in both recall and precision. This new approach to text summarization demonstrates potential in identifying decision support data for multiple needs.
Clinicians often encounter information needs while caring for patients, an issue several researchers have studied [1–6]. In their 2005 study, Ely and his colleagues found that physicians developed an average of 5.5 questions per half-day of observation, yet could not find answers to 41% of the questions for which they pursued answers. Ely cited time constraints as one of the barriers preventing clinicians from finding answers. Chambliss and Conley likewise found that answer discovery is excessively time consuming, yet they also determined that MEDLINE data could answer or nearly answer 71% of clinicians’ questions in their separate study. PubMed, the National Library of Medicine’s free source for MEDLINE data, was not exclusively designed to be a point-of-care information delivery tool. It generally returns excessive, often irrelevant data, even when diverse search strategies are used. Clinicians can spend an average of 30 minutes answering a single question using raw MEDLINE data, largely because of the literature appraisal process, which excessive retrieval naturally lengthens. This information discovery process is therefore impractical in a busy clinical setting. Applications that use natural language processing to automatically summarize PubMed citations and present them in compact form could potentially provide decision support data in a practical manner.
The objective of this study was to evaluate the performance of a new automatic summarization algorithm called Combo in identifying decision support data. We hypothesized that a natural language processing application, enhanced with the algorithm, could identify intervention data that is also provided by a commercial decision support tool. To operationalize this pursuit, we incorporated the algorithm into Semantic MEDLINE, an advanced biomedical management application. We sought data on drug treatment and preventive interventions for four disease topics, and evaluated the results by comparing output to clinician-vetted reference standards based on recommendations from a commercial decision support product, DynaMed. We also compared Combo output to a baseline method and to the conventional summarization method within the Semantic MEDLINE methodology.
Natural language processing applications that summarize bibliographic text such as PubMed citations seek to facilitate literature appraisal by providing succinct, relevant information suitable for point-of-care decision support. The objective of automatic text summarization is “to take an information source, extract content from it, and present the most important content to the user in a condensed form and in a manner sensitive to the user’s or application’s need”. Automatic text summarization can be applied to multiple documents or information sources, such as bibliographic citations retrieved from PubMed. Researchers have noted the potential value that summarized text holds in patient care, and previous research efforts provide interesting examples of approaches to summarizing PubMed and other text. Using a multimedia application called PERSIVAL, McKeown and her colleagues retrieved, ranked, and summarized clinical study articles (along with digital echocardiogram data) according to a patient’s profile information. Article characteristics, specifically the properties of individual segments of text, were matched against information from a patient’s record. Within this process, the researchers used templates to identify and represent content. These templates identified six potential relations (risk, association, prediction, and their negations) existing between findings, parameters, and dependence properties. The results were then ranked according to potential relevance to the specific patient’s information, consolidated, and presented to the user. To operate the clinical question answering application AskHERMES, Cao and his colleagues used a machine learning approach to classify questions, and they utilized query keywords in a clustering technique for presenting output. AskHERMES draws answers from PubMed citations, in addition to eMedicine documents, clinical guidelines, full-text articles, and Wikipedia entries.
It uses a scoring system to assess similarity between text segments (adjacent sentence blocks) and the properties of clinical questions. Yang and his associates used a three-step pipeline to identify mouse gene information in PubMed data. Using a topically focused subset of PubMed, they tagged gene and protein names, and stored abstract and title sentences in a database along with MeSH entries and other data. Each gene was modeled according to its associated MeSH headings, Gene Ontology terms, and free-text citation terms referencing the gene of interest. They clustered the data using these three features and a direct-k clustering algorithm. Sentences addressing specific genes were ranked, allowing a user to access the desired number of sentences for review.
SemRep is a rule-based NLP application that interprets the meaning of title and abstract text in citations and transforms it into compact, subject_verb_object declarations known as semantic predications. It draws upon resources within the Unified Medical Language System (UMLS) to accomplish this. For example, if the original text is:
“These results suggest the possibility of molecular-targeted therapy using cetuximab for endometrial cancer”

SemRep produces the semantic predication:

cetuximab TREATS Endometrial carcinoma
In this example, SemRep identifies the subject and object of the original text as cetuximab and endometrial cancer, respectively. Using MetaMap technology, it maps these terms to the corresponding UMLS Metathesaurus preferred concept terms cetuximab and Endometrial carcinoma, as indicated in the resulting semantic predication. Utilizing the UMLS Semantic Network, SemRep also identifies the most likely semantic types associated with the subject and object, which in this case are pharmacological substance (abbreviated as phsu) and neoplastic process (abbreviated as neop). SemRep likewise uses the UMLS Semantic Network to identify the relation, or predicate, that binds the subject and object; in this case, it is TREATS. SemRep identifies 26 such relations, plus their negations, in PubMed text. Additionally, SemRep identifies the four comparative predicates compared_with, higher_than, lower_than, and same_as.
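The predication described above can be sketched as a simple data structure; the field names below are illustrative, not SemRep’s actual serialization format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Predication:
    """A SemRep-style semantic predication: subject-PREDICATE-object,
    with each argument mapped to a UMLS preferred concept and a
    UMLS Semantic Network semantic type."""
    subject: str       # UMLS Metathesaurus preferred concept, e.g. "cetuximab"
    subject_type: str  # semantic type abbreviation, e.g. "phsu"
    predicate: str     # e.g. "TREATS" (one of ~26 relations plus negations)
    object: str        # e.g. "Endometrial carcinoma"
    object_type: str   # e.g. "neop"

example = Predication("cetuximab", "phsu", "TREATS", "Endometrial carcinoma", "neop")
```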
Summarization in Semantic MEDLINE filters SemRep output for a point-of-view concept and a seed topic concept selected by the user. The project described in this paper implemented a dynamic form of summarization; here we describe both the dynamic and conventional summarization methods. Conventional Semantic MEDLINE offers summarization for five points-of-view: treatment of disease; substance interaction; diagnosis; pharmacogenomics; and genetic etiology of disease. For example, if the seed topic were Endometrial carcinoma and the point-of-view were treatment, summarization would identify semantic predications relevant to these paired concepts. Point-of-view concepts are similar to the subheading refinements that can be combined with MeSH headings. For example, “Carcinoma, Endometrioid/therapy[MeSH]” could serve as a PubMed search query seeking citations addressing treatment options for endometrial carcinoma. Summarization accomplishes topic and point-of-view refinement of SemRep output by subjecting it to a four-tiered sequential filter:
Relevance: Gathers semantic predications containing the user-selected seed topic. For example, if the seed topic were Endometrial carcinoma, this filter would collect the semantic predication cetuximab-TREATS-Endometrial carcinoma, among others.
Connectivity: Augments Relevance predications with those which share a non-seed argument’s semantic type. For example, in the above predication cetuximab-TREATS-Endometrial carcinoma, this filter would augment the Relevance predications with others containing the semantic type “pharmacological substance” because it is the semantic type of the non-seed argument cetuximab.
Novelty: Eliminates vague predications, such as pharmaceutical preparation-TREATS-patients, whose very general arguments present information users likely already know and are therefore of limited use.
Saliency: Limits final output to predications that occur with adequate frequency. For example, if cetuximab-TREATS-Endometrial carcinoma occurred enough times, all occurrences would be included in the final output.
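The four filters above can be sketched as a short pipeline; this is a simplified illustration, assuming predications are (subject, subject_type, predicate, object, object_type) tuples, with an illustrative vague-term list and frequency threshold rather than the system’s actual ones:

```python
from collections import Counter

def summarize(predications, seed, vague_terms, min_freq=2):
    """Simplified sketch of Semantic MEDLINE's four-tier sequential filter."""
    # 1. Relevance: keep predications containing the seed topic.
    relevance = [p for p in predications if seed in (p[0], p[3])]

    # 2. Connectivity: add predications sharing a non-seed argument's
    #    semantic type with a Relevance predication.
    seed_types = {p[1] if p[3] == seed else p[4] for p in relevance}
    connectivity = [p for p in predications
                    if p not in relevance and (p[1] in seed_types or p[4] in seed_types)]
    kept = relevance + connectivity

    # 3. Novelty: drop vague predications with very general arguments.
    kept = [p for p in kept if p[0] not in vague_terms and p[3] not in vague_terms]

    # 4. Saliency: keep predications that occur with adequate frequency.
    counts = Counter(kept)
    return [p for p in kept if counts[p] >= min_freq]
```

For the running example, a predication like cetuximab-TREATS-Endometrial carcinoma would pass Relevance, survive Novelty, and be retained by Saliency if it occurred often enough.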
Operationalizing the points-of-view coverage of the summarization process can be done in one of two ways. Conventional summarization requires creating a separate application, known as a schema, for each new point-of-view emphasis. This requires hard-coding specific subject_predicate_object patterns into the application, which limits output to predications matching the specific patterns for the new point-of-view. Prior to coding, designers must determine which patterns best capture semantic predications relevant to the given point-of-view. Conventional schema output may also be refined using degree centrality measurements. The novel approach to summarization that we explore here produces saliency measurements on the fly, using a dynamic statistical algorithm known as Combo. Combo adapts to the properties of each individual SemRep dataset by weighing term frequencies with three combined metrics. This flexibility enables summarization for multiple points-of-view, eliminates the work of hard-coding schemas, and uses a single software application.
The Combo algorithm to support summarization
The Combo algorithm combines three individual metrics to identify salient semantic predications:
Both distributions P and Q consist of relative frequencies for their respective predicates. Each predicate shared by the two distributions receives a KLD value (before summing) indicating its value in conveying the point-of-view expressed in distribution P’s search query. A database of PubMed citations from the last 10 years, processed with SemRep, provides the distribution Q data. Prior to our research, the KLD metric performed well in a similar task involving predicate assessment.
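The KLD equation itself did not survive formatting; consistent with the description above, the standard form is:

```latex
D(P \parallel Q) \;=\; \sum_{p} P(p)\,\log_{2}\frac{P(p)}{Q(p)}
```

where p ranges over the predicates shared by the two distributions, and each pre-sum term \(P(p)\log_{2}(P(p)/Q(p))\) is the per-predicate KLD value described above.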
We adapted RlogF to assess the value of a semantic type as paired with a predicate. The log of a semantic type’s absolute frequency (semantic type frequencyi) is applied to the quotient of that same frequency divided by the absolute frequency of all semantic types paired with the predicate (patterni). We use RlogF to appraise combinations of predicates and non-seed-topic semantic types. In the example above, cetuximab-TREATS-Endometrial carcinoma, the seed topic “Endometrial carcinoma” has the semantic type “neoplastic process”, and the opposing argument “cetuximab” has the semantic type “pharmacological substance”. RlogF would assess the significance of “pharmacological substance” as bound to the predicate TREATS. The RlogF metric has been noted for its efficiency in identifying important predicate and argument patterns.
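Reconstructed from the description above (the equation was lost in formatting), the adapted RlogF score for a predicate/semantic-type pattern is:

```latex
\mathrm{RlogF}(pattern_i) \;=\; \log_{2}\!\left(f_i\right) \times \frac{f_i}{N_i}
```

where \(f_i\) is the semantic type’s absolute frequency with the predicate and \(N_i\) is the absolute frequency of all semantic types paired with that predicate.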
Here, c represents the count of unique predicates. In rare cases where there is only one unique predicate, PredScal defaults to a value of 1.
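The three metrics can be sketched as follows. Note two assumptions: the exact PredScal formula was lost in formatting, so the log2(c) form below (with the stated default of 1 when c = 1) is a reconstruction consistent with the surrounding text, and combining the scores by simple multiplication is likewise an illustrative assumption:

```python
import math

def kld_terms(p_counts, q_counts):
    """Per-predicate contributions P(x) * log2(P(x)/Q(x)), before summing.
    p_counts: predicate frequencies in the retrieved set; q_counts: frequencies
    in the 10-year PubMed/SemRep baseline distribution."""
    p_total, q_total = sum(p_counts.values()), sum(q_counts.values())
    return {pred: (p_counts[pred] / p_total)
                  * math.log2((p_counts[pred] / p_total) / (q_counts[pred] / q_total))
            for pred in p_counts if pred in q_counts}

def rlogf(type_freq, pattern_freq):
    """log2 of a semantic type's frequency times its share of all semantic
    types paired with the same predicate (adapted RlogF)."""
    return math.log2(type_freq) * (type_freq / pattern_freq)

def predscal(c):
    """Assumed scaling by the count c of unique predicates; defaults to 1
    when there is only one unique predicate, as the text specifies."""
    return 1.0 if c <= 1 else math.log2(c)

def combo(kld_value, rlogf_value, predscal_value):
    # Assumption: the three metrics are combined multiplicatively.
    return kld_value * rlogf_value * predscal_value
```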
Combo summarization output consists of the four highest scoring semantic type_a_verb_semantic type_b Relevancy patterns (based on novel predications containing the summarization seed topic) and the four highest scoring Connectivity patterns (those sharing a non-seed-topic argument’s semantic type with one of the high-scoring Relevancy patterns).
In the Saliency phase, conventional summarization uses metrics developed by Hahn and Reimer, which appraise “weights” dependent on the predefined subject_verb_object patterns.
In contrast, dynamic summarization does not utilize such predetermined patterns; instead it applies the Combo algorithm to all novel predications in order to determine which are more prominent in the data.
DynaMed is a decision support tool that provides intervention recommendations. In a recent study, it tied with two other products as the highest-ranked evidence-based decision support tool. It draws upon the professional literature using a “systematic literature surveillance” method to evaluate published results, applying a tiered ranking of study design types. For example, here is an excerpt of the DynaMed pneumococcal pneumonia drug treatment recommendation text that we used:
treat for 10 days
○ aqueous penicillin G 600,000 units IV every 6 hours (2 million units every 4–6 hours if life-threatening)
○ procaine penicillin G 600,000 units intramuscularly every 8–12 hours
○ penicillin V 250–500 mg orally every 6 hours
Diabetes mellitus type 2
Congestive heart failure
Each disease is a significant global health concern, and of interest to clinicians in many areas of the world. Collectively, they have an interesting variety of preventive interventions and treatment options.
Diabetes Mellitus, Type 2
prevention and control
For example, to acquire citations addressing drug treatment options for pneumococcal pneumonia, we executed the search phrase “Pneumonia, Pneumococcal/drug therapy[Mesh]”. To provide an evidence-based focus, we first restricted output to the publication types “clinical trials,” “randomized controlled trials,” “practice guidelines,” and “meta-analyses.” We then acquired citations for systematic reviews, using the publication type “review” and the keyword phrase “systematic review.” Realistically, a clinician could engage Semantic MEDLINE with anything from a general keyword search to a very sophisticated search utilizing many of PubMed’s search options. In addition to providing the initial topic/point-of-view pairing, this method of forming search queries also provided a middle ground within the spectrum of queries a clinician might actually use. We also restricted publication dates to coincide with the most recently published source materials DynaMed used in building its recommendations, which served as the basis for our evaluative reference standards (described in detail below); this ensured that we did not retrieve materials that DynaMed curators could not have reviewed in creating their own recommendations. These cutoff dates are indicated in the Results section tabular data. The eight search queries, each pairing one of the four disease topics with one of the two subheading concepts, resulted in eight separate citation datasets. We executed the queries and downloaded all citations during July–August 2011.
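The eight topic/subheading queries can be reconstructed programmatically. Only the pneumococcal pneumonia string is quoted above, so the other MeSH headings used here are illustrative assumptions:

```python
# Illustrative reconstruction of the eight search queries (4 diseases x 2 subheadings).
# Only "Pneumonia, Pneumococcal" is quoted in the text; the other MeSH headings
# below are assumed for illustration.
diseases = [
    "Pneumonia, Pneumococcal",
    "Hypertension",
    "Diabetes Mellitus, Type 2",
    "Heart Failure",
]
subheadings = ["drug therapy", "prevention and control"]

queries = [f'"{d}/{s}[Mesh]"' for d in diseases for s in subheadings]
```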
Diabetes Mellitus, Non-Insulin-Dependent
Congestive heart failure (OR Heart failure)
Maintain normal body weight
Reduce sodium intake
Increased daily life activity
Higher folate intake
Regular aerobic physical activity
Diet reduced in saturated and total fat
Walking to work
Increased plant food intake
Diet rich in fruits, vegetables and low- fat dairy products
Regular tea consumption
Limit alcohol use
Reference standard intervention counts
We built eight baselines that simulated what a busy clinician might find when directly reviewing the PubMed citations, based on techniques developed by Fiszman and Zhang. To build baselines for the four disease topic/drug treatment pairings, we processed their PubMed citations with MetaMap, restricted output to UMLS Metathesaurus preferred concepts associated with the UMLS semantic group Chemicals and Drugs, and removed vague concepts using Novelty processing. Threshold values were determined by calculating the mean of the term frequencies in a baseline group and adding one standard deviation. In each group, all terms whose frequencies exceeded the threshold value were retained to form the group’s baseline. For example, for the congestive heart failure drug treatment group, the method extracted 1784 terms that occurred 63924 times in the MetaMap data, with a mean of approximately 35.8 occurrences per term and a standard deviation of 154.4. This produced a cutoff threshold of 190.3; therefore, all MetaMap terms that occurred 190 times or more were included in the congestive heart failure drug treatment baseline (a total of 72 terms). This method is meant to simulate the terms a busy clinician might notice when quickly scanning PubMed citations retrieved by a search for drug treatments for a given disease.
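The mean-plus-one-standard-deviation cutoff can be sketched as follows (using the sample standard deviation; whether the original used the sample or population form is not stated, and the term names in the usage example are invented):

```python
import statistics

def baseline_terms(term_counts):
    """Retain terms whose frequency exceeds the mean term frequency
    plus one standard deviation (the baseline cutoff described above)."""
    freqs = list(term_counts.values())
    threshold = statistics.mean(freqs) + statistics.stdev(freqs)
    return {term for term, freq in term_counts.items() if freq > threshold}

# Hypothetical usage: one very frequent term passes the cutoff.
counts = {"furosemide": 100, "digoxin": 1, "carvedilol": 1, "spironolactone": 1}
retained = baseline_terms(counts)
```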
We formed baselines for citations emerging from each disease topic/prevention and control pairing in a similar manner. We extracted the lines from the associated PubMed citations that contained the phrases “prevent,” “prevents,” “for prevention of,” and “for the prevention of.” These lines were processed with MetaMap, and all UMLS Metathesaurus preferred concepts associated with the UMLS disorders semantic group were removed, since the focus was preventive interventions and not the diseases themselves. Threshold values were calculated for the remaining terms, and those whose frequencies exceeded their threshold scores were retained as baseline terms. To reiterate, preventive baselines (as well as the drug treatment baselines) are meant to simulate what a busy clinician might notice when seeking interventions while visually scanning PubMed citations originating from a search seeking such interventions for a given disease.
Comparing outputs to the reference standards
We evaluated outputs for the two summarization methods (Combo algorithm and conventional schema summarization) and the baselines by manually comparing them to the reference standards for the eight disease topic/subheading pairings. Since each reference standard was a list of interventions, the comparison was straightforward. We measured recall, precision, and F1-score (weighted equally between recall and precision).
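These metrics reduce to the standard definitions; a minimal sketch over sets of intervention names:

```python
def evaluate(found, reference):
    """Recall, precision, and F1 (recall and precision weighted equally)
    for a set of identified interventions against a reference standard."""
    tp = len(found & reference)          # true positives
    recall = tp / len(reference)
    precision = tp / len(found)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return recall, precision, f1
```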
For both summarization systems, we measured precision by grouping subject arguments by name and determining what percentage of these subject groups expressed a true positive finding. For outputs for the four disease topic/drug intervention pairings, we limited analysis to semantic predications in the general form of “Intervention X_TREATS_disease Y”, where the object argument reflected the associated disease concept. If the subject intervention X argument matched a reference standard intervention, that intervention received a true positive status. In similar predications where the subject argument was a general term, such as “intervention regimes,” we examined the original section of citation text associated with the semantic predication. If this citation text indicated a reference standard intervention it received a true positive status. For example, in the dynamic summarization output for arterial hypertension prevention, the semantic predication “Dietary Modification_PREVENTS_Hypertensive disease” summarized citation text that included advice for dietary sodium reduction ; therefore, the reference standard intervention “reduce sodium intake” received a true positive status.
Only the Combo algorithm output for the four disease topic/prevention and control pairings was compared to the reference standard, since no conventional schema for prevention exists. In addition to predications of the form “Intervention X_PREVENTS_disease Y,” we included other predications whose arguments conveyed preventive interventions, such as “Exercise, aerobic_AFFECTS_blood pressure” and “Primary Prevention_USES_Metformin,” because their value was confirmed in a previous study.
We evaluated each baseline by comparing its terms to those of its associated reference standard. If a term in a baseline matched an intervention in the relevant reference standard, the baseline term received a true positive status. We also assigned true positive status to less specific baseline terms if they could logically be associated with related reference standard interventions. For example, in the baseline for pneumococcal pneumonia prevention the term “Polyvalent pneumococcal vaccine” was counted as a true positive, even though it did not identify a specific polyvalent pneumococcal vaccine that was in the reference standard.
Citation retrieval results, with cutoff retrieval dates in parentheses
SemRep semantic predication outputs
Combo algorithm-enhanced summarization semantic predication output
Conventional treatment schema semantic predication output
Performance Metrics, Drug Treatment Point-of-View, for Combo-enhanced dynamic summarization (DS), conventional treatment schema (TS), and baseline (BL) methodologies
Performance Metrics, Prevention Point-of-View, for Combo-enhanced dynamic summarization (DS), and baseline (BL) methodologies
Inter-Annotator Agreement (IAA)
The results imply that dynamic text summarization with the Combo algorithm provides a viable alternative to direct review of PubMed citations for locating decision support data. This is encouraging, because dynamic summarization could expand the value of Semantic MEDLINE at the point of care. Performance improvements over the baseline methodology appear in both recall and precision. Including findings from both the drug treatment and prevention analyses, Combo produced average recall and precision scores of 0.75 and 0.35, while the baseline method yielded average recall and precision values of 0.25 and 0.28. Combo summarization outperformed the baseline methodology by an average F1-score margin of 0.21. The Combo algorithm performed especially well in terms of recall for large datasets: for the three disease topic/point-of-view pairings whose initial citation input exceeded 1000 citations (the drug treatment topics of arterial hypertension, diabetes mellitus type 2, and congestive heart failure), average recall was 0.916.
Drug treatment outputs
Combo algorithm-enhanced dynamic summarization outperformed conventional summarization and the baseline method in recall, but was outperformed by conventional summarization in precision. Combo summarization achieved 0.85 average recall and 0.38 average precision. The conventional schema produced average recall and precision scores of 0.59 and 0.71. Both dynamic and conventional summarization outperformed the baseline method, which produced average recall and precision scores of 0.23 and 0.31. Based on these findings, if a clinician wished to locate the greatest number of drug treatment options using one of these three methods, Combo would be the better choice. The new method is less precise, but this effect is moderated by the visualization tool that Semantic MEDLINE offers. Visualization presents all citation data (including the text of the abstract itself) relevant to an Intervention X_TREATS_disease Y relationship in an easily viewed, reader-friendly display. Viewed in context, clinicians can quickly discard irrelevant treatments. We would argue that recall is more critical than precision in clinical browsing: the cognitive load required to dismiss a false positive is lower than that required to deduce a missing (false negative) treatment. We chose the standard F1-score because it is conventional, but if recall were weighted more heavily, in line with the argument above, Combo summarization would be quite competitive with the conventional technique.
Combo summarization was less effective in identifying preventive interventions in the relevant reference standards, producing an average recall of 0.66 and an average precision rate of 0.33. There are two obvious possibilities for this diminished efficiency. First, the citation sets were substantially smaller than three of the four drug treatment citation sets, thus providing less initial data. As with most statistical techniques, larger sample sizes tend to lead to better performance. Second, preventive interventions described in text are often more general than drug therapies. For example, “lifestyle changes” may be more difficult to interpret in the SemRep phase. Also, the lower inter-annotator agreement scores suggest that clinicians are less apt to agree on prevention standards. This may also be reflected in the professional literature. Dynamic summarization with the Combo algorithm outperformed the baseline methodology, which produced an average recall of 0.27 and an average precision of 0.25. This suggests that dynamic summarization is a superior alternative to directly reviewing PubMed citations for identifying preventive interventions.
We classified false positive findings by type, and false negative findings by the first sequential data source (i.e., PubMed, SemRep output, dynamic summarization output) that did not include them.
Most of the false positives for both drug treatment and prevention points-of-view could be classified as unproductive general subject arguments; pharmaceuticals or supplements not included in the relevant reference standards; or other therapies not included in the relevant reference standards. In the prevention data, pharmaceuticals or supplements not included in the relevant reference standards accounted for 62.5% of all false positives, while unproductive general subject arguments and other therapies not included in the relevant reference standards accounted for 17.5% and 15.5%, respectively. In the drug treatment data, pharmaceuticals or supplements not included in the relevant reference standard accounted for an even greater percentage of false positives at 73.7%, while unproductive general subject arguments and other therapies not included in the relevant reference standard accounted for 14.2% and 12%. There are several possible reasons why there was such a high percentage of non-reference standard pharmaceutical or supplement false positives. Initial citation retrieval was not limited by a beginning publication date. In other words, all search queries retrieved relevant citations for as far back in time as PubMed made available. Therefore, information retrieval likely included older drugs which had been replaced by newer medications as preferred treatments. Also, we used a single data source in creating the reference standard. If we had included recommendations from other decision support tools in addition to those from DynaMed, the final reference standard might have included other treatments found within this false positive classification. Another data trend substantially contributed to reduced precision. Subject arguments that occurred two times or less in an output for a given disease topic/point-of-view pairing accounted for 69.7% of all false positives. 
If these arguments were removed from the output, average precision for the drug treatment and preventive intervention data combined would increase from 35% to 80%, with a proportionately small effect on recall.
Because Semantic MEDLINE is a pipeline application, data loss can be tracked by documenting the first sequential process (among PubMed retrieval, SemRep, and dynamic summarization) that does not include a reference standard intervention. We applied this method in analyzing false negative interventions to determine which process “lost” the desired data. Among the 23 false negatives for the drug treatment point-of-view, PubMed retrieval did not garner 43.5% (10 false negatives); SemRep output did not include 47.8% (11 false negatives); and dynamic summarization did not identify 8.7% (2 false negatives). False negatives emerging from the prevention point-of-view data were slightly more balanced: PubMed retrieval did not include 41.2% (7 false negatives), SemRep output did not include 35.3% (6 false negatives), and dynamic summarization output did not include 23.5% (4 false negatives). In the analyses for both points-of-view, however, dynamic summarization lost fewer interventions than the other two processes. Visualization output was not included in this analysis because it automatically includes all output from summarization.
PubMed retrieval volume and performance
Performance measurements suggest a system preference for larger citation input. Among search queries pairing the disease topics with the drug therapy subheading, the only query returning a relatively small number of citations (the pneumococcal pneumonia query) also led to comparatively diminished performance. System performance for pneumococcal pneumonia drug treatment data produced only 0.65 recall, while the other disease topic/drug treatment pairings achieved 0.89 or higher recall. System performance for prevention showed similar results, with recall ranging from 0.50 to 0.76 on overall fewer citations than the drug treatment data. However, in a pilot project the system produced 100% recall for prevention data on a single disease topic (acute pancreatitis) with only 156 citations. We conclude that citation volume can be a factor for some clinical topics, but not for all of them. In cases like acute pancreatitis, where therapeutic options are narrow, the system can perform comparably despite a relatively sparse citation set.
Reference standards and system performance
We selected DynaMed as the source for our reference standards because it ranked among the top three point-of-care information delivery products in a recent study by Banzi and colleagues. We chose DynaMed instead of one of the other top-ranking products, EBM Guidelines and UpToDate, because we did not have access to EBM Guidelines, and DynaMed’s presentation format was better suited than UpToDate’s to the purposes of this study. However, DynaMed is not necessarily an all-inclusive source of effective interventions; by Banzi’s own disclosure, no decision support product proved to be “the best,” at least according to his criteria. Reference standards including recommendations from all three products might be more comprehensive and shed better light on the recall and precision performance of all three summarization methodologies.
Comparisons to other methods
It is difficult to perform a one-to-one comparison with other text summarization methods because of the unique reference standards we used to evaluate dynamic summarization. However, a performance comparison with other applications that implement a conventional point-of-view refinement may offer valuable insight. Zhang and her colleagues incorporated a degree centrality component into Semantic MEDLINE with conventional treatment summarization; the degree centrality component was applied after summarization. This approach achieved 73% precision and 72% recall when evaluated against a handcrafted reference standard of answers to disease properties. Fiszman and colleagues created an application for identifying citations valuable to clinical guideline creation. Using guideline-oriented questions, they created a set of rules that functioned similarly to conventional summarization, achieving a type of point-of-view filtering for guideline-relevant data. This application achieved 40% recall and 88% precision against another manually assembled reference standard of relevant and non-relevant citations. Combo-enhanced dynamic summarization achieved lower precision than these methods. However, its combined average recall for drug treatment and preventive interventions exceeds the recall of both the degree centrality and clinical guideline citation identification applications. In future work, when the precision-improving adjustments are applied, precision may also exceed that of these products.
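The recall comparison above can be verified with simple arithmetic. The per-point-of-view recall values (0.848 for drug treatment, 0.655 for prevention) are this study's reported results; the comparison targets are the 72% recall of the degree centrality method and the 40% recall of the guideline citation application.

```python
# Reported recall for Combo-enhanced dynamic summarization.
drug_treatment_recall = 0.848
prevention_recall = 0.655

# Combined average recall across the two points-of-view.
combined_avg_recall = (drug_treatment_recall + prevention_recall) / 2
print(f"combined average recall: {combined_avg_recall:.4f}")

# It exceeds both comparison systems' reported recall.
assert combined_avg_recall > 0.72  # degree centrality method
assert combined_avg_recall > 0.40  # guideline citation identification
```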
This study has limitations. It explores summarization for only two points-of-view (prevention and drug treatment) within the single task of decision support. However, an earlier study examined Combo-enhanced dynamic summarization for a genetic disease etiology point-of-view, within the task of secondary genetic database curation, and found improved summarization performance for that task. In the current study, we examined dynamic summarization for just four disease topics. However, a pilot project featuring three different disease topics (acute pancreatitis, coronary artery disease, and malaria), again in the context of preventive intervention decision support, produced slightly superior results. This creates optimism that this text summarization method may enable others to locate decision support data. The initial search queries that retrieved the PubMed citations used controlled vocabulary terms; keyword queries may offer additional insight into the dynamic Semantic MEDLINE application. Finally, we evaluated system output against recommendations drawn from a single commercial decision support product. Comparing performance against other decision support sources may shed further light on Combo-enhanced dynamic summarization as a potential decision support tool.
To evaluate the performance of a new dynamic text summarization extension (Combo) within Semantic MEDLINE, we applied it, along with conventional Semantic MEDLINE and a baseline summarization methodology (designed to mimic manual clinical review), to a clinical decision support task. We chose four disease topics and retrieved PubMed citations addressing their drug treatment and prevention. We processed the citations with SemRep, an application that transforms PubMed text into semantic predications, and then processed the SemRep output with the three summarization methodologies.
An evaluation using clinician-vetted reference standards built from DynaMed recommendations showed that the new summarization method outperformed the conventional application and the baseline methodology in terms of recall, while the conventional application produced the highest precision. Both dynamic and conventional summarization were superior to the baseline methodology. These findings imply that the new text summarization application holds potential for assisting clinicians in locating decision support information.
NLP: Natural language processing
UMLS: Unified Medical Language System
We express our gratitude to Denise Beaudoin, Bruce Bray, and Stan Huff for serving as data reviewers. We thank Stephane Meystre for his counsel in disease topic selection, and Thomas Rindflesch for his essential work in natural language processing. Finally, we thank the National Library of Medicine for funding this work through grant number T15LM007123.
- Covell DG, Uman GC, Manning PR: Information needs in office practice: are they being met? Ann Intern Med. 1985, 103 (4): 596-599.
- Gorman PN, Helfand M: Information seeking in primary care: how physicians choose which clinical questions to pursue and which to leave unanswered. Med Decis Making. 1995, 15 (2): 113-119. 10.1177/0272989X9501500203.
- Alper BS, Stevermer JJ, White DS, Ewigman BG: Answering family physicians' clinical questions using electronic medical databases. J Fam Pract. 2001, 50 (11): 960-965.
- Bergus GR, Randall CS, Sinift SD, Rosenthal DM: Does the structure of clinical questions affect the outcome of curbside consultations with specialty colleagues? Arch Fam Med. 2000, 9 (6): 541-547. 10.1001/archfami.9.6.541.
- Graber MA, Randles BD, Monahan J, Ely JW, Jennissen C, Peters B, Anderson D: What questions about patient care do physicians have during and after patient contact in the ED? The taxonomy of gaps in physician knowledge. Emerg Med J. 2007, 24 (10): 703-706. 10.1136/emj.2007.050674.
- Graber MA, Randles BD, Ely JW, Monnahan J: Answering clinical questions in the ED. Am J Emerg Med. 2008, 26 (2): 144-147. 10.1016/j.ajem.2007.03.031.
- Ely JW, Osheroff JA, Chambliss ML, Ebell MH, Rosenbaum ME: Answering physicians' clinical questions: obstacles and potential solutions. J Am Med Inform Assoc. 2005, 12 (2): 217-224.
- Chambliss ML, Conley J: Answering clinical questions. J Fam Pract. 1996, 43 (2): 140-144.
- Golder S, McIntosh HM, Duffy S, Glanville J: Developing efficient search strategies to identify reports of adverse effects in MEDLINE and EMBASE. Health Info Libr J. 2006, 23 (1): 3-12. 10.1111/j.1471-1842.2006.00634.x.
- Hersh WR, Hickam DH: How well do physicians use electronic information retrieval systems? A framework for investigation and systematic review. JAMA. 1998, 280 (15): 1347-1352. 10.1001/jama.280.15.1347.
- Hoogendam A, Stalenhoef AF, Robbe PF, Overbeke AJ: Analysis of queries sent to PubMed at the point of care: observation of search behaviour in a medical teaching hospital. BMC Med Inform Decis Mak. 2008, 8: 42. 10.1186/1472-6947-8-42.
- Kilicoglu H, Fiszman M, Rodriguez A, Shin D, Ripple A, Rindflesch TC: Semantic MEDLINE: a web application for managing the results of PubMed searches. Proceedings of the Third International Symposium for Semantic Mining in Biomedicine. 2008, 69-76.
- Mani I: Automatic Summarization. 2001, John Benjamins Publishing Company, Amsterdam.
- Hahn U, Mani I: The challenges of automatic summarization. Computer. 2000, 33 (11): 29-36.
- McKeown K, Elhadad N, Hatzivassiloglou V: Leveraging a common representation for personalized search and summarization in a medical digital library. 3rd ACM/IEEE-CS Joint Conference on Digital Libraries: 27-31 May 2003. Edited by: Delcambre L, Henry G. 2003, IEEE Computer Society, 159-170.
- Cao Y, Liu F, Simpson P, Antieau L, Bennett A, Cimino JJ, Ely J, Yu H: AskHERMES: an online question answering system for complex clinical questions. J Biomed Inform. 2011, 44 (2): 277-288. 10.1016/j.jbi.2011.01.004.
- Yang J, Cohen AM, Hersh W: Automatic summarization of mouse gene information by clustering and sentence extraction from MEDLINE abstracts. AMIA Annu Symp Proc. 2007, 831-835.
- Rindflesch TC, Fiszman M: The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text. J Biomed Inform. 2003, 36 (6): 462-477. 10.1016/j.jbi.2003.11.003.
- Workman TE, Hurdle JF: Dynamic summarization of bibliographic-based data. BMC Med Inform Decis Mak. 2011, 11: 6. 10.1186/1472-6947-11-6.
- Rindflesch TC, Fiszman M, Libbus B: Semantic interpretation for the biomedical research literature. Medical Informatics: Knowledge Management and Data Mining in Biomedicine. Edited by: Chen H, Fuller S, Hersh W, Friedman C. 2005, Springer, New York, 399-422.
- Lindberg DA, Humphreys BL, McCray AT: The Unified Medical Language System. Methods Inf Med. 1993, 32 (4): 281-291.
- Takahashi K, Saga Y, Mizukami H, Takei Y, Machida S, Fujiwara H, Ozawa K, Suzuki M: Cetuximab inhibits growth, peritoneal dissemination, and lymph node and lung metastasis of endometrial cancer, and prolongs host survival. Int J Oncol. 2009, 35 (4): 725-729.
- Aronson AR: Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp. 2001, 17-21.
- Rindflesch TC: SemRep Predicates. Technical Report. 2012, Cognitive Sciences Branch, Lister Hill National Center for Biomedical Communications.
- Fiszman M, Rindflesch TC, Kilicoglu H: Abstraction summarization for managing the biomedical research literature. Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics. 2004.
- Fiszman M, Demner-Fushman D, Kilicoglu H, Rindflesch TC: Automatic summarization of MEDLINE citations for evidence-based medical treatment: a topic-oriented evaluation. J Biomed Inform. 2009, 42 (5): 801-813. 10.1016/j.jbi.2008.10.002.
- Fiszman M, Rindflesch TC, Kilicoglu H: Summarizing drug information in Medline citations. AMIA Annu Symp Proc. 2006, 254-258.
- Sneiderman C, Demner-Fushman D, Fiszman M, Rosemblat G, Lang FM, Norwood D, Rindflesch TC: Semantic processing to enhance retrieval of diagnosis citations from Medline. AMIA Annu Symp Proc. 2006, 1104.
- Ahlers CB, Fiszman M, Demner-Fushman D, Lang FM, Rindflesch TC: Extracting semantic predications from Medline citations for pharmacogenomics. Pac Symp Biocomput. 2007, 209-220.
- Workman TE, Fiszman M, Hurdle JF, Rindflesch TC: Biomedical text summarization to support genetic database curation: using Semantic MEDLINE to create a secondary database of genetic information. J Med Libr Assoc. 2010, 98 (4): 273-281. 10.3163/1536-5050.98.4.003.
- Zhang H, Fiszman M, Shin D, Miller CM, Rosemblat G, Rindflesch TC: Degree centrality for semantic abstraction summarization of therapeutic studies. J Biomed Inform. 2011, 44: 830-838. 10.1016/j.jbi.2011.05.001.
- Kullback S, Leibler RA: On information and sufficiency. Annals of Mathematical Statistics. 1951, 22 (1): 79-86. 10.1214/aoms/1177729694.
- Resnik P: Selectional constraints: an information-theoretic model and its computational realization. Cognition. 1996, 61: 127-159. 10.1016/S0010-0277(96)00722-6.
- Riloff E: Automatically generating extraction patterns from untagged text. Proceedings of the Thirteenth National Conference on Artificial Intelligence. 1996, 1044-1049.
- Riloff E, Phillips W: An Introduction to the Sundance and AutoSlog Systems. 2004, University of Utah School of Computing.
- Mani I, Maybury MT: Advances in Automatic Text Summarization. 1999, MIT Press, Cambridge.
- Banzi R, Liberati A, Moschetti I, Tagliabue L, Moja L: A review of online evidence-based practice point-of-care information summary providers. J Med Internet Res. 2010, 12 (3): e26. 10.2196/jmir.1288.
- 7-Step Evidence-Based Methodology. http://www.ebscohost.com/dynamed/content.php
- Pneumococcal Pneumonia. http://www.ebscohost.com/DynaMed/
- Khan NA, Hemmelgarn B, Padwal R, Larochelle P, Mahon JL, Lewanczuk RZ, McAlister FA, Rabkin SW, Hill MD, Feldman RD: The 2007 Canadian Hypertension Education Program recommendations for the management of hypertension: part 2 - therapy. Can J Cardiol. 2007, 23 (7): 539-550. 10.1016/S0828-282X(07)70798-5.
- Workman TE, Stoddart JM: Rethinking information delivery: using a natural language processing application for point-of-care data discovery. J Med Libr Assoc. 2012, 100 (2): 113-120. 10.3163/1536-5050.100.2.009.
- EBM Guidelines. www.ebm-guidelines.com
- Fiszman M, Ortiz E, Bray B, Rindflesch T: Semantic processing to support clinical guideline development. AMIA Annu Symp Proc. 2008, 187-191.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/12/41/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.