Annotation and extraction of age and temporally-related events from clinical histories

Hong, Judy; Davoudi, Anahita; Yu, Shun; Mowery, Danielle L.

doi:10.1186/s12911-020-01333-5

Volume 20 Supplement 11

Informatics and machine learning methods for health applications

Research
Open access
Published: 30 December 2020

Judy Hong¹,
Anahita Davoudi²,
Shun Yu³ &
…
Danielle L. Mowery ORCID: orcid.org/0000-0003-3802-4457^2,4

BMC Medical Informatics and Decision Making volume 20, Article number: 338 (2020)

Abstract

Background

Age and time information stored within the histories of clinical notes can provide valuable insights for assessing a patient’s disease risk, understanding disease progression, and studying therapeutic outcomes. However, details of age and temporally-specified clinical events are not well captured, consistently codified, and readily available to research databases for study.

Methods

We expanded upon existing annotation schemes to capture additional age and temporal information, conducted an annotation study to validate our expanded schema, and developed a prototypical, rule-based Named Entity Recognizer to extract our novel clinical named entities (NE). The annotation study was conducted on 138 discharge summaries from the pre-annotated 2014 ShARe/CLEF eHealth Challenge corpus. In addition to existing NE classes (TIMEX3, SUBJECT_CLASS, DISEASE_DISORDER), our schema proposes 3 additional NEs (AGE, PROCEDURE, OTHER_EVENTS). We also propose new attributes, e.g., “degree_relation” which captures the degree of biological relation for subjects annotated under SUBJECT_CLASS. As a proof of concept, we applied the schema to 49 H&P notes to encode pertinent history information for a lung cancer cohort study.

Results

An abundance of information was captured under the new OTHER_EVENTS, PROCEDURE and AGE classes, with 23%, 10% and 8% of all annotated NEs belonging to the above classes, respectively. We observed high inter-annotator agreement of >80% for AGE and TIMEX3; the automated NLP system achieved F1 scores of 86% (AGE) and 86% (TIMEX3). Age and temporally-specified mentions within past medical, family, surgical, and social histories were common in our lung cancer data set; annotation is ongoing to support this translational research study.

Conclusions

Our annotation schema and NLP system can encode historical events from clinical notes to support clinical and translational research studies.

Background

In medicine, clinical histories contained within the electronic health record (EHR) document pertinent age and temporal information that could be useful for determining a patient’s disease risk, understanding the course of a disease phenotype, and predicting patient health outcomes [1,2,3]. Studies suggest that patients have elevated cancer risk if one or more family members have cancer, if these cancers occur significantly earlier in life than sporadic cancers in the general population, or if the patient has a personal history of other prior cancers [1, 4]. Specifically, patient clinical histories play an important role in explaining risk of developing lung cancer. Studies suggest that about 8% of lung cancers are inherited or occur as a result of a genetic predisposition [2, 5, 6]. Patients have increased risk of lung cancer when multiple family members are affected with lung cancer, particularly first-degree relatives with early-onset lung cancer [7, 8]. Smokers have as much as a 15 to 30-fold increased risk of developing cancer, particularly lung cancer, when compared with their non-smoker counterparts [9]. Occupational exposures (e.g., among production, manufacturing, and factory workers) and environmental exposures (e.g., air pollution), when considered independently of tobacco smoking, are among the top 10 causes of lung cancer mortality in the United States [10]. Therefore, better characterization of lung cancer risk may lead to improved and better-targeted screening efforts, which can potentially save patient lives because earlier detection of lung cancer is known to improve survival [11].

With the rapid adoption of EHR systems with coded data collection modules, e.g., family and social history modules [4], clinical histories are increasingly available in electronic, structured formats allowing for large-scale retrospective research. However, the details of age and temporally-specified clinical events (past diagnoses, risk factors, surgical interventions) for both patients and family members are not well captured, consistently codified, nor readily available to research databases for oncology studies and to clinical decision support systems for cancer risk screening. Such events are often documented in unstructured form through clinical texts, e.g., discharge summaries and history and physical (H&P) notes. To allow the analysis of such data, natural language processing (NLP) technologies are becoming increasingly important [12]. Our long-term goal is to construct patient phenotype profiles of relevant clinical events from all pertinent EHR data to support a variety of clinical and translational research studies and applications. For example, such profiles could be used to determine associations between clinical histories (past medical, family, social, and occupational histories) and genetic biomarkers with lung cancer outcomes, e.g., progression and mortality [13]. As a means to this end, our short-term goal is to complete the development and validation of: (1) an annotation schema that explicitly describes age and temporal information in a computable format, (2) an annotation study that demonstrates this information can be manually and reliably encoded according to the annotation schema, (3) a prototypical NLP system that demonstrates such information can be automatically, accurately, and efficiently extracted, and (4) as proof of concept, an annotation study of H&P notes that demonstrates the portability and usability of this schema for encoding pertinent historical findings for a translational research study of a lung cancer cohort.

Annotation of age and temporal information

In the last 10 years, several clinical corpora have been created, providing temporal and semantic representations and annotations for developing NLP systems. Most annotation schemas build on top of the TimeML standard [14], which captures explicit temporal expressions such as times, dates, and durations. Elhadad et al. utilized TimeML to annotate TIMEX3 data elements for the ShARe corpus, which consists of de-identified, clinical free-text notes from the MIMIC II database [15]. Similarly, Styler IV et al. [16] annotated the “Temporal Histories of Your Medical Events” (THYME) corpus using an extension of the ISO-TimeML standard [16]. The inter-annotator agreement (IAA)^{Footnote 1} achieved for this schema on the THYME corpus is 80% for events and 80% for temporal expressions.^{Footnote 2} Viani et al. [17] expanded TimeML to create the CALEX schema, which captures age as a Named Entity (NE), although without attributes qualifying properties about the age mention. The IAA achieved for this schema on mental health records from the Clinical Record Interactive Search (CRIS) database was 77% overall for temporal expressions.^{Footnote 3}

Although there has been significant work in temporal modeling, existing annotation standards do not encode age information and have limited coverage of subjects other than the patient. These standards encode neither implicit mentions nor the degree of biological relation between the patient and other subjects, both of which are important for clinical and translational research studies. Furthermore, most standards focus on annotating diseases and disorders, but other clinical events, e.g., procedures and social determinants of health (SDOH), can also be relevant to patient outcomes; for example, occupational and environmental exposures are associated with lung cancer.

Extraction of age and temporal information

Successful NLP systems have been developed to extract age and temporal information utilizing supervised machine learning algorithms, heuristics, and rule-based components. One notable effort is the 2012 i2b2 temporal relations challenge, which provided the research community with a corpus of discharge summaries annotated with temporal information for the development and evaluation of temporal reasoning systems [18]. For event detection, statistical machine learning (ML) methods consistently showed superior performance. For example, Xu et al. [19] trained a conditional random field (CRF) named entity extractor, achieving a 92% overall F1 score for extracting events. For the detection of temporal expressions, ML and rule-based methods performed equally well, though the best systems adopted a rule-based approach for value normalization. For example, Tang et al. [20] utilized predefined regular expressions applied within the HeidelTime system, achieving 87% overall F1 for temporal expression extraction. Finally, Mowery et al. developed a rule-based age information extraction system for discerning age of onset from age of death within the free-text comments of an EHR family health history module using the Fast Healthcare Interoperability Resource (FHIR) standard, achieving an F1-score of 94% for onset and 94% for death [4].

Several automated NLP tools already exist to extract explicit temporal expressions and named entities that describe disease disorders. However, we introduce new NEs and attributes in our annotation schema to capture previously un-annotated age and temporal information. As an initial step towards automation, we built a prototypical tool to assess the amount of work necessary to extract this information and to serve as a foundation for a future hybrid rule-based and ML extraction method for large datasets. Finally, we demonstrate that our expanded schema can capture pertinent historical findings from a sample of history and physical (H&P) notes for a lung cancer cohort study. By applying the schema to this subset of H&P notes from the University of Pennsylvania Health System, we aim to assess the portability and usability of the expanded schema for representing pertinent clinical histories for lung cancer research.

Methods

In this University of Pennsylvania Institutional Review Board (IRB)-approved pilot study, we leveraged the pre-annotated 2014 ShARe/CLEF eHealth Challenge corpus [21], a subset of the Medical Information Mart for Intensive Care (MIMIC)-II database [22] collected from the intensive care units of Beth Israel Deaconess Medical Center. We sampled 138 de-identified, free-text discharge summaries. To focus the search on historical rather than acute events, we extracted sections (past medical history, past surgical history, family history, social history) which have a higher likelihood of containing age and temporally-specified clinical events (Fig. 1).

Development of annotation schema

We aimed to develop a schema that integrates Named Entities (NEs), attributes, and relationships relevant for representation of age and temporal information. To align our annotated classes with current and well-adopted annotation efforts in the NLP community, we added new and expanded existing annotation classes to the ShARe [21], TimeML [14], and CALEX [17] schemas. Additionally, new classes were constructed based on a linguistic study of 20 randomly-selected discharge summaries from the ShARe corpus. Documents were annotated according to the proposed schema in batches of 5 by authors JH, a data scientist, and DM, a clinical informaticist. After each batch, we reached consensus, updated the annotation schema, and modified the annotation guidelines. At the end of the schema development process, previously annotated documents were revised to create a final reference standard. The schema addresses information important for interpreting a patient’s clinical history: (1) NEs, (2) attributes and their values, and (3) relationships between NEs.

Named entities, attributes, and relationships

The existing annotation scheme for the 2014 ShARe/CLEF eHealth Challenge corpus included NE classes: TIMEX3 (T), DISEASE_DISORDER (DD), SUBJECT_CLASS (S). We propose 3 new NEs: AGE (A), PROCEDURE (P), OTHER_EVENTS (OE) due to their relevance to clinical and translational research studies. We also describe a new attribute type, degree_relation, for the pre-existing SUBJECT_CLASS (S) and expand the class to include implicit mentions.

For each NE below, we define boundaries—start and end offsets—for the NE span in the text with square brackets followed by a subscript indicating the annotation type e.g., [left knee arthroscopy]_P is a spanned PROCEDURE mention.

TIMEX3 (T): describes any text span that specifies a temporal expression about a clinically-relevant event (i.e. DISEASE_DISORDER, PROCEDURE, or OTHER_EVENTS). TIMEX3 has attribute types: date, time, and duration.

Ex. “Arthroscopy in [1997]_T”, Type: date.

AGE (A): describes any text span that specifies a subject’s age or the age at which a clinically relevant event occurred. AGE has the attribute types: fully-specified, less-specified, event-specified. It can be normalized to an age range (e.g. START 70, END 79) and time range (e.g. START 10/10/1950, END 10/10/1959).

Ex. “Patient had [childhood]_A diabetes”, Type: event-specified.
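
The normalization of an AGE mention to an age range and a time range can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `age_range_from_decade` and `time_range` are hypothetical, and the sketch assumes the subject's birth date is known (the birth date used in the usage note is invented purely so that the output matches the ranges quoted in the AGE definition).

```python
from datetime import date
from typing import Tuple


def age_range_from_decade(decade: int) -> Tuple[int, int]:
    """Map a decade-of-life mention such as "70s" to an inclusive age range."""
    if decade < 0 or decade % 10 != 0:
        raise ValueError("decade must be a non-negative multiple of 10")
    return decade, decade + 9


def _birthday(birth: date, age: int) -> date:
    """Return the calendar date on which the subject turns `age`."""
    try:
        return birth.replace(year=birth.year + age)
    except ValueError:
        # Birth date of 29 February in a non-leap target year: fall back to 28 February.
        return birth.replace(year=birth.year + age, day=28)


def time_range(birth: date, start_age: int, end_age: int) -> Tuple[date, date]:
    """Project an age range onto the calendar, given an assumed birth date."""
    if end_age < start_age:
        raise ValueError("end_age must not be smaller than start_age")
    return _birthday(birth, start_age), _birthday(birth, end_age)
```

For instance, `age_range_from_decade(70)` gives the age range START 70, END 79, and `time_range(date(1880, 10, 10), 70, 79)` gives 10 October 1950 to 10 October 1959.
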

SUBJECT_CLASS (S): describes any span of text that refers to a subject that is not the patient. While the ShARe schema only annotates explicitly-mentioned entities experiencing a DISEASE_DISORDER, we expand SUBJECT_CLASS to include any subject that is not the patient. We also include implicit references to other subjects, commonly found within the family history sections. Ex. Subject explicitly referenced: “[father]_S had [CAD]_DD”. Subject implicitly referenced: “[family]_S history: notable for [CAD]_DD”. Following from the ShARe schema, SUBJECT_CLASS can be normalized to: family_member, donor_other, donor_family_member, other.

A novel attribute type in our schema describes the subject’s degree of biological relation, degree_relation: 0, 1, 2, 3, not_biologically_related, unknown. In clinical and translational research studies, capturing the degree of relation is important for studying and determining disease heritability within families [1]. Degree of relation is based on genetic similarity to the patient [23] e.g., 0th = identical twin, 1st = parent, siblings, offspring.

Ex. “[Sister]_S had breast cancer”, Degree_relation: 1.
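
One way to assign the degree_relation attribute is a simple lookup from the subject mention to one of the schema's values. The sketch below is illustrative only: the table `DEGREE_RELATION` and the function `degree_relation` are hypothetical names, the 0th- and 1st-degree entries follow the definition above, and the 2nd- and 3rd-degree entries are an assumption based on common genetics usage rather than on the annotation guidelines.

```python
# Lookup from a lower-cased subject mention to a degree_relation value.
DEGREE_RELATION = {
    # 0th and 1st degree, as defined in the schema description
    "identical twin": "0",
    "father": "1", "mother": "1", "parent": "1",
    "brother": "1", "sister": "1", "sibling": "1",
    "son": "1", "daughter": "1",
    # Assumption: 2nd and 3rd degree follow common genetics usage
    "grandfather": "2", "grandmother": "2", "aunt": "2", "uncle": "2",
    "niece": "2", "nephew": "2", "half-sibling": "2",
    "cousin": "3",
    # Subjects without a biological relation to the patient
    "wife": "not_biologically_related",
    "husband": "not_biologically_related",
}


def degree_relation(mention: str) -> str:
    """Return the degree_relation value for a SUBJECT_CLASS mention."""
    return DEGREE_RELATION.get(mention.strip().lower(), "unknown")
```

Mentions that are absent from the table, such as the implicit subject "family", fall back to `unknown`, which mirrors the schema's explicit unknown value.
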

DISEASE_DISORDER (DD): describes any span of text that can be mapped to a concept in the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) terminology that belongs to the Disorder semantic group. DISEASE_DISORDER has the attribute DocTimeRel: after, overlap, before_overlap, before, and unknown, which specifies the temporal relation between the entity and the time of document creation. The DocTimeRel attribute is critically important for encoding the relative time of a clinical event when more explicit and informative temporal expressions are not provided in the text. The entity has the attribute associatedCode, which specifies the Unified Medical Language System (UMLS) Concept Unique Identifier (CUI) that best describes the entity.

Ex. “Patient with [end-stage renal disease]_DD”, DocTimeRel: before_overlap, associatedCode: C2316810: chronic kidney disease stage 5.

PROCEDURE (P): describes any span of text that can be mapped to a concept in the SNOMED-CT terminology that belongs to the Procedure semantic group. Similar to DISEASE_DISORDER, PROCEDURE has the attribute DocTimeRel: after, overlap, before_overlap, before, and unknown, which specifies the temporal relation between the entity and the document creation time. PROCEDURE also has the attribute associatedCode, which specifies the UMLS CUI that best describes the entity.

Ex. “[appendectomy]_P scheduled for next week”, DocTimeRel: after, associatedCode: C0003611: appendectomy.

OTHER_EVENTS (OE): describes any social determinant of health (SDOH) or other events that could be clinically relevant. OTHER_EVENTS has the attribute type with values: marital status, death, good health, substance use, occupation, exposure, living situation, outcome of procedures, and other. This novel entity will help us understand how existing annotation standards can be expanded to include more clinically relevant events. OTHER_EVENTS also has the attribute DocTimeRel and the optional attribute associatedCode, which specifies the UMLS CUI for mentions of substance use only.

Ex. “Patient is a [factory worker]_OE”, DocTimeRel: before_overlap, type: occupation.

Annotation study

Each document was pre-annotated with certain NEs and attributes (TIMEX3, DISEASE_DISORDER, SUBJECT_CLASS) from the 2014 ShARe/CLEF eHealth Challenge. The aim of this annotation task is to expand these existing annotations according to our more detailed annotation schema. Specifically, annotators were instructed to do the following:

1. Annotate new NEs (AGE, PROCEDURE, and OTHER_EVENTS)
2. Identify new spans of text under the expanded definition of certain NEs (e.g. annotating “wife” as a relevant SUBJECT_CLASS)
3. Add new attributes for existing NEs (degree_relation for SUBJECT_CLASS)
4. Link NEs with relationships when multiple NEs are required to fully capture the description and semantic meaning of a clinical event, e.g. what was the type of clinical event, who experienced it, when it was experienced.

In the sentence, “[Father]_S [died]_OE of [MI]_DD at [age 69]_A.”, the annotated relationships include:

OTHER_EVENTS (OE)-to-SUBJECT_CLASS (S)
DISEASE_DISORDER (DD)-to-SUBJECT_CLASS (S)
OTHER_EVENTS (OE)-to-AGE (A)
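
The annotated sentence and its relationships can be represented with a small data structure. The sketch below is a hypothetical in-memory representation, not the eHOST storage format; the names `Entity` and `span` are assumptions introduced for illustration, and only attribute values stated in the schema description (degree_relation 1 for a parent, type death) are filled in.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

NE_LABELS = {
    "TIMEX3", "AGE", "SUBJECT_CLASS",
    "DISEASE_DISORDER", "PROCEDURE", "OTHER_EVENTS",
}


@dataclass
class Entity:
    start: int                      # character offset where the span begins
    end: int                        # character offset just past the span
    label: str                      # one of NE_LABELS
    attributes: Dict[str, str] = field(default_factory=dict)


def span(text: str, mention: str, label: str, **attributes: str) -> Entity:
    """Create an Entity for the first occurrence of `mention` in `text`."""
    if label not in NE_LABELS:
        raise ValueError(f"unknown NE label: {label}")
    start = text.index(mention)
    return Entity(start, start + len(mention), label, dict(attributes))


sentence = "Father died of MI at age 69."
father = span(sentence, "Father", "SUBJECT_CLASS", degree_relation="1")
died = span(sentence, "died", "OTHER_EVENTS", type="death")
mi = span(sentence, "MI", "DISEASE_DISORDER")
age = span(sentence, "age 69", "AGE")

# Each relationship links a source entity to a target entity.
relations: List[Tuple[Entity, Entity]] = [
    (died, father),   # OTHER_EVENTS-to-SUBJECT_CLASS
    (mi, father),     # DISEASE_DISORDER-to-SUBJECT_CLASS
    (died, age),      # OTHER_EVENTS-to-AGE
]
```

Keeping relationships as pairs of entity objects means the question of who experienced which event at what age can be answered by following links from the event entity.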

Annotations were carried out by DM and AD from the Semantic Analysis of Text to Inform Clinical Action (SemAnTICA) laboratory of the University of Pennsylvania using the extensible Human Oracle Suite of Tools (eHOST) annotation tool [24]. Over the course of one week, JH trained both annotators on the annotation schema and reviewed how to apply the schema to clinical notes leveraging the annotation software. To reduce the likelihood of annotator fatigue due to the schema’s complexity, we assigned the majority attribute value for the previously annotated conditions as default values. Annotators were instructed to change default values to semantically represent the mention in the text. Annotators were trained with batches of 5 documents each, and annotator performance, i.e., inter-annotator agreement (IAA), was measured using the F1-score, the harmonic mean of recall and precision. The F1-score was calculated between each annotator and the reference standard using the eHOST built-in IAA report generator. Annotation performance was assessed using three levels of agreement determination: (1) NE, (2) NE + attributes, (3) NE + attributes + relationships.

Then, the annotators were given two weeks to independently annotate mutually exclusive note sets (n = 59 discharge summaries each), to produce an annotated corpus of 138 documents in total (inclusive of the development set of 20 discharge summaries).

To assess the utility of novel elements in our annotation schema, we report the distribution of annotated classes and attributes. We also report the distribution of other events including marital status, death, good health, substance use, occupation, exposure, living situation, outcome of procedures, and other mentions observed in our corpus.

The resulting annotations were leveraged to develop and evaluate an NLP pipeline for the automated extraction of these semantic and temporal classes as a traditional named entity recognition (NER) task.

Automated named entity recognition

The annotated corpus was randomly split into development, training, test, and future holdout sets in a 15:45:20:20 ratio, respectively. The development set facilitated annotator training and schema development. The training set was used for manual rule engineering and NLP system development, with validation on the test set. The holdout set was set aside to validate deep learning components in future work. This workflow is illustrated in Fig. 2.
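
A ratio-based random split of this kind can be sketched as below. This is an illustration under assumptions: the function name `split_corpus`, the fixed seed, and the rounding rule are not taken from the study, and in the study the development set served schema development and annotator training, so it may not have been drawn in this way.

```python
import random
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

DEFAULT_RATIOS = (
    ("development", 15), ("training", 45), ("test", 20), ("holdout", 20),
)


def split_corpus(
    doc_ids: Iterable[Hashable],
    ratios: Sequence[Tuple[str, int]] = DEFAULT_RATIOS,
    seed: int = 0,
) -> Dict[str, List[Hashable]]:
    """Shuffle document ids and cut them into consecutive, disjoint subsets."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)   # seeded so the split is reproducible
    total = sum(weight for _, weight in ratios)
    splits: Dict[str, List[Hashable]] = {}
    start = 0
    cumulative = 0
    for name, weight in ratios:
        cumulative += weight
        # Cut at the rounded cumulative fraction so every document is used once.
        end = round(len(ids) * cumulative / total)
        splits[name] = ids[start:end]
        start = end
    return splits
```

Because the cut points are rounded cumulative fractions, the subsets always cover the whole corpus without overlap, although individual subset sizes can differ by one document from the nominal percentages.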

Leveraging the training set, we developed two NLP modules to extract NEs from the clinical texts. The first module extracted the AGE, SUBJECT_CLASS and TIMEX3 entities using rule-based matching; the second module extracted the DISEASE_DISORDER and PROCEDURE entities using QuickUMLS [25]. Both modules were built and integrated using spaCy V2.1, an open-source software library for advanced NLP. Integration within a spaCy pipeline supports future integration with deep learning packages and fast information extraction leveraging a Cython compiler.

In the first module, we extracted non-medical NEs: AGE, SUBJECT_CLASS, and TIMEX3 using a rule-based system. This system relies on a number of features, namely the section in which the entity is found and the pattern of its text mention. More specifically, section labels were identified using a set of keywords (e.g. “past medical history”, “family history”). We trained spaCy’s EntityRuler to extract new NEs based on pattern dictionaries and regular expressions. For example, AGE can be identified with patterns combined with regular expressions: “##-year-old, ## yo”, where # is a digit. The trained EntityRuler was added to the spaCy pipeline using nlp.add_pipe.
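
The two features described above, section keywords and mention patterns, can be illustrated with the standard library alone. The sketch below deliberately does not use spaCy or the EntityRuler and is not the study's rule set: the names `SECTION_KEYWORDS`, `AGE_PATTERN`, `find_section`, and `extract_ages` are hypothetical, and the "age ##" alternative is an added assumption beyond the two patterns quoted in the text.

```python
import re
from typing import List, Optional, Tuple

SECTION_KEYWORDS = (
    "past medical history", "past surgical history",
    "family history", "social history",
)

AGE_PATTERN = re.compile(
    r"\b\d{1,3}[- ]year[- ]old\b"   # e.g. "54-year-old"
    r"|\b\d{1,3}\s?yo\b"            # e.g. "50 yo"
    r"|\bage\s+\d{1,3}\b",          # e.g. "age 69" (assumed extra pattern)
    re.IGNORECASE,
)


def find_section(line: str) -> Optional[str]:
    """Return the section keyword that a line starts with, if any."""
    lowered = line.strip().lower()
    for keyword in SECTION_KEYWORDS:
        if lowered.startswith(keyword):
            return keyword
    return None


def extract_ages(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, mention) for every AGE-like span in the text."""
    return [(m.start(), m.end(), m.group()) for m in AGE_PATTERN.finditer(text)]
```

In a spaCy pipeline the same patterns would be expressed as EntityRuler pattern dictionaries; the regular-expression form is used here only to keep the example self-contained.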

In the second module, we extracted medical NEs: DISEASE_DISORDER and PROCEDURE using QuickUMLS against the UMLS Metathesaurus [26].^{Footnote 4} QuickUMLS is a fast, unsupervised, approximate dictionary-matching algorithm for medical concept extraction [25]. Compared to other state-of-the-art entity extraction tools including MetaMap and cTAKES, QuickUMLS achieves similar precision and recall, but is 135 times faster and thus scalable to large datasets. The QuickUMLS module receives notes as input and returns a set of spans in the notes as well as UMLS concepts associated with each span. We integrated the output from this module as a post-processing step after applying spaCy’s EntityRuler. The training set was used to tune QuickUMLS to extract any concepts from the UMLS semantic types within Table 1. Notably, the inclusion or exclusion of semantic types resulted in trade-offs between precision and recall (to be discussed further in the “Discussion” section).
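
To give an intuition for approximate dictionary matching in general, the toy sketch below scores a mention against dictionary terms by Jaccard similarity over character trigrams and keeps the best term above a threshold. It does not call the QuickUMLS API and does not reproduce its algorithm; the names `TOY_DICTIONARY`, `trigrams`, `jaccard`, and `best_match` and the threshold of 0.7 are assumptions, and the two dictionary entries reuse the CUIs from the schema examples.

```python
from typing import Dict, Optional, Set, Tuple

# Two-entry stand-in for a concept dictionary (term -> CUI).
TOY_DICTIONARY: Dict[str, str] = {
    "appendectomy": "C0003611",
    "chronic kidney disease stage 5": "C2316810",
}


def trigrams(text: str) -> Set[str]:
    """Return the set of lower-cased character trigrams in the text."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(
    mention: str,
    dictionary: Dict[str, str] = TOY_DICTIONARY,
    threshold: float = 0.7,
) -> Optional[Tuple[str, str, float]]:
    """Return (term, CUI, score) for the most similar term, or None."""
    grams = trigrams(mention)
    best: Optional[Tuple[str, str, float]] = None
    for term, cui in dictionary.items():
        score = jaccard(grams, trigrams(term))
        if score >= threshold and (best is None or score > best[2]):
            best = (term, cui, score)
    return best
```

A misspelled mention such as "apendectomy" still clears the threshold for "appendectomy", while an unrelated phrase returns `None`; raising or lowering the threshold shifts the balance between precision and recall.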

Table 1 Included UMLS semantic types for entity extraction

For this study, attribute normalization and relation detection were outside of scope and will be left to future work. We integrated the EntityRuler and QuickUMLS modules as the final components of the spaCy NLP pipeline, which consists of section tagger, sentence segmentor, tokenizer, named entity recognizer, and assertion detector modules (Fig. 3).

Evaluation

We evaluated the performance of the automated NER pipeline on the validation set of 30 notes, using a customized Python (v3.7.0) script. Specifically, we defined matches between the NER pipeline extractions and the reference standard annotations using overlapping spans. For example, a true positive was defined as overlapping annotations assigned to the same NE type. We counted the number of true positives (TP: system’s span occurs in the annotated corpus), false positives (FP: system’s span does not occur in the annotated corpus), and false negatives (FN: system did not identify a span in the annotated corpus). We computed recall to determine the proportion of the annotated corpus spans that the system identified, and precision to determine the proportion of correctly-identified spans by the system. We also measured the F1-score (i.e., the harmonic mean of precision and recall) to quantify overall agreement for the NER tasks [27].
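
The overlap-based scoring described above can be sketched as follows. This is not the study's evaluation script: the names `overlaps` and `evaluate` and the tuple layout of a span are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]   # (start offset, end offset exclusive, NE label)


def overlaps(a: Span, b: Span) -> bool:
    """True if two spans share at least one character and the same label."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def evaluate(system: List[Span], reference: List[Span]) -> Dict[str, float]:
    """Compute overlap-based precision, recall, and F1 for one document."""
    # TP: system spans that overlap a reference span of the same NE type.
    tp = sum(1 for s in system if any(overlaps(s, r) for r in reference))
    # FP: system spans with no overlapping reference span.
    fp = len(system) - tp
    # FN: reference spans that no system span overlaps.
    fn = sum(1 for r in reference if not any(overlaps(s, r) for s in system))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Overlap matching credits partially correct boundaries, so it is more lenient than exact-span matching; note that under this rule several system spans may be counted as true positives against a single reference span.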

Lung cancer demonstration study

As part of an ongoing translational research study, we aim to determine associations between clinical histories (past medical, family, and social histories) and genetic biomarkers with lung cancer outcomes, e.g., progression and mortality. To demonstrate that the schema can represent and capture pertinent historical information for determining age and dates of past and current cancer diagnoses, pertinent lung cancer diagnostic procedures and therapies, familial cancers, and social histories/exposures, authors SY, a clinical oncologist, and DM applied the schema with consensus review to 49 history and physical (H&P) notes from the University of Pennsylvania Health System for patients with confirmed stage IIIB+ non-small cell lung cancer. We report initial descriptive statistics for historical events with age/time specifications and degree of relation for family histories when applicable.

Results

Annotator training

We report inter-annotator agreement with the initial reference standard (n = 20 documents) according to the three levels of match determination for NEs, their attributes, and relationships between them in Fig. 4.

Annotator agreement generally decreases as match criteria become stricter. Notably, agreement for the AGE and TIMEX3 classes remains unchanged even after attributes and relationships are added. This indicates that the annotation schema is well-designed for these classes and/or that these classes are easier to annotate. For other classes, a noticeable drop in agreement occurs when attributes are included.

For the annotation of NEs only, annotators achieved nearly 100% agreement with the reference standard across all classes. When the annotation of attributes and relationships is included, annotators achieved high agreement (>91%) for AGE, TIMEX3, and SUBJECT_CLASS, but agreement was poorer (<70%) for DISEASE_DISORDER, PROCEDURE and OTHER_EVENTS. For DISEASE_DISORDER and PROCEDURE, this is largely attributable to differing allocation of UMLS CUIs. For OTHER_EVENTS, which captures relevant SDOH, more training is required to achieve a consistent allocation of attribute values.

We compared our performance to the widely used ISO-TimeML standard. When applied to the THYME corpus, the standard achieved IAA of 96%^{Footnote 5} for NE annotation, and 80% and 80% for attribute annotation for events^{Footnote 6} and TIMEX3, respectively [16]. Our schema achieves similar or better annotator agreement for NE annotation, but lags in agreement for attribute annotation. In future work, we aim to expand annotator training and further refine the annotation schema to promote consistency between annotators. For the purpose of developing an automated NER extraction system, annotators need to annotate NEs consistently. Based on the IAA achieved at the end of the training period for annotation of NEs, annotators were able to proceed with single annotation of the full dataset.