The freetext matching algorithm: a computer program to extract diagnoses and causes of death from unstructured text in electronic health records

Background: Electronic health records are invaluable for medical research, but much information is stored as free text rather than in a coded form. For example, in the UK General Practice Research Database (GPRD), causes of death and test results are sometimes recorded only in free text. Free text can be difficult to use for research if it requires time-consuming manual review. Our aim was to develop an automated method for extracting coded information from free text in electronic patient records.
Methods: We reviewed the electronic patient records in GPRD of a random sample of 3310 patients who died in 2001, to identify the cause of death. We developed a computer program called the Freetext Matching Algorithm (FMA) to map diagnoses in text to the Read Clinical Terminology. The program uses lookup tables of synonyms and phrase patterns to identify diagnoses, dates and selected test results. We tested it on two random samples of free text from GPRD (1000 texts associated with death in 2001, and 1000 general texts from cases and controls in a coronary artery disease study), comparing the output to the U.S. National Library of Medicine's MetaMap program and the gold standard of manual review.
Results: Among 3310 patients registered in the GPRD who died in 2001, the cause of death was recorded in coded form in 38.1% of patients, and in the free text alone in 19.4%. On the 1000 texts associated with death, FMA coded 683 of the 735 positive diagnoses, with precision (positive predictive value) 98.4% (95% confidence interval (CI) 97.2, 99.2) and recall (sensitivity) 92.9% (95% CI 90.8, 94.7). On the general sample, FMA detected 346 of the 447 positive diagnoses, with precision 91.5% (95% CI 88.3, 94.1) and recall 77.4% (95% CI 73.2, 81.2), which was similar to MetaMap.
Conclusions: We have developed an algorithm to extract coded information from free text in GP records with good precision. It may facilitate research using free text in electronic patient records, particularly for extracting the cause of death.

This document describes tests comparing the U.S. National Library of Medicine's MetaMap natural language processing system [1] with the Freetext Matching Algorithm (FMA). MetaMap incorporates Wendy Chapman's Negex algorithm for detection of negation [2] whereas FMA uses a custom algorithm. By default, MetaMap maps free text to terms from a range of medical source vocabularies supplied with the program ('USAbase'), but it can be restricted to a subset or configured to use a custom vocabulary.
In order to compare the output of the two algorithms, we configured MetaMap so that it would map diagnoses in free text to Read terms, similar to the output from FMA. We used two alternative source vocabularies: either the same set of GPRD Read and OXMIS terms as FMA, or the full Read Clinical Terms Version 3.
We tested the detection of diagnoses and other clinical concepts on a set of 1000 previously anonymised GPRD free text records from cases and controls in a study on coronary artery disease (500 from cases, 500 from controls). This was denoted the 'general' test set. The gold standard was manual review by a clinician (ADS). We also tested the detection of negation in the publicly available Negex test set of annotated anonymised clinical reports [3]. The clinician annotation of 'Affirmed' or 'Negated' was considered the gold standard. We did not test MetaMap on the GPRD 'Death' test set because MetaMap could not recognise death certificate categories in formats such as "1a (diagnosis) 1b (diagnosis) ...", which was common in GPRD and which FMA was specifically programmed to interpret. We wrote a script in R 2.14.1 [4] (see Appendix on page 12) to tabulate the results from MetaMap analyses in order to facilitate manual review.

MetaMap configuration and analysis options
We used the 2011 Linux version of MetaMap with the 2011AA lexicon. We used the Unified Medical Language System (UMLS) MetaMorphoSys program (2011 version) [5] to generate a UMLS subset containing only the Read Clinical Terms Version 3 (UMLS source code RCD99): 347,577 terms representing 186,073 unique clinical concepts. We suppressed terms of type AA (attribute type abbreviation), AB (abbreviation in any source vocabulary), IS (obsolete synthesized term), OA (obsolete abbreviation) and OP (obsolete preferred term), and specified an order of preference for the remaining term types to which MetaMap could map.
We tested every combination of the following MetaMap analysis options:
• --composite_phrases 3: causes MetaMap to construct longer, composite phrases from the simple phrases produced by the parser. This allows phrases such as "pain on the left side of the chest" to map to a single concept 'Left sided chest pain' rather than separate concepts, as it would without the option. The integer option is the maximum number of prepositional phrases that will be added to a noun; we used the value of 3 as advised in the MetaMap 2011 Release Notes [8]
• --ignore_word_order: allows MetaMap to ignore the order of words in the phrases it processes
• --word_sense_disambiguation: causes MetaMap to attempt to disambiguate among concepts scoring equally well in matching input text (e.g. whether the word "cold" refers to a low temperature or coryzal illness)
• --allow_concept_gaps: allows MetaMap to retrieve candidates with gaps (such as 'Unspecified childhood psychosis' for 'unspecified psychosis')
• --unique_acros_abbrs_only: allows the generation of acronym/abbreviation variants only if they have unique expansions
The 32 sets of output Read/OXMIS terms from these option combinations were merged into a single results table for comparison with output from FMA. Any discrepancies were checked manually by a clinician (ADS).
As with the FMA tests described in the main article, the aim was to assess recall and precision in the detection of positive diagnoses. Other Read terms matched (e.g. symptoms, referrals) were assessed for precision only.
Manual review was against the standard that a term without any other context information should represent a current or past diagnosis for that patient. A term was considered 'strictly correct' if it was the best available term for the concept. A term was considered 'correct' but not 'strictly correct' if it was correct but not the best available map (e.g. "breast cancer" mapped to the OXMIS term 'CANCER'). If the correct interpretation of a term was negative, or it did not apply to the current patient (e.g. the term referred to a family member or advice), it was considered to be a false positive.
The combination of options which produced best performance was used in testing MetaMap with the full Read dictionary.
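The 32 option sets correspond to every subset of the five binary flags (2^5 = 32). As a minimal sketch of this enumeration step, the following Python fragment (the function name and list structure are ours, not part of FMA or MetaMap) generates each combination as a command-line fragment:

```python
from itertools import product

# The five binary MetaMap options tested (2^5 = 32 combinations).
# Flag names follow the MetaMap command line; treating
# "--composite_phrases 3" as a single on/off choice is our assumption.
FLAGS = [
    "--composite_phrases 3",
    "--ignore_word_order",
    "--word_sense_disambiguation",
    "--allow_concept_gaps",
    "--unique_acros_abbrs_only",
]

def option_combinations(flags=FLAGS):
    """Yield every subset of flags as a command-line fragment."""
    for mask in product([False, True], repeat=len(flags)):
        yield " ".join(f for f, on in zip(flags, mask) if on)

combos = list(option_combinations())
print(len(combos))  # 32, including the empty (default) option set
```

Each fragment could then be appended to a MetaMap invocation; the empty string corresponds to running with default options.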

MetaMap using Full Read source vocabulary
As well as diagnoses, the full Read dictionary contains terms for temporality, laterality, body parts etc., which can match fragments of text in isolation but may not convey clinically useful information (for example, the Read term 'Disease' could match any mention of the word "disease" in the text). Therefore we restricted the output to Read terms with the following semantic types, together with other Read terms extracted from the same phrase (which might give additional contextual information):
• Acquired Abnormality
For example, if the text "chest pain" was analysed as a single phrase and mapped to the concepts 'Chest [body part]' and 'Pain [Sign or Symptom]', they were assessed as jointly conveying the meaning of the text.
We ignored any output Read terms which did not fit these criteria, and also ignored terms consisting of numbers / fractions or those which were ambiguous or did not convey useful information: 'Afraid', 'Awareness', 'Carries', 'Confidence', 'Drive', 'Feelings', 'Finding', 'Forgetful', 'Gifted', 'Happy', 'Hopelessness', 'Nothing', 'Opposition', 'Palpation', 'Planning', 'Recognition', 'Related', 'Runs', 'Sad'. For example, the term 'Forgetful' may be extracted from texts such as "he forgot to pick up the prescription", which does not necessarily imply that the patient is generally forgetful.
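The filtering rules above can be sketched as follows. The record structure and function name are illustrative assumptions, and the semantic-type whitelist is abbreviated (only 'Acquired Abnormality' is quoted here):

```python
# Sketch only: MetaMap's real output format differs; this models each
# mapped term as (term string, semantic type) plus a flag indicating
# whether another term from the same phrase was on the whitelist.
ALLOWED_SEMANTIC_TYPES = {"Acquired Abnormality"}  # abbreviated list

IGNORED_TERMS = {
    "Afraid", "Awareness", "Carries", "Confidence", "Drive", "Feelings",
    "Finding", "Forgetful", "Gifted", "Happy", "Hopelessness", "Nothing",
    "Opposition", "Palpation", "Planning", "Recognition", "Related",
    "Runs", "Sad",
}

def keep_term(term, semantic_type, phrase_has_allowed_term=False):
    """Keep a term if its semantic type is allowed, or if another term
    from the same phrase is allowed (it may add context), and it is not
    on the ignore list or purely a number / fraction."""
    if term in IGNORED_TERMS:
        return False
    if term.replace("/", "").replace(".", "").isdigit():
        return False  # numbers / fractions convey no diagnosis
    return semantic_type in ALLOWED_SEMANTIC_TYPES or phrase_has_allowed_term
```

For instance, 'Chest [body part]' alone would be dropped, but would be kept when extracted from the same phrase as an allowed term such as a sign or symptom.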
We used the MetaMap analysis options --composite_phrases 3 --word_sense_disambiguation --ignore_word_order --unique_acros_abbrs_only. This combination of options was chosen because it had the best performance when tested using the Read/OXMIS source vocabulary, apart from word sense disambiguation, which had no effect.
The results were reviewed manually by a clinician (ADS) and classified as 'correct' if there was no error and 'strictly correct' if the term or combination of terms represented the entire meaning of the diagnosis in the text (including context). The precision of detection of positive diagnoses, positive non-diagnosis terms and negated terms was assessed separately. Recall was assessed for positive diagnoses only, to be consistent with the standard of testing for FMA.

Detection of negation in the Negex test set
The Negex algorithm classifies words or phrases in the text as negated or affirmed. However, FMA and MetaMap output a list of mapped terms, and negation applies to terms in the output rather than words in the text. Some sentences did not contain any concepts that were mapped to Read or OXMIS terms, so the programs could not produce any information about negation for these sentences.
We tested the detection of negation in the publicly available Negex test set of 2376 anonymised sentences from 120 clinical reports of 6 types (emergency department, discharge summaries, surgical pathology, radiology, operative notes, echocardiograms) [3]. This dataset already contains a clinician annotation of 'Negated' or 'Affirmed' for each sentence. The negation status applies to a particular 'condition' (e.g. a diagnosis, symptom or examination finding) in the text, and the rest of the sentence might give clues as to its negation status. The algorithm (FMA or MetaMap) was initially applied to the 'condition' to map it to one or more Read or OXMIS terms. If a map was found, the whole sentence was interpreted and the negation states of terms which were also extracted from the 'condition' text were processed as follows:
• No match: manual review by a clinician, as the terms detected from the whole sentence may be different to those from the text of the condition in isolation
MetaMap using our custom Read/OXMIS source vocabulary was unable to detect negation, so MetaMap was tested using the 'Full Read' and 'USAbase' source vocabularies. The MetaMap analysis options were --composite_phrases 3 --word_sense_disambiguation --ignore_word_order --unique_acros_abbrs_only, as per the tests on GPRD free text. The FMA attributes 'negative', 'negpmh' (negative past medical history) and 'negfamily' (negative family history) were considered negative, and all other attributes or a blank attribute were considered positive.
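The attribute rule just described can be sketched in a few lines; the function name is ours, and the classification labels follow the Negex annotation scheme:

```python
# FMA attributes treated as negation, per the rule above; any other
# attribute (or a blank one) counts as an affirmed mention.
NEGATIVE_ATTRIBUTES = {"negative", "negpmh", "negfamily"}

def fma_negation_status(attribute):
    """Map an FMA term attribute to 'Negated' or 'Affirmed'."""
    return "Negated" if attribute in NEGATIVE_ATTRIBUTES else "Affirmed"
```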
The results for FMA and MetaMap were presented both separately (for all terms detected by each algorithm), and as a matched analysis restricted to terms mapped by both algorithms. For the matched analysis we tested the null hypothesis that, where the algorithms gave different results, the correct algorithm was as likely to be FMA as MetaMap (McNemar's test). Hypothesis tests and confidence intervals for proportions were calculated using the exact binomial distribution in R 2.14.1 [4].
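An exact McNemar's test of this kind depends only on the discordant pairs. The following sketch (function and variable names are ours; the study used R, not Python) computes the two-sided p-value from the exact binomial distribution under the null:

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts:
    b = pairs where only one algorithm (say FMA) was correct,
    c = pairs where only the other (MetaMap) was correct.
    Under H0, each discordant pair favours either algorithm with
    probability 0.5, so the smaller count follows Binomial(b+c, 0.5)."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence either way
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, if one algorithm were correct in all 5 discordant pairs, the p-value would be 2 * (1/32) = 0.0625, just short of conventional significance.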

MetaMap using custom Read/OXMIS source vocabulary
Allowing composite phrases and ignoring word order improved recall when MetaMap with the custom Read/OXMIS vocabulary was tested on the GPRD 'general' test set (Table 1). Allowing concept gaps resulted in worse precision. Allowing unique abbreviations and acronyms resulted in two fewer incorrect non-diagnosis terms when analysed with composite phrases and allowing concept gaps, but made no difference otherwise. The results were identical whether or not the word sense disambiguation option was used.
The set of options with the highest F-score was to allow composite phrases, ignore word order and not allow concept gaps. This combination achieved 64% recall and 69% precision on positive diagnoses (see Table 3 in the main article).
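Assuming the F-score used for ranking is the balanced F1 (the harmonic mean of precision and recall; the article does not state a weighting), it can be computed as:

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The best option set above achieved precision 0.69 and recall 0.64:
best = f_score(0.69, 0.64)  # roughly 0.66
```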

MetaMap using Full Read source vocabulary
Performance was better using the 'Full Read' source vocabulary, with precision 94% and recall 61% on detection of positive diagnoses, and precision 92% for positive non-diagnosis terms (see Table 3 in the main article). Negated terms were detected with precision 74%.

Detection of negation in the Negex test set
Out of the terms extracted by each algorithm, both FMA and MetaMap correctly detected the negation status of the term in about 95% of cases, but MetaMap extracted more terms than FMA. In one example, MetaMap suggested a more specific term than FMA for bone metastases, but Negex does not classify the 'Cord compression' concept as negated.

Read/OXMIS term                           Attribute    Readscore
Read J11y.00: Unspecified gastric ulcer                 92
Read 1A25.00: Urgency                     negative     100
OXMIS 7860C: DYSURIA                      negative     100

The word "GU" is incorrectly interpreted as an abbreviation for 'gastric ulcer', but the readscore is low because the synonym entry has a low 'priority', denoting ambiguity. Urinary frequency is not detected because 'urinary' is not stated explicitly in the text. FMA recognised 'denies' but not 'denied'; this can be corrected by amending the attrib2 table (patterns for context detection) to recognise the word 'denied' for negation. However, Negex correctly detects negation of the headache symptom.
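The kind of pattern change described can be sketched as a simple regular expression; the real attrib2 table format is not shown here, so this is purely illustrative of matching both word forms:

```python
import re

# Illustrative negation cue matching both 'denies' and 'denied';
# FMA's actual attrib2 pattern syntax is not reproduced here.
NEGATION_PATTERN = re.compile(r"\bdenie[sd]\b", re.IGNORECASE)

def mentions_denial(sentence):
    """Return True if the sentence contains 'denies' or 'denied'."""
    return bool(NEGATION_PATTERN.search(sentence))
```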