
Table 2 A subset of the 35 rules used by the system to detect relevant allergy information

From: Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records

Rule

Description

Comments

1. Document filtering

Documents that do not contain concept-related words are filtered out [7, 32]

E.g., for the clinical concept of “allergy”, documents must contain concept-related words associated with terms such as “allergy”, “allergen”, “allergic reaction” or “symptom”
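Rule 1 can be sketched as a simple keyword filter. This is a minimal Python sketch, assuming a hypothetical term list (`ALLERGY_TERMS`) and case-insensitive substring matching; the real term list is not given in the table, and the function names are illustrative only.

```python
# Hypothetical term list for the "allergy" concept (assumption, not the system's list).
ALLERGY_TERMS = ("allergy", "allergen", "allergic reaction", "symptom")


def contains_concept(text: str, terms=ALLERGY_TERMS) -> bool:
    """Return True if any concept-related term occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def filter_documents(documents: list[str], terms=ALLERGY_TERMS) -> list[str]:
    """Rule 1 (sketch): keep only documents that mention at least one concept-related term."""
    return [doc for doc in documents if contains_concept(doc, terms)]


docs = [
    "Known allergy to penicillin.",
    "Routine follow-up, no complaints.",
    "Allergic reaction after contrast agent.",
]
print(filter_documents(docs))
```

Plain substring matching also fires inside longer words (e.g. “symptom” inside “asymptomatic”), so a real filter would likely need token boundaries or explicit compound-word handling.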

2. Paragraph boundary

Concept-related words must be located within the same paragraph [81, 82]

E.g., if an “allergy”, “reaction” or “symptom” is identified in another sentence, conformity with rules 2 and 3 is checked again
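A minimal sketch of the paragraph-boundary check, assuming (hypothetically) that paragraphs are separated by blank lines and that words are matched as whole lowercase tokens; `words_in_same_paragraph` is an illustrative name, not part of the described system.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split text into paragraphs, assuming paragraphs are separated by blank lines."""
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]


def words_in_same_paragraph(text: str, word_a: str, word_b: str) -> bool:
    """Rule 2 (sketch): True if both concept-related words occur in one paragraph."""
    for paragraph in split_paragraphs(text):
        tokens = set(re.findall(r"\w+", paragraph.lower()))
        if word_a.lower() in tokens and word_b.lower() in tokens:
            return True
    return False


note = "Known allergy to nuts.\nSevere reaction in 2019.\n\nPlan: follow-up."
print(words_in_same_paragraph(note, "allergy", "reaction"))  # True
print(words_in_same_paragraph(note, "allergy", "follow"))    # False
```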

3. Window of context

Allergy concept-related words must be located in the same sentence as other identified allergy concept-related words or, if located in adjacent sentences, within a distance of ± 6 words of them [9, 69]

The distance tolerance can easily be adjusted in the system. We experimented with different window sizes and, as also reported by Afzal et al. [5], found a distance of six to ten words to be optimal
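The window-of-context rule can be sketched as follows, assuming sentences end in `.`, `!` or `?` and that the distance is counted in word tokens across the sentence boundary. `WINDOW = 6` mirrors the ± 6 words stated in the rule; the tokenization and function names are assumptions.

```python
import re

WINDOW = 6  # +/- word distance from the rule; adjustable


def tokenize(text: str) -> list[tuple[int, str]]:
    """Return (sentence_index, lowercased_word) pairs; sentences split after . ! or ?"""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [(i, w.lower()) for i, s in enumerate(sentences) for w in re.findall(r"\w+", s)]


def in_context(text: str, word_a: str, word_b: str, window: int = WINDOW) -> bool:
    """Rule 3 (sketch): same sentence, or adjacent sentences within `window` words."""
    tokens = tokenize(text)
    a, b = word_a.lower(), word_b.lower()
    pos_a = [(k, s) for k, (s, w) in enumerate(tokens) if w == a]
    pos_b = [(k, s) for k, (s, w) in enumerate(tokens) if w == b]
    for ka, sa in pos_a:
        for kb, sb in pos_b:
            if sa == sb:
                return True  # same sentence: always in context
            if abs(sa - sb) == 1 and abs(ka - kb) <= window:
                return True  # adjacent sentences, close enough
    return False


print(in_context("Patient has an allergy. Reaction was mild.", "allergy", "reaction"))  # True
```

Passing `window=10` corresponds to the upper end of the six-to-ten-word range reported as optimal.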

4. Dependency

Concept-related words can be of three types [32]:

1) Exist alone

2) Primary (exist when supported by 1 or 3)

3) Secondary (depend on 2 for existence)

E.g., for the clinical concept of “allergy”, words of type 1 (strong indicators such as “anaphylaxis”) may “exist alone” in a sentence, whereas the other types must conform to rules 2 and 3
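The three dependency types can be sketched as a check over the concept-related words found in one context that already satisfies rules 2 and 3. The type assignments in `WORD_TYPES` are hypothetical; only “anaphylaxis” is named in the table, as a type 1 word.

```python
EXIST_ALONE, PRIMARY, SECONDARY = 1, 2, 3

# Hypothetical type assignments; only "anaphylaxis" (type 1) is named in the table.
WORD_TYPES = {
    "anaphylaxis": EXIST_ALONE,
    "allergy": PRIMARY,
    "reaction": SECONDARY,
    "rash": SECONDARY,
}


def accepted_words(found: list[str]) -> set[str]:
    """Rule 4 (sketch): keep the concept words whose dependency condition is met.

    `found` holds the concept-related words detected in one context that
    already satisfies rules 2 and 3.
    """
    types = {w: WORD_TYPES[w] for w in found if w in WORD_TYPES}
    present = set(types.values())
    keep = set()
    for word, kind in types.items():
        if kind == EXIST_ALONE:
            keep.add(word)  # type 1: may exist alone
        elif kind == PRIMARY and present & {EXIST_ALONE, SECONDARY}:
            keep.add(word)  # type 2: needs support from type 1 or 3
        elif kind == SECONDARY and PRIMARY in present:
            keep.add(word)  # type 3: depends on a type 2 word
    return keep
```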

5. Part of compound words

If concept-related words occur as part of rarely used compound words, these are also highlighted [69, 83, 84]

E.g., “Cave information”
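A minimal sketch of compound-word matching, assuming concept words can be found as substrings of longer tokens and that a hypothetical list of frequent words (`COMMON_WORDS`) stands in for the rule's “seldom-used” criterion. The single-token spellings in the test data, such as “caveinformation”, are illustrative and assume the “Cave information” example appears in the records as one compound token.

```python
import re

# Hypothetical concept words; "cave" echoes the table's "Cave information" example.
CONCEPT_WORDS = ("allergy", "cave")
# Hypothetical frequent words that merely contain a concept word and should be ignored.
COMMON_WORDS = {"cavern", "caveat"}


def compound_hits(text: str) -> list[str]:
    """Rule 5 (sketch): words that contain a concept word as part of a longer compound."""
    hits = []
    for word in re.findall(r"\w+", text.lower()):
        if word in COMMON_WORDS:
            continue  # frequent word, not treated as a rare compound
        if any(c in word and word != c for c in CONCEPT_WORDS):
            hits.append(word)
    return hits
```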

6. Header detection

Specific rules apply to relevant text detected below a header, up to the start of the next detected paragraph [9, 69]

E.g., below an “Allergies” header, all allergy concept-related words are highlighted (with certain limitations)
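Header detection can be sketched by collecting the lines that follow a recognised header until the next blank line. The header vocabulary, the optional colon after a header and the use of a blank line as the paragraph boundary are all assumptions; the “certain limitations” mentioned in the table are not modelled.

```python
# Hypothetical header vocabulary; the real list is not given in the table.
ALLERGY_HEADERS = {"allergies", "allergy"}


def header_sections(text: str, headers=ALLERGY_HEADERS) -> list[str]:
    """Rule 6 (sketch): collect text below a recognised header up to the next blank line.

    Assumes a header sits alone on its line, optionally followed by a colon,
    and that a blank line marks the start of the next paragraph.
    """
    sections, current = [], None
    for line in text.splitlines():
        stripped = line.strip()
        if current is None:
            if stripped.rstrip(":").lower() in headers:
                current = []  # start collecting below the header
        elif stripped == "":
            sections.append("\n".join(current))
            current = None  # paragraph boundary ends the section
        else:
            current.append(stripped)
    if current is not None:
        sections.append("\n".join(current))
    return sections
```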

7. Highlight color

The degree of a word's concept-relatedness determines its highlight color, from low to high: yellow, orange or red [81]

E.g., allergy concept-related words are highlighted in the text
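The color mapping reduces to thresholding a relatedness score. In this sketch the score range [0, 1] and the cut-offs at 1/3 and 2/3 are hypothetical; the table only fixes the order yellow, orange, red.

```python
def highlight_color(relatedness: float) -> str:
    """Rule 7 (sketch): map a concept-relatedness score in [0, 1] to a highlight color.

    The thresholds 1/3 and 2/3 are hypothetical; only the ordering
    yellow < orange < red comes from the table.
    """
    if not 0.0 <= relatedness <= 1.0:
        raise ValueError("relatedness must be in [0, 1]")
    if relatedness < 1 / 3:
        return "yellow"
    if relatedness < 2 / 3:
        return "orange"
    return "red"
```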

8. Disambiguation

Frequently repeated words that are used in a non-conceptual sense are filtered out (“word sense disambiguation”) [9, 18]

E.g., “the patient reacts to light” in eye examination reports
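One simple way to implement this filter is to blank out known non-conceptual phrases before concept matching, keyed by document type. The pattern table and the document-type names below are hypothetical; only the “reacts to light” example comes from the table.

```python
import re

# Hypothetical non-conceptual usages, keyed by hypothetical document types.
NON_CONCEPT_PATTERNS = {
    "eye_examination": [re.compile(r"\breacts? to light\b", re.IGNORECASE)],
}
REACTION = re.compile(r"\b(reacts?|reaction)\b", re.IGNORECASE)


def strip_non_conceptual(text: str, doc_type: str) -> str:
    """Rule 8 (sketch): blank out phrases where a concept word has a non-conceptual sense."""
    for pattern in NON_CONCEPT_PATTERNS.get(doc_type, []):
        text = pattern.sub(" ", text)
    return text


def mentions_reaction(text: str, doc_type: str) -> bool:
    """Look for 'react(s)' or 'reaction' only after non-conceptual uses are removed."""
    return REACTION.search(strip_non_conceptual(text, doc_type)) is not None
```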

9. Negation

Positive and negative contexts are distinguished by checking the text for negations [6, 33]

E.g., “reacts to penicillin” versus “does not react to penicillin”
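The table does not say how negations are detected. The sketch below swaps in a simplified NegEx-style look-back: a concept is treated as negated if a cue word appears within a few words before it. The cue list and the four-word window are assumptions.

```python
import re

# Hypothetical negation cues and look-back window (in words).
NEGATION_CUES = {"not", "no", "denies", "without", "never"}
LOOKBACK = 4


def is_negated(sentence: str, concept: str, lookback: int = LOOKBACK) -> bool:
    """Rule 9 (sketch): True if a negation cue occurs up to `lookback` words before the concept."""
    words = re.findall(r"\w+", sentence.lower())
    concept = concept.lower()
    for i, word in enumerate(words):
        if word == concept and NEGATION_CUES & set(words[max(0, i - lookback):i]):
            return True
    return False
```

Contractions such as “doesn't” are split by `\w+` into “doesn” and “t”, so they would need their own cues.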

10. Permutations

Use of the word-permutations dictionary can be enabled or disabled [13, 69]

An algorithm detects and stores the most frequent misspellings not already covered by the clinical knowledge base
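The misspelling detector is not specified further. As an assumption, the sketch below counts corpus words that are absent from a hypothetical knowledge-base vocabulary but close to one of its terms under `difflib.get_close_matches`, and keeps the most frequent ones as candidate permutations.

```python
import re
from collections import Counter
from difflib import get_close_matches

# Hypothetical knowledge-base vocabulary.
KNOWLEDGE_BASE = {"penicillin", "allergy", "anaphylaxis"}


def frequent_misspellings(documents: list[str], top_n: int = 5, cutoff: float = 0.8) -> dict[str, str]:
    """Rule 10 (sketch): map the most frequent near-misses of known terms to those terms.

    A word counts as a candidate misspelling if it is not in the knowledge base
    but difflib finds a known term with similarity >= `cutoff`.
    """
    vocabulary = sorted(KNOWLEDGE_BASE)
    counts = Counter()
    closest = {}
    for doc in documents:
        for word in re.findall(r"\w+", doc.lower()):
            if word in KNOWLEDGE_BASE:
                continue  # already covered by the knowledge base
            match = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
            if match:
                counts[word] += 1
                closest[word] = match[0]
    return {word: closest[word] for word, _ in counts.most_common(top_n)}
```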

11. Omitted documents

EHR documents of certain types or with certain headers or contents may be left out [28, 83]

For example, documents containing specific sensitive information may be excluded from the results for some users
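Omitting documents amounts to a filter applied before results are shown. The sketch below assumes a hypothetical `Document` record and hypothetical exclusion lists for types, headers and content terms, and returns the full list only for users cleared to see sensitive material; the table does not specify how such clearance is decided.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_type: str
    headers: tuple[str, ...]
    text: str


# Hypothetical exclusion criteria; the table does not list the real ones.
EXCLUDED_TYPES = {"psychiatric_note"}
EXCLUDED_HEADERS = {"confidential"}
EXCLUDED_TERMS = {"restricted"}


def is_omitted(doc: Document) -> bool:
    """Rule 11 (sketch): True if the document's type, a header or its content is excluded."""
    return (
        doc.doc_type in EXCLUDED_TYPES
        or any(h.lower() in EXCLUDED_HEADERS for h in doc.headers)
        or any(term in doc.text.lower() for term in EXCLUDED_TERMS)
    )


def results_for(docs: list[Document], sees_sensitive: bool) -> list[Document]:
    """Return every document for cleared users, otherwise only the non-omitted ones."""
    return list(docs) if sees_sensitive else [d for d in docs if not is_omitted(d)]
```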

12. Concept search access control

Searching for specific clinical concepts can be assigned to, or restricted to, a group of users

User groups can be defined flexibly

E.g., a clinical department or individual users
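Concept-level access control can be sketched as a mapping from each clinical concept to the user groups allowed to search it, where a group lists departments and/or individual users. The `CONCEPT_ACCESS` policy, the identifiers and the choice to leave unlisted concepts unrestricted are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class UserGroup:
    """A flexibly defined group: whole departments and/or named individual users."""
    departments: set[str] = field(default_factory=set)
    users: set[str] = field(default_factory=set)

    def contains(self, user_id: str, department: str) -> bool:
        return user_id in self.users or department in self.departments


# Hypothetical policy: which groups may search which clinical concept.
CONCEPT_ACCESS = {
    "allergy": [UserGroup(departments={"internal_medicine"})],
    "sensitive_concept": [UserGroup(users={"user_042"})],
}


def may_search(concept: str, user_id: str, department: str) -> bool:
    """Rule 12 (sketch): a user may search a concept if any assigned group includes them.

    Concepts without an entry are treated as unrestricted (an assumption).
    """
    groups = CONCEPT_ACCESS.get(concept)
    if groups is None:
        return True
    return any(g.contains(user_id, department) for g in groups)
```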