
Table 18 Results from a Cohort Identification Experimentᵃ

From: Complexities, variations, and errors of numbering within clinical notes: the potential impact on information extraction and cohort-identification

| (a) Phrase 1 (containing the Arabic numeral variant) | (b) Number of patients with Phrase 1 only | (c) % of patients missed if searching only for Phrase 1 | (d) Number of patients with both Phrase 1 and Phrase 2 | (e) Number of patients with Phrase 2 only | (f) % of patients missed if searching only for Phrase 2 | (g) Phrase 2 (containing the Roman numeral variant) |
|---|---|---|---|---|---|---|
| citrullinemia type 1 | 2 | 25.0 | 1 | 1 | 50.0 | citrullinemia type I |
| type 2 diabetes mellitus | 43,777 | 10.5 | 7919 | 6053 | 75.8ᵇ | type II diabetes mellitus |
| type 1 neurofibromatosis | 181 | 24.5 | 56 | 77 | 57.6ᵇ | type I neurofibromatosis |
| Tanner Stage 3 | 7639 | 57.8ᵇ | 1373 | 12,367 | 35.7 | Tanner Stage III |
| grade 3 anaplastic astrocytoma | 42 | 36.7 | 27 | 40 | 38.5 | grade III anaplastic astrocytoma |
| stage 3 chronic kidney disease | 615 | 67.4ᵇ | 446 | 2190 | 18.9 | stage III chronic kidney disease |
| factor 9 deficiency | 14 | 68.1ᵇ | 51 | 139 | 6.9 | factor IX deficiency |
| class 3 malocclusion | 135 | 81.2ᵇ | 115 | 1079 | 10.2 | class III malocclusion |
| phase 1 clinical trial | 320 | 66.5ᵇ | 263 | 1158 | 18.4 | phase I clinical trial |
| Mallampati score: 4 | 121 | 27.8 | 1 | 47 | 71.6ᵇ | Mallampati score: IV |

ᵃ Results from a cohort identification exercise for 10 diagnoses and clinical findings in the clinical notes, including counts of the number of patients identified by searching for phrases containing the Arabic numeral variant, the Roman numeral variant, or both. The percentage of patients potentially missed by searching for only one of the variants is also displayed.
ᵇ Cells with percentages > 50%
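The percentage columns (c) and (f) appear to follow directly from the three count columns. Searching for Phrase 1 alone misses the Phrase-2-only patients, and searching for Phrase 2 alone misses the Phrase-1-only patients. In both cases the denominator is every patient with at least one of the two phrases. The source does not state this denominator explicitly; it is inferred from the table's own numbers. Under that assumption, the minimal Python sketch below recomputes both percentages from the transcribed counts and reports any rows that disagree with the published values. The names `PhrasePairCounts`, `TABLE_18`, and `check_table` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class PhrasePairCounts(NamedTuple):
    """Patient counts for one Arabic/Roman phrase pair (columns b, d, e)."""

    phrase1_only: int  # (b) patients with the Arabic-numeral phrase only
    both: int          # (d) patients with both phrases
    phrase2_only: int  # (e) patients with the Roman-numeral phrase only

    @property
    def total(self) -> int:
        # Assumed denominator: every patient found by searching for either phrase.
        return self.phrase1_only + self.both + self.phrase2_only

    def pct_missed_if_phrase1_only(self) -> float:
        # Column (c): searching for Phrase 1 alone misses the Phrase-2-only patients.
        return 100.0 * self.phrase2_only / self.total

    def pct_missed_if_phrase2_only(self) -> float:
        # Column (f): searching for Phrase 2 alone misses the Phrase-1-only patients.
        return 100.0 * self.phrase1_only / self.total


# Counts (b, d, e) and published percentages (c, f) transcribed from Table 18.
TABLE_18 = {
    "citrullinemia type 1": (PhrasePairCounts(2, 1, 1), 25.0, 50.0),
    "type 2 diabetes mellitus": (PhrasePairCounts(43_777, 7_919, 6_053), 10.5, 75.8),
    "type 1 neurofibromatosis": (PhrasePairCounts(181, 56, 77), 24.5, 57.6),
    "Tanner Stage 3": (PhrasePairCounts(7_639, 1_373, 12_367), 57.8, 35.7),
    "grade 3 anaplastic astrocytoma": (PhrasePairCounts(42, 27, 40), 36.7, 38.5),
    "stage 3 chronic kidney disease": (PhrasePairCounts(615, 446, 2_190), 67.4, 18.9),
    "factor 9 deficiency": (PhrasePairCounts(14, 51, 139), 68.1, 6.9),
    "class 3 malocclusion": (PhrasePairCounts(135, 115, 1_079), 81.2, 10.2),
    "phase 1 clinical trial": (PhrasePairCounts(320, 263, 1_158), 66.5, 18.4),
    "Mallampati score: 4": (PhrasePairCounts(121, 1, 47), 27.8, 71.6),
}


def check_table() -> list[str]:
    """Return the phrases whose recomputed percentages differ from the published ones."""
    mismatches = []
    for phrase, (counts, published_c, published_f) in TABLE_18.items():
        # Round to one decimal place, matching the precision used in the table.
        c = round(counts.pct_missed_if_phrase1_only(), 1)
        f = round(counts.pct_missed_if_phrase2_only(), 1)
        if (c, f) != (published_c, published_f):
            mismatches.append(phrase)
    return mismatches


if __name__ == "__main__":
    for phrase, (counts, _, _) in TABLE_18.items():
        c = counts.pct_missed_if_phrase1_only()
        f = counts.pct_missed_if_phrase2_only()
        # Footnote b marks cells strictly greater than 50%.
        flag = " (>50%)" if max(c, f) > 50 else ""
        print(f"{phrase:32s} total={counts.total:6d} c={c:5.1f} f={f:5.1f}{flag}")
    print("mismatches:", check_table())
```

With the counts as transcribed, each of the ten rows should reproduce its published percentages to one decimal place, which supports the inferred denominator. The citrullinemia row is not flagged at exactly 50.0%, consistent with the "> 50%" wording of footnote b.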