Complexities, variations, and errors of numbering within clinical notes: the potential impact on information extraction and cohort-identification

Table 18 Results from a Cohort Identification Experiment^a

(a)	(b)	(c)	(d)	(e)	(f)	(g)
Phrase 1 (containing the Arabic numerical variant)	Number of patients with Phrase 1 only	% of patients missed if searching only for Phrase 1	Number of patients with both Phrase 1 and Phrase 2	Number of patients with Phrase 2 only	% of patients missed if searching only for Phrase 2	Phrase 2 (containing the Roman numerical variant)
citrullinemia type 1	2	25.0	1	1	50.0	citrullinemia type I
type 2 diabetes mellitus	43,777	10.5	7919	6053	75.8^b	type II diabetes mellitus
type 1 neurofibromatosis	181	24.5	56	77	57.6^b	type I neurofibromatosis
Tanner Stage 3	7639	57.8^b	1373	12,367	35.7	Tanner Stage III
grade 3 anaplastic astrocytoma	42	36.7	27	40	38.5	grade III anaplastic astrocytoma
stage 3 chronic kidney disease	615	67.4^b	446	2190	18.9	stage III chronic kidney disease
factor 9 deficiency	14	68.1^b	51	139	6.9	factor IX deficiency
class 3 malocclusion	135	81.2^b	115	1079	10.2	class III malocclusion
phase 1 clinical trial	320	66.5^b	263	1158	18.4	phase I clinical trial
Mallampati score: 4	121	27.8	1	47	71.6^b	Mallampati score: IV

^a Results from a cohort identification exercise for 10 diagnoses and clinical findings in the clinical notes, including counts of the number of patients identified by searching for phrases containing the Arabic numeral variant only, the Roman numeral variant only, or both. The percentage of patients potentially missed by searching for only one of the variants is displayed.
^b Cells with percentages > 50%
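
The percentages in columns (c) and (f) appear to follow from the three patient counts in columns (b), (d), and (e): the patients missed by a single-variant search are those whose notes contain only the other variant, divided by all patients found with either variant. The following is a minimal sketch of that arithmetic, not the authors' code; the function name `percent_missed` and its parameter names are hypothetical, and the formula is an assumption inferred from the table values, which it reproduces.

```python
def percent_missed(p1_only: int, both: int, p2_only: int) -> tuple[float, float]:
    """Return (% missed searching only Phrase 1, % missed searching only Phrase 2).

    Assumed formula (inferred from the table, not stated in the source):
    the denominator is every patient found with either variant.
    """
    total = p1_only + both + p2_only
    # Searching only for Phrase 1 misses patients who have Phrase 2 only.
    missed_if_phrase1 = 100 * p2_only / total
    # Searching only for Phrase 2 misses patients who have Phrase 1 only.
    missed_if_phrase2 = 100 * p1_only / total
    return round(missed_if_phrase1, 1), round(missed_if_phrase2, 1)


# Row "stage 3 chronic kidney disease": columns (b)=615, (d)=446, (e)=2190
print(percent_missed(615, 446, 2190))  # (67.4, 18.9)
```

Applied to the "citrullinemia type 1" row (2, 1, 1), the same function gives 25.0 and 50.0, matching columns (c) and (f).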

ISSN: 1472-6947