Table 1 Dataset statistics: number of reports, sentences and tokens, and of entity, modifier and phenotype (label) annotations, per dataset (ESS dev/test vs Tayside dev/test) for annotator 1

From: A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records

| | ESS Dev | ESS Test | Tayside Dev* | Tayside Test* |
|---|---|---|---|---|
| Reports | 364 | 266 | 362 | 700 |
| Sentences | 3837 | 2855 | 2791 | 3948 |
| Tokens | 32,229 | 22,842 | 50,522 | 48,519 |
| Total Entities | 4332 | 2924 | 2997 | 2986 |
| Disease Entities | 2373 | 1494 | 1361 | 1501 |
| Modifier Entities | 1959 | 1430 | 1636 | 1485 |
| Total Phenotypes (Labels) | 792 | 518 | 558 | 506 |
| Atrophy | 187 | 122 | 90 | 164 |
| Small vessel disease | 245 | 159 | 60 | 145 |
| Stroke, underspecified | 24 | 15 | 16 | < 5 |
| Haemorrhagic stroke, deep, old | 2 | 4 | < 5 | < 5 |
| Haemorrhagic stroke, deep, recent | 2 | 2 | < 5 | < 5 |
| Haemorrhagic stroke, lobar, old | 4 | 3 | 7 | < 5 |
| Haemorrhagic stroke, lobar, recent | 1 | 4 | < 5 | < 5 |
| Haemorrhagic stroke, underspecified | 7 | 10 | 94 | 15 |
| Ischaemic stroke, cortical, old | 112 | 61 | 27 | 26 |
| Ischaemic stroke, cortical, recent | 21 | 14 | 19 | 12 |
| Ischaemic stroke, deep, old | 140 | 85 | 60 | 41 |
| Ischaemic stroke, deep, recent | 7 | 4 | < 5 | < 5 |
| Ischaemic stroke, underspecified | 5 | 12 | 85 | 15 |
| Haemorrhagic transformation | 1 | 1 | 10 | < 5 |
| Subdural haematoma | 9 | 6 | 20 | 8 |
| Subarachnoid haemorrhage, aneurysmal | 1 | 0 | < 5 | < 5 |
| Subarachnoid haemorrhage, other | 6 | 6 | 21 | 7 |
| Microbleed, deep | 2 | 1 | < 5 | < 5 |
| Microbleed, lobar | 2 | 1 | < 5 | < 5 |
| Microbleed, underspecified | 0 | 1 | < 5 | < 5 |
| Tumour, glioma | 0 | 0 | < 5 | < 5 |
| Tumour, meningioma | 2 | 4 | < 5 | < 5 |
| Tumour, metastasis | 2 | 0 | 22 | 37 |
| Tumour, other | 10 | 3 | 21 | 12 |
*Small numbers suppressed in the NHS Tayside table due to data governance requirements.