
Table 1 Dataset statistics: number of reports, sentences, tokens, entity annotations (disease and modifier) and phenotype (label) annotations per dataset (ESS dev/test vs Tayside dev/test) for annotator 1

From: A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records

| | ESS Dev | ESS Test | Tayside Dev* | Tayside Test* |
|---|---|---|---|---|
| Reports | 364 | 266 | 362 | 700 |
| Sentences | 3837 | 2855 | 2791 | 3948 |
| Tokens | 32,229 | 22,842 | 50,522 | 48,519 |
| Total Entities | 4332 | 2924 | 2997 | 2986 |
| Disease Entities | 2373 | 1494 | 1361 | 1501 |
| Modifier Entities | 1959 | 1430 | 1636 | 1485 |
| Total Phenotypes (Labels) | 792 | 518 | 558 | 506 |
| Atrophy | 187 | 122 | 90 | 164 |
| Small vessel disease | 245 | 159 | 60 | 145 |
| Stroke, underspecified | 24 | 15 | 16 | < 5 |
| Haemorrhagic stroke, deep, old | 2 | 4 | < 5 | < 5 |
| Haemorrhagic stroke, deep, recent | 2 | 2 | < 5 | < 5 |
| Haemorrhagic stroke, lobar, old | 4 | 3 | 7 | < 5 |
| Haemorrhagic stroke, lobar, recent | 1 | 4 | < 5 | < 5 |
| Haemorrhagic stroke, underspecified | 7 | 10 | 94 | 15 |
| Ischaemic stroke, cortical, old | 112 | 61 | 27 | 26 |
| Ischaemic stroke, cortical, recent | 21 | 14 | 19 | 12 |
| Ischaemic stroke, deep, old | 140 | 85 | 60 | 41 |
| Ischaemic stroke, deep, recent | 7 | 4 | < 5 | < 5 |
| Ischaemic stroke, underspecified | 5 | 12 | 85 | 15 |
| Haemorrhagic transformation | 1 | 1 | 10 | < 5 |
| Subdural haematoma | 9 | 6 | 20 | 8 |
| Subarachnoid haemorrhage, aneurysmal | 1 | 0 | < 5 | < 5 |
| Subarachnoid haemorrhage, other | 6 | 6 | 21 | 7 |
| Microbleed, deep | 2 | 1 | < 5 | < 5 |
| Microbleed, lobar | 2 | 1 | < 5 | < 5 |
| Microbleed, underspecified | 0 | 1 | < 5 | < 5 |
| Tumour, glioma | 0 | 0 | < 5 | < 5 |
| Tumour, meningioma | 2 | 4 | < 5 | < 5 |
| Tumour, metastasis | 2 | 0 | 22 | 37 |
| Tumour, other | 10 | 3 | 21 | 12 |

* Small numbers (< 5) are suppressed in the NHS Tayside columns due to data governance requirements.
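Two relationships in the table can be checked arithmetically: each Total Entities count equals Disease Entities plus Modifier Entities, and, for the fully reported ESS columns, Total Phenotypes equals the sum of the individual phenotype rows. The Python sketch below transcribes the counts, checks the first relationship, and, assuming the second also holds for Tayside and that each "< 5" cell stands for a value from 0 to 4, recovers how many labels the suppressed Tayside cells hold in aggregate. The variable and function names are hypothetical, chosen for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Counts transcribed from Table 1 (annotator 1). Column order matches DATASETS.
# None stands for a Tayside cell reported only as "< 5" (suppressed).
DATASETS = ["ESS Dev", "ESS Test", "Tayside Dev", "Tayside Test"]

TOTAL_ENTITIES = [4332, 2924, 2997, 2986]
DISEASE_ENTITIES = [2373, 1494, 1361, 1501]
MODIFIER_ENTITIES = [1959, 1430, 1636, 1485]
TOTAL_PHENOTYPES = [792, 518, 558, 506]

PHENOTYPES: Dict[str, List[Optional[int]]] = {
    "Atrophy": [187, 122, 90, 164],
    "Small vessel disease": [245, 159, 60, 145],
    "Stroke, underspecified": [24, 15, 16, None],
    "Haemorrhagic stroke, deep, old": [2, 4, None, None],
    "Haemorrhagic stroke, deep, recent": [2, 2, None, None],
    "Haemorrhagic stroke, lobar, old": [4, 3, 7, None],
    "Haemorrhagic stroke, lobar, recent": [1, 4, None, None],
    "Haemorrhagic stroke, underspecified": [7, 10, 94, 15],
    "Ischaemic stroke, cortical, old": [112, 61, 27, 26],
    "Ischaemic stroke, cortical, recent": [21, 14, 19, 12],
    "Ischaemic stroke, deep, old": [140, 85, 60, 41],
    "Ischaemic stroke, deep, recent": [7, 4, None, None],
    "Ischaemic stroke, underspecified": [5, 12, 85, 15],
    "Haemorrhagic transformation": [1, 1, 10, None],
    "Subdural haematoma": [9, 6, 20, 8],
    "Subarachnoid haemorrhage, aneurysmal": [1, 0, None, None],
    "Subarachnoid haemorrhage, other": [6, 6, 21, 7],
    "Microbleed, deep": [2, 1, None, None],
    "Microbleed, lobar": [2, 1, None, None],
    "Microbleed, underspecified": [0, 1, None, None],
    "Tumour, glioma": [0, 0, None, None],
    "Tumour, meningioma": [2, 4, None, None],
    "Tumour, metastasis": [2, 0, 22, 37],
    "Tumour, other": [10, 3, 21, 12],
}

SUPPRESSED_MAX = 4  # assumption: "< 5" means a value from 0 to 4


def entity_totals_consistent() -> List[bool]:
    """True per dataset when Disease + Modifier Entities equals Total Entities."""
    return [
        d + m == t
        for d, m, t in zip(DISEASE_ENTITIES, MODIFIER_ENTITIES, TOTAL_ENTITIES)
    ]


def suppressed_summary(col: int) -> Tuple[int, int]:
    """Return (implied labels in suppressed cells, number of suppressed cells).

    Assumes Total Phenotypes is the sum of all phenotype rows, which holds
    exactly for the fully reported ESS columns.
    """
    reported = sum(row[col] for row in PHENOTYPES.values() if row[col] is not None)
    n_suppressed = sum(1 for row in PHENOTYPES.values() if row[col] is None)
    return TOTAL_PHENOTYPES[col] - reported, n_suppressed


if __name__ == "__main__":
    print(dict(zip(DATASETS, entity_totals_consistent())))
    for col, name in enumerate(DATASETS):
        hidden, n = suppressed_summary(col)
        # Under the assumptions, the implied total must lie in [0, SUPPRESSED_MAX * n].
        within_bound = 0 <= hidden <= SUPPRESSED_MAX * n
        print(f"{name}: {hidden} labels in {n} suppressed cells, within bound: {within_bound}")
```

Under these assumptions, the suppressed cells account for 6 labels across 10 cells in Tayside Dev and 24 labels across 13 cells in Tayside Test, both within the 0 to 4 per cell bound.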