Table 4 Corpus statistics

From: Fine-grained information extraction from German transthoracic echocardiography reports

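| Name | Description | Filter | # | % |
|---|---|---|---|---|
| | all TTE reports | | 70441 | 100.0 |
| | only relevant sites | \(f_{\text{site}}\) | 68915 | 97.8 |
| \(T_{d}\) | dominant layouts | \(f_{\text{site}}\), \(f_{\text{char}\geq 800}\), \(\bar{f}_{\text{li}}\) | 63489 | 90.1 |
| \(T_{u}\) | mostly unstructured | \(f_{\text{site}}\), \(f_{\text{char}\geq 100}\), \(f_{\text{char}<800}\), \(\bar{f}_{\text{li}}\) | 2712 | 3.9 |
| \(T_{c}\) | uncommon layout | \(f_{\text{site}}\), \(f_{\text{li}}\) | 1041 | 1.5 |
| | mostly defective | \(f_{\text{site}}\), \(f_{\text{char}<100}\) | 1673 | 2.4 |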

  1. \(f_{\text{site}}\): filter that excludes three sites of the hospital. \(f_{\text{char}\geq n}\): requires at least n non-whitespace characters. \(f_{\text{li}}\): requires at least 5 list elements.
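
The filter combinations in the table can be read as a simple decision procedure. The following is a minimal sketch of that reading, not the authors' implementation. It assumes that the overbar on the list filter denotes its negation (fewer than 5 list elements), that the list filter is checked before the character thresholds, and that list elements have already been counted upstream. The site identifiers, the `Report` class, and the function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical identifiers: the source only states that three sites are excluded.
EXCLUDED_SITES = {"site_x", "site_y", "site_z"}


@dataclass
class Report:
    site: str
    text: str
    n_list_elements: int  # assumed to be counted by an upstream step


def non_whitespace_chars(text: str) -> int:
    """Count the characters that are not white space."""
    return sum(1 for ch in text if not ch.isspace())


def assign_subset(report: Report) -> str:
    """Map a report to one row of the table, under the assumptions above."""
    if report.site in EXCLUDED_SITES:
        return "excluded"       # fails the site filter
    if report.n_list_elements >= 5:
        return "T_c"            # uncommon layout: list filter holds
    n_chars = non_whitespace_chars(report.text)
    if n_chars >= 800:
        return "T_d"            # dominant layouts
    if n_chars >= 100:
        return "T_u"            # mostly unstructured
    return "defective"          # fewer than 100 non-whitespace characters


if __name__ == "__main__":
    example = Report(site="site_a", text="x" * 900, n_list_elements=0)
    print(assign_subset(example))
```

Because each report takes exactly one branch, the four subsets partition the reports from the relevant sites, which matches the counts in the table (63489 + 2712 + 1041 + 1673 = 68915).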