Table 5 Precision and recall for all data elements using the NLP system

From: Natural language processing for populating lung cancer clinical research data

| Data elements | Number of patients in existing Dataset (A) | Number of patients with true NLP results (B) | Number of patients with NLP results (C) | Precision1 (B/A) | Precision2 (B/C) | Recall | Time window |
|---|---|---|---|---|---|---|---|
| Stage | 2127 | 1330 | 1883 | 0.625 | 0.706 | 0.885 | 90 days |
| Stage | 2127 | 1328 | 1883 | 0.624 | 0.705 | 0.885 | 60 days |
| Stage | 2127 | 1325 | 1883 | 0.623 | 0.704 | 0.885 | 30 days |
| Histology | 2208 | 1918 | 1989 | 0.869 | 0.885 | 0.982 | 90 days |
| Histology | 2208 | 1914 | 2164 | 0.867 | 0.884 | 0.980 | 60 days |
| Histology | 2208 | 1889 | 2154 | 0.856 | 0.877 | 0.976 | 30 days |
| Tumor grade | 1635 | 1182 | 1203 | 0.723 | 0.902 | 0.801 | 90 days |
| Tumor grade | 1635 | 1170 | 1300 | 0.716 | 0.900 | 0.795 | 60 days |
| Tumor grade | 1635 | 1143 | 1274 | 0.700 | 0.897 | 0.779 | 30 days |
| Chemotherapy | 1674 | 1674 | 1674 | 1 | 1 | 1 | 365 days |
| Radiotherapy | 769 | 769 | 769 | 1 | 1 | 1 | 365 days |
| Surgery | 312 | 312 | 312 | 1 | 1 | 1 | 365 days |
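
The ratio columns can be recomputed from the three counts. The sketch below is a minimal illustration, not the authors' code: the `Row` class and `metrics` function are hypothetical names, and the recall formula (C/A) is an assumption inferred from the printed values, since the column header gives no definition for it. Precision1 (B/A) and Precision2 (B/C) follow the column headers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One table row (hypothetical container, not from the source)."""
    element: str
    window_days: int
    a: int  # patients in existing dataset (A)
    b: int  # patients with true NLP results (B)
    c: int  # patients with NLP results (C)


def metrics(row: Row) -> tuple[float, float, float]:
    """Return (Precision1, Precision2, Recall), rounded to 3 decimals."""
    precision1 = row.b / row.a  # B/A, as given in the column header
    precision2 = row.b / row.c  # B/C, as given in the column header
    recall = row.c / row.a      # ASSUMPTION: C/A, inferred from the printed values
    return tuple(round(x, 3) for x in (precision1, precision2, recall))


stage_90 = Row("Stage", 90, a=2127, b=1330, c=1883)
histology_60 = Row("Histology", 60, a=2208, b=1914, c=2164)

print(metrics(stage_90))      # (0.625, 0.706, 0.885)
print(metrics(histology_60))  # (0.867, 0.884, 0.98)
```

Recomputing the ratios this way also serves as a quick check on the counts. Applied to the 90-day Histology and Tumor grade rows, B/C computed from the printed counts (1918/1989 and 1182/1203) does not match the printed Precision2 values (0.885 and 0.902), so those two C values may be worth checking against the source publication.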