Table 6 Classifier performance on an independent validation set of 25,000 complex clusters

From: An efficient record linkage scheme using graphical analysis for identifier error detection

Status     Predicted not bad   Predicted bad   Total    % predicted bad
Unknown    5623                2501            8124     30.7
Good       15975               901             16876    5.33
Bad        337                 11698           12035    97.2

  1. The logistic classifier derived to identify bad clusters (shown in Table 4; "not bad" means the cluster contains records of a single individual, "bad" means it contains records of more than one individual) was applied to a further random sample of 25,000 clusters obtained after initial record linkage. These clusters were classified as 'good', 'unknown status' or 'bad' using rules, as described in the Table 4 legend and the Methods. The table shows the classifier's performance on this validation set.
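The row percentages and the usual confusion-matrix summaries can be recomputed directly from the counts above. The sketch below is illustrative only: the names `COUNTS`, `pct_predicted_bad` and `summarize` are hypothetical, and treating rule-based 'Bad' clusters as positives, 'Good' clusters as negatives and excluding 'Unknown' clusters is an assumption made here, not the authors' stated evaluation procedure.

```python
# Counts transcribed from Table 6: status -> (predicted not bad, predicted bad).
COUNTS = {
    "Unknown": (5623, 2501),
    "Good": (15975, 901),
    "Bad": (337, 11698),
}


def pct_predicted_bad(not_bad: int, bad: int) -> float:
    """Share of a row's clusters that the classifier labelled bad, in percent."""
    return 100.0 * bad / (not_bad + bad)


def summarize(counts):
    """Return per-status % predicted bad and summary metrics.

    Assumption (not from the paper): rule-based 'Bad' clusters are positives,
    rule-based 'Good' clusters are negatives, and 'Unknown' clusters are excluded.
    """
    rows = {status: pct_predicted_bad(*c) for status, c in counts.items()}
    tn, fp = counts["Good"]  # good clusters predicted not bad / predicted bad
    fn, tp = counts["Bad"]   # bad clusters predicted not bad / predicted bad
    metrics = {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }
    return rows, metrics


if __name__ == "__main__":
    rows, metrics = summarize(COUNTS)
    for status, pct in rows.items():
        print(f"{status:8s} {pct:6.2f}% predicted bad")
    for name, value in metrics.items():
        print(f"{name:12s} {value:.4f}")
```

Recomputed this way, the Unknown and Good rows come to about 30.79% and 5.34%, so the published 30.7 and 5.33 appear to be truncated rather than rounded; the Bad row (97.2%) agrees either way.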