Table 6 Classifier performance on an independent validation set of 25,000 complex clusters

From: An efficient record linkage scheme using graphical analysis for identifier error detection

Status     Predicted not bad    Predicted bad    Total    % predicted bad
Unknown    5623                 2501             8124     30.7
Good       15975                901              16876    5.33
Bad        337                  11698            12035    97.2
  1. The logistic classifier derived to identify bad clusters, shown in Table 4, was applied to a further random sample of 25,000 clusters obtained after initial record linkage. Here 'not bad' refers to a cluster containing a single individual, and 'bad' to a cluster containing more than one individual. The clusters were classified as 'good', 'unknown status' or 'bad' using rules, as described in the legend to Table 4 and in the methods. The table shows the classifier's performance on this validation set.
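
The "Total" and "% predicted bad" columns can be recomputed from the two prediction counts in each row. The following is a minimal sketch of that arithmetic, not the authors' code. The names `counts` and `percent_predicted_bad` are illustrative assumptions, and the formula (predicted bad divided by the row total) is inferred from the column headings.

```python
# Counts copied from Table 6: (predicted not bad, predicted bad) per rule-based status.
counts = {
    "Unknown": (5623, 2501),
    "Good": (15975, 901),
    "Bad": (337, 11698),
}


def percent_predicted_bad(not_bad, bad):
    """Percentage of the clusters in one row that the classifier flagged as bad.

    Assumed formula: 100 * predicted bad / (predicted not bad + predicted bad).
    """
    return 100.0 * bad / (not_bad + bad)


# Row totals and percentages, keyed by status.
row_totals = {status: nb + b for status, (nb, b) in counts.items()}
row_percent = {
    status: percent_predicted_bad(nb, b) for status, (nb, b) in counts.items()
}

for status in counts:
    print(
        f"{status:8s} total={row_totals[status]:6d} "
        f"predicted bad={row_percent[status]:.2f}%"
    )
```

The recomputed totals match the table exactly, and the recomputed percentages agree with the printed values to within the last printed digit.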