Fig. 4 | BMC Medical Informatics and Decision Making

From: De-identified Bayesian personal identity matching for privacy-preserving record linkage despite errors: development and validation

Fig. 4

Matching performance for pairwise linkages between medical records databases, at different decision thresholds. In each panel, rows show the “from”/source/proband [p] database, and columns show the “to”/destination/sample [s] database (see Table 3).

A Performance based on calculated log odds only: receiver operating characteristic (ROC) curves using the software’s calculated log odds as the predictor (ignoring δ, i.e. taking δ = 0). The response variable was whether the proband was in the sample. (This does not guarantee that the correct candidate has been identified; for that, see panels D, E.) Crosses indicate the default value of θ; diagonal lines represent random classification; AUROC, area under the ROC curve. Plots are not shown for databases matched to themselves, for which the false positive rate (FPR) cannot be calculated because all probands are in the sample.

B, C True positive rate (TPR: declaring a match when the proband is in the sample, regardless of whether the correct person is identified), based on the two-stage decision process using θ and δ. Note the non-zero baseline, and that the TPR can include misidentification (see Methods). Panel B plots TPR against θ, the minimum log odds for a match to be declared; the number of people overlapping between the two databases, o, is shown. PCMIS TPR values are lower when PCMIS is the sample database than when it is the proband database, because it contained NHS number duplication (see Table 3); a priori, this duplication might be expected to reduce the TPR slightly when this database is the sample. Panel C plots TPR against δ, the additional log-odds threshold by which the leading candidate must beat the next-best candidate for a match to be declared.

D, E Misidentification rates (MID): the probability that a declared match was for the wrong person. Note the difference in scale. Graphical conventions as for B, C. Vertical lines and black segments on the colour spectra show the software’s default thresholds.
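The two-stage decision rule and the quantities plotted in panels A to E can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software's own implementation: the type `ProbandResult` and the functions `declare_match`, `tpr_and_mid`, `best_log_odds` and `auroc` are hypothetical. It assumes that a match is declared when the leading candidate's log odds are at least θ and exceed the next-best candidate's by at least δ (a missing runner-up is treated as −∞), that MID is computed over all declared matches, including any declared for probands absent from the sample, and that the ROC predictor is the leading candidate's log odds. The toy data are made up purely to exercise the code.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ProbandResult:
    """Hypothetical per-proband output of one linkage run (not the software's own type)."""
    candidate_ids: List[str]          # sample IDs scored against this proband
    candidate_log_odds: List[float]   # log odds per candidate, same order as candidate_ids
    true_id: Optional[str]            # correct sample ID, or None if proband is not in the sample


def declare_match(r: ProbandResult, theta: float, delta: float) -> Optional[str]:
    """Assumed two-stage rule: leader needs log odds >= theta AND a lead >= delta."""
    if not r.candidate_ids:
        return None
    ranked = sorted(zip(r.candidate_log_odds, r.candidate_ids),
                    key=lambda pair: pair[0], reverse=True)
    best_lo, best_id = ranked[0]
    # Assumption: with a single candidate, the runner-up is treated as -infinity.
    second_lo = ranked[1][0] if len(ranked) > 1 else -math.inf
    if best_lo >= theta and best_lo - second_lo >= delta:
        return best_id
    return None


def tpr_and_mid(results: Sequence[ProbandResult], theta: float, delta: float):
    """TPR: share of in-sample probands with any declared match (right or wrong person).
    MID: share of declared matches pointing to the wrong person (assumed to include
    matches declared for probands that are absent from the sample)."""
    decisions = [(r, declare_match(r, theta, delta)) for r in results]
    in_sample = [(r, m) for r, m in decisions if r.true_id is not None]
    declared = [(r, m) for r, m in decisions if m is not None]
    tpr = (sum(m is not None for _, m in in_sample) / len(in_sample)
           if in_sample else math.nan)
    mid = (sum(m != r.true_id for r, m in declared) / len(declared)
           if declared else math.nan)
    return tpr, mid


def best_log_odds(r: ProbandResult) -> float:
    """Assumed ROC predictor: the leading candidate's log odds (delta ignored)."""
    return max(r.candidate_log_odds, default=-math.inf)


def auroc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Rank-based (Mann-Whitney) AUROC: P(positive scores above negative), ties count 1/2.
    Returns NaN when there are no negatives, e.g. a database matched to itself."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return math.nan
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy data, purely to exercise the functions.
results = [
    ProbandResult(["a", "b"], [12.0, 3.0], "a"),   # clear, correct leader
    ProbandResult(["c", "d"], [8.0, 7.5], "d"),    # wrong leader, narrow lead
    ProbandResult(["e"], [2.0], "e"),              # below theta
    ProbandResult(["f"], [9.0], None),             # proband absent from the sample
]

if __name__ == "__main__":
    for delta in (0.0, 1.0):
        print(delta, tpr_and_mid(results, theta=5.0, delta=delta))
    print(auroc([best_log_odds(r) for r in results],
                [r.true_id is not None for r in results]))
```

In this toy example, raising δ from 0 to 1 suppresses the narrow, wrong match for the second proband, lowering both TPR and MID, which mirrors the trade-off that panels C and E explore; sweeping θ with δ fixed plays the same role for panels B and D.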
