Table 1 Systems for full and partial matching, applicable to different identifier types. D, data; H, hypothesis that the proband and candidate under consideration are the same person. For each system, columns sum to 1, since the options for D are mutually exclusive. (A) Two-state system: a match occurs or does not occur. There is a probability pc (c, correct) that an identifier is correctly represented (is the same) when the proband and candidate are the same person, and a probability pe = pen (e, any error; en, error yielding no match) that an error or mismatch occurs. In the population, there is a probability pf (f, full match) that a randomly selected other person shares the proband’s identifier, and a probability pn (n, no match) that they do not. (B) Three-state (“fuzzy”) system. The nature of a partial match is specific to the identifier type. For example, for date of birth (DOB), the partial match is a DOB with 2/3 of year/month/day correct. Probabilities pc, pf, pen, and pn are as before, but now there is a probability pep (ep, error yielding partial match) that when the proband and candidate are the same person, the identifiers match only partially (so pe = pep + pen), and a probability ppnf that a random other person will share the partial but not the full identifier. It may be easier to measure pp, the total probability of a partial or full match, than ppnf. (C) Four-state system. Partial matches now occur in two variants, hierarchically. (D) Adjustments for unordered pick-the-best comparisons between multiple identifiers of the same type (e.g. surnames, postcodes). “Positive” comparisons are those for which the log likelihood ratio is > 0. (E) Adjustments for ordered pick-the-best comparisons between multiple identifiers of the same type (e.g. forenames), using the probability po (o, ordered) that, given H, for ≥ 2 candidate identifiers, the candidate’s order strictly matches the proband’s, and its converse probability pu (u, unordered). Positive comparisons are strictly ordered when each proband identifier’s index matches the corresponding candidate’s identifier. † For P(D | ¬H), adjustments use the Bonferroni correction (see text).

From: De-identified Bayesian personal identity matching for privacy-preserving record linkage despite errors: development and validation

| Data, D | P(D \| H, same person) | P(D \| ¬H, different person) |
| --- | --- | --- |
| A. Two-state comparison | | |
| Match | pc = 1 − pe = 1 − pen | pf |
| No match | pe = pen | pn = 1 − pf |
| B. Three-state comparison | | |
| Full match | pc = 1 − pe = 1 − pep − pen | pf |
| Partial (but not full) match | pep | ppnf = pp − pf |
| No match | pen | pn = 1 − pp |
| C. Four-state comparison | | |
| Full match | pc = 1 − pe = 1 − pep1 − pep2np1 − pen | pf |
| Partial match type 1 (but not full) | pep1 | pp1nf = pp1 − pf |
| Partial match type 2 (but not full or partial type 1) | pep2np1 | pp2np1 |
| No match | pen | pn = 1 − pp = 1 − pp2np1 − pp2nf |
| D. Adjustments for unordered multi-identifier comparison | | |
| For 1 ≤ c ≤ min(n, m) “positive” comparisons between proband identifiers 1…n and candidate identifiers 1…m | × 1, no correction | \(\times \prod^{c-1}_{i=0} \left(m-i\right)\) |
| E. Adjustments for ordered multi-identifier comparison | | |
| For c ≥ 1 “positive” comparisons, m > 1, and strict order match | × po | × 1, no correction |
| For c ≥ 1 “positive” comparisons, m > 1, and order mismatch † | × pu = 1 − po | \(\times \left(\left[\prod^{c-1}_{i=0} \left(m-i\right)\right]-1\right)\) |
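
The entries in sections B, D and E can be turned into numbers with a few lines of code. The following is a minimal Python sketch, not the authors' implementation. The function names, the use of the natural logarithm, and the example probabilities are all assumptions made for illustration only. It builds the two probability columns of the three-state system, forms a log likelihood ratio from them, and computes the multipliers for unordered and ordered multi-identifier comparisons.

```python
import math


def three_state_probs(p_ep, p_en, p_f, p_p):
    """Return {outcome: (P(D | H), P(D | not H))} for the three-state system.

    p_ep: error yielding a partial match; p_en: error yielding no match;
    p_f: population frequency of a full match;
    p_p: population frequency of a partial or full match.
    """
    p_c = 1 - p_ep - p_en  # probability the identifier is correct, given H
    return {
        "full": (p_c, p_f),
        "partial": (p_ep, p_p - p_f),  # p_pnf = p_p - p_f
        "none": (p_en, 1 - p_p),       # p_n = 1 - p_p
    }


def log_likelihood_ratio(p_d_given_h, p_d_given_not_h):
    """Natural-log likelihood ratio (the log base is an assumption here)."""
    return math.log(p_d_given_h / p_d_given_not_h)


def unordered_correction(c, m):
    """Section D multiplier for P(D | not H): product of (m - i), i = 0..c-1."""
    if not 1 <= c <= m:
        raise ValueError("requires 1 <= c <= m")
    return math.prod(m - i for i in range(c))


def ordered_multipliers(c, m, p_o, strictly_ordered):
    """Section E multipliers as (for P(D | H), for P(D | not H))."""
    if c < 1 or m <= 1:
        raise ValueError("requires c >= 1 and m > 1")
    if strictly_ordered:
        return p_o, 1
    return 1 - p_o, unordered_correction(c, m) - 1


# Illustrative values only; they are not taken from the paper.
probs = three_state_probs(p_ep=0.04, p_en=0.01, p_f=0.001, p_p=0.01)
llr = {k: log_likelihood_ratio(h, not_h) for k, (h, not_h) in probs.items()}
print(llr)                                   # positive for "full", negative for "none"
print(unordered_correction(c=2, m=3))        # 3 * 2 = 6
print(ordered_multipliers(2, 3, 0.9, False))
```

With these example values both columns sum to 1, as the caption requires, and only the full and partial outcomes yield a positive log likelihood ratio, so only they would count as “positive” comparisons in sections D and E.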