Setting | Value | Comment |
---|---|---|
§ SECURITY | ||
Hash method | HMAC-MD5 | HMAC-MD5 has a space of size 16^32 = 3.40 × 10^38. The software also offers HMAC-SHA-256 (space size 16^64 = 1.16 × 10^77) and HMAC-SHA-512 (space size 16^128 = 1.34 × 10^154) |
Number of significant figures for rounding frequencies in hashed version | 5 | Rounding reduces the identifiability of numbers. Some precision is required to distinguish metaphone from name frequencies |
§ POPULATION PRIORS: NATIONAL | ||
Name/metaphone/F2C frequencies for forenames, by gender | [many] | ¶ From US baby name frequencies 1880–2015 [58], covering ~345 M people, processed via CRATE [59]. UK data by year is also available [60] |
Name/metaphone/F2C frequencies for surnames | [many] | From US 1990 and 2010 Census surname frequencies [61, 62], processed via CRATE [59] |
Minimum frequency for forenames, fminforename. (If a frequency was less than this, this minimum was used instead.) | 5 × 10^−6 | A minimum is required for unknown names. For the US forename data cited, the floor frequency is ~2.9 × 10^−8; however, allowing extremely low frequencies (e.g. much below 1/np) increases the chances of a spurious match, because a name match can add up to ln(1/fmin) to the log odds |
Minimum frequency for surnames, fminsurname | 5 × 10^−6 | As above. For the US surname data cited, the lowest frequency reported is 3 × 10^−7, but we used a threshold above 1/np |
P(female | female or male) | 0.51 | With a binary sex choice, the UK is 51% female and 49% male [63] |
P(not female or male) | 0.004 | Approximately 0.4% of the UK population consider their gender neither male nor female [64] |
Postcode data, for pfpostcode, ppnfpostcode, and pnpostcode | – | From UK Office for National Statistics data [65], licensed under the Open Government Licence version 3.0 |
POPULATION PRIORS: LOCAL | ||
Population size, np | 852,523 | Population estimate of Cambridgeshire and Peterborough for 2018 [66] |
Birth year “range” b | 30 | † The prior probability of two people sharing a DOB was taken as 1/(365.25 × b). A value of 90 may be reasonable for a full UK population with few long-deceased people [67], but we used an empirical value reflecting the subsampled age composition of one of our databases |
Postcode frequency multiple kpostcode | nUK/np | Where nUK is the 2017 UK population, 66,040,000 [67] |
Population proportion assumed to be assigned a pseudopostcode (e.g. ZZ99 3VZ, no fixed abode; ZZ99 3CZ, England/UK not otherwise specified) or a postcode unknown to the postcode database (including typographical errors creating an invalid postcode), ppseudopostcode_unit. Taken as an estimate for each unknown/pseudopostcode unit frequency | 0.00201 | † Based on the proportion of people in the SystmOne database with a ZZ99 3VZ (no fixed abode) postcode. This is higher than an estimate from national data (see Results), potentially reflecting a bias from a healthcare environment, so this value may need alteration in other contexts |
Pseudopostcode multiple kpseudopostcode such that ppseudopostcode_sector = kpseudopostcode × ppseudopostcode_unit | 1.83 | † Based on an empirical value for ZZ99 3 : ZZ99 3VZ (see Results). This number cannot be < 1 and should be > 1 to avoid ppnfpostcode = 0 |
ERROR RATES (given proband/candidate are the same person) | ||
pep1forename | F: 0.00894 M: 0.00840 | †¶ Probability that a forename pair exhibits a partial 1 (metaphone) match but not a full (name) match |
pep2np1forename | F: 0.00881 M: 0.00688 | †¶ Probability that a forename pair exhibits a partial 2 (F2C) match, but not a partial 1 (metaphone) or full (name) match |
penforename | F: 0.00572 M: 0.00625 | †¶ Probability that a forename pair exhibits no match at all |
puforename | 0.00191 | † Probability, amongst a set of ≥ 2 forenames, of an error that shuffles the names out of strict order |
pep1surname | F: 0.00551 M: 0.00471 | †¶ Probability that a surname pair exhibits a partial 1 (metaphone) match but not a full (name) match |
pep2np1surname | F: 0.00378 M: 0.00247 | †¶ Probability that a surname pair exhibits a partial 2 (F2C) match, but not a partial 1 (metaphone) or full (name) match |
pensurname | F: 0.0567 M: 0.0134 | †¶ Probability that a surname pair exhibits no match at all |
pepdob | 0.00459 | † Probability of a DOB error causing a partial (year/month, month/day, or year/day) match |
pendob | 0 | The probability of a DOB error causing no match at all. Using 0 rather than the empirical value of 0.00033 produces a major speed advantage; see Results |
pegender | 0.0033 | † The probability that proband/candidate (when the same person) do not match on gender |
peppostcode | 0.0097 | † The probability that a proband/candidate postcode pair (when the same person) exhibits a partial (postcode sector) match but not a full (postcode unit) match, e.g. due to error or because someone has moved within a postcode sector |
penpostcode | 0.300 | † The probability that two postcodes for the same person mismatch completely |
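
Several of the derived quantities in the table can be checked with a few lines of arithmetic. The following is a minimal sketch, not the software's own code: the function and constant names are hypothetical, the input values are copied from the table, and it assumes the hash space is counted as 16 raised to the number of hexadecimal digits in the digest (32, 64, or 128), as the Comment column states.

```python
import math

# Values copied from the table; the names are illustrative only.
N_P = 852_523        # local population size, np
N_UK = 66_040_000    # 2017 UK population, nUK
F_MIN = 5e-6         # minimum name frequency (forenames and surnames)
B_YEARS = 30         # birth year "range", b


def hash_space_size(hex_digits: int) -> int:
    """Number of distinct digests expressible in this many hex digits."""
    return 16 ** hex_digits


def max_name_log_odds(f_min: float) -> float:
    """Upper bound, ln(1/f_min), on what a name match can add to the log odds."""
    return math.log(1.0 / f_min)


def dob_prior(b_years: float) -> float:
    """Prior probability that two people share a DOB: 1 / (365.25 * b)."""
    return 1.0 / (365.25 * b_years)


def postcode_multiple(n_uk: int, n_p: int) -> float:
    """Postcode frequency multiple, kpostcode = nUK / np."""
    return n_uk / n_p


if __name__ == "__main__":
    for name, digits in [("HMAC-MD5", 32), ("HMAC-SHA-256", 64), ("HMAC-SHA-512", 128)]:
        print(f"{name}: {hash_space_size(digits):.3g}")
    print(f"max name log odds: {max_name_log_odds(F_MIN):.3f}")
    print(f"1/np: {1.0 / N_P:.3g} (minimum frequency is above this: {F_MIN > 1.0 / N_P})")
    print(f"DOB prior: {dob_prior(B_YEARS):.3g}")
    print(f"kpostcode: {postcode_multiple(N_UK, N_P):.2f}")
```

With the tabulated minimum of 5 × 10^−6, a single name match can contribute at most ln(200,000), roughly 12.2, to the log odds, and the minimum sits above 1/np, consistent with the rationale given for the frequency floor.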