Table 5 Effects of demographic factors on linkage accuracy, for the RiO (proband) to SystmOne (sample) comparison. Logistic regression predicting correct linkage (declaring a match to the correct person) amongst probands known to be in the sample, excluding those without a known deprivation centile (final n = 126,179), at default decision thresholds of θ = 5 and δ = 0. Coefficient: change in log odds of linkage for every unit change in the predictor (positive coefficient, greater likelihood of linkage; negative coefficients, lesser likelihood). F tests are from analysis of variance using Type III sums of squares (“over and above” all other predictors). Z tests are simple tests of coefficients. All values are shown to three significant figures. *** p < 0.001; ** p < 0.01; * p < 0.05; ICD-10, World Health Organization International Classification of Diseases, tenth revision; MH, mental health; NS, not significant; SMI, severe mental illness

From: De-identified Bayesian personal identity matching for privacy-preserving record linkage despite errors: development and validation

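| Term | F | pF | Coefficient | Standard error | Z | pZ |
|---|---|---|---|---|---|---|
| (Intercept) | | | +29.9 | 1.63 | +18.4 | < 2 × 10⁻¹⁶ *** |
| Birth year (≈ inverse age) | F(1, 126166) = 277 | < 2.2 × 10⁻¹⁶ *** | −0.0133 | 0.000823 | −16.2 | < 2 × 10⁻¹⁶ *** |
| Sex | F(2, 126166) = 129 | < 2.2 × 10⁻¹⁶ *** | | | | |
| Sex: Female | | | (reference) | | | |
| Sex: Male | | | +0.618 | 0.0406 | +15.2 | < 2 × 10⁻¹⁶ *** |
| Sex: Other or unknown | | | −1.39 | 0.364 | −3.81 | 0.000137 *** |
| Ethnicity | F(5, 126166) = 20.5 | < 2.2 × 10⁻¹⁶ *** | | | | |
| Ethnicity: White | | | (reference) | | | |
| Ethnicity: Asian | | | +0.0910 | 0.147 | +0.617 | 0.537, NS |
| Ethnicity: Black | | | +0.564 | 0.262 | +2.16 | 0.0312 * |
| Ethnicity: Mixed | | | +0.506 | 0.175 | +2.89 | 0.00392 ** |
| Ethnicity: Other | | | +0.0618 | 0.208 | +0.297 | 0.767, NS |
| Ethnicity: Unknown | | | −0.340 | 0.0387 | −8.79 | < 2 × 10⁻¹⁶ *** |
| Deprivation centile (0 least, 100 most) | F(1, 126166) = 17.4 | 3.02 × 10⁻⁵ *** | −0.00287 | 0.000681 | −4.21 | 2.54 × 10⁻⁵ *** |
| Diagnostic group | F(2, 126166) = 76.7 | < 2.2 × 10⁻¹⁶ *** | | | | |
| Diagnostic group: No MH ICD-10 codes | | | (reference) | | | |
| Diagnostic group: MH code but not SMI | | | +0.672 | 0.0627 | +10.7 | < 2 × 10⁻¹⁶ *** |
| Diagnostic group: SMI | | | +0.779 | 0.141 | +5.52 | 3.40 × 10⁻⁸ *** |
| Presence of a pseudopostcode | F(1, 126166) = 3.70 | 0.0544, NS | −1.03 | 0.461 | −2.24 | 0.0249 * |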
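
The caption defines each coefficient as a change in log odds, so it may help to see how a coefficient and its standard error relate to the reported Z and pZ values and to an odds ratio. The following is a minimal sketch and not the authors' analysis code. It assumes that Z is the usual Wald statistic (coefficient divided by standard error) referred to a standard normal distribution. The function names are hypothetical, and the inputs are the rounded values from the table, so results agree with the published figures only to about three significant figures.

```python
import math


def odds_ratio(coef):
    """Convert a log-odds coefficient to an odds ratio."""
    return math.exp(coef)


def wald_z(coef, se):
    """Wald statistic: coefficient divided by its standard error (assumed)."""
    return coef / se


def two_sided_p(z):
    """Two-sided p value under a standard normal reference distribution.

    Uses the identity 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))


def wald_ci_odds_ratio(coef, se, z_crit=1.96):
    """Approximate 95% Wald confidence interval on the odds-ratio scale."""
    return math.exp(coef - z_crit * se), math.exp(coef + z_crit * se)


# (coefficient, standard error) pairs copied from the table, already rounded.
rows = {
    "Sex: Male": (0.618, 0.0406),
    "Ethnicity: Unknown": (-0.340, 0.0387),
    "Deprivation centile": (-0.00287, 0.000681),
    "Presence of a pseudopostcode": (-1.03, 0.461),
}

for term, (coef, se) in rows.items():
    z = wald_z(coef, se)
    lo, hi = wald_ci_odds_ratio(coef, se)
    print(f"{term}: OR={odds_ratio(coef):.3g} "
          f"(95% CI {lo:.3g} to {hi:.3g}), Z={z:+.3g}, p={two_sided_p(z):.3g}")
```

Read this way, the coefficient for male sex corresponds to odds of correct linkage roughly 1.86 times those of the female reference group, with all other predictors held constant. The deprivation coefficient applies per centile, so its effect across the full 0 to 100 range is considerably larger than the per-unit odds ratio suggests.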