
Table 2 Levels of matched records using a variety of techniques.

From: The SAIL databank: linking multiple health and social care datasets

 

| Levels of matched records | Primary Care General Practice (GP dataset): Number | GP dataset: % | Secondary Care Hospital Admissions (PEDW dataset): Number | PEDW dataset: % | Social Services (PARIS database): Number | PARIS database: % |
|---|---|---|---|---|---|---|
| Sample size | 229,127 | | 290,650 | | 18,540 | |
| Valid NHS Number | 229,117 | 99.996% | 264,868 | 91.13% | - | 0.00% |
| Valid NHS Number plus DRL | 229,123 | 99.998% | 280,729 | 96.59% | 14,158 | 76.36% |
| Valid NHS Number plus PRL (99% cut off) | 229,125 | 99.999% | 287,572 | 98.94% | 17,095 | 92.21% |
| Valid NHS Number plus PRL (95% cut off) | 229,125 | 99.999% | 288,186 | 99.15% | 17,431 | 94.02% |
| Valid NHS Number plus PRL (90% cut off) | 229,125 | 99.999% | 288,424 | 99.23% | 17,553 | 94.68% |
| Valid NHS Number plus PRL (50% cut off) | 229,125 | 99.999% | 288,670 | 99.32% | 17,639 | 95.14% |
| Overall combining Valid NHS, DRL & PRL (50%) | 229,125 | 99.999% | 288,683 | 99.32% | 17,642 | 95.16% |

  1. The numbers (and percentages) of records that could be matched using deterministic record linkage (DRL) and various thresholds of probabilistic record linkage (PRL) were assessed for each of three test datasets: the GP dataset, the PEDW dataset and the PARIS database. Records with a valid NHS number were accepted. The matching rate achieved by applying DRL followed by PRL (to the 50% threshold) was also assessed; the final row shows this result, obtained by operating the MACRAL algorithm as illustrated in Figure 1.
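
The staged matching described in the footnote can be sketched roughly as follows. This is an illustration under stated assumptions, not the MACRAL implementation. It assumes that "valid" means the NHS number passes the standard Modulus 11 check-digit test. The `drl_match` and `prl_score` callables, the `nhs_number` field name and the 0.5 default threshold are hypothetical stand-ins for the deterministic and probabilistic steps. The `match_rate` helper only reproduces the percentage arithmetic used in the table.

```python
from typing import Callable, Optional


def nhs_number_is_valid(nhs_number: str) -> bool:
    """Modulus 11 check-digit test for a 10-digit NHS number."""
    if len(nhs_number) != 10 or not (nhs_number.isascii() and nhs_number.isdigit()):
        return False
    # Weights 10 down to 2 are applied to the first nine digits.
    total = sum(int(d) * w for d, w in zip(nhs_number[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number is invalid.
    return check != 10 and check == int(nhs_number[9])


def link_record(
    record: dict,
    drl_match: Callable[[dict], bool],    # hypothetical deterministic matcher
    prl_score: Callable[[dict], float],   # hypothetical probabilistic score in [0, 1]
    threshold: float = 0.5,               # assumed to mirror the 50% cut off
) -> Optional[str]:
    """Return the stage at which the record is accepted, or None if unmatched."""
    if nhs_number_is_valid(record.get("nhs_number") or ""):
        return "nhs_number"
    if drl_match(record):
        return "drl"
    if prl_score(record) >= threshold:
        return "prl"
    return None


def match_rate(matched: int, sample_size: int) -> float:
    """Percentage of the sample that was matched."""
    return 100.0 * matched / sample_size


# Percentage arithmetic for the final PEDW row of the table.
print(round(match_rate(288_683, 290_650), 2))
```

Each stage is tried only when the earlier ones fail, which is why the counts in the table are cumulative from one row to the next.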