Model: Any females

| Parameter | Example inputs | Distance function | Coefficient | p |
|---|---|---|---|---|
| Constant |  |  | -3.33 | <1 × 10⁻¹⁶ |
| Date of birth, day | 02, 24 | Levenshtein | 0.25 | 5 × 10⁻⁴ |
| Date of birth, month | 01, 11 | Levenshtein | 0.43 | 1 × 10⁻⁴ |
| Date of birth, year | 1969, 2007 | Levenshtein | 5.12 | <1 × 10⁻¹⁶ |
| Forename | John, Chris | Jaro-Winkler | 2.45 | <1 × 10⁻¹⁶ |
| Hospital Number | 110111 or 223456 | Jaro-Winkler | -0.56 | <1 × 10⁻¹⁶ |
| Surname | Smith, Jones | Jaro-Winkler | not present | - |
| Sex | M, F | Levenshtein | 0.80 | <1 × 10⁻¹⁶ |

Model: No females

| Parameter | Example inputs |
|---|---|
| Constant |  |
| Date of birth, day | 02, 24 |
| Date of birth, month | 01, 11 |
| Date of birth, year | 1969, 2007 |
| Forename | John, Chris |
| Hospital Number | 110111 or 223456 |
| Surname | Smith, Jones |
| Sex | M, F |

- A random sample of 25,000 clusters was obtained after initial record linkage. Using a series of rules, these clusters were divided into those thought to represent a single individual ('good') and all others ('uncertain'). The uncertain records were not used in model generation. Good clusters were then combined at random to create a new set of clusters ('bad'). Maximal distances were computed by pairwise comparison of good and bad clusters, and a logistic model of bad versus good cluster status was fitted separately for clusters without females and for clusters including at least one record identified as being from a female, with backwards selection based on AIC. In the female model, surname was omitted; in the non-female model, the Sex field has only one level and was therefore omitted. One fitted model is shown; very similar estimates were obtained from a large number of other builds with different random samples. p refers to the null hypothesis that the coefficient is zero.
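
The scoring step described in the note can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: "pairwise comparison" is read as comparing every pair of records within a cluster, raw Levenshtein distances enter the model unscaled, and the Jaro-Winkler fields are left out, which is equivalent to assuming they are identical across records (zero distance). The field names (`dob_day` and so on), the function names, and the example cluster are hypothetical; only the constant and the three date-of-birth coefficients are taken from the table.

```python
import math
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def max_pairwise_distance(values, distance=levenshtein):
    """Largest distance between any two values of one field in a cluster."""
    return max((distance(x, y) for x, y in combinations(values, 2)), default=0)


def bad_cluster_probability(distances, coefficients, constant):
    """Logistic model: P(bad) = 1 / (1 + exp(-(constant + sum(coef * dist))))."""
    linear = constant + sum(coefficients[f] * d for f, d in distances.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical cluster of three records (illustrative field names).
cluster = [
    {"dob_day": "02", "dob_month": "01", "dob_year": "1969"},
    {"dob_day": "02", "dob_month": "01", "dob_year": "1969"},
    {"dob_day": "24", "dob_month": "01", "dob_year": "1969"},
]

# Constant and date-of-birth coefficients from the table; all other fields
# are assumed identical across records, so they contribute nothing here.
CONSTANT = -3.33
COEFFICIENTS = {"dob_day": 0.25, "dob_month": 0.43, "dob_year": 5.12}

distances = {
    field: max_pairwise_distance([record[field] for record in cluster])
    for field in COEFFICIENTS
}
probability = bad_cluster_probability(distances, COEFFICIENTS, CONSTANT)
print(distances, round(probability, 3))
```

Under these assumptions, a cluster whose records agree on every field has a linear predictor equal to the constant alone, and each unit of maximal distance in a field shifts the log-odds of 'bad' status by that field's coefficient.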