Table 2 Identifier cleaning

From: An efficient record linkage scheme using graphical analysis for identifier error detection

All fields:
Convert to uppercase.
Delete blanks (e.g. whitespace).
Forename & Surname:
Remove forenames containing baby/infant/twins, or synonyms.
Remove all symbols, e.g. '.
Delete records matching internal hospital test individuals.
Remove non-alphabetic values.
Remove values containing only one letter.
Swap forename and surname if the stated forename does not occur as a forename in any patient administration record.
Sex:
Remove any value other than M, F or U (male, female or unknown, respectively).
Hospital numbers:
Remove check digits.
Remove out-of-range numeric values.
Delete, along with all other identifiers, if the patient is from a Genitourinary Medicine clinic or from the Occupational Health Department.
NHS numbers:
Delete out-of-range values.
Delete values not conforming to the check digit requirement described at:
http://www.datadictionary.nhs.uk/data_dictionary/attributes/n/nhs_number_de.asp
Birthdate & Deathdate:
Convert to SQL date format.
Remove dates before 1860-01-01.
Remove dates in the future.
  1. The steps taken in cleaning data items are described.
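
Several of the cleaning steps in the table can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the placeholder forename list, and the day/month/year input date format are all hypothetical. The NHS number check follows the standard Modulus 11 check digit scheme. Hospital number cleaning and the forename/surname swap are omitted because they depend on local number formats and on a lookup against patient administration records.

```python
import re
from datetime import date, datetime

# Hypothetical placeholder list; the table also mentions unspecified synonyms.
# "TWIN" also matches "TWINS".
PLACEHOLDER_FORENAMES = ("BABY", "INFANT", "TWIN")
EARLIEST_DATE = date(1860, 1, 1)


def normalise(value):
    """All fields: convert to uppercase and delete whitespace."""
    return re.sub(r"\s+", "", value.upper())


def clean_name(value, is_forename=False):
    """Return the cleaned name, or None if the value should be removed."""
    # Remove symbols such as apostrophes, keeping letters and digits.
    name = "".join(ch for ch in normalise(value) if ch.isalnum())
    if not name.isalpha():      # non-alphabetic (or empty) value
        return None
    if len(name) == 1:          # only one letter
        return None
    if is_forename and any(p in name for p in PLACEHOLDER_FORENAMES):
        return None             # crude substring match on placeholder names
    return name


def clean_sex(value):
    """Keep only M, F or U."""
    sex = normalise(value)
    return sex if sex in ("M", "F", "U") else None


def nhs_number_valid(value):
    """Check a 10-digit NHS number against its Modulus 11 check digit."""
    digits = normalise(value)
    if not re.fullmatch(r"[0-9]{10}", digits):
        return False
    # Weight the first nine digits by 10, 9, ..., 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    if check == 10:             # no valid check digit exists
        return False
    return check == int(digits[9])


def clean_date(value, today=None):
    """Convert DD/MM/YYYY (assumed input format) to YYYY-MM-DD, or None."""
    today = today or date.today()
    try:
        parsed = datetime.strptime(value.strip(), "%d/%m/%Y").date()
    except ValueError:
        return None
    if parsed < EARLIEST_DATE or parsed > today:
        return None
    return parsed.isoformat()
```

Each function returns None (or False for the NHS number check) when the value should be removed, so a caller can blank the corresponding identifier without discarding the whole record.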