From: Clinical records anonymisation and text extraction (CRATE): an open-source software system
Method | Source information | Example of case-insensitive regular expression(s) for scrubbing |
---|---|---|
Words | John Al’Rahem | ► \bJohn\b ► \bRahem\b |
Phrase | 4 Privet Drive | ► \b4\W+Privet\W+Drive\b |
Number | (01223) 123456 | ► (?<!\d)0\W*1\W*2\W*2\W*3\W*1\W*2\W*3\W*4\W*5\W*6(?!\d) |
Alphanumeric code | CB12 3DE | ► \bC\W*B\W*1\W*2\W*3\W*D\W*E\b |
Date | 31 Dec 2016 | ► 0*31(?:st|nd|rd|th)?\W*(?:0*12|Dec(?:ember)?)\W*(?:20)?16 ► (?:0*12|Dec(?:ember)?)\W*0*31(?:st|nd|rd|th)?\W*(?:20)?16 ► (?:20)?16\W*(?:0*12|Dec(?:ember)?)\W*0*31(?:st|nd|rd|th)? |
Nonspecific: 10-digit numbers | – | ► (?<!\d)[0-9][ \t]*[0-9][ \t]*[0-9][ \t]*[0-9][ \t]*[0-9][ \t]*[0-9][ \t]*[0-9][ \t]*[0-9][ \t]*[0-9][ \t]*[0-9](?!\d) |
Nonspecific: UK postcodes | – | ► \b[A-Z][0-9]\s*[0-9][A-Z][A-Z]\b ► \b[A-Z][0-9][0-9]\s*[0-9][A-Z][A-Z]\b ► \b[A-Z][A-Z][0-9]\s*[0-9][A-Z][A-Z]\b ► \b[A-Z][A-Z][0-9][0-9]\s*[0-9][A-Z][A-Z]\b ► \b[A-Z][0-9][A-Z]\s*[0-9][A-Z][A-Z]\b ► \b[A-Z][A-Z][0-9][A-Z]\s*[0-9][A-Z][A-Z]\b |
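
The phrase, number, alphanumeric-code and date rows above can be reproduced with a short Python sketch. This is an illustration only, not CRATE's own code: the function names (`phrase_pattern`, `number_pattern`, `code_pattern`, `date_patterns`, `scrub`), the `[XXX]` replacement string and the hard-coded English month list are all assumptions made for the example.

```python
import datetime
import re

# Hypothetical helper data: English month names, hard-coded to avoid locale issues.
MONTHS = ("January February March April May June July "
          "August September October November December").split()


def phrase_pattern(phrase: str) -> str:
    """Words in order, separated by one or more non-word characters."""
    words = re.findall(r"\w+", phrase)
    return r"\b" + r"\W+".join(re.escape(w) for w in words) + r"\b"


def number_pattern(number: str) -> str:
    """Digits in order, optionally separated, not embedded in a longer number."""
    digits = re.findall(r"\d", number)
    return r"(?<!\d)" + r"\W*".join(digits) + r"(?!\d)"


def code_pattern(code: str) -> str:
    """Alphanumeric characters in order, optionally separated by non-word characters."""
    chars = re.findall(r"\w", code)
    return r"\b" + r"\W*".join(re.escape(c) for c in chars) + r"\b"


def date_patterns(d: datetime.date) -> list:
    """Day-month-year, month-day-year and year-month-day variants."""
    day = rf"0*{d.day}(?:st|nd|rd|th)?"
    name = MONTHS[d.month - 1]
    abbr, rest = name[:3], name[3:]
    month_name = abbr + (f"(?:{rest})?" if rest else "")
    month = rf"(?:0*{d.month}|{month_name})"
    century, yy = divmod(d.year, 100)
    year = rf"(?:{century})?{yy:02d}"
    orders = ((day, month, year), (month, day, year), (year, month, day))
    return [r"\W*".join(parts) for parts in orders]


def scrub(text: str, patterns, replacement: str = "[XXX]") -> str:
    """Apply every pattern case-insensitively, as the table header specifies."""
    for pattern in patterns:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


patterns = [
    phrase_pattern("4 Privet Drive"),
    number_pattern("(01223) 123456"),
    code_pattern("CB12 3DE"),
    *date_patterns(datetime.date(2016, 12, 31)),
]
note = "Lives at 4, privet  drive; tel 01223-123 456; postcode cb123de."
print(scrub(note, patterns))
# Lives at [XXX]; tel [XXX]; postcode [XXX].
```

Because the separators between characters are optional (`\W*`) or flexible (`\W+`), the same source value is caught even when the free text spaces, punctuates or capitalises it differently, while the `(?<!\d)` and `(?!\d)` assertions stop a phone number from matching inside a longer digit string.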