
Table 6 Examples of single variable changes in predictive ability for individual cleaning techniques in hospital admission data

From: The effect of data cleaning on record linkage quality

Hospital admissions data

   
 

Each cleaned-variable value is followed by its percentage difference from the original variable.

| Variable | Precision | Recall | F-measure |
|---|---|---|---|
| Given name original | 0.006575 | 0.946085 | 0.013059 |
| Given name with removed punctuation | 0.006573 ↓0.03% (b) | 0.947188 ↑0.11% | 0.013056 ↓0.02% |
| Given name with nicknames removed | 0.004357 ↓33.7% | 0.953738 ↑0.81% | 0.008675 ↓33.5% |
| Surname original | 0.025265 | 0.98824 | 0.049271 |
| Soundex of surname | 0.008845 ↓65% | 0.994926 ↑0.67% | 0.017533 ↓64.4% |
| Address original | 0.687066 | 0.669649 | 0.678246 |
| Address with alternate missing values and uninformative values removed | 0.687398 ↑0.05% | 0.709426 ↑5.9% | 0.698238 ↑2.9% |

b Down arrow symbol (↓) refers to decreased percentage change, up arrow (↑) refers to increased percentage change.
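
The F-measure figures in the table are numerically consistent with the harmonic mean of precision and recall, and the percentage figures with a relative change against the original variable. The sketch below reproduces two rows under those assumptions. The function names `f_measure` and `percent_change` are illustrative and not taken from the article, and because the published inputs are rounded, recomputed percentages can differ from the table in the last digit.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def percent_change(original, cleaned):
    """Relative change of the cleaned value against the original, in percent."""
    return (cleaned - original) / original * 100


# Values copied from the table rows above.
f_given_name = f_measure(0.006575, 0.946085)   # Given name original
f_address = f_measure(0.687066, 0.669649)      # Address original

# Soundex of surname versus surname original, precision column.
soundex_precision_change = percent_change(0.025265, 0.008845)

print(f"Given name F-measure: {f_given_name:.6f}")
print(f"Address F-measure:    {f_address:.6f}")
print(f"Soundex precision change: {soundex_precision_change:.1f}%")
```

A negative result from `percent_change` corresponds to a down arrow in the table, a positive result to an up arrow.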