Table 3 Keyed hash bigram subset communication volume. Overhead for various minimum bigram score thresholds compared to the unencrypted communication of the original values. The average (unencrypted) surname and suburb name lengths were 6.4 and 9.3 characters, giving rise to an average of 166 and 2,521 bigram subsets respectively. A total of 2,323,355 records were processed.

From: Some methods for blindfolded record linkage

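| Minimum bigram score threshold | Surnames: megabytes communicated | Surnames: overhead | Suburb names: megabytes communicated | Suburb names: overhead |
|---|---|---|---|---|
| 0 | 7540 | 520 | 114399 | 5435 |
| 0.1 | 7540 | 520 | 114384 | 5435 |
| 0.2 | 7484 | 516 | 113787 | 5406 |
| 0.3 | 7132 | 492 | 109679 | 5211 |
| 0.4 | 6287 | 434 | 97300 | 4623 |
| 0.5 | 4238 | 292 | 63892 | 3036 |
| 0.6 | 2836 | 196 | 35091 | 1667 |
| 0.7 | 1242 | 86 | 12154 | 577 |
| 0.8 | 511 | 35 | 3056 | 145 |
| 0.9 | 185.0 | 12.7 | 442.4 | 21.0 |
| 1 | 45.4 | 3.13 | 45.4 | 2.16 |
| Original values | 14.5 | 1 | 21.1 | 1 |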
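
The overhead columns appear to be the communicated volume divided by the unencrypted volume in the "Original values" row. The following is a minimal sketch of that arithmetic, assuming this reading of the caption is correct. The names `VOLUME_MB`, `ORIGINAL_MB` and `overhead` are hypothetical and chosen only for illustration, and only a few rows of the table are copied in.

```python
# Megabytes communicated, copied from Table 3:
# threshold -> (surnames, suburb names). Only a few rows are included.
VOLUME_MB = {
    0.0: (7540, 114399),
    0.5: (4238, 63892),
    0.9: (185.0, 442.4),
    1.0: (45.4, 45.4),
}

# Unencrypted baseline from the "Original values" row (surnames, suburb names).
ORIGINAL_MB = (14.5, 21.1)


def overhead(volume_mb, original_mb):
    """Return the communicated volume as a multiple of the unencrypted volume.

    Assumption: this ratio is how the table's overhead columns were derived.
    """
    return volume_mb / original_mb


for threshold, (surname_mb, suburb_mb) in VOLUME_MB.items():
    surname_ratio = overhead(surname_mb, ORIGINAL_MB[0])
    suburb_ratio = overhead(suburb_mb, ORIGINAL_MB[1])
    print(f"{threshold:.1f}  {surname_ratio:10.2f}  {suburb_ratio:10.2f}")
```

Ratios recomputed this way match the published overhead values only approximately, most visibly for suburb names, presumably because the megabyte figures in the table are themselves rounded.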