
Table 3. Keyed-hash bigram subset communication volume. Communication overhead for various minimum bigram score thresholds, relative to sending the original (unencrypted) values. The average (unencrypted) surname and suburb name lengths were 6.4 and 9.3 characters, giving rise to an average of 166 and 2,521 bigram subsets per value, respectively. A total of 2,323,355 records were processed.

From: Some methods for blindfolded record linkage

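| Minimum bigram score threshold | Surnames: megabytes communicated | Surnames: overhead (× original) | Suburb names: megabytes communicated | Suburb names: overhead (× original) |
|---|---|---|---|---|
| 0 | 7540 | 520 | 114399 | 5435 |
| 0.1 | 7540 | 520 | 114384 | 5435 |
| 0.2 | 7484 | 516 | 113787 | 5406 |
| 0.3 | 7132 | 492 | 109679 | 5211 |
| 0.4 | 6287 | 434 | 97300 | 4623 |
| 0.5 | 4238 | 292 | 63892 | 3036 |
| 0.6 | 2836 | 196 | 35091 | 1667 |
| 0.7 | 1242 | 86 | 12154 | 577 |
| 0.8 | 511 | 35 | 3056 | 145 |
| 0.9 | 185.0 | 12.7 | 442.4 | 21.0 |
| 1 | 45.4 | 3.13 | 45.4 | 2.16 |
| Original values | 14.5 | 1 | 21.1 | 1 |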
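The volumes above grow with the number of bigram subsets sent per value, which is roughly exponential in name length (a value with n distinct bigrams has 2^n - 1 non-empty subsets), and the minimum score threshold prunes the smaller subsets. The following is a minimal sketch of that pruning, not the paper's method. It rests on several assumptions not stated in the table: bigrams are lower-cased adjacent character pairs with no padding, the "bigram score" of a subset is its Dice coefficient against the full bigram list, a subset is kept when its score is at least the threshold, and HMAC-SHA256 stands in for whatever keyed hash was actually used. The function names are hypothetical.

```python
import hashlib
import hmac
from itertools import combinations


def bigrams(value: str) -> list[str]:
    """Sorted list of adjacent character pairs (assumption: no padding)."""
    s = value.lower()
    return sorted(s[i:i + 2] for i in range(len(s) - 1))


def dice(subset_size: int, full_size: int) -> float:
    """Assumed bigram score: Dice coefficient of a sub-list against the full list."""
    return 2.0 * subset_size / (subset_size + full_size)


def keyed_subset_hashes(value: str, key: bytes, min_score: float) -> list[str]:
    """HMAC-SHA256 hex digest of each distinct non-empty bigram sub-list
    whose assumed score is >= min_score."""
    full = bigrams(value)
    n = len(full)
    seen = set()
    digests = []
    for k in range(1, n + 1):
        if dice(k, n) < min_score:
            continue  # every sub-list of size k falls below the threshold
        for combo in combinations(full, k):
            if combo in seen:
                continue  # repeated bigrams can produce identical sub-lists
            seen.add(combo)
            msg = " ".join(combo).encode("utf-8")
            digests.append(hmac.new(key, msg, hashlib.sha256).hexdigest())
    return digests


def overhead(mb_communicated: float, mb_original: float) -> float:
    """Communication overhead relative to sending the original values."""
    return mb_communicated / mb_original
```

For example, "peter" has four distinct bigrams, so under these assumptions `keyed_subset_hashes("peter", b"shared-secret", t)` yields 15 digests at t = 0, 11 at t = 0.5, 5 at t = 0.7 and only the full list at t = 1. Under the assumed Dice score, sub-lists of half the full size (the most numerous tier) score about 0.67, which would be consistent with the steep fall in volume between thresholds 0.5 and 0.8 in the table. `overhead(7540, 14.5)` reproduces the 520 shown for surnames at threshold 0.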