Table 10 Results of windowing variants (Dataset-A and Dataset-C)

From: A proficient cost reduction framework for de-duplication of records in data integration

| Measure : Variant | Dataset-A, w=3 | Dataset-A, w=6 | Dataset-A, w=12 | Dataset-A, w=21 | Dataset-A, w=30 | Dataset-C, w=3 | Dataset-C, w=6 | Dataset-C, w=12 | Dataset-C, w=21 | Dataset-C, w=30 |
|---|---|---|---|---|---|---|---|---|---|---|
| Record Pairs : SKW-SDX | 9551 | 17151 | 32679 | 54322 | 75768 | 8476 | 14777 | 27629 | 47081 | 65030 |
| Matches : SKW-SDX | 469 | 478 | 481 | 485 | 486 | 764 | 864 | 918 | 965 | 978 |
| F-Score : SKW-SDX | 0.959 | 0.961 | 0.948 | 0.929 | 0.906 | 0.832 | 0.886 | 0.904 | 0.908 | 0.895 |
| Record Pairs : SKW-SB4 | 10271 | 18591 | 34882 | 58769 | 81591 | 9808 | 16380 | 30259 | 50809 | 70322 |
| Matches : SKW-SB4 | 479 | 482 | 483 | 484 | 484 | 900 | 949 | 967 | 979 | 981 |
| F-Score : SKW-SB4 | 0.969 | 0.963 | 0.948 | 0.923 | 0.898 | 0.910 | 0.930 | 0.926 | 0.911 | 0.891 |
| Record Pairs : CKW-SDX | 3539 | 7186 | 14437 | 24966 | 35314 | 3342 | 6624 | 13033 | 22469 | 31713 |
| Matches : CKW-SDX | 469 | 477 | 482 | 488 | 490 | 519 | 662 | 783 | 862 | 912 |
| F-Score : CKW-SDX | 0.965 | 0.970 | 0.968 | 0.963 | 0.954 | 0.656 | 0.765 | 0.840 | 0.878 | 0.897 |
| Record Pairs : CKW-SB4 | 3651 | 7430 | 14884 | 25905 | 36655 | 3803 | 7409 | 14240 | 24426 | 34359 |
| Matches : CKW-SB4 | 487 | 488 | 491 | 491 | 492 | 750 | 863 | 922 | 955 | 969 |
| F-Score : CKW-SB4 | 0.983 | 0.981 | 0.976 | 0.965 | 0.954 | 0.826 | 0.892 | 0.918 | 0.925 | 0.923 |
| Record Pairs : MPW-SDX | 13858 | 25968 | 49722 | 82615 | 114045 | 12158 | 21982 | 41884 | 70643 | 96960 |
| Matches : MPW-SDX | 496 | 496 | 496 | 496 | 496 | 889 | 976 | 1015 | 1032 | 1034 |
| F-Score : MPW-SDX | 0.982 | 0.970 | 0.944 | 0.907 | 0.868 | 0.902 | 0.938 | 0.936 | 0.912 | 0.883 |
| Record Pairs : MPW-SB4 | 15614 | 29191 | 54977 | 91679 | 125521 | 14434 | 25208 | 46783 | 78219 | 107261 |
| Matches : MPW-SB4 | 494 | 494 | 494 | 494 | 494 | 1022 | 1031 | 1032 | 1034 | 1035 |
| F-Score : MPW-SB4 | 0.978 | 0.964 | 0.936 | 0.894 | 0.852 | 0.968 | 0.961 | 0.939 | 0.905 | 0.870 |

w = window size.