From: A proficient cost reduction framework for de-duplication of records in data integration
Dataset | Dataset-A | Dataset-A | Dataset-A | Dataset-A | Dataset-A | Dataset-C | Dataset-C | Dataset-C | Dataset-C | Dataset-C |
---|---|---|---|---|---|---|---|---|---|---|
Window size | 3 | 6 | 12 | 21 | 30 | 3 | 6 | 12 | 21 | 30 |
Record Pairs : SKW-SDX | 9551 | 17151 | 32679 | 54322 | 75768 | 8476 | 14777 | 27629 | 47081 | 65030 |
Matches : SKW-SDX | 469 | 478 | 481 | 485 | 486 | 764 | 864 | 918 | 965 | 978 |
F-Score : SKW-SDX | 0.959 | 0.961 | 0.948 | 0.929 | 0.906 | 0.832 | 0.886 | 0.904 | 0.908 | 0.895 |
Record Pairs : SKW-SB4 | 10271 | 18591 | 34882 | 58769 | 81591 | 9808 | 16380 | 30259 | 50809 | 70322 |
Matches : SKW-SB4 | 479 | 482 | 483 | 484 | 484 | 900 | 949 | 967 | 979 | 981 |
F-Score : SKW-SB4 | 0.969 | 0.963 | 0.948 | 0.923 | 0.898 | 0.910 | 0.930 | 0.926 | 0.911 | 0.891 |
Record Pairs : CKW-SDX | 3539 | 7186 | 14437 | 24966 | 35314 | 3342 | 6624 | 13033 | 22469 | 31713 |
Matches : CKW-SDX | 469 | 477 | 482 | 488 | 490 | 519 | 662 | 783 | 862 | 912 |
F-Score : CKW-SDX | 0.965 | 0.970 | 0.968 | 0.963 | 0.954 | 0.656 | 0.765 | 0.840 | 0.878 | 0.897 |
Record Pairs : CKW-SB4 | 3651 | 7430 | 14884 | 25905 | 36655 | 3803 | 7409 | 14240 | 24426 | 34359 |
Matches : CKW-SB4 | 487 | 488 | 491 | 491 | 492 | 750 | 863 | 922 | 955 | 969 |
F-Score : CKW-SB4 | 0.983 | 0.981 | 0.976 | 0.965 | 0.954 | 0.826 | 0.892 | 0.918 | 0.925 | 0.923 |
Record Pairs : MPW-SDX | 13858 | 25968 | 49722 | 82615 | 114045 | 12158 | 21982 | 41884 | 70643 | 96960 |
Matches : MPW-SDX | 496 | 496 | 496 | 496 | 496 | 889 | 976 | 1015 | 1032 | 1034 |
F-Score : MPW-SDX | 0.982 | 0.970 | 0.944 | 0.907 | 0.868 | 0.902 | 0.938 | 0.936 | 0.912 | 0.883 |
Record Pairs : MPW-SB4 | 15614 | 29191 | 54977 | 91679 | 125521 | 14434 | 25208 | 46783 | 78219 | 107261 |
Matches : MPW-SB4 | 494 | 494 | 494 | 494 | 494 | 1022 | 1031 | 1032 | 1034 | 1035 |
F-Score : MPW-SB4 | 0.978 | 0.964 | 0.936 | 0.894 | 0.852 | 0.968 | 0.961 | 0.939 | 0.905 | 0.870 |