Table 2. A comparison of the most common values of each field in the synthetic data and in the original data it was based on

From: The effect of data cleaning on record linkage quality

Surname (top 5)          Synthetic (%)  Original (%)   Male forename (top 5)   Synthetic (%)  Original (%)
Missing value            1.98                          Missing value           1.99
Smith                    0.92           0.94           John                    3.44           3.47
Jones                    0.55           0.55           David                   3.09           3.09
Brown                    0.46           0.46           Michael                 2.95           2.95
Williams                 0.46           0.46           Peter                   2.87           2.88
Taylor                   0.44           0.44           Robert                  2.47           2.47

Female forename (top 5)  Synthetic (%)  Original (%)   Postcode (top 5)        Synthetic (%)  Original (%)
Missing value            1.99                          Missing value           1.01
Margaret                 1.57           1.56           6210                    2.84           2.84
Susan                    1.35           1.34           6163                    2.33           2.34
Patricia                 1.22           1.22           6027                    2.06           2.05
Jennifer                 1.19           1.20           6155                    2.02           2.02
Elizabeth                1.05           1.05           6065                    2.00           1.98
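Percentages like those in Table 2 come from a simple frequency count per field. The Python sketch below is illustrative only. The function names `top_k_share` and `largest_gap` are hypothetical. It assumes that missing values are stored as `None` or empty strings, and that percentages are computed over all records with missing ones included. The source does not say which denominator was used. The sketch also transcribes the table's own figures so you can check how closely the synthetic data tracks the original.

```python
from collections import Counter


def top_k_share(values, k=5):
    """Return (missing %, [(value, %), ...]) for the k most common non-missing values.

    Assumptions (not stated in the source): None or "" marks a missing value,
    and percentages use all records, missing ones included, as the denominator.
    """
    n = len(values)
    missing = sum(1 for v in values if v is None or v == "")
    counts = Counter(v for v in values if v is not None and v != "")
    top = [(v, round(100 * c / n, 2)) for v, c in counts.most_common(k)]
    return round(100 * missing / n, 2), top


# (synthetic %, original %) pairs transcribed from Table 2; missing-value rows omitted.
TABLE_2 = {
    "Surname": {"Smith": (0.92, 0.94), "Jones": (0.55, 0.55), "Brown": (0.46, 0.46),
                "Williams": (0.46, 0.46), "Taylor": (0.44, 0.44)},
    "Male forename": {"John": (3.44, 3.47), "David": (3.09, 3.09), "Michael": (2.95, 2.95),
                      "Peter": (2.87, 2.88), "Robert": (2.47, 2.47)},
    "Female forename": {"Margaret": (1.57, 1.56), "Susan": (1.35, 1.34), "Patricia": (1.22, 1.22),
                        "Jennifer": (1.19, 1.20), "Elizabeth": (1.05, 1.05)},
    "Postcode": {"6210": (2.84, 2.84), "6163": (2.33, 2.34), "6027": (2.06, 2.05),
                 "6155": (2.02, 2.02), "6065": (2.00, 1.98)},
}


def largest_gap(table):
    """Return (field, value, gap in percentage points) with the largest |synthetic - original|."""
    rows = ((field, value, pair) for field, values in table.items() for value, pair in values.items())
    field, value, (syn, orig) = max(rows, key=lambda r: abs(r[2][0] - r[2][1]))
    return field, value, round(abs(syn - orig), 2)


print(top_k_share(["Smith", "Jones", None, "Smith", "Brown"], k=2))  # (20.0, [('Smith', 40.0), ('Jones', 20.0)])
print(largest_gap(TABLE_2))  # ('Male forename', 'John', 0.03)
```

On the transcribed figures, the largest gap is 0.03 percentage points (John, 3.44 versus 3.47 per cent). Every other listed value differs by at most 0.02.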