Table 1 Partition characteristics

From: Creating a medical dictionary using word alignment: The influence of sources and resources

| Partition | Content | Word correlation | Character correlation | Word ratio difference | Rubrics | English rubric average number (standard deviation) of words | Swedish rubric average number (standard deviation) of words | English unique words | Swedish unique words | English unique words per rubric | Swedish unique words per rubric |
|---|---|---|---|---|---|---|---|---|---|---|---|
| All | All terminology systems | 0.78 | 0.79 | | 38,575 | 3.7 (2.9) | 3.3 (3.0) | 17,679 | 25,848 | 0.5 | 0.7 |
| 1 | MeSH, one word in either English or Swedish rubric | | | 0.56 | 13,514 | 1.5 (0.7) | 1.0 (0.1) | 11,267 | 13,581 | 0.8 | 1.0 |
| 2 | MeSH, more than one word in both English and Swedish rubrics | 0.52 | 0.71 | 0.30 | 5,568 | 2.6 (0.8) | 2.3 (0.7) | 5,434 | 6,443 | 1.0 | 1.2 |
| 3 | ICF, whole | 0.69 | 0.79 | 0.53 | 1,496 | 4.7 (2.5) | 4.2 (2.8) | 991 | 1,263 | 0.7 | 0.8 |
| 4 | KSH97-P, whole | 0.70 | 0.67 | 0.49 | 968 | 4.0 (2.5) | 3.5 (2.4) | 1,324 | 1,382 | 1.4 | 1.4 |
| 5 | ICD-10, except chapter 2 level 4 | 0.77 | 0.75 | 0.37 | 10,791 | 5.2 (3.0) | 5.2 (3.4) | 5,144 | 7,219 | 0.5 | 0.7 |
| 6 | NCSP, except chapter N | 0.64 | 0.63 | 0.38 | 4,137 | 5.8 (2.7) | 5.0 (2.5) | 1,758 | 2,347 | 0.4 | 0.6 |
| 7 | ICD-10, chapter 2 level 4 | 0.38 | 0.45 | 0.71 | 713 | 3.6 (2.2) | 6.3 (2.7) | 443 | 535 | 0.6 | 0.8 |
| 8 | NCSP, chapter N | 0.55 | 0.48 | 0.25 | 1,388 | 9.4 (2.6) | 7.7 (2.3) | 249 | 285 | 0.2 | 0.2 |
  1. Content of the partitions.
  2. Kendall's tau-b correlation between the English rubrics and the corresponding Swedish rubrics, by number of words and by number of characters, and the average absolute difference between the word ratio for all rubrics in the partition and the grand mean of the different terminology partitions.
  3. Number of parallel rubrics, average number (standard deviation) of words per rubric, number of unique words, and average number of unique words per rubric for the different terminology partitions.
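
As an illustration of two of the statistics in the table, the following is a minimal Python sketch and not the authors' implementation. It assumes that the "unique words per rubric" columns are the number of unique words divided by the number of rubrics, rounded to one decimal, which agrees with the tabulated values. The function names `kendall_tau_b` and `unique_per_rubric` and the per-rubric word counts in the usage lines are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b for two equal-length sequences, with tie correction."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: enters neither factor
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    # The product of the two factors equals
    # (pairs not tied in x) * (pairs not tied in y).
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denominator


def unique_per_rubric(unique_words, rubrics):
    """Unique words divided by number of rubrics, rounded as in the table."""
    return round(unique_words / rubrics, 1)


# Hypothetical word counts for four parallel rubric pairs (not from the paper).
english_word_counts = [1, 2, 2, 3]
swedish_word_counts = [1, 1, 2, 3]
print(kendall_tau_b(english_word_counts, swedish_word_counts))  # 0.8

# Values from the "All" row of the table: 17,679 and 25,848 unique words
# over 38,575 rubrics.
print(unique_per_rubric(17679, 38575), unique_per_rubric(25848, 38575))  # 0.5 0.7
```

The pairwise loop is quadratic in the number of rubrics, so it suits small samples only. For partitions of the size listed in the table, a library routine would be the more practical choice.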