Evaluating alignment quality between iconic language and reference terminologies using similarity metrics
© Griffon et al.; licensee BioMed Central Ltd. 2014
Received: 17 June 2013
Accepted: 26 February 2014
Published: 11 March 2014
Visualization of Concepts in Medicine (VCM) is a compositional iconic language that aims to ease information retrieval in Electronic Health Records (EHR), clinical guidelines and other medical documents. Using VCM language in medical applications requires alignment with medical reference terminologies. Alignments from the Medical Subject Headings (MeSH) thesaurus and the International Classification of Diseases, tenth revision (ICD10) to VCM are presented here. The aim of this study was to evaluate alignment quality between VCM and other terminologies using different measures of inter-alignment agreement before integration in EHR.
For medical literature retrieval purposes and EHR browsing, the MeSH thesaurus and the ICD10, both organized hierarchically, were aligned to VCM language. Some MeSH to VCM alignments were performed automatically, whereas others were performed manually and validated. ICD10 to VCM alignment was performed entirely manually. Inter-alignment agreement was assessed on ICD10 codes and MeSH descriptors sharing the same Concept Unique Identifiers in the Unified Medical Language System (UMLS). Three metrics were used to compare two VCM icons: binary comparison, crude Dice Similarity Coefficient (DSCcrude), and semantic Dice Similarity Coefficient (DSCsemantic), based on Lin similarity. An analysis of discrepancies was performed.
MeSH to VCM alignment resulted in 10,783 relations, of which 1,830 were manually performed and 8,953 were automatically inherited. ICD10 to VCM alignment led to 19,852 relations. UMLS gathered 1,887 alignments between ICD10 and MeSH, of which only 1,606 were used for this study. Using only validated MeSH to VCM alignments, inter-alignment agreement was 74.2% (95% CI 70.5-78.0), DSCcrude was 0.93 (95% CI 0.91-0.94), and DSCsemantic was 0.96 (95% CI 0.95-0.96). Discrepancy analysis revealed that, although two thirds of errors came from the reviewers, UMLS was responsible for the remaining third.
This study has shown strong overall inter-alignment agreement between MeSH to VCM and ICD10 to VCM manual alignments. VCM icons have now been integrated into a guideline search engine (http://www.cismef.org) and a health terminologies portal (http://www.hetop.eu).
Keywords: Terminology as topic; International classification of diseases; Medical subject headings; Vocabulary, Controlled; Alignment; Iconic language; Compositional language; Semantic distances; Inter-alignment agreement
Finding pertinent medical information in a complex Electronic Health Record (EHR) [1, 2] or inside guidelines is a time-consuming task for physicians. Visualization of Concepts in Medicine (VCM) is a compositional iconic language created by Lamy et al. to ease this burden. VCM language has previously been used in a graphical interface for accessing drug knowledge, giving physicians faster access to drug knowledge than a textual interface, with fewer errors. VCM can represent various signs, diseases, physiological states, risks, antecedents, drug and non-drug treatments, laboratory tests, and medical follow-up procedures by combining a small number of graphical primitives: colors, shapes and pictograms. For instance, the icon symbolizing “renal failure” is composed of a “kidney” pictogram, a downward arrow representing “diminished function”, and a red color standing for “current patient status”. VCM does not aim to achieve the same level of detail as natural language texts, but rather to convey a coarser, more general level of detail. VCM icons can be used in medical applications for visually filtering information or for graphical summary. VCM has been implemented by Vidal®, the leader in drug databases in France, in its online guidelines (note a), and it is used by the Sherbrooke Health Expertise Center for e-learning.
To allow this, the terminology used in the medical application has to be aligned to VCM language, i.e. each concept of the terminology has to be aligned to one or more VCM icons. For example, associating VCM icons with patient conditions coded in an EHR with the tenth revision of the International Classification of Diseases (ICD10) requires an iconic representation of each ICD10 code using VCM language. These alignments may also ease indexing and information retrieval, EHR visualization, and the reading of Summaries of Product Characteristics.
Alignment errors could lead to false display in the medical application and, possibly, to medical error. It is therefore important to limit these errors. The subjectivity of alignment makes quality evaluation difficult and time-consuming. A potential method for performing such evaluation is inter-alignment agreement, as in indexing. Several similarity metrics may be used to compare two alignments: icon comparison (are two icons identical?), elementary comparisons (is each compositional element of two icons identical?) and semantic comparison (do two icons share the same meaning?).
This study presents the alignment of two commonly used terminologies, ICD10 and Medical Subject Headings (MeSH), to VCM. The aim of this work was to evaluate alignment quality before integrating VCM in EHR. Based on a small proportion of MeSH to VCM alignments that had already been manually validated, three inter-alignment consistency measures were used: crude concordance and two measures based on the Dice index, with or without semantics.
VCM iconic language (v2.07)
In this study, two reference terminologies were aligned to VCM iconic language: MeSH Thesaurus of the US National Library of Medicine (NLM) in its 2011 version, mostly used for indexing and information retrieval of medical literature in MEDLINE, and the French version for Diagnosis Related Group of ICD10, built for mortality statistics, but frequently used to code medical visits for budget allocation. These terminologies are widely used in the health domain.
The MeSH thesaurus has two different levels. The first one is the descriptor level, which is intended for users and is the focus of this work. It consists of a “small” (n ≈ 27,000) set of terms used for indexing and information retrieval. The second one is the concept level: each MeSH descriptor is the union of one or more MeSH concepts (n ≈ 50,000; note b). The meaning of a MeSH concept may differ slightly from that of its MeSH descriptor. MeSH is a poly-hierarchic thesaurus, whereas ICD10 is a mono-hierarchic classification.
Version 2010AB of the Unified Medical Language System (UMLS) Metathesaurus was also used for this study. UMLS is an NLM project that integrates several health terminologies and ontologies. Terms belonging to different terminologies but sharing the same meaning are gathered under the same Concept Unique Identifier (CUI). ICD10 and MeSH are both integrated into the UMLS, and some of their terms share the same CUI.
Alignments between terminologies
MeSH descriptor to VCM alignment
Automatic approaches were first used to align MeSH to VCM. Natural language processing, stemming and lemmatization techniques were tried but led to disappointing results. Only 1.6% of MeSH descriptors of interest were aligned. It was therefore necessary to perform this alignment manually. This task was performed by GK, a medical librarian. It was an iterative process leading to the addition of new icons and guidelines regarding VCM use.
ICD10 to VCM alignment
NG, a public health resident, performed ICD10 to VCM alignment. Each ICD10 code was manually aligned to VCM.
Alignment between MeSH and ICD10
To compare VCM icons aligned to MeSH and VCM icons aligned to ICD10, alignment between MeSH and ICD10 was necessary. The latter was provided by UMLS, and more specifically by selecting ICD10 codes and MeSH descriptors sharing the same CUI.
Only the manual MeSH to VCM alignments had already been validated. They were used to evaluate the ICD10 to VCM alignments, which could in turn be used to validate the automatic MeSH to VCM alignments. For each alignment between MeSH and ICD10, the following information was extracted: the MeSH descriptor, the relationship between the MeSH descriptor and the VCM icon, the VCM icon aligned to the MeSH descriptor, the ICD10 code, and the VCM icon aligned to the ICD10 code. Only alignments involving a single VCM icon for both the ICD10 code and the MeSH descriptor were used, because of the difficulty of comparing more than two icons. Therefore, if an ICD10 code or a MeSH descriptor was aligned to more than one VCM icon, the alignment was discarded from the study.
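The selection step described above can be sketched as follows. This is a minimal illustration, assuming the alignments are available as plain dictionaries; the function name, the data layout, the placeholder CUIs and the icon labels are all hypothetical and do not come from UMLS, HeTOP or the VCM software.

```python
from collections import defaultdict

def pair_by_cui(mesh_cui, icd_cui, mesh_icons, icd_icons):
    """Pair MeSH descriptors and ICD10 codes that share a CUI.

    Pairs are kept only when each side is aligned to exactly one VCM icon;
    all other pairs (several icons, or none) are reported as discarded.
    """
    # Index ICD10 codes by CUI so each MeSH descriptor can look them up.
    icd_by_cui = defaultdict(list)
    for code, cui in icd_cui.items():
        icd_by_cui[cui].append(code)

    kept, discarded = [], []
    for descriptor, cui in mesh_cui.items():
        for code in icd_by_cui.get(cui, []):
            m_icons = mesh_icons.get(descriptor, [])
            i_icons = icd_icons.get(code, [])
            if len(m_icons) == 1 and len(i_icons) == 1:
                kept.append((descriptor, m_icons[0], code, i_icons[0]))
            else:
                discarded.append((descriptor, code))
    return kept, discarded

# Hypothetical toy data: CUIs and icon names are placeholders.
mesh_cui = {"Thyroiditis, Subacute": "CUI-1", "Descriptor X": "CUI-2"}
icd_cui = {"E06.1": "CUI-1", "Code Y": "CUI-2"}
mesh_icons = {"Thyroiditis, Subacute": ["icon_a"], "Descriptor X": ["icon_b", "icon_c"]}
icd_icons = {"E06.1": ["icon_d"], "Code Y": ["icon_b"]}

kept, discarded = pair_by_cui(mesh_cui, icd_cui, mesh_icons, icd_icons)
```

In this toy run, the first pair is kept for comparison and the second is discarded because its MeSH descriptor carries two icons.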
Measuring inter-alignment agreement
Concordance was defined as the proportion of alignments in which the ICD10 code icon and the MeSH descriptor icon were identical. To refine this rough measure of inter-alignment agreement, the Dice Similarity Coefficient (DSC) was used to compare icons based on their primitives. DSC is equivalent to Fleiss’ positive specific agreement and, as there are many primitives (n = 221), it also approximates the kappa coefficient [21, 22].
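$$\mathrm{DSC}_{crude}(I_1, I_2) = \frac{2 \times |Pr(I_1) \cap Pr(I_2)|}{|Pr(I_1)| + |Pr(I_2)|} \quad (1)$$
where Pr(I_j) is the set of primitives for icon I_j.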
$$sim(Pr_i, Pr_j) = \max_{Pr \in S(Pr_i, Pr_j)} \frac{2 \times \log p(Pr)}{\log p(Pr_i) + \log p(Pr_j)} \quad (2)$$
where S(Pr_i, Pr_j) represents the set of ancestor primitives shared by both Pr_i and Pr_j, “max” represents the maximum operator, and p(Pr) is the probability of finding Pr in a reference corpus (here, the probability of finding Pr as a primitive in the entire set of MeSH to VCM and ICD10 to VCM alignments). Lin similarity lies between 0 (when the only common ancestor is the root of the tree) and 1 (when Pr_i = Pr_j).
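$$\mathrm{DSC}_{semantic}(I_1, I_2) = \frac{\sum_{a=1}^{i} \max_{b} sim(Pr_{1a}, Pr_{2b}) + \sum_{b=1}^{j} \max_{a} sim(Pr_{1a}, Pr_{2b})}{i + j} \quad (3)$$
where sim is computed using equation (2), Pr_{1a} and Pr_{2b} denote the primitives of I_1 and I_2, and i and j are the numbers of primitives in I_1 and I_2, respectively.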
Example: VCM icons aligned to “Thyroiditis, subacute” (MeSH) and “E06.1 - Subacute thyroiditis” (ICD10), with the best similarities for the MeSH icon primitives and for the ICD10 icon primitives (figure content not recoverable).
For these two different icons, DSCcrude = 4/7 ≈ 0.57 and DSCsemantic = 6.05/7 ≈ 0.86.
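The three icon-level metrics can be illustrated with a short sketch. It assumes that an icon is simply a collection of primitive names and that the primitives form a single-parent hierarchy; the hierarchy, the probabilities and the two icons below are invented for illustration and are not the real VCM primitives of the thyroiditis example. Only the crude score was chosen to reproduce the 4/7 value (two shared primitives out of 3 + 4).

```python
from math import log

# Hypothetical primitive hierarchy (child -> parent) and corpus probabilities.
PARENT = {
    "red": "root", "organ": "root", "etiology": "root",
    "thyroid": "organ",
    "inflammation": "etiology", "infection": "etiology",
    "virus": "infection",
}
PROB = {
    "root": 1.0, "red": 0.5, "organ": 0.6, "etiology": 0.4,
    "thyroid": 0.02, "inflammation": 0.1, "infection": 0.2, "virus": 0.05,
}

def ancestors(node, parent):
    """Return the node itself plus all of its ancestors up to the root."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def lin(x, y, parent, prob):
    """Lin similarity: best shared ancestor, weighted by information content."""
    shared = set(ancestors(x, parent)) & set(ancestors(y, parent))
    denom = log(prob[x]) + log(prob[y])
    return max(2 * log(prob[s]) / denom for s in shared)

def dice_crude(icon_a, icon_b):
    """Dice coefficient on the raw sets of primitives."""
    a, b = set(icon_a), set(icon_b)
    return 2 * len(a & b) / (len(a) + len(b))

def dice_semantic(icon_a, icon_b, parent, prob):
    """Dice-like score where each primitive counts for its best Lin match."""
    best_a = sum(max(lin(x, y, parent, prob) for y in icon_b) for x in icon_a)
    best_b = sum(max(lin(x, y, parent, prob) for x in icon_a) for y in icon_b)
    return (best_a + best_b) / (len(icon_a) + len(icon_b))

MESH_ICON = ("red", "thyroid", "inflammation")
ICD_ICON = ("red", "thyroid", "infection", "virus")

identical = MESH_ICON == ICD_ICON                             # binary comparison
crude = dice_crude(MESH_ICON, ICD_ICON)                       # 2 * 2 / (3 + 4)
semantic = dice_semantic(MESH_ICON, ICD_ICON, PARENT, PROB)
```

In this toy case the binary comparison fails, the crude score is 4/7, and the semantic score is higher because “inflammation”, “infection” and “virus” share the hypothetical ancestor “etiology”.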
The three metrics were compared between icons according to the relationship between MeSH descriptors and VCM icons (automatic vs. manual), using Wilcoxon/Fisher tests.
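For the binary metric, the manual versus automatic comparison can be sketched with a two-sided Fisher exact test written with the standard library only. The concordant counts used below (392 of 528 and 652 of 1,078) are assumptions back-computed from the reported concordances of 74.2% and 60.5%; they are not stated in the text, and the Wilcoxon test used for the two DSC metrics is not covered here.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x concordant pairs in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_observed * (1 + 1e-9))

# Assumed counts: rows are manual / automatic, columns concordant / discordant.
p_value = fisher_exact_two_sided(392, 528 - 392, 652, 1078 - 652)
print(p_value < 1e-4)
```

Under these assumed counts the p-value falls below the 10^-4 threshold.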
A random sample of 35 discordances involving MeSH descriptors that were manually aligned to VCM was reviewed by experts (GK and NG) to assess the reasons for discordance.
Table: Number of VCM icons (N) by ICD10 code or MeSH descriptor, according to the relationship (table body not recoverable).
There were 1,887 alignments between ICD10 and MeSH using UMLS concepts. For 1,606 of them, there was one icon for the ICD10 code and one icon for the MeSH descriptor (85.1%). This study focused on these 1,606 concepts, since comparing more than two icons would have been too complex. There were 528 manual alignments and 1,078 automatic alignments between MeSH descriptors and VCM icons.
Figure 3 shows an example of disagreement between two terms sharing the same CUI: “Thyroiditis, subacute” from MeSH and “Subacute thyroiditis” from ICD10.
Table: Results from comparison of ICD10 code VCM icons and MeSH descriptor VCM icons, by MeSH to VCM relationship: total (n = 1,606), manual (n = 528) and automatic (n = 1,078). The three reported p-values were all < 10^-4 (cell values not recoverable).
Inter-alignment agreement showed a concordance of 74.2% for fully manual alignments. The results are even better using the Dice Similarity Coefficient: mean DSCcrude = 0.93 and mean DSCsemantic = 0.96. Both can be interpreted, like Cohen’s kappa, as excellent or almost perfect. The results are less satisfying with automatic alignments: concordance dropped to 60.5%, and both DSC values decreased, to 0.88 and 0.92 respectively. Discordance analysis shows that discrepancies resulted mostly from experts (60%) or UMLS (31%).
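As a rough check on the reported interval for manual concordance, a normal-approximation (Wald) confidence interval can be computed as below. Both the method and the count of 392 concordant pairs out of 528 are assumptions: the text does not state how the interval was obtained, and 392 is back-computed from the reported 74.2%.

```python
from math import sqrt

def wald_ci(successes, n, z=1.959964):
    """Proportion with its normal-approximation 95% confidence interval."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

# 392 concordant pairs is an assumed count, not a figure from the study.
p, lower, upper = wald_ci(392, 528)
summary = f"{100 * p:.1f}% [{100 * lower:.1f}-{100 * upper:.1f}]"
print(summary)  # 74.2% [70.5-78.0]
```

With these assumptions the sketch reproduces the reported 74.2% [70.5-78.0], which is consistent with a Wald interval but does not prove that this method was used.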
Comparing automatic alignment to gold standard alignment (manually created by an expert) is frequent in the literature [27, 28]. Conversely, few studies to date have compared two manually created alignments. Wieteck compared inter-alignment agreement between two nursing terminologies: the European Nursing care Pathway, which is mono-axial, and the International Classification for Nursing Practice (ICNP), which is multi-axial. Agreement was measured for each of the eight ICNP axes and ranged from 73% to 100%. This led to an estimated overall inter-alignment agreement ranging from 53% to 70% for fully manual alignment. The results presented here are better than Wieteck’s for manual alignment, especially for similarity metrics.
One explanation for these improved results could be the relatively low granularity of VCM iconic language with a maximum of six hierarchy levels, whereas the MeSH thesaurus has a maximum of 11 hierarchy levels. Nevertheless, the compositionality of VCM allows the creation of more icons than existing MeSH terms: according to VCM ontology, there are millions of coherent, consistent icons. This does not mean that each of these icons is meaningful. Today, more than 2,500 different icons have been created and linked to MeSH, ICD10, ATC or SNOMED.
Analysis of discrepancies revealed that alignment differences between VCM to ICD10 and VCM to MeSH may result from three sources: the experts, the UMLS links, and the contexts in which the terminologies are used.
Firstly, VCM to MeSH alignment was performed by a medical librarian (GK), whereas VCM to ICD10 alignment was performed by a medical resident (NG). Consequently, alignment differences could be explained by differences in education and in point of view regarding the disease. The purpose of the semantic similarity measure (DSCsemantic) is to decrease the weight of such differences.
Secondly, sharing the same UMLS CUI is sometimes questionable based on the different contexts that led to the creation of the different terminologies (e.g. medical literature for MeSH, mortality statistics for ICD10). It is often the result of UMLS CUI linking an ICD10 code and a MeSH concept with narrower meaning than the MeSH descriptor used in this study. Nevertheless, those approximate links provide results of similar quality to more regular links, i.e. when MeSH concept and MeSH descriptor have exactly the same meaning (data not shown).
Lastly, differences in alignment could be explained by the different contexts of terminology in current use (e.g. billing for ICD10, indexing and information retrieval for MeSH).
This study has potential limitations. Firstly, it was based on a rather uncommon situation, with three different coexisting manual alignments: (1) MeSH to ICD10 alignment through UMLS (same CUI), (2) VCM to MeSH alignment and (3) VCM to ICD10 alignment. VCM to MeSH alignment was performed first, and VCM to ICD10 alignment thereafter. NG was not totally blind to the VCM to MeSH alignment when performing the VCM to ICD10 alignment: in case of doubt, he was able to use HeTOP [13, 14], which had integrated VCM to MeSH alignment. Overall, the portal was used for a limited number of alignments, so this bias can be considered minimal. A second possible source of bias was the exclusion of ICD10 to MeSH alignments when more than one VCM icon was used for the MeSH descriptor or for the ICD10 code. Agreement in these cases might be lower than that observed here. However, of the 281 alignments concerned (i.e. MeSH descriptor or ICD10 code aligned to more than one VCM icon), only 42 involved an already validated MeSH to VCM alignment, i.e. a manual MeSH to VCM alignment. Assuming those 42 were all erroneous, this would have led to a concordance of 68.8%, a DSCcrude of 0.86 and a DSCsemantic of 0.89. This is still an excellent inter-alignment agreement, especially compared to the literature. Lastly, our results concerned only about 20% of MeSH diseases and 10% of ICD10. Those terms were not chosen randomly but rather based on whether they were mappable to a UMLS CUI that was also mapped to the other terminology. The remaining terms may also have some systematic characteristics, such as being more specific, with nuances that make them incomplete matches. This implies that, for those terms, alignment to VCM might require more work and more detailed icons (with more primitives), and might therefore be more prone to coder errors and show lower levels of concordance, similarity and, finally, validity. Such differences between UMLS linked and non-UMLS linked MeSH descriptors and ICD10 codes are difficult to quantify.
For research and development purposes, both alignments will be maintained in HeTOP, making VCM to MeSH available in 16 languages (e.g. Japanese and Swedish) and VCM to ICD10 in 11 languages (e.g. Arabic and Italian). However, industrial partners in the L3IM consortium (one small French company and one French subsidiary of a North American company) have a different perspective: the same medical concept should have the same VCM icon for the end-user, no matter which terminology or classification it was aligned from. Such recommendations require a considerable amount of expert validation and, probably, some changes in VCM hierarchy.
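The worst-case figures quoted above can be reproduced with simple arithmetic, under two assumptions that are not stated in the text: that 392 of the 528 manual alignments were concordant (back-computed from 74.2%), and that each of the 42 excluded alignments would score 0 on every metric.

```python
def worst_case(mean_score, n_included, n_excluded):
    """Mean score if every excluded alignment contributed a score of 0."""
    return mean_score * n_included / (n_included + n_excluded)

N_MANUAL, N_EXCLUDED = 528, 42
CONCORDANT = 392  # assumed count, back-computed from the reported 74.2%

concordance = worst_case(CONCORDANT / N_MANUAL, N_MANUAL, N_EXCLUDED)
dsc_crude = worst_case(0.93, N_MANUAL, N_EXCLUDED)
dsc_semantic = worst_case(0.96, N_MANUAL, N_EXCLUDED)

print(round(100 * concordance, 1), round(dsc_crude, 2), round(dsc_semantic, 2))
# 68.8 0.86 0.89
```

Under these assumptions the sketch gives 68.8%, 0.86 and 0.89, matching the values reported in the text.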
The high inter-alignment agreement involving already validated MeSH to VCM alignments demonstrates the validity of ICD10 to VCM alignment, allowing its use in ICD10 based EHR to summarize patient conditions, with minor modification from editors. Two companies have already shown enough interest in VCM to introduce it in their products (Silk and McKesson). VCM can therefore be considered as a sort of interface terminology, which was defined by Rosenbloom et al. as a terminology that “facilitates display of computer-stored patient information to clinician-users as simple human-readable text”.
The literature suggests that enhanced consistency between MeSH to VCM and ICD10 to VCM alignment could increase alignment validity. Therefore, finding an approach for automatic MeSH to VCM alignment that achieves consistency similar to that found for the “manual” relationship would probably facilitate validation of industrial recommendations. L3IM intends to work on such an approach using the ontological version of VCM iconic language.
This study has shown excellent overall inter-alignment semantic agreement between MeSH to VCM and ICD10 to VCM manual alignments. ICD10 to VCM alignment seems of sufficient quality to be used in medical applications.
a. See http://www.vidal.fr/recommandations/3398/diverticulose_colique/la_maladie/, for example.
b. Excluding MeSH supplementary concepts, which are not used for this study.
ATC: Anatomical therapeutic chemical classification system
CUI: Concept Unique Identifier
DSC: Dice Similarity Coefficient
EHR: Electronic Health Records
HeTOP: Health terminology/ontology portal
ICD10: International Classification of Diseases, tenth revision
ICNP: International Classification for Nursing Practice
L3IM: Iconic language and interactive user interfaces in medicine
MeSH: Medical Subject Headings
NLM: National Library of Medicine
SNOMED: Systematized NOmenclature of MEDicine
UMLS: Unified Medical Language System
VCM: Visualization of Concepts in Medicine.
This work was partially supported by a grant from the L3IM Project, funded by the French National Agency (Technologies for Health program) (ANR-08-TECS-007-02).
The authors are grateful to Nikki Sabourin-Gibbs, Rouen University Hospital, for editing the manuscript.
1. Payne TH, TenBroek AE, Fletcher GS, Labuguen MC: Transition from paper to electronic inpatient physician notes. J Am Med Inform Assoc. 2010, 17 (1): 108. 10.1197/jamia.M3173.
2. Christensen T, Grimsmo A: Instant availability of patient records, but diminished availability of patient information: a multi-method study of GP’s use of electronic patient records. BMC Med Inform Decis Mak. 2008, 8: 12. 10.1186/1472-6947-8-12.
3. Francke AL, Smit MC, de Veer AJ, Mistiaen P: Factors influencing the implementation of clinical guidelines for health care professionals: a systematic meta-review. BMC Med Inform Decis Mak. 2008, 8: 38. 10.1186/1472-6947-8-38.
4. Coumou HC, Meijman FJ: How do primary care physicians seek answers to clinical questions?. J Med Libr Assoc. 2006, 94 (1): 55-60.
5. Lamy JB, Duclos C, Bar-Hen A, Ouvrard P, Venot A: An iconic language for the graphical representation of medical concepts. BMC Med Inform Decis Mak. 2008, 8: 16. 10.1186/1472-6947-8-16.
6. Lamy JB, Venot A, Bar-Hen A, Ouvrard P, Duclos C: Design of a graphical and interactive interface for facilitating access to drug contraindications, cautions for use, interactions and adverse effects. BMC Med Inform Decis Mak. 2008, 8: 21. 10.1186/1472-6947-8-21.
7. McCulloch E, Shiri A, Nicholson D: Challenges and issues in terminology mapping: a digital library perspective. Electron Libr. 2005, 23: 671-677. 10.1108/02640470510635755.
8. Leonard LE: Inter-indexer consistency and retrieval effectiveness: measurement of relationships. PhD Thesis. 1975, University of Illinois.
9. Lamy JB, Soualmia LF, Kerdelhué G, Venot A, Duclos C: Validating the semantics of a medical iconic language using ontological reasoning. J Biomed Inform. 2013, 46 (1): 56-67. 10.1016/j.jbi.2012.08.006.
10. National Library of Medicine: Medical Subject Headings. [http://www.nlm.nih.gov/mesh/]
11. World Health Organization: International Classification of Diseases, 10th revision. [http://www.who.int/classifications/icd/en/index.html]
12. Lindberg DAB, Humphreys BL, McCray AT: The unified medical language system. Methods Inf Med. 1993, 32: 281-291.
13. CISMeF: Cross lingual multiple health-terminologies ontologies portal. [http://www.hetop.eu]
14. Grosjean J, Merabti T, Griffon N, Dahamna B, Darmoni SJ: Teaching Medicine with a Terminology/Ontology Portal. 24th European Medical Informatics Conference. 2012, Pisa, August.
15. Schmid H: Probabilistic Part-of-Speech Tagging Using Decision Trees. Proceedings of International Conference on New Methods in Language Processing. 1994, Manchester, UK: University of Manchester, 44-49.
16. Kerdelhué G, Lamy JB, Venot A, Duclos C, Darmoni SJ: An Iconic Language for the “CISMeFBonnespratiques” Website. Proceedings of the 12th European Association for Health Information and Libraries Conference (EAHIL). 2010, Lisbon.
17. CISMeF: CISMeF BP. [http://cisdev.chu-rouen.fr/servlets/CISMeFBPvcm]. Login and password are available on demand.
18. Fung KW, Bodenreider O: Utilizing the UMLS for Semantic Mapping Between Terminologies. AMIA Annual Symposium Proceedings. 2005, Austin, 266-270.
19. Dice LR: Measures of the amount of ecologic association between species. Ecology. 1945, 26: 297-302. 10.2307/1932409.
20. Fleiss JL: Measuring agreement between two judges on the presence or absence of a trait. Biometrics. 1975, 31: 651-659. 10.2307/2529549.
21. Cohen J: A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960, 20: 37-46. 10.1177/001316446002000104.
22. Hripcsak G, Rothschild AS: Agreement, the f-measure, and reliability in information retrieval. J Am Med Inform Assoc. 2005, 12: 296-298. 10.1197/jamia.M1733.
23. Lin D: An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning. 1998, San Francisco, CA: Morgan Kaufmann, 296-304.
24. Neveol A, Zeng K, Bodenreider O: Besides precision & recall: exploring alternative approaches to evaluating an automatic indexing tool for MEDLINE. AMIA Annu Symp Proc. 2006, 589-93.
25. Fleiss JL: Statistical Methods for Rates and Proportions. 1981, New York: John Wiley, 2nd edition.
26. Landis JR, Koch GG: The measurement of observer agreement for categorical data. Biometrics. 1977, 33: 159-174. 10.2307/2529310.
27. Fung KW, Bodenreider O, Aronson AR, Hole WT, Srinivasan S: Combining lexical and semantic methods of inter-terminology mapping using the UMLS. Stud Health Technol Inform. 2007, 129: 605-609.
28. Cantor MN, Sarkar IN, Gelman R, Hartel F, Bodenreider O, Lussier YA: An evaluation of hybrid methods for matching biomedical terminologies: mapping the gene ontology to the UMLS. Stud Health Technol Inform. 2003, 95: 62-67.
29. Wieteck P: Furthering the development of standardized nursing terminology through an ENP-ICNP cross-mapping. Int Nurs Rev. 2008, 55: 296-304. 10.1111/j.1466-7657.2008.00639.x.
30. Erdogan H, Erdem E, Bodenreider O: Exploiting UMLS semantics for checking semantic consistency among UMLS concepts. Stud Health Technol Inform. 2010, 160 (Pt 1): 749-753.
31. Lamy JB, Duclos C, Hamek S, Beuscart-Zéphir MC, Kerdelhué G, Darmoni S, Favre M, Falcoff H, Simon C, Pereira S, Serrot E, Mitouars T, Hardouin E, Kergosien Y, Venot A: Towards iconic language for patient records, drug monographs, guidelines and medical search engines. Stud Health Technol Inform. 2010, 160: 156-160.
32. Silk informatique: Présentation du Projet L3IM. [http://www.silk-info.com/medical-social/72-recherche-medicale.html]
33. Rosenbloom ST, Miller RA, Johnson KB, Elkin PL, Brown SH: Interface terminologies: facilitating direct entry of clinical data into electronic health record systems. J Am Med Inform Assoc. 2006, 13: 277-288. 10.1197/jamia.M1957.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/14/17/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.