Formal representation of complex SNOMED CT expressions
© Schulz et al; licensee BioMed Central Ltd. 2008
Published: 27 October 2008
Definitory expressions about clinical procedures, findings and diseases constitute a major benefit of a formally founded clinical reference terminology which is ontologically sound and suited for formal reasoning. SNOMED CT claims to support formal reasoning by description-logic based concept definitions.
On the basis of formal ontology criteria, we analyze complex SNOMED CT concepts, such as "Concussion of Brain with(out) Loss of Consciousness", using, alternatively, full first-order logic and the description logic underlying SNOMED CT.
Typical complex SNOMED CT concepts, with or without negations, can be expressed in full first-order logic. Negations cannot be properly expressed in the description logic underlying SNOMED CT. All concepts whose meaning implies a temporal scope may be subject to diverging interpretations, which are often unclear in SNOMED CT because their contextual determinants are not made explicit.
The description of complex medical occurrents is ambiguous, as the same situation can be described as (i) a complex occurrent C that has A and B as temporal parts, (ii) a simple occurrent A' defined as a kind of A followed by some B, or (iii) a simple occurrent B' defined as a kind of B preceded by some A. As negative statements cannot be represented exactly in SNOMED CT without a (computationally costly) extension of the set of logical constructors, a solution can be the reification of negative statements (e.g., "Period with no Loss of Consciousness"), or the use of the SNOMED CT context model. However, interpreting SNOMED CT context model concepts as description logic axioms is not recommended, because this may entail unintended models.
Improving semantic interoperability by a structured representation of clinical procedures, findings and diseases constitutes the main rationale for the development of clinical terminologies and classification systems. The applicability of a clinical reference terminology such as SNOMED CT critically depends on the extent to which it can express the language-independent meaning of complex and often idiosyncratic medical terms in a principled way. Such pre-coordinated terms, which are typical for medical classification systems, should be definable in terms of the primitives provided by the reference terminology.
SNOMED CT (Clinical Terms) is a clinical terminology that was built by merging, restructuring, and enhancing the previous SNOMED version RT (Reference Terminology) with the former UK Read Codes. SNOMED RT had already claimed to be a "set of concepts and relationships that provides a common reference point for comparison and aggregation of data about the entire health care process" [1, 2]. To fulfill this requirement, SNOMED CT now (May 2008) contains over half a million concepts. SNOMED CT relational statements are provided as concept – relation – concept triples. However, there is no clear explanation of what SNOMED CT "concepts" and "relations" exactly stand for. SNOMED CT "concepts", for instance, encompass individuals like Denmark, Greater London, Association of Anesthesia Clinical Directors, etc., and relations are allowed to link both concepts and individuals. We therefore prefer the term "class" wherever categories of individual entities are meant (which is the standard case). Furthermore, we distinguish specialization relations, which hold between classes (here the subclass-superclass relation is-a), from instantiation relations, which relate individuals to their categorizing classes (here the relation inst). Moreover, domain-specific relations, such as "associated morphology" or "is part of", are intuitively used to relate individuals. These are to be distinguished from the former two relations and are usually referred to as roles. Wherever a role (domain-specific relation) is used in a class definition, it needs to be quantified, as shown below.
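To make the distinction between the three kinds of relations concrete, the following Python sketch keeps is-a (between classes) separate from inst (between an individual and a class) and from roles (between individuals), and derives inferred class memberships by propagating inst upward along is-a. It is an illustration only: all class, role and individual names, such as FractureOfFemur and fracture_17, are hypothetical and not taken from SNOMED CT.

```python
# Hypothetical toy data; names are illustrative, not SNOMED CT content.
IS_A = {  # is-a: subclass -> set of direct superclasses (class to class)
    "FractureOfFemur": {"FractureOfBone", "LegInjury"},
    "FractureOfBone": {"Injury"},
    "LegInjury": {"Injury"},
}
INST = {  # inst: individual -> set of asserted classes (individual to class)
    "fracture_17": {"FractureOfFemur"},
}
# Roles relate individuals to individuals (kept apart from is-a and inst).
ROLES = {("fracture_17", "associated-morphology", "morphology_3")}


def superclasses(cls):
    """Reflexive-transitive closure of is-a starting from cls."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, ()))
    return seen


def classes_of(individual):
    """inst(x, C) and C is-a D together entail inst(x, D)."""
    result = set()
    for cls in INST.get(individual, ()):
        result |= superclasses(cls)
    return result


print(sorted(classes_of("fracture_17")))
# → ['FractureOfBone', 'FractureOfFemur', 'Injury', 'LegInjury']
```

Keeping the three relations in separate structures mirrors the point made above: is-a is never asserted between individuals, and inst is never asserted between two classes.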
SNOMED RT and SNOMED CT (partially) follow a formal semantics based on the description logic specification KRSS. The standard semantics of the KRSS specification has been extended in SNOMED CT to cater for so-called right-identity rules, which have been shown to be essential for certain part-whole reasoning tasks.
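A right-identity rule can be read as a role-chain axiom r ∘ s ⊑ r: if x is r-related to y and y is s-related to z, then x is also r-related to z. The Python sketch below applies such a rule to finite sets of pairs until a fixpoint is reached. The role names has-location and part-of, and all individuals, are illustrative assumptions; the sketch does not reproduce the concrete rules that SNOMED CT declares.

```python
def apply_right_identity(r_pairs, s_pairs):
    """Close role r under the rule r ∘ s ⊑ r.

    Whenever (x, y) is in r and (y, z) is in s, (x, z) is added to r.
    Repeating until nothing changes follows chains of several s-steps,
    even if s itself was not given as transitively closed.
    """
    closed = set(r_pairs)
    while True:
        derived = {(x, z) for (x, y) in closed for (y2, z) in s_pairs if y == y2}
        if derived <= closed:
            return closed
        closed |= derived


# Hypothetical individuals: a lesion located at the mitral valve.
has_location = {("lesion_1", "mitral_valve")}
part_of = {("mitral_valve", "heart"), ("heart", "thorax")}
print(sorted(apply_right_identity(has_location, part_of)))
# → [('lesion_1', 'heart'), ('lesion_1', 'mitral_valve'), ('lesion_1', 'thorax')]
```

This is the kind of part-whole inference referred to above: a location at a part is propagated to the wholes that contain it.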
Description logics are a family of decidable fragments of first-order logic, which have a clean and intuitive syntax (without the need for free variables). They have become increasingly popular, in particular through the W3C recommendation of the Semantic Web language OWL-DL, several description logic based ontology editors, such as Protégé, and description logic reasoners such as CEL and FaCT++, which have already proved useful for classifying SNOMED CT. The description logic underlying the SNOMED CT terminology provides conjunction and existential quantification, which can be conceived as building blocks for complex class descriptions. Conjunctions are most easily understood as intersections of sets. For instance, the expression FractureOfBone ⊓ LegInjury denotes the intersection of all entities belonging to the class FractureOfBone with those belonging to the class LegInjury. The resulting class, which may be named LegFracture, therefore contains all injuries that are both leg injuries and bone fractures. As an example of existential quantification, ∃part-of.Body denotes the class of all entities that are part of some body. The equivalence between a named class A and a class description D can be asserted in the terminology by a class definition (denoted by ≡). Such a class definition provides both necessary and sufficient conditions, specified in terms of D, for qualifying as an instance of class A, e.g. BodyPart ≡ ∃part-of.Body. When only necessary conditions are known for a named class, a primitive class definition (denoted by the subsumption operator ⊑) can be used instead. The SNOMED CT class definitions are partly primitive and partly full definitions, with a few additional axioms on roles; e.g., part-of is declared transitive.
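The set-theoretic reading of these constructors can be checked on a small finite interpretation. In the Python sketch below, each class and role is given directly by its extension; the individuals are hypothetical, and transitivity of part-of is deliberately not modelled. It computes FractureOfBone ⊓ LegInjury and ∃part-of.Body, and contrasts a full definition (A ≡ D, equal extensions) with a primitive one (A ⊑ D, inclusion only).

```python
# Toy interpretation: each class and role is given by its extension
# (hypothetical individuals, chosen only to illustrate the semantics).
CLASSES = {
    "FractureOfBone": {"f1", "f2"},
    "LegInjury": {"f1", "i1"},
    "Body": {"body1"},
}
ROLES = {"part-of": {("femur", "body1"), ("skull", "body1")}}


def conj(*extensions):
    """C ⊓ D: intersection of the extensions."""
    return set.intersection(*extensions)


def exists(role, filler):
    """∃r.C: every x with at least one (x, y) in r such that y is in C."""
    return {x for (x, y) in ROLES[role] if y in filler}


def is_full_definition(named, description):
    """A ≡ D holds when both extensions coincide (necessary and sufficient)."""
    return named == description


def is_primitive_definition(named, description):
    """A ⊑ D holds when A's extension is contained in D's (necessary only)."""
    return named <= description


leg_fracture = conj(CLASSES["FractureOfBone"], CLASSES["LegInjury"])
body_part = exists("part-of", CLASSES["Body"])
print(leg_fracture, sorted(body_part))  # → {'f1'} ['femur', 'skull']
```

In this interpretation, a class BodyPart with extension {femur} would satisfy the primitive axiom BodyPart ⊑ ∃part-of.Body but not the full definition, because skull would then be a body part that is not classified as one.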
We subscribe to the ontological upper-level distinction between continuants (also called endurants) and occurrents (also called perdurants). Continuants are characterized as entities that are wholly present in time, i.e. all their proper parts are present at any time of their existence. Typical continuants are physical objects and spaces, e.g. organisms and anatomical structures. In contrast, occurrents are entities that "happen in time": they extend in time by accumulating different temporal parts, so that, at any time instant t at which they exist, only their temporal parts at t are present. Typical occurrents are processual entities and events, such as surgical or diagnostic procedures. Whereas there is no doubt that medical procedures are always occurrents (they have a well-defined beginning and end as temporal parts), diseases and other bodily phenomena are often ontologically ambiguous. On the one hand, they can be considered as states and thus be categorized as (immaterial) continuants. On the other hand, the focus can be placed on their temporal course, which characterizes them as occurrents. In this paper, this distinction will be neglected.
In the following, we analyze and discuss the formalization of complex SNOMED CT definitions.
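The contrast between being wholly present and accumulating temporal parts can be sketched in Python as follows. This is a simplification under stated assumptions: time is modelled as real-valued points, temporal parts as closed intervals, and the anatomical and procedural part names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Continuant:
    """Wholly present: all proper parts exist whenever the whole exists."""
    parts: frozenset
    start: float
    end: float

    def present_at(self, t):
        return set(self.parts) if self.start <= t <= self.end else set()


@dataclass
class Occurrent:
    """Extends in time: only the temporal parts located at t are present at t."""
    temporal_parts: dict  # part name -> (start, end)

    def present_at(self, t):
        return {name for name, (s, e) in self.temporal_parts.items() if s <= t <= e}


stomach = Continuant(frozenset({"fundus", "corpus", "antrum"}), 0.0, 100.0)
surgery = Occurrent({"incision": (10.0, 11.0), "extraction": (11.0, 12.0), "closure": (12.0, 13.0)})
print(sorted(stomach.present_at(10.5)), sorted(surgery.present_at(10.5)))
# → ['antrum', 'corpus', 'fundus'] ['incision']
```

At any time during its existence, the stomach has all of its parts present, whereas the surgical procedure only has the temporal part that is unfolding at that moment.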
Case studies of complex definitions
Using description logic constructors, SNOMED CT is able to define complex classes that are composed of simpler ones. To this end, the conjuncts are grouped into so-called relationship groups (rg).
In this way, misinterpretations such as "removal of stomach" and "incision of foreign body" can be prevented. As we have recently argued elsewhere, complex medical procedures or findings are characterized by a mereological structure, i.e. they can be described in terms of their (temporal) parts. As a consequence, rg can be considered equivalent to the partitive relation has-part, which would improve semantic clarity in these cases. The expression ∃rg.(A ⊓ B) ⊓ ∃rg.(C ⊓ D) is then equivalent to ∃has-part.(A ⊓ B) ⊓ ∃has-part.(C ⊓ D).
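Why grouping prevents these misinterpretations can be illustrated by a small Python sketch. It assumes, for illustration only, that the procedure is described by two groups of hypothetical attribute-value pairs (a removal of a foreign body, and an incision at the stomach); the attribute names are not quoted from SNOMED CT. Reading each group as one sub-process, in the sense of ∃has-part.(A ⊓ B), pairs each method only with the objects of its own group, whereas flattening all pairs into a single conjunction also admits the unintended combinations.

```python
from itertools import product

# Hypothetical attribute-value pairs; one set per relationship group.
GROUPS = [
    {("method", "Removal"), ("direct-substance", "ForeignBody")},
    {("method", "Incision"), ("procedure-site", "Stomach")},
]


def split(pairs):
    """Separate the method values from the values of all other attributes."""
    methods = {v for (a, v) in pairs if a == "method"}
    objects = {v for (a, v) in pairs if a != "method"}
    return methods, objects


def grouped_readings(groups):
    """Each group is one sub-process: combine methods only within a group."""
    readings = set()
    for group in groups:
        methods, objects = split(group)
        readings |= set(product(methods, objects))
    return readings


def flat_readings(groups):
    """Without groups, every method can combine with every object."""
    methods, objects = split(set().union(*groups))
    return set(product(methods, objects))


# The extra readings admitted only by the flat version:
print(sorted(flat_readings(GROUPS) - grouped_readings(GROUPS)))
# → [('Incision', 'ForeignBody'), ('Removal', 'Stomach')]
```

The two extra readings correspond exactly to "incision of foreign body" and "removal of stomach".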
We emphasize that this is a central issue, since complex definitions are frequent in SNOMED CT. Among all active SNOMED CT concepts, approximately 25,000 textual descriptions include one of the words "and", "with", or "without". In about one third of these cases, the associated definitions use relationship groups.
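A count of this kind can be sketched as a simple filter over description strings. The Python sketch below assumes whole-word, case-insensitive matching, which is an assumption rather than the documented counting procedure, and uses hypothetical sample descriptions and a hypothetical set of descriptions whose definitions use relationship groups.

```python
import re

# Whole-word, case-insensitive match (an assumption about how such counts are made).
CONNECTIVES = re.compile(r"\b(?:and|with|without)\b", re.IGNORECASE)


def complex_descriptions(descriptions):
    """Descriptions that contain 'and', 'with' or 'without' as separate words."""
    return [d for d in descriptions if CONNECTIVES.search(d)]


def share_grouped(matches, grouped):
    """Fraction of matched descriptions whose definition uses relationship groups."""
    return sum(d in grouped for d in matches) / len(matches) if matches else 0.0


sample = [  # hypothetical descriptions, not an extract from SNOMED CT
    "Concussion with loss of consciousness",
    "Asthma without status asthmaticus",
    "Bandage of hand",
    "Fracture of femur",
]
matches = complex_descriptions(sample)
print(matches, share_grouped(matches, {"Concussion with loss of consciousness"}))
```

The word boundaries keep substrings such as the "and" inside "Bandage" from being counted.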
Complex definitions with implicit time sequences
Following our earlier analysis, rg has been replaced by has-part for the sake of clarity. Evidently, Extraction of Foreign Body from Stomach by Excision can be represented fairly well using the constructors introduced for SNOMED CT. However, an in-depth ontological analysis reveals hidden assumptions regarding time that are not adequately expressed.
This representation is semantically richer insofar as it states that all pairs of subprocesses have the same anatomical structure s as a participant and that they occur within a predefined time interval, e.g. during a hospital stay. In contrast, the description logic representation for this and similar clinical situations would hold true even if there were two different stomachs, and even if the two subprocesses were part of two completely different surgical interventions. However, the pragmatics of clinical coding is supposed to prevent such misinterpretations, since a clinical code is normally assigned to a single surgical procedure or to a single disease under treatment.
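The difference between the two readings can be checked on a small finite model. The Python sketch below is an illustration under assumptions: sub-processes are reduced to a kind label, the identifier of the stomach involved, and a single time point, and the labels Removal and Incision are hypothetical. The description logic reading evaluates each has-part conjunct independently, whereas the first-order reading requires one shared stomach s and a common time interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubProcess:
    kind: str    # hypothetical label, e.g. "Removal" or "Incision"
    site: str    # identifier of the individual stomach involved
    time: float  # time point at which the sub-process takes place


STOMACHS = {"stomach_A", "stomach_B"}  # two instances of the class Stomach


def dl_reading(parts):
    """∃has-part.(Removal ⊓ ∃site.Stomach) ⊓ ∃has-part.(Incision ⊓ ∃site.Stomach).

    Each conjunct is checked on its own, so the two stomachs may differ
    and the sub-processes may be arbitrarily far apart in time."""
    return (any(p.kind == "Removal" and p.site in STOMACHS for p in parts)
            and any(p.kind == "Incision" and p.site in STOMACHS for p in parts))


def fol_reading(parts, interval):
    """Some single stomach s participates in both sub-processes,
    and both take place within the given interval."""
    lo, hi = interval

    def occurs(kind, s):
        return any(p.kind == kind and p.site == s and lo <= p.time <= hi for p in parts)

    return any(occurs("Removal", s) and occurs("Incision", s) for s in STOMACHS)


unrelated = [SubProcess("Removal", "stomach_A", 3.0), SubProcess("Incision", "stomach_B", 400.0)]
print(dl_reading(unrelated), fol_reading(unrelated, (0.0, 10.0)))  # → True False
```

The model with two different stomachs and widely separated times satisfies the description logic expression, but not the first-order condition.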
Complex definitions with explicit time sequences
Such a definition, however, suffers from the same shortcomings as already discussed for the first example. Firstly, the time interval is not specified at all, and secondly, the definition is also compatible with two separate LossOfConsciousness subprocesses. The latter problem cannot be solved within the description logic used. The former could be tackled pragmatically by subdividing each of the temporal relations precedes and follows into two relations that relate to different time intervals. For instance, precedes_curr/follows_curr might be introduced to denote the time frame of the current treatment episode, whereas precedes_hist/follows_hist would then refer to processes in the context of clinical history only.
Finally, "Brain concussion with loss of consciousness" may only represent a snapshot of the current state of a patient, in which no statement about the underlying time-dependent processes is made. This is, however, not sufficient for a final diagnosis, for which a more complete characterization of the disorder or trauma should be expected. The description logic representation used in SNOMED CT does not allow for this distinction. Again, first-order logic would be required to describe this, introducing time variables in the same way as illustrated in Formula 3.
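The pragmatic subdivision of the temporal relations can be sketched as a small decision function. In this Python sketch, the relation names precedes_curr, precedes_hist, follows_curr and follows_hist come from the text, while representing processes by single time points and the function name are assumptions made for illustration.

```python
# Inverse pairs of the subdivided temporal relations.
INVERSE = {"precedes_curr": "follows_curr", "precedes_hist": "follows_hist"}


def precedence_relation(earlier, later, episode_start):
    """Choose between precedes_curr and precedes_hist (hypothetical helper).

    earlier, later: time points of two processes; episode_start: start of the
    current treatment episode. Returns the relation holding from the earlier
    to the later process."""
    if earlier >= later:
        raise ValueError("the first process must precede the second")
    return "precedes_curr" if earlier >= episode_start else "precedes_hist"


print(precedence_relation(2.0, 2.5, 0.0), precedence_relation(-30.0, 2.5, 0.0))
# → precedes_curr precedes_hist
```

The distinction only narrows down the time frame; it does not rule out two separate LossOfConsciousness subprocesses.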
Complex definitions with exclusions
Again, nothing is stated here about the referents of "Brain tissue structure" and "Loss of Consciousness". However, the practical usage of clinical terminologies rules out the interpretation that one occurrent has body parts of different patients as participants.
The introduction of full negation into the set of allowed constructors for SNOMED CT would seriously jeopardize the usability and scalability of terminological reasoning.
Nevertheless, full negation would be crucial not only for correctly defining numerous complex procedures and diseases but also for correctly mapping clinical classification systems such as ICD-10. Here, many classes have clearly defined exclusions (e.g. the general class "Thrombosis" excludes "Thrombosis in Pregnancy") and logical complements (e.g. the ubiquitous "not elsewhere classified" classes). Such categories cannot be adequately represented without full negation.
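The following Python sketch shows what such ICD-style categories require. It computes an exclusion C ⊓ ¬D and a "not elsewhere classified" residual on a hypothetical finite set of coded cases. The assumption that makes this computable is a fixed, closed universe for the complement, whereas description logic reasoning is open-world, and a logic offering only ⊓ and ∃ cannot state such complements as class descriptions at all.

```python
# Hypothetical finite universe of coded cases (closed-world assumption).
UNIVERSE = frozenset({"case1", "case2", "case3", "case4"})
THROMBOSIS = {"case1", "case2", "case3"}
THROMBOSIS_IN_PREGNANCY = {"case2"}
PORTAL_VEIN_THROMBOSIS = {"case3"}  # a hypothetical specific subclass


def complement(extension, universe=UNIVERSE):
    """¬C: all members of the universe that are not in C."""
    return set(universe) - set(extension)


def excluding(general, excluded):
    """C ⊓ ¬D, e.g. 'Thrombosis' excluding 'Thrombosis in Pregnancy'."""
    return set(general) & complement(excluded)


def not_elsewhere_classified(parent, *specific):
    """Members of the parent class not covered by any specific subclass."""
    return excluding(parent, set().union(*specific))


print(sorted(excluding(THROMBOSIS, THROMBOSIS_IN_PREGNANCY)))  # → ['case1', 'case3']
print(sorted(not_elsewhere_classified(THROMBOSIS, THROMBOSIS_IN_PREGNANCY, PORTAL_VEIN_THROMBOSIS)))  # → ['case1']
```

Both functions rely on set difference against the universe, i.e. on exactly the negation that the terminology does not provide.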
Reification could be a partial solution to the negation problem: with "Period with no loss of consciousness" (replacing the class "Awake" from Formula 7), a class is introduced that paraphrases a negative statement without resorting to explicit negation. This class represents a period in the life of a patient during which a loss of consciousness can be ruled out. Again, there is a hidden assumption, viz. that a "Period with no loss of consciousness" starts exactly with the traumatic event and ends at the moment the observation was made.
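One way to read the reified class is as a positive class whose instances are time intervals, with the absence check performed outside the description logic. The Python sketch below makes the hidden assumption explicit by fixing the interval from the traumatic event to the observation. The function name and the representation of events as time points are hypothetical, and absence is judged only against the recorded events (a closed-world assumption).

```python
def period_with_no_loc(trauma_time, observation_time, loc_event_times):
    """Return the interval of a 'Period with no loss of consciousness', or None.

    Assumes the period starts exactly at the trauma and ends exactly at the
    observation, and treats the recorded loss-of-consciousness events as
    complete (closed world)."""
    if observation_time < trauma_time:
        raise ValueError("observation must not precede the trauma")
    if any(trauma_time <= t <= observation_time for t in loc_event_times):
        return None
    return (trauma_time, observation_time)


print(period_with_no_loc(0.0, 6.0, []))     # → (0.0, 6.0)
print(period_with_no_loc(0.0, 6.0, [0.5]))  # → None
```

Inside the terminology, only the positive class would appear; the negative content lives entirely in how its instances are established.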
SNOMED CT addresses the negation problem with its context model. Although incomplete in some logical respects, it supports negation in the sense of "absence of a condition", "procedure not done", etc. However, we must be aware that this lies outside the description logic framework. Ignoring this would lead to erroneous conclusions, such as interpreting "ECG not done – Associated Procedure – Electrocardiographic Procedure" as "∃AssociatedProcedure.ElectrocardiographicProcedure". In this case, the existential quantifier ∃ would assert the existence of some electrocardiographic procedure, which is exactly the contrary of what we want to express. Hence, interpreting SNOMED CT context model concepts as description logic axioms is not recommended.
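The pitfall can be demonstrated on two tiny finite models. The Python sketch below translates the context-model triple naively into ∃AssociatedProcedure.ElectrocardiographicProcedure and evaluates it. The individual names (situation_1, ecg_1) and the dictionary layout of the models are hypothetical.

```python
def naive_axiom(attribute, value):
    """Naive reading of a context-model triple as ∃attribute.value."""
    return (attribute, value)


def satisfies_exists(individual, axiom, roles, classes):
    """True if the individual has some attribute-successor in the value class."""
    attribute, value = axiom
    return any(
        subject == individual and obj in classes.get(value, set())
        for (subject, obj) in roles.get(attribute, set())
    )


axiom = naive_axiom("AssociatedProcedure", "ElectrocardiographicProcedure")
# A record in which no ECG was performed: no ECG individual exists at all.
ecg_not_done = {"roles": {}, "classes": {"ElectrocardiographicProcedure": set()}}
# Any model satisfying the naive axiom must contain an ECG individual.
ecg_done = {"roles": {"AssociatedProcedure": {("situation_1", "ecg_1")}},
            "classes": {"ElectrocardiographicProcedure": {"ecg_1"}}}
print(satisfies_exists("situation_1", axiom, ecg_not_done["roles"], ecg_not_done["classes"]),
      satisfies_exists("situation_1", axiom, ecg_done["roles"], ecg_done["classes"]))
# → False True
```

The situation that "ECG not done" actually describes violates the naive axiom, while only models containing an ECG satisfy it.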
The description of complex medical occurrents such as diseases and procedures is complicated and ambiguous. One source of ambiguity is related to conceptual scope: The same situations can be described as
• a complex occurrent C that has A and B as temporal parts
• a simple occurrent A' defined as a kind of A followed by some B
• a simple occurrent B' defined as a kind of B preceded by some A
In such cases, semantic interoperability would only be achieved if all of these mutually dependent classes were instantiated.
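The three alternatives can be made concrete with a short Python sketch. It encodes one and the same situation in the three ways listed above, using hypothetical dictionary fields. The class identifiers all differ, so a comparison on identifiers alone fails, even though mapping each encoding to the underlying ordered pair of occurrents yields the same result. The mapping only illustrates the ambiguity; it is not a mechanism proposed in SNOMED CT.

```python
# One situation (an A followed by a B), encoded in three hypothetical ways.
encodings = [
    {"type": "C", "temporal_parts": ("A", "B")},         # (i) complex occurrent C
    {"type": "A'", "kind_of": "A", "followed_by": "B"},  # (ii) A' followed by some B
    {"type": "B'", "kind_of": "B", "preceded_by": "A"},  # (iii) B' preceded by some A
]


def normalize(encoding):
    """Map an encoding to the ordered pair of occurrents it describes."""
    if "temporal_parts" in encoding:
        return tuple(encoding["temporal_parts"])
    if "followed_by" in encoding:
        return (encoding["kind_of"], encoding["followed_by"])
    if "preceded_by" in encoding:
        return (encoding["preceded_by"], encoding["kind_of"])
    raise ValueError("unknown encoding")


identifiers = {e["type"] for e in encodings}
normal_forms = {normalize(e) for e in encodings}
print(len(identifiers), normal_forms)  # → 3 {('A', 'B')}
```

Three distinct class identifiers thus describe a single underlying sequence of occurrents.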
Another intricate problem is the reference to occurrents of type A that are characterized by the absence of some occurrent of type B. This would require negation, which is not included in the SNOMED CT syntax. A possible way out is the reification of negative statements in the sense of "period with no occurrence of any B". Nevertheless, the temporal scope of such complex expressions remains fuzzy. This may be less problematic in practice, where clinical codes are generally attributed to well-defined time intervals of treatment or hospitalization episodes. The SNOMED CT context model already addresses this kind of problem.
A more principled way of defining complex occurrents will help improve the quality of the terminology itself by reducing errors and ensuring its internal consistency. An assessment of the impact of such improvements on the quality of terminological classification is still speculative, for two reasons. Firstly, large-scale experience with description logic-based classification and reasoning is still lacking. Secondly, most SNOMED CT descriptions of complex occurrents are still primitive, so that misclassifications are not to be expected.
However, we expect that the issues addressed in this paper will gain relevance along with the maturing of the terminology and the development of more knowledge-intensive applications.
This work was supported by the EU Network of Excellence Semantic Interoperability and Data Mining in Biomedicine (NoE 507505).
This article has been published as part of BMC Medical Informatics and Decision Making Volume 8 Supplement 1, 2008: Selected contributions to the First European Conference on SNOMED CT. The full contents of the supplement are available online at http://www.biomedcentral.com/1472-6947/8?issue=S1.
- Spackman Kent, Campbell Keith: Compositional concept representation using SNOMED: Towards further convergence of clinical terminologies. AMIA'98 – Proc. of the 1998 AMIA Annual Fall Symposium. 1998, Philadelphia, PA: Hanley & Belfus, 740-744.
- Spackman Kent A, Reynoso Guillermo: Examining SNOMED from the perspective of formal ontological principles: Some preliminary analysis and observations. KR-MED 2004 – Proc. of the 1st International Workshop on Formal Biomedical Knowledge Representation. 2004, Bethesda, MD: American Medical Informatics Association (AMIA), 72-80. [http://CEUR-WS.org/Vol-102/]
- Patel-Schneider Peter, Swartout Bill: Description logic knowledge representation system specification from the KRSS group of the ARPA knowledge sharing effort. Technical report, AT&T Bell Laboratories. 1993.
- Baader Franz, Calvanese Diego, McGuinness Deborah, Nardi Daniele, Patel-Schneider Peter, editors: The Description Logic Handbook. Theory, Implementation, and Applications. 2003, Cambridge, U.K.: Cambridge University Press.
- Horrocks Ian, Patel-Schneider Peter, van Harmelen Frank: From and RDF to OWL: The making of a Web ontology language. Journal of Web Semantics. 2003, 1 (1): 7-26.
- Knublauch Holger, Dameron Olivier, Musen Mark: Weaving the biomedical Semantic Web with the protégé OWL plugin. KR-MED 2004 – Proc. of the 1st International Workshop on Formal Biomedical Knowledge Representation. 2004, Bethesda, MD: American Medical Informatics Association (AMIA), 39-47. [http://CEUR-WS.org/Vol-102/]
- Baader Franz, Lutz Carsten, Suntisrivaraporn Boontawee: CEL – a polynomial-time reasoner for life science ontologies. Proc. of the 3rd International Joint Conference on Automated Reasoning (IJCAR'06), volume 4130 of Lecture Notes in Artificial Intelligence. 2006, Springer-Verlag, 287-291.
- Grenon Pierre, Smith Barry, Goldberg Louis: Biodynamic ontology: applying BFO in the biomedical domain. Studies in Health Technology and Informatics. Edited by: Pisanelli D. 2004, 102: 20-38.
- Spackman KA, Dionne R, Mays E, Weis J: Role grouping as an extension to the description logic of Ontolog, motivated by concept modeling in SNOMED. AMIA 2002 – Proc. of the Annual Symposium of the American Medical Informatics Association. Biomedical Informatics: One Discipline. San Antonio, TX, November 9–13, 2002. Edited by: Isaac S Kohane. 2002, Philadelphia, PA: Hanley & Belfus, 712-716.
- SNOMED Clinical Terms. Technical Implementation Guide. 2005, Northfield, IL: College of American Pathologists.
- Schulz Stefan, Hanser Susanne, Hahn Udo, Rogers Jeremy: Semantic clarification of diseases and procedures in SNOMED CT. Methods of Information in Medicine. 2006, 45:
- Smith B, Ceusters W, Klagges B, Köhler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector A, Rosse C: Relations in biomedical ontologies. Genome Biol. 2005, 6 (5): R46. doi:10.1186/gb-2005-6-5-r46.
- Baader Franz, Brandt Sebastian, Lutz Carsten: Pushing the envelope. Proc. of the Nineteenth International Joint Conference on Artificial Intelligence IJCAI-05. 2005, Edinburgh, UK: Morgan Kaufmann Publishers.
- Schulz Stefan, Zaiss Albrecht, Brunner Ralph, Spinner Daniel, Klar Rüdiger: Conversion problems concerning automated mapping from ICD-10 to ICD-9. Methods of Information in Medicine. 1998, 37: 254-259.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.