CMDO was developed using an established methodology for conceptual modeling, the General Formal Ontology (GFO) method [27], which is a manual iterative process comprising five steps: (1) defining the scope of CMDO by conceptualizing its first-level terms (or classes), (2) identifying CMDO concepts, (3) assigning hierarchical relationships among CMDO concepts, (4) developing CMDO properties (e.g., synonyms, preferred terms, and definitions) for each CMDO concept, and (5) evaluating the utility of CMDO. All metadata used in our work are registered in the CHMR (http://chmr2.snubi.org:8083/chmr/).
Defining the scope for CMDO
A clinical document is a record of a patient’s medical history and care. Every piece of evidence and background data related to that care can also be documented. It is the most important source of information for clinical decision-making, communication between healthcare providers, and the resolution of legal issues.
Clinical data can be captured, stored, accessed, displayed, and transmitted in clinical practice using clinical documents, which can be designed as complex structures comprising a multitude of data elements. We analyzed clinical documents to identify the key concepts that represented the DECs, which became the classes of our ontology. The identification process is described in detail in the next section.
The typical process of clinical practice can be summarized as follows: The patient is registered at the time of initial contact, and information about their health-related problem (history) is gathered, with a focus on the current illness, symptoms, and chief complaint. Healthcare providers then perform diagnostic or therapeutic procedures based on the information provided by the patient. This process of procedures, observations, and testing is repeated until the end of treatment. Events such as admission, discharge, or adverse drug reactions can occur during this interaction, and the characteristics of these events usually vary between healthcare environments. By analyzing this series of clinical processes, we found that clinical information could be categorized using the following four main terms: (1) Procedure, (2) Finding, (3) Event, and (4) Description. These were used as the first-level terms of our ontology: Procedure includes all treatments or actions taken to prevent or treat disease, or to improve health in other ways; Finding includes all physical and psychological measurements of the patient that are surveyed or acted on by a medical doctor; Event includes all things that happen at a given place and time in a medical situation; and Description includes a detailed account of the particular characteristics or symptoms of a patient.
Identifying CMDO concepts
The CDE is the atomic unit of data and is associated with a DEC (an abstract unit of knowledge for representing semantics) and a VD (representation of data including the data type and permissible values) according to the ISO/IEC 11179 standard.
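The relationship between a CDE, its DEC, and its VD can be sketched as a small data structure. This is a minimal illustration only: the class names, field names, and the example data element are hypothetical and do not reproduce the ISO/IEC 11179 metamodel or the CHMR schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ValueDomain:
    """VD: how the data are represented (data type and permissible values)."""
    data_type: str
    # None means the domain is not enumerated (any value of data_type is allowed).
    permissible_values: Optional[Tuple[str, ...]] = None

    def permits(self, value: str) -> bool:
        return self.permissible_values is None or value in self.permissible_values


@dataclass(frozen=True)
class DataElementConcept:
    """DEC: the abstract unit of knowledge that carries the semantics."""
    name: str


@dataclass(frozen=True)
class CommonDataElement:
    """CDE: the atomic unit of data, pairing one DEC with one VD."""
    name: str
    concept: DataElementConcept
    value_domain: ValueDomain


# Hypothetical example; not taken from the CHMR.
smoking = CommonDataElement(
    name="Smoking Status Category",
    concept=DataElementConcept("Smoking Status"),
    value_domain=ValueDomain("string", ("never", "former", "current")),
)
```

Separating the DEC from the VD in this way means that two CDEs recorded with different value sets can still share one concept, which is the property that concept-level classification relies on.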
We identified CMDO concepts using the representative concepts (DECs) of the data elements (CDEs) in the metadata registry (CHMR). In particular, from among all of the CDEs in the CHMR, we selected those belonging to clinical documents from Seoul National University Hospital (SNUH) in order to query and examine their DECs. The frequency of clinical document usage was determined, and only SNUH clinical documents that had been used more than 10 times between January and August 2010 in each hospital department were selected, so that the results would be applicable to as many medical disciplines as possible. This approach resulted in 27,109 CDEs being extracted from 663 SNUH clinical documents.
We manually extracted common concepts that occurred more than twice among the DECs, considered whether it was reasonable to subordinate them to the first-level terms of CMDO, and chose them as CMDO concepts, which are the child terms of each first-level term. These concepts were reviewed and selected by two medical doctors and two medical informatics researchers. These individuals had an average of 5 years of experience working in family medicine, laboratory medicine, and psychiatry, and were guided to select reasonable subordinate concepts under the four first-level terms of CMDO. For example, we classified Description into the following 10 child terms, which are readily accepted by most clinicians at SNUH as representing this class: Advance Directives, Alerts, Assessment, Chief Complaint, Demographics, Encounter, Immunization, Past Medical History, Present Illness, and Vital Signs. We repeated this process of identifying child terms until optimal semantic granularity was achieved.
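The frequency criterion in this step was applied manually, but it can be approximated in code to produce a shortlist for expert review. The sketch below is an assumption-laden illustration: the function name, the case-insensitive normalization, and the sample list are hypothetical, and "more than twice" is read as a count of at least 3.

```python
from collections import Counter


def candidate_concepts(dec_names, min_count=3):
    """Return DEC names that occur at least `min_count` times.

    Names are compared after trimming whitespace and lowercasing, which is
    an assumed normalization and not part of the described manual process.
    """
    counts = Counter(name.strip().lower() for name in dec_names)
    return sorted(name for name, n in counts.items() if n >= min_count)


# Hypothetical DEC names, for illustration only.
sample_decs = [
    "Chief Complaint", "chief complaint", "Chief Complaint ",
    "Vital Signs", "Vital Signs",
    "Alerts",
]
shortlist = candidate_concepts(sample_decs)
```

A shortlist of this kind only narrows the candidates; whether a concept can reasonably be subordinated to a first-level term remains a judgment for the reviewers.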
Assigning relationships among CMDO concepts
CMDO is formally structured as a hierarchical tree, with a root node and subtrees of child nodes, each of which has a parent node. We assigned is-a relationships between CMDO concepts by applying the following process: Terms that appeared to be in a subordinate-superior relationship were reviewed by two medical doctors and two medical informatics researchers, who determined whether they were in an is-a relationship. Figure 1 presents a graphical representation of the CMDO classification showing the Allergy Test from Finding as a parent concept being assigned to Allergy History derived largely from Description.
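A tree of is-a relationships of this kind can be sketched as follows. The class and method names are hypothetical, the treatment of is-a as transitive is an assumption, and the three example terms are taken from an annotation path quoted in the evaluation section.

```python
class Concept:
    """A node in a single-parent is-a hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self, sep="|"):
        """Names from the top-level ancestor down to this concept."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return sep.join(reversed(names))

    def is_a(self, other):
        """True if `other` is a proper ancestor (is-a assumed transitive)."""
        node = self.parent
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


description = Concept("Description")
past_history = Concept("Past history", parent=description)
developmental = Concept("Developmental history", parent=past_history)
```

Because every node has at most one parent, each concept has exactly one path, which is what makes a pipe-separated path a usable identifier for an annotation.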
Development of CMDO properties
We created two CMDO properties (synonyms and definitions) for each CMDO concept by referencing the UMLS Metathesaurus and Wikipedia. The UMLS assigns each concept a Concept Unique Identifier (CUI), and terms with the same CUI can be grouped together since they are semantically equivalent [28].
For CMDO concepts that were assigned a UMLS CUI, we retrieved synonyms that were flagged in the relationship (REL = ‘same-as’ or ‘possibly-equivalent-to’) column of the MRREL table and in the Term Type in the Source (TTY = ‘SY’) column of the MRCONSO table. We also retrieved definitions from the definition (DEF) column of the MRDEF table. For CMDO concepts that were not assigned a UMLS CUI, we either used Wikipedia or manually recorded the synonyms and definitions used by expert medical doctors.
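The MRCONSO and MRDEF lookups can be sketched over pipe-delimited Rich Release Format (RRF) rows. This is a sketch under stated assumptions, not the procedure actually used: the column positions are assumed from the standard RRF layout and should be checked against the MRFILES and MRCOLS files of the release in use, the MRREL lookup is omitted, and every row, CUI, and string below is a made-up placeholder.

```python
# Assumed zero-based column positions in the RRF files.
MRCONSO_CUI, MRCONSO_TTY, MRCONSO_STR = 0, 12, 14
MRDEF_CUI, MRDEF_DEF = 0, 5


def rrf_rows(lines):
    """Split pipe-delimited RRF lines into field lists, skipping blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield line.split("|")


def synonyms_for(cui, mrconso_lines):
    """Strings for `cui` whose term type in the source is 'SY'."""
    return [
        row[MRCONSO_STR]
        for row in rrf_rows(mrconso_lines)
        if row[MRCONSO_CUI] == cui and row[MRCONSO_TTY] == "SY"
    ]


def definitions_for(cui, mrdef_lines):
    """Definition texts recorded for `cui`."""
    return [
        row[MRDEF_DEF]
        for row in rrf_rows(mrdef_lines)
        if row[MRDEF_CUI] == cui
    ]


def fake_row(width, values):
    """Build a placeholder RRF line with `width` fields (illustration only)."""
    fields = [""] * width
    for index, value in values.items():
        fields[index] = value
    return "|".join(fields) + "|"


# Placeholder rows; the CUIs and strings are not real UMLS content.
mrconso = [
    fake_row(18, {0: "C0000000", 12: "PT", 14: "Chief complaint"}),
    fake_row(18, {0: "C0000000", 12: "SY", 14: "Presenting complaint"}),
    fake_row(18, {0: "C9999999", 12: "SY", 14: "Unrelated term"}),
]
mrdef = [fake_row(8, {0: "C0000000", 5: "Placeholder definition text."})]

found_synonyms = synonyms_for("C0000000", mrconso)
found_definitions = definitions_for("C0000000", mrdef)
```

In practice the same functions could be fed the lines of the released files directly, since they only require an iterable of strings.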
We also created synonyms for each CMDO concept that reflect the hierarchical structure. During the development of hierarchical relationships, the identified CMDO concepts were given synonyms that incorporate their superordinate terms. For example, Result of Physical Examination has Breast as a child term. In this hierarchical structure, Breast refers to a result from a physical examination of the breast, and not to the anatomical structure of the breast. We therefore added Result of Physical Examination of Breast as a CMDO synonym for Breast.
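The Breast example suggests a simple naming pattern, sketched below. Joining the parent and child terms with "of" is an assumption generalized from this single example; the function name is hypothetical, and other parent terms may well need different wording.

```python
def hierarchical_synonym(parent_term, child_term, joiner="of"):
    """Compose a synonym that makes the child's superordinate term explicit."""
    return f"{parent_term} {joiner} {child_term}"


synonym = hierarchical_synonym("Result of Physical Examination", "Breast")
```

Here `synonym` is the string "Result of Physical Examination of Breast", which distinguishes the examination result from the anatomical structure.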
Evaluation scheme
We used two clinical document sets to evaluate CMDO: (1) 6 documents from HL7 templates [Operation Note (2009), Consultation Note (2008), Discharge Summary (2009), History and Physical (2008), Procedure Note (2010), and Progress Note (2010)] and (2) 25 clinical documents, comprising 5 document types (Admission Note, Outpatient Note, Discharge Note, Emergency Note, and Operation Note) from each of 5 teaching hospitals in South Korea (SNUH, Pusan National University Hospital, Ajou University Hospital, Chonnam National University Hospital, and Gachon University Gil Hospital). The 5 SNUH documents in this set did not overlap with the 663 SNUH clinical documents used to identify CMDO concepts. Additional file 1: Table S1 lists the names of the clinical documents that were used for constructing and evaluating CMDO.
To evaluate the suitability of CMDO for facilitating the classification and integration of CDEs, we first applied CMDO annotations to the 96 and 559 CDEs extracted from the 2 clinical document sets, respectively. The CMDO annotation process was performed independently by two nurses, who used the most granular terms in CMDO wherever possible. Each CDE could be annotated with multiple CMDO concepts. The two nurses were certified medical record administrators with an average of 5 years of work experience. We retained all cases of agreement and disagreement between the two annotators, as in the following examples. The two annotators chose similar results for the data element ‘Secondary Sexual Character of Adolescent Type Category’ in an Admission Note at Ajou University Hospital, with one nurse choosing Description|Past history|Developmental history and the other choosing both Description|Past history|Developmental history and Description|Past history|Social history. In contrast, the annotators disagreed on the data element ‘Estimated Blood Loss’ in an Operation Note from the HL7 template: one nurse chose Procedure|Surgery and the other chose Finding|Surgery|Problem.
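The two examples above can be expressed as a comparison of annotation sets. The sketch below is illustrative only: the function name and the three labels (identical, overlapping, disjoint) are hypothetical and are not the categories used in the evaluation.

```python
def compare_annotations(first, second):
    """Classify how two annotators' CMDO path sets relate for one CDE."""
    first, second = set(first), set(second)
    if first == second:
        return "identical"
    if first & second:
        return "overlapping"  # at least one shared path, but not all
    return "disjoint"         # no shared path


# 'Secondary Sexual Character of Adolescent Type Category'
similar = compare_annotations(
    ["Description|Past history|Developmental history"],
    [
        "Description|Past history|Developmental history",
        "Description|Past history|Social history",
    ],
)

# 'Estimated Blood Loss'
different = compare_annotations(
    ["Procedure|Surgery"],
    ["Finding|Surgery|Problem"],
)
```

Treating each annotation as a set of full paths keeps the comparison strict: two paths that share only a first-level term still count as different annotations.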
Two medical record administrators separately validated the two CMDO annotation sets. To complete the CMDO annotation process, at least two medical informatics researchers confirmed the four CMDO annotation sets described above and classified the coverage of CMDO into the following categories: adequate, too broad (i.e., first-level terms or general second-level terms), or too specific (i.e., terminal-node terms that were used infrequently). We also examined whether one kind of clinical document (the PHR) could be classified by CMDO.