- Research article
- Open Access
A markup language for electrocardiogram data acquisition and analysis (ecgML)
BMC Medical Informatics and Decision Making volume 3, Article number: 4 (2003)
The storage and distribution of electrocardiogram data is based on different formats. There is a need to promote the development of standards for their exchange and analysis. Such models should be platform-, system- and application-independent, flexible and open to every member of the scientific community.
A minimum set of information for the representation and storage of electrocardiogram signals has been synthesised from existing recommendations. This specification is encoded into an XML-vocabulary. The model may aid in a flexible exchange and analysis of electrocardiogram information.
Based on the advantages of XML technologies, ecgML can provide a system-, application- and format-independent solution for the representation and exchange of electrocardiogram data. The distinction between the proposal developed by the U.S. Food and Drug Administration and the ecgML model is described. A series of tools that aim to facilitate ecgML-based applications is presented.
The models proposed here can facilitate the generation of a data format, which opens ways for better and clearer interpretation by both humans and machines. Its structured and transparent organisation will allow researchers to expand and test its capabilities in different application domains. The specification and programs for this protocol are publicly available.
Electrocardiogram (ECG) data are acquired, stored and analysed using different formats and software platforms. Medical informatics will fully exploit the benefits from its research only when data can be openly shared and interpreted. Therefore, there is a need to develop cross-platform solutions to support biomedical training, decision-making and telemedicine applications.
An important goal is to describe these data independently of the number of channels, instrumentation platform or type of experiments. Moreover, an ECG record should also include annotations relating to the acquisition protocols, patient information and analysis results. These data modelling tasks should consist of flexible and inexpensive tools to enhance pattern recognition capabilities.
The development of these systems will depend on the existence of information that clearly specifies domain terminologies, functional hierarchies and decision rules. The availability of such ontological representations will allow the emergence of standards, which will facilitate the integration of information on a global communication infrastructure.
ECG data have been traditionally recorded using flat file formats, such as the MIT-BIH file library. This type of data format lacks the information necessary to support a meaningful analysis, interoperability and integration of multiple resources. Different governmental, academic and private organisations have proposed minimum requirements for the representation and storage of biomedical information, including signals and images. These efforts aimed to promote the application of standards for message exchange and data integration. In 1993, for example, the CEN/TC251 WG3 (Comité Européen de Normalisation, European Committee for Standardisation, Technical Committee 251) reviewed several data exchange formats for healthcare applications. These include Abstract Syntax Notation (ASN.1) and Health Level Seven (HL7). The former defines norms to describe an electronic message based on different data types. One of the disadvantages of ASN.1 is that it does not fully support scalable solutions and query processing. HL7 has been a Standards Development Organisation affiliated to the ANSI (American National Standards Institute) since 1997 and has become the standard for electronic exchange of historical and administrative data in health services worldwide. The next generation of the messaging standard (V3) has been under development since.
CORBAmed, the Healthcare Domain Task Force of the Object Management Group (OMG), deals with interoperability problems between heterogeneous information systems. To facilitate the seamless and automated data exchange between numerous applications, a common interface architecture was developed that serves in a number of today's information systems. Liaisons have been established with other organisations such as HL7.
The Digital Imaging and Communications in Medicine (DICOM) standards committee supports the achievement of data compatibility between imaging systems and other healthcare information at different levels. This standard has been applied by many private organisations, which need to incorporate diverse bio-signals associated to medical imaging. The DICOM standard is a useful resource that also provides guidelines on how to represent ECG features.
More recently, the eXtensible Markup Language (XML) has been suggested as a promising approach to representing biomedical data. Developed as a subset of SGML in 1996 to "be straightforwardly usable over the Internet" and published as a first recommendation by the W3C (World Wide Web Consortium) in 1998, XML soon became a ubiquitous syntax for data and data-exchange over the Internet. Since then, XML-based Markup Languages, specified as e.g. Document Type Definitions (DTD) or XML Schemas (XSD) have been emerging in unlimited numbers and in nearly every imaginable domain. Advantages of XML syntax include platform-, vendor- and application independence as well as an easy-to-follow hierarchical data structure and wide support. "XML's greatest advantage is that it is a user-driven, open standard for exchanging data both over corporate networks and between different enterprises, notably over the Internet. XML's biggest potential lies undoubtedly in its ability to mark up mission-critical document elements self-descriptively". By following a strict separation of content and presentation information, XML technologies increase the re-usability of information in its purest way as access to the original (raw) data is always given. The use of XML syntax for the exchange of electronic patient records was shown in all aspects in Synapses and SynEx project implementations [12–14].
The U.S. Food and Drug Administration (FDA) Centre for Drug Evaluation and Research has proposed recommendations for the exchange of time-series data. It includes a hierarchical structure for the representation of signals, including ECG data, which may be encoded as an XML file. This protocol focuses on the acquisition of multiple records from different subjects within a single file [15, 16]. The HL7 committee has been actively cooperating with the World Wide Web Consortium (W3C) to define XML guidelines to represent medical information. HL7 has endorsed the Clinical Document Architecture (CDA), which supports the generation and exchange of clinical messages. Other XML-based initiatives for the representation and distribution of biomedical information are: the ASTM E31.25 subcommittee, the CEN/TC251 Task Force on XML Applications in Healthcare and the Clinical Data Interchange Standards Consortium (CDISC). However, these efforts have not focused on ECG data. Some of them place a greater emphasis on the administrative and financial transactions associated with a clinical environment.
Recent advances include I-Med, which is an XML-based format for clinical data. This project consists of a domain-independent interface for exchanging several types of medical information. Its major goal is to provide a unique platform for clinical transactions. These messages can include ECG records, which may be described by basic features, such as QRS duration and text-based interpretations. One major limitation of this solution is that it only partially addresses important ECG data content-definitions.
This article introduces a markup language for supporting ECG data exchange and analysis (ecgML). It synthesises key recommendations specified by the initiatives presented above.
There is a need to harmonise the representation of digital ECG data originating from the full spectrum of devices along with annotations for events, and to include necessary associated information, such as patient identification, interpretation and other clinical data. The hierarchical data tree structures depicted in Figures 1 to 6 are proposed to address such concerns. Tables 1 to 8 describe the elements and attributes defined in this model. In this paper terms written in bold and italic prints represent either XML element or attribute names. Element names should be words concatenated with the first letter of each word capitalised (UpperCamelCase, http://searchwebservices.techtarget.com/sDefinition/0,290660,sid26_gci824363,00.html). Attribute names satisfy the same rule except for the first word (lowerCamelCase, http://searchwebservices.techtarget.com/sDefinition/0,,sid26_gci824366,00.html).
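The naming convention described above can be expressed as a small check. The following Python sketch is illustrative only: the function names and the regular expressions are assumptions (the paper does not define the convention formally), and the patterns simply require letters and digits with the appropriate case for the first character.

```python
import re

# Assumed patterns: only ASCII letters and digits, no separators.
UPPER_CAMEL = re.compile(r"[A-Z][A-Za-z0-9]*")   # e.g. ECGRecord, StudyDate
LOWER_CAMEL = re.compile(r"[a-z][A-Za-z0-9]*")   # e.g. studyID, patientID


def is_element_name(name: str) -> bool:
    """Return True if `name` looks like an UpperCamelCase element name."""
    return UPPER_CAMEL.fullmatch(name) is not None


def is_attribute_name(name: str) -> bool:
    """Return True if `name` looks like a lowerCamelCase attribute name."""
    return LOWER_CAMEL.fullmatch(name) is not None
```

A check of this kind could be run over the names declared in a schema to flag deviations from the convention.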
Each patient record starts with a root element ECGRecord, which is uniquely identified by its attribute studyID. The StudyDate and StudyTime elements represent the latest time record of the study of the ECG recording. Diagnosis contains a text version of the latest diagnostic interpretation of the ECG, while MedicalHistory is a description of the patient's clinical problems and diagnoses. There are two main components for each record: one PatientDemographic and one or more Record components. It is worth noting that each record can have only one PatientDemographic element, which is kept up to date, while multiple Record elements may be held in one patient record. This makes it possible to keep track of the history of the patient's diagnoses.
PatientDemographic contains information of general interest concerning the person from whom the recording is obtained, such as demographic data (e.g. patientID, Name, etc.) and contact information (e.g. Address, etc.). This component is required in each record.
Record represents the physical storage for the basic content of an ECG recording. The AcquisitionDate and AcquisitionTime attributes specify the acquisition date and time for each record, which makes it possible to include multiple time-related ECG recordings within a file. investigatorID and siteID are used to identify who is responsible for the recording and where it is acquired. There are three main components: zero-or-one RecordingDevice, zero-or-one ClinicalProtocol, and one-or-more RecordData. Such a flexible structure allows each recording to have its own characteristics.
RecordingDevice is an optional element, which describes the device that generated the data. It should support the full spectrum of ECG devices, including standard 12-lead ECGs, Holter monitors, transtelephonic monitors and implanted devices. The main components in this section include deviceID, Type, Manufacturer, Model and a description of the filtering technique used during the ECG acquisition (e.g. BaselineFilter and LowpassFilter).
ClinicalProtocol is an optional element, which may include information relating to a patient's clinical report. The unit attribute of each element is used to describe the measurement unit of each observation. Currently, this section only includes basic clinical dimensions, such as DiastolicBP and HeartRate. However, other variables can be easily added.
RecordData is a key ecgML element. There can be multiple RecordData elements within a file, which are identified by their Channel element names. The DICOM lead labelling format is recommended for this purpose. RecordData includes three main sub-components: Waveforms, Annotations and Measurements.
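The hierarchy described above can be sketched in Python with the standard xml.etree.ElementTree module. This is a minimal illustration under stated assumptions, not the published DTD or XSD: the placement of patientID as an attribute of PatientDemographic, the function names build_record and check_cardinality, and the omission of most elements are choices made only for this example.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_record(study_id: str, patient_id: str, channels: List[str]) -> ET.Element:
    """Build a skeleton ECGRecord with one Record and one RecordData per channel."""
    root = ET.Element("ECGRecord", {"studyID": study_id})
    # Assumption: patientID is carried as an attribute of PatientDemographic.
    ET.SubElement(root, "PatientDemographic", {"patientID": patient_id})
    record = ET.SubElement(root, "Record")
    for channel in channels:
        data = ET.SubElement(record, "RecordData")
        ET.SubElement(data, "Channel").text = channel
        for part in ("Waveforms", "Annotations", "Measurements"):
            ET.SubElement(data, part)
    return root


def check_cardinality(root: ET.Element) -> List[str]:
    """Return a list of violations of the cardinalities stated in the text."""
    problems = []
    if root.tag != "ECGRecord":
        problems.append("root element must be ECGRecord")
    if len(root.findall("PatientDemographic")) != 1:
        problems.append("exactly one PatientDemographic is required")
    records = root.findall("Record")
    if not records:
        problems.append("at least one Record is required")
    for index, record in enumerate(records):
        for optional in ("RecordingDevice", "ClinicalProtocol"):
            if len(record.findall(optional)) > 1:
                problems.append(f"Record {index}: at most one {optional}")
        if not record.findall("RecordData"):
            problems.append(f"Record {index}: at least one RecordData")
    return problems
```

For example, build_record("S1", "P1", ["I", "II"]) produces a record for which check_cardinality returns an empty list; ET.tostring(root) would then serialise it as XML.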
Based on the FDA-recommended PlotGroup format, Waveforms are represented by a series of values along two dimensions X, Y (XValues and YValues). Based on these values, a plot of voltage vs. time may be generated with a viewer program. The XValues (time) are evenly spaced. Xoffset represents the initial value. SampleRate represents the sampling frequency measured in Hz. The duration of a channel signal is represented by the element Duration. ecgML supports three formats to represent YValues: a RealValue element, a BinaryData element (associated with a specified encoding scheme, which may be base64 or hexadecimal), and a FileLink to refer to an external file.
The elements From and To, which are encoded into the elements BinaryData and RealValue, indicate the beginning and ending values of the corresponding waveform. The Scale associated with BinaryData indicates how to convert the binary YValues into real values. The element Data in RealValue contains a list of float data separated by delimiters, representing the real value of each ECG sample.
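To make the waveform encoding concrete, the following Python sketch decodes YValues and rebuilds the evenly spaced time axis. Several details are assumptions made for illustration and are not taken from the ecgML specification: the attribute name encoding, a Data child holding the BinaryData payload, samples stored as little-endian 16-bit signed integers, Scale acting as a multiplicative factor, and whitespace as the delimiter in RealValue. FileLink is not handled.

```python
import base64
import struct
import xml.etree.ElementTree as ET
from typing import List


def decode_yvalues(yvalues: ET.Element) -> List[float]:
    """Decode a YValues element into a list of real sample values."""
    real = yvalues.find("RealValue")
    if real is not None:
        # Assumption: samples are separated by whitespace.
        return [float(token) for token in real.findtext("Data", "").split()]

    binary = yvalues.find("BinaryData")
    if binary is None:
        raise ValueError("YValues holds neither RealValue nor BinaryData")

    payload = "".join(binary.findtext("Data", "").split())
    encoding = binary.get("encoding", "base64")  # hypothetical attribute name
    if encoding == "base64":
        raw = base64.b64decode(payload)
    elif encoding == "hexadecimal":
        raw = bytes.fromhex(payload)
    else:
        raise ValueError(f"unsupported encoding: {encoding}")

    # Assumption: little-endian 16-bit signed integers, scaled by Scale.
    count = len(raw) // 2
    integers = struct.unpack(f"<{count}h", raw[: count * 2])
    scale = float(binary.findtext("Scale", "1"))
    return [value * scale for value in integers]


def time_axis(x_offset: float, sample_rate: float, n_samples: int) -> List[float]:
    """Rebuild evenly spaced XValues (seconds) from Xoffset and SampleRate (Hz)."""
    return [x_offset + index / sample_rate for index in range(n_samples)]
```

Because the XValues are evenly spaced, only Xoffset and SampleRate need to be stored; the time of each sample follows from its index.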
Annotations would typically be used to describe events specific to the corresponding channel. An annotation defines a time point or interval, which can be used for performing the measurements. Annotations consists of a collection of PointNotation and WaveNotation elements. Each PointNotation can be specified with a PointLabel (the name of the specific point, e.g. P wave onset), an XValue (time, expressed in HH:MM:SS.SSS format), a YValue (amplitude in mV) and any relevant comment. WaveNotation includes descriptions for basic ECG waves, such as Pwave, QRSwave, Twave, Uwave, and other events that can be defined by the user (OtherWave). Wave descriptions are based on the following five elements: Onset (the beginning value), Peak (the peak value; for a T wave, it is possible to have two Peak values), Offset (the ending value), Annotation (annotation for the specified wave, such as "normal" or "abnormal") and Comment (any comments on the annotation). The values of Onset, Peak and Offset can be expressed as either time or sample values.
The Measurements element contains a list of Values (the measurements of each recorded channel). Each Values element may be associated with a label and a measurement unit.
There are different levels at which a record can define supplementary information. A Comment at the ECGRecord level can be used to indicate additional acquisition information, for example, place and technical conditions of the acquisition process. A Comment at the YValues level may typically be used to define the format of the representation of the YValues, e.g. which delimiter is used. A Comment at the Measurement level may be used to describe, for example, whether a measurement is a global average or an instantaneous value.
This research applies the DICOM recommendation for defining ECG channel names, fiducial point markers and waveform encoding details. Moreover, it applies the Unified Code for Units of Measure (UCUM) scheme for defining measurement units, such as cm for Height and mV for YValues when appropriate.
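As an illustration of how a point annotation might be read, the Python sketch below converts an HH:MM:SS.SSS time into seconds and into a sample index. The helper names, the exact nesting of the child elements and the rounding rule are assumptions for this example; only the element names and the time format come from the description above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union


def parse_time(text: str) -> float:
    """Convert an HH:MM:SS.SSS string into seconds."""
    hours, minutes, seconds = text.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def to_sample_index(time_s: float, x_offset: float, sample_rate: float) -> int:
    """Map a time in seconds to the nearest sample index (assumed rounding rule)."""
    return round((time_s - x_offset) * sample_rate)


def read_point(point: ET.Element) -> Dict[str, Union[str, float]]:
    """Extract label, time (s) and amplitude (mV) from a PointNotation element."""
    return {
        "label": point.findtext("PointLabel", ""),
        "time_s": parse_time(point.findtext("XValue", "00:00:00.000")),
        "amplitude_mV": float(point.findtext("YValue", "0")),
    }
```

With a sampling rate of 250 Hz and an offset of zero, an annotation at 00:01:02.500 corresponds to 62.5 s and therefore to sample index 15625.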
This specification has been encoded into an XML-based data protocol. Additional files 1 and 2 are the DTD and XSD files for ecgML respectively. Additional file 3 is an ECG record, which has been generated using ecgML.
Evaluation of the model
It is fundamental to demonstrate the system-, application- and format-independence of ECG data when using ecgML. Special importance should be given to illustrating the autonomy of content from its presentational scheme, e.g. printed graphs, tabular data to be imported into data mining systems for further analysis or audio files. Figure 7 illustrates the separation of the five important components in XML publishing. Based on the advantages of XML technologies, ecgML has a clear advantage over existing systems in which every information system has its own internal information model and information is merged and intertwined with its representation format. Figure 8 exemplifies a scenario where the raw ECG data is kept in an ecgML data file and is therefore independent of any presentation information. Various XSLT transformations (stored as XSL files and applied on the fly, transparent to the user) convert the ecgML source into user- and/or application-specific data formats, such as MPEG (audio), MatLab (text) and SVG/PNG (graphics). The centralised storage of the ECG record and dynamic creation of data representations avoids redundancy.
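The idea of deriving a presentation format from the raw record can be sketched as follows. The paper uses XSLT for this step; the Python function below is only a stand-in that produces a tabular text file of the kind that could be imported into an analysis package. The function name record_to_csv, the column headings, the element nesting and the restriction to RealValue data are assumptions for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def record_to_csv(ecgml_text: str, channel: str) -> str:
    """Render one channel of an ecgML-like record as time/voltage CSV text."""
    root = ET.fromstring(ecgml_text)
    for data in root.iter("RecordData"):
        if data.findtext("Channel") == channel:
            break
    else:
        raise KeyError(f"channel not found: {channel}")

    waveforms = data.find("Waveforms")
    # Assumption: Xoffset and SampleRate are children of XValues.
    offset = float(waveforms.findtext("XValues/Xoffset", "0"))
    rate = float(waveforms.findtext("XValues/SampleRate"))
    samples = [
        float(token)
        for token in waveforms.findtext("YValues/RealValue/Data", "").split()
    ]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["time_s", "voltage_mV"])
    for index, value in enumerate(samples):
        writer.writerow([f"{offset + index / rate:.3f}", f"{value:.3f}"])
    return out.getvalue()
```

The source record is left untouched; each output format is generated on demand, which is the separation of content and presentation the section describes.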
The FDA, together with a number of other institutions, has developed and published an XML vocabulary to represent collected time-series data. However, there are some significant differences between the FDA proposal and ecgML. The FDA proposal is intended to represent collected biological data, including ECG, electroencephalogram (EEG), or other time series data such as temperature, pressure and oxygen saturation. The main goal is to facilitate the submission of the biological data and to ensure the accuracy and consistency of the measurements made from them. It is important for the FDA to view the biological data in an appropriate way. Thus, the data model (specified in a DTD) includes some presentation information, such as the elements MinorTickInterval, MajorTickInterval and LogScale. On the other hand, the purpose of ecgML is to develop an open and transparent way of representing, exchanging and mining ECG data. Therefore, ecgML not only consists of basic components, which may be used to perform knowledge discovery in ECG data (e.g. ClinicalProtocol, Diagnosis and Measurements), but also follows the principle of separating content and presentation information, which is a considerable advantage when ecgML is used in combination with inter-media transformation.
A series of tools are being developed to assist users in exploiting ecgML-based applications. These include an XML-based ECG record generator, ECG parser and ECG viewer. The generator will automatically produce XML-based ECG records from existing ECG databases, e.g. the MIT-BIH database. The ECG parser allows the user to read the ECG records and access their contents and structure, whereas the ECG viewer provides onscreen display of the required waveform data (Figure 9). It shows all annotation information of the individual waveform. The hierarchical structure of the XML-based ECG record is displayed. It can be expanded and collapsed at any level. This interface can also show individual episodes of the ECG waveform chosen from the ecgML structure. The viewer tool graphically locates boundaries (i.e. beginning, peak, and end) of the P, QRS and T waveforms for each selected QRS complex.
ecgML will enable the seamless integration of ECG data into electronic patient records (EPRs) and medical guidelines. This protocol can support data exchange between different ECG acquisition and visualisation devices. Similarly, it may enable data mining using heterogeneous software platforms and applications. The data and metadata contained in an ecgML record may be useful to improve pattern recognition in ECG applications. It would also aid the implementation of automated decision support models such as case-based reasoning. Figure 10 illustrates the utilisation of map files to convert "raw" ecgML files into customised output formats, which will be imported into data mining systems for further analysis. ecgML may also be significant for problems such as future proof storage, context-sensitive (textual) search of patterns in ECG data, and its native inclusion into medical guidelines. Further research will address the following issues.
• How does ecgML affect storage capacity?
• Does on-the-fly compression (as used by HTTP 1.1) make a difference in terms of transmission speed?
• Is it feasible to use ecgML in applications such as 24 hour monitoring?
• Does ecgML data contain all the significant information required for ECG analysis?
Värri A, Kemp B, Penzel T, Schlögl A: Standards for biomedical signal databases. IEEE Engineering in Medicine and Biology. 2001, 20 (3): 33-37. 10.1109/51.932722.
Ontology Homepage. [http://www.ontology.org/index.html]
Physionet Homepage. [http://www.physionet.org/]
NEMA's OFFICIAL DICOM WEB Page. [http://medical.nema.org/dicom.html]
Health Level Seven Homepage. [http://www.hl7.org]
CORBAmed Homepage. [http://www.acl.lanl.gov/OMG/]
Extensible Markup language (XML). [http://www.w3.org/xml/]
XML Core Standards. [http://xml.coverpages.org/coreStandards.html]
Schroeter G: How XML is improving data exchange in healthcare. [http://www.softwareag.com/xml/library/schroeter_healthcare.htm]
Synapses Homepage. [http://www.cs.tcd.ie/synapses/public/]
SynEx Homepage. [http://www.gesi.it/synex/]
Jung B, Grimson J: Synapses/SynEx goes XML. In Proceedings of the Medical Informatics Europe '99 Conference (MIE99): August, 1999; Slovenia, Ljubljana. Edited by: Peter K, Blaz Z, Janez S, Marjan P, Rolf E. 1996, IOS Press, 906-911.
Jung B, Andersen EP, Grimson J: Using XML for Seamless Integration of Distributed Electronic Patient Records. In Proceedings of XML Scandinavia conference: May 2000; Gothenburg, Sweden. 2000
Grimson J, Stephens G, Jung B, Grimson W, Berry D, Pardon S: Sharing Health-Care Records over the Internet. IEEE Internet Computing. 2001, 5 (3): 49-58. 10.1109/4236.935177.
FDA application: Proposed Standard for Exchange of Electrocardiographic and Other Time-Series Data. [http://www.fda.gov/cder/regulatory/ersr/ECGdata.htm]
FDA XML Data Format Design Specification. [http://www.cdisc.org/discussions/EGC/FDA_XML_Data_Format_Design_Specification_DRAFT_B.pdf]
Health Level Seven XML Patient Record Architecture. [http://xml.coverpages.org/hl7PRA.html]
HL7 to release first XML-based standard for healthcare. [http://xml.coverpages.org/hl7CDA-Ann.html]
ASTM, subcommittee E31.25. [http://www.astm.org/COMMIT/COMMITTEE/E31.htm]
Dudeck J: TC 251 task force on XML application in healthcare. CEN/TC251 Task Force XML-Final Report. 1999, [http://www.centc251.org/TCMeet/Doclist/TCdoc99/N99-067.doc]
Clinical Data Interchange Standards Consortium. [http://www.cdisc.org/]
I-Med Homepage. [http://www.hnbe.com/healthweb/imedpub/]
Schadow G, McDonald CJ: The Unified Code for Units of Measure. [http://aurora.rg.iupui.edu/~schadow/units/UCUM/ucum.html]
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/3/4/prepub
HW co-designed and implemented ecgML (DTD and XSD files), developed support tools and drafted the manuscript. FA conceived the study, participated in the design of the model and drafted the manuscript. BJ helped to refine ecgML, brought expertise in XML and EPRs, and helped to draft the paper. NB participated in the coordination of this study and contributed to the preparation of this manuscript. All authors read and approved the final manuscript.
About this article
Cite this article
Wang, H., Azuaje, F., Jung, B. et al. A markup language for electrocardiogram data acquisition and analysis (ecgML). BMC Med Inform Decis Mak 3, 4 (2003). https://doi.org/10.1186/1472-6947-3-4
- Electronic Patient Record
- Document Type Definition
- Clinical Document Architecture
- Data Mining System
- Clinical Data Interchange Standard Consortium