BMC Medical Informatics and Decision Making

Background: The storage and distribution of electrocardiogram data is based on different formats. There is a need to promote the development of standards for their exchange and analysis. Such models should be platform-, system- and application-independent, flexible and open to every member of the scientific community.


Background
Electrocardiogram (ECG) data are acquired, stored and analysed using different formats and software platforms. Medical informatics will fully exploit the benefits of its research only when data can be openly shared and interpreted. Therefore, there is a need to develop cross-platform solutions to support biomedical training, decision-making and telemedicine applications [1].
An important goal is to describe these data independently of the number of channels, instrumentation platform or type of experiment. Moreover, an ECG record should also include annotations relating to the acquisition protocols, patient information and analysis results. These data modelling tasks should rely on flexible and inexpensive tools to enhance pattern recognition capabilities.
The development of these systems will depend on the existence of information that clearly specifies domain terminologies, functional hierarchies and decision rules. The availability of such ontological representations [2] will allow the emergence of standards, which will facilitate the integration of information on a global communication infrastructure.
ECG data have traditionally been recorded using flat file formats, such as those of the MIT-BIH file library [3]. This type of data format lacks the information necessary to support meaningful analysis, interoperability and integration of multiple resources. Different governmental, academic and private organisations have proposed minimum requirements for the representation and storage of biomedical information, including signals and images [4]. These efforts aimed to promote the application of standards for message exchange and data integration. In 1993, for example, the CEN/TC251 WG3 (Comité Européen de Normalisation, European Committee for Standardisation, Technical Committee 251) reviewed several data exchange formats for healthcare applications. These included Abstract Syntax Notation (ASN.1) and Health Level Seven (HL7) [5]. The former defines norms to describe an electronic message based on different data types. One of the disadvantages of ASN.1 is that it does not fully support scalable solutions and query processing. HL7 has been a Standards Development Organisation affiliated to the ANSI (American National Standards Institute) since 1997 and has become the standard for electronic exchange of historical and administrative data in health services worldwide. The next generation of the messaging standard (V3) has been under development since.
CORBAmed, the Healthcare Domain Task Force of the Object Management Group (OMG) [6], deals with interoperability problems between heterogeneous information systems.
To facilitate the seamless and automated data exchange between numerous applications, a common interface architecture was developed that serves in a number of today's information systems. Liaisons have been established with other organisations such as HL7.
The Digital Imaging and Communications in Medicine (DICOM) standards committee supports the achievement of data compatibility between imaging systems and other healthcare information at different levels. This standard has been applied by many private organisations, which need to incorporate diverse bio-signals associated with medical imaging. The DICOM standard is a useful resource that also provides guidelines on how to represent ECG features [4].
More recently, the eXtensible Markup Language (XML) [7] has been suggested as a promising approach to representing biomedical data. Developed as a subset of SGML in 1996 to "be straightforwardly usable over the Internet" and published as a first recommendation by the W3C (World Wide Web Consortium) in 1998, XML soon became a ubiquitous syntax for data and data exchange over the Internet. Since then, XML-based markup languages, specified for example as Document Type Definitions (DTD) or XML Schemas (XSD), have been emerging in large numbers and in nearly every imaginable domain [8]. Advantages of XML syntax include platform, vendor and application independence as well as an easy-to-follow hierarchical data structure and wide support. "XML's greatest advantage is that it is a user-driven, open standard for exchanging data both over corporate networks and between different enterprises, notably over the Internet. XML's biggest potential lies undoubtedly in its ability to mark up mission-critical document elements self-descriptively" [9]. By strictly separating content from presentation information, XML technologies increase the re-usability of information, as access to the original (raw) data is always given. The use of XML syntax for the exchange of electronic patient records was demonstrated in the Synapses [10] and SynEx [11] project implementations [12][13][14].
The U.S. Food and Drug Administration (FDA) Centre for Drug Evaluation and Research has proposed recommendations for the exchange of time-series data. These include a hierarchical structure for the representation of signals, including ECG data, which may be encoded as an XML file. This protocol focuses on the acquisition of multiple records from different subjects within a single file [15,16]. The HL7 committee has been actively cooperating with the World Wide Web Consortium (W3C) to define XML guidelines to represent medical information [17]. HL7 has endorsed the Clinical Document Architecture (CDA), which supports the generation and exchange of clinical messages [18]. Other XML-based initiatives for the representation and distribution of biomedical information are the ASTM E31.25 subcommittee [19], the CEN/TC251 Task Force on XML Applications in Healthcare [20] and the Clinical Data Interchange Standards Consortium (CDISC) [21]. However, these efforts have not focused on ECG data. Some of them place a greater emphasis on the administrative and financial transactions associated with a clinical environment.
Recent advances include I-Med, which is an XML-based format for clinical data [22]. This project consists of a domain-independent interface for exchanging several types of medical information. Its major goal is to provide a unique platform for clinical transactions. These messages can include ECG records, which may be described by basic features, such as QRS duration and text-based interpretations. One major limitation of this solution is
This article introduces a markup language for supporting ECG data exchange and analysis (ecgML). It synthesises key recommendations specified by the initiatives presented above.

Methods
There is a need to harmonise the representation of digital ECG data originating from the full spectrum of devices, along with annotations for events, and to include necessary associated information, such as patient identification, interpretation and other clinical data. Each patient record starts with a root element ECGRecord, which is uniquely identified by its attribute studyID. The StudyDate and StudyTime elements represent the latest date and time of the study of the ECG recording. Diagnosis contains a text version of the latest diagnostic interpretation of the ECG, while MedicalHistory describes the patient's history of clinical problems and diagnoses. There are two main components for each record: one PatientDemographic and one or more Record components. Each patient record can have only one PatientDemographic element, which is kept up to date, while multiple Record elements may be held in one patient record. This makes it possible to keep track of the history of the patient's diagnoses.
PatientDemographic contains information of general interest concerning the person from whom the recording is obtained, such as demographic data (e.g. patientID, Name, etc.) and contact information (e.g. Address, etc.). This component is required in each record.
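The record layout described above can be illustrated with a minimal sketch in Python, using only the standard library. The element and attribute names (ECGRecord, studyID, PatientDemographic, patientID, Name, Record, AcquisitionDate) are taken from the text; treating patientID as an attribute, the helper names build_ecg_record and check_cardinality, and all values are assumptions made for illustration, not part of the ecgML specification.

```python
import xml.etree.ElementTree as ET


def build_ecg_record(study_id, patient_id, name, acquisition_dates):
    """Build a skeleton ECGRecord (illustrative nesting, hypothetical values)."""
    root = ET.Element("ECGRecord", studyID=study_id)
    # Assumption: patientID is modelled as an attribute of PatientDemographic.
    demo = ET.SubElement(root, "PatientDemographic", patientID=patient_id)
    ET.SubElement(demo, "Name").text = name
    # One Record element per acquisition, so a file can hold a patient's history.
    for date in acquisition_dates:
        ET.SubElement(root, "Record", AcquisitionDate=date)
    return root


def check_cardinality(root):
    """True if there is exactly one PatientDemographic and at least one Record."""
    return (len(root.findall("PatientDemographic")) == 1
            and len(root.findall("Record")) >= 1)


record = build_ecg_record("S001", "P001", "Jane Doe", ["2003-01-15", "2003-02-20"])
print(ET.tostring(record, encoding="unicode"))
print(check_cardinality(record))
```

In practice the cardinality constraints would be enforced by validating against the DTD or XSD rather than by hand-written checks; the function above only mirrors the rule stated in the text.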
Record represents the physical storage for the basic content of an ECG recording. The AcquisitionDate and AcquisitionTime attributes specify the acquisition date and time for each record, which makes it possible to include multiple time-related ECG recordings within a file. investigatorID and siteID identify who is responsible for the recording and where it was acquired. There are three main components: RecordingDevice, ClinicalProtocol and RecordData.
RecordingDevice is an optional (zero-or-one) element, which describes the device that generated the data. It should support the full spectrum of ECG devices, including standard 12-lead ECGs, Holter monitors, transtelephonic monitors and implanted devices. The main components in this section include deviceID, Type, Manufacturer, Model and a description of the filtering technique used during the ECG acquisition (e.g. BaselineFilter and LowpassFilter).
ClinicalProtocol is an optional element, which may include information relating to a patient's clinical report. The unit attribute of each element describes the measurement unit of each observation. Currently, this section only includes basic clinical dimensions, such as DiastolicBP and HeartRate. However, other variables can easily be added.
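As a hedged illustration of the unit attribute, the following Python sketch stores two observations and reads them back together with their units. ClinicalProtocol, DiastolicBP, HeartRate and unit come from the text; the unit strings "mmHg" and "bpm", the numeric values and the helper names are assumptions.

```python
import xml.etree.ElementTree as ET


def add_observation(protocol, name, value, unit):
    """Append one observation element carrying its measurement unit."""
    obs = ET.SubElement(protocol, name, unit=unit)
    obs.text = str(value)
    return obs


def read_observations(protocol):
    """Map each observation name to a (value, unit) pair."""
    return {child.tag: (float(child.text), child.get("unit")) for child in protocol}


protocol = ET.Element("ClinicalProtocol")
add_observation(protocol, "DiastolicBP", 80, "mmHg")  # unit string is an assumption
add_observation(protocol, "HeartRate", 72, "bpm")     # unit string is an assumption
print(read_observations(protocol))
```

Because the unit travels with each element, a further variable can be added with one more call to add_observation, without changing the reader.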
RecordData is a key ecgML element. There can be multiple RecordData elements within a file, which are identified by their Channel element names. The DICOM lead labelling format is recommended for this purpose. RecordData includes three main sub-components: Waveforms, Annotations and Measurements.
Each wave annotation holds Onset, Peak and Offset values and an Annotation (the annotation for the specified wave, such as "normal" or "abnormal"); any comments on the annotation are given using the Comment element. The values of Onset, Peak and Offset can be expressed as either time or sample values.
The Measurements element contains a list of Values (the measurements of each recorded channel).

Annotations
Annotations for each ECG record. Based on FDA XML Data Format Specification (revision C).

PointNotation
A set of fiducial points with an X and Y position.

PointLabel
Name of the fiducial points. Annotations for interval measurements.

Pwave
Annotations of the P wave (onset, offset, peak, annotation, comment). Optional. See Table 9. Example: Normal.

QRSwave
Annotations of the QRS wave (onset, offset, peak, annotation, comment). Required. See Table 9. Example: PVC.

Twave
Features of the T wave (onset, offset, peak, annotation, comment). Optional. See Table 9. Example: inverted.

Uwave
Annotations of the U wave (onset, offset, peak, annotation, comment). Optional. See Table 9. Example: Normal.

OtherWave
Annotations for other durations (onset, offset, peak, annotation, comment). Optional. See Table 9.

This specification has been encoded into an XML-based data protocol. Additional files 1 and 2 are the DTD and XSD files for ecgML, respectively. Additional file 3 is an ECG record generated using ecgML.
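The wave annotations (onset, peak, offset, annotation, comment) can be sketched as follows. The element names Onset, Peak, Offset, Annotation, Comment and QRSwave are taken from the text; the helper names, the sample indices and the 360 Hz sampling rate are assumptions used only to show how sample values relate to time values.

```python
import xml.etree.ElementTree as ET


def make_wave(tag, onset, peak, offset, annotation, comment=None):
    """Create one wave annotation element such as Pwave, QRSwave or Twave."""
    wave = ET.Element(tag)
    for name, value in (("Onset", onset), ("Peak", peak), ("Offset", offset)):
        ET.SubElement(wave, name).text = str(value)
    ET.SubElement(wave, "Annotation").text = annotation
    if comment is not None:
        ET.SubElement(wave, "Comment").text = comment
    return wave


def samples_to_seconds(wave, sampling_rate_hz):
    """Convert Onset, Peak and Offset to seconds, assuming they hold sample indices."""
    return {name: int(wave.findtext(name)) / sampling_rate_hz
            for name in ("Onset", "Peak", "Offset")}


# Hypothetical beat: sample indices at an assumed sampling rate of 360 Hz.
qrs = make_wave("QRSwave", onset=360, peak=378, offset=396, annotation="PVC")
print(samples_to_seconds(qrs, sampling_rate_hz=360))
```

Whether a given file stores times or sample indices would have to be stated in the record itself; the sketch simply assumes sample indices.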

Evaluation of the model
It is fundamental to demonstrate the system-, application- and format-independence of ECG data when using ecgML. Special importance should be given to illustrating the autonomy of content from its presentational scheme, e.g. printed graphs, audio files or tabular data to be imported into data mining systems for further analysis. Figure 7 illustrates the distinct separation of the five important components in XML publishing. By building on XML technologies, ecgML offers a clear advantage over existing systems, in which every information system has its own internal information model and information is intertwined with its representation format. Figure 8 exemplifies a scenario where the raw ECG data are kept in an ecgML data file, independently of possible presentation information. Various XSLT transformations (stored as XSL files and applied on the fly, transparently to the user) convert the ecgML source into user- and/or application-specific data formats, such as MPEG (audio), MatLab (text) and SVG/PNG (graphics). The centralised storage of the ECG record and the dynamic creation of data representations avoid redundancy.
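The idea of deriving a presentation format from a single ecgML source can be sketched in Python. This is not the XSLT approach described above: the Python standard library has no XSLT processor, so a plain tree traversal is swapped in to produce comma-separated text. The nesting RecordData/Channel and RecordData/Measurements/Values follows the text, while the whitespace-separated encoding of Values, the sample document and the function name to_csv are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical ecgML fragment; the encoding of Values is an assumption.
SAMPLE = """<ECGRecord studyID="S001">
  <Record AcquisitionDate="2003-01-15">
    <RecordData><Channel>I</Channel>
      <Measurements><Values>10 12 15</Values></Measurements></RecordData>
    <RecordData><Channel>V1</Channel>
      <Measurements><Values>3 4 5</Values></Measurements></RecordData>
  </Record>
</ECGRecord>"""


def to_csv(xml_text):
    """Render each channel as one column of a comma-separated table."""
    root = ET.fromstring(xml_text)
    columns = {}
    for record_data in root.iter("RecordData"):
        channel = record_data.findtext("Channel")
        values = record_data.findtext("Measurements/Values", default="")
        columns[channel] = values.split()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns.keys())             # header: one column per channel
    writer.writerows(zip(*columns.values()))    # one row per sample index
    return out.getvalue()


print(to_csv(SAMPLE))
```

The source document is left untouched; other output formats would be further functions (or stylesheets) over the same file.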
The FDA, together with a number of other institutions, has developed and published an XML vocabulary [16] to represent collected time-series data. However, there are some significant differences between the FDA proposal and ecgML. The FDA proposal is intended to represent collected biological data, including ECG, electroencephalogram (EEG) and other time-series data such as temperature, pressure and oxygen saturation. Its main goal is to facilitate the submission of biological data and to ensure the accuracy and consistency of the measurements made from them. It is important for the FDA to view the biological data in an appropriate way. Thus, the data model (specified in a DTD) includes some presentation information, such as the elements MinorTickInterval, MajorTickInterval and LogScale. On the other hand, the purpose of ecgML is to develop an open and transparent way of representing, exchanging and mining ECG data. Therefore, ecgML not only consists of basic components that may be used to perform knowledge discovery in ECG data (e.g. ClinicalProtocol, Diagnosis and Measurements) but also follows the principle of separating content and presentation information, which is a considerable advantage when ecgML is used in combination with inter-media transformation.

Accompanying tools
A series of tools are being developed to assist users in exploiting ecgML-based applications. These include an XML-based ECG record generator, an ECG parser and an ECG viewer. The generator will automatically produce XML-based ECG records from existing ECG databases, e.g. the MIT-BIH database [3]. The ECG parser allows the user to read ECG records and access their contents and structure, whereas the ECG viewer provides an on-screen display of the required waveform data (Figure 9). It shows all annotation information for the individual waveform. The hierarchical structure of the XML-based ECG record is displayed and can be expanded and collapsed at any level. This interface can also show individual episodes of the ECG waveform chosen from the ecgML structure. The viewer tool graphically locates the boundaries (i.e. beginning, peak and end) of the P, QRS and T waveforms for each selected QRS complex.
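The expandable display of the record hierarchy can be approximated by a short Python sketch that lists element names down to a chosen depth. It is not the authors' viewer; the function name outline, the max_depth parameter and the sample document are assumptions.

```python
import xml.etree.ElementTree as ET


def outline(element, max_depth, depth=0):
    """Return indented tag names; levels deeper than max_depth stay collapsed."""
    lines = ["  " * depth + element.tag]
    if depth < max_depth:
        for child in element:
            lines.extend(outline(child, max_depth, depth + 1))
    return lines


# Hypothetical record used only to exercise the outline.
doc = ET.fromstring(
    "<ECGRecord><PatientDemographic/>"
    "<Record><RecordData><Annotations/></RecordData></Record></ECGRecord>"
)
print("\n".join(outline(doc, max_depth=2)))
```

Raising or lowering max_depth corresponds to expanding or collapsing the tree at a given level.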

Conclusions
ecgML will enable the seamless integration of ECG data into electronic patient records (EPRs) and medical guidelines. This protocol can support data exchange between different ECG acquisition and visualisation devices. Similarly, it may enable data mining using heterogeneous software platforms and applications. The data and metadata contained in an ecgML record may be useful to improve pattern recognition in ECG applications. They would also aid the implementation of automated decision support models such as case-based reasoning. Figure 10 illustrates the use of map files to convert "raw" ecgML files into customised output formats, which can be imported into data mining systems for further analysis. ecgML may also be significant for problems such as future-proof storage, context-sensitive (textual) search of patterns in ECG data, and its native inclusion in medical guidelines. Further research will address the following issues.
• How does ecgML affect storage capacity?
• Does on-the-fly compression (as used by HTTP 1.1) make a difference in terms of transmission speed?
• Is it feasible to use ecgML in applications such as 24 hour monitoring?
• Does ecgML data contain all the significant information required for ECG analysis?

design of the model and drafted the manuscript. BJ helped to refine ecgML, brought expertise in XML and EPRs, and helped to draft the paper. NB participated in the coordination of this study and contributed to the preparation of this manuscript. All authors read and approved the final manuscript.

Figure 10
Converting an XML-based ECG record into tabular data using map files.

Notations for all tree diagrams are illustrated as follows. Lines of descriptive text outside an element box indicate attributes that the element should have. The default value is shown underlined.