Data transfer between electronic systems for data capture is a crucial functionality. S2O converts the spreadsheet-based statistical format IBM SPSS into a standard format for electronic data capture in clinical trials. The tool supports and accelerates the otherwise manual transformation process. SPSS is a very popular format and is supported by several statistics programs. For instance, statistics courses for medical students are taught mainly with SPSS to prepare them for scientific data collection and analysis. In addition, SPSS can import data from several applications, such as Excel or Lotus spreadsheets, STATA, dBASE and SAS. Conversely, applications like SAS or R are capable of exporting data in SPSS format. For these reasons, SPSS was chosen as the source format for the conversion with S2O.
In S2O, the IBM SPSS internal library was used to develop the converter and to access the SPSS values. Promising approaches from database research, such as schema or ontology matching [18], could not be applied, or only with considerable effort, because SPSS offers no semantic annotation or ontology capabilities.
When integrating an existing SBDC into a common EDC system, the S2O converter removes the need for cumbersome and error-prone manual transformation of variables and clinical values by transforming SPSS files into the CDISC ODM format. Furthermore, it fosters the use of regulatory-compliant EDC systems, with key benefits such as access for multiple users, data security and traceability of entered data. Nevertheless, data from SBDC applications need to be examined carefully before they are uploaded into EDC systems.
Overall, we would advise researchers to refrain from using spreadsheet software like Excel or OpenOffice, or statistics software with spreadsheet-based data collection like SPSS or SAS, as the primary tool for data capture in any research project. Open-source EDC systems like OpenClinica [27] or REDCap [28], as well as commercial EDC tools, are available and allow subject data to be imported via ODM. These tools require some initial effort but are suitable for avoiding the problems and drawbacks of SBDC software.
Strengths and weaknesses
S2O covers the transformation of all relevant meta-information regarding SPSS variables, and of the values themselves, into the CDISC ODM format. SBDC systems usually contain a flat list of variables, whereas the ODM format is hierarchically structured. Hence, the data elements of a spreadsheet are inserted into a default structure of protocol, study events, forms and item groups in ODM. The patient identifier variable in SPSS cannot be recognized automatically. Because ODM requires a subject key to identify the clinical cases, an S2O parameter can be used to name the SPSS variable that serves as the identifier; this variable is not converted into a separate ODM variable but is set as the SubjectKey of the record. If no such variable is available or given, a default iterator is used for subject identification instead.
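The mapping of a flat case list onto the default ODM hierarchy can be illustrated with a minimal Python sketch. It is not the S2O implementation: the function name `rows_to_clinical_data`, the parameter `subject_var` and all default OIDs (`S.1`, `MDV.1`, `SE.1`, `F.1`, `IG.1`) are assumptions chosen for illustration, and the rows are given as plain dictionaries rather than read through the SPSS library.

```python
import xml.etree.ElementTree as ET

# Hypothetical default OIDs; the identifiers S2O actually emits are not given in the text.
DEFAULTS = {"study": "S.1", "mdv": "MDV.1", "event": "SE.1", "form": "F.1", "group": "IG.1"}


def rows_to_clinical_data(rows, subject_var=None):
    """Wrap flat rows (one dict per case) in the ODM ClinicalData hierarchy."""
    clinical = ET.Element(
        "ClinicalData", StudyOID=DEFAULTS["study"], MetaDataVersionOID=DEFAULTS["mdv"]
    )
    for index, row in enumerate(rows, start=1):
        # Use the named variable as SubjectKey, otherwise fall back to a running counter.
        if subject_var is not None and subject_var in row:
            key = str(row[subject_var])
        else:
            key = str(index)
        subject = ET.SubElement(clinical, "SubjectData", SubjectKey=key)
        event = ET.SubElement(subject, "StudyEventData", StudyEventOID=DEFAULTS["event"])
        form = ET.SubElement(event, "FormData", FormOID=DEFAULTS["form"])
        group = ET.SubElement(form, "ItemGroupData", ItemGroupOID=DEFAULTS["group"])
        for name, value in row.items():
            if name == subject_var or value is None:
                continue  # the key variable is not repeated as an item; empty cells are skipped
            ET.SubElement(group, "ItemData", ItemOID=name, Value=str(value))
    return clinical


rows = [{"PAT_ID": "P001", "AGE": 54, "SEX": 1}, {"PAT_ID": "P002", "AGE": 61, "SEX": 2}]
print(ET.tostring(rows_to_clinical_data(rows, subject_var="PAT_ID"), encoding="unicode"))
```

Calling `rows_to_clinical_data(rows)` without `subject_var` exercises the fallback: every row then receives its position in the file as SubjectKey, and `PAT_ID` is kept as an ordinary item.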
Variables, labels, data types and value lists can be mapped without loss. Apart from statistical information such as role, measure and missing values, the structure of the research variables and the SPSS data values are fully convertible into the CDISC ODM format.
Depending on the data collection scheme, spreadsheet-based solutions often contain several cases per patient for follow-up visits, which results in multiple rows of data per patient. Currently, the S2O application is not able to identify and handle multiple cases per patient. A dynamic list of repeating variables might be applied to include those cases as multiple repeating FormData or ItemGroupData elements within the ClinicalData hierarchy. A further minor weakness is the loss of date format and alignment information during the conversion process.
ODM supports only the XML date format and does not store country-specific display formats.
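The loss of display formats can be made concrete with a short Python sketch. The format table and the function `to_odm_date` are hypothetical and serve only as an illustration; they do not describe how S2O reads dates from SPSS.

```python
from datetime import datetime

# Hypothetical display formats a spreadsheet might use; not taken from S2O.
DISPLAY_FORMATS = {
    "de": "%d.%m.%Y",   # e.g. 31.12.2015
    "us": "%m/%d/%Y",   # e.g. 12/31/2015
    "iso": "%Y-%m-%d",  # e.g. 2015-12-31
}


def to_odm_date(text, display_format):
    """Parse a date in the given display format and return the XML date form YYYY-MM-DD."""
    parsed = datetime.strptime(text, DISPLAY_FORMATS[display_format])
    return parsed.date().isoformat()


german = to_odm_date("31.12.2015", "de")
american = to_odm_date("12/31/2015", "us")
print(german, american)
```

Both inputs yield the same string, so the original display format cannot be recovered from the converted value alone.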
The role of ODM in electronic data capture
According to its Data Standards Catalog, the FDA accepts Define-XML, an extension of the ODM standard, as the communication format for the definition of clinical study data [29]. The FDA is currently conducting a pilot project to identify a new standard for the electronic submission of trial data [30]. This pilot project evaluates the applicability of the ODM-Dataset-XML standard (also an extension of the ODM format) as an alternative to the ageing 8-bit SAS XPORT format.
ODM, on the other hand, is a recognized standard for the exchange and archiving of clinical trial metadata as well as clinical data [10, 31]. With the aid of official CDISC extensions, ODM is also capable of representing and communicating trial protocol information [16]. Because several EDC systems accept CDISC ODM as a data modeling and exchange format, converted study-related data can be communicated to them, which fosters the model-driven architecture approach to creating the trial database. EDC systems usually fulfill regulatory requirements such as GCP [32]. Metadata from many CRFs in ODM format are available, for example, in the portal of Medical Data Models.
Clinical data models
Data models in healthcare and research need to be kept interoperable for data exchange between different applications. In this regard, Legaz-García et al. have developed a mapping model between the Clinical Element Model and the openEHR Archetypes [33]. A converter for transformations between CDISC ODM and the Archetype Description Language was described previously [34]. The advantage of this approach is that the data structure is the same in both systems and captured data can easily be merged for statistical analyses. In addition, a mapping scheme for transformations between the ISO 11179 standard for metadata registries and ODM was created [35]. This approach has been validated by converting all released CRFs from the NCI caDSR repository and uploading the result into the portal of Medical Data Models. In ODM, it is possible to enrich medical concepts with codes from common terminologies. Semantically annotated forms allow comparison and frequency analyses if a large number of forms is available in a structured way [36, 37]. It has also been shown that ODM is suitable for the exchange of clinical data between different medical applications, for instance electronic health record systems and EDC systems [38–40] or research platforms like i2b2 [41, 42].
Future work
The aim of a further release of the S2O converter will be to extend the algorithm so that it can handle multiple rows of values per patient in the SPSS file. Although this is a rather minor limitation, a future release of the converter should also work without the SPSS internal library, which requires SPSS to be installed on the computer.
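One conceivable way to handle multiple rows per patient is sketched below in Python. This is an assumption about a possible design, not the planned S2O algorithm: rows that share an identifier are collected under one SubjectData element, and each row becomes an ItemGroupData element distinguished by the ODM attribute ItemGroupRepeatKey. The function name, the OIDs and the premise that file order reflects visit order are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_repeating_rows(rows, subject_var):
    """Emit one SubjectData per patient; each row becomes a repeating ItemGroupData."""
    by_patient = defaultdict(list)
    for row in rows:
        by_patient[str(row[subject_var])].append(row)

    # Hypothetical default OIDs, chosen only for this illustration.
    clinical = ET.Element("ClinicalData", StudyOID="S.1", MetaDataVersionOID="MDV.1")
    for key, cases in by_patient.items():
        subject = ET.SubElement(clinical, "SubjectData", SubjectKey=key)
        event = ET.SubElement(subject, "StudyEventData", StudyEventOID="SE.1")
        form = ET.SubElement(event, "FormData", FormOID="F.1")
        # Assumption: rows appear in visit order, so the position serves as repeat key.
        for repeat, case in enumerate(cases, start=1):
            group = ET.SubElement(
                form, "ItemGroupData", ItemGroupOID="IG.1", ItemGroupRepeatKey=str(repeat)
            )
            for name, value in case.items():
                if name != subject_var and value is not None:
                    ET.SubElement(group, "ItemData", ItemOID=name, Value=str(value))
    return clinical


visits = [
    {"PAT_ID": "P001", "SBP": 140},
    {"PAT_ID": "P002", "SBP": 125},
    {"PAT_ID": "P001", "SBP": 132},
]
print(ET.tostring(group_repeating_rows(visits, "PAT_ID"), encoding="unicode"))
```

Whether follow-up visits are better modelled as repeating item groups, repeating forms or separate study events depends on the study design; the sketch picks item groups only because it is the smallest change to the default structure.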
An XML vendor extension of ODM could be applied to map the SPSS parameters that are currently lost, such as alignment, role, missing values or measure. This would make a full bidirectional conversion possible.
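A minimal Python sketch of such an extension is given below, assuming that the SPSS parameters are attached to an ODM ItemDef as attributes in a separate XML namespace. The namespace URI `http://example.org/ns/spss-extension`, the prefix `spss`, the attribute names and both helper functions are hypothetical; no such extension is defined by CDISC or by S2O. Missing-value definitions would need a richer structure than single attributes and are left out.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
SPSS_NS = "http://example.org/ns/spss-extension"  # hypothetical vendor namespace
ET.register_namespace("", ODM_NS)
ET.register_namespace("spss", SPSS_NS)


def item_def_with_spss_extras(oid, name, data_type, extras):
    """Build an ODM ItemDef and attach SPSS-only parameters as namespaced attributes."""
    item = ET.Element(f"{{{ODM_NS}}}ItemDef", OID=oid, Name=name, DataType=data_type)
    for attr, value in extras.items():
        item.set(f"{{{SPSS_NS}}}{attr}", str(value))
    return item


def read_spss_extras(item):
    """Recover the SPSS-only parameters, as needed for the reverse conversion."""
    prefix = f"{{{SPSS_NS}}}"
    return {k[len(prefix):]: v for k, v in item.attrib.items() if k.startswith(prefix)}


extras = {"Measure": "scale", "Role": "input", "Alignment": "right"}
xml_text = ET.tostring(item_def_with_spss_extras("AGE", "Age", "integer", extras), encoding="unicode")
print(xml_text)
print(read_spss_extras(ET.fromstring(xml_text)))
```

Because the extra attributes live in their own namespace, a consumer that does not know the extension can ignore them, while a converter that does know it can restore the SPSS settings.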