Description of the Pulmonary Function Test Data Mart
The PFT data mart contained 43,364 PFTs as of 6/26/2012 and has 107 measures for each PFT (Table 1). Ninety-six data elements are discrete numeric measurements. Eleven data elements are non-numeric fields such as name, patient type (inpatient/outpatient), and date. Six columns are keys to other tables in the Medical Enterprise Data Warehouse to facilitate integration with other data marts such as the master patient index, the encounter data mart, and orders. For instance, the encntr_id field links to the main encounter data mart where admit, discharge, and some billing information is stored.
Patients with systemic sclerosis/scleroderma, a rare connective tissue disease that causes skin and internal organ fibrosis especially in the lungs, vascular disease, and autoantibody production, undergo PFTs to screen for the development and progression of lung disease. Scleroderma predominately affects middle-aged women, and lung disease is the leading cause of death . As a result, aggregate PFT data are required for many scleroderma translational research projects and provide a measure of disease progression in patients with scleroderma. In the past, scleroderma research assistants printed PFT reports and manually entered data into data capture tools such as spreadsheets. Because this process was time consuming, ecologically unfriendly and error prone, a new system was developed in conjunction with Medical Enterprise Data Warehouse data architects to generate a PFT data mart within the Warehouse that can readily be queried and maintained through automated processes.
To validate the integrity of data within the PFT data mart, 100 subjects from a cohort >500 patients in the Northwestern Scleroderma Registry were randomly selected. All participants met the American College of Rheumatology criteria for scleroderma and had consented to partake in medical research . One PFT per participant was chosen, with the test performed closest to the date of consent to the registry selected for inclusion in the analysis (12/2004-9/2010). Eleven data elements (forced vital capacity and forced vital capacity% predicted from spirometry tests; total lung capacity, total lung capacity% predicted from lung volume tests; and diffusion capacity for carbon monoxide and diffusion capacity for carbon monoxide% predicted from diffusion tests, as well as medical record number, height, weight, gender, and exam date) were selected for each research participant (Additional file 1: Figure S1 and Table 1). Medical record number, and exam date were required in order to conduct the validation study with manual chart review. Clinically relevant variables included three anthropometric variables (height, weight and gender) that directly influence pulmonary function, as well as six clinically relevant discrete pulmonary function parameters (forced vital capacity, forced vital capacity% predicted; total lung capacity, total lung capacity% predicted, diffusion capacity for carbon monoxide and diffusion capacity for carbon monoxide% predicted.
For the manual chart abstraction, a scleroderma research assistant printed the PFT report and manually entered relevant data into a Research Electronic Data Capture (REDCap) database housed at Northwestern . To ensure an error rate < 5%, the same research assistant reentered the data into the database one week after the initial entry. From the automated data mart, Medical Enterprise Data Warehouse report writers generated a PFT report for the same patient cohort that was securely delivered using SQL Server Reporting Services. Researchers downloaded these data through the Medical Enterprise Data Warehouse intranet web portal in a statistics package friendly format (comma separated values). Statistical analyses to determine the correlation between data obtained using the automated and manual abstraction methods were conducted using STATA version 10.1 (College Station, TX).