Patients
All patients enrolled in the bariatric surgery program of the Center for Nutrition and Weight Management at Geisinger Clinic were offered participation in an ongoing obesity research program, approved by the Geisinger Clinic Institutional Review Board, that uses clinical data accessed through the electronic health record (EHR). For this study, the database included 2028 patients who underwent Roux-en-Y gastric bypass (RYGB) surgery between 01/01/2004 and 07/02/2010. The bariatric surgery program consisted of a 6- to 12-month pre-operative assessment and preparation period that included a diet-induced weight loss target of 10% of body weight. Patients were followed at approximately 1, 3, 5, and 12 months after RYGB surgery and every 12 months thereafter. All clinical data were entered into the EpicCare® EHR (Verona, WI). The EpicCare® EHR integrates information from a variety of sources into a common interoperable database that includes patient demographics, vital signs, clinical measures, the problem list (based on ICD-9 codes), medical history, medication history, personal and family histories, encounters (e.g., office visits, hospitalizations, nurse encounters, telephone inquiries, and specialty consultations), orders (e.g., labs, medications, imaging, and procedures), appointments, digital imaging (e.g., MRI, CT, X-ray, and medical photography), results (e.g., procedure reports, lab results, and pathology reports), and billing and claims databases (detailed financial transactions associated with each clinical encounter).
All data except laboratory results were entered at the point of care; laboratory results were fed directly into the EHR by the laboratory information system. Point-of-care data included age, sex, height, weight, lifestyle factors (e.g., smoking and alcohol use), clinical measures (e.g., blood pressure), all orders (i.e., lab requests, prescriptions, imaging, and procedures), each of which requires at least one indication (i.e., an ICD-9 code), active use of all medications, and all co-morbidities. The schema for data acquisition is shown in Figure 1.
EHR and data warehouse
Data were extracted from the Clinical Decision Intelligence System (CDIS), a comprehensive enterprise-level data warehouse fed in part by the EHR. The EpicCare EHR modules feeding data to CDIS included ambulatory, inpatient, surgery, emergency department, e-prescribing, computerized physician order entry, pharmacy, registration, scheduling, and reporting. CDIS therefore captured data from clinical care provided not only at the Weight Management Clinic but also at other sites within the Geisinger Health System, including over 40 primary care sites and other specialty clinics. Data from other source systems, including financial decision support, insurance claims, patient satisfaction, and high-use third-party reference datasets, were also deposited into CDIS.
CDIS was built on the IBM InfoSphere (DB2®) Warehouse 9 platform. Input data were extracted from several sources, predominantly the EpicCare EHR, transformed by selecting the elements needed for each data type, and then loaded into the warehouse software, overwriting the existing data with the cumulative information. Clarity, an Oracle relational database whose schema consists of several thousand tables, was used to report data out of Epic through an extract-transform-load (ETL) process. Extracts from Clarity consisted of preselected tables and data elements. These selected tables were transferred and loaded into CDIS using the IBM DB2 database software, so the load into CDIS was a data replication step.
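The extract-and-replicate load can be illustrated with a small full-refresh sketch. This is not the CDIS implementation: it uses Python's built-in `sqlite3` module in place of Clarity (Oracle) and DB2, and the table and column names (`encounters`, `patient_id`, `encounter_date`, `weight_lb`) are hypothetical. Each refresh extracts only the preselected columns and replaces the target table wholesale, so the target always holds the latest cumulative snapshot rather than an appended history.

```python
import sqlite3

# Hypothetical extract definition: table -> preselected columns.
# Names are illustrative, not actual Clarity or CDIS table/column names.
SELECTED = {"encounters": ["patient_id", "encounter_date", "weight_lb"]}


def full_refresh(source, target, selected):
    """Replicate preselected columns from `source` into `target`,
    replacing whatever the target held before. Returns rows loaded per table.
    Table/column names come from a trusted config, never from user input."""
    loaded = {}
    for table, columns in selected.items():
        cols = ", ".join(columns)
        # Extract: read only the preselected data elements.
        rows = source.execute(f"SELECT {cols} FROM {table}").fetchall()
        # Load: drop and recreate the target table, then insert the snapshot.
        target.execute(f"DROP TABLE IF EXISTS {table}")
        target.execute(f"CREATE TABLE {table} ({cols})")
        marks = ", ".join("?" for _ in columns)
        target.executemany(f"INSERT INTO {table} ({cols}) VALUES ({marks})", rows)
        loaded[table] = len(rows)
    target.commit()
    return loaded


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE encounters (patient_id, encounter_date, weight_lb, note)")
src.executemany(
    "INSERT INTO encounters VALUES (?, ?, ?, ?)",
    [(1, "2005-01-03", 312.0, "pre-op"), (1, "2005-02-07", 305.5, "pre-op")],
)
dst = sqlite3.connect(":memory:")
print(full_refresh(src, dst, SELECTED))  # {'encounters': 2}
src.execute("INSERT INTO encounters VALUES (1, '2005-08-01', 250.0, 'post-op')")
print(full_refresh(src, dst, SELECTED))  # {'encounters': 3}, not 5
```

A full replace keeps the load logic simple at the cost of re-copying unchanged rows on every refresh.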
An analytics engine built around the IBM Balanced Warehouse® for AIX® technology enabled data mining, text analytics, reporting, and data analysis. In addition to the IBM InfoSphere Warehouse Information Server software and the analytics engine, a reporting tool from IBM Business Partner Business Objects provided a user-friendly interface for query, analysis, and data integration. The data in CDIS were stored in a relational database at the most granular level, allowing an effectively unlimited number of reporting, analysis, and application outputs. A copy of the CDIS data warehouse was stored on a separate server that allowed direct data extraction into SAS through a Microsoft (Redmond, WA) Open Database Connectivity (ODBC) interface. The RYGB research datasets, for example, were refreshed regularly (at least every 3 months, or sooner as needed). CDIS is updated nightly from EpicCare and less frequently (e.g., monthly) from the other source systems.
Data extracted
The extracted data for the RYGB database study include the following:
- Demographics: date of birth, gender, race, death status, and death date (one row per patient)
- Problem list: current and historical list of medical co-morbidities maintained and entered by treating providers (one row per diagnosis code per unit time)
- Outpatient office visit encounters: date of encounter, diagnoses assigned at that encounter, clinic where the encounter took place, and measurements taken at that encounter (e.g., weight, height, blood pressure, pulse, temperature) (one row per encounter)
- Inpatient admissions: admit and discharge dates, diagnosis codes, and name of admitting clinic (one row per admission)
- Medication list: current and historical list of active medications maintained and entered by treating providers (one row per medication per unit time)
- Medication prescription orders: date, name of medication, associated diagnosis for the medication order, and name of the clinic that ordered the medication (one row per medication order)
- Procedures: all inpatient and outpatient medical procedures, including date, procedure code, and name of the clinic that conducted the procedure (one row per procedure)
- Laboratory test results: all laboratory results, including dates, lab type, lab name, and resulted value (one row per laboratory result)
- Social history: historical and current alcohol and smoking history maintained and entered by treating providers (one row for each change in status over time)
- RYGB flowsheets: data collection tools entered for specific encounter types (e.g., RYGB surgical evaluation visit), including but not limited to dietician evaluation status, resting energy expenditure, surgeon, and weight loss goals
- Surveys: item responses to each survey question and date of survey completion
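The row granularity noted for each table determines how it is queried. For a table stored as one row per change in status, such as social history, the status in effect on a given date (for example, smoking status on the surgery date) is the most recent change on or before that date. A minimal Python sketch, assuming a hypothetical row layout of `(patient_id, effective_date, status)`:

```python
from bisect import bisect_right
from datetime import date

# Hypothetical social-history rows: one row per change in smoking status.
SOCIAL_HISTORY = [
    (101, date(2003, 5, 1), "current"),
    (101, date(2006, 2, 14), "former"),
    (102, date(2004, 9, 30), "never"),
]


def status_as_of(rows, patient_id, when):
    """Status in effect on `when`: the latest change dated on or before it.
    Returns None if the patient has no recorded change by that date."""
    changes = sorted((d, s) for pid, d, s in rows if pid == patient_id)
    dates = [d for d, _ in changes]
    i = bisect_right(dates, when)  # number of changes dated <= when
    return changes[i - 1][1] if i else None


print(status_as_of(SOCIAL_HISTORY, 101, date(2005, 7, 1)))  # current
print(status_as_of(SOCIAL_HISTORY, 101, date(2007, 1, 1)))  # former
```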
Most of the data were obtained as coded fields. The exceptions were waist circumference, dietician evaluation status, psychologist evaluation status, resting energy expenditure, surgeon, tobacco and alcohol use at the time of the pre-surgical visit, and weight loss goals, which were recorded as free text but with predefined structures and guidelines that enabled consistent retrieval. The extracted data were stored as SAS dataset files. Supporting data were gathered from departmental tracking databases (including landmark dates such as date of consent, date of initial visit, and date of surgery) and through chart review (e.g., to validate the date and type of surgery).
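Because the free-text fields followed predefined structures, they could be retrieved with simple pattern matching. The sketch below assumes a hypothetical one-`Label: value`-per-line template; the actual note structures and labels used at Geisinger are not given in the text, so the patterns are illustrative only.

```python
import re

# Hypothetical note template: one "Label: value" pair per line.
# Labels and value formats are illustrative; the real structures are not given.
FIELDS = {
    "dietician_status": (
        re.compile(r"^Dietician evaluation:\s*(\w+)", re.I | re.M), str.lower),
    "ree_kcal_per_day": (
        re.compile(r"^Resting energy expenditure:\s*(\d+(?:\.\d+)?)\s*kcal", re.I | re.M), float),
    "weight_loss_goal_pct": (
        re.compile(r"^Weight loss goal:\s*(\d+(?:\.\d+)?)\s*%", re.I | re.M), float),
}


def parse_structured_note(text):
    """Pull each templated field out of a free-text note; None when absent."""
    out = {}
    for name, (pattern, convert) in FIELDS.items():
        match = pattern.search(text)
        out[name] = convert(match.group(1)) if match else None
    return out


note = """Dietician evaluation: Complete
Resting energy expenditure: 1850 kcal/day
"""
print(parse_structured_note(note))
```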
Survey data acquisition
At each new patient visit, the following surveys were obtained: Beck Depression Inventory (BDI) [10], Family Emotional Involvement and Criticism Scale (FEICS) [11], Impact of Weight on Quality of Life-Lite (IWQOL) [12], Weight Loss Readiness Test [13], Sleep Scale for Medical Outcomes [14], Work Limitations [15], and Questionnaire on Eating and Weight Patterns (QEWP) [16]. The BDI and the IWQOL were introduced near the start of the program (i.e., for patients with surgeries since 10/01/2004) and were also administered during the post-operative period. The remaining surveys were administered beginning 01/10/2006, and only in the pre-operative period.
Surveys were self-administered using one of two collection methods. The most common method was a paper survey formatted for electronic scanning with optical character recognition (OCR). These surveys were collected, batched for scanning with Kofax Capture software (Irvine, CA) and OCR processing with Sungard version 5.0 (Birmingham, AL), exported into delimited text files, and stored in SAS (Statistical Analysis System version 9.2, Cary, NC) datasets. The second method was an internet-based EHR patient portal, MyGeisinger, which allowed some patients to complete the surveys online. These online surveys were stored within the EHR and were available in the data warehouse described above. All survey responses were collected, imported into SAS, scored according to validated algorithms (when applicable), and stored in SAS datasets.
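Scoring according to a validated algorithm depends on each instrument's manual. As one illustration, the Beck Depression Inventory has 21 items, each scored 0 to 3, and its total is the item sum. The sketch below reads a delimited export like the one produced by the OCR step and computes that sum; the column names (`patient_id`, `bdi_1` through `bdi_21`), the comma delimiter, and the rule of returning no score when any item is missing or out of range are assumptions, not the study's documented procedure.

```python
import csv
import io

N_ITEMS = 21  # BDI items, each scored 0-3
VALID = {"0", "1", "2", "3"}


def score_bdi(row):
    """Total = sum of the 21 item scores (range 0-63).
    Hypothetical rule: no score if any item is missing or out of range."""
    total = 0
    for k in range(1, N_ITEMS + 1):
        raw = (row.get(f"bdi_{k}") or "").strip()
        if raw not in VALID:
            return None
        total += int(raw)
    return total


def score_export(text, delimiter=","):
    """Score every row of a delimited export, keyed by patient identifier."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row["patient_id"]: score_bdi(row) for row in reader}


header = ",".join(["patient_id"] + [f"bdi_{k}" for k in range(1, N_ITEMS + 1)])
complete = ",".join(["P1"] + ["1"] * N_ITEMS)
missing = ",".join(["P2"] + ["2"] * (N_ITEMS - 1) + [""])
print(score_export("\n".join([header, complete, missing])))  # {'P1': 21, 'P2': None}
```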
Dataset creation and cleaning
Datasets containing all of the clinical and survey data were created in SAS format, each containing a patient identifier used to link patient data across sources. SAS was used to manipulate, clean, and merge the data for descriptive summaries and complex statistical analyses. Data cleaning algorithms were created to identify obvious errors and implausible values; the algorithm used for weight and BMI is described below as an example.
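Linking on a shared patient identifier can be sketched as grouping each source's rows under that identifier, which also makes it easy to spot patients missing from one source. This Python version stands in for the SAS merge; the source names and the `patient_id` key are hypothetical.

```python
from collections import defaultdict


def link_by_patient(**sources):
    """Group rows from several sources under a shared patient identifier.

    Each keyword argument is a list of dicts carrying a 'patient_id' key.
    Returns {patient_id: {source_name: [rows...]}} with an empty list for
    any source in which the patient does not appear.
    """
    linked = defaultdict(lambda: {name: [] for name in sources})
    for name, rows in sources.items():
        for row in rows:
            linked[row["patient_id"]][name].append(row)
    return dict(linked)


# Hypothetical example rows.
weights = [
    {"patient_id": 1, "day": -30, "weight_lb": 310.0},
    {"patient_id": 1, "day": 90, "weight_lb": 255.0},
    {"patient_id": 2, "day": -15, "weight_lb": 402.5},
]
surveys = [{"patient_id": 1, "instrument": "BDI", "score": 12}]

linked = link_by_patient(weights=weights, surveys=surveys)
no_survey = sorted(pid for pid, data in linked.items() if not data["surveys"])
print(no_survey)  # [2]
```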
Clinically implausible values were identified and flagged for removal (e.g., weight <50 lbs or >1000 lbs). To identify other obvious data entry errors, i.e., values that fell within clinically acceptable ranges but were likely erroneous when compared with the patient's other weight measures, a series of simple linear regression models was fitted, one for each period (pre-surgery, 0–1 year after RYGB surgery, and 1+ years after RYGB surgery). Because each patient had a large number of weight measures, the linear models were run independently for each subject, and the distribution of residuals across all subjects was evaluated. Weight measures with residuals in the extreme tails of that distribution, i.e., beyond ±5 standard deviations, were flagged for possible exclusion. The flagged values were then evaluated manually, and data entry errors were removed as indicated. BMI was calculated for all weight measures using the height recorded at the initial Weight Management Clinic visit. The data were checked for clinically implausible BMI values (e.g., BMI <15 kg/m² or >100 kg/m²), but none remained after the initial weight cleaning algorithm.
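The weight-cleaning steps can be sketched in Python as follows. The range limits (50 and 1000 lbs) and the ±5 SD cut-off come from the text; the period boundaries (day 0 and day 365 after surgery), the minimum of three values per fitted segment, and the assumption that height is recorded in inches are illustrative choices. Flagged indices are returned for manual review rather than deleted, mirroring the manual evaluation step.

```python
from collections import defaultdict
from statistics import mean, stdev

LB_TO_KG = 0.45359237
IN_TO_M = 0.0254


def period(day):
    """Assign a measurement (days from surgery) to one of the three windows.
    The exact boundaries (day 0 and day 365) are assumptions."""
    if day < 0:
        return "pre"
    return "0-1y" if day <= 365 else "1y+"


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def flag_weights(records, lo=50.0, hi=1000.0, k=5.0, min_points=3):
    """records: list of (patient_id, day_from_surgery, weight_lb).
    Returns {index: reason} for values to review manually."""
    flags = {}
    segments = defaultdict(list)
    for i, (pid, day, w) in enumerate(records):
        if w < lo or w > hi:
            flags[i] = "implausible"  # excluded from the regressions
        else:
            segments[(pid, period(day))].append(i)

    # One simple linear regression per subject per period.
    residuals = {}
    for idx in segments.values():
        if len(idx) < min_points:
            continue  # too few values to judge (assumed cut-off)
        xs = [records[i][1] for i in idx]
        ys = [records[i][2] for i in idx]
        a, b = fit_line(xs, ys)
        for i, x, y in zip(idx, xs, ys):
            residuals[i] = y - (a + b * x)

    # Pool residuals across all subjects; flag the extreme tails.
    values = list(residuals.values())
    if len(values) >= 2:
        m, sd = mean(values), stdev(values)
        for i, r in residuals.items():
            if abs(r - m) > k * sd:
                flags[i] = "residual"
    return flags


def bmi(weight_lb, height_in):
    """BMI in kg/m^2 from pounds and (assumed) inches."""
    return weight_lb * LB_TO_KG / (height_in * IN_TO_M) ** 2
```

Residuals are pooled across all subjects before the standard deviation is taken, so a single patient's erratic series does not set its own threshold.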
Statistical analysis
A descriptive summary was completed using means, standard deviations, and percentages, as appropriate. Kaplan-Meier curves were used to evaluate time until loss to follow-up. Patients were defined as lost to follow-up when they had a 24-month period without a weight measurement in the EHR. The date of surgery was used as the anchor point (time zero) for the Kaplan-Meier analysis. For patients still in active follow-up, the censoring time was the date of the last BMI measurement. The Kaplan-Meier curves were stratified by initial BMI (35.0–39.9, 40.0–49.9, and 50.0+ kg/m²). The log-rank test was used to evaluate whether length of follow-up was associated with initial BMI, year of surgery, surgeon, or surgical approach (open versus laparoscopic).
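The follow-up definition and the Kaplan-Meier estimate can be sketched as follows. The 24-month gap is approximated as 730 days, and the event time for a lost patient is taken, as an assumption, to be the last measurement before the gap (the text does not say which date marks the loss). The product-limit estimator itself is standard; stratification is done by running it separately within each initial-BMI group. The log-rank test is not shown.

```python
from collections import Counter

GAP_DAYS = 730  # 24 months approximated as 730 days (assumption)


def follow_up_outcome(measure_days, cutoff_day, gap=GAP_DAYS):
    """Return (time, event) for one patient, with time in days from surgery.

    measure_days: days from surgery of each weight/BMI measurement.
    cutoff_day: days from surgery to the data extraction date.
    event = 1 means lost to follow-up; event = 0 means censored.
    """
    last = 0  # the surgery date is time zero
    for day in sorted(d for d in measure_days if d >= 0):
        if day - last >= gap:
            return last, 1  # assumed: loss dated at the last contact before the gap
        last = day
    if cutoff_day - last >= gap:
        return last, 1
    return last, 0  # still in active follow-up: censored at last measurement


def bmi_stratum(initial_bmi):
    """Initial-BMI strata used in the text; None for BMI below 35.0."""
    if initial_bmi >= 50.0:
        return "50.0+"
    if initial_bmi >= 40.0:
        return "40.0-49.9"
    if initial_bmi >= 35.0:
        return "35.0-39.9"
    return None


def kaplan_meier(data):
    """Product-limit estimate from (time, event) pairs.

    Returns [(t, S(t))] at each distinct event time. Subjects censored at t
    still count as at risk at t, the usual convention.
    """
    losses = Counter(t for t, e in data if e)
    exits = Counter(t for t, _ in data)
    at_risk, surv, curve = len(data), 1.0, []
    for t in sorted(exits):
        if losses[t]:
            surv *= 1 - losses[t] / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve


# Hypothetical patients: (initial BMI, measurement days, days to data cutoff)
patients = [
    (45.2, [30, 90, 150, 365, 730], 1000),
    (47.8, [30, 90, 365], 2000),
    (41.0, [30, 1200], 1300),
]
outcomes = [follow_up_outcome(days, cutoff) for _, days, cutoff in patients]
print(outcomes)  # [(730, 0), (365, 1), (30, 1)]
strata = {}
for (initial_bmi, _, _), outcome in zip(patients, outcomes):
    strata.setdefault(bmi_stratum(initial_bmi), []).append(outcome)
print(kaplan_meier(strata["40.0-49.9"]))
```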