Table 3 Critical appraisal of the included study populations and discussion of contextual data quality issues in patient characteristics (see orange boxes, Fig. 1)

From: Feasibility analysis of conducting observational studies with the electronic health record

DPS study
Issues in the study population
To ensure the condition had been treated during the considered encounter, we only included subjects with a primary diagnosis of Dupuytren disease. The sensitivity of this approach may be limited insofar as some patients with another primary diagnosis also might have been eligible.
Patient characteristics | DQ violation | Issue
Recurrence | Granularity | A label for recurrence of Dupuytren’s contracture was not available. We therefore implemented a computed phenotype that treats a Dupuytren diagnosis as a recurrence when it is documented during a subsequent encounter. This phenotype may be biased in certain circumstances, as neither the diagnosis codes nor the procedure codes specified the affected hand. Accordingly, we could not differentiate between a recurrence and new involvement of the other hand. Further, the patient could have already undergone surgery in a different hospital, so that an apparent first encounter may in fact represent a recurrence.
Operation Technique | Completeness, Granularity | The procedure codes that correspond to the operation techniques described in the original study were infrequently used in our EHR, which instead employed different procedure codes; this suggests that documentation habits may have affected frequency estimates. We were unable to clearly ascertain which procedure codes represented treatment of conditions that had been documented via simultaneous diagnostic codes.
Simultaneous | Granularity | We computed the distribution of simultaneous surgeries by counting the secondary diagnoses of carpal tunnel and trigger finger. The corresponding procedure codes described surgical techniques that could be applied to treat several different hand conditions, so that we could not determine whether these diagnoses had been treated during the encounter.
Complications | Currency, Granularity | Counting complications would require interpreting plausible temporal and causal relationships, which we were not always able to infer from the observable codes. When a subject had received more than one intervention during an encounter, for example, it was difficult to determine which of the corresponding clinical events happened first and whether one caused the other.
Lifestyle | Completeness | Information on lifestyle features such as alcohol consumption and smoking was infrequently coded in our EHR. The diagnostic codes did not clarify whether, for example, a patient had started or stopped smoking. We noticed an inconsistent use of these codes across patient encounters, which can be interpreted either as a changing diagnostic status or as incompleteness (see Additional file 1, p. 15).
Affected digits | Completeness | No diagnostic code was available that specified the number of affected digits. To work around this problem, we computed a phenotype from procedure codes that specified the number of digits operated on. The phenotype provided incomplete numbers, which may point to low utilization of the included procedure codes in our EHR.
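The recurrence phenotype described in the Recurrence row above can be illustrated with a minimal sketch. It assumes encounters are available as (patient ID, encounter date, primary diagnosis code) tuples and that Dupuytren disease is coded as ICD-10 M72.0; both the data layout and the function name `flag_recurrences` are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from datetime import date

# Assumption: Dupuytren disease is coded as ICD-10 M72.0 in the source data.
DUPUYTREN_CODE = "M72.0"


def flag_recurrences(encounters):
    """Flag every Dupuytren encounter after a patient's first as a recurrence.

    encounters: iterable of (patient_id, encounter_date, primary_diagnosis_code).
    Returns a dict mapping patient_id to a date-sorted list of
    (encounter_date, is_recurrence) tuples.
    """
    dates_by_patient = defaultdict(list)
    for patient_id, encounter_date, code in encounters:
        # Only encounters with Dupuytren as the primary diagnosis are considered.
        if code == DUPUYTREN_CODE:
            dates_by_patient[patient_id].append(encounter_date)

    flagged = {}
    for patient_id, dates in dates_by_patient.items():
        dates.sort()
        # The earliest encounter counts as the index case, all later ones as
        # recurrences. The codes do not specify the hand, so a later encounter
        # for the other hand is misclassified as a recurrence.
        flagged[patient_id] = [(d, i > 0) for i, d in enumerate(dates)]
    return flagged
```

The sketch reproduces the limitations noted in the table: it cannot distinguish a recurrence from involvement of the other hand, and it treats the first encounter in this hospital as the index case even if the patient was operated on elsewhere before.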
DG study
Issues in the study population
Eligibility criteria | DQ violation | Issue
Esophageal stenosis | Completeness | We noticed inconsistent coding of esophageal stenosis. Adding the diagnostic code for esophageal stenosis to the eligibility criteria, as described in the publication, would decrease the study population size.
Patient characteristics | DQ violation | Issue
Indication | Granularity | In some instances, we were unable to determine and count the indications for bougienage or dilatation, as the etiology of the treated esophageal stenosis was occasionally not well reflected in the codes. Some patients had multiple diagnoses that could be considered a potential indication. According to free-text notes, some subjects developed a postsurgical or postradiation esophageal stenosis after tumor treatment, but had only a tumor diagnosis coded.
Perforation: Occurrence, clinic, management | Currency | The computed number of intervention-associated perforations might be subject to bias, insofar as we could not retrace whether the endoscopic procedure carried out on the esophagus preceded the complication, or whether the coded perforation was a remnant of earlier medical history. Further, we could not determine the subsequent management of perforations and the associated clinical picture, as the diagnostic codes and their time stamps did not permit us to infer causal chains and temporal sequences.
Death | Granularity | We had data on inpatient death dates. Information on the cause of death was not available.
DRO study
Issues in the study population
We filtered for patients who received exactly the same chemotherapy protocol, with administration dates varying by a maximum of ±2 days, which excludes subjects with a delayed administration or with additional chemotherapy after the treatment. Further, since the disease stage and tumor histology type were not reflected in the codes, we could not tell whether the radiochemotherapy in our recruited study population had been administered for the primary tumor, for distant metastasis, or for a different tumor.
Eligibility criteria | DQ violation | Issue
Locally advanced pancreatic cancer | Granularity | No specific diagnostic code was available for this tumor entity.
Radiotherapy | Granularity | No specific procedure code was available for the radiotherapy protocol.
Chemotherapy | Granularity | No specific procedure code was available for the chemotherapy protocol.
Patient characteristics | DQ violation | Issue
Tumor Location | Correctness | We computed the distribution of tumor location from the coded tumor location of the first encounter. For some subjects, the tumor location remained unspecified at this point. Further, we noticed a changing tumor location in subsequent encounters (see Additional file 1, p. 17).
Diagnosed by operation/biopsy | Completeness | Only patient data from the university hospital in Erlangen were available for our analysis. Consequently, operations or biopsies performed at other hospitals could not be included in our distribution computation.
Death | Completeness | We only had data on death dates of patients who had died in the hospital. Information on the cause of death was not available.
Radiochemotherapy: Cause of adjustment | Granularity | Determining the cause of a radiochemotherapy adjustment required interpreting temporal and causal relationships, which could not be inferred from the codes.
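The protocol filter described for the DRO study population (same chemotherapy protocol, administrations varying by at most 2 days) can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the function name `matches_protocol` and the representation of a protocol as planned day offsets from the treatment start are hypothetical, and the offsets in the usage example are placeholders, not the actual protocol.

```python
from datetime import date

# From the text: administration dates may vary by a maximum of 2 days.
TOLERANCE_DAYS = 2


def matches_protocol(start, administrations, planned_offsets,
                     tolerance=TOLERANCE_DAYS):
    """Check whether observed administrations follow a planned schedule.

    start: date of the first planned administration.
    administrations: list of dates on which chemotherapy was coded.
    planned_offsets: hypothetical planned days relative to start.
    """
    # Additional or missing administrations exclude the subject.
    if len(administrations) != len(planned_offsets):
        return False
    observed = sorted((d - start).days for d in administrations)
    planned = sorted(planned_offsets)
    # Every administration must fall within the tolerance of its planned day.
    return all(abs(o - p) <= tolerance for o, p in zip(observed, planned))
```

For example, with placeholder offsets `[0, 7, 14]`, a patient treated on days 0, 8 and 13 is kept, whereas a patient whose third dose is delayed to day 18 or who receives a fourth dose is excluded, which mirrors the exclusions described in the text.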
  1. We evaluated “correctness” if a corresponding gold standard data source was available, and “completeness” if the corresponding patient characteristic distribution had been calculated. DQ = data quality.