Table 1 Variation Assessment Table for Data Abstraction

From: Assessment of the impact of EHR heterogeneity for clinical research through a case study of silent brain infarction

| Variation Type | Definition | Potential Implication | Example of Assessment Method |
| --- | --- | --- | --- |
| Institutional variation | Variation in practice patterns, outcomes, and patient sociodemographic characteristics | Inconsistent phenotype definition; unbalanced concept distribution | • Compare clinical guidelines, protocols, and definitions<br>• Calculate the number of eligible patients divided by the size of the screening population<br>• Calculate the ratio of the proportion of persons with the disease to the proportion with the exposure |
| EHR system variation | Variation in data type and format caused by different EHR system infrastructure | Inconsistent data type; different data collection processes | • Compare data type, document structure, and metadata<br>• Conduct a semi-structured interview to obtain information about the context of use |
| Documentation variation | Variation in reporting schemes during the processes of generating clinical narratives | Noisy data | • Compare the cosine similarity between two documents represented by vectors<br>• Conduct a sub-language analysis to assess syntactic variation |
| Process variation | Variation in data collection and corpus annotation process | Poor data reliability, validity, and reproducibility | • Calculate the degree of agreement among abstractors<br>• Conduct a semi-structured interview to obtain information about the context of use |
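
Three of the quantitative assessment methods in the table can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the table does not specify: documents are represented as bag-of-words count vectors with whitespace tokenization, agreement is measured with Cohen's kappa for exactly two abstractors, and the function names (`screening_yield`, `cosine_similarity`, `cohen_kappa`) are hypothetical.

```python
import math
from collections import Counter


def screening_yield(n_eligible: int, n_screened: int) -> float:
    """Number of eligible patients divided by the screening population."""
    if n_screened <= 0:
        raise ValueError("screening population must be positive")
    return n_eligible / n_screened


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of two documents as bag-of-words count vectors.

    Assumption: lowercase whitespace tokenization; the source does not
    say which vector representation was used.
    """
    vec_a = Counter(doc_a.lower().split())
    vec_b = Counter(doc_b.lower().split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)


def cohen_kappa(labels_a: list, labels_b: list) -> float:
    """Chance-corrected agreement between two abstractors.

    Assumption: Cohen's kappa is used here as one common agreement
    measure; the source only says "degree of agreement".
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each abstractor's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both abstractors used a single identical label
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print(screening_yield(25, 100))
    print(cosine_similarity("acute infarct noted", "no acute infarct"))
    print(cohen_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"]))
```

In practice, the cosine comparison would typically use a weighted representation such as TF-IDF, and agreement among more than two abstractors would call for a multi-rater statistic such as Fleiss' kappa; the sketch keeps to the simplest case of each.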