Table 1 Counts of vagueness and under-specification in narrative phenotype algorithms

From: Under-specification as the source of ambiguity and vagueness in narrative phenotype algorithm definitions

| Code | Category | Sub-category | Description | Total instances | Phenotype count (%) |
|---|---|---|---|---|---|
| 1.1 | Definition of variable | Attributes of variable | Under-specification of a variable's attributes (e.g., min, max) | 47 | 13 (68.4%) |
| 1.1.1.a | Time point | Temporal entity | Under-specification of the time anchor or point of reference for a given criterion | 22 | 11 (57.9%) |
| 1.1.1.b | Time point | Temporal interval | Under-specification of the time window in which to search for a given criterion (diagnosis, medication, lab, etc.) | 6 | 5 (26.3%) |
| 1.1.2.a | Threshold | Missing threshold | Vagueness or under-specification of the threshold for a criterion in the phenotype algorithm | 2 | 2 (10.5%) |
| 1.1.2.b | Threshold | Quantifying qualitative terms | Vagueness or under-specification in a qualitative term describing a criterion (e.g., chronic, young, old, severe, negative, positive) that lacks quantitative values | 1 | 1 (5.3%) |
| 1.1.2.c | Threshold | Units | The units associated with a numeric value (e.g., mg/dL) are not specified | 2 | 1 (5.3%) |
| 1.2 | Definition of variable | Alternatives to missing data | Request for instructions when data elements are not available | 6 | 5 (26.3%) |
| 1.3 | Definition of variable | Code/acronym/term definition | Under-specification of acronyms, variables, or codes, related to: (1) local and unique codes; (2) the coding/terminology system (including use of base codes); (3) vague terminology/codes | 28 | 11 (57.9%) |
| 1.4 | Definition of variable | Location in EHR | Under-specification of how or where certain criteria/variables should be obtained within the EHR | 10 | 6 (31.6%) |
| 2.1 | Data dictionary | Data delivery | Under-specification of how the data dictionaries should be structured and how they should be delivered to sites | 3 | 2 (10.5%) |
| 2.2 | Data dictionary | Information inclusion | Under-specification of which results should be included in the data dictionary | 31 | 10 (52.6%) |
| 2.3 | Data dictionary | Results presentation and formatting | Under-specification of how results in the data dictionary should be formatted, including numeric formatting (e.g., number of decimal places) or granularity of units (e.g., date of birth vs. age) | 27 | 8 (42.1%) |
| 3.1 | Logic | Discordant logic | Discrepancy between the written description and the flowchart, or the procedures it contains | 17 | 8 (42.1%) |
| 3.2 | Logic | Missing rationale or context | Under-specification of the rationale and/or context needed to apply the phenotype appropriately | 11 | 8 (42.1%) |
| 3.3 | Logic | Population criteria | Vagueness and under-specification in the criteria that differentiate cases from controls or from other cohort definitions | 20 | 11 (57.9%) |

1. A total of 304 instances were found across 253 comments (a single comment could exhibit more than one category). Sub-codes are more specific and are counted separately from their higher-level codes. Total instances denote the aggregate count of unique instances of under-specification found across all phenotypes.
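The difference between the two count columns can be illustrated with a minimal Python sketch. The annotation records, phenotype IDs (PA, PB, PC), and the `tally` function below are hypothetical, not the authors' tooling. Following the note, every (comment, code) pair counts as one instance, duplicates are counted once, and sub-codes are tallied on their own rather than rolled up into their parent code. The denominator of 19 phenotypes is an assumption inferred from the percentages (e.g., 13/19 ≈ 68.4%); the table does not state it directly.

```python
from collections import defaultdict

# Hypothetical annotation records: (phenotype_id, comment_id, code).
# One comment may carry several codes, so it can yield several instances.
annotations = [
    ("PA", 1, "1.1.1.a"),
    ("PA", 1, "1.3"),
    ("PA", 2, "1.1.1.a"),
    ("PA", 2, "1.1.1.a"),  # duplicate annotation, counted only once
    ("PB", 3, "1.1.1.a"),
    ("PB", 3, "2.2"),
    ("PC", 4, "3.1"),
]


def tally(records, n_phenotypes):
    """Return {code: (total_instances, phenotype_count, percent_of_phenotypes)}.

    Codes are matched exactly, so a sub-code such as 1.1.1.a is never
    added to its parent code 1.1.
    """
    instances = defaultdict(int)
    phenotypes = defaultdict(set)
    for phenotype, _comment, code in set(records):  # keep unique instances only
        instances[code] += 1
        phenotypes[code].add(phenotype)  # a phenotype counts once per code
    return {
        code: (
            instances[code],
            len(phenotypes[code]),
            round(100 * len(phenotypes[code]) / n_phenotypes, 1),
        )
        for code in instances
    }


for code, row in sorted(tally(annotations, n_phenotypes=3).items()):
    print(code, row)

# Check the assumed denominator against the first row of the table.
print(round(100 * 13 / 19, 1))  # 68.4
```

In this toy data, code 1.1.1.a has three instances but appears in only two of the three phenotypes, which mirrors why "Total instances" can exceed "Phenotype count" in the table.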