
Table 4 An example of two previous mIDA case studies annotated using the ATTEST checklist

From: An ontology-based documentation of data discovery and integration process in cancer outcomes research

 

Item No.

Recommendation

Page No., Study (1) [20]

Page No., Study (2) [29]

Objectives

 

 Background/rationale

1

Explain the scientific background and rationale for the study being reported in one or two sentences

Page 1, section “Abstract”, paragraph 1, line 1–7

Page 1, section “Abstract”, paragraph 1, line 1–4

 Prespecified hypotheses

2

State prespecified hypotheses in one or two sentences

Page 2, section “Introduction”, paragraph 3, line 1–2

N/A

Study design: data source selection, variable selection, and data integration

 

 Data source

3a

Describe the time coverage

FCDS: Page 2, section “Data source and case selection”, paragraph 1, line 2

FCDS: Page 4, section “Data sources”, paragraph 1, line 11

BRFSS: Page 2, section “Data source and case selection”, paragraph 1, line 6

BRFSS: N/A

2000 U.S. census data: Page 2, section “Data source and case selection”, paragraph 1, line 7

United States Census Bureau: Page 4, section “Data sources”, paragraph 1, line 23

 

ATSDR: N/A

County Health Ranking & Roadmaps: N/A

3b

Describe the geographic coverage

FCDS: Page 2, section “Data source and case selection”, paragraph 1, line 4–5

FCDS: Page 4, section “Data sources”, paragraph 1, line 12–14

BRFSS: N/A

BRFSS: Page 10, section “Result”, paragraph 2, line 7–8

2000 U.S. census data: N/A

United States Census Bureau: N/A

 

ATSDR: N/A

County Health Ranking & Roadmaps: N/A

3c

Describe the sample size

FCDS: Page 2, section “Data source and case selection”, paragraph 2, line 7

FCDS: Page 4, section “Data sources”, paragraph 2, line 6–7

BRFSS: N/A

BRFSS: N/A

2000 U.S. census data: N/A

United States Census Bureau: N/A

 

ATSDR: N/A

County Health Ranking & Roadmaps: N/A

3d

Describe the demographic distribution

FCDS: Page 2, Table 1

N/A

BRFSS: N/A

2000 U.S. census data: N/A

3e

Describe the cohort criteria

FCDS: Page 2, section “Data source and case selection”, paragraph 2, line 1–5

FCDS: Page 4, section “Data sources”, paragraph 2, line 1–6

BRFSS: N/A

BRFSS: N/A

2000 U.S. census data: N/A

United States Census Bureau: N/A

 

ATSDR: N/A

County Health Ranking & Roadmaps: N/A

3f

Describe the sources of bias

N/A

N/A

3g

Describe the data collection approach

N/A

FCDS: N/A

BRFSS: Page 4, section “Data sources”, paragraph 2, line 6–7

United States Census Bureau: N/A

ATSDR: N/A

County Health Ranking & Roadmaps: N/A

 Dependent variable

4a

State the variable definition and variable type (e.g., primary outcome variable, secondary outcome variable)

Survival time: Page 2, section “Variable definitions”, line 1–3

Cancer survival: Page 4, section “Data integration use case: The multi-level integrative data analysis of Cancer survival”, paragraph 1, line 1–2

4b

State the data source of the dependent variable

Survival time: Page 2, section “Data source and case selection”, paragraph 1, line 2

Cancer survival: Page 4, section “Data sources”, paragraph 1, line 9–14

4c

State the data type (e.g., numerical, categorical, date-time) of the dependent variable

Survival time: Page 2, section “Variable definitions”, paragraph 1, line 1

Cancer survival: N/A

4d

State descriptive statistics (e.g., min, max, median, value range, percentile) of the dependent variable

Survival time: Page 4, Table 1

Cancer survival: N/A

4e

State the NIMHD domain and levels of the dependent variable

Survival time: Page 2, section “Data source and case selection”, paragraph 1, line 1–2

Cancer survival: Page 4, section “Data sources”, paragraph 2, line 15

 Independent variable

5a

State the variable definition and variable type (e.g., primary predictor, secondary predictor)

Socioeconomic status: Page 2, section “Variable definitions”, paragraph 3, line 1–2

Demographic variables: Page 5, Table 1

Individual smoking: Page 2, section “Data source and case selection”, paragraph 2, line 1–2

Smoking status: Page 10, section “The ontology for Cancer research variables (OCRV)”, paragraph 2, line 13–27

Regional smoking: Page 3, section “Data source and case selection”, paragraph 2, line 4–6

Marital status: Page 14, section “Type 4: Queries that generate results based on the knowledge encoded in ontology”, paragraph 2, line 7–10

 

Insurance payer: Page 5, Table 1

Residency: Page 5, Table 1

Age at diagnosis: Page 5, Table 1

Year of diagnosis: Page 5, Table 1

Tumor stage: Page 5, Table 1

Tumor type: Page 5, Table 1

Treatment procedure: Page 5, Table 1

Census Tract SVI: Page 14, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, line 5–16

Census tract high school completion rates: Page 5, Table 1

Census tract family poverty rates: Page 5, Table 1

Census tract rurality status: Page 4, section “Data integration use case: The multi-level integrative data analysis of Cancer survival”, paragraph 1, line 8–11

County adult mental and physical health status: Page 5, Table 1

County density of primary care physicians: Page 5, Table 1

County smoking rate: Page 10, section “The ontology for Cancer research variables (OCRV)”, paragraph 2

County alcohol consumption rate: Page 5, Table 1

5b

State the data type (e.g., numerical, categorical) of the independent variable

Socioeconomic status: Page 2, section “Variable definitions”, paragraph 3, line 9–10

Demographic variables: N/A

Individual smoking: Page 2, section “Data source and case selection”, paragraph 2, line 2–3

Smoking status: Page 13, Table 3

Regional smoking: Page 3, section “Data source and case selection”, paragraph 2, line 4–6

Marital status: Page 14, section “Type 4: Queries that generate results based on the knowledge encoded in ontology”, paragraph 2, line 7–10

 

Insurance payer: N/A

Residency: N/A

Age at diagnosis: Page 16, Fig. 6

Year of diagnosis: Page 16, Fig. 6

Tumor stage: N/A

Tumor type: Page 4, section “Data sources”, paragraph 2, line 1–6

Treatment procedure: Page 5, Table 1

Census Tract SVI: Page 14, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, line 5–16

Census tract high school completion rates: N/A

Census tract family poverty rates: N/A

Census tract rurality status: N/A

County adult mental and physical health status: N/A

County density of primary care physicians: N/A

County smoking rate: Page 10, section “The ontology for Cancer research variables (OCRV)”, paragraph 2

County alcohol consumption rate: N/A

5c

State the data source of the independent variable

Socioeconomic status: Page 2, section “Data source and case selection”, paragraph 1, line 6–7

Page 5, Table 1

Individual smoking: Page 2, section “Data source and case selection”, paragraph 1, line 1–2

Regional smoking: Page 2, section “Data source and case selection”, paragraph 1, line 7–10

5d

State descriptive statistics (e.g., min, max, median, value range, percentile) of the independent variable

Page 4, Table 1

N/A

5e

State the NIMHD domain and levels of the independent variable

Socioeconomic status: Page 2, section “Data source and case selection”, paragraph 1, line 6

Page 5, Table 1

Individual smoking: Page 2, section “Data source and case selection”, paragraph 2, line 1

Regional smoking: Page 3, section “Data source and case selection”, paragraph 2, line 4–6

 Controlled variable

6a

State the controlled variable and its variable type (e.g., numerical, categorical)

Age of diagnosis: Page 2, section “Variable definitions”, paragraph 1, line 10–13

N/A

Anatomic site: Page 2, section “Variable definitions”, paragraph 1, line 2–9

Race-ethnicity: Page 4, Table 1

Marital status: Page 4, Table 1

Insurance: Page 4, Table 1

Year of diagnosis: Page 4, Table 1

Gender: Page 4, Table 1

Stage of diagnosis: Page 4, Table 1

Treatment: Page 4, Table 1

6b

State the data source of the controlled variable

Page 2, section “Data source and case selection”, paragraph 1, line 2ᵃ

N/A

6c

State descriptive statistics (e.g., min, max, median, value range, percentile) of the controlled variable

Page 2, section “Data source and case selection”, paragraph 1, line 2ᵃ

N/A

6d

State the NIMHD domain and levels of the controlled variable

Page 2, section “Data source and case selection”, paragraph 1, line 1–5ᵃ

N/A

 Missing data

7a

For each data source, describe whether any required or expected variables are not present

N/A

N/A

7b

For each variable, describe the method used to handle missing data

N/A

N/A

7c

For each variable, describe the missing rate

N/A

N/A

 Data processing

9a

Data extraction: for each variable, describe how to process the raw data source to extract the variable

N/A

Demographic variables: Page 15, Fig. 5

Age at diagnosis: Page 16, Fig. 6

Census Tract SVI: Page 16, Fig. 7

County smoking rate: Page 17, Fig. 8

Marital status: Page 18, Fig. 9

9b

Data cleaning: for each variable, describe the method used to detect and correct (or remove) the incorrect records, missing values or outliers

N/A

N/A

 Integration strategy

10

Describe the integration strategy for each variable: (1) integration with variables from the same level, (2) integration with variables from different levels, and (3) creation of additional computed elements

Socioeconomic status: Page 2, section “Variable definitions”, paragraph 3, line 6–7

Demographic variables: Page 15, Fig. 5

Regional smoking: Page 2, section “Variable definitions”, paragraph 2, line 4–5

Age at diagnosis: Page 16, Fig. 6

 

Census Tract SVI: Page 16, Fig. 7

County smoking rate: Page 17, Fig. 8

Marital status: Page 18, Fig. 9

Census tract high school completion rates: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, line 1–3

Census tract family poverty rates: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, line 1–3

Census tract rurality status: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, line 1–3

County adult mental and physical health status: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, line 1–3

County density of primary care physicians: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, line 1–3

County alcohol consumption rate: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, line 1–3

 Integration algorithms

11

For each variable, describe the algorithm used to integrate it with variables from other data sources

N/A

Demographic variables: Page 15, Fig. 5

Age at diagnosis: Page 16, Fig. 6

Census Tract SVI: Page 16, Fig. 7

County smoking rate: Page 17, Fig. 8

Marital status: Page 18, Fig. 9

Census tract high school completion rates: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, line 1–3

Census tract family poverty rates: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, line 1–3

Census tract rurality status: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, line 1–3

County adult mental and physical health status: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, line 1–3

County density of primary care physicians: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, line 1–3

County alcohol consumption rate: Page 15, section “Type 3: Queries that are used to link a patient to contextual factors through geographic variables”, paragraph 1, line 1–3

 Variable validation

12

For each variable, describe the data validation rule for the selected variable. The rule should identify both the variable and the validation algorithm

N/A

Demographic variables: Page 19, section “Data quality and consistency checks of the source data using the ontology”

 Integrated variable

13

Describe the variable after integration and its basic descriptive statistics (e.g., min, max, median, value range, percentile)

N/A

Page 18, Table 4

  1. FCDS Florida Cancer Data System
  2. ATSDR Agency for Toxic Substances and Disease Registry
  3. BRFSS Behavioral Risk Factor Surveillance System
  4. ᵃIf the reported items for all variables or data sources are described in the same place, the page/section/table information can be listed once. For the integration-related items, only variables that have this information are presented (variables with N/A are not shown in the table)