
Table 2 Overview of the five methods for handling missing data

From: Dealing with missing data in laboratory test results used as a baseline covariate: results of multi-hospital cohort studies utilizing a database system contributing to MID-NET® in Japan

Brief description

Limitation/Concern

Setting in this study

Excluded the laboratory test item from the baseline covariates (Exclusion)

 

• Included as a supplementary evaluation target.

Included only patients with laboratory test results (CC method)

• The method can yield valid results only if the missingness is unrelated to the outcome.

• The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement [24] recommends that, when MI or IPW is applied, any major differences from the results of the CC method be discussed.

 

• Included as a supplementary evaluation target.

Accounted for missing laboratory test results using SRI

• The method specifies an imputation model for the distribution of the missing values given the observed values and replaces each missing value with the value predicted by that model.

• Each missing value is imputed only once, so the uncertainty caused by the missing data is not taken into account.

• Imputation model: linear regression modelᵃ

• The laboratory values to be imputed were log-transformed.

Accounted for missing laboratory test results using MI

• After an imputation model is specified, imputation is repeated m times using random draws, producing m imputed datasets. The outcome model is fitted to each of the m datasets, and the m estimates are combined using Rubin’s rules or a similar method.

• The algorithm is complex, but it is implemented in widely available statistical software such as SAS.

• Unlike single regression imputation, it accounts for the uncertainty between imputations.

• The imputation model should include not only the predictors of missing values but also the covariates of the outcome model.

• The missing values are assumed to follow a multivariate normal distribution, although the method has been reported to be robust when this assumption does not hold [15].

• Imputation methods include regression imputation and predictive mean matching.

• Because it is readily available in existing software, it may be applied without sufficient consideration of which variables to include in the model.

• Imputation model: linear regression modelᵃ

• The laboratory values to be imputed were log-transformed.

• Imputation method: predictive mean matchingᵇ

• Number of imputations: 10ᶜ

• Combination method: Rubin’s rules

Accounted for missing laboratory test results using the IPW method

• The method uses estimating equations weighted by the inverse of the probability that the data are observed.

• A model for the probability of being observed, given the observed values, is specified; each complete case is then weighted by the inverse of its estimated probability of being observed to estimate the outcome model parameters.

• A logistic regression model is generally used to model the probability of observing the data.

• Estimates of the outcome model parameters become unstable when some patients have extremely large weights.

• Model for the probability of observing the data: logistic regression modelᵃ,ᵈ

  1. Abbreviations: CC Complete case, IPW Inverse probability weighted, MCAR Missing completely at random, MI Multiple imputation, SRI Single regression imputation
  2. ᵃIncluded patient background factors (sex, age, year of cohort entry, hospitalization, first visit, emergency care, class number of concomitant medications, complications, and concomitant medication), exposure variables, follow-up period, and event indicator. The inclusion of the follow-up period and event indicator followed the study by White et al. [25].
  3. ᵇPredictive mean matching imputes each missing value with an observed value randomly selected from cases whose predicted values under the imputation model are close to the predicted value of the missing case. It was selected because it can address the underestimation of variance that can occur with regression imputation.
  4. ᶜDetermined with reference to prior research by Raebel et al. [6], which used the Mini-Sentinel Distributed Database.
  5. ᵈThe inclusion of outcome information as covariates followed the study by Xu Q et al. [26].
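The SRI setting in the table (a linear regression imputation model fitted to log-transformed laboratory values) can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the study's SAS implementation: the function name `single_regression_impute_log` is hypothetical, the predictors are assumed to be fully observed, and the imputed covariate is returned on the log scale. Because each missing value receives one predicted value, the sketch also shows why SRI ignores the uncertainty of the missing data.

```python
import numpy as np


def single_regression_impute_log(lab, predictors):
    """Single regression imputation of a lab value on the log scale (hypothetical helper).

    lab:        1-D array of positive lab values, np.nan where missing.
    predictors: 2-D array (n, p) of fully observed covariates (assumption).
    Returns the log-transformed lab values with missing entries filled in.
    """
    log_lab = np.log(lab)                      # imputation is done on the log scale
    observed = ~np.isnan(log_lab)
    X = np.column_stack([np.ones(len(lab)), predictors])  # add an intercept column
    # Fit the linear imputation model by least squares on observed cases only.
    beta, *_ = np.linalg.lstsq(X[observed], log_lab[observed], rcond=None)
    filled = log_lab.copy()
    filled[~observed] = X[~observed] @ beta    # one predicted value per missing case
    return filled
```

Every imputed value lies exactly on the fitted regression line, so the imputed covariate has less spread than real data would; this is the variance underestimation that footnote b refers to.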
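Footnote b describes predictive mean matching (PMM). A single PMM imputation might be sketched as follows, assuming a linear imputation model, k = 5 candidate donors, and regression parameters drawn from their approximate posterior so that repeated calls produce different imputations, as MI requires. The function name `pmm_impute_once` and the donor count are illustrative assumptions; SAS, which the study used, may select donors differently.

```python
import numpy as np


def pmm_impute_once(y, X, k=5, rng=None):
    """One predictive-mean-matching imputation of y (hypothetical helper).

    y: 1-D array with np.nan where missing; X: 2-D array of fully observed predictors.
    """
    rng = np.random.default_rng() if rng is None else rng
    obs = ~np.isnan(y)
    Xd = np.column_stack([np.ones(len(y)), X])
    X_obs, y_obs = Xd[obs], y[obs]
    # Least-squares fit of the imputation model on the observed cases.
    beta_hat, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)
    # Draw sigma^2 and beta from their approximate posterior so that each call
    # reflects parameter uncertainty.
    resid = y_obs - X_obs @ beta_hat
    df = X_obs.shape[0] - X_obs.shape[1]
    sigma2 = resid @ resid / rng.chisquare(df)
    cov = sigma2 * np.linalg.inv(X_obs.T @ X_obs)
    beta_star = rng.multivariate_normal(beta_hat, cov)
    pred_donor = X_obs @ beta_hat            # predicted values of observed cases
    pred_missing = Xd[~obs] @ beta_star      # predicted values of missing cases
    out = y.copy()
    for i, target in zip(np.flatnonzero(~obs), pred_missing):
        # The k observed cases whose predicted values are closest to this one.
        nearest = np.argsort(np.abs(pred_donor - target))[:k]
        out[i] = y_obs[rng.choice(nearest)]  # borrow an actually observed value
    return out
```

Because every imputed value is copied from a real observation, PMM never produces implausible values (for example, negative values on a scale that must be positive). Calling the function m = 10 times yields the m imputed datasets.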
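With m = 10 imputed datasets, fitting the outcome model gives 10 estimates that are then pooled with Rubin’s rules. A minimal sketch for a single scalar parameter, assuming each analysis returns a point estimate and its squared standard error (the function name `rubins_rules` is hypothetical):

```python
import numpy as np


def rubins_rules(estimates, variances):
    """Pool m estimates of one parameter (hypothetical helper).

    estimates: point estimates from the m imputed datasets.
    variances: their squared standard errors.
    Returns (pooled estimate, total variance).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()              # pooled point estimate
    w_bar = u.mean()              # average within-imputation variance
    b = q.var(ddof=1)             # between-imputation variance
    total = w_bar + (1 + 1 / m) * b
    return q_bar, total
```

The total variance T = W̄ + (1 + 1/m)B adds the between-imputation variance B to the average within-imputation variance W̄; this added term is the uncertainty that single regression imputation cannot capture.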
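The IPW row weights each complete case by the inverse of its estimated probability of having the laboratory test observed. The sketch below computes those weights with NumPy, assuming fully observed variables `Z` (which, following footnote d, may include outcome information) and a logistic model fitted by a fixed number of Newton-Raphson steps without a convergence check. The names `fit_logistic` and `ipw_weights` are hypothetical. The outcome model would then be fitted to the complete cases using these values as case weights.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                        # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])   # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def ipw_weights(observed, Z):
    """Inverse probability of observation weights (hypothetical helper).

    observed: boolean array, True where the lab value was measured.
    Z:        2-D array (n, q) of fully observed variables.
    Returns 1 / P(observed | Z) for complete cases and 0 for the others.
    """
    X = np.column_stack([np.ones(len(observed)), Z])
    beta = fit_logistic(X, observed.astype(float))
    p_obs = 1.0 / (1.0 + np.exp(-X @ beta))
    return np.where(observed, 1.0 / p_obs, 0.0)
```

Within each group defined by `Z`, the weights of the complete cases sum approximately to the group size, so the complete cases stand in for the patients whose values are missing. A patient with an estimated observation probability near zero receives a very large weight, which is the source of the instability noted in the table; inspecting the largest weights is a simple diagnostic.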