
Table 1 Candidate design choices for dealing with loss to follow-up

From: An empirical analysis of dealing with patients who are lost to follow-up when developing prognostic models using a cohort design

| Design choice | Pros | Cons |
| --- | --- | --- |
| 1: Binary classification model using data that exclude all patients lost to follow-up [12, 13] (e.g., exclude any patient not observed for the full time-at-risk) | The labels are correct as we observed all the patients in the training data for the complete time-at-risk follow-up | We reduce the size of the training data (the longer the time-at-risk, the smaller the dataset); If the health outcome is often fatal, then we may exclude all or the majority of the patients who have the health outcome; May limit model generalizability to only those who are healthy |
| 2: Binary classification model using data that include all patients (including those lost to follow-up) [14] (e.g., include every patient in the cohort. A patient not observed for the full time-at-risk is included but their outcome is determined based on whether they experienced the outcome during the observed time-at-risk) | We do not compromise generalizability; Larger sample size | Labels may be incorrect for those who are lost to follow-up (this noise may impact the model’s ability to learn) |
| 3: Binary classification model using data that exclude patients lost to follow-up unless they have the outcome prior to loss to follow-up [15] (e.g., only exclude patients not observed for the full time-at-risk if they did not have the outcome during the observed time-at-risk. This means patients with a partial time-at-risk who have the outcome during this time are still included) | The labels are correct; We include all outcomes; Do not lose outcomes when outcome is associated to death | Generalizability may be compromised; Outcome patients may be sicker as we can include those who die within time-at-risk but this is not possible for non-outcomes |
| 4: Cox model using data that includes all patients (including those lost to follow-up) [16] (e.g., include every patient, even those not observed for the full time-at-risk. The survival time is the minimum of time to end of observation, time to outcome or time-at-risk end (time to study period end from cohort index)) | Method suitable for censored patients | Not intended for risk prediction, the main purpose is hazard rate calculation per predictor. Requires baseline hazard function for prediction; Predict survival time (time before event) rather than risk of event; Computationally more expensive |
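
The four design choices differ only in which patients are kept and how each patient's label (or survival time) is derived. The following is a minimal sketch of those rules, not the authors' implementation. The `Patient` record, its field names, the function names, and the four-patient example cohort are all hypothetical and chosen purely for illustration. Time is measured in days from cohort index.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Patient:
    # Hypothetical record; field names are illustrative, not from the source.
    patient_id: str
    observed_days: int          # days of observation after cohort index
    outcome_day: Optional[int]  # day of first outcome after index, or None


def lost_to_follow_up(p: Patient, tar_days: int) -> bool:
    """A patient is lost if observation ends before the time-at-risk does."""
    return p.observed_days < tar_days


def had_outcome(p: Patient, tar_days: int) -> bool:
    """True if the outcome was recorded within the observed time-at-risk."""
    return (
        p.outcome_day is not None
        and p.outcome_day <= min(p.observed_days, tar_days)
    )


def design_1(patients: List[Patient], tar_days: int) -> List[Tuple[str, int]]:
    """Exclude every patient lost to follow-up."""
    return [
        (p.patient_id, int(had_outcome(p, tar_days)))
        for p in patients
        if not lost_to_follow_up(p, tar_days)
    ]


def design_2(patients: List[Patient], tar_days: int) -> List[Tuple[str, int]]:
    """Keep everyone; label from whatever part of the time-at-risk was observed."""
    return [(p.patient_id, int(had_outcome(p, tar_days))) for p in patients]


def design_3(patients: List[Patient], tar_days: int) -> List[Tuple[str, int]]:
    """Exclude lost patients unless they had the outcome before being lost."""
    return [
        (p.patient_id, int(had_outcome(p, tar_days)))
        for p in patients
        if not lost_to_follow_up(p, tar_days) or had_outcome(p, tar_days)
    ]


def design_4(patients: List[Patient], tar_days: int) -> List[Tuple[str, int, int]]:
    """Keep everyone; return (id, survival time, event flag) for a Cox model.

    Survival time is the minimum of time to end of observation,
    time to outcome, and time-at-risk end.
    """
    rows = []
    for p in patients:
        event = had_outcome(p, tar_days)
        time = p.outcome_day if event else min(p.observed_days, tar_days)
        rows.append((p.patient_id, time, int(event)))
    return rows


# Hypothetical cohort with a 365-day time-at-risk.
TAR_DAYS = 365
cohort = [
    Patient("A", observed_days=365, outcome_day=None),  # complete, no outcome
    Patient("B", observed_days=365, outcome_day=100),   # complete, outcome
    Patient("C", observed_days=50, outcome_day=None),   # lost, no outcome seen
    Patient("D", observed_days=80, outcome_day=80),     # lost at outcome (e.g. fatal)
]

if __name__ == "__main__":
    for design in (design_1, design_2, design_3, design_4):
        print(design.__name__, design(cohort, TAR_DAYS))
```

In this toy cohort, design 1 drops both C and D, so the fatal outcome of D is lost. Design 2 keeps C with a label of 0 that may be wrong. Design 3 keeps D but drops C. Design 4 keeps all four patients and records C as censored at day 50.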