Design choice | Pros | Cons |
---|---|---|
1: Binary classification model using data that exclude all patients lost to follow-up [12, 13] (i.e., exclude any patient not observed for the full time-at-risk) | The labels are correct, because every patient in the training data was observed for the complete time-at-risk | The training data are smaller (the longer the time-at-risk, the smaller the dataset). If the health outcome is often fatal, we may exclude all or most of the patients who have the outcome. The model may only generalize to healthier patients |
2: Binary classification model using data that include all patients, including those lost to follow-up [14] (i.e., include every patient in the cohort; a patient not observed for the full time-at-risk is labeled according to whether they experienced the outcome during the observed part of the time-at-risk) | Generalizability is not compromised. Larger sample size | Labels may be incorrect for patients lost to follow-up (this label noise may impair the model's ability to learn) |
3: Binary classification model using data that exclude patients lost to follow-up unless they had the outcome before loss to follow-up [15] (i.e., a patient not observed for the full time-at-risk is excluded only if they did not have the outcome during the observed time-at-risk, so patients with a partial time-at-risk who had the outcome during it are still included) | The labels are correct. All outcomes are included. Outcomes are not lost when the outcome is associated with death | Generalizability may be compromised. Patients with the outcome may be sicker than those without it, because patients who die within the time-at-risk are included only if they had the outcome |
4: Cox model using data that include all patients, including those lost to follow-up [16] (i.e., include every patient, even those not observed for the full time-at-risk; the survival time is the minimum of the time to end of observation, the time to outcome, and the time to the end of the time-at-risk, measured from cohort index to the end of the study period) | Suitable for censored patients | Not primarily intended for risk prediction; its main purpose is estimating a hazard ratio for each predictor. Requires a baseline hazard function for prediction. Predicts survival time (time to event) rather than the risk of the event. Computationally more expensive |
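The four designs differ only in which patients enter the training data and how their labels (or survival times) are derived. The following is a minimal Python sketch, not taken from the study. It assumes a hypothetical `Patient` record holding the days from cohort index to the first outcome and to the end of observation. It also assumes a time-at-risk that starts on the index day and lasts `tar` days. The helper names and the toy cohort are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Patient:
    # Hypothetical record; all times are days from cohort index (day 0).
    pid: str
    days_to_outcome: Optional[int]  # first recorded outcome, None if never recorded
    days_observed: int              # end of observation (e.g., death or leaving the database)


def outcome_in_tar(p: Patient, tar: int) -> bool:
    """Outcome recorded within the part of the time-at-risk that was actually observed."""
    return p.days_to_outcome is not None and p.days_to_outcome <= min(tar, p.days_observed)


def lost_to_follow_up(p: Patient, tar: int) -> bool:
    """Observation ended before the time-at-risk did."""
    return p.days_observed < tar


def design1(cohort: List[Patient], tar: int) -> List[Tuple[str, int]]:
    # Exclude every patient not observed for the full time-at-risk.
    return [(p.pid, int(outcome_in_tar(p, tar)))
            for p in cohort if not lost_to_follow_up(p, tar)]


def design2(cohort: List[Patient], tar: int) -> List[Tuple[str, int]]:
    # Keep everyone; the label only reflects the observed part of the time-at-risk.
    return [(p.pid, int(outcome_in_tar(p, tar))) for p in cohort]


def design3(cohort: List[Patient], tar: int) -> List[Tuple[str, int]]:
    # Exclude patients lost to follow-up unless the outcome occurred before they were lost.
    return [(p.pid, int(outcome_in_tar(p, tar)))
            for p in cohort
            if not lost_to_follow_up(p, tar) or outcome_in_tar(p, tar)]


def design4(cohort: List[Patient], tar: int) -> List[Tuple[str, int, int]]:
    # Survival data for a Cox model: (id, survival time, event indicator), where the
    # survival time is the minimum of end of observation, outcome, and end of time-at-risk.
    rows = []
    for p in cohort:
        candidates = [p.days_observed, tar]
        if p.days_to_outcome is not None:
            candidates.append(p.days_to_outcome)
        rows.append((p.pid, min(candidates), int(outcome_in_tar(p, tar))))
    return rows


TAR = 365  # hypothetical one-year time-at-risk
cohort = [
    Patient("A", None, 400),  # fully observed, no outcome
    Patient("B", 100, 400),   # fully observed, outcome on day 100
    Patient("C", None, 200),  # lost to follow-up on day 200, no outcome seen
    Patient("D", 150, 150),   # fatal outcome: outcome and end of observation on day 150
]

for name, design in [("1", design1), ("2", design2), ("3", design3), ("4", design4)]:
    print(name, design(cohort, TAR))
```

Patient D (a fatal outcome on day 150) is dropped by design 1 but kept by designs 2 to 4. Patient C, lost to follow-up on day 200, is handled differently by each design. Design 1 and design 3 drop C. Design 2 keeps C with a label of 0, even though C's outcome status after day 200 is unknown. Design 4 keeps C as censored at day 200.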
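A Cox model's coefficients describe relative hazards. To turn a patient's linear predictor into an absolute risk by the end of the time-at-risk, a baseline hazard is also needed. The risk is `1 - S0(t) ** exp(x . beta)`, with `S0(t) = exp(-H0(t))`. The minimal sketch below assumes the coefficients were already fitted elsewhere; the linear predictors in it are made-up numbers. It uses the Breslow estimator of the baseline cumulative hazard, which is one common choice and not necessarily the one used in the cited work.

```python
import math
from typing import Sequence


def breslow_baseline_cumhaz(times: Sequence[float], events: Sequence[int],
                            lin_preds: Sequence[float], t: float) -> float:
    """Breslow estimate of the baseline cumulative hazard H0(t).

    times/events are the (survival time, event indicator) pairs used to fit the model;
    lin_preds are the fitted linear predictors x_j . beta for the same patients.
    """
    h0 = 0.0
    for s in sorted({ti for ti, ei in zip(times, events) if ei and ti <= t}):
        n_events = sum(1 for ti, ei in zip(times, events) if ei and ti == s)
        # Risk set: everyone still under observation at time s.
        denom = sum(math.exp(lp) for ti, lp in zip(times, lin_preds) if ti >= s)
        h0 += n_events / denom
    return h0


def cox_risk(lin_pred: float, h0_t: float) -> float:
    """Absolute risk by t: 1 - S0(t) ** exp(lp), with S0(t) = exp(-H0(t))."""
    return 1.0 - math.exp(-h0_t * math.exp(lin_pred))


# Toy survival data; assumption: the linear predictors are invented, whereas in
# practice they would come from a fitted Cox model.
times = [365, 100, 200, 150]
events = [0, 1, 0, 1]
lin_preds = [-0.5, 0.8, 0.0, 1.2]

h0_365 = breslow_baseline_cumhaz(times, events, lin_preds, t=365)
for lp in (-0.5, 0.0, 1.2):
    print(f"linear predictor {lp:+.1f}: risk by day 365 = {cox_risk(lp, h0_365):.3f}")
```

The extra step of estimating `H0(t)` and evaluating it at the end of the time-at-risk is what distinguishes design 4 from the binary classifiers. For those, the predicted probability is already the risk.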