The prevalence of VTE has increased significantly across all age groups of hospitalized children, which has been attributed, in part, to the widespread use of CVCs in this population. Given this critical and growing problem, several important pediatric organizations have developed initiatives to prevent VTE. However, a study showed that current screening guidelines for VTE risk in hospitalized children have low sensitivity (61%; 95% CI 51–70%) for identifying patients at increased risk of both CVC-associated and other VTE events. The traditional LR model in this study confirmed this finding: its recall, which is equivalent to sensitivity, was approximately 61%, meaning that nearly 40% of CADVT events would go unpredicted. Traditional risk models are therefore inadequate for this complex problem.
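The relationship between recall and the missed-event rate quoted above can be made concrete with a minimal sketch; the confusion-matrix counts below are hypothetical and chosen only to match the reported ~61% sensitivity.

```python
# Sensitivity (recall) and the implied miss rate from a confusion matrix.
# Counts are hypothetical, chosen to match the ~61% sensitivity cited above.

def sensitivity(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN): the fraction of true events that are caught."""
    return tp / (tp + fn)

tp, fn = 61, 39  # hypothetical true positives and false negatives
recall = sensitivity(tp, fn)
miss_rate = 1 - recall  # fraction of CADVT events the screen would miss
print(f"recall={recall:.2f}, miss rate={miss_rate:.2f}")  # recall=0.61, miss rate=0.39
```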
In this retrospective analysis, we evaluated the performance of different machine learning models in predicting CADVT at 3 time points before it occurred. Because this is a complicated task, the 3 machine learning models that used only static data did not achieve the desired predictive performance. A multimodal deep learning model, the MMDL, which can handle temporal data, exhibited improved performance. Inspired by this finding, we believe that time-series dynamic data contain much more clinical information than static data or statistics derived from dynamic data. For these reasons, we propose a new multimodal deep learning model that can provide deeper insight and learn shared latent representations for prediction tasks from both static and temporal dynamic data. With an AUC > 0.82, the proposed model can meet the needs of most clinical applications. It allows clinicians to predict CADVT 3 days in advance. The ability to predict CADVT may allow patients to benefit from thromboprophylaxis or close surveillance. The proposed deep learning models thus have the potential to serve as decision support tools for thromboprophylaxis. Although developing a machine learning model is not especially difficult, data integration can be a practical challenge for a model that requires a large number of data features, especially dynamic features. Before predictive models are implemented in novel settings, calibration analysis remains as important as discrimination, but it is not frequently discussed. As many studies have shown, highly discriminative classifiers (e.g., classifiers with a large area under the ROC curve), including the widely used logistic regression model and machine learning approaches such as Naïve Bayes, decision trees, and artificial neural networks, may not be well calibrated [22, 23]. The calibration results reported in this study also show that many of the prediction models were not well calibrated.
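A calibration check of the kind discussed above can be sketched with scikit-learn's reliability-curve utilities; the labels and probabilities below are synthetic, not the study's data.

```python
# Calibration (reliability) check of predicted probabilities: a minimal
# sketch using scikit-learn. Labels and scores here are synthetic.
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)  # synthetic binary outcomes
# Synthetic, deliberately imperfect probability estimates:
y_prob = np.clip(y_true * 0.6 + rng.normal(0.2, 0.2, size=1000), 0.0, 1.0)

# Observed fraction of positives per probability bin vs. mean predicted
# probability; a well-calibrated model lies near the diagonal.
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)

# Brier score: mean squared error of the probabilities (lower is better).
print("Brier score:", brier_score_loss(y_true, y_prob))
```

Plotting `frac_pos` against `mean_pred` yields the reliability diagram; a discriminative but poorly calibrated model shows high AUC yet a curve far from the diagonal, which is the gap the text describes.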
The advantage of deep learning models is that they can receive a large number of data features simultaneously and learn implicit correlations among them to serve complex prediction problems. In this study, the static data contained 143 features after one-hot encoding, and the dynamic data formed a 3D matrix of size n × 2 × 56. However, because deep learning models are not highly interpretable, which factors contribute to CADVT, and how, still need to be studied. Based on the contributions of different features, the model could be optimized and compressed to use fewer input features and a simpler network structure in practice. Furthermore, explainability is more important than accuracy, as it can identify which modifiable risk factors contribute to CADVT, what measures could help to change the situation, and a patient's individual risk. Explainable artificial intelligence (XAI), a set of processes and methods that allows human users to comprehend and trust the results and output created by machine learning algorithms, is also a focus of current AI research in medicine. Interpretation methods such as SHAP have been introduced to explain the output of machine learning models by computing the contribution of each feature to the prediction. However, such explainable frameworks work well on static features but not on latent temporal features.
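SHAP itself requires the `shap` package; as a minimal sketch in the same spirit, scikit-learn's permutation importance ranks static features by how much performance drops when each is shuffled. The data and feature indices below are synthetic, not the study's.

```python
# Model-agnostic feature attribution on static features: a minimal sketch.
# SHAP gives per-prediction attributions; permutation importance, used here
# instead, gives a simpler global ranking. Data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))                    # 5 hypothetical static features
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)    # outcome driven by features 0 and 2

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Features that actually drive the outcome should rank highest.
ranking = np.argsort(result.importances_mean)[::-1]
print("feature ranking (most to least important):", ranking)
```

As the text notes, this kind of attribution is straightforward for static features but does not transfer directly to the latent temporal representations learned by a multimodal network.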
Several studies have shown that different diseases carry different risks of CADVT. In this study, we found that patients with intracranial space-occupying lesions have a particularly high risk of CADVT (OR 14.2, p < 0.001, compared with CHD patients). Further analysis showed that dehydrating agents such as mannitol, glycerol fructose, and furosemide, which are widely used to reduce brain swelling and intracranial pressure, may contribute to CADVT. For these reasons, intracranial space-occupying lesion was used as an independent feature. In addition, only the primary discharge diagnosis was used to label patients; future studies should incorporate more diagnostic information. Therefore, for machine learning technology to be used to predict CADVT, patients in the training dataset should be collected from the deploying hospital or from hospitals with a similar disease spectrum. Different hospitals serving different populations with different diseases may require different models, and machine learning models trained on multicenter datasets should also include the hospital as a feature. Furthermore, if the predicted risk probability is provided directly to clinicians for decision support, a calibration model should be applied to the estimates.
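The odds ratio cited above comes from a standard 2×2 contingency-table calculation; as a minimal sketch, the counts below are hypothetical and chosen only so the arithmetic reproduces an OR of 14.2.

```python
# Odds ratio from a 2x2 contingency table. Counts are hypothetical,
# chosen so the result matches the OR of 14.2 quoted above.

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (a/b) / (c/d) = (a*d) / (b*c), where
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)

# Hypothetical counts: CADVT vs. no CADVT in lesion vs. CHD groups.
print(odds_ratio(30, 20, 15, 142))  # OR = 14.2 with these counts
```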
This study had several limitations. First, because the temporal data were organized in fixed 12-h time windows, not all of the original temporal features of the different clinical data were retained. A more flexible learning model that can accept dynamic data at different time intervals should therefore be developed. A scheme that counts forward from a well-defined index time at catheter insertion is also suggested for future study. Second, as the proposed deep learning model was evaluated only in a single center with retrospective data in a case–control design, a larger evaluation using a cohort design is needed to demonstrate its broad applicability. The most important limitations of such a complicated multimodal deep learning model are its limited clinical explainability and its varying calibration across populations.
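The fixed 12-h windowing described above can be sketched with pandas resampling; the vital-sign values and timestamps below are synthetic.

```python
# Aggregating irregular clinical measurements into fixed 12-hour windows:
# a minimal sketch with synthetic heart-rate values and timestamps.
import pandas as pd

ts = pd.DataFrame(
    {"heart_rate": [110, 120, 115, 130]},
    index=pd.to_datetime([
        "2023-01-01 02:00", "2023-01-01 09:00",
        "2023-01-01 14:00", "2023-01-02 01:00",
    ]),
)

# Mean per 12-hour window; windows with no measurements would appear as
# NaN rather than being dropped, and any within-window timing detail is
# lost, which is the information loss the limitation refers to.
windows = ts.resample("12h").mean()
print(windows)
```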