Factors affecting the performance of brain arteriovenous malformation rupture prediction models
BMC Medical Informatics and Decision Making volume 21, Article number: 142 (2021)
In many cases, neither the rupture rate of brain arteriovenous malformations (bAVMs) nor the risk of endovascular or surgical treatment (when radiosurgery is not appropriate) is low, so it is important to assess the risk of rupture carefully before treatment. Based on the currently accepted high-risk predictors and clinical data, we used different sample sizes, numbers of random samplings and algorithms to build prediction models for the risk of hemorrhage in bAVM, and investigated the accuracy and stability of the models. Our purpose was to alert researchers to potential pitfalls in developing similar prediction models.
The clinical data of 353 patients with bAVMs were collected. When creating prediction models for bAVM rupture, we varied the ratio of the training dataset to the test dataset, increased the number of random samplings, and built the models with the logistic regression (LR) and random forest (RF) algorithms. The area under the curve (AUC) was used to evaluate the predictive performance of the models.
The performances of the prediction models built by both algorithms were not ideal (AUCs: 0.7 or less). The AUCs from the models built by the LR algorithm with different sample sizes were better than those built by the RF algorithm (0.70 vs 0.68, p < 0.001). The standard deviations (SDs) of the AUCs from both prediction models with different sample sizes displayed wide ranges (max range > 0.1).
Based on the current risk predictors, it may be difficult to build a stable and accurate prediction model for the hemorrhagic risk of bAVMs. Compared with sample size and algorithms, meaningful predictors are more important in establishing an accurate and stable prediction model.
Brain arteriovenous malformation (bAVM) is a cerebrovascular disease characterized by direct shunts between arteries and veins and abnormal vascular masses. The main presenting clinical symptoms are hemorrhage and epilepsy. Because bAVM rupture is associated with high mortality and disability in many cases, how to prevent and treat rupture has always been a focus of research. However, whether to intervene when bAVMs occur is still controversial [2,3,4]. Sometimes neither the rupture rate of bAVMs nor the risk of endovascular or surgical treatment (when radiosurgery is not appropriate) is low, so it is important to assess the risk of rupture carefully before treatment.
The common method of developing a prediction model or a scoring system for disease risk is to build a mathematical model based on correlated clinical predictors. For binary category data, multivariate logistic regression (LR) is the conventional algorithm [5, 6]. With the development of computational algorithms, different machine learning methods have been introduced into this field. Of them, random forest (RF) is considered to be a promising method. Previous studies on predicting the risk of diseases have reported many successful cases in which RF was applied [8, 9].
In this study, we collected the clinical data of 353 patients with bAVMs and built prediction models by the LR algorithm and RF algorithm based on multiple random samplings and different training sample sizes, and areas under the curve (AUCs) were used to assess the performances of the models. The purpose of our study is to test and compare the stability and performances of prediction models built by both algorithms and to investigate the deficiencies in these prediction models.
Case selection and data collection
All patients with bAVMs confirmed by digital subtraction angiography (DSA) from January 2013 to December 2019 were enrolled in our study. Patients with the following conditions were excluded: 1) a combination with brain injury or brain tumors; and 2) incomplete clinical data. Variables that were reported to be correlated with bAVM rupture in previous studies were collected [1, 6, 10]. General variables including age and sex were collected, and morphological variables pertaining to the bAVMs were separately measured on DSA images by 2 neurosurgeons (Wengui Tao and Laochao Yan), including the location, size, associated aneurysm, draining type, and number of draining veins. Other variables, including rupture information, were recorded.
All procedures in this retrospective study that involved human participants were approved by the ethical committee of Xiangya hospital and performed in accordance with the institutional ethical standards, the 1964 Helsinki declaration and its later amendments, or comparable ethical standards.
Building prediction models by the LR algorithm and RF algorithm based on multiple repeated samplings and different sample sizes
RStudio (version 1.1.383; RStudio Inc.) was used to build the prediction models. Sex, location, associated aneurysm, draining type, and rupture were set as factor (categorical) variables, and age, size, and the number of draining veins were set as numeric (continuous) variables. Rupture was set as the dependent (response) variable, and the other 7 variables were set as independent (explanatory) variables. In the LR algorithm, the independent variables were filtered by the stepwise method, and the significant variables were used in the final prediction formula. In the RF algorithm, default values were set for the "ntree" and "mtry" parameters (500 and 3).
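To make the LR step more concrete, the following is a minimal Python sketch of fitting a logistic regression and turning it into a predicted risk. It is not the authors' R code: it fits by plain batch gradient descent rather than by R's model-fitting routines, it omits stepwise variable selection and the RF model entirely, and the function names, learning rate, and synthetic data are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights [intercept, b1, ..., bp] by batch gradient descent.

    Hypothetical helper; lr and epochs are illustrative defaults.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            # Prediction error for this case under the current weights.
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        # Step against the mean gradient of the log-loss.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_risk(w, xi):
    """Predicted probability of the positive class for one case."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))


# Synthetic data (not study data): one standardized predictor, where
# larger values make the positive outcome more likely.
rng = random.Random(1)
X = [[rng.gauss(0, 1)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(1.5 * x[0]) else 0 for x in X]
w = fit_logistic(X, y)
```

The predicted risks from `predict_risk` are what an AUC calculation would take as scores on the test dataset.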
According to the 10 events per variable (EPV) rule [11,12,13], we sampled different sizes of training datasets from all 353 cases each time, and the remaining cases were defined as test datasets. The sample sizes of the training datasets were 140, 175, 210, 245 and 280, and the corresponding test datasets were 213, 178, 143, 108 and 73. For each pair of datasets, the number of random sampling times was 1, 10, 50, 100, 300, 600, 1200 and 2100.
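The sampling scheme can be sketched as follows. This is an illustrative Python sketch rather than the authors' R code; the helper name `repeated_splits` and the fixed seed are assumptions. Each repetition draws a training set of the requested size without replacement from the 353 case indices and treats the remaining cases as the test set.

```python
import random

N_CASES = 353                                      # total patients in the study
TRAIN_SIZES = [140, 175, 210, 245, 280]            # training-set sizes from the text
REPEATS = [1, 10, 50, 100, 300, 600, 1200, 2100]   # numbers of random samplings


def repeated_splits(n_cases, train_size, n_repeats, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random hold-out splits.

    Hypothetical helper: the name and the seed handling are assumptions.
    """
    rng = random.Random(seed)
    cases = range(n_cases)
    for _ in range(n_repeats):
        train = set(rng.sample(cases, train_size))    # draw without replacement
        test = [i for i in cases if i not in train]   # all remaining cases
        yield sorted(train), test


# The test-set sizes follow directly from the training-set sizes.
test_sizes = [N_CASES - n for n in TRAIN_SIZES]
```

Looping over every combination of `TRAIN_SIZES` and `REPEATS`, fitting both models on each training set and scoring them on the matching test set, would reproduce the structure of the experiment.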
Calculating AUCs to assess the performances of prediction models
AUCs were used to assess the performances of the prediction models. The AUCs are reported as mean ± standard deviation (SD).
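As a reminder of what these quantities measure, the sketch below computes the AUC as the probability that a randomly chosen ruptured case receives a higher predicted risk than a randomly chosen unruptured case, and then summarizes a list of AUCs as mean and sample SD. It is a minimal Python illustration under the assumption that outcomes are coded 1 (ruptured) and 0 (unruptured); the function names and the toy data are hypothetical and do not come from the study.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: 1 for ruptured, 0 for unruptured; scores: predicted risk.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(aucs):
    """Return (mean, sample SD) of AUCs collected over repeated samplings."""
    return mean(aucs), stdev(aucs)


# Toy example with made-up labels and scores (not study data).
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form is quadratic in the number of cases, which is acceptable for test sets of a few hundred patients; a rank-based formulation gives the same value more efficiently.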
Once the source code had been verified, the repeated sampling, model building, prediction, AUC calculation and plotting were all carried out programmatically.
Paired-sample t-tests were used to compare the AUCs of the prediction models built by the LR and RF algorithms. A p value < 0.05 was considered to be statistically significant.
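The paired comparison can be illustrated with a short Python sketch that computes the paired t statistic and its degrees of freedom from two lists of AUCs. It assumes that each pair comes from the same random split, which the text does not state explicitly; the function name and the example values are hypothetical. Converting the statistic to a p value requires the Student t distribution, which is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    # Mean difference divided by its standard error.
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up AUCs for illustration only (not values from the study).
lr_aucs = [0.71, 0.69, 0.72, 0.70]
rf_aucs = [0.68, 0.67, 0.70, 0.66]
t_stat, dof = paired_t(lr_aucs, rf_aucs)
```

With hundreds or thousands of samplings per setting, even a small mean difference in AUC can produce a very small p value, so the size of the difference matters as much as its significance.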
The clinical data of 353 patients with ruptured and unruptured bAVMs are summarized in Table 1. Of all patients, 220 were male, and 133 were female, with a mean age of 32.82 ± 15.77 years. A total of 264 (74.8%) bAVMs were located in the cerebral lobes (superficial), 40 (11.3%) in the corpus callosum, basal ganglia or lateral ventricle (deep), and 49 (13.9%) in the cerebellum or brain stem (infratentorial). Ten (5.4%) patients had aneurysms related to bAVMs. The mean size of the bAVM nidus was 3.71 ± 2.15 cm. Seventy-four (21.0%) patients had only deep draining veins. A total of 198 (43.9%) patients had only a single draining vein. The bAVMs were confirmed to be ruptured in 228 patients and unruptured in 125.
Univariate analysis showed that age, location, associated aneurysm, size and the number of draining veins were significantly different between patients with unruptured and ruptured bAVMs. All these variables were used in LR and RF analyses.
Performances of the prediction models
All the AUCs showed that the performances of the prediction models built by the LR algorithm were better than those built by the RF algorithm (p < 0.001; see Fig. 1 and Table 2). As the training sample size increased, the AUCs of the LR models improved slightly, from 0.70 to 0.71 (with more than 100 samplings), whereas the AUCs of the RF models decreased. The standard deviations (SDs) of the AUCs showed a maximum fluctuation range > 0.1 across samplings, and individual single samplings also reflected the unstable performance of the prediction models (see the first row of Fig. 1).
BAVMs represent an intracranial hemorrhagic disease. The annual rupture rate of bAVMs varies across reports [14,15,16,17,18]. For each patient and lesion, the risk of rupture should be assessed separately. Of patients who survive after the initial hemorrhage, approximately 20% die, and one-third remain moderately disabled after 3 months. For patients with unruptured bAVMs, the psychological impact of the long-term fear of hemorrhage should not be underestimated. Additionally, it is necessary to compare the risk of bAVM rupture with that of treatment. Together, these considerations show that predicting the hemorrhagic risk is important for unruptured bAVMs. Some studies have proposed predictors of hemorrhagic risk, such as female sex, deep location, deep draining veins, single draining veins, and associated aneurysm [20,21,22,23]. Based on these predictors, some authors have tried to develop prediction models or scoring systems for the hemorrhagic risk of bAVM. A successful prediction model or scoring system would help clinicians find suitable and low-risk management options for patients.
For binary categorical clinical data, the LR algorithm is the conventional method for building prediction models. In recent years, machine learning algorithms have been introduced in this field. The highly accurate results and simplified procedures that resulted from the introduction of these methods are impressive. Of these machine learning algorithms, the RF algorithm is considered most promising because of its better performance, especially for big data.
The common method for building a prediction model is to obtain a training dataset from the whole dataset, either by date sequence or at random, and then to build a model in the form of a prediction formula (LR) or a prediction procedure hidden in a black box (machine learning). The remaining data are defined as the test dataset and used to test the model. The AUC is usually used to evaluate predictive performance. The size of the training dataset should meet the basic requirement of the 10 events per variable (EPV) rule [11,12,13].
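As a rough illustration of the EPV rule, the Python sketch below computes the expected events per variable for a given training-set size. It assumes that "events" means the less frequent outcome class (the 125 unruptured cases out of 353) and that a random training sample keeps the overall class proportions; neither assumption is stated in the study, and the function name is hypothetical. The example uses five predictors, matching the five variables that were significant in the univariate analysis.

```python
def events_per_variable(n_train, n_variables, n_minority=125, n_total=353):
    """Expected EPV for a random training sample of size n_train.

    Assumes (hypothetically) that events are counted in the less frequent
    outcome class and that the sample keeps the overall class proportions.
    """
    expected_events = n_train * n_minority / n_total
    return expected_events / n_variables


# Expected EPV at the smallest and largest training sizes, for 5 predictors.
epv_140 = events_per_variable(140, 5)
epv_280 = events_per_variable(280, 5)
```

Under these assumptions the smallest training set of 140 cases yields an EPV of about 9.9 with five predictors, so it sits right at the conventional threshold of 10, and every larger training set exceeds it.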
In this study, our original purpose was to build prediction models for the risk of bAVM rupture by the LR algorithm and RF algorithm and to compare the performances of those models. However, the results were not as expected, and the models displayed instability and uncertainty. When we performed multiple random samplings for the training dataset, the coefficients of the prediction formula from the LR algorithm varied and so did the AUC, and the same was true of the RF algorithm. To explore this problem further, we increased the number of random samplings, changed the ratio of the training sample size to the test sample size, and even changed the number of independent variables; we then observed the changes in the AUCs and tried to identify patterns. Although the AUCs were widely dispersed across sample sizes and numbers of random samplings, they still displayed certain patterns. Being familiar with these patterns can help us understand the possible uncertainty and instability of prediction models, build optimal prediction models, and avoid pitfalls.
The independent variables (explanatory variables) used in this study have been accepted by most researchers and are considered to be risk factors for bAVM rupture [1, 6, 10], but their performances in predicting hemorrhage were not ideal in this study. Their deficiencies did not radically change regardless of the algorithms we used or the increased sampling times or different training sample sizes. We believed that obtaining an ideal prediction model for predicting bAVM rupture might depend on the identification of new, more valuable predictors.
In statistics, it is generally held that an effective regression analysis requires a sample size that meets the 10 EPV rule. Our study showed that increasing the training sample size for the LR algorithm beyond what the 10 EPV rule requires improved the predictive performance only slightly. This result indirectly supports the 10 EPV rule. Although the RF algorithm has shown advantages in many studies, in this study its performance was not better than that of the LR algorithm. This suggests that without meaningful independent variables, it is also difficult for the RF algorithm to display its power.
In most previous studies on prediction models, the training dataset was almost always based on a single random sampling or date order; in fact, the number of sampling times was not specified in the statistics [5, 6]. However, in our study, the SDs reflected the instability that resulted from different samplings.
This study was based on clinical data from 353 patients with bAVMs; limitations in the sample size may affect the conclusions, and data were collected from a single center. The reliability and generality of the conclusions should be verified in a multicenter study.
Neither the LR nor the RF prediction model based on the current risk predictors performed ideally. Compared with sample size and algorithms, meaningful predictors are more important in establishing an accurate and stable predictive model.
Availability of data and materials
The data that support the findings of this study are available from the corresponding author upon reasonable request.
bAVM: Brain arteriovenous malformation
AUC: Area under the curve
ARUBA: A Randomized Trial of Unruptured Brain Arteriovenous Malformation
EPV: Events per variable
DSA: Digital subtraction angiography
Solomon RA, Connolly ES Jr. Arteriovenous malformations of the brain. N Engl J Med. 2017;376(19):1859–66.
Cenzato M, Boccardi E, Beghi E, Vajkoczy P, Szikora I, Motti E, et al. European consensus conference on unruptured brain AVMs treatment (supported by EANS, ESMINT, EGKS, and SINCH). Acta Neurochir (Wien). 2017;159(6):1059–64.
Magro E, Gentric JC, Darsaut TE, Ziegler D, Bojanowski MW, et al. Responses to ARUBA: a systematic review and critical analysis for the design of future arteriovenous malformation trials. J Neurosurg. 2017;126(2):486–94.
Pulli B, Chapman PH, Ogilvy CS, Patel AB, Stapleton CJ, Leslie-Mazwi TM, et al. Multimodal cerebral arteriovenous malformation treatment: a 12-year experience and comparison of key outcomes to ARUBA. J Neurosurg. 2019;95:1–10.
Falconieri N, Van Calster B, Timmerman D, Wynants L. Developing risk models for multicenter data using standard logistic regression produced suboptimal predictions: a simulation study. Biom J. 2020;62(4):932–44.
Feghali J, Yang W, Xu R, Liew J, McDougall CG, Caplan JM, et al. R2eD AVM score. Stroke. 2019;50(7):1703–10.
Fatima N, Zheng H, Massaad E, Hadzipasic M, Shankar GM, Shin JH. Development and validation of machine learning algorithms for predicting adverse events after surgery for lumbar degenerative spondylolisthesis. World Neurosurg. 2020;140:627–41.
Couronne R, Probst P, Boulesteix AL. Random forest versus logistic regression: a large-scale benchmark experiment. BMC Bioinform. 2018;19(1):270.
Hu WS, Hsieh MH, Lin CL. A novel atrial fibrillation prediction model for Chinese subjects: a nationwide cohort investigation of 682 237 study participants with random forest model. Europace. 2019;21(9):1307–12.
Huang Z, Peng K, Chen C, Zeng F, Wang J, Chen F. A reanalysis of predictors for the risk of hemorrhage in brain arteriovenous malformation. J Stroke Cerebrovasc Dis. 2018;27(8):2082–7.
Figueroa RL, Zeng-Treitler Q, Kandula S, Ngo LH. Predicting sample size required for classification performance. BMC Med Inform Decis Mak. 2012;12:8.
Palazon-Bru A, Folgado-de la Rosa DM, Cortes-Castell E, Lopez-Cascales MT, Gil-Guillen VF. Sample size calculation to externally validate scoring systems based on logistic regression models. PLoS ONE. 2017;12(5):e0176726.
Pavlou M, Ambler G, Seaman SR, Guttmann O, Elliott P, King M, et al. How to develop a more accurate risk prediction model when there are few events. BMJ. 2015;351:h3868.
Halim AX, Johnston SC, Singh V, McCulloch CE, Bennett JP, Achrol AS, et al. Longitudinal risk of intracranial hemorrhage in patients with arteriovenous malformation of the brain within a defined population. Stroke. 2004;35(7):1697–702.
Hernesniemi JA, Dashti R, Juvela S, Vaart K, Niemela M, Laakso A. Natural history of brain arteriovenous malformations: a long-term follow-up study of risk of hemorrhage in 238 patients. Neurosurgery. 2008;63(5):823–9 (discussion 829–31).
Ondra SL, Troupp H, George ED, Schwab K. The natural history of symptomatic arteriovenous malformations of the brain: a 24-year follow-up assessment. J Neurosurg. 1990;73(3):387–91.
Rutledge WC, Ko NU, Lawton MT, Kim H. Hemorrhage rates and risk factors in the natural history course of brain arteriovenous malformations. Transl Stroke Res. 2014;5(5):538–42.
Stapf C, Mast H, Sciacca RR, Choi JH, Khaw AV, Connolly ES, et al. Predictors of hemorrhage in patients with untreated brain arteriovenous malformation. Neurology. 2006;66(9):1350–5.
van der Schaaf IC, Brilstra EH, Rinkel GJ, Bossuyt PM, van Gijn J. Quality of life, anxiety, and depression in patients with an untreated intracranial aneurysm or arteriovenous malformation. Stroke. 2002;33(2):440–3.
Alexander MD, Cooke DL, Nelson J, Guo DE, Dowd CF, Higashida RT, et al. Association between Venous Angioarchitectural Features of Sporadic Brain Arteriovenous Malformations and Intracranial Hemorrhage. AJNR Am J Neuroradiol. 2015;36(5):949–52.
Dinc N, Platz J, Tritt S, Quick-Weller J, Eibach M, Wolff R, et al. Posterior fossa AVMs: increased risk of bleeding and worse outcome compared to supratentorial AVMs. J Clin Neurosci. 2018;53:171–6.
Padilla-Vazquez F, Zenteno MA, Balderrama J, Escobar-de la Garma VH, Juan DS, Trenado C. A proposed classification for assessing rupture risk in patients with intracranial arteriovenous malformations. Surg Neurol Int. 2017;8:303.
Yamada S, Takagi Y, Nozaki K, Kikuta K, Hashimoto N. Risk factors for subsequent hemorrhage in patients with cerebral arteriovenous malformations. J Neurosurg. 2007;107(5):965–72.
Senders JT, Staples PC, Karhade AV, Zaki MM, Gormley WB, Broekman MLD, et al. Machine learning and neurosurgical outcome prediction: a systematic review. World Neurosurg. 2018;109:476–86.e1.
The authors thank Xing Huang, PhD (Hunan Normal University), for assistance with statistical analysis.
This study was funded by a grant from the National Natural Science Foundation of China (Grant Number 81873756).
Ethics approval and consent to participate
All procedures in this retrospective study that involved human participants were approved by the ethical committee of our hospital and performed in accordance with the institutional ethical standards, the 1964 Helsinki declaration and its later amendments, or comparable ethical standards. This is a retrospective longitudinal study. The data are anonymous, and the requirement for written informed consent was waived by the Institutional Review Board.
Consent for publication
The authors declare that they have no competing interests.
Cite this article
Tao, W., Yan, L., Zeng, M. et al. Factors affecting the performance of brain arteriovenous malformation rupture prediction models. BMC Med Inform Decis Mak 21, 142 (2021). https://doi.org/10.1186/s12911-021-01511-z
- Brain arteriovenous malformation
- Logistic regression
- Random forest
- Prediction model