Predictive model of prognosis index for invasive micropapillary carcinoma of the breast based on machine learning: a SEER population-based study

Abstract

Background

Invasive micropapillary carcinoma (IMPC) is a rare subtype of breast cancer. Its epidemiological features, treatment principles, and prognostic factors remain controversial.

Objective

This study aimed to develop an improved machine learning-based model to predict the prognosis of patients with invasive micropapillary carcinoma.

Methods

A total of 1123 patients diagnosed with IMPC after surgery between 1998 and 2019 were identified from the Surveillance, Epidemiology, and End Results (SEER) database for survival analysis. Univariate and multivariate analyses were performed to explore independent prognostic factors for the overall and disease-specific survival of patients with IMPC. Five machine learning algorithms were developed to predict the 5-year survival of these patients.

Results

Cox regression analysis indicated that patients aged > 65 years had a significantly worse prognosis than those younger in age, while unmarried patients had a better prognosis than married patients. Patients diagnosed between 2001 and 2005 had a significant risk reduction of mortality compared with other periods. The XGBoost model outperformed the other models with a precision of 0.818 and an area under the curve of 0.863.

Conclusions

A machine learning model for IMPC in patients with breast cancer was developed to estimate the 5-year OS. The XGBoost model had a promising performance and can help clinicians determine the early prognosis of patients with IMPC; therefore, the model can improve clinical outcomes by influencing management strategies and patient health care decisions.

Introduction

Breast cancer is the most common cancer and the second most common cause of cancer-related death among women worldwide [1]. The predominant pathological type is invasive ductal carcinoma of no special type, whereas invasive micropapillary carcinoma (IMPC) is a rare pathological type that accounts for approximately 3–6% of all invasive breast cancers [2].

IMPC was first described by Luna-Moré et al. [3] as atypical tumour cells with a papillary or papillary-like structure, no fibrovascular core, and a special spatial arrangement with reversed cell polarity [4,5,6,7]. IMPC often presents with large masses and a high rate of axillary lymph node metastasis [8, 9], which is an important factor leading to late-stage disease and post-operative local recurrence. Over the past 30 years, IMPC has become a popular research topic, both domestically and internationally [10,11,12]. Because targeted diagnosis and treatment programmes are lacking, the clinical management of IMPC is still based on that of invasive breast cancer in general, meaning that the treatment strategy might not be adequately precise. Currently, clinicians commonly use the American Joint Committee on Cancer (AJCC) staging system or pathologic factors to predict cancer prognosis. However, owing to the rarity and aggressive behaviour of IMPC, prediction based solely on AJCC stage or pathologic factors is insufficient to accurately predict the prognosis of all IMPC patients. Studies have shown that the unique pathological features and high metastatic potential of IMPC are not fully captured by traditional staging systems [13]. In addition to factors such as the tumour-node-metastasis (TNM) classification, there is evidence that demographic characteristics and treatment strategies also influence the prognosis of IMPC patients, and our analysis demonstrates that incorporating this wider range of variables significantly enhances the accuracy of survival predictions. Therefore, it is important to identify factors that can serve as prognostic factors in order to develop models that accurately predict survival outcomes for IMPC patients. With this in mind, some scholars have attempted to construct nomogram models to predict the prognosis of IMPC. 
However, these predictive models have their own drawbacks, including the possibility of introducing selection bias, generally low accuracy, as well as low area under the curve (AUC) and lack of comparison between different models, among others. In recent years, machine learning (ML) has garnered widespread attention and has been applied to various healthcare problems, including outcome prediction and treatment support. For instance, Siddiqui et al. (2020) developed a cost-sensitive seizure detection classifier for imbalanced EEG data sets using machine learning techniques, demonstrating significant improvements in prediction accuracy [14]. Additionally, Siddiqui et al. (2019) proposed a novel quick seizure detection and localization method through brain data mining on ECoG datasets, further showcasing the potential of machine learning in medical applications [15]. Although ML has been used to predict axillary lymph node metastasis in IMPC, it has not yet been utilized in predicting prognosis in IMPC. Therefore, we built a machine learning model based on the unique pathological abnormalities and prognostic factors of these individuals with IMPC to develop more accurate diagnoses and treatment plans for these patients. The data for this study were sourced from the Surveillance, Epidemiology, and End Results (SEER) database, which contains a wide range of cancer-related diagnoses, treatments, and survival outcomes and covers approximately 28% of cancer patients in the United States [16, 17]. Our study thus aimed to incorporate multiple possible prognostic factors into a single mathematical model to construct an accurate prognostic indicator system for predicting IMPC.

We aimed to combine machine learning with the SEER database to construct an efficient multifactor mathematical model, which is otherwise difficult to build with tools such as nomograms. First, we evaluated the clinicopathological features, treatment modalities, and prognosis of IMPC. Then, a prediction model with 10 prognostic parameters (year, age, HER2, primary site, race, chemotherapy, ER, PR, surgery, and radiation) was built using machine learning techniques. We believe the predictive model established in this study has high accuracy and can help doctors make better treatment decisions.

Methods

Patient selection

This study used the SEER*Stat software (version 8.4.2; https://seer.cancer.gov/SEER). Patients with IMPC diagnosed between 1998 and 2019 were first identified in the SEER database. The inclusion criteria were: (1) International Classification of Diseases for Oncology, third edition (ICD-O-3) morphology code 8507/3; (2) IMPC as the sole or first primary tumour with histological confirmation; (3) complete clinicopathological and follow-up data; and (4) known cause of death and survival time. The exclusion criteria were: (1) unknown race; (2) unknown histological grade; (3) unknown clinical stage; (4) unknown ER status; (5) unknown PR status; (6) unknown cause of death; and (7) records with missing values.

Data collection

A detailed patient selection workflow is shown in Fig. 1. After excluding patients with incomplete information, 1123 patients with IMPC were enrolled in the study and randomly assigned to training and validation cohorts at an 80:20 ratio. The variables included patient age, race, year of diagnosis [18, 19], marital status, primary tumour site, histology, TNM stage, surgical status, radiotherapy status, and chemotherapy status. The primary endpoints were overall survival (OS) and disease-specific survival (DSS). OS was defined as the period from diagnosis to death from any cause; for patients who were lost to follow-up before death, the last follow-up visit was used as the end of the observation period. DSS was defined as the period from diagnosis to death caused by IMPC.
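The 80:20 random assignment described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_80_20`, the fixed seed, and the rounding rule are assumptions, and the paper does not state whether the split was stratified.

```python
import random


def split_80_20(ids, train_fraction=0.8, seed=0):
    """Randomly split patient identifiers into training and validation lists.

    The seed and the rounding of the cut point are illustrative assumptions.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical identifiers 0..1122 standing in for the 1123 enrolled patients
train_ids, valid_ids = split_80_20(range(1123))
```

With 1123 identifiers this yields 898 training and 225 validation records; a stratified split would additionally preserve the proportion of events in each cohort.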

Fig. 1

The workflow of the patient selection process. By Figdraw (ID: TUPSY555de)

Statistical analysis

RStudio was used to screen for statistically significant variables (P < 0.05 was considered statistically significant) in the univariate Cox regression models. Multivariate Cox regression analyses were performed to determine independent prognostic indicators, hazard ratios, and 95% confidence intervals (CIs) related to DSS and OS.

Survival curves were generated using the Kaplan-Meier method, and the log-rank test was used to compare survival across the demographic and clinical characteristics of patients with IMPC. Factors associated with the outcome were identified using Cox proportional hazards regression models, which yielded hazard ratios (HRs) with 95% CIs. Statistical analyses were performed using SPSS (version 26.0; IBM, Armonk, New York, United States), and P-values < 0.05 were considered statistically significant. In addition, we compared the performance of the machine learning models with that of an AJCC7 TNM staging model based on Cox proportional hazards regression. Ten categorical indicators (year, age, HER2, primary site, race, chemotherapy, ER, PR, surgery, and radiation) were gathered to create a machine learning model for predicting 5-year survival (Fig. 1).
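To illustrate the Kaplan-Meier method referred to above, the following is a minimal Python sketch of the product-limit estimator. It is not the code used in the study (which relied on SPSS and R); the function name `kaplan_meier` and the toy follow-up data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  : follow-up time in months for each patient
    events : 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # remove deaths and censored patients after time t
    return curve


# Toy data (hypothetical, not SEER records)
toy_times = [6, 12, 12, 30, 48, 60, 60]
toy_events = [1, 0, 1, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
```

Patients censored at a given time are still counted as at risk for deaths at that same time, which is the usual convention; the log-rank test then compares such curves between subgroups.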

The “MissForest” package was used to impute missing values in the data set. Out of all the patients recruited, 1.8 per 1,000 (n = 2) were excluded because of unknown primary site, 2.7 per 1,000 (n = 3) had missing information about the histologic type, and 1.8 per 1,000 (n = 1) lacked surgical treatment information. The “MissForest” algorithm performed well because the percentage of missing values was far lower than the severe missingness cut-off value of 75% [20]. In this study, we developed a machine learning-based model to predict the 5-year survival of patients with invasive micropapillary carcinoma (IMPC) using data from the SEER database. The study design and reporting were guided by the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) guidelines to ensure transparency and reproducibility [21]. Various machine learning algorithms were evaluated for their predictive performance. Prior to developing the machine learning model, an 80:20 randomisation process was used to separate all patients with IMPC into training and test groups. Five machine learning algorithms were employed in our investigation: support vector machine (SVM), k-nearest neighbour (KNN), Random Forest, Extra Trees, and eXtreme Gradient Boosting (XGBoost). For each model, 10-fold internal cross-validation was used to identify the parameters that yielded the best accuracy. Detailed information about the five machine learning algorithms is shown in Supplement 1. The test set was used to assess each algorithm’s performance using sensitivity, accuracy, precision, negative predictive value (NPV), and area under the receiver operating characteristic curve (AUC). All analyses were conducted using Python (version 3.8; Python Software Foundation, Wilmington, Delaware, United States).
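For readers less familiar with the evaluation metrics listed above, the sketch below shows how sensitivity, precision, NPV, and accuracy follow from a 2×2 confusion matrix. The function name and the counts are hypothetical and are not taken from Supplement 3; treating death within 5 years as the positive class is also an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute metrics from confusion-matrix counts.

    Assumed convention: "positive" means death within 5 years.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all actual positives
        "precision": tp / (tp + fp),    # true positives among predicted positives
        "npv": tn / (tn + fn),          # true negatives among predicted negatives
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only
example = classification_metrics(tp=8, fp=2, tn=85, fn=5)
```

Because deaths are the minority class in this kind of data, accuracy alone can look high even when sensitivity is modest, which is why several metrics are reported side by side.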

Results

Study population

The baseline clinical characteristics of the patients are presented in Table 1. Overall, 1123 eligible patients were included in our study. Of these, 639 (56.9%) and 484 (43.1%) were aged < 65 years and ≥ 65 years, respectively. There were 863 (76.8%) and 107 (9.5%) White and Black participants, respectively. TNM staging was distributed as follows: 513 (45.7%) cases were stage I, 369 (32.9%) cases were stage II, 186 (16.6%) cases were stage III, and 35 (3.1%) cases were stage IV. HR+/HER2- was the most common IMPC subtype, followed by HR+/HER2+ in 14% of cases; triple-negative breast cancer was the least common subtype. Additionally, registered patients tended to receive localised treatments, including surgery (no surgery: n = 70 [6.2%] vs. mastectomy: n = 1051 [93.6%]) and radiation therapy (no radiotherapy: n = 447 [39.8%] vs. radiotherapy: n = 676 [60.2%]), whereas approximately the same proportion of patients were treated with and without chemotherapy (no chemotherapy: n = 593 [52.8%] vs. chemotherapy: n = 530 [47.2%]).

Table 1 Baseline characteristics of IMPC

Survival analysis

The median follow-up period of the enrolled patients was 52 months. Kaplan-Meier curves for OS and DSS based on different demographic and clinical characteristics at baseline are shown in Figs. 2 and 3, respectively. In our analysis, patients aged > 65 years had a significantly worse prognosis than younger patients, and unmarried patients had a better prognosis than married patients, suggesting that age and marital status are important prognostic factors. Figures 2D and 3D display the Kaplan-Meier curves by year of diagnosis; patients diagnosed between 2001 and 2005 had better survival than those diagnosed in other periods. Survival was shorter in patients with the HR-/HER2- subtype than in those with other subtypes. Patients with IMPC who received localised therapy had a clear survival benefit compared with those who did not. The primary site also affected prognosis: patients whose primary tumour was in the lower outer quadrant of the breast had a worse prognosis than those whose primary tumour was in another quadrant.

Fig. 2

Kaplan-Meier estimate of overall survival by subgroup analysis: (A) age, (B) race, (C) marital status, (D) year of diagnosis, (E) primary tumor site, (F) histologic type, (G) stage, (H) surgery status, (I) radiation status, and (J) chemotherapy status

Fig. 3

Kaplan-Meier estimate of disease-specific survival by subgroup analysis: (A) age, (B) race, (C) marital status, (D) year of diagnosis, (E) primary tumor site, (F) histologic type, (G) stage, (H) surgery status, (I) radiation status, and (J) chemotherapy status

Race was not associated with prognosis. Regarding treatment, local surgery, chemotherapy, and radiotherapy were associated with better OS and DSS; the univariate Cox regression analysis for each variable is shown in Supplement 2. The results of the multivariate Cox regression analysis are presented in Table 2. Independent predictors included age, diagnosis year, surgery status, chemotherapy, and radiation. Patients aged over 65 years at diagnosis had a higher risk of death than those under 65 ([OS] HR: 4.033, 95% CI: 2.762–5.888, P < 0.001; [DSS] HR: 2.116, 95% CI: 1.243–3.603, P < 0.05). Mortality was significantly higher in patients diagnosed with stage III or IV disease than in patients diagnosed at an early stage ([OS] HR: 3.931, 95% CI: 1.953–7.913, P < 0.001; [DSS] HR: 51.387, 95% CI: 17.045–154.918, P < 0.001).

Table 2 Multivariate Cox proportional hazard model of disease-specific survival and overall survival in all patients

Local therapy is an important treatment for patients with IMPC and translates into a significant survival benefit. Whether breast-conserving surgery or total mastectomy was performed, surgical treatment substantially reduced the risk of disease-specific and all-cause mortality: breast-conserving surgery ([OS] HR: 0.261, 95% CI: 0.146–0.465, P < 0.001; [DSS] HR: 0.219, 95% CI: 0.086–0.558, P = 0.001); mastectomy ([OS] HR: 0.238, 95% CI: 0.137–0.414, P < 0.001; [DSS] HR: 0.300, 95% CI: 0.126–0.714, P = 0.007). Similarly, radiation therapy decreased the risk of both disease-specific and all-cause mortality ([OS] HR: 0.589, 95% CI: 0.404–0.859, P = 0.006; [DSS] HR: 0.566, 95% CI: 0.323–0.991, P = 0.046). However, OS and DSS did not differ significantly between patients who received chemotherapy and those who did not ([OS] HR: 0.804, 95% CI: 0.530–1.220, P = 0.305; [DSS] HR: 0.989, 95% CI: 0.550–1.780, P = 0.972). Additionally, although there was a difference in OS between married and unmarried participants, DSS was not significantly different ([OS] HR: 1.849, 95% CI: 1.066–3.208, P = 0.029; [DSS] HR: 1.123, 95% CI: 0.540–2.334, P = 0.757). OS and DSS did not significantly differ between races ([OS] HR: 1.045, 95% CI: 0.607–1.799, P = 0.874; [DSS] HR: 0.462, 95% CI: 0.175–1.220, P = 0.119).
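The hazard ratios and confidence intervals reported above are linked through the Cox coefficient: HR = exp(β) and the 95% CI is exp(β ± 1.96·SE). The short Python sketch below, with hypothetical function names, recovers β and its standard error from a reported HR and CI and then rebuilds the interval, using the OS estimate for age over 65 years (HR 4.033, 95% CI 2.762–5.888) as input. It assumes a Wald-type interval, which the paper does not state explicitly.

```python
import math


def log_hr_and_se(hr, ci_low, ci_high, z=1.96):
    """Recover the Cox coefficient and its standard error from a reported HR and CI.

    Assumes a Wald interval that is symmetric on the log scale.
    """
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def ci_from_beta(beta, se, z=1.96):
    """Rebuild the confidence interval for the hazard ratio from beta and SE."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


# Reported OS result for patients aged over 65 years
beta, se = log_hr_and_se(4.033, 2.762, 5.888)
low, high = ci_from_beta(beta, se)
```

The rebuilt bounds agree with the reported interval to within rounding, which is a quick consistency check that can be applied to any row of Table 2.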

Machine learning-based 5-year survival prediction in patients with IMPC

A total of 1123 patients diagnosed with invasive micropapillary carcinoma (IMPC) were included in this study. After excluding censored data and focusing on the 5-year survival status, 456 patients were alive and 119 were deceased at 5 years.
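One way to derive such a binary 5-year outcome from follow-up data is sketched below. The function name, the 60-month horizon expressed in months, and the handling of boundary cases are assumptions for illustration; the paper does not describe its exact labelling rule.

```python
def five_year_label(months, died, horizon=60):
    """Label a patient for 5-year survival classification.

    Returns 1 if the patient died within the horizon, 0 if known to be alive
    at the horizon, and None if censored earlier (excluded from modelling).
    """
    if died and months <= horizon:
        return 1
    if months >= horizon:
        return 0  # followed for at least 5 years, or died after 5 years
    return None   # alive at last contact but followed for less than 5 years


# Hypothetical (follow-up months, death observed) records
toy_records = [(12, True), (75, True), (60, False), (30, False), (90, False)]
toy_labels = [five_year_label(m, d) for m, d in toy_records]
kept = [y for y in toy_labels if y is not None]
```

Dropping patients censored before 5 years explains why the modelling cohort (456 + 119 = 575) is smaller than the 1123 patients in the survival analysis.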

The TNM staging model based on the Cox proportional hazards regression model had an AUC of 0.614 (Supplement-Fig. 1).

Five machine learning models, each including 10 variables (year, age, HER2, primary site, race, chemotherapy, ER, PR, surgery, and radiation), were trained on a dataset of 1123 individuals to forecast 5-year survival following an IMPC diagnosis. Table 3 lists the performance of these five algorithms, Supplement 1 displays the feature selection results, and Supplement 3 displays the confusion matrix results. For the test dataset, the sensitivities for the SVM, KNN, Random Forest, Extra Trees, and XGBoost models were 0.863, 0.829, 0.863, 0.846, and 0.863, respectively. The AUCs for the SVM, KNN, Random Forest, Extra Trees, and XGBoost models were 0.989, 0.968, 0.947, 0.936, and 0.979, respectively. Figure 4 displays the receiver operating characteristic curves and decision curve analyses for the five models.
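The AUC values quoted above can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The following minimal Python sketch computes the AUC directly from that definition; the function name and toy scores are hypothetical, and in practice a library routine would normally be used.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = died within 5 years) and predicted risks, hypothetical
toy_auc_labels = [1, 1, 1, 0, 0, 0, 0]
toy_auc_scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.4]
toy_auc = roc_auc(toy_auc_labels, toy_auc_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the 0.614 obtained with TNM staging alone.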

Table 3 Model performance for the 5-year survival
Fig. 4

Decision curve analyses (DCA) and Receiver operating characteristic curves (ROC) for each model in the testing dataset. (A, F) SVM-based model; (B, G) KNN-based model; (C, H) RandomForest-based model; (D, I) ExtraTrees-based model; (E, J) XGBoost-based model
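For orientation, a decision curve plots the net benefit of acting on a model's predictions across a range of risk thresholds. A minimal Python sketch of the standard net-benefit formula is given below; the function name and toy values are hypothetical, and it is an assumption that the figure was produced with this formulation.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of intervening on patients whose predicted risk >= threshold.

    net benefit = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1 - threshold))


# Toy example (hypothetical): model predictions versus a "treat everyone" strategy
dca_labels = [1, 1, 0, 0, 0]
dca_probs = [0.9, 0.6, 0.7, 0.2, 0.1]
nb_model = net_benefit(dca_labels, dca_probs, threshold=0.5)
nb_treat_all = net_benefit(dca_labels, [1.0] * len(dca_labels), threshold=0.5)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the "treat everyone" and "treat no one" (net benefit 0) strategies.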

Because of the design of our study, we mainly concentrated on assessing sensitivity for high-risk patients who died within 5 years. The XGBoost model outperformed the other four models in terms of accuracy, precision, sensitivity, and negative predictive value. Because it also showed a high AUC, the XGBoost algorithm turned out to be the most appropriate model for this investigation.

Discussion

In 2022, breast cancer surpassed lung cancer as the most common cancer, with the highest incidence rate worldwide [1]. Micropapillary carcinoma is a rare pathological subtype of breast cancer, and owing to the lack of specialised guidelines its treatment is primarily based on that of invasive ductal carcinoma. Currently, the treatment of patients with breast cancer is mainly based on a combination of surgery, chemotherapy, endocrine therapy, targeted therapy, immunotherapy, and other therapeutic modalities, which has substantially improved the disease control rate and quality of life of patients compared with surgical or medical treatment alone [22, 23]. Patients with IMPC have a high rate of clinical lymph node metastasis, which was reported to be up to 84.45% or more in one study [8]. This is related to the tumour’s marked lymphovascular invasiveness: the underlying biology of the micropapillary histological pattern favours spread along the lymphatic route. Additionally, IMPC is classified as simple (pure) versus mixed, with simple IMPC exhibiting more aggressive behaviour, lower locoregional recurrence-free survival, and more locoregional recurrences than mixed IMPC [24]. Verras et al. [25] have argued that breast surgeons should be aware that IMPC may necessitate broader margins of excision, even in the absence of specific recommendations.

Radiation therapy, being younger than 65, and having an ER positive status have all been demonstrated to be protective variables. Larger tumors, younger age, Black racial background, and absence of hormone receptor expression were also substantially linked to regional lymph node involvement. Although surgery can effectively slow the growth of IMPC, there are no predictive factors for patients who have had surgery. As a result, machine learning has helped personalized medicine gain traction and has been suggested as a way to enhance illness prediction. In this study, we analysed 1123 patients with IMPC breast cancer in the SEER database to identify age, marital status, stage, ER status, surgery, radiotherapy, and chemotherapy as factors affecting prognosis. Patients with IMPC who were 65 years of age or older, single, had a stage III-IV tumor, and had a negative ER status were all associated with poor postoperative prognoses. While there was no difference in overall survival (OS) between mastectomy and breast-conserving surgery, the prognosis of patients was much improved by surgery, radiation, and chemotherapy.

Developing a new technology that reliably and accurately judges tumour prognosis is of great significance for the early diagnosis of patients with breast micropapillary carcinoma. Currently, the diagnosis of patients with breast micropapillary carcinoma is mainly based on clinical and pathological characteristics; however, the currently available information is not sufficient for clinicians to treat the disease effectively. Meng et al. [26] and Zhang [27] evaluated the survival of patients with IMPC using nomogram analysis; however, issues such as selection bias, sample bias, and missing molecular-level data limited the accuracy of this method. In addition, therapies with a significant impact on patient prognosis, including radiation, chemotherapy, and surgery, were left out of these models because the data did not support their inclusion. Such omissions have fuelled the controversy over the results.

In medical practice, machine learning has gained increasing research attention in areas such as disease diagnosis, prognosis, and treatment plan formulation [28,29,30]. This approach makes maximal use of the data by adaptively adjusting the weight of each factor. Machine learning methods were previously used to predict the recurrence of invasive breast cancer at 5 and 10 years. Compared with the traditional TNM staging model, our machine learning models automatically adjust the weight assigned to different factors to create predictive models that make the most of the available data without sacrificing accuracy. Unlike traditional statistical methods (for example, the Cox proportional hazards regression model), which may discard factors that do not meet strict significance criteria, machine learning algorithms can incorporate a broader range of variables, allowing for a more comprehensive analysis. Our machine learning algorithms predicted 5-year survival outcomes more accurately than traditional statistical methods, which is a key strength of our approach. For studying the association between covariates and endpoint events, however, Cox proportional hazards modelling remains more suitable. Machine learning algorithms also offer an advantage in speed: a trained model returns predictions within milliseconds, enabling real-time use. The 5-year survival rate was used as the predictive endpoint in this study because it is a critical time point for the early prognostic assessment of patients with micropapillary breast cancer. Compared with other methods [31,32,33,34], XGBoost has better predictive performance and broad application prospects. The XGBoost model constructed in this study had a high AUC, indicating that good prediction results can be obtained even under small-sample conditions. 
When establishing the XGBoost model, factors such as year of diagnosis, age, histological type, and site of origin were found to have the greatest impact on the 5-year survival rate. Age and histological type are currently recognised as important prognostic factors, which is consistent with other research results [35]. Patients with micropapillary carcinoma of the breast who are older at disease onset tend to have more underlying diseases, poorer drug tolerance, or suboptimal health, which may have a direct negative impact on survival time.

The sample size in our study was sufficient to assess prognostic variables over the previous 20 years. Consequently, new machine learning algorithm-based models were developed to forecast the 5-year survival rate. Among them, the XGBoost method showed the best performance regarding AUC, precision, accuracy, sensitivity, and NPV. This work is expected to provide an efficient method for the early prognostic assessment of tumour-related diseases, laying the foundation for clinical physicians to develop personalised treatment plans and management strategies. This model may therefore help identify patients with a higher risk of adverse outcomes who require more aggressive treatment.

This was a retrospective study with certain limitations. First, the models were not validated with data from an external institution. IMPC is a very rare type of breast cancer (accounting for 3–6% of cases); we are currently contacting other units to collect additional cases and expand our sample, which we hope to use for validation in the future. Second, by 2010, HER2 status had not been determined for about 31.2% of patients with breast cancer, which made the clinical prognostic effect of HER2 difficult to assess. In addition, no data on simple versus mixed pathological subtyping of IMPC were available in the SEER database, which may also affect the results. Although M stage was significant in the univariate analysis, we did not use it as a model parameter because of the small number of patients with distant metastasis and the resulting possibility of errors in the conclusions. In conclusion, this model has good application value for the post-operative prognosis of patients with IMPC.

Conclusions

In this study, we developed a machine learning-based model to predict the 5-year overall survival (OS) of patients with invasive micropapillary carcinoma (IMPC) using data from the SEER database. Among the various models evaluated, the XGBoost model outperformed others, achieving a precision of 0.818 and an area under the curve (AUC) of 0.863. These findings indicate that the XGBoost model is a robust tool for forecasting the prognosis of IMPC patients. By incorporating an extensive set of clinical and demographic variables, our model offers critical insights that can support clinicians in making informed decisions about patient management and treatment strategies. The implementation of such predictive models has the potential to enhance clinical outcomes by facilitating personalized and timely interventions. Future research should focus on validating this model across diverse populations and integrating it into clinical practice to further improve its applicability and impact.

Data availability

Our primary data were extracted from the publicly available SEER database. The URL of the database is https://seer.cancer.gov/. After agreeing to a data use agreement for the SEER 1998–2019 research data file, we were granted authorisation to extract and use the data.

References

  1. Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2022. CA Cancer J Clin. 2022;72:7–33.

  2. Ye F, Yu P, Li N, Yang A, Xie X, Tang H, et al. Prognosis of invasive micropapillary carcinoma compared with invasive ductal carcinoma in breast: a meta-analysis of PSM studies. Breast. 2020;51:11–20.

  3. Luna-Moré S, de los Santos F, Bretón JJ, Cañadas MA. Estrogen and progesterone receptors, c-erbB-2, p53, and bcl-2 in thirty-three invasive micropapillary breast carcinomas. Pathol Res Pract. 1996;192:27–32.

  4. Bayramoglu H, Zekioglu O, Erhan Y, Çiriş M, Özdemir N. Fine-needle aspiration biopsy of invasive micropapillary carcinoma of the breast: a report of five cases. Diagn Cytopathol. 2002;27:214–7.

  5. Öngürü Ö, Deveci S, Günhan Ö. Cytological findings of invasive micropapillary carcinoma of the breast: a report of two cases. Cytopathology. 2002;13:160–3.

  6. Kim M-J, Gong G, Joo HJ, Ahn S-H, Ro JY. Immunohistochemical and Clinicopathologic Characteristics of Invasive Ductal Carcinoma of breast with Micropapillary Carcinoma Component. Arch Pathol Lab Med. 2005;129.

  7. Ota D, Toyama T, Ichihara S, Mizutani M, Kamei K, Iwata H. A case of Invasive Micropapillary Carcinoma of the breast. Breast Cancer. 2007;14:323–6.

  8. Chen L, Lang YF, Guo R, Sun X, Cui Y. Breast Carcinoma with Micropapillary features: Clinicopathologic Study and Long-Term Follow-Up of 100 cases. Int J Surg Pathol. 2008;16:155–63.

  9. Nassar H, Wallis T, Andea A, Dey J, Adsay V, Visscher D. Clinicopathologic analysis of Invasive Micropapillary differentiation in breast carcinoma. Mod Pathol. 2001;14:836–41.

  10. Vieira TC, Oliveira EA, Santos BJD, Souza FR, Veloso ES, Nunes CB, et al. COX-2 expression in mammary invasive micropapillary carcinoma is associated with prognostic factors and acts as a potential therapeutic target in comparative oncology. Front Vet Sci. 2022;9:983110.

  11. Shi Q, Shao K, Jia H, Cao B, Li W, Dong S, et al. Genomic alterations and evolution of cell clusters in metastatic invasive micropapillary carcinoma of the breast. Nat Commun. 2022;13:111.

  12. Xu J, Ma H, Wang Q, Zhang H. Expression of autocrine motility factor receptor (AMFR) in human breast and lung invasive micropapillary carcinomas. Int J Exp Pathol. 2023;104:43–51.

  13. Li Y, Liu J, Xu Z, Shang J, Wu S, Zhang M, et al. Construction and validation of a nomogram for predicting the prognosis of patients with lymph node-positive invasive micropapillary carcinoma of the breast: based on SEER database and external validation cohort. Front Oncol. 2023;13:1231302.

  14. Siddiqui MK, Huang X, Morales-Menendez R, Hussain N, Khatoon K. Machine learning based novel cost-sensitive seizure detection classifier for imbalanced EEG data sets. Int J Interact Des Manuf IJIDeM. 2020;14:1491–509.

  15. Siddiqui MK, Islam MZ, Kabir MA. A novel quick seizure detection and localization through brain data mining on ECoG dataset. Neural Comput Appl. 2019;31:5595–608.

  16. Hu G, Hu G, Zhang C, Lin X, Shan M, Yu Y, et al. Adjuvant chemotherapy could not bring survival benefit to HR-positive, HER2-negative, pT1b-c/N0–1/M0 invasive lobular carcinoma of the breast: a propensity score matching study based on SEER database. BMC Cancer. 2020;20:136.

  17. Yoon TI, Jeong J, Lee S, Ryu JM, Lee YJ, Lee JY, et al. Survival outcomes in premenopausal patients with invasive lobular carcinoma. JAMA Netw Open. 2023;6:e2342270.

  18. Gradishar WJ, Moran MS, Abraham J, Abramson V, Aft R, Agnese D, et al. Breast Cancer, Version 3.2024, NCCN Clinical Practice guidelines in Oncology. J Natl Compr Canc Netw. 2024;22:331–57.

  19. Slamon DJ, Leyland-Jones B, Shak S, Fuchs H, Paton V, Bajamonde A, et al. Use of Chemotherapy plus a monoclonal antibody against HER2 for metastatic breast Cancer that overexpresses HER2. N Engl J Med. 2001;344:783–92.

  20. Tang F, Ishwaran H. Random forest missing data algorithms. Stat Anal Data Min ASA Data Sci J. 2017;10:363–77.

  21. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. Ann Intern Med. 2015;162:55–63.

  22. Mutebi M, Anderson BO, Duggan C, Adebamowo C, Agarwal G, Ali Z, et al. Breast cancer treatment: a phased approach to implementation. Cancer. 2020;126:2365–78.

  23. Takahashi M, Cortés J, Dent R, Pusztai L, McArthur H, Kümmel S, et al. Pembrolizumab Plus Chemotherapy followed by Pembrolizumab in patients with early triple-negative breast Cancer: a secondary analysis of a Randomized Clinical Trial. JAMA Netw Open. 2023;6:e2342107.

  24. Eren Kupik G, Altundağ K. The clinicopathological characteristics of pure and mixed invasive Micropapillary breast carcinomas: a single Center experience. Balk Med J. 2022;39:275–81.

  25. Verras G-I, Mulita F, Tchabashvili L, Grypari I-M, Sourouni S, Panagodimou E, et al. A rare case of invasive micropapillary carcinoma of the breast. Menopausal Rev. 2022;21:73–80.

  26. Meng X, Ma H, Yin H, Yin H, Yu L, Liu L, et al. Nomogram Predicting the risk of Locoregional Recurrence after Mastectomy for Invasive Micropapillary Carcinoma of the breast. Clin Breast Cancer. 2021;21:e368–76.

  27. Zhang T. Nomograms for predicting overall survival and cancer-specific survival in patients with invasive micropapillary carcinoma: Based on the SEER database.

  28. Demircioğlu A. The effect of preprocessing filters on predictive performance in radiomics. Eur Radiol Exp. 2022;6:40.

  29. Huang W, Shang Q, Xiao X, Zhang H, Gu Y, Yang L, et al. Raman spectroscopy and machine learning for the classification of esophageal squamous carcinoma. Spectrochim Acta Mol Biomol Spectrosc. 2022;281:121654.

  30. Delen D, Walker G, Kadam A. Predicting breast cancer survivability: a comparison of three data mining methods. Artif Intell Med. 2005;34:113–27.

  31. Huang Y, Chen W, Zhang X, He S, Shao N, Shi H, et al. Prediction of Tumor Shrinkage Pattern to Neoadjuvant Chemotherapy using a multiparametric MRI-Based machine learning model in patients with breast Cancer. Front Bioeng Biotechnol. 2021;9:662749.

  32. Kurrant D, Omer M, Abdollahi N, Mojabi P, Fear E, LoVetri J. Evaluating performance of Microwave Image Reconstruction algorithms: extracting tissue types with segmentation using machine learning. J Imaging. 2021;7:5.

  33. Rakshit P, Zaballa O, Pérez A, Gómez-Inhiesto E, Acaiturri-Ayesta MT, Lozano JA. A machine learning approach to predict healthcare cost of breast cancer patients. Sci Rep. 2021;11:12441.

  34. Gutiérrez-Cárdenas J, Wang Z. Classification of breast Cancer and breast neoplasm scenarios based on machine learning and sequence features from lncRNAs–miRNAs-Diseases associations. Interdiscip Sci Comput Life Sci. 2021;13:572–81.

  35. Sun Y, Gu W, Wang G, Zhou X. The clinicopathological and prognostic characteristics of mucinous micropapillary carcinoma of the breast. Preprint. In Review; 2021.

Acknowledgements

The authors thank the reviewers for their helpful comments on the manuscript. We also thank the OnekeyAI platform and its developers.

Funding

This study was funded by the Natural Science Foundation of Ningde, China (No. 2023J14) and the High-level Talent Introduction Project of Fujian Cancer Hospital (No. F2328RGC301-01).

Author information

Contributions

CGS conceived and designed the study. ZRJ and YSY performed the experiments. XY, QW, and KYH analyzed the data. ZRJ wrote the manuscript. ZRJ and MYH revised the manuscript. All authors have read and approved this manuscript.

Corresponding author

Correspondence to Chuangui Song.

Ethics declarations

Ethics approval and consent to participate

Our primary data were extracted from the publicly available SEER database (https://seer.cancer.gov/). After agreeing to a data use agreement for the SEER 1998–2019 research data file, we were granted authorisation to extract and use the data. As a result, informed consent and human subject research ethics evaluations were not required for this study. We verified that all statistical analyses were carried out in compliance with the guidelines of the SEER Program and that the data of enrolled patients were either anonymous or de-identified.

Consent for publication

Not Applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Jiang, Z., Yu, Y., Yu, X. et al. Predictive model of prognosis index for invasive micropapillary carcinoma of the breast based on machine learning: a SEER population-based study. BMC Med Inform Decis Mak 24, 268 (2024). https://doi.org/10.1186/s12911-024-02669-y

Keywords