A model building exercise of mortality risk for Taiwanese women with breast cancer
© Chang and Kuo; licensee BioMed Central Ltd. 2010
Received: 14 July 2009
Accepted: 19 August 2010
Published: 19 August 2010
The accurate estimation of outcome in patients with malignant disease is an essential component of optimal treatment, decision-making, and patient counseling. The prognosis and disease outcome of breast cancer patients can differ according to geographic and ethnic factors. To our knowledge, these factors have never been validated in a homogeneous loco-regional patient population with the aim of achieving accurate predictions of outcome for individual patients. To address this gap, we created a new comprehensive prognostic and predictive model for Taiwanese breast cancer patients based on a range of patient-related, clinical, and pathological variables.
Demographic, clinical, and pathological data were analyzed from 1 137 patients with breast cancer who underwent surgical intervention. A survival prediction model was used to allow analysis of the optimal combination of variables.
The area under the receiver operating characteristic (ROC) curve, as applied to an independent validation data set, was used as the measure of accuracy, and results were compared on that basis.
Our model building exercise of mortality risk was able to predict disease outcome for individual patients with breast cancer. This model could represent a highly accurate prognostic tool for Taiwanese breast cancer patients.
Breast cancer is a serious threat to women's health. In Taiwan, breast cancer ranked fourth among the top 10 causes of death among women in the period from 1995 to 2003. The investigative results published by the Bureau of Health Promotion, Department of Health, Taiwan, indicate that the incidence and mortality of breast cancer increase almost every year. The incidence rate and the age-adjusted incidence rate have both increased almost two-fold when compared with those calculated for the period from 1995 to 2003. The corresponding mortality also increased: the mortality rate increased from 8.9 to 12.45 per 10000 people and the age-adjusted death rate increased from 8.79 to 11.07 per 10000 people. Improved surgical procedures and chemotherapy regimens seem not to have effectively diminished breast cancer incidence and mortality [3, 4]. It is therefore important to identify, and ultimately to control, the risk factors that significantly affect survival among women with breast cancer.
Most countries in Asia have produced few publications on cancer recurrence risk analyses among breast cancer patients, whereas many such studies have been published in Western countries [5–8]. Among them, meta-analyses are widely used to discuss causal relationships between risk factors and breast cancer survival [9, 10]. Meta-analyses are secondary analyses that derive results from data reported in different studies addressing similar research topics. A different combination of methods can lead to different meta-analytical outcomes. Furthermore, it is extremely difficult to predict the disease outcome of cancer patients. To address this problem, we used a logistic regression approach to investigate simultaneously the relationships between all statistically significant risk factors, including demographic, clinical, and pathological data, and the survival status of breast cancer patients.
The original data were collected from 1 190 patients with breast cancer diagnosed between January 1, 1995 and August 31, 2005 at the National Cheng Kung University Hospital, Tainan, Taiwan. As our objective was to study the prognostic factors of breast cancer and to develop more precise predictive mortality risk models, both patients with stage IV disease and patients who were followed up for less than one year were excluded from our analyses. Among the remaining 1 137 patients, 70 died and the other 1 067 were censored. The median age of the patients was 49 years (range, 20-88 years). Ethical approval was provided by the Human Experiment and Ethics Committee of the National Cheng Kung University Hospital (ER-99-076).
A variety of potential breast cancer risk factors were recorded for each patient. The demographic data included marriage status, education level, familial history of breast cancer, presence of other underlying diseases, and menopause status. The clinical data included physical examination (PE), ultrasound (US), fine-needle aspiration cytology (FNAC), core needle biopsy (CNB), mammography, type of breast surgery, and type of axillary lymphatic surgery. Finally, the pathological findings included tumor size, nodal status, tumor grade, estrogen receptor (ER) status, progesterone receptor (PR) status, Her-2/neu status, extensive intraductal carcinoma (EIC), presence of lymphatic tumor emboli (LTE), and hepatitis B and C status, assessed by hepatitis B surface antigen (HBsAg) and hepatitis C virus antibody (HCV Ab). The clinical and pathological data were classified into four categories: benign (B), intermediate (I), suspicious (S), and malignant (M). The different treatment modalities, including anti-hormone therapy, radiotherapy, and chemotherapy, were also included in our analysis.
The overall survival function for breast cancer patients was calculated using the Kaplan-Meier method, and the log-rank test was used to test the significance of differences between stage groups. To investigate the association between survival status and each potential risk factor, odds ratios were computed and p values were evaluated using univariate logistic regression, where applicable. Odds ratios were used to evaluate the relative odds of death caused by breast cancer between two groups defined by a risk factor, and p values were calculated to assess the significance of the results. A multivariable logistic regression analysis was used to measure the significance of several risk factors simultaneously and to predict the survival probability of breast cancer patients. To determine the accuracy of our model, the bootstrap method was used, which is implemented by drawing a number of re-samples from the observed dataset. The predictive model, which was built using forward stepwise analyses, included only the risk factors that showed significance in the univariate analyses. Statistical significance was set at p < 0.05.
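The odds ratio and bootstrap steps described above can be illustrated with a minimal Python sketch. It is not the authors' analysis: the counts are invented toy data, the function names (`odds_ratio`, `wald_ci`, `bootstrap_ci`) are hypothetical, and the bootstrap here resamples a single 2 x 2 comparison rather than the full multivariable model.

```python
import math
import random


def odds_ratio(exposed_dead, exposed_alive, unexposed_dead, unexposed_alive):
    """Odds of death in the exposed group divided by the odds in the unexposed group."""
    return (exposed_dead * unexposed_alive) / (exposed_alive * unexposed_dead)


def wald_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval based on the standard error of log(OR)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def bootstrap_ci(records, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the OR; records are (exposed, died) pairs of 0/1."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Draw a re-sample of the same size as the observed data, with replacement.
        sample = rng.choices(records, k=len(records))
        counts = {(e, d): 0 for e in (0, 1) for d in (0, 1)}
        for exposed, died in sample:
            counts[(exposed, died)] += 1
        if 0 in counts.values():
            continue  # the odds ratio is undefined for this re-sample
        estimates.append(
            odds_ratio(counts[(1, 1)], counts[(1, 0)], counts[(0, 1)], counts[(0, 0)])
        )
    estimates.sort()
    lower = estimates[int(alpha / 2 * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


# Toy data (assumed, not from the study): 20/100 deaths with the risk factor, 10/100 without.
records = [(1, 1)] * 20 + [(1, 0)] * 80 + [(0, 1)] * 10 + [(0, 0)] * 90
print("OR:", odds_ratio(20, 80, 10, 90))
print("Wald 95% CI:", wald_ci(20, 80, 10, 90))
print("Bootstrap 95% CI:", bootstrap_ci(records))
```

For a single binary risk factor, the odds ratio from the 2 x 2 table equals the exponentiated coefficient of a univariate logistic regression, which is why the table form is used in this sketch.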
Three methods were used to evaluate the fit of the multivariable logistic regression model. First, ROC curves (using FORTRAN programs) were plotted to estimate the sensitivity and specificity of the predictive model. The closer the area under the ROC curve is to 1, the better the fit of the model. Second, the Hosmer-Lemeshow test was used to examine the fit of the predictive model by considering the difference between the predicted and observed probabilities of death caused by breast cancer. Patients were divided into g groups according to their ordered predicted probability of death, and the statistic being tested was written as $\hat{C} = \sum_{k=1}^{g} \frac{n_k (p_k - \hat{p}_k)^2}{\hat{p}_k (1 - \hat{p}_k)}$, where $n_k$ is the number of patients in the kth group, and $\hat{p}_k$ and $p_k$ are the predicted and observed probabilities of death, respectively, in the kth group. The statistic $\hat{C}$ is well approximated by the chi-square distribution with g-2 degrees of freedom, $\chi^2_{g-2}$. The larger the p value obtained using the Hosmer-Lemeshow test (which corresponds to a smaller $\hat{C}$), the smaller the squared distance between $\hat{p}_k$ and $p_k$ and, hence, the better the fit of the model [15, 16]. The comparison was performed based on the confidence interval of both models using the SPSS software, version 11.
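Both goodness-of-fit measures can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the software used in the study: the area under the curve is computed with the empirical rank (Mann-Whitney) formulation rather than the maximum likelihood ROC programs cited above, the function names `auc` and `hosmer_lemeshow` are hypothetical, and the data are invented.

```python
def auc(scores, labels):
    """Empirical area under the ROC curve.

    Equals the probability that a randomly chosen death (label 1) receives a
    higher predicted risk than a randomly chosen survivor (label 0); ties count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(probs, labels, groups=10):
    """Return (C_hat, degrees of freedom) for g groups of ordered predicted risk.

    C_hat = sum over groups of n_k * (observed_k - predicted_k)^2
            / (predicted_k * (1 - predicted_k)).
    Assumes every group is non-empty and has a mean predicted risk strictly
    between 0 and 1.
    """
    ordered = sorted(zip(probs, labels))
    n = len(ordered)
    c_hat = 0.0
    for k in range(groups):
        chunk = ordered[k * n // groups:(k + 1) * n // groups]
        n_k = len(chunk)
        predicted = sum(p for p, _ in chunk) / n_k  # mean predicted probability of death
        observed = sum(y for _, y in chunk) / n_k   # observed proportion of deaths
        c_hat += n_k * (observed - predicted) ** 2 / (predicted * (1 - predicted))
    return c_hat, groups - 2


# Toy data (assumed): four risk strata of ten patients each.
probs = [0.1] * 10 + [0.3] * 10 + [0.6] * 10 + [0.9] * 10
labels = (
    [1] * 2 + [0] * 8    # 2 deaths where 1 was expected
    + [1] * 3 + [0] * 7
    + [1] * 6 + [0] * 4
    + [1] * 9 + [0] * 1
)
c_hat, df = hosmer_lemeshow(probs, labels, groups=4)
print("AUC:", auc(probs, labels))
print("Hosmer-Lemeshow C_hat:", c_hat, "df:", df)
```

The p value would then be read from a chi-square distribution with the returned degrees of freedom, for example with a statistics library; that step is omitted to keep the sketch dependency-free.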
Associations of breast cancer mortality with demographic, clinical, and pathological factors
Description of the population by univariate logistic regression test using the demographic data
Married (n = 1071); Unmarried (n = 63)
Below junior high school (n = 685); Above senior high school (n = 446)
Premenopause (n = 397); Others (n = 732)
No (n = 798); Yes (n = 104)
Familial breast cancer history
No (n = 997); Yes (n = 139)
No (n = 577); Yes (n = 560)
36-60 (a) (n = 856); 61-85 (b) (n = 206); 20-35 (c) (n = 75); 0.378 (a vs. c); 0.745 (b vs. c)
Description of the population by univariate logistic regression test using the clinical data
N, B, I (a) (n = 484); S (b) (n = 216); M (c) (n = 267); 0.304 (a vs. c); 0.616 (b vs. c)
N, B, I, S (n = 741); M (n = 251)
B, I, S (n = 323); M (n = 591)
BR1-4 (n = 446); BR5 (n = 512)
Core biopsy (n = 661); Others (n = 476)
BCS (n = 341); TM (n = 795)
SLNB (n = 341); ALND (n = 775)
Description of the population by univariate logistic regression test using the pathological data
Others (a) (n = 138); Invasive ductal carcinoma (b) (n = 966); Invasive lobular carcinoma (c) (n = 33); < 0.001 (a vs. c); 0.532 (b vs. c)
III (n = 234); II (n = 431); I (n = 294)
(-, -), (-, +), (+, -) (n = 583); (+, +) (n = 545)
+++ (n = 181); -, +, ++ (n = 668)
Absent (n = 590); Present (n = 301)
Present (n = 501); Absent (n = 468)
+ (n = 124); - (n = 598)
+ (n = 50); - (n = 658)
Yes (n = 581); No (n = 552)
No (n = 228); Yes (n = 909)
T4 (n = 57); T2 or T3 (n = 595); Tis or T1 (n = 485)
N3 (n = 81); N1 or N2 (n = 377); N0 (n = 677)
No (a) (n = 327); Yes (b) (n = 745); Abandonment or Refusal (c) (n = 65); 0.061 (a vs. c); 0.391 (b vs. c)
Multivariable logistic regression
Comparison of risk factors calculated using the univariate logistic analysis, multivariable logistic analysis and Bootstrap for variables.
Univariate logistic analysis
Multivariate logistic analysis
Bootstrap for Variables in the Equation
2.646 (c vs. a)
3.937 (c vs. a)
1.342 (c vs. b)
1.548 (c vs. b)
N, B, I, S
N1 or N2
(-, -), (-, +), (+, -)
2.558 (c vs. a)
16.393 (c vs. b)
Abandonment or Refusal (c)
Goodness of fit
In recent years, several improvements in treatment modalities for breast cancer have been observed [4, 17–19]; however, the overall prognostic and predictive values for breast cancer patients remain ambiguous [5, 6]. It is important to improve the efficiency of predicting the survival of breast cancer patients; a model building exercise can be extended to include any number of prognostic or risk factors, while also providing treatment predictions. The TNM classification system has long been the accepted predictive tool for breast cancer and provides useful information for the clinical decision-making process in these patients; however, this system is based solely on disease-related parameters and does not include other variables, such as diagnostic methods, that may influence patient outcome. Furthermore, a comprehensive predictive model should also take into account treatment modalities, including chemotherapy, hormone therapy, or targeted therapy, which are currently either in use or under study [21–23].
The impact of race on the survival of breast cancer patients has been reported [24–26]. To our knowledge, the current study is the first to demonstrate systematically the influence of prognostic factors on the survival of patients with breast cancer in Asian and Pacific Islander populations. The design of the model and the selection of variables should, of course, have considerable clinical applicability. The main purpose of our study was to construct a suitable survival prediction formula for Taiwanese women. To create a survival prediction model for breast cancer, we used a comprehensive dataset that included clinico-pathological data, diagnostic modalities, and treatment variables from Taiwanese patients with this disease.
Our model building exercise showed that age, ultrasound diagnostic classification, mammography diagnostic classification, diagnosis by core biopsy, tumor grade, ER/PR status, lymph node status, and chemotherapy are the most important predictive factors for breast cancer in Taiwanese women. Combining these risk factors in a multivariable logistic regression analysis led to the development of a predictive formula for breast cancer survival. Our data also draw attention to the importance and influence of diagnostic modalities on the breast cancer survival rate. In our model building exercise, the use of ultrasound, mammography, and core biopsy technologies had a high impact on disease outcome.
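A predictive formula obtained from a multivariable logistic regression has the generic form shown below. The symbols are placeholders introduced here for illustration: $x_1, \dots, x_m$ stand for the coded risk factors, $\beta_0, \dots, \beta_m$ for the fitted coefficients, and $\hat{p}$ for the predicted probability of death. The fitted values from this study are not reproduced.

```latex
\hat{p} \;=\; \Pr(\text{death} \mid x_1, \dots, x_m)
        \;=\; \frac{1}{1 + \exp\!\Bigl[-\bigl(\beta_0 + \sum_{j=1}^{m} \beta_j x_j\bigr)\Bigr]},
\qquad
\mathrm{OR}_j \;=\; e^{\beta_j}
```

Under this form, each coefficient corresponds to an adjusted odds ratio $e^{\beta_j}$ for its risk factor, holding the other factors fixed.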
The prognostic power in a disease context can be improved by applying a predictive model, even when using TNM data or other predictive factors [7, 22, 27]. Burke et al. demonstrated that the predictive accuracy for breast and colon carcinoma could be improved by using an ANN-based model using TNM information exclusively. Similarly, in the current study we created an additional model for the prediction of survival in patients with breast cancer using data that were more complete than TNM staging information.
The high predictive accuracy of the current model may stem from several factors. First, in other models, investigators often relied strongly on input data that were weighted toward tumor histopathological parameters, rather than toward clinical or demographic patient data [6, 17, 19, 21, 27]. This is in contrast to the current model, in which several parameters, including diagnostic and treatment modalities or demographic data, represented the majority of the selected optimal variable datasets. Second, the current study is the first to use prognostic factors as a predictive tool in Asian breast cancer populations.
Caution should nevertheless be employed when generating and interpreting data using our model building exercise. First, the current model was based on data assembled at a single institution; therefore, the validity of this model should be verified before its application to patients from other populations or institutions. The variability in survival rates observed for breast cancer patients from different countries seems to support this argument [25, 26]. A possible method for overcoming this limitation may be the inclusion of patients from other Asian populations in the construction of a new model. Thus, the identification and evaluation of universally applicable variables may require collaborations between different institutions or nations. Nevertheless, the current pilot study serves as a proof-of-principle strategy that underscores the utility of this model building exercise. Second, the data used here were not established from prospective and randomized studies. If other users wish to adopt our model building exercise for the selection of therapeutic methods, then any variables pertaining to focused treatment methods should be compared with standardized protocols. If treatment variables were included, any result would be biased by case-by-case selection criteria for that particular treatment [7, 8]. A web-based prediction engine may facilitate use of the model by clinicians in the future.
We have designed an effective model for predicting outcomes in Taiwanese breast cancer patients by combining demographic, clinical, and pathological data, including multiple tumor-related and patient-related variables. Our model building exercise showed a strong potential to enhance the prediction of patient survival and to identify important variables that have an impact on disease outcomes. Information provided by this model building exercise may improve the selection of appropriate and effective therapy for breast cancer patients.
- Taiwan Cancer Registry. [http://crs.cph.ntu.edu.tw/]
- Bureau of Health Promotion Department of Health, TAIWAN. 2007, [http://www.bhp.doh.gov.tw/BHPnet/Portal/]
- Goldhirsch A, Wood WC, Gelber RD, Coates AS, Thurlimann B, Senn HJ: Progress and promise: highlights of the international expert consensus on the primary therapy of early breast cancer 2007. Ann Oncol. 2007, 18 (7): 1133-1144. 10.1093/annonc/mdm271.
- Loprinzi CL, Thome SD: Understanding the utility of adjuvant systemic therapy for primary breast cancer. J Clin Oncol. 2001, 19 (4): 972-979.
- De Laurentiis M, De Placido S, Bianco AR, Clark GM, Ravdin PM: A prognostic model that makes quantitative estimates of probability of relapse for breast cancer patients. Clin Cancer Res. 1999, 5 (12): 4133-4139.
- Hwa HL, Kuo WH, Chang LY, Wang MY, Tung TH, Chang KJ, Hsieh FJ: Prediction of breast cancer and lymph node metastatic status with tumour markers using logistic regression models. J Eval Clin Pract. 2008, 14 (2): 275-280. 10.1111/j.1365-2753.2007.00849.x.
- Lundin M, Lundin J, Burke HB, Toikkanen S, Pylkkanen L, Joensuu H: Artificial neural networks applied to survival prediction in breast cancer. Oncology. 1999, 57 (4): 281-286. 10.1159/000012061.
- Ravdin PM, Siminoff LA, Davis GJ, Mercer MB, Hewlett J, Gerson N, Parker HL: Computer program to assist in making decisions about adjuvant therapy for women with early breast cancer. J Clin Oncol. 2001, 19 (4): 980-991.
- Shapiro S: Screening: assessment of current studies. Cancer. 1994, 74 (1 Suppl): 231-238. 10.1002/cncr.2820741306.
- Effects of chemotherapy and hormonal therapy for early breast cancer on recurrence and 15-year survival: an overview of the randomised trials. Lancet. 2005, 365 (9472): 1687-1717. 10.1016/S0140-6736(05)66544-0.
- Klein JP, Moeschberger ML: Survival analysis. 2003, Springer, 2
- Agresti A: Categorical data analysis. 2002, New York: Wiley, 2
- Davison AC, Hinkley DV: Bootstrap Methods and their Application. 2006, Cambridge: Cambridge university press, 8
- Metz CE, Herman BA, Shen JH: Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat Med. 1998, 17 (9): 1033-1053. 10.1002/(SICI)1097-0258(19980515)17:9<1033::AID-SIM784>3.0.CO;2-Z.
- Hosmer DW, Lemeshow S: Applied logistic regression. 1989, New York: Wiley
- Zhou XH, Obuchowski NA, McClish DK: Statistical method in diagnostic medicine. 2002, New York: Wiley, 100-36.
- Berg WA, Blume JD, Cormack JB, Mendelson EB, Lehrer D, Bohm-Velez M, Pisano ED, Jong RA, Evans WP, Morton MJ: Combined screening with ultrasound and mammography vs mammography alone in women at elevated risk of breast cancer. JAMA. 2008, 299 (18): 2151-2163. 10.1001/jama.299.18.2151.
- Levine M: Clinical practice guidelines for the care and treatment of breast cancer: adjuvant systemic therapy for node-negative breast cancer (summary of the 2001 update). CMAJ. 2001, 164 (2): 213-
- Tewari M, Krishnamurthy A, Shukla HS: Predictive markers of response to neoadjuvant chemotherapy in breast cancer. Surg Oncol. 2008
- Sobin LH, Wittekind C: TNM classification of malignant tumor. 2002, Hoboken, New Jersey: John Wiley & Sons, 6
- Weir L, Speers C, D'Yachkova Y, Olivotto IA: Prognostic significance of the number of axillary lymph nodes removed in patients with node-negative breast cancer. J Clin Oncol. 2002, 20 (7): 1793-1799. 10.1200/JCO.2002.07.112.
- Whelan T, Sawka C, Levine M, Gafni A, Reyno L, Willan A, Julian J, Dent S, Abu-Zahra H, Chouinard E: Helping patients make informed choices: a randomized trial of a decision aid for adjuvant chemotherapy in lymph node-negative breast cancer. J Natl Cancer Inst. 2003, 95 (8): 581-587. 10.1093/jnci/95.8.581.
- Truong PT, Yong CM, Abnousi F, Lee J, Kader HA, Hayashi A, Olivotto IA: Lymphovascular invasion is associated with reduced locoregional control and survival in women with node-negative breast cancer treated with mastectomy and systemic therapy. J Am Coll Surg. 2005, 200 (6): 912-921. 10.1016/j.jamcollsurg.2005.02.010.
- Ghafoor A, Jemal A, Ward E, Cokkinides V, Smith R, Thun M: Trends in breast cancer by race and ethnicity. CA Cancer J Clin. 2003, 53 (6): 342-355. 10.3322/canjclin.53.6.342.
- Smigal C, Jemal A, Ward E, Cokkinides V, Smith R, Howe HL, Thun M: Trends in breast cancer by race and ethnicity: update 2006. CA Cancer J Clin. 2006, 56 (3): 168-183. 10.3322/canjclin.56.3.168.
- Hausauer AK, Keegan TH, Chang ET, Clarke CA: Recent breast cancer trends among Asian/Pacific Islander, Hispanic, and African-American women in the US: changes by tumor subtype. Breast Cancer Res. 2007, 9 (6): R90. 10.1186/bcr1839.
- Oldenhuis CN, Oosting SF, Gietema JA, de Vries EG: Prognostic versus predictive value of biomarkers in oncology. Eur J Cancer. 2008, 44 (7): 946-953. 10.1016/j.ejca.2008.03.006.
- Burke HB, Goodman PH, Rosen DB, Henson DE, Weinstein JN, Harrell FE, Marks JR, Winchester DP, Bostwick DG: Artificial neural networks improve the accuracy of cancer survival prediction. Cancer. 1997, 79 (4): 857-862. 10.1002/(SICI)1097-0142(19970215)79:4<857::AID-CNCR24>3.0.CO;2-Y.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/10/43/prepub