Measuring decision quality: psychometric evaluation of a new instrument for breast cancer surgery

Background The purpose of this paper is to examine the acceptability, feasibility, reliability and validity of a new decision quality instrument that assesses the extent to which patients are informed and receive treatments that match their goals. Methods Cross-sectional mail survey of recent breast cancer survivors, providers and healthy controls and a retest survey of survivors. The decision quality instrument includes knowledge questions and a set of goals, and results in two scores: a breast cancer surgery knowledge score and a concordance score, which reflects the percentage of patients who received treatments that match their goals. Hypotheses related to acceptability, feasibility, discriminant validity, content validity, predictive validity and retest reliability of the survey instrument were examined. Results We had responses from 440 eligible patients, 88 providers and 35 healthy controls. The decision quality instrument was feasible to implement in this study, with low missing data. The knowledge score had good retest reliability (intraclass correlation coefficient = 0.70) and discriminated between providers and patients (mean difference 35%, p < 0.001). The majority of providers felt that the knowledge items covered content that was essential for the decision. Five of the 6 treatment goals met targets for content validity. The five goals had moderate to strong retest reliability (0.64 to 0.87). The concordance score was 89%, indicating that a majority had treatments concordant with that predicted by their goals. Patients who had concordant treatment had similar levels of confidence and regret as those who did not. Conclusions The decision quality instrument met the criteria of feasibility, reliability, discriminant and content validity in this sample. Additional research to examine performance of the instrument in prospective studies and more diverse populations is needed.


Background
The majority of patients diagnosed with early stage breast cancer are eligible to decide between having a mastectomy or lumpectomy with radiation (breast conserving therapy), yet this decision can be very challenging. Guidelines and consensus statements for breast cancer treatment emphasize the equivalence of mastectomy and breast conserving therapy for survival [1][2][3]. However, these options differ on other dimensions that matter to patients, particularly regarding appearance, need for radiation, likelihood of re-excision and risk of local recurrence.
After many years of increasing breast conserving surgery rates, several recent studies suggest that mastectomy rates are rising [4,5]. Proposed reasons for the rise have focused on different factors such as increased use of MRI imaging, younger patient age and patient preference. Rates of breast conserving surgery or rates of mastectomy have been proposed as quality measures for breast surgery; however, both of these measures have been found to be lacking [6,7]. In this situation, where survival is equivalent, the "best" treatment is dependent upon how each individual patient weighs the other factors, including cosmetic results, likelihood of recurrence and concerns about radiation. A measure that reports on high or low utilization does not provide any evidence that the right procedure is being used on the right patient.
For many situations, including surgery for early breast cancer, the appropriateness of surgery cannot be determined solely based on clinical factors [8,9]. An international consensus process including clinicians, patients and decision making experts endorsed a definition of decision quality that focuses on two key areas: (1) the extent to which patients are informed and (2) the extent to which treatments match what is most important to patients [10]. This definition of decision quality was taken as the basis for the development of a survey instrument to measure patients' knowledge and the match between their goals and type of breast cancer surgery received.
In breast cancer, a few studies have used knowledge questionnaires to assess how informed breast cancer patients were about the surgery decision [11][12][13][14]. Other studies have examined patients' goals and found that cosmetic concerns and concerns about recurrence play a big role in patients' decisions, in addition to fear of radiation and desire to follow the doctors' recommendation [15][16][17][18][19]. Each of these studies used different survey instruments and none of the studies examined both knowledge and extent to which patients' goals were aligned with treatment. In summary, there is a lack of comprehensive, well validated instruments to measure the quality of surgical decisions in breast cancer.
To address this gap, a patient-reported survey instrument for early stage invasive breast cancer surgery decisions was developed to assess the extent to which patients were informed and received treatments that matched their goals. It is important that patient reported surveys demonstrate both strong psychometric properties (e.g., reliability and validity) and clinical sensibility (e.g., acceptability and feasibility) [20]. In this study, we examined performance of the Breast Cancer Surgery Decision Quality Instrument (BCS-DQI) along these criteria using three study samples: (1) breast cancer patients who had made a surgical treatment decision two or three years prior to the survey (2) breast cancer health care providers and (3) a group of healthy controls who had never had breast cancer. These samples provide complementary data on the performance of the instrument. The retrospective patient sample provided an "experienced" sample to evaluate the items and to examine stability of the responses. The provider sample was used to examine content validity and discriminant validity of the knowledge score. The healthy control sample was also used to examine discriminant validity of the knowledge score.

Approach and instrument development
The BCS-DQI has two decision-specific sections: (1) a set of knowledge questions to determine whether patients are informed and (2) a set of goals and concerns that are used in a model to determine the concordance, or extent to which patients receive treatments that reflected what is most important to them.
The definition of decision quality used to guide the instrument was validated by the International Patient Decision Aid Standards group [10]. The approach to instrument development was based on a conceptual framework of shared decision making by Mulley [21] and extended by Sepucha and Mulley [22] and Sepucha et al [23]. This framework takes a systems view of decision making and draws from both normative and behavioral decision theories. Both theories assume that people are goal-directed and will make choices that are in their best interest. Normative theories have fairly strict assumptions of pure rationality, whereas our approach draws from behavioural theories that recognize limitations (cognitive capacity, competing interests, biases and heuristics) that may result in deviations from rational choice. In addition, normative theories only consider utilities for health states as appropriate influences on choices, we have relaxed this to include other factors (e.g. the recovery time or number of surgeries) that may also appropriately influence choices. These factors are more generally referred to as "goals and concerns." The process for developing the BCS-DQI and preliminary validation work has been described in detail elsewhere [24,25]. The key content for the items was derived from the conceptual framework with significant input from breast cancer survivors and multidisciplinary group of clinical experts. The content of the knowledge items and goals was selected to cover core areas, but it was not meant to be exhaustive. For example, we did not include all factors that might influence patients' preferred treatment, such as spouse's goals, and instead focused on how patients felt about key tradeoffs for the two main options. Experts in survey research drafted multiple choice and open-ended items that covered the content. The draft surveys were subjected to cognitive testing with breast cancer survivors (n = 6), during which respondents completed the instrument while talking out loud to explain their thinking and interpretation of the instructions, items, and responses. Based on those results, minor revisions were made to improve acceptability and comprehension of the instruments.

Sample and procedures
Adult women patients diagnosed in 2005 or 2006 with early stage (Stage 1 or 2) breast cancer were identified through cancer registries at four participating sites.
Exclusion criteria (e.g. bilateral breast cancer, recurrent breast cancer, prior radiation) were set to ensure that the majority of patients were clinically eligible for both mastectomy and lumpectomy. Each patient's treating physician had to approve contact. All patients received the study packet in the mail, and non-respondents received a reminder phone call after two weeks and a reminder packet after four weeks. Respondents received a small incentive with the initial packet and a small compensation (e.g., book of stamps) for completing the survey. A subset of patient respondents across all sites was sent a second, identical survey by mail approximately four weeks after responding, to assess test-retest reliability.
All health care professionals (surgeons, radiation oncologists, medical oncologists, plastic surgeons and oncology nurses) who work with breast cancer patients at the participating sites were invited to complete the knowledge portion of the decision quality instruments. Providers received the survey by mail. Non-responders received an email reminder after two weeks and a reminder mailing after four weeks. Provider respondents received $50 cash or a gift card upon completion of the survey.
Healthy controls who did not have a history of breast cancer or a first degree relative (mother, sister or daughter) with breast cancer and who were not health care providers, were recruited through email bulletins sent to all employees at one of the sites. Women were screened by phone and eligible women were mailed the surveys. Non-respondents received a reminder phone call after two weeks and a reminder mailing after four weeks. Respondents received $10 for completing the survey.
The study was approved by and conducted in accordance with the policies set out by the participating sites' Institutional Review Boards.

Measures
Data on treatment stage was taken from the cancer registry.

Patient survey
Patients reported standard demographics and breast cancer treatments.
Breast Cancer Surgery Decision Quality Instrument (BCS-DQI): included two main parts (1) 13 multiple choice and 2 open-ended knowledge items and (2) 8 goals and concerns rated on an 11-point scale from 0 (not at all important) to 10 (extremely important).
Other items were included in the patient survey to assess the following aspects of the decision: Top three goals and concerns: patients selected their top three goals and concerns from those included in the BCS-DQI.
Patients preferred treatment: a single item asked, "Which treatment was your personal preference?" with possible responses of Mastectomy, Lumpectomy and Radiation or I am not sure. Treatment received: was defined as the final surgical treatment received. Decision confidence: assessed with one item, "On a scale of 0 to 10 where 0 is not at all confident and 10 is extremely confident, how confident are you that the decision about breast surgery was the right one for you?" Decision regret: assessed with one item, "If you had the chance to make the decision again, would you have the same type of surgery? Definitely yes, probably yes, not sure, probably no, definitely no".

Provider survey
Providers completed the knowledge portion of the BCS-DQI, and indicated the importance of each knowledge item on a scale of 1 (not important) to 4 (essential) and how well the overall set of items covered the essential information on a scale of 1 (not at all well) to 4 (extremely well).

Healthy control survey
Participants completed the knowledge portion of the BCS-DQI and some demographic items.

Sample evaluation
First, the sample characteristics were compiled to see how well they matched the sampling plan. Two-sample t-tests and chi-square tests were used to compare the patient characteristics between responders and nonresponders. Chi-square tests were used to compare the response rate among sites. We also examined the interaction between patient characteristics and response rate by site. Additionally, two-sample t-tests and chi-square tests were used to compare demographics between patients and healthy volunteers. Next, the item nonresponse was examined.

Item retention and deletion
Items were examined for issues such as difficulty (e.g. knowledge items that were too easy, where healthy volunteers scored >80% or too hard, where provider average scores <50%), problematic format (if >5% had problems such as two responses check off when only one was expected), redundancy (items that had very high >0.8 inter-item correlation), and floor or ceiling effects (scores at top or bottom of the range for knowledge scores or goals). [26,27] The data were discussed by a steering group of six people that included experts in survey research, decision sciences and clinical experts in breast cancer (including three of the authors KS, CL, CC). Problematic items were deleted or revised.

BCS-DQI knowledge score
Each item received one point for correct response. A mean knowledge score was standardized by dividing the number of correct responses by the number of items, resulting in scores ranging from 0% to 100%. Openended items were scored correct if they fell within a range that was pre-determined by medical experts based on clinical evidence. Items with multiple parts were weighted to sum to one. The response, "I am not sure," was considered incorrect. We scored missing responses as 1/k where k was the number of possible responses to the item (essentially equivalent to guessing). A knowledge score was calculated for every respondent who completed at least 50% of the items.

BCS-DQI concordance score
There is no standard approach to measuring concordance between patients' goals and treatments [28]. The approach follows that used by Barry et al (1995) to examine the extent to which patients' goals are associated with treatments [29]. We examined whether having mastectomy was associated with stage of disease and each of the goals both in univariate analyses, using t-tests for continuous variables and Chi-squared tests for categorical variables, and in multivariable analysis using a logistic regression model with treatment (mastectomy vs. lumpectomy) as the dependent variable. For the regression model, the missing responses from the goal items were imputed from the other available goal items using the EM algorithm [30].
We used the regression model to determine the model predicted probability of mastectomy for each patient. Patients with a predicted probability ≥0.5 who had mastectomy and those with a predicted probability <0.5 who had lumpectomy, were classified as concordant. A summary concordance score was calculated that indicated the percentage of patients whose decisions "matched" their goals. The score reflects how well the model fits the observed data. The interpretation of the score is at the group level. Groups with higher scores are better at matching treatments to patients' goals. Patients who had lumpectomy followed by mastectomy were excluded from the concordance model as they were not clinically eligible for both options. Clinical considerations (e.g. inability to get clear margins) appropriately guided treatment for these patients, as a result, these patients might look like their goals are mismatched from their final treatment, mastectomy.

BCS-DQI screener
A short version of the DQI that includes 5 of the knowledge items and 4 of the goals was also created and scored in comparable way to the longer version. All of the goals that were significant in the multivariate concordance model are included in the screener version as well as concerns about reducing chance of having cancer come back in the breast.

Acceptability and feasibility
Acceptability was examined using length of time to complete the instrument, which was self-reported by patients, and response rates. Feasibility was examined using rates of missing data, with any item with more than 5% missing responses considered problematic and subject to revision. We examined association between average time to complete and missing responses by education level using ANOVA.

Assessment of reliability
Reproducibility, or test-retest reliability, for the full knowledge score and the screener knowledge scores and the individual goals was calculated using intraclass correlation coefficient (ICC). The a priori target was to exceed 0.7. [27] The knowledge items do not draw from a single underlying construct; as a result, we do not report Cronbach's alpha or internal consistency.

Assessment of validity
There is no gold standard for knowledge score, goals and concerns, or concordance score. We tested several hypotheses, developed a priori, to provide evidence of validity: (1)Content validity hypotheses: a. For knowledge: More than 70% of providers will report that the knowledge items covered the key content very or extremely well. b. For goals and concerns: At least 20% of patients will include the item as one of their top three issues for the decision.
(2)Discriminant validity hypotheses: a. Providers would be more knowledgeable than patients, who in turn would be more knowledgeable than the healthy controls (tested using analysis of variance (ANOVA) with planned comparisons). This was tested for the full version and the screener version. b. The individual goals and concerns will discriminate significantly between those who have mastectomy and those who have lumpectomy (tested using one-sided t-tests). For example, patients who rate the importance of "avoiding radiation" highly should be more likely to have mastectomy (because that option provides the highest chance of avoiding radiation). c. Patients who stated a preference for mastectomy will have higher model predicted probabilities than those who were unsure, who in turn will have higher predicted probabilities than those who had a preference for lumpectomy and radiation (tested using ANOVA with planned comparisons).
a. Patients who received treatment that matched that predicted by the regression model will have higher confidence and less regret than those who did not (tested using two sided t-test and Chisquare test).

Response rates and sample
The overall patient response rate was 60%. The eligible patient sample is described in Table 1. Responders only differed from non-responders on two factors, race/ethnicity and site. Responders were more likely to be white than non-responders (85% vs. 71%, p < 0.001). Response rates also varied significantly by site (ranged from 49% to 70%, p < 0.001). We examined respondent characteristics by site, and only one site had signficant interactions. At that site, non-white race (37% vs. 61%, p = 0.001) and having hormonal therapy (39% vs. 57% for those who did and did not have hormonal therapy respectively, p = 0.02) were associated with a lower response rate. A majority of providers, 88/116 (79%), completed the survey. The provider sample was on average 45 years old (SD 9), mostly female (65%), and in practice for 15 years (SD 11). The provider sample was 33% medical oncologists, 30% nurses (including oncology nurses, nurse practitioners, and registered nurses), 17% general surgeons and surgical oncologists, 8% radiation oncologists and 8% plastic surgeons. The response rate for the healthy controls was 100%. Healthy controls were significantly younger than the patient sample (mean age 42 versus 57, p < 0.001) but did not vary significantly from the patient sample on any other demographic variables.

Item retention and deletion
The total knowledge scores ranged from 0-100% and there was no evidence of a floor or ceiling effect, i.e. scores were not concentrated at the bottom or top of the scale (see Figure 1). Two knowledge items were deleted for being too easy and one was deleted for negative item to total correlation. The reduced set of items was used in the remaining analyses. For each goal and concern, the responses spanned the entire range of possibilities (0 to 10). One item, "avoid cancer coming back in the breast," had evidence of a ceiling effect as 88% of respondents marked 10. All of the goals were used in the analyses.

Acceptability and feasibility
The BCS-DQI took an average of 6 minutes for patients to complete (range 1.4-19.0 minutes). The time to complete was associated with education level, as patients with high school degree or less took longer than those with some college and those with college degree or more (6.8 vs. 6.4 vs. 5.3 minutes respectively, p < 0.001). The knowledge items had an average of 1.4% missing responses (range 0.2-2.5%). Four respondents (0.9%) did not complete enough of the knowledge items to calculate a total knowledge score. The goals and concerns had a median of 3.3% missing responses (range 0.9-10.9%). One goal was an outlier for missing responses, "remove breast for peace of mind," with 10.9% missing. The majority of those who skipped it (94%) had had a lumpectomy. The number of missing responses was not associated with education level (p = 0.62).

Reliability
The retest reliability of the knowledge score was intraclass correlation coefficient (ICC) = 0.70. The short (5-item) version of the knowledge score had lower reliability (ICC = 0.60). The retest reliability of the goals were: keep breast ICC = 0.87, remove breast ICC = 0.83, avoid radiation ICC = 0.76, avoid cancer coming back in the breast ICC = 0.66, avoid serious side effects of radiation ICC = 0.64, and avoid hassle of radiation ICC = 0.61.

Validity of knowledge scores
The majority of providers (76.1%) felt that the set of items covered the key facts very or extremely well supporting content validity.
Patients' mean total knowledge score was 52.7% (SD 21.8). Providers were significantly more informed than patients (87.7% vs. 52.7%, p < 0.001). Patients were more informed than healthy controls, but the difference was not significant (52.7% vs. 49.3%, p = 0.28). Figure 1 shows the overall distributions and Table 2 shows responses for selected knowledge items.
The screener knowledge score (5-item) version was highly correlated with the total knowledge score (Pearson =0.80). It also had similar results with providers being significantly more informed than patients (93.6% vs. 60.7%, p < 0.001) and with patients having similar knowledge to the healthy controls (60.7% vs. 57.7%, p < 0.001).

Validity of concordance score
Only one goal, avoiding the hassle of radiation did not meet the a priori cutoff of 20% for content validity (19% Total knowledge score (%) Frequency Figure 1 Knowledge score distributions for patients, providers and healthy controls.
of lumpectomy patients and 12% of mastectomy patients selected it in their top three). All other goals had 30% or more patients select it as one of their top three issues.
In univariate analyses, five of the six goals significantly discriminated between the surgical options in the expected direction. In the multivariable logistic regression model, three goals were significantly associated with receipt of mastectomy: desire to keep breast, remove breast for peace of mind, and avoid radiation (see Table 3). The logistic regression model had a c statistic of 0.95, indicating very good predictive accuracy of the model. The concordance score, or percentage of patients who got the treatment predicted by the model, was 89%. Concordance did vary by treatment, as 82% of mastectomy patients were predicted correctly, compared to 93% of lumpectomy patients (p = 0.007).
The model discriminated well among patients who stated a preference for mastectomy, those who were unsure, and those who preferred lumpectomy (model predicted probabilities of 0.70 vs. 0.30 vs. 0.08, respectively, p < 0.0001 for all comparisons). Patients who "matched" had similarly high levels of confidence in their decision (9.12 vs. 9.07, p = 0.86) and were just as likely to want to do the same thing again (91% vs. 85%, p = 0.13) compared to those who did not match.

Discussion
The purpose of the Breast Cancer Surgery Decision Quality Instrument is to provide a comprehensive assessment of the extent to which patients make informed decision and receive treatments that match their goals. The knowledge score had good retest reliability and was able to discriminate between providers and patients. The concordance score provides an assessment of how well treatments matched patients' goals. The goals used in calculating the concordance score had good retest reliability. Patients were able to complete the BCS-DQI on their own, with low rates of missing items indicating that it was feasible. The knowledge items cover content that the majority of providers feel is very important for patients to know before making a decision. Patients had some gaps in knowledge, e.g. only 57% correctly answering that survival is the same for mastectomy and lumpectomy with radiation. It was surprising that the knowledge scores of the healthy controls were similar to those of the breast cancer survivors. This result could be due to the fact that the controls were more knowledgeable generally (due to working in health care environment), or that survivors' breast cancer knowledge had decreased over time. Other studies have found similarly low levels of knowledge for patients surveyed much closer to the time of diagnosis, which suggests that timing may not fully explain the lack of knowledge for this sample [14,24].
Three of the goals and concerns, importance of keeping the breast, removing the breast for peace of mind, and avoiding radiation, met the criteria for retest reliability and also discriminated between the options. The first two goals may appear to be simply two ends of the same issue, and although they are negatively correlated, they are not redundant (Pearson = −0.46). Similar concerns have been found to be related to treatment choices in other breast cancer studies [16,17,19,31]. Clinicians seeking to elicit patients' preferences should, at a minimum, discuss how patients feel about keeping or losing their breast and how they feel about radiation.
The concordance score was high, indicating that most patients received treatments that matched their goals. Patients who preferred mastectomy were somewhat less likely to receive it (82%) compared to those who preferred lumpectomy (92%). Contrary to our hypotheses, we did not find evidence that respondents who received treatments that matched their goals had higher confidence or less decisional regret. This was possibly due to a ceiling effect with these items, as all patients had very high confidence and low regret. Several studies have found a positive relationship between decision making processes, such as being offered a choice of breast surgery and matching preferred level of involvement in decision making, with health outcomes including body image, psychological adjustment, and satisfaction [18,19,32,33]. It will be important to examine associated between concordance and these other outcomes in prospective studies.
The definition for concordance used here requires that the treatments received match patients' goals. There are a growing number of studies that are reporting this metric, although the studies define and measure concordance differently [28]. The multidimensional measure of informed choice is one approach that combines knowledge and value concordance into a single measure [34]. It was developed to measure the quality of decision about prenatal testing and has been adapted for use in genetic testing for cancer [34][35][36]. The ability to reliably document that the treatments received, or the care delivered, reflects patients' goals, needs and wants will be important.
The BCS-DQI was designed to audit the quality of decisions and to compare performance of providers or breast cancer centers on how well they inform their patients and how well they tailor treatments to patients' goals. Clinicians have requested a short version that could be used in routine practice as a screening tool to assess patients' knowledge and goals before the visit so that they could address any gaps. A short version, the BCS-DQI Screener, includes 5 knowledge items and 5 goals. Since there are fewer items, it might miss knowledge gaps or key goals that patients have. The purpose is to stimulate conversation between patients and providers about options, outcomes and goals, but not to limit content to only those items included. Collins et al. (2009) surveyed newly diagnosed patients with an earlier version of the BCS-DQI Screener after patients had viewed a decision aid and before they saw their surgeon. They found high patient knowledge scores, for example, 98% of respondents answered the question about survival equivalence of the treatments correctly compared to 57% from the retrospective sample reported here [37]. In general, the patients' knowledge scores in their study (86%) were comparable to the providers' scores (87%). Further, in the Collins study, the same three goals were significant in the multivariable treatment model, suggesting the concordance model may hold for newly diagnosed patients who are actively making the treatment decision [37].
The current study has several limitations that should be noted. First, it was a retrospective study and patients were surveyed about 2 ½ years after the decision during which time their knowledge, goals and concerns likely changed. The survey relies on patient report and may not fully reflect what information was conveyed by clinicians during the decision making process. The sample was from four academic, National Cancer Institute-designated comprehensive cancer centers, and the performance may be different in community settings. Non-white patients had a lower response rate than white patients which raise questions about acceptability across diverse populations. Several studies have documented lower response rates and lower participation in research studies for non-white participants [38,39]. Further, the content validity was established on the full set of items and it is possible that the reduced set and the 5-item screener would not be reviewed as highly. It will be important to further test the BCS-DQI in order to understand the acceptability and performance of the instrument across more diverse populations and practice settings, and with patients at the time of the decision.

Conclusions
The pressure on hospitals and health care providers to justify surgical treatments will continue to mount. Having a well validated, feasible survey instrument that can provide evidence that patients were informed and that the treatment reflected patients' goals will be an important part of efforts to document quality of breast care. The BCS-DQI met several key criteria for high quality, patient-reported survey instruments including feasibility, retest reliability, discriminant validity and content validity, and may be useful in assessing the quality of breast cancer surgical decisions.