Clinical decision modeling system
© Shi and Lyons-Weiler. 2007
Received: 30 April 2007
Accepted: 13 August 2007
Published: 13 August 2007
Decision analysis techniques can be applied in complex situations involving uncertainty and the consideration of multiple objectives. Classical decision modeling techniques require elicitation of too many parameter estimates and their conditional (joint) probabilities, and have not therefore been applied to the problem of identifying high-performance, cost-effective combinations of clinical options for diagnosis or treatments where many of the objectives are unknown or even unspecified.
We designed a Java-based software resource, the Clinical Decision Modeling System (CDMS), to implement Naïve Decision Modeling, and provide a use case based on published performance evaluation measures of various strategies for breast and lung cancer detection. Because cost estimates for many of the newer methods are not yet available, we assume equal cost. Our use case reveals numerous potentially high-performance combinations of clinical options for the detection of breast and lung cancer.
Naïve Decision Modeling is a highly practical applied strategy which guides investigators through the process of establishing evidence-based integrative translational clinical research priorities. CDMS is not designed for clinical decision support. Inputs include performance evaluation measures and costs of various clinical options. The software finds trees with expected emergent performance characteristics and average cost per patient that meet stated filtering criteria. Key to the utility of the software are sophisticated graphical elements, including a tree browser, a receiver-operator characteristic surface plot, and a histogram of expected average cost per patient. The analysis pinpoints the potentially most relevant pairs of clinical options ('critical pairs') for which empirical estimates of conditional dependence may be critical. The assumption of independence can be tested with retrospective studies prior to the initiation of clinical trials designed to estimate clinical impact. High-performance combinations of clinical options may exist for breast and lung cancer detection.
The software may prove useful in simplifying the objective-driven planning of complex integrative clinical studies without requiring a multi-attribute utility function, and it could lead to efficient integrative translational clinical study designs that move beyond simple pairwise competitive studies. Collaborators, who traditionally might compete to prioritize their own individual clinical options, can use the software as a common framework and guide to work together to increase understanding of the benefits of using alternative clinical combinations to effect strategic and cost-effective clinical workflows.
Classical decision analysis is a well-established field of primarily theoretical analytical inquiry that can shed light on the optimality of either a single decision in the face of complexity and uncertainty, or a series of decisions for a particular circumstance. Less commonly, it may be used to define a fixed protocol of options to follow as general guidelines. Decision trees are sometimes represented as bifurcating structures where each node represents a particular decision, and the internodes represent paths to secondary decision nodes. Most decision modeling to date in medicine has focused on the problem of identifying optimal decisions on the use of new healthcare technology when confronted with alternative (usually mutually exclusive) healthcare interventions. For a recent methodological review see Philips et al., 2004, and for an overview of methods and criteria for quality assessment of decision modeling see Weinstein, 2006.
Model inputs are usually risk preferences derived via expert elicitation (e.g., Alberdi et al., 2004). In advanced decision modeling, all possible decision trees are represented as a single tree, and algorithms exist (e.g., roll-forward, roll-back) to define an optimal decision path based on the consideration of multiple objectives, the cost and benefit of which are ideally expressed as a common utility function. For a fully enumerated decision analysis, the full joint probability matrix should ideally be specified, but is rarely available, in which case uncertainty can be explored via sensitivity analysis.
In application, decision analysis and decision modeling are often used to develop computer-aided decision support systems within a particular field of biomedical specialization (e.g., radiology). They have rarely been used in studying or defining research priorities for integration of diverse clinical options, or for the study of the integration of new clinical options into existing clinical workflows. The reasons for the lack of advances in modeling integration are practical; modeling clinician-patient dyad preferences ('expert elicitation') is extremely hard, and among-site variance in preferences is high. Models have been proposed that elicit input from both patients and caregivers (Col, 2005).
How to weigh the same evidence varies from individual to individual. Moreover, the reasoning used to render a particular decision or risk preference may not in some cases be represented accurately as an easily defined model. Defining a useful common 'currency', that is, a utility function in which the costs of all considerations can be expressed, can be difficult, especially when many variables influencing decisions must be considered. The construction of multi-attribute utility functions, except in their simplest form, is an arduous process in which few decision makers are willing to participate. Finally, collecting a sufficient amount of data and uniform preferences on all pairs of diverse proposed clinical options becomes intractable, especially when many or newly proposed clinical options are considered.
Djulbegovic et al. (2000) show how evidence-based medicine (EBM) summary measures derived from population studies can be incorporated into the framework of clinical decision analysis. Such approaches are eminently useful for clinical decision making with available clinical options. This area is called "clinical decision support", for which numerous academic and commercial resources already exist. In contrast, our focus on clinical decision modeling is for situations in which too many new clinical options have been proposed, as in the case of putative biomarkers for disease detection, and no clear route exists for establishing priorities for integrative evaluative and translational research, that is, for determining which combinations of clinical options should receive priority for further research as an integrated set of options within a clinical workflow.
As an aid to defining integrative translational research priorities, our goal is not clinical decision support per se; instead, our goal is to provide a framework for the rational discussion of the clinical research impact of integrating diverse sources of clinical information. By providing such an underpinning for these discussions, useful and cost-effective combinations can be overtly explored while other, more costly or less effective combinations can be given lower priority. Our motivation is well-founded; indeed, in application, a recent study found that as the complexity of decisions increases, the use of decision support systems decreases. Thus, the use of classical decision analysis to effect integrative translational research seems unlikely at worst, and challenging at best (but see Leal et al., 2007 for a practical computing resource that may yield possible exceptions).
We have devised an alternative strategy that we call "naïve decision modeling" (NDM) that accepts the intractability of deriving a fully defined model. Beginning with the most basic elements of risks associated with individual decision options (performance characteristics of clinical options), NDM requires a critically operational, but ultimately testable, assumption of conditional independence among successive clinical options.
It is assumed that the aim of the research enabled by NDM is to define a clinical workflow that integrates a high-performance, cost-effective decision tree for diagnostics that uses ruling-in and ruling-out assays. NDM is not designed for real-time, i.e., dynamic, clinical decision making (e.g., Housset & Junod, 2003), but rather to derive a general decision tree to be studied as a potential (hypothetical) clinical workflow, with follow-up testing being specified by the outcome (+/-) of the previous test. NDM is designed to facilitate the clinical research study designs needed to establish cost-effective integrated standard-of-care clinical options.
In the first step of NDM, performance evaluation measures of individual clinical options are collected. In the second step, alternative hypothetical combinations of clinical options are characterized based on their expected performance and cost or any other attribute that can be specified. The resulting combinations are rank-sorted by performance or cost, and then explored manually by experts (e.g., clinicians), who might reject specific combinations of clinical options as unlikely (e.g., unethical) hypothetical clinical workflows. Information on critical pairs of clinical options is derived during the second step. In a third step, the assumption of conditional independence among critical pairs of clinician-selected clinical options should be tested with empirical (e.g., retrospective) data. The model is then determined either to meet the assumption of conditional independence, or to violate it. If a hypothetical combination is found that meets the assumption of conditional independence, then further clinical study of that particular combination as a fixed clinical workflow may be warranted. If the assumption of conditional independence is violated, then the model may be updated with estimates of conditional probabilities from the retrospective study, and that particular hypothetical workflow re-assessed on the basis of the new information. A new search that uses the revised input can then also be conducted to identify new workflows that may, or may not, be superior to the previously selected near-optimal workflow.
In this paper, we describe our software resource, CDMS, which implements this evidence-based approach to decision modeling to promote collaborative integrative translational clinical research.
CDMS is a standalone application with a user-friendly graphical interface. Both the application and its interface are implemented in Java. CDMS is provided as an executable jar file. To run it, the user should install a Java Runtime Environment Version 5.0 Update 6 or above. CDMS has currently been tested under the Windows XP and 2000 operating systems. However, since Java is a platform-independent language, the application should work under other Java-compatible operating systems as well.
CDMS requires a tab-delimited text file as input and a specific prevalence for the disease or condition being studied. After that, running CDMS is as simple as pressing a button. All search results are displayed graphically in the CDMS interface. In addition, CDMS creates two output files. The first file (*.cdms) contains all graphical result objects. The other is a text output file that records the complete details of the search, which can be used later to check results or to repeat a previously saved search.
CDMS uses a rooted bifurcating tree structure to represent a clinical workflow. In this representation, the first clinical option is applied to all patients in a clinical setting. At each node, the patient population is divided into test positive and test negative partitions, with subsequent follow-up testing or treatment indicated by subsequent nodes.
Searching among possible combinations is currently restricted to a random tree search algorithm [see Additional file 1]. CDMS searches randomly among possible tree topologies within a user-defined range of size (# of clinical options), and retains the "best" clinical workflows according to user-defined optimality constraints. The random search is strategic, for two reasons: first, near-optimal solutions may be more clinically realistic than computationally guaranteed optimal solutions, and second, an exhaustive search of all possible tree topologies is implausible for very large numbers of clinical options. Felsenstein (2004) reports that there are 34,459,425 "rooted, bifurcating, labeled trees" for 10 nodes, 8,200,794,532,637,891,559,375 for 20 nodes, 4.9518 × 10^38 for 30 nodes, 1.00985 × 10^57 for 40 nodes, and 2.75292 × 10^76 for 50 nodes. Those numbers are based on the assumption that the "left-right order of branching does not make any difference" (Felsenstein, 2004). However, this assumption does not hold for decision trees, in which the positive and negative branches of a node are distinct. Thus there are even more possible tree topologies. It is intractable to search all trees to find the globally optimal decision tree given large numbers of clinical options. An exhaustive search is undesirable anyway, as clinical researchers may reject the globally optimal tree as unrealistic or unethical.
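The counts quoted from Felsenstein (2004) can be reproduced with the double-factorial formula (2n-3)!! for rooted, bifurcating, labeled trees. The short sketch below is only an illustration of that growth rate; the function name is hypothetical and the code is not part of CDMS.

```python
def rooted_labeled_tree_count(n: int) -> int:
    """Number of rooted, bifurcating, labeled trees: (2n-3)!! = 3 * 5 * ... * (2n-3)."""
    count = 1
    for odd in range(3, 2 * n - 2, 2):  # odd factors 3, 5, ..., 2n-3
        count *= odd
    return count


for n in (10, 20, 30, 40, 50):
    # Exact integer arithmetic; scientific notation is used only for display.
    print(n, f"{rooted_labeled_tree_count(n):.5e}")
```

Because the count grows faster than exponentially, and decision trees additionally distinguish the positive from the negative branch, exhaustive enumeration quickly becomes impractical.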
The total number and size of tree topologies searched are specified by the user through the Control Panel interface. In the future, various tree-searching heuristics and the branch-and-bound algorithm may be added, but it will be important to retain near-optimal trees, as the main application of CDMS is to facilitate the visual exploration of alternative hypothetical cost-effective combinations of clinical options, and not necessarily to discover the set of globally optimal combinations.
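To make the idea of drawing random topologies of a given size concrete, the following is a minimal sketch, not the algorithm of Additional file 1. The `Node` class, the `random_tree` and `tree_size` helpers, the `fixed_root` parameter, and the rule that an option is never repeated along a single path are all assumptions made for illustration. This simple growth procedure does not sample topologies uniformly.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    option: str                    # clinical option applied at this node
    pos: Optional["Node"] = None   # follow-up after a positive result (None = no further test)
    neg: Optional["Node"] = None   # follow-up after a negative result (None = no further test)


def random_tree(options: List[str], size: int, rng: random.Random,
                fixed_root: Optional[str] = None) -> Node:
    """Grow a random rooted bifurcating tree with up to `size` test nodes."""
    root = Node(fixed_root if fixed_root is not None else rng.choice(options))
    # Each open slot remembers the options already used on its path,
    # so no option is repeated between the root and a leaf (an assumption).
    open_slots = [(root, "pos", {root.option}), (root, "neg", {root.option})]
    n_nodes = 1
    while n_nodes < size and open_slots:
        parent, branch, used = open_slots.pop(rng.randrange(len(open_slots)))
        candidates = [o for o in options if o not in used]
        if not candidates:
            continue  # this path has exhausted the available options
        child = Node(rng.choice(candidates))
        setattr(parent, branch, child)
        n_nodes += 1
        open_slots.append((child, "pos", used | {child.option}))
        open_slots.append((child, "neg", used | {child.option}))
    return root


def tree_size(node: Optional[Node]) -> int:
    """Number of test nodes (clinical options) in the tree."""
    return 0 if node is None else 1 + tree_size(node.pos) + tree_size(node.neg)


rng = random.Random(0)
tree = random_tree(["A", "B", "C", "D"], size=rng.randint(2, 4), rng=rng)
print(tree_size(tree), tree.option)
```

A search loop would call such a generator repeatedly, score each tree, and retain those that satisfy the user's performance and cost constraints.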
The retained results of a given search are presented in graphical and tabular form in various tabs. CDMS evaluates the performance of a tree or a clinical workflow by calculating its Emergent Expected Sensitivity (EESN), Emergent Expected Specificity (EESP), and Expected Overall Cost per Patient (EOCPP). The calculations of these terms are provided in appendix A1 [see Additional file 1]. CDMS records the best-performing tree topology found among all trees searched and presents it graphically in the Optimal Tree Topology tab. A summary of the results from a given tree search is also reported in the TreeSearching Summary tab. The summary includes information such as the total number of tree topologies searched and the number of tree topologies that satisfy the performance and/or cost constraints.
A contour plot is displayed in the ROC Contour Plot tab. The ROC Contour Plot displays the number (frequency) of searched tree topologies that fall at each combination of EESN and 1-EESP.
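The exact formulas are given in Additional file 1, which is not reproduced here. As a rough sketch of how such quantities can be obtained under the conditional-independence assumption, the code below multiplies per-test sensitivities and specificities along each path of a tree. The nested-tuple tree encoding, the rule that a path ending after a positive (negative) result is called positive (negative), the `evaluate` function, and the numeric inputs are all illustrative assumptions, not values or code from CDMS.

```python
from typing import Dict, Optional, Tuple

# A tree is (option_name, subtree_if_positive, subtree_if_negative);
# a subtree of None means "stop and report the last test result".
Tree = Optional[tuple]


def evaluate(tree: tuple, options: Dict[str, Tuple[float, float, float]],
             prevalence: float) -> Tuple[float, float, float]:
    """Return (EESN, EESP, EOCPP), assuming conditionally independent tests.

    `options` maps a name to (sensitivity, specificity, cost per test).
    """
    def walk(node, p_dis, p_healthy):
        # p_dis / p_healthy: probability that a diseased / non-diseased
        # patient reaches this node.
        name, pos, neg = node
        sn, sp, cost = options[name]
        exp_cost = (prevalence * p_dis + (1 - prevalence) * p_healthy) * cost
        true_pos = true_neg = 0.0
        branches = (
            (pos, p_dis * sn, p_healthy * (1 - sp), True),   # test positive
            (neg, p_dis * (1 - sn), p_healthy * sp, False),  # test negative
        )
        for child, pd, ph, is_positive in branches:
            if child is None:
                if is_positive:
                    true_pos += pd   # diseased patients finally called positive
                else:
                    true_neg += ph   # healthy patients finally called negative
            else:
                tp, tn, c = walk(child, pd, ph)
                true_pos, true_neg, exp_cost = true_pos + tp, true_neg + tn, exp_cost + c
        return true_pos, true_neg, exp_cost

    return walk(tree, 1.0, 1.0)


# Hypothetical inputs: test A screens everyone, test B confirms positives.
opts = {"A": (0.90, 0.80, 100.0), "B": (0.80, 0.90, 100.0)}
workflow = ("A", ("B", None, None), None)
eesn, eesp, eocpp = evaluate(workflow, opts, prevalence=0.10)
print(round(eesn, 4), round(eesp, 4), round(eocpp, 2))
```

In this example the confirmatory test raises specificity at the expense of sensitivity, and only patients who test positive on A incur the cost of B.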
CDMS displays the average cost per patient distribution of all trees searched in a histogram. Both the contour plot and histogram can be displayed on logarithmic scales. The user can do this by mousing-over the legend and right-clicking the mouse button.
Perhaps the most useful component of the output of CDMS is the Tree Browser tab. In the tree browser, the tree topologies that satisfy both performance and cost constraints are listed in the left side of the browser. Each tree topology is represented by its rank, its associated confusion matrix, and other specific scores such as its EESN, EESP, Emergent Expected Achieved Classification Error (EEACE), EOCPP, and tree size. The display includes a button to view the tree, and a button to reject a given tree. All rejected tree topologies are moved to the Rejected Trees sub tab, and can be restored from the rejected tree list.
The results from a particular search can be saved to disk to allow the user to retrieve and view them later. The user can print all the graphical objects displayed in the interface, including any displayed tree topology.
The input values for clinical options for breast cancer detection derived from a literature search
Proposed Diagnostic Tests/Assays | Reference
Magnetic Resonance Imaging (MRI) | Imbriaco et al., 2001
Electrical Impedance Scanning | Stojadinovic et al., 2006
electronic resonance spectroscopy of albumin configuration | Seidel et al., 2005
 | Klein et al., 2002
 | Liang et al., 2002
mammaglobin (cutpoint 8.8) | Bernstein et al., 2005
SELDI (serum), CART, two surfaces | Vlahou et al., 2003
scintimammography on px w/suspicious breast mass | Polan et al., 2001
NAS bFGF, race, menopausal status + PSA | Hsiung et al., 2002
cytology alone, breast ductal lavage cells | Zhang et al., 2006
G-actin biomarker, breast ductal lavage cells | Zhang et al., 2006
DNA5cER biomarker, breast ductal lavage cells | Zhang et al., 2006
The input values for clinical options for lung cancer detection derived from a literature and internet search
Proposed Diagnostic Tests/Assays | Reference
 | Newland Biotech
 | Newland Biotech
 | Newland Biotech
 | Yang et al., 2005
 | Tarro et al., 2005
 | Tarro et al., 2005
 | Tarro et al., 2005
 | Tarro et al., 2005
 | Bazhin et al., 2004
 | Hirsch et al., 2001
 | Hirsch et al., 2001
methylation in at least 1 of 5 tumor suppressor genes | Fujiwara et al., 2005
CT-guided fine-needle aspirate biopsy | Wallace et al., 2002
x-ray determinate only | Gavelli & Giampalma, 2000
 | Gavelli & Giampalma, 2000
All other parameters in this part have default values that the user can modify to suit their study. For example, the user can increase or decrease the values of the population and the number of tree topologies to search.
The user can also select one clinical option as the fixed root node of the tree by checking the Fixed Root Node check box and selecting an option from the drop-down list. When this option is activated, CDMS will only search trees whose root node is the selected clinical option.
The default number of tree topologies to search for all levels of clinical options is 1,000,000. To prevent excessive run times, CDMS has an option to allow the user to intervene at any time prior to completion of the search process. Results generated to that point are shown.
The user starts the random tree searching process by pressing the Run button in the Control Panel dialog. A progress bar appears and shows the progress of the random tree search. The details of the search are saved to a *.txt file with the same name as the *.cdms file.
The Contour Plot tab (Figure 4(b)) shows the performance distribution of all 1,000,000 trees searched. Performance measurements for the contour plot are the paired values of EESN and 1-EESP within 2-dimensional bins. For example, if one tree has an EESN = 0.9 and an EESP = 0.85, its coordinates in the contour plot are (0.9, 0.15). The contour line in the plot represents a threshold. For example, the contour line of 0 represents 10,000.
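The binning behind such a plot can be sketched as a two-dimensional histogram over (EESN, 1-EESP) pairs. The function names and the bin count below are assumptions for illustration only; CDMS's actual bin width is not stated in the text.

```python
from collections import Counter
from typing import Iterable, Tuple


def roc_coordinates(eesn: float, eesp: float) -> Tuple[float, float]:
    """Map a tree's scores to its (EESN, 1-EESP) position in the plot."""
    return eesn, 1.0 - eesp


def roc_bin_counts(scores: Iterable[Tuple[float, float]], n_bins: int = 10) -> Counter:
    """Count trees per 2-dimensional bin; each axis is split into n_bins equal bins."""
    counts: Counter = Counter()
    for eesn, eesp in scores:
        x, y = roc_coordinates(eesn, eesp)
        i = min(int(x * n_bins), n_bins - 1)  # clamp 1.0 into the last bin
        j = min(int(y * n_bins), n_bins - 1)
        counts[(i, j)] += 1
    return counts


# Hypothetical (EESN, EESP) scores for three searched trees.
searched = [(0.90, 0.85), (0.95, 0.85), (0.50, 0.50)]
counts = roc_bin_counts(searched)
print(counts)
```

Contour lines are then drawn over these per-bin frequencies, optionally on a logarithmic scale.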
All subsequent results are highly dependent on the accuracy and precision of the input performance evaluation, cost and other parameter estimates. Studies should therefore be screened for possible biases, and sensitivity analysis can be conducted to assess the impact of potentially optimistic estimates.
The most obvious application of CDMS is the exploration of putative combinations of clinical options for diagnostics. In this application, the idea is to perform a search under the naïve assumption of conditional independence to minimize searching for pairs or sets of tests for which joint probabilities are needed. Under the assumption of conditional independence, the estimates for many clinical combinations are likely to be highly optimistic. Importantly, however, the introduction of conditional dependence will only lower the EESN or EESP. The robustness of specific workflows to conditional dependence can be explored empirically, and acceptable levels of conditional dependence can be determined prior to data collection. In the future, CDMS will allow the user to upload a sparse matrix of conditional probabilities so that the calculation of EESN and EESP can be modified dynamically during the tree search, as needed, using empirically derived conditional probabilities.
While the use cases we provide are cost-neutral (they assume equal cost of all clinical options), the NDM method implemented by the CDMS software is capable of considering user-provided estimates of cost. Indeed, the default data input format requires cost estimates. It may be useful to run a preliminary cost-neutral analysis to determine whether, even under the best-case assumption of conditional independence, any clinically acceptable high-performance combinations exist given the available clinical options. To conduct cost-neutral analyses, the user can specify any common cost for all clinical options (in our use cases, $100).
There are numerous benefits to conducting a random tree search. First, it provides a rapid answer to the question of whether any combinations exist that are high-performance; i.e., it answers the question "Does a population of high-performance, cost-effective putative clinical workflows exist?". Second, it allows the manual exploration of near-optimal trees. Leal et al. cite a "gap in the literature (that exists) between theoretical elicitation techniques and tools that can be used in applied decision-analytic models". Our approach places the experts (or a committee of experts) in the position of applying their preferences to entire competing decision models based on any number of attributes, both those formally included in the valuation and those inherent to a proposed series of successive clinical steps. Finding the globally optimal solution may also not be desirable. In many cases, the theoretically optimal trees may be clinically unacceptable because they are considered impractical or unethical. As more information about each of the diverse newly proposed tests becomes incorporated as criteria (e.g., 'risk of harm to patient'), successive updates to the model searches will become more refined.
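For a single critical pair, the effect of conditional dependence can be illustrated with the standard covariance parameterization of two binary tests: the probability that both tests give a given result equals the product of the marginal probabilities plus a covariance term. The sketch below assumes positively correlated tests and shows the measure each combination rule is meant to improve; the function names and numbers are hypothetical and this is not how CDMS computes EESN or EESP.

```python
def joint_prob(p1: float, p2: float, cov: float = 0.0) -> float:
    """P(both events) for two binary events with marginals p1, p2 and covariance cov."""
    lo = max(0.0, p1 + p2 - 1.0) - p1 * p2   # smallest feasible covariance
    hi = min(p1, p2) - p1 * p2               # largest feasible covariance
    if not (lo - 1e-12 <= cov <= hi + 1e-12):
        raise ValueError(f"covariance must lie in [{lo:.4f}, {hi:.4f}]")
    return p1 * p2 + cov


def or_rule_sensitivity(sn1: float, sn2: float, cov_diseased: float = 0.0) -> float:
    """Call positive if either test is positive: a case is missed only if both miss."""
    return 1.0 - joint_prob(1.0 - sn1, 1.0 - sn2, cov_diseased)


def and_rule_specificity(sp1: float, sp2: float, cov_healthy: float = 0.0) -> float:
    """Call positive only if both tests are positive: a false alarm needs two false positives."""
    return 1.0 - joint_prob(1.0 - sp1, 1.0 - sp2, cov_healthy)


# Hypothetical pair of tests with sensitivities 0.9 and 0.8.
independent = or_rule_sensitivity(0.9, 0.8)        # naive estimate
dependent = or_rule_sensitivity(0.9, 0.8, 0.05)    # tests tend to miss the same patients
print(round(independent, 4), round(dependent, 4))
```

Scanning the covariance over its feasible range in this way gives a simple picture of how much dependence a proposed pair can tolerate before it no longer meets a stated performance threshold.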
The main core of the NDM search strategy implemented by CDMS is, by design, random tree searching. In the future, additional options that increase overall utility will be added. These include options for automating parameter sensitivity analysis, and it may also include the capability to conduct some critical aspects of standard decision modeling. We view the integrative framework outlined in  as a very promising direction to implement our strategy so that each tree search result can be the product of multiattribute functions. For the time being, we foresee applications of the CDMS in the search for ways to integrate diverse sources of clinical data in a manner that allows clinicians to weigh in and discuss and debate their rationale for rejecting specific putative potential workflows, and to identify the critical missing types of information required to finalize decisions needed for highly integrative clinical studies.
Decisions to adopt new clinical options for patient diagnosis and treatment usually follow a hierarchy within an organization, and numerous real-life factors are taken into account. We envision that CDMS might provide impetus for the adoption of clinical options that, when considered in isolation, might not be adopted due to these other factors. Decision-makers at the highest levels in medical research institutions are encouraged to adopt CDMS, and to undertake the team-building exercise of decision modeling. NDM makes the process simple, makes all of the details of all of the factors explicit, and, most importantly, can allow clinical research teams to state the problem of adoption in terms of testable hypotheses (e.g., 'the adoption of clinical workflow x will result in a SN of at least 0.8 and SP of at least 0.90 at a per-patient cost of at most $1300US'), where the hypotheses are based on evidence that the critical pairs of clinical options are, in fact, conditionally independent. This type of research might prove more amenable to expediting translational integration than the traditional 1 vs. 1 (option x vs. option y) comparisons.
The CDMS software can be used to study the integration of thousands of clinical options; the scalability is limited only by the RAM of the computer used. If the user wishes to consider topologies that include many clinical options, viewing the entire tree may be problematic. In practice, however, most users will likely restrict their consideration to workflows with a reasonable number of options per tree, even when the number of possible clinical options is very large.
The CDMS software can be used on numerous computational platforms. NDM is a general framework that can be applied to various types of problems in biomedicine, including, for example, integrative diagnostics (as in our use cases), or drug therapy studies when there are multiple choices with conflicting evidence. Modeling the efficacy of various drugs in combination, however, should consider nonlinear dependencies. While CDMS does not yet permit such higher-order dependencies among the clinical options, it could be found useful in helping to focus consideration of alternative combinations of treatments, and their order, considering factors such as cost and accumulated risks associated with negative side-effects.
It should be recalled that any decision modeling exercise, however implemented, can only ever produce hypotheses that must be tested with empirical data. It is our hope that improvements in the integration of biomarkers for the clinical diagnostics for cancer and other debilitating diseases will be found using CDMS, tested via retrospective studies, updated as needed (for example, as Bayesian networks), and most importantly validated via prospective clinical studies where the decision model is the instrument tested, as a workflow.
Project name: Clinical Decision Modeling System
Project home page: http://www.bioinformatics.pitt.edu/software/cdms
Operating system(s): Platform independent. Current version tested on Microsoft Windows XP and 2000.
Programming language: Java
Other requirements: Java JDK or JRE1.5.6 or higher
License: (C) 2007 The University of Pittsburgh, All Rights Reserved.
Any restrictions to use by non-academics: For commercial licensing, contact The University of Pittsburgh Office of Technology Management (Brian Copple or Marc Malandro, Tel: 412-648-2208).
CDMS: Clinical Decision Modeling System
ROC: Receiver Operating Characteristic
NDM: Naïve Decision Modeling
EESN: Emergent Expected Sensitivity
EESP: Emergent Expected Specificity
EOCPP: Expected Overall Cost Per Patient
ACE: Achieved Classification Error
EEACE: Emergent Expected Achieved Classification Error
SEER: Surveillance Epidemiology and End Results
We would like to thank Dr. Milos Hauskrecht, Dr. Roger Day, and Dr. David C. Whitcomb for discussions on decision modeling.
This publication was made possible by Grant Number 1 UL1 RR024153 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH), and NIH Roadmap for Medical Research. Its contents are solely the responsibility of the authors and do not necessarily represent the official view of NCRR or NIH.
Information on NCRR is available at .
Information on Re-engineering the Clinical Research Enterprise can be obtained from .
This study was also partly funded by the University of Pittsburgh Cancer Center Support Grant (NCI-P30-CA047904, 5P30CA47904 (Herberman, PI)) and by the NCI's Lung SPORE grant to the University of Pittsburgh Cancer Institute (P50CA90440; Siegfried, PI). We thank Dr. Ronald Herberman for his support. We would also like to thank Rick Jordan for proofreading the manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.