The underlying algorithms of the CDSS
The group first conducted literature reviews of potential CRC risk factors, CRC risk score calculation, and CRC screening approaches. We adopted the absolute risk score calculation model [8] for the CRC risk score calculation in our CDSS. That model uses population-based case-control studies as source data to train a prediction model that estimates the risk of developing CRC within a given period (e.g., 10 or 20 years). In this model, the projected probability is the absolute risk score, reported with a 95% confidence interval. Eq. 1 summarizes the primary components of the model:
$$ absolute\ risk={f}_1\left( relative\ risk\ parameters\right)+{f}_2\left( age\text{-}specific\ cancer\ hazards\right)+{f}_3\left( attributable\ risks\right) $$
The detailed mathematical model and risk factor coefficients are explained in Freedman’s report [8]. The relative risk parameters are estimated from population-based case-control data. Sample risk factors include the number of relatives with CRC, the patient's physical activity, smoking habit, diet preference, body mass index, and others. The function f1() calculates the relative risk by tumor site: proximal (cecum through transverse colon), distal (splenic flexure, descending, and sigmoid colon), and rectal (rectosigmoid junction and rectum). The function f2() predicts the CRC risk for different ages and risk factor profiles. The function f3() assesses the attributable risks from the case-control data. The baseline age-specific cancer hazards and attributable risks are all estimated from the case-control data. The final CRC absolute risk predicted by this model combines the three site-specific absolute risks (proximal, distal, rectal) with the risks of competing causes of death other than CRC. A SAS macro program that implements the proposed model is publicly available online. This program eased our effort to integrate the absolute risk score calculation model into our CDSS.
In our CDSS implementation, we adopted the 20-year absolute risk score as the projected risk score. We then rescaled the absolute risk reported by Freedman’s model [8] to the range [0, 10] based on the maximum and minimum risk scores. Based on the rescaled scores, CRC risk is classified into three levels, following a previous study [9] by Jane et al. The low, medium, and high risk levels reported by our CDSS correspond to the scaled risk score ranges [0, 3], (3, 7], and (7, 10], respectively. For example, if the rescaled risk score is higher than 7, our CDSS reports a high risk level.
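The rescaling and classification step can be illustrated with a minimal Python sketch. It assumes a linear min-max rescaling, which the text does not specify beyond its use of the maximum and minimum risk scores; the function names `rescale_risk` and `risk_level` are hypothetical and do not come from the CDSS source code.

```python
def rescale_risk(absolute_risk: float, min_risk: float, max_risk: float) -> float:
    """Rescale an absolute risk to [0, 10] (assumed: linear min-max scaling)."""
    if max_risk <= min_risk:
        raise ValueError("max_risk must be greater than min_risk")
    scaled = 10.0 * (absolute_risk - min_risk) / (max_risk - min_risk)
    # Clamp so that inputs outside the observed range still map into [0, 10].
    return min(10.0, max(0.0, scaled))


def risk_level(scaled_score: float) -> str:
    """Map a scaled score to the three levels: [0, 3], (3, 7], (7, 10]."""
    if not 0.0 <= scaled_score <= 10.0:
        raise ValueError("scaled score must lie in [0, 10]")
    if scaled_score <= 3.0:
        return "low"
    if scaled_score <= 7.0:
        return "medium"
    return "high"
```

Under these assumptions, a scaled score of exactly 3 or 7 falls into the lower of the two adjacent levels, matching the closed upper bounds of the intervals given above.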
Our CDSS also recommends CRC screening methods. We gathered information on multiple screening methods that are suitable for the identified CRC risk factors. The details of each screening method, such as its performance complexity and test time interval, are stored in the backend database and used to give patients screening recommendations based on their risk factors. The recommendation algorithm is a simple structured decision tree [13]. For instance, if a patient reports inflammatory bowel disease, the decision tree reports the Fecal Immunochemical Test (FIT) as one of the recommended screening methods because of its low complexity, few side effects, and low cost.
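A decision tree of this kind can be sketched in Python as follows. Only the inflammatory bowel disease rule is taken from the text; the `Node` class, the `recommend` function, the answer key `inflammatory_bowel_disease`, and the empty placeholder leaf are assumptions made for illustration and do not reflect the actual tree or any clinical guidance.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """A tree node: an internal node holds a test, a leaf holds recommendations."""
    test: Optional[Callable[[dict], bool]] = None
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None
    recommendations: list = field(default_factory=list)


def recommend(node: Node, answers: dict) -> list:
    """Walk from the root to a leaf and return that leaf's recommendations."""
    while node.test is not None:
        node = node.if_true if node.test(answers) else node.if_false
    return list(node.recommendations)


# Hypothetical one-level tree encoding only the rule stated in the text.
TREE = Node(
    test=lambda a: bool(a.get("inflammatory_bowel_disease", False)),
    if_true=Node(recommendations=["Fecal Immunochemical Test (FIT)"]),
    # Placeholder leaf: the remaining branches are not described in the text.
    if_false=Node(recommendations=[]),
)
```

For example, `recommend(TREE, {"inflammatory_bowel_disease": True})` returns a list containing the FIT entry, while an empty answer dictionary reaches the placeholder leaf.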
The application framework
Figure 1 illustrates the system infrastructure of the patient-oriented CDSS prototype. It is developed using Django, a Python-based Model-View-Controller (MVC) web application development framework. An MVC framework separates application functionality into three domains. The models describe the data structures of the backend database. The views display application outputs and collect inputs; they can consist of several kinds of files, such as HTML, CSS, and JavaScript. The controllers define the internal logic of the application and are also responsible for data processing [14]. In our CDSS, we use the D3 data visualization package to visualize the risk score data as a bullet chart and to create the interactive dashboard [15].
The backend database
We use MySQL [16] as the backend database. Figure 2 shows the primary data structure. The User table stores information about the CDSS users. When the CDSS is connected to an EHR system, the user information can be mapped to a patient table in the EHR system. The asmt_results table is the main component; it stores the assessments that patients have completed. Since our CDSS gives recommendations on CRC screening methods, the result_scrn_test table serves as a relation table that represents a many-to-many relationship between assessment results and screening methods. All the detailed information on screening methods (e.g., test name, test time interval, and test performance) is kept in the scrn_tests table. The asmt_questionnaire table defines the questionnaire title and theme. Each questionnaire has several sections, each containing questions of a similar type. The asmt_sections table describes the section title and its preferred style. All the questions are stored in the asmt_questions table. Each question has a status attribute, whose value is either active or disabled. This attribute helps the CDSS administrator add or remove questions in each questionnaire easily, making the database design more flexible and extensible. Options for each question are kept in the asmt_options table, with a type attribute that indicates the type of input (such as a radio button or a text input) and a value that indicates the risk score for each risk factor. The full list of attribute descriptions can be found in Additional file 1.
The data structure of our database keeps all the relevant information used by our CDSS. It also allows questions and options in the questionnaire to be changed easily. With this database structure, we have also maintained flexibility for future CDSS upgrades.
The website design
Figure 3 shows the design and workflow of the CRC CDSS website. Each green textbox indicates a webpage in the CDSS; the other boxes describe the content of the pages. Our CDSS prototype has the following components and primary functions. The first is an interactive website with an anonymous scientific questionnaire that collects information about essential CRC risk factors. The questions are designed according to previous studies on CRC risk factors [8]. The second is a user-friendly display module that presents the risk scores calculated from the input risk information. The third, an innovative component, is an interactive visualization dashboard that shows users how changing lifestyle habits and diet preferences would affect their CRC risk level. The visualization is personalized based on the user's answers to the survey questions. The fourth is a CDSS module that provides individualized recommendations on screening methods based on survey results and risk scores. The fifth is a system for scheduling appointments with CRC providers based on user preferences for doctor characteristics and geographical location. Last, the CDSS provides educational information on CRC preventive care.