Clinical software development for the Web: lessons learned from the BOADICEA project
© Cunningham et al; licensee BioMed Central Ltd. 2012
Received: 7 July 2011
Accepted: 27 March 2012
Published: 10 April 2012
In the past 20 years, society has witnessed the following landmark scientific advances: (i) the sequencing of the human genome, (ii) the distribution of software by the open source movement, and (iii) the invention of the World Wide Web. Together, these advances have provided a new impetus for clinical software development: developers now translate the products of human genomic research into clinical software tools; they use open-source programs to build them; and they use the Web to deliver them. Whilst this open-source component-based approach has undoubtedly made clinical software development easier, clinical software projects are still hampered by problems that traditionally accompany the software process. This study describes the development of the BOADICEA Web Application, a computer program used by clinical geneticists to assess risks to patients with a family history of breast and ovarian cancer. The key challenge of the BOADICEA Web Application project was to deliver a program that was safe, secure and easy for healthcare professionals to use. We focus on the software process, problems faced, and lessons learned. Our key objectives are: (i) to highlight key clinical software development issues; (ii) to demonstrate how software engineering tools and techniques can facilitate clinical software development for the benefit of individuals who lack software engineering expertise; and (iii) to provide a clinical software development case report that can be used as a basis for discussion at the start of future projects.
We developed the BOADICEA Web Application using an evolutionary software process. Our approach to Web implementation was conservative and we used conventional software engineering tools and techniques. The principal software development activities were: requirements, design, implementation, testing, documentation and maintenance. The BOADICEA Web Application has now been widely adopted by clinical geneticists and researchers. BOADICEA Web Application version 1 was released for general use in November 2007. By May 2010, we had more than 1200 registered users based in the UK, USA, Canada, South America, Europe, Africa, Middle East, SE Asia, Australia and New Zealand.
We found that an evolutionary software process was effective when we developed the BOADICEA Web Application. The key clinical software development issues identified during the BOADICEA Web Application project were: software reliability, Web security, clinical data protection and user feedback.
In the past 20 years, society has witnessed the following landmark advances in the biological and computer sciences: (i) the sequencing of the human genome, (ii) the development and distribution of software by the open source movement, and (iii) the invention of the World Wide Web. The sequencing of the human genome [1, 2] was a defining achievement of late 20th century science which had profound implications for our understanding of the genetic risk of disease. The work of the open source movement may be viewed as a cultural, scientific and computing phenomenon which has had innumerable benefits for science and society. Similarly, the invention of the Web server, the implementation of Common Gateway Interface (CGI) programs, and the widespread adoption of open technological standards such as the Extensible Markup Language have revolutionised the distribution and processing of digital information. Together, these advances have provided a new impetus for clinical software (CS) development: software developers now translate the products of human genomic research into CS tools; they use open-source programs to build them; and they use the Web to deliver them. These activities form the basis of many translational research projects. Whilst this open-source component-based approach has undoubtedly made CS development easier, CS projects are still hampered by long-standing problems that traditionally accompany the software process.
This study describes the development of the BOADICEA Web Application (BWA), a computer program used by clinical geneticists to assess risks to patients with a family history of breast and ovarian cancer. The key challenge of the BWA project was to deliver a program within an acceptable timeframe that was safe, secure and easy for healthcare professionals to use. We focus on the BWA software process, problems faced, and lessons learned, so that other software developers can learn from our experience. Many problems described here are widespread in the software industry, and so our discussion has implications for other areas of software development.
The BWA project began in January 2005. By that time, some alternative genetic risk models had already been implemented as desktop applications (e.g. the BRCAPRO and Claus models implemented in CancerGene, and the Tyrer-Cuzick model implemented in IBIS). However, the BWA was the first program of its kind to be made available on the Web.
Our key objectives are: (i) to highlight key CS development issues; (ii) to demonstrate how software engineering tools and techniques can facilitate CS development for the benefit of individuals who lack software engineering expertise; and (iii) to provide a CS development case report that can be used as a basis for discussion at the start of future projects.
In this section, we describe the BOADICEA model, and the BWA project and software process.
The Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA) [9, 10] is a risk model for familial breast and ovarian cancer. The model can be used to compute BRCA1 and BRCA2 mutation carrier probabilities and age-specific risks of developing breast and ovarian cancer, using explicit family history data (pedigrees), age information, cancer diagnoses in family members, and BRCA1 and BRCA2 genetic testing information. The algorithm was developed using complex segregation analysis of breast and ovarian cancer based on a combination of families identified through population-based studies of breast cancer, and families with multiple affected individuals who had been screened for BRCA1 and BRCA2 mutations. BOADICEA models the simultaneous effects of BRCA1 and BRCA2 mutations and assumes that the residual familial clustering of breast cancer is explained by a polygenic component (a large number of genes each of small effect), with a variance that decreases linearly with age. Individuals are assumed to follow calendar-period- and cohort-specific incidence rates for breast and ovarian cancer. The BOADICEA model has been assessed using data from UK genetics clinics and has been found to be well calibrated within these families and to discriminate well between BRCA1, BRCA2 and non-mutation carriers.
BOADICEA was originally implemented as a standalone Fortran program (termed here the BOADICEA core program, BCP). The BCP has been used by scientists for some years as a research tool. However, in practice, computing risks with the BCP was difficult and time consuming, which made it inappropriate for use in a clinical setting. To address this problem, we developed the BWA, a user-friendly Web interface to the BCP which has greatly simplified this process. BWA v1 was released for general use 2 years and 10 months after the start of the project. The BWA project milestones were as follows: (i) BWA project begins (January 2005); (ii) BWA v1 trial software released for evaluation and testing (January 2007); (iii) BWA v1 released for general use (November 2007); (iv) BWA v2 trial software released for evaluation and testing (November 2009); and (v) BWA v2 released for general use (August 2010). This work took longer than anticipated, but there were important mitigating factors: (i) none of our team had any prior experience of CS development for the Web; (ii) our list of requirements was substantial (see Requirements section below); and (iii) the development of the online pedigree building module was particularly difficult and time consuming (see Design section below).
At the start of the BWA project, we established the functional requirements of the program. In particular, the program would enable the user to: (i) build new pedigree data sets quickly and flexibly online; (ii) upload pre-existing pedigree data sets; (iii) review the pedigree data set in a table and drawing; (iv) compute BRCA1 and BRCA2 mutation carrier probabilities and breast/ovarian cancer risks; (v) view the computed results in data tables; and (vi) download the input pedigree data set and computed results in plain text and PDF format. These requirements were set out in a software requirements specification document with mock-ups of the Web interface.
The software requirements specification document ensured that requirements were properly understood, and that problematic design issues were considered from the outset. Clinical geneticists were involved in the requirements process from the start of the BWA project. Their advice helped to ensure as far as possible that the program would be easy to use in a clinical setting. We believe that the involvement of end users in the requirements process was key to the success of the project. Owing to the complexity of the proposed software, we could not anticipate all program behaviours at the outset. As a result, Web interface designs evolved substantially as the project progressed. Since the release of BWA v1, software development has been driven by two further ongoing requirements: (i) the need to accommodate extensions to the BOADICEA model; and (ii) the need to implement suggestions for improvement made by users.
Our main objectives were to design a Web interface that would: (i) have a look and feel that would seem familiar to clinical geneticists; (ii) make risk calculations quick and easy; (iii) reduce data errors by constraining user inputs; (iv) function intuitively and respond intelligently to user inputs; (v) have a clear and consistent layout; (vi) have unambiguous data prompts; (vii) minimise keystrokes and enable users to set defaults automatically; and (viii) help clinical geneticists to communicate results to patients (e.g. by providing a processing report PDF). The Web interface was also designed to process cached Web pages intelligently.
Our original aim was to design a Web interface that would make BOADICEA easy to use for clinical geneticists, researchers and members of the public. However, these individuals represented a wide spectrum of users with differing prior knowledge and expertise. As a result, we found it difficult to design a Web interface that could accommodate the diverse needs of all these different user groups. One alternative would have been to implement different modes of operation e.g. 'clinician mode' and 'public mode'. Patients could have used a simplified 'public mode' interface to build a preliminary pedigree data set prior to a consultation with a clinical geneticist. This would have helped to shift some of the burden of data entry from the counsellor to the counselee. However, time and funding constraints prevented us from designing two separate modes of operation. As a result, our main aim was to design a program that would be easy to use in a clinical setting.
The online pedigree building module was intended to enable users to build pedigrees quickly and easily. However, in practice, the module delivered with BWA v1 was difficult and time consuming to use. This problem was principally one of Web interface design. Work on this module was problematic because: (i) it included pedigree building functions that were previously unavailable in comparable Web-based programs, and so required a novel design solution; and (ii) we were hampered in this task by our lack of prior Web programming experience. After further development, the pedigree building module delivered with BWA v2 did fulfil all initial requirements. This was made possible by refinements to the design of the interface. Our experience of this task was consistent with the observation that 'the incompletenesses and inconsistencies of our ideas become clear only during implementation', or that 'you often don't really understand the problem until after the first time you implement a solution'.
We implemented the BWA CGI module in Perl. We also made extensive use of open-source technologies including the Linux operating system, Apache Web server, MySQL database, Perl interpreter, gfortran compiler, R statistical computing package, Kinship pedigree drawing package, ImageMagick image processing package, html2ps document converter, Concurrent Versions System version control package and XEmacs programming editor.
We used a bottom-up development scheme: we developed low level modules first, and then combined them to form higher level ones. We also used the following defensive programming techniques: (i) we used assertions; (ii) we checked the values of input from external sources; (iii) we handled exceptions gracefully; (iv) we implemented subroutines with low coupling to try to contain the damage caused by errors; and (v) we checked function return values. The source code included hundreds of assertions to check for data processing errors. When an assertion failed during program execution, we modified the code to ensure that it would not fail again in the same way. The assertions also made it easier to identify new defects introduced during source code modifications.
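To make three of these techniques concrete, the following is a minimal Python sketch; it is not BWA source code (the BWA was written in Perl). It assumes a hypothetical core program run as an external process, and the names `check_probability`, `run_core_program` and `compute_or_report` are illustrative only. It shows an assertion on a computed value, a check on an external program's return value, and graceful handling of a failure.

```python
import logging
import subprocess
from typing import Optional

logger = logging.getLogger("risk_app")  # hypothetical logger name


def check_probability(name: str, p: float) -> float:
    """Assertion: a computed carrier probability or risk must lie in [0, 1]."""
    assert 0.0 <= p <= 1.0, f"{name}={p!r} is not a valid probability"
    return p


def run_core_program(args: list[str]) -> str:
    """Run an external program and check its exit status instead of assuming success."""
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{args[0]} exited with status {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


def compute_or_report(args: list[str]) -> tuple[Optional[str], str]:
    """Handle failures gracefully: return (output, "") or (None, message for the user)."""
    try:
        return run_core_program(args), ""
    except (OSError, RuntimeError) as exc:
        # Record the technical details for the developers...
        logger.error("core program failed: %s", exc)
        # ...but show the user only a short, non-technical message.
        return None, "The risk calculation could not be completed. Please try again later."
```

Converting a failure into a short message, rather than letting an exception propagate to the Web server, means the user sees a controlled error page while the details are kept in the server log.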
The aim of software testing was to find and eliminate defects (faults that failed on execution). Before the BWA was released, we ran an initial set of in-house tests and applied defect fixes and design updates on the basis of the test results. This process was iterative. After these initial tests, we wrote the accompanying documentation and released the BWA v1 trial software for evaluation by users (users were informed of its trial status). During this period of evaluation, we ran a further set of in-house tests. Once this further phase of in-house testing was complete and we had implemented the final defect fixes, we released BWA v1 for general use.
In-house software tests were planned and executed systematically, and test results were documented. We prioritised tests to exercise modules that had the potential to generate the most harmful defects. Test cases were designed to uncover specific classes of defect, and to ensure that data were displayed consistently. The most serious defects were uncovered by in-house tests. Some of these defects were due to simple semantic errors, or to errors of omission. This observation is sobering, but it illustrates how easily software faults can be overlooked in a program comprising several tens of thousands of lines of source code.
We considered software reliability to be the most important issue in BWA development. Consequently, nearly 30% of our development time was devoted to testing. We found that in-house tests and user evaluations were complementary as these activities were based on internal and external views of the program respectively. During the software evaluations, users provided essential feedback. As users developed a better understanding of the program, so we developed a better understanding of how they were using it. In this way, user feedback formed a vital part of the software process. Nevertheless, we recognised that users could identify only a subset of defects because: (i) user evaluations were not systematic; and (ii) users were likely to identify a defect only if a computed result (or software behaviour) differed significantly from their own expectation.
The principal aims of software maintenance were to implement defect fixes and enhancements to the BWA, and to support the BWA user community. Web deployment made some maintenance tasks easier because: (i) we could update program executables on the server easily; (ii) we did not have to maintain different software builds for different operating systems; and (iii) we could log user activity on the server.
The installation of the BWA v2 trial software revealed a potential software maintenance pitfall. To implement this new software, we had to upgrade several open-source components (e.g. Perl, R and MySQL). However, we found that some of these updated components were incompatible with BWA v1. At that time, we already had more than 900 users, and it was essential that these upgrades did not interrupt the service. To address this problem, we set up a separate Linux test server and ran a trial software installation to ensure that both BWA v1 and v2 worked together with the updated components. Once we had resolved the incompatibilities and the trial software installation was successful, we could then perform the same installation on our production server without problems.
Users can also upload a pre-existing pedigree data set for processing. To do this, the user must first create a pedigree data file in the BOADICEA import/export digital data format (a simple plain text format designed to facilitate pedigree data exchange, described in Appendix A of the user guide). Some commercial pedigree data management/drawing programs such as Progeny, Clinical Pedigree and PED now include a BOADICEA data file export utility which has helped to widen access to BOADICEA further.
When the input pedigree data set is complete, the user can review it prior to running a risk calculation. The 'Pedigree Table View' Web page (Figure 4b) lists details of all family members in a data table. Similarly, the user can draw the pedigree using the Kinship package implemented in the R environment (Figure 4c).
Once the input pedigree data set has been finalised, the user can compute risks at the push of a button. The BWA returns BRCA1 and BRCA2 mutation carrier probabilities and age-specific breast/ovarian cancer risks in a 'Computed Results' Web page (Figure 4d). The user can also adjust key model data parameters such as the population BRCA1 and BRCA2 mutation frequencies and mutation search sensitivities, so that the risk calculation can be tailored to different populations and genetic testing methods.
Once BOADICEA risks have been computed, the user can download the input pedigree data set and results in a plain text data file and processing report PDF. The PDF is intended to help genetic counsellors communicate risks to patients. Whilst the PDF includes tables of BRCA1 and BRCA2 mutation carrier probabilities and breast/ovarian cancer risks, the presentation of these data could be refined to facilitate communication further (e.g. breast and ovarian cancer risks could be plotted on a graph with equivalent population risks to aid their interpretation in a wider context). Ongoing research projects aimed at communicating risks effectively will inform further improvement of risk reporting.
Downloading the pedigree data set and storing it locally ensures that it can be updated at a later date. Early in the BWA project, we considered the possibility of storing pedigree data sets on the BWA system. This would have enabled users to access their data from any location which would have made the program easier to use. However, this solution would also have conflicted with our requirement to conform to data protection principles (described in the Clinical data protection section below). Hence, the existing data storage solution may be viewed as a compromise intended to meet competing requirements for software usability and clinical data protection. When the user logs out, all data files created during the session are deleted from the server. Alternatively, if the user does not log out, the data files will be deleted after a period of 24 hours when the session times out.
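The deletion policy described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the BWA implementation: it assumes (hypothetically) that each session's files live in their own directory under a common root, and that the application updates that directory's modification time on every request so that it reflects the time of last activity. The function names are illustrative.

```python
import shutil
import time
from pathlib import Path
from typing import Optional

SESSION_TIMEOUT_SECONDS = 24 * 60 * 60  # 24 hours, as described in the text


def delete_session_data(session_dir: Path) -> None:
    """On logout: delete every file created during the session."""
    shutil.rmtree(session_dir, ignore_errors=True)


def purge_expired_sessions(root: Path, now: Optional[float] = None) -> list[str]:
    """Delete session directories idle for longer than the timeout.

    Assumes the application updates each session directory's modification
    time on every request, so st_mtime approximates the time of last activity.
    Returns the names of the directories removed, sorted.
    """
    now = time.time() if now is None else now
    removed = []
    for session_dir in root.iterdir():
        idle = now - session_dir.stat().st_mtime
        if session_dir.is_dir() and idle > SESSION_TIMEOUT_SECONDS:
            shutil.rmtree(session_dir)
            removed.append(session_dir.name)
    return sorted(removed)
```

A function such as `purge_expired_sessions` would typically be run periodically by a scheduled job, so that data from abandoned sessions are removed even if the user never returns.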
The BWA has now made it much easier for clinical geneticists to run BOADICEA risk calculations in a clinical setting. As we develop the BWA further, we hope to achieve a better integration between the program and the wider genetic counselling workflow.
We found that an evolutionary software process was effective when we developed the BWA. The key CS development issues identified during the BWA project were: software reliability, Web security, clinical data protection and user feedback. These issues are described below.
In our view, software reliability is the most important issue in CS development. Clinical geneticists use the BWA to calculate BRCA1 and BRCA2 mutation carrier probabilities and breast/ovarian cancer risks, which are used to determine eligibility for genetic testing or to provide individually tailored clinical management. As a result, defects in the program could have implications for the management of patients.
Software defects cause programs to function unreliably. Testing can help to find defects in substantial computer programs, but it is impossible to eliminate them completely (e.g. Myers noted that 'in general, it is impractical, often impossible, to find all the errors in a program'). Hence, we have to accept that some defects are likely to remain undetected. McConnell suggested that the 'industry average is about 15 to 50 errors per 1000 lines of code for delivered software'. BWA v1 included approximately 40,000 lines of code; at that industry average, a delivered program of this size would contain roughly 600 to 2000 errors. Consequently, we took steps to ensure as far as possible that the program functioned reliably before it was released for general use. In-house tests revealed the most serious defects. Less serious defects were reported by users during the software evaluations. As part of the software process, we enforced in-house coding rules and used defensive programming techniques. We believe that these measures helped to prevent the introduction of faults and to make the software more reliable.
Web security is an additional challenge in CS development. All computers connected to the Internet are at risk of malicious attack, and Web servers running CGI applications are particularly vulnerable. Consequently, we took steps to minimise the effects of a malicious attack should one occur. To help secure the BWA server, we ensured that: (i) clinical data were always stored outside the document root; (ii) appropriate permissions were set on program files; (iii) a firewall was in place; (iv) the operating system was kept patched; (v) non-essential services were shut down; (vi) data transmissions were encrypted; (vii) Web form data were validated on the server before use; (viii) Perl components ran in taint mode; and (ix) computers that were detected probing the server were subject to access restrictions with Fail2ban. The Cambridge University Computing Service also probed the server periodically to try to identify security vulnerabilities.
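Two of these measures, validating Web form data on the server and keeping clinical data outside the document root, can be sketched in Python as follows. Python has no equivalent of Perl's taint mode, so this sketch instead checks every submitted field against an explicit allowlist before use. The field names, patterns and directory paths are hypothetical and are not taken from the BWA.

```python
import re
from pathlib import Path

# Hypothetical allowlist: each accepted field and the pattern its whole value must match.
FIELD_PATTERNS = {
    "age": re.compile(r"[0-9]{1,3}"),
    "sex": re.compile(r"[MF]"),
    "first_name": re.compile(r"[A-Za-z' -]{0,40}"),
}

DOCUMENT_ROOT = Path("/var/www/html")   # hypothetical: files here can be served by the Web server
DATA_ROOT = Path("/srv/risk-app-data")  # hypothetical: clinical data kept outside the document root


def validate_form(form: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Validate form data on the server; return (accepted fields, error messages)."""
    accepted, errors = {}, []
    for name, value in form.items():
        pattern = FIELD_PATTERNS.get(name)
        if pattern is None:
            errors.append(f"unexpected field: {name}")
        elif not pattern.fullmatch(value):
            errors.append(f"invalid value for field: {name}")
        else:
            accepted[name] = value
    return accepted, errors


def is_outside_document_root(path: Path) -> bool:
    """True if the path, after resolving '..' and symlinks, is not under DOCUMENT_ROOT."""
    resolved = path.resolve()
    doc_root = DOCUMENT_ROOT.resolve()
    return doc_root != resolved and doc_root not in resolved.parents
```

An allowlist (accepting only values known to be valid) is generally safer than a blocklist of known-bad characters, because it does not depend on anticipating every form of malicious input. Unknown fields are reported rather than silently passed through.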
Clinical data protection is a further key issue in CS development. If healthcare professionals believe that a Web-based program may put patient data at risk, they will not use it. As a result, we took steps to conform as far as possible to the data protection principles set out in Schedule one to the Data Protection Act 1998: (i) the program does not use data items that are regarded as strong patient identifiers (in particular, the program does not use full name [only first name is used which is optional], address, date of birth, postcode, NHS number or local patient identifier); (ii) we collected only the minimum data required to enable users to compute and interpret risks; and (iii) data were deleted when the user logged out, or after a period of 24 hours when the session timed out.
It is important to consider whether the identity of an individual could be inferred from the data submitted by users. The BWA prompts for only three of the 14 key items of patient-identifiable information set out in Appendix 7 of the Caldicott Committee Report: forename (which is optional), sex and ethnic group. The Caldicott Committee Report states that 'an individual item from this list, taken with another item from a particular flow, may in certain circumstances enable identity to be inferred', and it gives as an example 'age linked to a diagnosis'. Since BWA users supply details of age and cancer diagnosis, it may be possible to infer an individual's identity from these data, or possibly from other combinations of data. However, in practice, the ease with which this can be accomplished depends on the nature of the data items and the context of their disclosure.
We also configured the BWA so that a user could view only the data submitted during his/her current session. As a result, if an unauthorised person were to use John Smith's username and password to access the system, any data sets submitted by John Smith in a concurrent session would remain inaccessible to that person, because each session's data are stored separately.
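Per-session isolation of this kind can be sketched in Python as follows, assuming (hypothetically) that each login creates a new session identified by a random token, and that all files for that session are stored in a directory named after the token. `SessionStore` and its methods are illustrative names, not the BWA's design.

```python
import secrets
from pathlib import Path

_HEX = set("0123456789abcdef")


class SessionStore:
    """Hypothetical per-session storage: every login gets its own private directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def new_session(self) -> str:
        """Create a session identified by an unguessable 128-bit random token."""
        token = secrets.token_hex(16)
        (self.root / token).mkdir(mode=0o700)
        return token

    def _session_dir(self, token: str) -> Path:
        # Only well-formed tokens are accepted, so a token cannot smuggle in a path such as "../x".
        if len(token) != 32 or not set(token) <= _HEX:
            raise PermissionError("invalid session token")
        session_dir = self.root / token
        if not session_dir.is_dir():
            raise PermissionError("unknown or expired session")
        return session_dir

    def save(self, token: str, filename: str, text: str) -> None:
        """Write a file into this session's directory only."""
        if filename in {"", ".", ".."} or Path(filename).name != filename:
            raise ValueError("invalid file name")
        (self._session_dir(token) / filename).write_text(text)

    def list_files(self, token: str) -> list[str]:
        """List only the files belonging to this session."""
        return sorted(p.name for p in self._session_dir(token).iterdir())
```

Because the session token, rather than the username, determines where data are stored, two logins with the same credentials receive different tokens and cannot see each other's files.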
As soon as we released the BWA v1 trial software, we urged users to report computed results or software behaviours that differed significantly from their own expectation. Whenever a user queried a result, we first attempted to replicate their result using the BCP alone to determine whether the problem was associated with the BCP or the Web interface (results generated with the BCP had been compared with data from UK genetics clinics, but we were aware that the BCP could still include defects). We then sought to explain why the user's result failed to meet expectation. In most cases, users reported problems: (i) when the BOADICEA model behaved in a way that was unexpected to them (e.g. users were sometimes surprised by the effect that different mutation search sensitivities had on computed risks); and (ii) when key input data parameters were excluded unintentionally from the risk calculation. As a result, we learned that it was important to inform users of some specific BOADICEA model behaviours and key input data requirements. In this way, user inquiries formed the basis of many frequently asked questions on the project Web site. By verifying user results, we gained further confidence that the BWA was functioning reliably.
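The first triage step, comparing the result produced through the Web interface with a rerun of the BCP alone, amounts to a simple differential check. A minimal Python sketch, assuming both runs have already been parsed into dictionaries that map hypothetical output names (e.g. "BRCA1 carrier probability") to numerical values; the function names are illustrative:

```python
import math


def compare_results(core: dict[str, float], web: dict[str, float],
                    rel_tol: float = 1e-6) -> list[str]:
    """Return the names of outputs that are missing from one run or differ beyond tolerance."""
    mismatches = []
    for name in sorted(core.keys() | web.keys()):
        if name not in core or name not in web:
            mismatches.append(name)
        elif not math.isclose(core[name], web[name], rel_tol=rel_tol, abs_tol=1e-12):
            mismatches.append(name)
    return mismatches


def triage(core: dict[str, float], web: dict[str, float]) -> str:
    """Decide where to look first when a user queries a result."""
    if compare_results(core, web):
        return "core program and Web interface disagree: investigate the Web interface"
    return "core program reproduces the result: examine model behaviour, input data or the core program"
```

A tolerance is used rather than exact equality on the assumption that results passing through an interface may be rounded or reformatted for display; in practice it should match the precision at which results are reported.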
We found that an evolutionary software process was effective when we developed the BWA. Our approach to Web implementation was conservative and we used conventional software engineering tools and techniques. The software process had to be sufficiently flexible to accommodate evolving Web interface designs. The key CS development issues identified during the BWA project were: software reliability, Web security, clinical data protection and user feedback.
The software requirements specification document ensured that requirements were properly understood, and that problematic design issues were considered from the outset.
We were unable to design a Web interface that would fully accommodate the diverse needs of clinical geneticists, researchers and members of the public. Our main aim was to design a program that would be easy to use in a clinical setting.
We had to devise a novel design solution for the online pedigree building module.
We implemented the BWA CGI module in Perl and made extensive use of open-source technologies.
In-house software tests were planned and executed systematically, and test results were documented. We prioritised tests to exercise modules that had the potential to generate the most harmful defects. The most serious defects were uncovered by in-house tests.
In our view, software reliability was the most important issue in CS development. Software testing helped us to find defects, but we had to accept that some defects were likely to remain undetected.
We took the following steps to improve software reliability: (i) we enforced in-house coding rules; (ii) we used defensive programming techniques; (iii) we tested the software extensively; and (iv) we urged users to report computed results and software behaviours that differed substantially from their own expectation.
We found that in-house tests and user evaluations were complementary as these activities were based on internal and external views of the program respectively. As users developed a better understanding of the program, so we developed a better understanding of how they were using it.
We took steps to minimise the effect of a malicious attack on the BWA server should one occur.
We took steps to conform as far as possible to the data protection principles set out in Schedule one to the Data Protection Act 1998.
We learned that it was important to inform users of some specific BOADICEA model behaviours and key input data requirements.
User feedback was extremely important to us and it helped to shape the software process. User inquiries formed the basis of many frequently asked questions on the project Web site. By verifying computed results queried by users, we gained further confidence that the BWA was functioning reliably.
Project name: BOADICEA
Project home page: [http://www.srl.cam.ac.uk/genepi/boadicea/boadicea_home.html]
Operating system: The BWA is implemented on an Ubuntu Linux computer. Users access the software via the Web.
Other requirements: Modern Web browser with active scripting enabled
CGI: Common Gateway Interface
BWA: BOADICEA Web Application
BOADICEA: Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm
BCP: BOADICEA core program
This work was supported by Cancer Research UK grants C12292/A11174 and C1287/A10118. ACA is a Cancer Research UK Senior Cancer Research Fellow and DFE is a Cancer Research UK Principal Research Fellow. We are grateful to the many BWA users who provided valuable advice and feedback during the course of the project. Stephen Miller implemented the BOADICEA data file export utility for Progeny. Cyril Chapman implemented the BOADICEA data file export utility for Clinical Pedigree. Hansjoerg Plendl implemented the BOADICEA data file export utility for PED. Adjoa Tamakloe drafted the BWA software license agreement. The following individuals provided advice on data collection and processing: Lisa Walker, Jonathan Morrison, Paul Pharoah, Lesley McGuffog, Richard Hardy, Adam Dickinson, Helen Field, Robert Luben, Susan Peock, Christiana Kartsonaki, Mitul Shah, Nasim Mavaddat, Andrew Lee, Jing Hua Zhao and Terry Therneau. We thank David Euhus, Eleni Kaldoudi and Cyril Chapman for their constructive reviews.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.