Generating prior probabilities for classifiers of brain tumours using belief networks
BMC Medical Informatics and Decision Making volume 7, Article number: 27 (2007)
Numerous methods for classifying brain tumours based on magnetic resonance spectra and imaging have been presented in the last 15 years. Generally, these methods use supervised machine learning to develop a classifier from a database of cases for which the diagnosis is already known. However, little has been published on developing classifiers based on mixed modalities, e.g. combining imaging information with spectroscopy. In this work a method of generating probabilities of tumour class from anatomical location is presented.
The method of "belief networks" is introduced as a means of generating probabilities that a tumour is any given type. The belief networks are constructed using a database of paediatric tumour cases consisting of data collected over five decades; the problems associated with using this data are discussed. To verify the usefulness of the networks, an application of the method is presented in which prior probabilities were generated and combined with a classification of tumours based solely on MRS data.
Belief networks were constructed from a database of over 1300 cases. These can be used to generate a probability that a tumour is any given type. Networks are presented for astrocytoma grades I and II, astrocytoma grades III and IV, ependymoma, pineoblastoma, primitive neuroectodermal tumour (PNET), germinoma, medulloblastoma, craniopharyngioma and a group representing rare tumours, "other". Using the network to generate prior probabilities for classification improves the accuracy when compared with generating prior probabilities based on class prevalence.
Bayesian belief networks are a simple way of using discrete clinical information to generate probabilities usable in classification. The belief network method can be robust to incomplete datasets. Inclusion of a priori knowledge is an effective way of improving classification of brain tumours by non-invasive methods.
The current "gold standard" for brain tumour diagnosis is histopathology, which requires a sample of tumour obtained at operation. These operations carry an inherent risk of morbidity and mortality. Magnetic Resonance Imaging (MRI), Magnetic Resonance Spectroscopy (MRS) and other imaging modalities may offer a non-invasive way of making a diagnosis, but no method has yet attained sufficient accuracy to replace histopathology. MRS in particular has been shown to provide useful information about the biochemical content of brain tumours [1], and numerous methods for classifying brain tumours based on magnetic resonance spectra have been presented [2–6].
When making a classification decision it is intuitively sensible to use as much relevant information as possible, but very few of the published classifiers have attempted to combine information from different modalities and sources (but see [7–10]). This work details a method which uses data from the West Midlands Regional Childhood Tumour Registry (WMRCTR) to produce probabilities of brain tumour class, given its anatomical location.
The WMRCTR provides data from the last five decades on over 1700 childhood cancer patients, mostly in free-text form. During that period the format of the stored data has changed: knowledge of the exact anatomical location has improved with the advent of MRI, and the classification scheme for tumours has changed to the WHO system [11]. This presents a considerable challenge to its use in computer-based systems.
The discriminating power of "anatomical location" as a feature for a classifier is not sufficient to make classifications based on this variable alone. However, it is envisaged that the probabilities obtained from the WMRCTR data could be used as "informative priors" in existing classification methods. In this work we demonstrate their impact on a simple MRS-based classifier. It is worth emphasising that this work focusses on paediatric brain tumours, which are significantly more varied and more difficult to diagnose using MRI alone than those in adults.
The approach to using the WMRCTR data is based on a graphical representation of Bayesian inference called belief networks. Since anatomical location and tumour class are discrete random variables, probabilities can be estimated directly from the data, without the need to rely on assumptions about the form of probability density functions. In the following sections we introduce the belief network method, present some examples and discuss the construction of the final network from the data in the WMRCTR. Finally, the network is presented and demonstrated on some test-cases.
A Bayesian belief network, often just called a belief network, is a graphical representation of the joint probability distribution function of a collection of variables [12, 13]. A belief network makes exactly the same inferences as would be made by applying Bayes' rule to a series of probabilities, but the graphical construction often provides insight into the problem. The network is represented as a weighted, acyclic, directed graph, each vertex representing a discrete variable/event (see Figure 1). To use the terminology of Russell and Norvig [12], each of these vertices falls into one of three categories:
1. query variables, i.e. events the probability of which is of interest (in Figure 1 these are "medulloblastoma" and "astrocytoma");
2. evidence variables, i.e. events known to have occurred (in Figure 1 these are taken to be "posterior fossa" and "supratentorial");
3. hidden variables, i.e. events which may occur but cannot be measured (in Figure 1 these are taken to be "IV ventricle" and "cerebellum").
The weighted edges connecting vertices represent the probability that the target vertex is true, conditioned on the source vertex. Here, vertices represent anatomical locations or tumour types. If an edge connects two anatomical locations then its weight is the probability that the tumour was in the target vertex, given that it is known to be in the source vertex. If an edge connects an anatomical location to a tumour type then its weight is the probability that the tumour is of the type specified by the target vertex, given that it is known to have occurred in the source vertex.
To illustrate the utility of the method, consider the following example. Referring to Figure 1, suppose it is known that the tumour occurs in the posterior fossa (vertex $v_1$) and the probability that the tumour is a medulloblastoma is sought. Working backward from vertex $v_5$ (the medulloblastoma) and applying Bayes' rule:
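As a sketch of the form such an expression takes, assume (this labelling is an assumption, not read from Figure 1) that $v_3$ is the IV ventricle and $v_4$ the cerebellum, and that these are the only two vertices through which $v_1$ reaches $v_5$. Summing over the two hidden vertices would then give:

```latex
P(v_5 \mid v_1) = P(v_5 \mid v_3)\,P(v_3 \mid v_1) + P(v_5 \mid v_4)\,P(v_4 \mid v_1)
```

Each factor on the right-hand side is the weight of a single edge, so under these assumptions the probability sought is the sum, over all paths from $v_1$ to $v_5$, of the product of the edge weights along each path.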
Of course, if it were known that the tumour occurred in the IV ventricle, then the expression could have stopped there and the probability been obtained by inspection; hidden variables may therefore sometimes be evidence variables, depending on the particular sample. The important point is that information of different "resolution" can be used. This is particularly important if a tumour spans several regions, as will be discussed later. It also means that data of lower resolution can be incorporated into the network; for example, many tumours in the WMRCTR are just listed as having location "posterior fossa". Working back from the query variables is relatively complicated to implement. An easier and equivalent way is to work forward from the evidence variables. For this purpose it is convenient to represent the graph as an adjacency matrix; for the example in Figure 1 this matrix is denoted $A$.
Each element $a_{ij}$ of $A$ refers to the weighted connection from vertex $v_i$ to $v_j$, i.e. the row index refers to the source vertex and the column index to the target.
The adjacency matrix representation permits easy calculation of the vector of class probabilities, given knowledge of which evidence variable to use. To find the probabilities of each class given any evidence variable, the following procedure is used (more computationally efficient methods exist):
1. Construct the n-dimensional column vector x where n is the number of vertices (variables) and set all the elements to zero.
2. Set the single element of x that corresponds to the evidence variable known to be true to one.
3. Compute $x \leftarrow A^{T} x$ until x stops changing. At every iteration, those vertices connected to vertices with non-zero entries in x will become non-zero.
4. Those elements of x corresponding to the query (output) variables hold the probability that the tumour belongs to each class, given the evidence. All other elements of x will be zero.
Clearly, the axioms of probability require that the sum of all elements in x is unity. It is important to note that the terminating vertices ($v_5$ and $v_6$ in Figure 1) need to be connected to themselves so that the method just described converges to the correct value. If these self-connections are not present, x will converge to the zero vector. As well as giving probabilities of class membership given a single location, the method can handle tumours that span adjacent anatomical regions: for every region in which the tumour is present, compute the output vector, then average these to produce the final vector of probabilities. Although this is an intuitively sensible property, it has not been evaluated in this work.
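The forward procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the six-vertex layout, the vertex ordering and all edge weights in `A` are invented for the example, and the function names `propagate` and `average_over_regions` are hypothetical.

```python
def propagate(adjacency, evidence_index, max_iter=1000, tol=1e-12):
    """Iterate x <- A^T x from a one-hot evidence vector until x stops changing."""
    n = len(adjacency)
    x = [0.0] * n
    x[evidence_index] = 1.0  # step 2: mark the known evidence variable
    for _ in range(max_iter):
        # step 3: (A^T x)_j = sum_i a_ij * x_i, with rows as sources, columns as targets
        new_x = [sum(adjacency[i][j] * x[i] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_x, x)) <= tol:
            return new_x
        x = new_x
    raise RuntimeError("propagation did not converge")


def average_over_regions(adjacency, evidence_indices):
    """Average the output vectors obtained for each region the tumour spans."""
    outputs = [propagate(adjacency, i) for i in evidence_indices]
    return [sum(column) / len(outputs) for column in zip(*outputs)]


# Assumed vertex order (hypothetical): 0 posterior fossa, 1 supratentorial,
# 2 IV ventricle, 3 cerebellum, 4 medulloblastoma, 5 astrocytoma.
# All weights are made up; each row sums to one.
A = [
    [0.0, 0.0, 0.25, 0.75, 0.0,   0.0],    # posterior fossa -> IV ventricle, cerebellum
    [0.0, 0.0, 0.0,  0.0,  0.125, 0.875],  # supratentorial -> tumour classes
    [0.0, 0.0, 0.0,  0.0,  0.5,   0.5],    # IV ventricle -> tumour classes
    [0.0, 0.0, 0.0,  0.0,  0.25,  0.75],   # cerebellum -> tumour classes
    [0.0, 0.0, 0.0,  0.0,  1.0,   0.0],    # medulloblastoma: self-connection
    [0.0, 0.0, 0.0,  0.0,  0.0,   1.0],    # astrocytoma: self-connection
]

from_posterior_fossa = propagate(A, 0)
spanning_two_regions = average_over_regions(A, [2, 3])
```

With these made-up weights the probability mass starting at the posterior fossa ends up entirely on the two terminating vertices. Setting the two self-connections to zero makes every entry of x vanish after three iterations, which is the failure mode described above.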
The WMRCTR database was made available as a spreadsheet giving hand-typed strings for the diagnosis and location of each case. Occasionally, grade of tumour was also specified. In total there were 1712 cases available. By hand, each record was examined and modified. If the tumour type was classified with the WHO system, it was left unchanged. If a WHO equivalent existed for a tumour classified using the old scheme, then it was changed; otherwise the record was removed. This reduced the number of cases to 1367. These cases were then further reviewed, and only those tumours with location specified were included, reducing the final number of cases used to 1333. Of these, the site of the primary lesion was often only specified vaguely, e.g. "posterior fossa" or "cerebrum". Tumours specified with a greater degree of accuracy were then grouped (by hand) under these broader headings, as well as maintaining their original information. A graph showing the anatomical distinctions made is given in Figure 2. Very occasionally, the location was specified in great detail (e.g. "foramen of Munro"); these samples were marked as being in the appropriate containing location.
The classes used were: astrocytoma grades I and II, astrocytoma grades III and IV, ependymoma, pineoblastoma, PNET, germinoma, medulloblastoma, craniopharyngioma and a group representing rare tumours, "other" (classes represented by fewer than 15 cases). It is common practice to group paediatric astrocytomas of different grades in this way as they are thought to be very similar diseases.
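Because location and tumour class are both discrete, edge weights from a location to the tumour classes can be estimated as relative frequencies. The sketch below shows one way this might be done, assuming the cleaned records are available as (location, class) pairs and that the rare-class threshold is applied to counts over the whole database. The function name, the record format and the toy records are all hypothetical, and the example uses a threshold of 2 instead of 15 only to keep the data small.

```python
from collections import Counter, defaultdict


def estimate_class_given_location(records, min_cases=15):
    """Estimate P(class | location) by counting, pooling rare classes into 'other'."""
    class_counts = Counter(cls for _, cls in records)
    by_location = defaultdict(Counter)
    for location, cls in records:
        # classes with fewer than min_cases cases overall are pooled as "other"
        label = cls if class_counts[cls] >= min_cases else "other"
        by_location[location][label] += 1
    return {
        location: {label: n / sum(counts.values()) for label, n in counts.items()}
        for location, counts in by_location.items()
    }


# Toy records, invented for illustration only.
records = [
    ("cerebellum", "medulloblastoma"),
    ("cerebellum", "medulloblastoma"),
    ("cerebellum", "astrocytoma I/II"),
    ("cerebellum", "astrocytoma I/II"),
    ("IV ventricle", "medulloblastoma"),
    ("IV ventricle", "ependymoma"),
]
weights = estimate_class_given_location(records, min_cases=2)
```

Each inner dictionary sums to one, so it can be used directly as the outgoing edge weights of the corresponding location vertex.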
Results and discussion
The simplest way to represent the results would be an adjacency matrix, but this is too large for publication in its direct format. Instead, the graph is specified by Table 1, an adjacency list representation. A subgraph of the final belief network is shown in Figure 2, giving the paths necessary to generate a probability for a medulloblastoma.
Figure 2 indicates that medulloblastomas have been recorded in the database as occurring in several locations in the brain. Typically, however, they are thought to arise from the cerebellum. The IV ventricle and the brain stem are adjacent to the cerebellum and are often invaded by these tumours, making it impossible to be certain of the location from which the tumour originated. This illustrates the power of the belief network in dealing with this situation: the possibility of the tumour being a medulloblastoma is not discounted if it does not occur in the typical position.
Figure 2 also illustrates another feature of the network arising from the fact that a large amount of historical data was used: a small number of medulloblastomas were also recorded as being present in the parietal and temporal lobes, which are distant from the cerebellum. These cases may be due to an incorrect assumption being made at the time of diagnosis that a metastatic deposit from the primary medulloblastoma was actually the primary tumour itself. Again, this rare but known clinical scenario is well accounted for by the belief network.
To validate the method presented above, two simple classifiers were investigated using data from 46 recent patients forming part of an ongoing study of MRS of childhood brain tumours [14]. Each patient had a tumour from one of seven classes: astrocytoma grade I and II (16 cases, $v_{24}$), medulloblastoma (13 cases, $v_{30}$), ependymoma (3 cases, $v_{29}$), germinoma (3 cases, $v_{27}$), PNET (3 cases, $v_{28}$), astrocytoma grade III and IV (2 cases, $v_{31}$), and "other" (6 cases, $v_{32}$).
The first classifier used only the belief network, assigning to each sample the label of the class with the highest probability as predicted by the network. This classifier had an error rate of 59%, compared with an error rate of 65% when using probabilities predicted by class prevalence.
The second classifier investigated the effect of using the network to augment a basic MRS classifier. Each of the 46 samples available was a short echo time (30 ms) single voxel spectroscopy acquisition acquired on a Siemens Symphony 1.5T scanner. The free induction decay (FID) contained 1024 points and was sampled at 1000 Hz. After acquisition, residual water was removed using the HSVD method [15] to model the water component within ±30 Hz of the water signal. Each FID was then Fourier transformed with no line-broadening to give the magnitude spectrum, which was normalised to have unit length in the $\ell_2$-norm. The normalised spectra were feature-reduced to 10 dimensions using principal components analysis (PCA). Gaussian functions were then used as the discriminant, with the mean estimated for each tumour class and a common estimate of the covariance matrix shared across all samples. Two scenarios were then investigated: prior probabilities based on class prevalence and prior probabilities generated by the belief network. Prior probabilities were applied by multiplying the value of each Gaussian discriminant by the prior probability of the corresponding class. Classifier performance was measured using three metrics: apparent error, leave-one-out cross-validation error and the 632+ error rate estimator [16].
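The way a prior enters the decision can be sketched as follows. The example assumes the Gaussian discriminant has already been evaluated for each class; the discriminant values and both sets of priors are invented for illustration, and the function name `apply_priors` is hypothetical. The products are normalised here so that they can be read as posterior probabilities, which does not change the predicted class.

```python
def apply_priors(discriminant_values, priors):
    """Multiply each class discriminant by its prior and normalise to sum to one."""
    weighted = {c: v * priors.get(c, 0.0) for c, v in discriminant_values.items()}
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("every class has zero weighted discriminant")
    return {c: w / total for c, w in weighted.items()}


# Made-up discriminant values for a single spectrum.
discriminants = {"medulloblastoma": 0.4, "astrocytoma I/II": 0.6}

# Made-up priors: one set from class prevalence, one from a belief network
# for a tumour located where medulloblastoma is more likely.
prevalence_priors = {"medulloblastoma": 0.5, "astrocytoma I/II": 0.5}
network_priors = {"medulloblastoma": 0.75, "astrocytoma I/II": 0.25}

with_prevalence = apply_priors(discriminants, prevalence_priors)
with_network = apply_priors(discriminants, network_priors)

label_prevalence = max(with_prevalence, key=with_prevalence.get)
label_network = max(with_network, key=with_network.get)
```

In this toy case the location-informed prior changes the predicted label, which is the mechanism by which anatomical information can correct a spectrum that is ambiguous on its own.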
With prior probabilities based on class prevalence the apparent error rate was 20%, corresponding to a correct classification of 37 out of 46 tumours. However, the cross-validation and 632+ error rates were 41% and 48% respectively, indicating poor generalisation to unseen cases. With prior probabilities based on the belief network the apparent error rate was 15%, corresponding to a correct classification of 39 out of 46 tumours. The cross-validation and 632+ error rates were 32% and 37% respectively, indicating that the prior probabilities measurably improve the generalisation performance of the classifier. When using belief network prior probabilities, five of the seven incorrectly classified samples were the same as those incorrectly classified when using class prevalence priors; the remaining two (one ependymoma, one astrocytoma grade I/II) were correctly classified using prevalence information.
The distribution of the error rate among the classes was approximately the same for both methods of generating prior probabilities, although there were slight differences. Complete results are presented in Table 2. In nearly all incorrectly classified cases, the class with the second highest posterior probability was the correct class; this was true for both methods of generating priors. In the incorrectly classified cases, the difference in posterior probability between the predicted label and the correct label was small (≈ 0.01) for about half the misclassifications and large (≈ 0.4) for the other half; this was also true for both methods of generating prior probabilities.
The simple classifier presented here attempts only to demonstrate that the belief network method can be useful and that the data used for its construction is sufficiently accurate. The application of the belief network to other classifiers depends on the choice of classifier, but many classifiers have a natural way to use prior probabilities either directly or in the form of weights.
Data from a large clinical database, collected over five decades, was used to construct a Bayesian belief network suitable for generating probabilities of tumour class. The network was shown to enhance a simple probability-based classifier that uses PCA-reduced raw MRS spectra as features. It is suggested that additional (discrete) information could be incorporated into the belief network to further enhance classifier performance.
1. Preul M, Caramanos Z, Collins D, Villemure J, Leblanc R, Olivier A, Pokrupa R, Arnold D: Accurate, noninvasive diagnosis of human brain tumors by using proton magnetic resonance spectroscopy. Nature Medicine. 1996, 2 (3): 323-325. 10.1038/nm0396-323.
2. Tate A, Majos C, Moreno A, Howe FA, Griffiths J, Arus C: Automated Classification of Short Echo Time in In Vivo 1H Brain Tumor Spectra: A Multicenter Study. Magnetic Resonance in Medicine. 2003, 49: 29-36. 10.1002/mrm.10315.
3. Lukas L, Suykens J, Vanhamme L, Howe F, Majós C, Moreno-Torres A, van der Graaf M, Tate A, Arús C, Van Huffel S: Brain tumour classification based on long echo proton MRS signals. Artificial Intelligence in Medicine. 2004, 31: 73-89. 10.1016/j.artmed.2004.01.001.
4. Devos A, Lukas L, Suykens J, Vanhamme L, Tate A, Howe F, Majos C, Moreno-Torres A, van der Graaf M, Arus C, Van Huffel S: Classification of brain tumours using short echo time 1H MR Spectra. Journal of Magnetic Resonance. 2004, 170: 164-175. 10.1016/j.jmr.2004.06.010.
5. Tate A, Underwood J, Acosta D, Julià-Sapé M, Majós C, Moreno-Torres A, Howe F, van der Graaf M, Lefournier V, Murphy M, Loosemore A, Ladroue C, Wesseling P, Bosson JL, Cabañas ME, Simonetti AW, Gajewicz W, Calvar J, Capdevila A, Wilkins P, Bell BA, Rémy C, Heerschap A, Watson D, Griffiths J, Arús C: Development of a decision support system for diagnosis and grading of brain tumours using in vivo magnetic resonance single voxel spectra. NMR in Biomedicine. 2006, 19: 411-434. 10.1002/nbm.1016.
6. Opstad K, Ladroue C, Bell B, Griffiths J, Howe F: Linear discriminant analysis of brain tumour 1H MR spectra: a comparison of classification using whole spectra versus metabolite quantification. NMR in Biomedicine.
7. Galanaud D, Nicoli F, Chinot O, Confort-Gouny S, Figarella-Branger D, Roche P, Fuentès S, Le Fur Y, Ranjeva JP, Cozzone P: Noninvasive Diagnostic Assessment of Brain Tumours Using Combined In Vivo MR Imaging and Spectroscopy. Magnetic Resonance in Medicine. 2006, 55: 1236-1245. 10.1002/mrm.20886.
8. de Edelenyi FS, Rubin C, Estève F, Grand S, Decorps M, Lefournier V, Bas JFL, Remy C: A new approach for analyzing proton magnetic resonance spectroscopic images of brain tumours: nosologic images. Nature Medicine. 2000, 6: 1287-1289. 10.1038/81401.
9. Simonetti AW, Melssen WJ, de Edelenyi FS, van Asten JJA, Heerschap A, Buydens LMC: Combination of feature-reduced MR spectroscopic and MR imaging data for improved brain tumor classification. NMR in Biomedicine. 2005, 18: 34-43. 10.1002/nbm.919.
10. Luts J, Heerschap A, Suykens J, Van Huffel S: A combined MRI and MRSI based multiclass system for brain tumour recognition using LS-SVMs with class probabilities and feature selection. Artificial Intelligence in Medicine. 2007, 40 (2): 87-102. 10.1016/j.artmed.2007.02.002.
11. Kleihues P, Burger P, Scheithauer B: The new WHO classification of brain tumours. Brain Pathology. 1993, 3 (3): 255-268.
12. Russell S, Norvig P: Artificial Intelligence: A Modern Approach. 2003, New Jersey: Prentice Hall.
13. Han J, Kamber M: Data Mining: Concepts and Techniques. 2001, San Francisco: Morgan Kaufmann.
14. Peet AC, Lateef S, Natarajan K, Sgouros S, Grundy RG: Short Echo Time 1H Magnetic Resonance Spectroscopy of Childhood Brain Tumours. Diseases of the Childs Nervous System. 2007, 23: 163-169. 10.1007/s00381-006-0206-4.
15. Barkhuijsen H, de Beer R, Ormondt DV: Improved Algorithm for Noniterative and Time-Domain Model Fitting to Exponentially Damped Magnetic Resonance Signals. Journal of Magnetic Resonance. 1987, 73: 553-557.
16. Efron B, Tibshirani R: Improvements on Cross-Validation: The .632+ Bootstrap Method. Journal of the American Statistical Association. 1997, 92 (438): 548-560. 10.2307/2965703.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/7/27/prepub
Greg Reynolds holds a Ph.D. studentship funded by the EPSRC. Andrew Peet holds a Department of Health Clinician Scientist Award. The authors wish to acknowledge the help of the West Midlands Regional Childhood Tumour Registry, Birmingham Children's Hospital NHS Foundation Trust, in particular Sheila Parkes for providing the data in a manageable form. We thank the reviewers for their improving comments.
The author(s) declare that they have no competing interests.
All authors were involved in developing the concept of using prior probabilities in childhood brain tumour classification. GMR constructed the belief networks, processed the data and drafted the paper, which was reviewed by TNA and ACP. All authors read and approved the final manuscript.