Generating prior probabilities for classifiers of brain tumours using belief networks
© Reynolds et al; licensee BioMed Central Ltd. 2007
Received: 12 May 2007
Accepted: 18 September 2007
Published: 18 September 2007
Numerous methods for classifying brain tumours based on magnetic resonance spectra and imaging have been presented in the last 15 years. Generally, these methods use supervised machine learning to develop a classifier from a database of cases for which the diagnosis is already known. However, little has been published on developing classifiers based on mixed modalities, e.g. combining imaging information with spectroscopy. In this work a method of generating probabilities of tumour class from anatomical location is presented.
The method of "belief networks" is introduced as a means of generating probabilities that a tumour is any given type. The belief networks are constructed using a database of paediatric tumour cases consisting of data collected over five decades; the problems associated with using this data are discussed. To verify the usefulness of the networks, an application of the method is presented in which prior probabilities were generated and combined with a classification of tumours based solely on MRS data.
Belief networks were constructed from a database of over 1300 cases. These can be used to generate a probability that a tumour is any given type. Networks are presented for astrocytoma grades I and II, astrocytoma grades III and IV, ependymoma, pineoblastoma, primitive neuroectodermal tumour (PNET), germinoma, medulloblastoma, craniopharyngioma and a group representing rare tumours, "other". Using the network to generate prior probabilities for classification improves the accuracy when compared with generating prior probabilities based on class prevalence.
Bayesian belief networks are a simple way of using discrete clinical information to generate probabilities usable in classification. The belief network method can be robust to incomplete datasets. Inclusion of a priori knowledge is an effective way of improving classification of brain tumours by non-invasive methods.
The current "gold standard" for brain tumour diagnosis is histopathology, which requires a sample of tumour obtained at operation. These operations have an inherent risk of morbidity and mortality. Magnetic Resonance Imaging (MRI), Magnetic Resonance Spectroscopy (MRS) and other imaging modalities may offer a non-invasive way of making a diagnosis, but no method has yet attained sufficient accuracy to replace histopathology. MRS in particular has been shown to provide useful information about the biochemical content of brain tumours [1], and numerous methods for classifying brain tumours based on magnetic resonance spectra have been presented [2–6].
When making a classification decision it is intuitively sensible to use as much relevant information as possible, but very few of the published classifiers have attempted to combine information from different modalities and sources (but see [7–10]). This work details a method which uses data from the West Midlands Regional Childhood Tumour Registry (WMRCTR) to produce probabilities of brain tumour class, given its anatomical location.
The WMRCTR provides data from the last five decades on over 1700 childhood cancer patients, mostly in free-text form. During that period the format of the stored data has changed: knowledge of the exact anatomical location has improved with the advent of MRI, and the classification scheme for tumours has changed to the WHO system [11]. These changes present a considerable challenge to the use of the data in computer-based systems.
The discriminating power of "anatomical location" as a feature for a classifier is not sufficient to make classifications based on this variable alone. However, it is envisaged that the probabilities obtained from the WMRCTR data could be used as "informative priors" in existing classification methods. In this work we demonstrate their impact on a simple MRS-based classifier. It is worth emphasising that this work focusses on paediatric brain tumours, which are significantly more varied, and more difficult to diagnose using MRI alone, than those in adults.
The approach to using the WMRCTR data is based on a graphical representation of Bayesian inference called belief networks. Since anatomical location and tumour class are discrete random variables, probabilities can be estimated directly from the data, without the need to rely on assumptions about the form of probability density functions. In the following sections we introduce the belief network method, present some examples and discuss the construction of the final network from the data in the WMRCTR. Finally, the network is presented and demonstrated on some test-cases.
1. query variables, i.e. events the probability of which is of interest (in Figure 1 these are "medulloblastoma" and "astrocytoma");
2. evidence variables, i.e. events known to have occurred (in Figure 1 these are taken to be "posterior fossa" and "supratentorial");
3. hidden variables, i.e. events which may occur but cannot be measured (in Figure 1 these are taken to be "IV ventricle" and "cerebellum").
The weighted edges connecting vertices represent the probability that the target vertex is true, conditioned on the source vertex. Here, vertices represent anatomical locations or tumour types. If an edge connects two anatomical locations then its weight is the probability that the tumour was in the target vertex, given that it is known to be in the source vertex. If an edge connects an anatomical location to a tumour type then its weight is the probability that the tumour is of the type specified by the target vertex, given that it is known to have occurred in the source vertex.
Each element $a_{ij}$ of $A$ refers to the weighted connection from vertex $v_i$ to $v_j$, i.e. the row index refers to the source vertex, the column index to the target.
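The following is a minimal Python sketch of how edge weights of this kind could be estimated by counting cases and then placed in an adjacency matrix with rows as sources and columns as targets. The function names `edge_weights` and `adjacency_matrix` and the record format (one `(source, target)` pair per case) are assumptions made for illustration; they are not taken from the original work. The example counts (12 and 9 cases out of 21 for v8) are those listed for that vertex in the adjacency list table.

```python
from collections import Counter
from fractions import Fraction


def edge_weights(records):
    """Estimate P(target | source) by counting (source, target) pairs."""
    records = list(records)
    pair_counts = Counter(records)
    source_counts = Counter(src for src, _ in records)
    return {
        (src, dst): Fraction(n, source_counts[src])
        for (src, dst), n in pair_counts.items()
    }


def adjacency_matrix(weights, n):
    """Build an n x n matrix: row index = source vertex, column = target."""
    A = [[Fraction(0)] * n for _ in range(n)]
    for (src, dst), w in weights.items():
        A[src - 1][dst - 1] = w  # vertices are numbered from 1
    return A


# Illustrative records: 21 cases located at v8, of which 12 were class v25
# and 9 were class v32.
records = [(8, 25)] * 12 + [(8, 32)] * 9
weights = edge_weights(records)
A = adjacency_matrix(weights, 32)
```

Exact fractions are used here so that each row of outgoing weights sums to exactly one; floating-point values would serve equally well in practice.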
The adjacency matrix representation permits easy calculation of the vector of class probabilities, given knowledge of which evidence variables to use. To find the probabilities of each class given any evidence variable, the following procedure is used (more computationally efficient methods exist):
1. Construct the n-dimensional column vector x where n is the number of vertices (variables) and set all the elements to zero.
2. Set the single element of x that corresponds to the evidence variable known to be true, to one.
3. Compute $x \leftarrow A^{T} x$ until x stops changing. At every iteration, vertices connected to those with non-zero entries in x will become non-zero.
4. Those elements of x corresponding to output variables hold the probability that the tumour belongs to each class, given the evidence. All other elements of x will be zero.
Clearly, the axioms of probability require that the sum of all elements in x is unity. It is important to note that the terminating vertices (v5 and v6 in Figure 1) need to be connected to themselves so that the method just described will converge to the correct value. If these self-connections are not present, x will converge to the zero vector. As well as giving probabilities of class membership given a single location, the method can handle tumours that span adjacent anatomical regions: for every region in which the tumour is present, compute the output vector, then average these to produce the final vector of probabilities. Although this is an intuitively sensible property, it has not been evaluated in this work.
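The procedure above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: it stores the network as a sparse adjacency list instead of a dense matrix, and the names `EDGES`, `propagate` and `average` are assumptions. The edge weights are copied from the adjacency list table for the part of the network reachable from v1; the class vertices are given self-loops of weight one, as the text requires.

```python
from fractions import Fraction as F

# Sub-network reachable from v1: source -> {target: weight}.
EDGES = {
    1: {3: F(222, 631), 4: F(109, 631), 5: F(300, 631)},
    3: {24: F(189, 222), 28: F(1, 222), 29: F(3, 222),
        30: F(4, 222), 31: F(6, 222), 32: F(19, 222)},
    4: {24: F(16, 109), 28: F(3, 109), 29: F(24, 109),
        30: F(56, 109), 32: F(10, 109)},
    5: {24: F(168, 300), 28: F(9, 300), 29: F(12, 300),
        30: F(87, 300), 31: F(3, 300), 32: F(21, 300)},
}
# Terminating (class) vertices must be connected to themselves.
for v in (24, 28, 29, 30, 31, 32):
    EDGES[v] = {v: F(1)}


def propagate(edges, evidence, max_iter=100):
    """Iterate x <- A^T x from a one-hot evidence vector until x is stable."""
    x = {evidence: F(1)}  # zero entries are simply omitted
    for _ in range(max_iter):
        new_x = {}
        for src, mass in x.items():
            for dst, w in edges[src].items():
                new_x[dst] = new_x.get(dst, F(0)) + mass * w
        if new_x == x:
            return x
        x = new_x
    raise RuntimeError("propagation did not converge")


def average(vectors):
    """Average several output vectors, e.g. for a tumour spanning regions."""
    keys = set().union(*vectors)
    return {k: sum(v.get(k, F(0)) for v in vectors) / len(vectors)
            for k in keys}


posterior = propagate(EDGES, evidence=1)
spanning = average([propagate(EDGES, 3), propagate(EDGES, 4)])
```

With v1 as evidence, the entry for v24 works out to (189 + 16 + 168)/631 = 373/631 and the entry for v30 to (4 + 56 + 87)/631 = 147/631, and the entries sum to one.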
The classes used were: astrocytoma grades I and II, astrocytoma grades III and IV, ependymoma, pineoblastoma, PNET, germinoma, medulloblastoma, craniopharyngioma and a group representing rare tumours, "other" (classes represented by fewer than 15 cases). It is common practice to group paediatric astrocytomas of different grades in this way as they are thought to be very similar diseases.
Results and discussion
Adjacency list representation of final belief network. The destination vertices from each vertex are shown in the "Connections" column.

| Vertex | Connections (vertex, weight) |
|---|---|
| v1 | (v3, 222/631), (v4, 109/631), (v5, 300/631) |
| v2 | (v6, 276/702), (v7, 67/702), (v8, 21/702), (v9, 17/702), (v10, 99/702), (v11, 104/702), (v12, 67/702), (v13, 51/702) |
| v3 | (v24, 189/222), (v28, 1/222), (v29, 3/222), (v30, 4/222), (v31, 6/222), (v32, 19/222) |
| v4 | (v24, 16/109), (v28, 3/109), (v29, 24/109), (v30, 56/109), (v32, 10/109) |
| v5 | (v24, 168/300), (v28, 9/300), (v29, 12/300), (v30, 87/300), (v31, 3/300), (v32, 21/300) |
| v6 | (v14, 4/230), (v15, 55/230), (v16, 14/230), (v17, 70/230), (v18, 87/230) |
| v7 | (v24, 3/67), (v26, 19/67), (v27, 17/67), (v28, 1/67), (v32, 27/67) |
| v8 | (v25, 12/21), (v32, 9/21) |
| v9 | (v24, 5/17), (v29, 1/17), (v31, 1/17), (v32, 10/17) |
| v10 | (v19, 25/99), (v20, 74/99) |
| v11 | (v24, 25/104), (v25, 51/104), (v27, 10/104), (v28, 1/104), (v29, 1/104), (v31, 1/104), (v32, 15/104) |
| v12 | (v21, 52/67), (v22, 14/67), (v23, 1/67) |
| v13 | (v24, 26/51), (v25, 4/51), (v29, 3/51), (v30, 2/51), (v31, 1/51), (v32, 15/51) |
| v14 ("other cerebral area") | (v24, 3/4), (v32, 1/4) |
| v15 | (v24, 19/55), (v25, 3/55), (v28, 4/55), (v29, 5/55), (v31, 8/55), (v32, 16/55) |
| v16 | (v24, 7/14), (v28, 2/14), (v29, 2/14), (v32, 3/14) |
| v17 | (v24, 37/70), (v28, 13/70), (v29, 4/70), (v30, 2/70), (v31, 3/70), (v32, 11/70) |
| v18 | (v24, 51/87), (v28, 2/87), (v29, 7/87), (v30, 1/87), (v31, 5/87), (v32, 21/87) |
| v19 | (v24, 22/25), (v25, 1/25), (v32, 2/25) |
| v20 | (v24, 73/74), (v32, 1/74) |
| v21 | (v24, 35/52), (v28, 1/52), (v29, 1/52), (v31, 3/52), (v32, 12/52) |
| v22 | (v24, 10/14), (v27, 1/14), (v32, 3/14) |
| v23 | (v24, 1/1) |
| v24 (astrocytoma G1, G2 and optic pathway glioma) | (v24, 1) |
| v25 | (v25, 1) |
| v26 | (v26, 1) |
| v27 | (v27, 1) |
| v28 | (v28, 1) |
| v29 | (v29, 1) |
| v30 | (v30, 1) |
| v31 (astrocytoma G3, G4) | (v31, 1) |
| v32 | (v32, 1) |
Figure 2 indicates that medulloblastomas have been recorded in the database as occurring in several locations in the brain. Typically, however, they are thought to arise from the cerebellum. The IV ventricle and the brain stem are adjacent to the cerebellum and are often invaded by these tumours, making it impossible to be certain of the location from which the tumour originated. This illustrates the power of the belief network in dealing with this situation: the possibility of the tumour being a medulloblastoma is not discounted if it does not occur in the typical position.
Figure 2 also illustrates another feature of the network arising from the fact that a large amount of historical data was used: a small number of medulloblastomas were also recorded as being present in the parietal and temporal lobes, which are distant from the cerebellum. These cases may be due to an incorrect assumption being made at the time of diagnosis that a metastatic deposit from the primary medulloblastoma was actually the primary tumour itself. Again, this rare but known clinical scenario is well accounted for by the belief network.
To validate the method presented above, two simple classifiers were investigated using data from 46 recent patients forming part of an ongoing study of MRS of childhood brain tumours [14]. Each patient had a tumour from one of seven classes: astrocytoma grade I and II (16 cases, v24), medulloblastoma (13 cases, v30), ependymoma (3 cases, v29), germinoma (3 cases, v27), PNET (3 cases, v28), astrocytoma grade III and IV (2 cases, v31), and "other" (6 cases, v32).
The first classifier used only the belief network in the classification, assigning to each sample the label of the class with the highest probability as predicted by the network. This classifier had an error rate of 59%, compared with an error rate of 65% when using probabilities predicted by class prevalence.
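As a rough consistency check on these figures, the short Python sketch below recomputes the prevalence-based error rate from the class counts given for the 46 validation cases. It assumes that a prevalence-only classifier always predicts the most common class, astrocytoma grades I and II (v24); the text does not state this explicitly, so it is an assumption of the sketch. The helper names are illustrative.

```python
# Class counts for the 46 validation cases, as listed in the text.
counts = {"v24": 16, "v30": 13, "v29": 3, "v27": 3,
          "v28": 3, "v31": 2, "v32": 6}


def constant_prediction_error(counts, predicted):
    """Error rate when every sample is assigned the same class."""
    total = sum(counts.values())
    return 1 - counts[predicted] / total


def errors_from_rate(rate, total):
    """Number of misclassified samples implied by a reported error rate."""
    return round(rate * total)


prevalence_error = constant_prediction_error(counts, "v24")
network_errors = errors_from_rate(0.59, sum(counts.values()))
```

Under this assumption the prevalence-based error is 30/46, which rounds to the reported 65%, and the reported 59% for the belief network corresponds to about 27 of the 46 cases being misclassified.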
The second classifier investigated the effect of using the network to augment a basic MRS classifier. Each of the 46 samples available was a short-echo-time (30 ms) single-voxel spectroscopy acquisition acquired on a Siemens Symphony 1.5T scanner. The free induction decay (FID) contained 1024 points and was sampled at 1000 Hz. Post-acquisition, residual water was removed using the HSVD method [15] to model the water component within ±30 Hz of the water signal. Each FID was then Fourier transformed with no line-broadening to give the magnitude spectrum, which was normalised to have unit length in the $\ell_2$-norm. The normalised spectra were feature-reduced to 10 dimensions using principal components analysis (PCA). Gaussian functions were then used as the discriminant, with the mean estimated for each tumour class and a common estimate of the covariance matrix shared across all samples. Two scenarios were then investigated: prior probabilities based on class prevalence and prior probabilities generated by the belief network. Prior probabilities were applied by multiplying the value of each Gaussian discriminant by the prior probability of the corresponding class. Classifier performance was measured using three metrics: apparent error, leave-one-out cross-validation error and the 632+ error rate estimator [16].
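The way in which a prior multiplies a Gaussian discriminant can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the classifier used in the study: the water removal, Fourier transform, normalisation and PCA steps are omitted, the shared covariance is taken to be the pooled within-class estimate, and the two-dimensional data, the function names and the prior values are invented for the example.

```python
import numpy as np


def fit_shared_gaussians(X, y):
    """Estimate one mean per class and a single pooled covariance matrix."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Centre every sample on its own class mean before pooling.
    centred = X - means[np.searchsorted(classes, y)]
    cov = centred.T @ centred / (len(X) - len(classes))
    return classes, means, cov


def classify(x, classes, means, cov, priors):
    """Return the winning class and the normalised prior-weighted scores."""
    diff = means - x
    # Squared Mahalanobis distance from x to each class mean.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    # The Gaussian normalising constant is shared by all classes, so omitted.
    scores = np.asarray(priors) * np.exp(-0.5 * d2)
    return classes[np.argmax(scores)], scores / scores.sum()


# Invented 2-D features standing in for PCA-reduced spectra.
X = np.array([[-1.5, 0.0], [-0.5, 0.0], [-1.0, 0.5], [-1.0, -0.5],
              [0.5, 0.0], [1.5, 0.0], [1.0, 0.5], [1.0, -0.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
classes, means, cov = fit_shared_gaussians(X, y)

sample = np.array([-0.1, 0.0])
label_flat, _ = classify(sample, classes, means, cov, priors=[0.5, 0.5])
label_informed, post = classify(sample, classes, means, cov, priors=[0.2, 0.8])
```

In this toy case the sample lies slightly closer to class 0, so equal priors select class 0, while a prior that strongly favours class 1 changes the decision to class 1. In the setting described above, the prior vector would instead come from class prevalence or from the belief network output for the tumour's location.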
With prior probabilities based on class prevalence, the apparent error rate was 20%, corresponding to a correct classification of 37 out of 46 tumours. However, the cross-validation and 632+ error rates were 41% and 48% respectively, indicating poor generalisation to unseen cases. With prior probabilities based on the belief network, the apparent error rate was 15%, corresponding to a correct classification of 39 out of 46 tumours. The cross-validation and 632+ error rates were 32% and 37% respectively, indicating that the belief network priors measurably improve the generalisation performance of the classifier. When using belief network prior probabilities, five of the seven incorrectly classified samples were the same as those incorrectly classified when using class prevalence priors; the remaining two (one ependymoma, one astrocytoma grade I/II) were correctly classified using prevalence information.
Breakdown of Classification Errors

| Class | Vertex | Share of Error (Prevalence) | Share of Error (Belief Network) |
|---|---|---|---|
| astrocytoma G1, G2 | v24 | | |
| medulloblastoma | v30 | | |
| ependymoma | v29 | | |
| germinoma | v27 | | |
| PNET | v28 | | |
| astrocytoma G3, G4 | v31 | | |
| other | v32 | | |
The simple classifier presented here attempts only to demonstrate that the belief network method can be useful and that the data used for its construction is sufficiently accurate. The application of the belief network to other classifiers depends on the choice of classifier, but many classifiers have a natural way to use prior probabilities either directly or in the form of weights.
Data from a large clinical database, collected over five decades, was used to construct a Bayesian belief network suitable for generating probabilities of tumour class. The network was shown to enhance a simple probability-based classifier that uses PCA-reduced raw MRS spectra as features. It is suggested that additional (discrete) information could be incorporated into the belief network to further enhance classifier performance.
Greg Reynolds holds a Ph.D. studentship funded by the EPSRC. Andrew Peet holds a Department of Health Clinician Scientist Award. The authors wish to acknowledge the help of the West Midlands Regional Childhood Tumour Registry, Birmingham Children's Hospital NHS Foundation Trust, in particular Sheila Parkes for providing the data in a manageable form. We thank the reviewers for their improving comments.
1. Preul M, Caramanos Z, Collins D, Villemure J, Leblanc R, Olivier A, Pokrupa R, Arnold D: Accurate, noninvasive diagnosis of human brain tumors by using proton magnetic resonance spectroscopy. Nature Medicine. 1996, 2 (3): 323-325. 10.1038/nm0396-323.
2. Tate A, Majos C, Moreno A, Howe FA, Griffiths J, Arus C: Automated Classification of Short Echo Time in In Vivo 1H Brain Tumor Spectra: A Multicenter Study. Magnetic Resonance in Medicine. 2003, 49: 29-36. 10.1002/mrm.10315.
3. Lukas L, Suykens J, Vanhamme L, Howe F, Majós C, Moreno-Torres A, van der Graaf M, Tate A, Arús C, Van Huffel S: Brain tumour classification based on long echo proton MRS signals. Artificial Intelligence in Medicine. 2004, 31: 73-89. 10.1016/j.artmed.2004.01.001.
4. Devos A, Lukas L, Suykens J, Vanhamme L, Tate A, Howe F, Majos C, Moreno-Torres A, van der Graaf M, Arus C, Van Huffel S: Classification of brain tumours using short echo time 1H MR Spectra. Journal of Magnetic Resonance. 2004, 170: 164-175. 10.1016/j.jmr.2004.06.010.
5. Tate A, Underwood J, Acosta D, Julià-Sapé M, Majós C, Moreno-Torres A, Howe F, van der Graaf M, Lefournier V, Murphy M, Loosemore A, Ladroue C, Wesseling P, Bosson JL, Cabañas ME, Simonetti AW, Gajewicz W, Calvar J, Capdevila A, Wilkins P, Bell BA, Rémy C, Heerschap A, Watson D, Griffiths J, Arús C: Development of a decision support system for diagnosis and grading of brain tumours using in vivo magnetic resonance single voxel spectra. NMR in Biomedicine. 2006, 19: 411-434. 10.1002/nbm.1016.
6. Opstad K, Ladroue C, Bell B, Griffiths J, Howe F: Linear discriminant analysis of brain tumour 1H MR spectra: a comparison of classification using whole spectra versus metabolite quantification. NMR in Biomedicine.
7. Galanaud D, Nicoli F, Chinot O, Confort-Gouny S, Figarella-Branger D, Roche P, Fuentès S, Le Fur Y, Ranjeva JP, Cozzone P: Noninvasive Diagnostic Assessment of Brain Tumours Using Combined In Vivo MR Imaging and Spectroscopy. Magnetic Resonance in Medicine. 2006, 55: 1236-1245. 10.1002/mrm.20886.
8. de Edelenyi FS, Rubin C, Estève F, Grand S, Decorps M, Lefournier V, Bas JFL, Remy C: A new approach for analyzing proton magnetic resonance spectroscopic images of brain tumours: nosologic images. Nature Medicine. 2000, 6: 1287-1289. 10.1038/81401.
9. Simonetti AW, Melssen WJ, de Edelenyi FS, van Asten JJA, Heerschap A, Buydens LMC: Combination of feature-reduced MR spectroscopic and MR imaging data for improved brain tumor classification. NMR in Biomedicine. 2005, 18: 34-43. 10.1002/nbm.919.
10. Luts J, Heerschap A, Suykens J, Van Huffel S: A combined MRI and MRSI based multiclass system for brain tumour recognition using LS-SVMs with class probabilities and feature selection. Artificial Intelligence in Medicine. 2007, 40 (2): 87-102. 10.1016/j.artmed.2007.02.002.
11. Kleihues P, Burger P, Scheithauer B: The new WHO classification of brain tumours. Brain Pathology. 1993, 3 (3): 255-68.
12. Russell S, Norvig P: Artificial Intelligence: A Modern Approach. 2003, New Jersey: Prentice Hall
13. Han J, Kamber M: Data Mining: Concepts and Techniques. 2001, San Francisco: Morgan Kaufmann
14. Peet AC, Lateef S, Natarajan K, Sgouros S, Grundy RG: Short Echo Time 1H Magnetic Resonance Spectroscopy of Childhood Brain Tumours. Child's Nervous System. 2007, 23: 163-169. 10.1007/s00381-006-0206-4.
15. Barkhuijsen H, de Beer R, Ormondt DV: Improved Algorithm for Noniterative and Time-Domain Model Fitting to Exponentially Damped Magnetic Resonance Signals. Journal of Magnetic Resonance. 1987, 73: 553-557.
16. Efron B, Tibshirani R: Improvements on Cross-Validation: The .632+ Bootstrap Method. Journal of the American Statistical Association. 1997, 92 (438): 548-560. 10.2307/2965703.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/7/27/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.