BMC Medical Informatics and Decision Making

Background: Numerous methods for classifying brain tumours based on magnetic resonance spectra and imaging have been presented in the last 15 years. Generally, these methods use supervised machine learning to develop a classifier from a database of cases for which the diagnosis is already known. However, little has been published on developing classifiers based on mixed modalities, e.g. combining imaging information with spectroscopy. In this work a method of generating probabilities of tumour class from anatomical location is presented.


Background
The current "gold standard" for brain tumour diagnosis is histopathology, which requires a sample of tumour obtained at operation. These operations have an inherent risk of morbidity and mortality. Magnetic Resonance Imaging (MRI), Magnetic Resonance Spectroscopy (MRS) and other imaging modalities may offer a non-invasive way of making a diagnosis, but no method has yet attained sufficient accuracy to replace histopathology. MRS in particular has been shown to provide useful information about the biochemical content of brain tumours [1], and numerous methods for classifying brain tumours based on magnetic resonance spectra have been presented [2][3][4][5][6].
When making a classification decision it is intuitively sensible to use as much relevant information as possible, but very few of the published classifiers have attempted to combine information from different modalities and sources (but see [7][8][9][10]). This work details a method which uses data from the West Midlands Regional Childhood Tumour Registry (WMRCTR) to produce probabilities of brain tumour class, given its anatomical location.
The WMRCTR provides data from the last five decades on over 1700 childhood cancer patients, mostly in free-text form. During that period the format of the stored data has changed: knowledge of the exact anatomical location has improved with the advent of MRI and the classification scheme for tumours has changed to the WHO [11] system. This presents a considerable challenge to its use in computer-based systems.
The discriminating power of "anatomical location" as a feature for a classifier is not sufficient to make classifications based on this variable alone. However, it is envisaged that the probabilities obtained from the WMRCTR data could be used as "informative priors" in existing classification methods. In this work we demonstrate their impact on a simple MRS-based classifier. It is worth emphasising that this work focusses on paediatric brain tumours, which are significantly more varied and more difficult to diagnose using MRI alone than those in adults.
The approach to using the WMRCTR data is based on a graphical representation of Bayesian inference called belief networks. Since anatomical location and tumour class are discrete random variables, probabilities can be estimated directly from the data, without the need to rely on assumptions about the form of probability density functions. In the following sections we introduce the belief network method, present some examples and discuss the construction of the final network from the data in the WMRCTR. Finally, the network is presented and demonstrated on some test-cases.

Belief networks
A Bayesian belief network, often just called a belief network, is a graphical representation of the joint probability distribution function of a collection of variables [12,13]. A belief network makes exactly the same inferences as would be made by applying Bayes' rule to a series of probabilities, but the graphical construction often provides insight into the problem. The network is represented as a weighted, acyclic, directed graph, each vertex representing a discrete variable/event (see Figure 1). To use the terminology of Russell and Norvig [12], each of these vertices falls into one of three categories:
1. query variables, i.e. events the probability of which is of interest (in Figure 1 these are "medulloblastoma" and "astrocytoma");
2. evidence variables, i.e. events known to have occurred (in Figure 1 these are taken to be "posterior fossa" and "supratentorial");
3. hidden variables, i.e. events which may occur but cannot be measured (in Figure 1 these are taken to be "IV ventricle" and "cerebellum").
The weighted edges connecting vertices represent the probability that the target vertex is true, conditioned on the source vertex. Here, vertices represent anatomical locations or tumour types. If an edge connects two anatomical locations then its weight is the probability that the tumour was in the target vertex, given that it is known to be in the source vertex. If an edge connects an anatomical location to a tumour type then its weight is the probability that the tumour is of the type specified by the target vertex, given that it is known to have occurred in the source vertex.
Figure 1 A simplified belief network showing conditioned probabilities of events. The vertex numbers, shown in brackets, refer to the adjacency matrix representation in (1). The numbers shown are purely for pedagogical purposes; for the correct and complete graph refer to Table 1 and Figure 2.
To illustrate the utility of the method, consider the following example. Referring to Figure 1, suppose it is known that the tumour occurs in the posterior fossa (vertex $v_1$) and the probability that the tumour is a medulloblastoma is sought. Working backward from vertex $v_5$ (the medulloblastoma) and applying Bayes' rule expresses this probability in terms of the edge weights along the paths from $v_1$ to $v_5$. Of course, if it were known that the tumour occurred in the IV ventricle then the expression could have stopped there and the probability obtained by inspection; thus hidden variables may sometimes be evidence variables, depending on the particular sample. The important point is that information of different "resolution" can be used, which is particularly important if a tumour spans several regions, as will be discussed later. It also means that data of lower resolution can be incorporated into the network. For example, many tumours in the WMRCTR are just listed as having location "posterior fossa".
Working back from the query variables is relatively complicated to implement. An easier and equivalent way is to work forward from the evidence variables. For this purpose it is convenient to represent the graph as an adjacency matrix $A$; for the example in Figure 1 this is the matrix referred to as (1). Each element $a_{ij}$ of $A$ refers to the weighted connection from vertex $v_i$ to $v_j$, i.e. the row index refers to the source vertex, the column index to the target.
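As a sketch of the Bayes' rule step in the example above, the chain of conditioning can be written with event names rather than vertex numbers. This assumes (the text does not confirm it) that the IV ventricle (IV) and cerebellum (CB) are the only vertices lying between the posterior fossa (PF) and the medulloblastoma (MB) in Figure 1, and that tumour class depends on the posterior fossa only through these finer locations:

```latex
P(\mathrm{MB} \mid \mathrm{PF})
  = P(\mathrm{MB} \mid \mathrm{IV})\, P(\mathrm{IV} \mid \mathrm{PF})
  + P(\mathrm{MB} \mid \mathrm{CB})\, P(\mathrm{CB} \mid \mathrm{PF})
```

Each factor on the right is the weight of a single edge. If the tumour is known to lie in the IV ventricle, the calculation reduces to the single edge weight $P(\mathrm{MB} \mid \mathrm{IV})$.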
The adjacency matrix representation permits easy calculation of the vector of class probabilities, given knowledge of which evidence variables to use. To find the probabilities of each class given any evidence variable, the following procedure is used (more computationally efficient methods are given in [12]):
1. Construct the $n$-dimensional column vector $x$, where $n$ is the number of vertices (variables), and set all the elements to zero.
2. Set the single element of $x$ that corresponds to the evidence variable known to be true to one.
3. Compute $x \leftarrow A^T x$ until $x$ stops changing. At every iteration, those vertices connected to vertices with non-zero entries in $x$ will become non-zero.
4. Those elements of $x$ corresponding to output (query) variables now hold the probability that the tumour belongs to each class, given the evidence. All other elements of $x$ will be zero.
Clearly, the axioms of probability require that the sum of all elements in $x$ is unity. It is important to note that the terminating vertices ($v_5$ and $v_6$ in Figure 1) need to be connected to themselves so that the method just described will converge to the correct value. If these self-connections are not present, $x$ will converge to the zero vector. As well as giving probabilities of class membership given a single location, the network can also handle tumours that span adjacent anatomical regions: for every region in which the tumour is present compute the output vector, then average these to produce the final vector of probabilities. Although this is an intuitively sensible property, it has not been evaluated in this work.
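The forward-propagation procedure and the averaging rule for tumours spanning several regions can be sketched as follows. This is a minimal illustration, not the authors' implementation: the six-vertex layout loosely mirrors Figure 1, the supratentorial edge is a guess, and every weight is invented for demonstration (none are WMRCTR values). The function names `class_probabilities` and `spanning_probabilities` are hypothetical.

```python
import numpy as np

# Hypothetical vertex order (0-based): 0 posterior fossa, 1 supratentorial,
# 2 IV ventricle, 3 cerebellum, 4 medulloblastoma, 5 astrocytoma.
# All weights below are invented for illustration; they are not WMRCTR values.
A = np.zeros((6, 6))             # A[i, j]: weight of the edge from vertex i to vertex j
A[0, 2], A[0, 3] = 0.4, 0.6      # posterior fossa -> IV ventricle, cerebellum
A[1, 5] = 1.0                    # supratentorial -> astrocytoma (assumed edge)
A[2, 4], A[2, 5] = 0.7, 0.3      # IV ventricle -> tumour classes
A[3, 4], A[3, 5] = 0.5, 0.5      # cerebellum -> tumour classes
A[4, 4] = A[5, 5] = 1.0          # self-connections on terminating vertices


def class_probabilities(A, evidence, max_iter=1000):
    """Iterate x <- A^T x from a one-hot evidence vector until x stops changing."""
    x = np.zeros(A.shape[0])
    x[evidence] = 1.0
    for _ in range(max_iter):
        x_next = A.T @ x
        if np.allclose(x_next, x):
            return x_next
        x = x_next
    raise RuntimeError("propagation did not converge")


def spanning_probabilities(A, regions):
    """Average the output vectors over every region the tumour occupies."""
    return np.mean([class_probabilities(A, r) for r in regions], axis=0)


print(class_probabilities(A, 0))          # evidence: posterior fossa
print(spanning_probabilities(A, [2, 3]))  # tumour spanning IV ventricle and cerebellum
```

With these invented weights, posterior fossa evidence gives 0.4 × 0.7 + 0.6 × 0.5 = 0.58 for medulloblastoma and 0.42 for astrocytoma. Setting `A[4, 4]` and `A[5, 5]` to zero makes the iteration collapse to the zero vector, which is the failure mode described above.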

Data processing
The WMRCTR database was made available as a spreadsheet giving hand-typed strings for the diagnosis and location of each case. Occasionally, grade of tumour was also specified. In total there were 1712 cases available. Each record was examined and modified by hand. If the tumour type was classified with the WHO system, it was left unchanged. If a WHO equivalent existed for a tumour classified using the old scheme, then it was changed; otherwise the record was removed. This reduced the number of cases to 1367. These cases were then further reviewed, and only those tumours with location specified were included, reducing the final number of cases used to 1333. Of these, the site of the primary lesion was often only specified vaguely, e.g. "posterior fossa" or "cerebrum". Tumours specified with a greater degree of accuracy were then grouped (by hand) under these broader headings, while retaining their original information. A graph showing the anatomical distinctions made is given in Figure 2. Very occasionally, the location was specified in great detail (e.g. "foramen of Munro"); these samples were marked as being in the appropriate containing location.
The classes used were: astrocytoma grades I and II, astrocytoma grades III and IV, ependymoma, pineoblastoma, PNET, germinoma, medulloblastoma, craniopharyngioma and a group representing rare tumours, "other" (classes represented by fewer than 15 cases). It is common practice to group paediatric astrocytomas of different grades in this way as they are thought to be very similar diseases.
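Once each record has been reduced to a chain of locations (broadest first) followed by a tumour class, edge weights can be estimated directly as fractions of case counts. The sketch below shows one way this might be done. The record layout, the function name `edge_weights` and the toy records are all assumptions made for illustration, and the treatment of records that give only a broad location (linking that location straight to the class) is one possible choice rather than the method confirmed by the text.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def edge_weights(records):
    """Estimate P(target | source) for every consecutive pair in each record."""
    edge_counts = Counter()
    source_counts = Counter()
    for path in records:
        for source, target in zip(path, path[1:]):
            edge_counts[(source, target)] += 1
            source_counts[source] += 1
    weights = defaultdict(dict)
    for (source, target), n in edge_counts.items():
        # Fraction of the cases passing through `source` that continue to `target`.
        weights[source][target] = Fraction(n, source_counts[source])
    return weights


# Invented toy records: location chain (broadest first), then tumour class.
records = [
    ("posterior fossa", "cerebellum", "medulloblastoma"),
    ("posterior fossa", "cerebellum", "astrocytoma"),
    ("posterior fossa", "IV ventricle", "medulloblastoma"),
    ("posterior fossa", "medulloblastoma"),  # only a broad location recorded
]

weights = edge_weights(records)
for source, targets in weights.items():
    for target, w in targets.items():
        print(f"{source} -> {target}: {w}")
```

Because every record that passes through a vertex contributes exactly one outgoing edge from it, the weights leaving each vertex sum to one, which the propagation scheme needs in order to return valid probabilities.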

Results and discussion
The simplest method to represent the results would be an adjacency matrix, but this is too large for publication in its direct format. Instead, the graph is specified by Table 1, an adjacency list representation. The probabilities (weights) for each connection are expressed as a fraction, giving the final quantities obtained from the WMRCTR. For example, 222 of the 631 tumours in the posterior fossa were in the brain stem. A subgraph of the final belief network is shown in Figure 2, giving the paths necessary to generate a probability for a medulloblastoma. Figure 2 indicates that medulloblastomas have been recorded in the database as occurring in several locations in the brain. Typically, however, they are thought to arise from the cerebellum. The IV ventricle and the brain stem are adjacent to the cerebellum and are often invaded by these tumours, making it impossible to be certain of the location from which the tumour originated. This illustrates the power of the belief network in dealing with this situation: the possibility of the tumour being a medulloblastoma is not discounted if it does not occur in the typical position. Figure 2 also illustrates another feature of the network arising from the fact that a large amount of historical data was used: a small number of medulloblastomas were also recorded as being present in the parietal and temporal lobes, which are distant from the cerebellum. These cases may be due to an incorrect assumption being made at the time of diagnosis that a metastatic deposit from the primary medulloblastoma was actually the primary tumour itself. Again, this rare but known clinical scenario is well accounted for by the belief network.
The first classifier used only the belief network, assigning to each sample the label of the class with the highest probability as predicted by the network. This classifier had an error rate of 59%, compared with an error rate of 65% when using probabilities predicted by class prevalence.
The second classifier investigated the effect of using the network to augment a basic MRS classifier. Each of the 46 samples available was a short-echo time (30 ms) single voxel spectroscopy acquisition acquired on a Siemens Symphony 1.5T scanner. The free induction decay (FID) contained 1024 points and was sampled at 1000 Hz. Post-acquisition, residual water was removed using the HSVD method [15] to model the water component within ±30 Hz of the water signal. Each FID was then Fourier transformed with no line-broadening to give the magnitude spectrum, which was normalised to have unit length in the $\ell_2$-norm. The normalised spectra were feature-reduced to 10 dimensions using principal components analysis (PCA). Gaussian functions were then used as the discriminant, with the mean estimated for each tumour class and a common estimate of the covariance matrix shared across all samples. Two scenarios were then investigated: prior probabilities based on class prevalence and prior probabilities from the belief network. Prior probabilities were applied by multiplying the value of each Gaussian discriminant by the prior probability of the corresponding class. Classifier performance was measured using three metrics: apparent error, leave-one-out cross-validation error and the 632+ error rate estimator [16].
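The classification stage described above (unit-length normalisation, PCA to a small number of dimensions, Gaussian discriminants with a shared covariance, multiplication by priors) might look roughly like the sketch below. It is not the authors' code. It starts from magnitude spectra that have already been computed, it reads the "common estimate of the covariance matrix" as the pooled within-class covariance (an assumption), and the names `fit` and `posteriors` are hypothetical. The `priors` argument can be a single prevalence vector or one row per sample, for example the belief network output for that sample's location.

```python
import numpy as np


def fit(spectra, labels, n_components=10):
    """Fit PCA and a Gaussian discriminant with one shared covariance matrix."""
    labels = np.asarray(labels)
    X = spectra / np.linalg.norm(spectra, axis=1, keepdims=True)  # unit l2 length
    centre = X.mean(axis=0)
    # PCA via SVD of the centred data; rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X - centre, full_matrices=False)
    W = Vt[:n_components].T
    Z = (X - centre) @ W
    classes = np.unique(labels)
    means = np.array([Z[labels == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance (one possible reading of "common estimate").
    resid = Z - means[np.searchsorted(classes, labels)]
    cov = resid.T @ resid / (len(Z) - len(classes))
    return {"centre": centre, "W": W, "classes": classes,
            "means": means, "precision": np.linalg.inv(cov)}


def posteriors(model, spectra, priors):
    """Multiply each Gaussian discriminant by its prior, then normalise per sample."""
    X = spectra / np.linalg.norm(spectra, axis=1, keepdims=True)
    Z = (X - model["centre"]) @ model["W"]
    d = Z[:, None, :] - model["means"][None, :, :]
    maha = np.einsum("nck,kl,ncl->nc", d, model["precision"], d)
    # The shared covariance gives every class the same normalising constant,
    # so it cancels; subtracting the row minimum guards against underflow.
    lik = np.exp(-0.5 * (maha - maha.min(axis=1, keepdims=True)))
    post = lik * priors
    return post / post.sum(axis=1, keepdims=True)
```

The predicted label for each sample is then `model["classes"][posteriors(model, spectra, priors).argmax(axis=1)]`. Swapping prevalence priors for network-derived priors changes only the `priors` argument, which is what makes the two scenarios directly comparable.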
With prior probabilities based on class prevalence, the apparent error rate was 20%, corresponding to a correct classification of 37 out of 46 tumours. However, the cross-validation and 632+ error rates were 41% and 48% respectively, indicating poor generalisation to unseen cases. With prior probabilities based on the belief network, the apparent error rate was 15%, corresponding to a correct classification of 39 out of 46 tumours; the cross-validation and 632+ error rates were 32% and 37% respectively, indicating that the prior probabilities measurably improve the generalisation performance of the classifier. When using belief network prior probabilities, five of the seven incorrectly classified samples were the same as those incorrectly classified when using class prevalence priors; the remaining two (one ependymoma, one astrocytoma grade I/II) were correctly classified using prevalence information.
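For readers unfamiliar with the 632+ figure quoted above, the following summarises the estimator, on the assumption that [16] refers to the standard .632+ bootstrap estimator of Efron and Tibshirani (the text itself does not define it). Here $\overline{\mathrm{err}}$ is the apparent error, $\widehat{\mathrm{Err}}^{(1)}$ is the leave-one-out bootstrap error (the error on samples left out of each bootstrap resample), and $\hat{\gamma}$ is the no-information error rate:

```latex
\widehat{\mathrm{Err}}^{(.632+)}
  = (1 - \hat{w})\,\overline{\mathrm{err}} + \hat{w}\,\widehat{\mathrm{Err}}^{(1)},
\qquad
\hat{w} = \frac{0.632}{1 - 0.368\,\hat{R}},
\qquad
\hat{R} = \frac{\widehat{\mathrm{Err}}^{(1)} - \overline{\mathrm{err}}}{\hat{\gamma} - \overline{\mathrm{err}}}
```

The relative overfitting rate $\hat{R}$ is restricted to the interval $[0, 1]$, so the weight $\hat{w}$ ranges from 0.632 (no overfitting) to 1 (severe overfitting), shifting the estimate towards the bootstrap error as the gap between apparent and out-of-sample error grows.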
The distribution of the error rate among the classes was approximately the same for both methods of generating prior probabilities, although there were slight differences. Complete results are presented in Table 2, where each percentage is the apportionment of total classification errors attributable to each class, obtained over the 920 trials used to estimate the 632+ error. In nearly all incorrectly classified cases, the class with the second highest posterior probability was the correct class, although this was true for both methods of generating priors. In the incorrectly classified cases, the difference in posterior probability between the predicted label and the correct label's probability was small (≈ 0.01) for about half the misclassifications and large (≈ 0.4) for the other half; this was also true for both methods of generating prior probabilities.
Figure 2 Part of the complete belief network, showing the locations common to all tumour types but just one tumour classification path. The complete specification, including weights and paths for all tumour types covered, is shown in Table 1.
The simple classifier presented here attempts only to demonstrate that the belief network method can be useful and that the data used for its construction is sufficiently accurate. The application of the belief network to other classifiers depends on the choice of classifier, but many classifiers have a natural way to use prior probabilities either directly or in the form of weights.

Conclusion
Data from a large clinical database, collected over five decades, was used to construct a Bayesian belief network suitable for generating probabilities of tumour class. The network was shown to enhance a simple probability-based classifier that uses PCA-reduced raw MRS spectra for features. It is suggested that additional (discrete) information could be incorporated into the belief network to further enhance classifier performance.