A combination of 3-D discrete wavelet transform and 3-D local binary pattern for classification of mild cognitive impairment

Background The detection of Alzheimer's Disease (AD) in its formative stages, especially in Mild Cognitive Impairment (MCI), can help clinicians understand the condition. The literature shows that the classification of MCI converters (MCI-C) and MCI non-converters (MCI-NC) has not been explored extensively and that the maximum classification accuracy reported is rather low. This paper therefore proposes a Machine Learning approach for classifying MCI patients into two groups: those who converted to AD and those who were not diagnosed with any signs of AD. The proposed algorithm is also used to distinguish MCI patients from controls (CN). This work uses Structural Magnetic Resonance Imaging data.
Methods This work proposes a 3-D variant of Local Binary Pattern (LBP), called LBP-20, for extracting features. The method has been compared with the 3D-Discrete Wavelet Transform (3D-DWT). Subsequently, a combination of 3D-DWT and LBP-20 has been used for extracting features. The relevant features are selected using the Fisher Discriminant Ratio (FDR), and the classification is finally carried out using the Support Vector Machine.
Results The combination of 3D-DWT with LBP-20 results in a maximum accuracy of 88.77. The proposed combination of methods is also applied to distinguish MCI from CN, where it results in a classification accuracy of 90.31.
Conclusion The proposed combination extracts the relevant distribution of microstructures from each component obtained with the DWT, thereby improving the classification accuracy. Moreover, the number of features used for classification is significantly smaller than the number obtained by 3D-DWT. The performance of the proposed method is measured in terms of accuracy, specificity and sensitivity and is found to be superior to that of the existing methods. Thus, the proposed method may contribute to effective diagnosis of MCI and may prove advantageous in clinical settings.

person-year. Researchers have identified the following changes in the autopsy studies of people suffering from MCI [6]: i) abnormal clusters of beta-amyloid protein (plaques); ii) microscopic protein clumps of tau characteristic of AD (tangles); iii) Lewy bodies, which are microscopic clumps of another protein associated with forms of dementia like AD; and iv) small strokes or reduced blood flow through brain blood vessels.
MCI can be detected either by clinical tests or by brain scans. A medical professional determines the presence or absence of MCI by evaluating a person's cognitive and behavioural changes and by using professional judgement about the possible causes and severity of the symptoms [6]. Some of the common clinical tests used for the detection of MCI are the Mini Mental State Examination, Clock Test, Logical Memory, Rey Auditory Verbal Learning Test, Digit Span, Category Fluency Tests, Trail Making Test A-B, Boston Naming Test, American National Adult Reading Test, Alzheimer's Disease Assessment Scale-Cognitive Behaviour, Geriatric Depression Scale and Functional Assessment Questionnaire [7,8]. In recent years, a variety of brain imaging techniques have been used for the classification of a) MCI and controls and b) MCI-C and MCI-NC.
Manual assessment of MCI requires more time, resources and expertise, which is highly inconvenient and costly to the patient. In the past two decades, the detection of MCI has drawn the attention of the Image Processing (IP) and Machine Learning (ML) communities. ML based methods can be used to distinguish MCI patients from AD patients or controls. These methods require less manual intervention from experts and may be less costly. Moreover, these techniques provide better visualization of large volumes of data compared to manual methods.
The s-MRI can be used to assess the structural changes in the brain associated with MCI. Researchers have used cortical atrophy for diagnosing MCI [27]. Moreover, regions of the brain such as the hippocampus, amygdala and entorhinal cortex have been found to be important in the diagnosis of MCI [28]. Researchers [27][28][29] have explored Regions of Interest (ROIs) based analysis for automatic MCI diagnosis. These ROIs have been determined either by predefinition or by adaptive parcellation. These methods can be divided into single-ROI methods and multiple-ROI methods. The hippocampal volume was used to discriminate MCI and NC patients by Chupin [15]. The combination of hippocampus features and cerebrospinal fluid (CSF) volume was used for this task by Ahmed et al. [23]. Magnin et al. [26] used 90 features to represent 90 ROIs of the whole brain, where each feature describes the relative weight of GM compared to WM and CSF. These approaches miss the effect of the other regions on the disease and also overlook the fact that the regions of the brain are interconnected.
Texture classification plays a significant role in computer vision applications such as image retrieval, video retrieval and medical diagnosis [30]. In order to carry out texture classification, it is essential to extract good features from an image to distinguish diverse textures. Texture analysis methods such as Gray Level Co-occurrence Matrices (GLCM) [31,32], Discrete Wavelet Transform (DWT) [33] and Local Binary Pattern (LBP) [34] have been used in medical image analysis. GLCM measures the average degree of correlation between pairs of pixels in different aspects [32]. The discrimination capability of GLCM depends on the choice of the separation distance between pixels, which is difficult to ascertain. The Wavelet Transform (WT) allows localization in both the spatial and frequency domains and captures local transients such as surfaces in 3D volumes, which helps in capturing the finer details of brain MRI data present in different directions. However, the number of features obtained by the DWT is huge. LBP, another popular feature extraction technique, can be used to gauge the statistical and structural information to represent the image [34]. It captures the underlying distribution of various microstructures such as edges [35][36][37]. Moreover, it represents the original data with fewer features. However, it does not retain the spatial distribution of the different patterns present in the image.
In this paper, in order to generate a rich representation of anatomical structures, which will be more discriminative to separate different groups of subjects, a combination of 3D-DWT and a variant of 3D-LBP is proposed. First level decomposition of an MRI using 3D-DWT provides one approximate and seven detailed components. Each detailed component captures different orientation of micro-structures. However, the number of features obtained from the seven detailed components is large. Each of these seven detailed components is represented compactly using 3D-LBP.
The 3D-LBP with 18 neighbours results in 262,144 features. The number of features can be reduced further by applying the rotation invariant and uniform variants of the LBP. We have investigated i) basic 3D-LBP, ii) rotation invariant 3D-LBP and iii) uniform 3D-LBP, and the variant that gives the best result has been combined with 3D-DWT. To the best of our knowledge, no research work carried out to date has applied a combination of 3D-DWT and 3D-LBP on s-MRI data to distinguish MCI-C from MCI-NC and MCI from CN. Further, the Fisher Discriminant Ratio (FDR) is applied to determine a set of relevant features and the well-known Support Vector Machine is used to develop a decision model. The performance of the proposed method is compared with existing methods on the publicly available ADNI data.

ADNI database
The datasets supporting the conclusions of this article are available in the Alzheimer's Disease Neuroimaging Initiative (ADNI) repository. Data used in the preparation of this article were obtained from the ADNI database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether Magnetic Resonance Imaging (MRI), PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD.

Retrieval of data
The ADNI database was queried for controls (CN), those converted to AD (MCI-C) and those not converted to AD (MCI-NC). The protocol of data selection and image acquisition of the subjects takes into consideration age matching, appropriate number of slices, required parameters etc. and has been adopted from [38]. However, the data of all the CN subjects was not available and hence more CN subjects were selected from the database. This study uses processed NIFTI images of 75 MCI-C, 89 CN and 112 MCI-NC subjects. The MCI-NC patients were between 56 and 88 years of age, the controls between 63 and 90, and the MCI-C patients between 55 and 87. All the patients had a Mini Mental State Examination score between 18 and 27 and a CDR of 0.5 or 1. The T1 weighted s-MRI images were acquired with the following parameters: field strength = 1.5 Tesla, TE = 3.6099 ms, TR = 3000 ms. Table 1 shows the relevant data of the patients.

Pre-processing
In the literature, Structural Magnetic Resonance Imaging (s-MRI) has been widely used to detect MCI. However, before extracting features from an image, pre-processing is required. For this, Statistical Parametric Mapping (SPM) is used [39].
The pre-processing of the s-MRI images includes the following: a) Slice time correction, which is required if the temporal dynamics of evoked responses are important. This was performed as it may improve the performance; b) Head motion correction, which is referred to as realignment; c) Spatial normalization, which is the coregistration with the standard MNI template in order to overcome brain shape variability; d) Spatial smoothing, in which the intensity value of a voxel is replaced by the weighted average of the neighbouring voxels; and e) Tissue segmentation, which is the segregation of brain tissues into three tissue classes, namely gray matter, white matter and cerebro-spinal fluid. In the literature, it has been found that gray matter atrophy is responsible for Mild Cognitive Impairment [40,41]. For this reason, gray matter is used for building an ML based model to diagnose MCI. The most common method to measure differences in local concentrations of brain tissue is Voxel Based Morphometry (VBM). In VBM, a voxel-wise comparison of the local concentration of gray matter between the two groups of subjects is carried out [42]. However, VBM also requires pre-processing, which includes registration to a standard template, followed by smoothing and segmentation.

3D-discrete wavelet transform
The Fourier Transform (FT) is commonly used to determine the frequency spectrum of a signal, for better analysis of the signal. However, in the case of a nonstationary signal, the FT is not of much use. For signals where time localization of spectral components is needed, one solution is to adopt the Short-Time Fourier Transform (STFT) to get the frequency components of local time intervals of fixed duration. However, for signals having non-periodic fast transitions (i.e. high frequency content for a short duration), the wavelet transform (WT) is suggested to be a better option in the literature [43]. The WT analyses a signal at different frequencies with different resolutions, which makes it an excellent tool for the analysis of transient signals. On the basis of the orthogonality of the wavelet, the WT can be categorized into the Continuous Wavelet Transform (CWT), which uses non-orthogonal wavelets, and the Discrete Wavelet Transform (DWT), which uses orthogonal wavelets. The CWT is often used to characterize singularities in functions, but disadvantages such as the infinite number of wavelets, redundancy and the lack of analytical solutions for most functions make it difficult to use [43]. Hence, the DWT is commonly used in the literature.
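There are two filters involved in the analysis bank of the DWT: one is the wavelet (detail) filter, and the other is the scaling (averaging) filter. The wavelet expansion of a discrete function $f(x)$ in terms of the wavelet $\psi(x)$ and the scaling function $\tau(x)$ is defined as follows [43]:

$$f(x) = \frac{1}{\sqrt{A}} \sum_{a} W_{\tau}(P, a)\, \tau_{P,a}(x) + \frac{1}{\sqrt{A}} \sum_{p=P}^{\infty} \sum_{a} W_{\psi}(p, a)\, \psi_{p,a}(x)$$

where $\frac{1}{\sqrt{A}}$ is the normalizing factor, $P$ is the decomposition level, $W_{\psi}(p, a)$ are the detail (wavelet) coefficients and $W_{\tau}(P, a)$ are the averaging (scaling) coefficients. The functions $f(x)$, $\psi_{p,a}(x)$ and $\tau_{P,a}(x)$ are discrete functions in $[0, A-1]$, and $a \in \{0, 1, 2, \ldots, \frac{A}{2^{p}} - 1\}$ [44]. The scaling and detail coefficients are computed as:

$$W_{\tau}(P, a) = \frac{1}{\sqrt{A}} \sum_{x} f(x)\, \tau_{P,a}(x)$$

$$W_{\psi}(p, a) = \frac{1}{\sqrt{A}} \sum_{x} f(x)\, \psi_{p,a}(x)$$

The 1D DWT can be extended to a 3D DWT for 3D brain volumes. In the 3D DWT [44], we have one 3D approximation (scaling) function $\tau(a, b, c)$ and seven 3D detail (wavelet) functions $\psi^{i}(a, b, c)$, where $i \in \{1, 2, \ldots, 7\}$. The function $\tau(a, b, c)$ in 3-D is the product of $\tau(a)$, $\tau(b)$ and $\tau(c)$. Each $\psi^{i}(a, b, c)$ is one of the seven possible products of the 1-D $\tau$ and $\psi$ along the three axes that contain at least one $\psi$. These functions are used to find the detail coefficients $W^{i}_{\psi}(p, a, b, c)$. The scaled and translated functions are defined as follows [44]:

$$\psi^{i}_{p,a,b,c}(l, m, n) = 2^{p/2}\, \psi^{i}(2^{p} l - a,\; 2^{p} m - b,\; 2^{p} n - c), \quad i \in \{1, 2, \ldots, 7\}$$

$$\tau_{P,a,b,c}(l, m, n) = 2^{P/2}\, \tau(2^{P} l - a,\; 2^{P} m - b,\; 2^{P} n - c)$$

The wavelet expansion of a 3D image volume $f(l, m, n)$ of size $A \times B \times C$ can be expressed as:

$$f(l, m, n) = \frac{1}{\sqrt{ABC}} \sum_{a} \sum_{b} \sum_{c} W_{\tau}(P, a, b, c)\, \tau_{P,a,b,c}(l, m, n) + \frac{1}{\sqrt{ABC}} \sum_{i=1}^{7} \sum_{p=P}^{\infty} \sum_{a} \sum_{b} \sum_{c} W^{i}_{\psi}(p, a, b, c)\, \psi^{i}_{p,a,b,c}(l, m, n)$$

with

$$W_{\tau}(P, a, b, c) = \frac{1}{\sqrt{ABC}} \sum_{l} \sum_{m} \sum_{n} f(l, m, n)\, \tau_{P,a,b,c}(l, m, n)$$

$$W^{i}_{\psi}(p, a, b, c) = \frac{1}{\sqrt{ABC}} \sum_{l} \sum_{m} \sum_{n} f(l, m, n)\, \psi^{i}_{p,a,b,c}(l, m, n)$$

This study uses the 'db2' wavelet function.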
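The structure of a first-level 3D decomposition (one approximation and seven detail components) can be illustrated with a minimal sketch. The sketch deliberately swaps in the Haar wavelet for the 'db2' wavelet used in this study, because Haar filters fit in a few lines of standard-library Python. The function names (`haar_1d`, `split_axis`, `haar_dwt3`), the dictionary representation of the volume and the toy 4 × 4 × 4 data are illustrative assumptions, not the authors' implementation.

```python
import itertools
import math


def haar_1d(line):
    """One-level Haar analysis of an even-length sequence: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = [(line[i], line[i + 1]) for i in range(0, len(line), 2)]
    approx = [(u + v) * s for u, v in pairs]  # scaling (averaging) filter
    detail = [(u - v) * s for u, v in pairs]  # wavelet (detail) filter
    return approx, detail


def split_axis(vol, shape, axis):
    """Filter and downsample a volume along one axis.

    vol maps (l, m, n) index tuples to values; shape holds the axis lengths.
    """
    new_shape = tuple(n // 2 if i == axis else n for i, n in enumerate(shape))
    low, high = {}, {}
    other = [range(n) for i, n in enumerate(shape) if i != axis]
    for rest in itertools.product(*other):
        def index(k):
            full = list(rest)
            full.insert(axis, k)
            return tuple(full)

        approx, detail = haar_1d([vol[index(k)] for k in range(shape[axis])])
        for k, (a_val, d_val) in enumerate(zip(approx, detail)):
            low[index(k)] = a_val
            high[index(k)] = d_val
    return low, high, new_shape


def haar_dwt3(vol, shape):
    """First-level 3D decomposition: 'aaa' is the approximation, the rest are details."""
    bands = {"": (vol, shape)}
    for axis in range(3):
        split = {}
        for name, (band, band_shape) in bands.items():
            low, high, new_shape = split_axis(band, band_shape, axis)
            split[name + "a"] = (low, new_shape)
            split[name + "d"] = (high, new_shape)
        bands = split
    return {name: band for name, (band, _) in bands.items()}


# Toy 4 x 4 x 4 volume standing in for a gray-matter map (hypothetical data).
shape = (4, 4, 4)
volume = {idx: float(sum(idx)) for idx in itertools.product(range(4), repeat=3)}
bands = haar_dwt3(volume, shape)
detail_names = sorted(name for name in bands if name != "aaa")
print(len(bands), detail_names)
```

Each band name records, axis by axis, whether the scaling (`a`) or the wavelet (`d`) filter was applied, so the seven names containing at least one `d` correspond to the seven products of 1-D $\tau$ and $\psi$ described above.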

Local binary pattern and its variants
Local Binary Pattern (LBP) is a common feature extraction technique that can be used to gauge statistical and structural information [35]. The LBP values capture the underlying distribution of various microstructures such as edges. The LBP value of each pixel is calculated by comparing the pixel value with the intensity of its neighbours using the following formula:

$$\mathrm{LBP}_{Q,R} = \sum_{q=0}^{Q-1} s(I_{q} - I_{center})\, 2^{q}$$

where

$$s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}$$

Here, $Q$ is the total number of neighbours, $R$ is the radius from the central pixel, $I_{center}$ is the intensity of the central pixel and $I_{q}$ is the intensity of the $q$-th neighbour. After the computation of the LBP value for each pixel, a histogram of $2^{Q}$ bins is obtained.
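As a small worked example of the thresholding described above, the following sketch computes the basic 2D LBP code with Q = 8 neighbours at radius R = 1 and builds the resulting histogram. The neighbour ordering, the use of "greater than or equal" as the threshold rule, the function names and the 3 × 3 toy patch are assumptions made for illustration.

```python
from collections import Counter

# 8 neighbours at radius 1, listed clockwise from the top-left pixel
# (an assumed ordering; any fixed ordering gives a valid LBP code).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Threshold the neighbours against the centre and pack the bits into an integer."""
    center = image[row][col]
    code = 0
    for q, (dr, dc) in enumerate(OFFSETS):
        if image[row + dr][col + dc] >= center:
            code |= 1 << q  # neighbour q contributes 2**q
    return code


def lbp_histogram(image):
    """Histogram with 2**Q bins over all interior pixels."""
    counts = Counter(
        lbp_code(image, r, c)
        for r in range(1, len(image) - 1)
        for c in range(1, len(image[0]) - 1)
    )
    return [counts.get(code, 0) for code in range(2 ** len(OFFSETS))]


image = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 6, 3],
]
code = lbp_code(image, 1, 1)   # neighbours 9, 7 and 6 are >= 6, so 2 + 8 + 32 = 42
histogram = lbp_histogram(image)
print(code, len(histogram))
```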
The 2-dimensional LBP can be extended to three dimensions as follows. For each voxel, its intensity is compared with the intensity of the neighbouring voxels. The 18 neighbouring voxels present in the axial, coronal, sagittal and diagonal planes have been considered.
This work considers 18 neighbours and thresholds their intensity with respect to the central one, $I_{center}$.
The above technique results in a histogram with 262,144 ($2^{18}$) bins. This method is henceforth referred to as the LBP-3D method.
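A minimal sketch of the 18-neighbour code for a single voxel is given below. It assumes that the 18 neighbours are the 6 face-adjacent and 12 edge-adjacent voxels of the surrounding 3 × 3 × 3 cube (the 8 corner voxels are excluded); the text does not spell out the exact neighbourhood or the bit ordering, so both are assumptions, as are the function and variable names.

```python
import itertools

# Assumed neighbourhood: offsets in {-1, 0, 1}^3 with one or two non-zero
# components, i.e. 6 face neighbours + 12 edge neighbours = 18 voxels.
OFFSETS_18 = [
    offset
    for offset in itertools.product((-1, 0, 1), repeat=3)
    if 1 <= sum(abs(d) for d in offset) <= 2
]


def lbp3d_code(volume, x, y, z):
    """18-bit LBP code of voxel (x, y, z) in a nested-list volume."""
    center = volume[x][y][z]
    code = 0
    for q, (dx, dy, dz) in enumerate(OFFSETS_18):
        if volume[x + dx][y + dy][z + dz] >= center:
            code |= 1 << q
    return code


# Two hypothetical 3 x 3 x 3 volumes: a flat one and one with a bright centre.
flat = [[[1] * 3 for _ in range(3)] for _ in range(3)]
peak = [[[0] * 3 for _ in range(3)] for _ in range(3)]
peak[1][1][1] = 1

flat_code = lbp3d_code(flat, 1, 1, 1)   # every neighbour passes the threshold
peak_code = lbp3d_code(peak, 1, 1, 1)   # no neighbour passes the threshold
print(len(OFFSETS_18), flat_code, peak_code)
```

Because each voxel yields an 18-bit code, a dense histogram needs $2^{18}$ = 262,144 bins, which is what motivates the reduced variants discussed next.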
It may be further noted that the number of bins in the histogram so obtained is very large. This can be reduced by applying the uniform and rotation invariant methods in 3D. A uniform LBP is one that contains at most two '0 to 1' or '1 to 0' transitions [35,36]. In the case of an n-bit binary number (n-neighbourhood), the uniform LBP histogram has n × (n − 1) + 3 bins: n × (n − 1) + 2 uniform patterns plus one bin shared by all non-uniform patterns. The histogram of a uniform LBP therefore contains fewer bins than that of the conventional LBP, which contains $2^{n}$ bins. In this work, 18 neighbours have been used, so by the application of uniform LBP the number of features is reduced to 309. This method is henceforth referred to as LBP-309.
The uniform-rotation-invariant LBP treats a pattern and all the patterns obtained by circularly shifting it one bit at a time to the right, until the original pattern recurs, as the same pattern [35]. For an n-bit number, the uniform-rotation-invariant LBP has (n + 2) bins, thus reducing the number of features to a great extent. For example, in the case of 18 neighbours in 3-D, the pattern can be described as an 18-bit number and the number of features needed to represent each slice is 20 in the case of the rotation-invariant version. This method is henceforth referred to as LBP-20.
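The bin counts quoted above (309 and 20 for 18 neighbours) can be checked by brute force. The sketch below counts circular bit transitions, reserves one bin for all non-uniform patterns, and maps uniform patterns to their number of set bits for the rotation-invariant variant. It treats the 18 bits as a circular sequence, which is an assumption: the text does not state how the 3D neighbours are ordered, although the counts depend only on n. The function names are illustrative.

```python
def transitions(pattern, n):
    """Number of 0/1 changes when the n-bit pattern is read circularly."""
    rotated = (pattern >> 1) | ((pattern & 1) << (n - 1))
    return bin(pattern ^ rotated).count("1")


def uniform_bins(n):
    """Bins of the uniform LBP: one per uniform pattern plus one shared bin."""
    uniform = sum(1 for p in range(2 ** n) if transitions(p, n) <= 2)
    return uniform + 1


def riu2_label(pattern, n):
    """Uniform-rotation-invariant label: number of ones, or n + 1 if non-uniform."""
    if transitions(pattern, n) <= 2:
        return bin(pattern).count("1")
    return n + 1


n = 18
bins_uniform = uniform_bins(n)                                 # n * (n - 1) + 3
bins_riu2 = len({riu2_label(p, n) for p in range(2 ** n)})     # n + 2
print(2 ** n, bins_uniform, bins_riu2)
```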

3D DWT + LBP-20
The experiments show that LBP-20 outperforms the other variants in terms of specificity, accuracy and sensitivity. Moreover, the number of features in LBP-20 is very low. For this reason, this work proposes a combination of 3D-DWT and LBP-20 for extracting features. This method is henceforth referred to as 3D-DWT + LBP-20.

Feature selection
Some features extracted in brain imaging may be redundant or noisy and may even negatively affect the performance of the decision model. This makes feature selection an indispensable step before classification. The Fisher Discriminant Ratio (FDR) is a simple and effective method which measures the discrimination power of a given feature between data of two different classes [45]. The FDR score of the $i$-th feature is calculated as follows:

$$\mathrm{FDR}_{i} = \frac{\left(m_{i}^{1} - m_{i}^{2}\right)^{2}}{\left(\sigma_{i}^{1}\right)^{2} + \left(\sigma_{i}^{2}\right)^{2}}$$

where $m_{i}^{1}$ is the mean of the samples of the $i$-th feature that belong to the first class; $m_{i}^{2}$ is the mean of the samples of the $i$-th feature that belong to the second class; $\sigma_{i}^{1}$ is the standard deviation of the samples of the $i$-th feature that belong to the first class; and $\sigma_{i}^{2}$ is the standard deviation of the samples of the $i$-th feature that belong to the second class.
A feature with a higher FDR value is considered to be more relevant. Hence, the calculation of the FDR values of the features is followed by the arrangement of the features in descending order of FDR value.
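The ranking step can be sketched as follows, assuming the usual form of the FDR: the squared difference of the two class means divided by the sum of the two class variances. The use of population variance, the handling of a zero denominator, the function names and the toy feature vectors are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def fdr_scores(class1, class2):
    """FDR score per feature; class1 and class2 are lists of feature vectors."""
    scores = []
    for i in range(len(class1[0])):
        a = [row[i] for row in class1]
        b = [row[i] for row in class2]
        numerator = (mean(a) - mean(b)) ** 2
        denominator = pvariance(a) + pvariance(b)  # sigma_1^2 + sigma_2^2
        if denominator == 0:
            # Assumed convention: a feature that is constant within both
            # classes scores 0 if the means agree and infinity otherwise.
            scores.append(float("inf") if numerator > 0 else 0.0)
        else:
            scores.append(numerator / denominator)
    return scores


def rank_features(scores):
    """Feature indices in descending order of FDR score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical data: three subjects per class, three features per subject.
class1 = [[1.0, 5.0, 0.2], [1.2, 5.1, 0.9], [0.8, 4.9, 0.4]]
class2 = [[3.0, 5.0, 0.5], [3.2, 5.2, 0.1], [2.8, 4.8, 0.8]]

scores = fdr_scores(class1, class2)
order = rank_features(scores)
print(order)
```

In this toy example the first feature separates the two classes well, so it is ranked first; the indices in `order` would be computed on the training data only and then reused to reorder the test features.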

Classification and evaluation
For a two-class problem, the Support Vector Machine constructs a hyperplane which separates the data points of the two classes by the maximum margin [46]. For the purpose of classification, a linear kernel is used. To evaluate the performance of the proposed system, accuracy (ACC), sensitivity (SEN) and specificity (SPE) are used, which are calculated as follows:

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{SEN} = \frac{TP}{TP + FN}, \qquad \mathrm{SPE} = \frac{TN}{TN + FP}$$

where FP, FN, TP and TN are the numbers of false positive, false negative, true positive and true negative samples, respectively. In order to calculate the performance of the proposed decision system and to set the system parameters, a nested 10-fold cross-validation scheme is used. In this scheme, the data is divided randomly into 10 equal sized subsets. In each fold, 9 subsets are used for training and validation and one subset is used for testing. This process is repeated 10 times and each subset is used exactly once for testing. These constitute the outer folds. In each outer fold, 10 inner folds are made such that the training and validation data is further divided into 10 equal subsets. In each inner fold, 9 subsets are used for training and one for validation. The experimental results of the inner folds are used for setting the system parameters.
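The nested 10-fold scheme described above can be sketched as an index generator. The sketch assumes a plain random split without stratification and near-equal fold sizes when the number of samples is not divisible by 10; the function names and the fixed seed are illustrative. The sample count of 187 corresponds to the 75 MCI-C and 112 MCI-NC subjects of this study.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle the indices and deal them into k near-equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, k_outer=10, k_inner=10, seed=0):
    """Yield (test indices, list of inner (train, validation) index pairs)."""
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n_samples), k_outer, rng)
    for o, test_idx in enumerate(outer_folds):
        # Everything outside the current outer fold is training + validation data.
        train_val = [i for j, fold in enumerate(outer_folds) if j != o for i in fold]
        inner_folds = k_fold_indices(train_val, k_inner, rng)
        inner = []
        for v, val_idx in enumerate(inner_folds):
            train_idx = [i for j, fold in enumerate(inner_folds) if j != v for i in fold]
            inner.append((train_idx, val_idx))
        yield test_idx, inner


splits = list(nested_cv_splits(187))  # 75 MCI-C + 112 MCI-NC subjects
print(len(splits), len(splits[0][1]))
```

The inner pairs would be used to set the system parameters (for example, the number of FDR-ranked features), and the held-out outer fold is touched only once, to report performance.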
In the experiment, for each number of selected features, the average accuracy was found over the 10 outer cross-validation folds. The variation of this average accuracy with the number of features was then noted. The number of features for which the performance is best is then reported. The average and the standard deviation over the outer folds are reported as the performance of the proposed system.
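The evaluation measures and their summary over folds can be sketched as below, using the standard definitions ACC = (TP + TN) / (TP + TN + FP + FN), SEN = TP / (TP + FN) and SPE = TN / (TN + FP). The label coding (1 for the positive class), the use of the population standard deviation, the function names and all numbers are hypothetical and serve only to illustrate the computation.

```python
from statistics import mean, pstdev


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from true and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
    }


def summarize(values):
    """Mean and standard deviation of a measure over the outer folds."""
    return mean(values), pstdev(values)


# Hypothetical labels for one outer test fold (1 = MCI-C, 0 = MCI-NC).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
fold_result = evaluate(y_true, y_pred)

# Hypothetical accuracies from three outer folds.
avg_acc, std_acc = summarize([0.9, 0.8, 1.0])
print(fold_result, avg_acc, std_acc)
```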
The procedure for the proposed framework is summarized as follows. It is also depicted in Fig. 1.

Diagnosis of MCI
For the given data:
1) Divide the data into train and test set.
2) Perform the following computation for each MRI volume of the train set:
a. Extract 7 detailed components from each MRI using 3D-DWT.
b. Obtain features from each of the 7 detailed components using 3D-LBP-20.
c. Concatenate the features obtained from the 7 volumes to obtain 140 features.
3) Apply FDR to order the features obtained from step 2 in terms of their relevance (decreasing values of their FDR values). The indices so obtained would be used in the testing phase.
4) Train the model using the train set.
5) For the test set, apply all the steps of 2. Use the indices obtained in step 3 to represent the MRI volume. Obtain accuracy, specificity and sensitivity of the proposed method.

Table 2.

Discussion
From the results presented in Table 2, the following can be inferred: The combination of 3D-DWT with LBP-20, after a certain number of features, gives better performance than individual feature extraction methods such as 3D-
The comparison of the accuracy, specificity and sensitivity of the proposed model with the existing models is presented in Table 3. It can be observed from the table that the proposed method performs well not only in terms of classification accuracy but also in terms of specificity and sensitivity for both i) MCI-C vs MCI-NC and ii) MCI vs CN.
Although each of the seven detailed components obtained with the application of 3D-DWT captures localized edges of a different orientation, the number of features so obtained is quite large. Because of this, a large amount of memory and computation time is required to build a decision model. On the other hand, LBP-20 provides the global distribution of patterns present in the MRI volume with few features but omits finer details. The combination of 3D-DWT and LBP-20 brings together the advantages of the two methods. With the use of the combination, the MRI volume is compactly represented in terms of the distributions of different patterns from each of the seven detailed components obtained from 3D-DWT, without omitting any relevant details. Hence, the combination of the two methods provides better distinguishing power to differentiate two different types of MRI volumes. This is also reflected in our empirical analysis.

Fig. 2 The comparison of the normalized features obtained with LBP-20 and 3D-DWT + LBP-20. Here, DWT1, DWT2 etc. are the seven detailed components obtained using the 3D-DWT
To understand the better performance of the proposed technique in comparison to LBP-20, we first normalized the individual features of both methods, i.e. LBP-20 and 3D-DWT + LBP-20. Subsequently, the average value of each individual feature was computed class-wise, separately for both methods. The comparison of the two methods in distinguishing the two classes is shown in Fig. 2. The maximum difference between the class-wise values is 0.302 in the case of LBP-20, whereas it is 0.613 in the case of 3D-DWT + LBP-20. This may be attributed to the ability of the proposed method to capture more relevant patterns for distinguishing the data of the two classes in comparison to LBP-20. It may also be noted that there are two or three peaks in most of the graphs, indicating that the corresponding bins may contribute to the classification of the two classes owing to a marked difference between the values of the two classes.

Conclusion
In this paper, 3D variants of LBP called LBP-3D, LBP-309 and LBP-20 have been proposed. LBP-20 gives the best performance among the three variants of 3D-LBP and is combined with 3D-DWT in the proposed model, 3D-DWT + LBP-20, to extract relevant features from MRI for the classification between i) MCI-C and MCI-NC and ii) MCI and CN. The experimental results on publicly available ADNI datasets show that the proposed pipeline is quite effective in distinguishing between the above-mentioned classes. It is also noted that the proposed combination of 3D-DWT and LBP-20 performs better than 3D-LBP and its variants, and that it also performs better than 3D-DWT, in terms of specificity, accuracy and sensitivity. Also, the proposed method provides superior performance with fewer features in comparison to the existing methods. This is attributed to the representation of the MRI volume in terms of relevant and compact features, which are obtained with the application of LBP-20 on each of the seven detailed components of 3D-DWT.
In the future, the proposed method will be extended to multi-class classification. Also, in future work, multivariate methods will be used for feature selection. Moreover, the analysis will be extended to other methods capable of finding a smaller subset of relevant features which are more discriminative of the above-mentioned classes. In the words of Antoine de Saint-Exupéry, "It is only with the heart that one can see rightly; what is essential is invisible to the eye." The above investigation suggests that "heart" can be replaced by "pertinent feature extraction."