Texture features in the Shearlet domain for histopathological image classification

Background A variety of imaging modalities is available (e.g., magnetic resonance, X-ray, ultrasound, and biopsy), where each modality can reveal different structural aspects of tissues. However, the analysis of histological slide images captured from a biopsy is considered the gold standard for determining whether cancer exists; furthermore, it can reveal the stage of the cancer. Supervised machine learning can therefore be used to classify histopathological tissues. Several computational techniques have been proposed to study histopathological images with varying levels of success. Often, handcrafted techniques based on texture analysis, used with supervised machine learning, are proposed to classify histopathological tissues. Methods In this paper, we construct a novel feature space to automate the classification of tissues in histology images. Our feature representation integrates various feature sets into a new texture feature representation. All of our descriptors are computed in the complex Shearlet domain. With complex coefficients, we investigate not only the use of the magnitude coefficients, but also the effectiveness of incorporating the relative phase (RP) coefficients to create the input feature vector. In our study, four texture-based descriptors are extracted from the Shearlet coefficients: co-occurrence texture features, Local Binary Patterns, Local Oriented Statistics Information Booster, and Segmentation-based Fractal Texture Analysis. Each set of these attributes captures significant local and global statistics. Therefore, we study them individually, but we also integrate them to boost the accuracy of classifying histopathology tissues when fed to classical classifiers. To tackle the problem of high dimensionality, our proposed feature space is reduced using principal component analysis.
In our study, we use two classifiers to indicate the success of our proposed feature representation: a Support Vector Machine (SVM) and a Decision Tree Bagger (DTB). Results Our feature representation delivered high performance on four public datasets, with the best achieved accuracies being 92.56% on the multi-class Kather dataset, 91.73% on BreakHis, 98.04% on Epistroma, and 96.29% on Warwick-QU. Conclusions Our proposed method in the Shearlet domain for the classification of histopathological images proved to be effective when investigated on four different datasets that exhibit different levels of complexity.

stored in digital form [2]. The manual inspection of histological slides by a histopathologist is indispensable. However, computational techniques from image processing and machine learning can be a great asset in the field of histopathology by assisting with the pre-screening/classification of easy cases; more time can then be devoted to studying the challenging histological slides. More importantly, computer-assisted diagnosis in histopathology can play a significant role in minimizing (and ultimately eradicating) human error, e.g., by the pathologist [3].
The early identification of cancer is therefore crucial for the pathologist to propose an appropriate treatment for the patient. The process of histopathological tissue classification is tackled in different ways. We divide our review of such techniques into three groups: texture-based, Shearlet-based, and deep feature-based methods.
Texture-based techniques are frequently investigated for the analysis and classification of histopathological tissues. For instance, Kather et al. [4] proposed computing various texture features and classifying colorectal cancer histology into eight classes using an SVM (with 10-fold cross-validation). The fusion of different texture features delivered an accuracy of 87.4%.
Similarly, Linder et al. [5] investigated several descriptors, LBP, Haralick texture attributes, and Gabor filters, to classify a dataset they introduced, called Epistroma. Those extracted descriptors are fed to an SVM model to distinguish between epithelium and stroma tissues. Comparably, Spanhol et al. [3] established a new dataset, the breast cancer histopathology dataset (BreakHis), which consists of benign and malignant tissues. Spanhol et al. used different techniques to classify BreakHis tissues as benign or malignant.
Bruno et al. [6] proposed applying LBP to the curvelet coefficients of the transformed image (the authors used the Digital Database for Screening Mammography, the Breast Cancer Digital Repository, and the UCSB biosegmentation benchmark). To reduce the number of descriptors, the authors used statistical analysis of variance (ANOVA). Like Bruno et al., we also use descriptors computed from a directional wavelet transform, but we demonstrate that it is advantageous to integrate various descriptors computed in the Shearlet domain to create the feature space. An idea more closely related to our technique is proposed by Ribeiro et al. [7], who computed descriptors from both spatial images and curvelet coefficients to classify colorectal histology tissues. In contrast to Ribeiro et al., we utilize both the magnitude and phase coefficients of the complex Shearlet domain. A similar approach to ours is proposed by Vo et al. [8], who extracted both phase and magnitude descriptors for textured image retrieval, but applied them to non-medical texture image samples.
The Shearlet transform has been used previously in different studies. It has the advantage of constructing an anisotropic wavelet system. However, typically only the magnitude coefficients are utilized for the classification of textured images. For instance, He et al. [9] classified textured images by proposing Shearlet-based descriptors: the local energy descriptors computed from the Shearlet coefficients are quantized and encoded, and the energy histograms of all levels are then accumulated to form the image characteristics. Instead, Zhou et al. [10] utilized only specific levels of the Shearlet decomposition for breast tumor ultrasound image classification.
Dong et al. [11] suggested a technique for textured image classification and retrieval in which the dependencies of adjacent Shearlet subbands are modeled using linear regression. To represent the Shearlet subbands for classification, energy descriptors are computed; for textured image retrieval, statistics from both the contourlet and Shearlet domains are used.
Meshkini and Ghassemian [12] proposed classifying textured images using the inner product of the co-occurrence matrix and the magnitude coefficients of the Shearlet transform. Unlike many published studies, we use not only the magnitude coefficients but also the phase coefficients of the Shearlet transform in our work.
Deep feature descriptors are extracted from a pre-trained deep learning model (particularly, a convolutional neural network (CNN)) that is typically trained on non-medical images. Such models are either used without fine-tuning (i.e., as unsupervised feature extractors) or fully/partially retrained on biomedical images. For example, Song et al. [13] proposed classifying the BreakHis dataset using feature vectors extracted from a CNN, with the extracted descriptors encoded using the Fisher Vector method. Similarly, Gupta et al. [14] extracted deep features from a fine-tuned DenseNet, but from multiple layers, to classify the BreakHis dataset. Differently, Wang et al. [15] utilized color deconvolution to obtain the hematoxylin and eosin channels separately; one CNN model is then trained on the hematoxylin components and another on the eosin components, and finally the outputs of the two CNNs are fused to obtain the final prediction.
As discussed above, computational techniques have previously been applied to predict the class of a tissue type in histological images; both conventional and deep learning (DL) techniques have been developed [16]. However, given the scarcity of well-curated histopathological datasets for training/testing deep neural networks [17,18], training a deep learning model can be a challenging approach. Therefore, in our study, we present novel Shearlet-based hand-engineered texture descriptors to classify tissue types. Namely, co-occurrence texture descriptors [19,20], Local Binary Patterns (LBP) [21], Local Oriented Statistics Information Booster (LOSIB) [22], and Segmentation-based Fractal Texture Analysis (SFTA) [23] are used in our study, where each of these sets of descriptors is computed in the Shearlet domain [24]. These features are then utilized to train/test two classifiers: a Support Vector Machine (SVM) [25] and a Decision Tree Bagger (DTB) [26].
Notably, our modeling technique takes advantage of the directionality in the complex Shearlet transform, where we utilize both the magnitude and phase coefficients. Those coefficients are summarized using various textural methods that can capture local and global attributes [19,23] of histopathological tissues. Most interestingly, computing such statistics from the directional sub-bands can potentially capture significant information that would be missed in the spatial domain because of the complexity of histopathological tissues.
In our research, we investigate both parametric and non-parametric (i.e., robust in classification [27]) classification models. We include DTB because it is an ML method considered to lead to explainable decisions, unlike SVM, which is considered a black-box classifier [28]. We show that the fusion of some sets of descriptors can result in a robust feature representation. Thereafter, we employ principal component analysis to further enhance the classification results while having a reduced set of features.
This paper is an extension of work previously presented at the Biomedical and Health Informatics (BHI) Workshop 2019 [29]. Our main contributions in this extension are summarized below:
• We propose and present a comprehensive justification for our feature space, namely, the Shearlet-based texture descriptors for histopathological image classification.
• We demonstrate that when these attributes are used to train a conventional ML model (i.e., SVM and DTB in this extended version), they provide better classification performance than several existing methods on the four standard datasets used in this research.
• We present an extended study of our feature representations expressed in principal components that decreases computational cost without significantly compromising accuracy.
Figure 1 provides an overview of our proposed system for the classification of histopathological images. A detailed description of each component of our method is provided in the following sections. MATLAB® 2017b is utilized for the implementation of our techniques.

Methods
Our proposed method consists of three steps:
• Step 1: For a given histopathological image, we apply the complex Shearlet transform. From the complex coefficients, we calculate the magnitude and relative phase (RP).
• Step 2: We then extract four sets of features from the directional sub-bands of the RP and magnitude: co-occurrence-based texture features, local binary patterns, local oriented statistics information, and segmentation-based fractal texture analysis.
• Step 3: We then apply one of two different classifiers: DTB or SVM.

Datasets
Our proposed technique has been evaluated on four different histopathological datasets that exhibit different levels of complexity:
1. The multi-class Kather dataset of colorectal cancer histology, comprising eight tissue classes [4].
2. The BreakHis dataset of benign and malignant breast tumor tissues, acquired at four magnification factors [3]. Spanhol et al. [3] proposed using each magnification factor as a separate dataset; however, in our study, we combine all magnification factors into one dataset, as in Jonnalagedda et al. [30]. This is motivated by the fact that each magnification factor captures different information [31].
3. The Epistroma dataset, containing variable-size histopathology images that belong to two tissue types (as shown in Fig. 2c): stroma (551 samples) and epithelium (825 samples) [5].
4. The Warwick-QU dataset, obtained at a magnification factor of 20× from colon histology sections. It is a binary dataset of benign (74 samples) and malignant (91 samples) tissues [32]. Examples of the tissue types are shown in Fig. 2d.

Complex Shearlet transform
In this study, we present a new perspective for computing attributes that summarize the statistical information/distribution of the Shearlet magnitude/phase coefficients for every scale and orientation [24]. Given a complex coefficient C = x + iy, where the first term expresses the real part and the second term the imaginary part, we can compute the magnitude component as ρ = √(x² + y²) and the phase component as θ = tan⁻¹(y/x). In contrast to existing studies [9,10], we do not only use the energy of the complex Shearlet transform, but we also examine the usefulness of the phase components and their potential for providing a more robust characterization for medical image classification. We experimentally verify that using phase alongside magnitude coefficients can, in fact, boost the classification performance (see the "Results and discussion" section). Our work is motivated by research completed by Vo et al. [8], who build a feature space (consisting of magnitude and relative phase (RP)) computed from a complex directional filter bank for textured image retrieval. In our study, we adopt this idea but instead compute the relative phase of the Shearlet components. The complex Shearlet transform can be applied to a histopathological image of size M × M to transform it into S scales, where every scale consists of K directionalities. Let θ_sk(i, j) represent the phase angle component at location (i, j), scale s, and directionality k, where s = 1, 2, ..., S and k = 1, 2, ..., K. In our study, we use S = 4 and K = 8 per scale. We can then compute the relative Shearlet phase for a phase component at position (i, j) of a directional sub-band as the difference between adjacent phase angles, taken horizontally or vertically: RP_sk(i, j) = θ_sk(i, j + 1) − θ_sk(i, j) or RP_sk(i, j) = θ_sk(i + 1, j) − θ_sk(i, j). We choose horizontal and vertical differences because of the orientation of shearing in the Shearlet transform. (Fig. 2 shows samples of each dataset that we have used in our study.)
Such a transform has the advantage of being multi-scale and multi-directional which in turn can be a significant tool for a multi-resolution analysis of histopathological tissues. Therefore, we utilize each directional sub-band (i.e., from both magnitude and RP) to calculate statistical attributes.
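The magnitude and relative-phase computation described above can be sketched as follows. This is a Python illustration rather than the paper's MATLAB implementation; the function name and the wrapping of phase differences back into (−π, π] are our own choices:

```python
import numpy as np

def magnitude_and_relative_phase(coeffs, horizontal=True):
    """Magnitude and relative phase (RP) of one complex directional sub-band.

    `coeffs` is a 2-D array of complex Shearlet coefficients. The RP is the
    difference of phase angles between adjacent coefficients, taken along the
    horizontal or vertical axis depending on the shearing orientation of the
    sub-band (following Vo et al.'s relative-phase construction).
    """
    magnitude = np.abs(coeffs)        # rho = sqrt(x^2 + y^2)
    phase = np.angle(coeffs)          # theta = atan2(y, x)
    axis = 1 if horizontal else 0
    rp = np.diff(phase, axis=axis)    # adjacent phase differences
    # wrap the differences back into (-pi, pi]
    rp = np.angle(np.exp(1j * rp))
    return magnitude, rp
```

In practice, one such pair of arrays would be produced for each of the 32 directional sub-bands before the texture descriptors below are applied.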
In this study, we use a publicly available implementation of the complex Shearlet transform, called ShearLab [36]. After each histology image is transformed, we summarize the magnitude and RP of the Shearlet components using four different techniques. Each technique is briefly detailed as follows:

Co-occurrence matrix (CM)
The CM was introduced by Haralick et al. [20]; later studies presented further statistics that can be computed from the CM [19,37]. Given two pixels i and j that are separated by a distance PD, the content of a gray-level image can be formulated as a matrix of relative frequencies F_ij. Another hyper-parameter that can be adjusted while computing the CM is the orientation at which the relative frequencies are computed. As a result, we have a CM that consists of relative frequencies for a quantized orientation and distance between neighboring pixels. In our application of the CM to the directional sub-bands of the Shearlet coefficients, we calculate the CMs using a constant distance of PD = 1 but vary the orientation (0°, 45°, 90°, and 135°); hence, we have four CMs. To obtain rotation-invariant statistics, we compute the mean of those four CMs [4]. From this mean CM, we extract twenty textural features (contrast, correlation, energy, autocorrelation, cluster prominence, cluster shade, dissimilarity, entropy, homogeneity, maximum probability, sum of squares, variance, sum average, sum variance, sum entropy, difference variance, difference entropy, information measure of correlation, inverse difference normalized, and inverse difference moment normalized). Following this common practice of computing these statistics from the CM of each magnitude/RP component, we obtain a total of 20 × 32 (# of directional sub-bands) = 640 attributes for each of the magnitude and RP directional sub-bands.
In our study, we obtain the CM for the Shearlet magnitude and RP coefficients of each directional sub-band instead of the spatial domain. In the spatial domain, the CM analyzes the statistical information of the gray-level image, but in our application, we analyze the crucial directionality characteristics in the Shearlet domain, potentially leading to a robust feature space for histopathological image classification.
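A simplified sketch of the rotation-invariant CM descriptor above, in Python rather than the paper's MATLAB: the sub-band is quantized, the four CMs at distance 1 are averaged, and a handful of Haralick-style statistics are extracted (the paper uses 20 features; we show only 4, and the `levels` parameter is our own choice):

```python
import numpy as np

def glcm_features(subband, levels=8):
    """Mean co-occurrence matrix over 0/45/90/135 degrees at distance PD = 1,
    followed by contrast, energy, homogeneity, and entropy."""
    # quantize the (magnitude or RP) sub-band into `levels` gray levels
    lo, hi = subband.min(), subband.max()
    q = np.floor((subband - lo) / (hi - lo + 1e-12) * levels).astype(int)
    q = np.clip(q, 0, levels - 1)
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]   # 0, 45, 90, 135 degrees
    cm = np.zeros((levels, levels))
    H, W = q.shape
    for di, dj in offsets:
        m = np.zeros((levels, levels))
        for i in range(H):
            for j in range(W):
                ii, jj = i + di, j + dj
                if 0 <= ii < H and 0 <= jj < W:
                    m[q[i, j], q[ii, jj]] += 1
        cm += m / max(m.sum(), 1)
    cm /= len(offsets)                               # rotation-invariant mean CM
    i_idx, j_idx = np.indices(cm.shape)
    contrast = np.sum(cm * (i_idx - j_idx) ** 2)
    energy = np.sum(cm ** 2)
    homogeneity = np.sum(cm / (1.0 + np.abs(i_idx - j_idx)))
    entropy = -np.sum(cm[cm > 0] * np.log2(cm[cm > 0]))
    return np.array([contrast, energy, homogeneity, entropy])
```

Concatenating such vectors over all 32 sub-bands (for both magnitude and RP) yields the feature blocks described above.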

Local binary pattern (LBP)
The LBP [21] texture attributes are rotationally invariant descriptors of the local occurrence of gray levels in an image; such occurrences are naturally classified into 'uniform' patterns. Here, we are instead concerned with the Shearlet magnitude and RP coefficients of a histopathological image (which can exhibit complex patterns). Utilizing the Shearlet coefficients, which encapsulate the field-dominant direction characteristics [38,39], can therefore potentially lead to a robust summarization of such directionalities (i.e., they represent crucial structural details, e.g., step edges [40]).
Hence, we apply the LBP to the magnitude and RP of each directional sub-band of the Shearlet coefficients to encode the field-dominant directions. For a given Shearlet (SH) coefficient (representing the magnitude or RP) at location (i, j), the LBP of the SH coefficient is computed as LBP_{P,R} = Σ_{p=0}^{P−1} s(SH_p − SH_c) 2^p, where s(x) = 1 if x ≥ 0 and 0 otherwise, (2) and where SH_c is the central RP (or magnitude) value and SH_p (p = 0, ..., P − 1) are the P surrounding RP (or magnitude) values in the circular neighborhood. In our application of LBP, we compute a feature vector for every directional sub-band using a radius of R = 2 and a neighborhood of P = 8, where the window size is set equal to the directional sub-band size. Using the rotation-invariant uniform mapping, we obtain a feature vector of length P + 2 for each sub-band [21]. After concatenating the feature vectors of all directional sub-bands, we get a feature vector of length 10 × 32 = 320 attributes for each of the magnitude and RP.
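The rotation-invariant uniform LBP histogram of one sub-band might be sketched as below (a Python illustration, not the paper's implementation; boundary handling and the bilinear sampling of off-grid neighbors are our own choices):

```python
import numpy as np

def lbp_riu2_histogram(subband, P=8, R=2):
    """Rotation-invariant uniform LBP histogram (length P + 2) of one
    magnitude or RP sub-band, sampling P neighbors on a circle of radius R."""
    H, W = subband.shape
    angles = 2 * np.pi * np.arange(P) / P
    hist = np.zeros(P + 2)
    r = int(np.ceil(R))
    for i in range(r, H - r):
        for j in range(r, W - r):
            c = subband[i, j]
            bits = []
            for a in angles:
                y, x = i - R * np.sin(a), j + R * np.cos(a)
                y0 = min(max(int(np.floor(y)), 0), H - 2)
                x0 = min(max(int(np.floor(x)), 0), W - 2)
                fy, fx = y - y0, x - x0
                # bilinear interpolation of the neighbor sample
                v = (subband[y0, x0] * (1 - fy) * (1 - fx)
                     + subband[y0, x0 + 1] * (1 - fy) * fx
                     + subband[y0 + 1, x0] * fy * (1 - fx)
                     + subband[y0 + 1, x0 + 1] * fy * fx)
                bits.append(1 if v >= c else 0)
            transitions = sum(bits[k] != bits[(k + 1) % P] for k in range(P))
            # 'uniform' patterns (<= 2 transitions) are binned by their number
            # of ones; everything else goes to the last (non-uniform) bin
            hist[sum(bits) if transitions <= 2 else P + 1] += 1
    return hist / max(hist.sum(), 1)
```

With P = 8 this yields the 10-bin vector per sub-band described above.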

Local oriented statistic information booster (LOSIB)
The LOSIB [22] technique first computes the absolute difference d_p = |SH_p − SH_c| between a given central Shearlet coefficient SH_c and its p-th neighboring coefficient SH_p (this is done for all central magnitude (or RP) coefficients of a directional sub-band). Then, the mean of the differences along the same directionality p is computed as μ_p = (1/(N M)) Σ_i Σ_j d_p(i, j), where N and M represent the height and width of the image, respectively.
In our application of LOSIB, we set the radius R = 1 and the neighborhood P = 8. Therefore, we obtain a feature vector of length P for every directional sub-band, and the total number of descriptors for each of the magnitude and RP is 8 × 32 (# of directional sub-bands) = 256.
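A minimal Python sketch of the LOSIB descriptor above; rounding off-grid circle samples to the nearest grid neighbor is a simplification of ours (the original definition interpolates):

```python
import numpy as np

def losib(subband, P=8, R=1):
    """LOSIB of one sub-band: for each of the P neighbor directions on a
    circle of radius R, the mean absolute difference between every center
    coefficient and that neighbor (a vector of length P)."""
    H, W = subband.shape
    angles = 2 * np.pi * np.arange(P) / P
    feats = np.zeros(P)
    r = int(np.ceil(R))
    centers = subband[r:H - r, r:W - r]
    for p, a in enumerate(angles):
        dy, dx = -R * np.sin(a), R * np.cos(a)
        # round to the nearest grid neighbor (a simplification; the original
        # definition uses interpolation for off-grid samples)
        oy, ox = int(round(dy)), int(round(dx))
        neigh = subband[r + oy:H - r + oy, r + ox:W - r + ox]
        feats[p] = np.mean(np.abs(neigh - centers))
    return feats
```

With R = 1 and P = 8 this reproduces the 8-value vector per sub-band described above.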

Segmentation-based fractal texture analysis (SFTA)
Our application of SFTA to the directional sub-bands follows Costa et al. [23]. SFTA first processes the input directional sub-band using Two-Threshold Binary Decomposition (TTBD). Various binary images are thus generated, from which the following attributes are computed: the fractal dimension of the region boundaries, the average gray level, and the count of pixels belonging to the region.
In our utilization of the technique by Costa et al. [23], we set the number of thresholds to n_t = 4. As a result, we get 21 attributes for every directional sub-band; hence, in total, we have 21 × 32 (# of directional sub-bands) = 672 attributes for each of the magnitude and RP.
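A simplified SFTA sketch in Python: equally spaced thresholds stand in for the multi-level Otsu thresholds of the original TTBD (our simplification), and each binary image contributes a border fractal-dimension estimate, the mean gray level, and the pixel count. With n_t = 4 this gives the 21 attributes per sub-band mentioned above:

```python
import numpy as np

def box_counting_dimension(border):
    """Box-counting estimate of the fractal dimension of a binary border image."""
    sizes, counts = [], []
    n = min(border.shape)
    H, W = border.shape
    s = 2
    while s <= n // 2:
        # count boxes of side s that contain at least one border pixel
        c = sum(border[i:i + s, j:j + s].any()
                for i in range(0, H, s) for j in range(0, W, s))
        if c > 0:
            sizes.append(s)
            counts.append(c)
        s *= 2
    if len(sizes) < 2:
        return 0.0
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope

def sfta_features(subband, nt=4):
    """(2*nt - 1) binary images from threshold pairs, 3 attributes each."""
    lo, hi = float(subband.min()), float(subband.max())
    ts = np.linspace(lo, hi, nt + 2)[1:-1]           # nt inner thresholds
    pairs = [(ts[k], ts[k + 1]) for k in range(nt - 1)] + [(t, hi) for t in ts]
    feats = []
    for t_lo, t_hi in pairs:
        b = (subband > t_lo) & (subband <= t_hi)
        # border pixels: set pixels with at least one background 4-neighbor
        pad = np.pad(b, 1)
        interior = pad[:-2, 1:-1] & pad[2:, 1:-1] & pad[1:-1, :-2] & pad[1:-1, 2:]
        border = b & ~interior
        feats += [box_counting_dimension(border),
                  subband[b].mean() if b.any() else 0.0,
                  float(b.sum())]
    return np.array(feats)
```

Note that (2 × 4 − 1) × 3 = 21, matching the per-sub-band attribute count reported above.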

Fusion of feature sets
The aforementioned sets of descriptors are examined individually for their robustness when used to train a classifier. It is worth noting that each set of descriptors captures different intrinsic statistical information. Therefore, we examine combinations of our descriptors which we expect to improve the classification performance. We investigate the following combinations:

Principal component analysis (PCA) for feature reduction
In the previous section, we investigated feature fusion. Evidently, with the combination of different descriptors, the number of features increases. Therefore, the best-achieving fusion strategy is processed with PCA to find a reduced subspace. The PCA coefficients are rotated to maximize the orthomax criterion [41] and to obtain a final basis with a simple structure [42]. Once the principal components (PCs) are rotated to maximize the varimax criterion, they are utilized to project the combined descriptors into a decorrelated space.
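The reduction step can be sketched as follows (Python, not the paper's MATLAB `rotatefactors`-based pipeline): compute the PCs via SVD, rotate them with the standard varimax iteration (orthomax with γ = 1), and project the centered features onto the rotated basis:

```python
import numpy as np

def varimax(loadings, n_iter=100, tol=1e-6):
    """Varimax rotation of a loadings matrix (orthomax criterion, gamma = 1)."""
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(n_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - L @ np.diag(np.sum(L ** 2, axis=0)) / p))
        R = u @ vt
        new_var = s.sum()
        if new_var < var * (1 + tol):
            break
        var = new_var
    return loadings @ R

def pca_varimax_reduce(X, n_components):
    """Project features onto varimax-rotated principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    components = varimax(vt[:n_components].T)   # (n_features, n_components)
    return Xc @ components
```

Because the rotation matrix R is orthogonal, the rotated basis stays orthonormal, so the projection remains decorrelated in the sense described above.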

Classification algorithms
In this work, we introduce a new enhanced technique for analysis and feature extraction from the Shearlet transform. We also analyze the performance of two classifiers on different tissue types and show that the classifier has a minimal effect once robust features are extracted. In particular, we consider one of the classifiers most widely used in biomedical research: the Support Vector Machine (SVM). In addition to the SVM, we consider one type of decision tree ensemble, the Decision Tree Bagger (DTB) [26], which has the trait of being non-parametric: distances between feature vectors are not computed when constructing decision trees. In contrast, SVM performance depends on kernels, which can influence the classification performance [43]. Both SVM and DTB are briefly described below:

Support vector machine (SVM)
In our study, we utilize an SVM with pairwise classification (i.e., one-versus-one class decisions) [44]. Prior to training, we normalize all feature vectors to have equal mean and variance. The kernel function transforms the input data into a higher-dimensional feature representation, from which a hyperplane is formulated for classifying the input tissue dataset. SVM training requires the solution of a quadratic programming (QP) problem; to simplify it, a sequential minimal optimization (SMO) solver is used in our study [45], which decomposes the QP into a series of smaller QP problems. SVM training is impacted by the choice of the kernel function. In our study, we choose a radial basis function (RBF) kernel, as it is a universal approximator of the training dataset. Therefore, the regularization parameter C is the only remaining parameter to optimize. To choose the best value of C, trials of values in [1, 5] with an increment of 1 were conducted; we found that, in general, C = 5 delivered the best classification performance on the testing dataset. Therefore, C = 5 is set for all of our experiments. More details about SVM can be found in [46].
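The setup above (standardization, RBF kernel, one-versus-one decisions, and the small grid search over C) can be sketched with scikit-learn; the function name and the 3-fold inner validation used for selecting C are our own choices, not the paper's protocol:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def train_svm(X, y):
    """Standardize features, then pick C in [1, 5] for an RBF-kernel,
    one-vs-one SVM by cross-validated accuracy."""
    best_c, best_acc, best_model = None, -1.0, None
    for c in range(1, 6):
        model = make_pipeline(
            StandardScaler(),
            SVC(kernel="rbf", C=c, decision_function_shape="ovo"))
        acc = cross_val_score(model, X, y, cv=3).mean()
        if acc > best_acc:
            best_c, best_acc, best_model = c, acc, model
    return best_model.fit(X, y), best_c
```

On real feature matrices the selected C would be fixed once (the paper settles on C = 5) rather than re-searched per experiment.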

Decision tree bagger (DTB)
The DTB is an ensemble classifier in which each classifier in the ensemble is a classic decision tree generated using a random selection of attributes at each node to determine the split, as in a random forest [47]. The DTB generates multiple bootstraps (i.e., replicas of the training set sampled with replacement) and trains a decision tree on each. A bagging technique then combines the results of the individual decision trees, which, as an advantage, minimizes the susceptibility to over-fitting.
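A minimal DTB sketch using scikit-learn in place of MATLAB's TreeBagger: bootstrap-resampled decision trees combined by bagging, with random feature selection at each split via `max_features` (the tree count of 50 is our own placeholder):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

def make_dtb(n_trees=50):
    """Bagged decision trees with random-forest-style feature subsampling."""
    return BaggingClassifier(
        DecisionTreeClassifier(max_features="sqrt"),  # random split attributes
        n_estimators=n_trees,
        bootstrap=True)  # each tree sees a bootstrap replica of the training set
```

Majority voting over the trees produces the final class prediction, which is what dampens the over-fitting of any single deep tree.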

Cross-validation (CV)
We utilize cross-validation (CV) to obtain robust statistical results and to be able to generalize the classification results of each classification model. This approach is commonly used in the literature to examine a proposed technique; therefore, our choice of the number of folds for splitting a dataset is based on previous studies. As such, the following datasets are divided using 10-fold CV: multi-class Kather's (as in Kather et al. [4]), Epistroma (as in Ramalho et al. [48]), and Warwick-QU (as in Ribeiro et al. [7]).
Each dataset is divided into mutually exclusive folds of approximately equal size (note that some of the datasets are imbalanced): 9 folds are used to build a model, and 1 fold is used to test it. The process is repeated 10 times, such that the test set is different each time.
Finally, the overall performance metrics are estimated by taking the mean over the 10 independently built and tested models. The BreakHis dataset, however, is split using 7-fold CV for training and testing, as in a previous study [30].
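The fold scheme can be sketched as below in Python. Using stratified folds is one way to obtain mutually exclusive folds of approximately equal size; stratification itself is our assumption, as the paper does not state how class proportions are handled:

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold

def cross_validated_accuracy(model, X, y, n_folds=10):
    """Rebuild the model on n_folds - 1 folds, test on the held-out fold,
    and return the mean accuracy over the independently built models."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    accs = []
    for train_idx, test_idx in skf.split(X, y):
        m = clone(model).fit(X[train_idx], y[train_idx])  # independent model
        accs.append(m.score(X[test_idx], y[test_idx]))
    return float(np.mean(accs))
```

Setting `n_folds=7` reproduces the BreakHis split; the default of 10 matches the other three datasets.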

Performance metric
To evaluate the performance of each built model, commonly used measures are computed in our study: accuracy (ACC), AUC (area under the receiver operating characteristic (ROC) curve), sensitivity (Sen), and precision (Prec). In terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), the metrics are calculated as ACC = (TP + TN)/(TP + TN + FP + FN), Sen = TP/(TP + FN), and Prec = TP/(TP + FP); the AUC is obtained from the ROC curve of the classifier's scores.
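The confusion-count metrics above can be computed as follows (a binary sketch; the multi-class datasets would use per-class averaging, and AUC would additionally require the classifier's continuous scores):

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """ACC, Sen, and Prec from binary confusion counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (recall)
    prec = tp / (tp + fp) if tp + fp else 0.0  # precision
    return acc, sen, prec
```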

Results and discussion
We first apply the Shearlet transform, as detailed previously, to the four different datasets and compute the RP and magnitude from the complex Shearlet coefficients. Then, to evaluate the strength of our proposed techniques, we utilize four commonly used classification measures: ACC, AUC, Sen, and Prec. In the literature, there exist descriptors extracted from multi-directional and multi-resolution wavelets (e.g., the Shearlet, the contourlet, and others). In this regard, and to the best of our knowledge, we have implemented descriptors of previously published studies and applied them to the histopathological datasets transformed using the complex Shearlet transform:
• Vo et al. [8]: proposed computing the circular mean and circular variance from the RP, but only the mean from the magnitude, of a textured image decomposed with a complex directional filter bank. We adopt the same descriptors, but compute them from each Shearlet directional sub-band and then concatenate them to form the feature vector.
• Meshkini and Ghassemian [12]: proposed first computing the magnitude of the Shearlet coefficients and the gray-level co-occurrence matrix; the inner product of both is then used as the feature vector.
• Zhou et al. [10]: proposed computing three sets of descriptors from the magnitude coefficients only: (1) the co-occurrence matrix (from which the following texture features are obtained: entropy, correlation, contrast, and energy) from the first layer of the horizontal cone of the Shearlet transform only; (2) the mean, variance, and energy from the first and third layers of the horizontal and vertical cones only; (3) the maximal values of each column of the high-frequency Shearlet coefficients. These three sets of descriptors are then concatenated to form the feature vector.
• Dong et al. [11]: proposed calculating the mean and standard deviation from the Shearlet magnitude coefficients; these statistics are computed from each directional sub-band and concatenated to form the feature vector of an image.
When utilizing the multi-class Kather dataset, our proposed descriptors computed from both the magnitude and RP Shearlet coefficients achieve a reasonable accuracy between 82% and 86% with an SVM model (Part (1) of Fig. 3), whereas the DTB yields accuracies spanning only 79% to 81%, as presented in Part (2) of Fig. 3 (see Table 1). The highest AUC value of 0.9773 is obtained when classifying Kather's dataset using the LOSIB attributes coupled with the SVM model. Furthermore, the highest Sen and Prec values are also achieved using LOSIB coupled with the SVM (0.8632 and 0.8664, respectively). However, when incorporating various descriptors together, for instance, Fusion #3 improves the accuracy by about 6.22 percentage points with an SVM model, as presented in Part (3) of Fig. 3 and Table 1.
We outline the classification performance on the BreakHis dataset, where we consider all four magnification factors as one dataset (as seen in Fig. 4). We observe that SFTA attains the highest accuracy of 89.72% on the validation split (with a corresponding AUC = 0.9527, Sen = 0.8040, and Prec = 0.8593) when utilizing both magnitude and RP. As presented in Part (3) of Fig. 4 (see also Table 2), Fusion #1 leads to a higher accuracy of 91.28% (with AUC = 0.9650, Sen = 0.8391, and Prec = 0.8775). Evidently, due to the high skewness of the BreakHis dataset, the classifier is biased toward the majority class (i.e., malignant cases), as observed in the lower sensitivity and precision values.
Similarly, we apply our technique to the Epistroma dataset (results reported in Fig. 5 and Table 3). We observe that descriptors extracted from both magnitude and RP using the CM lead to an accuracy of 97.24% with AUC = 0.9917, Sen = 0.9733, and Prec = 0.9807. In Part (3), Fusion #3 enhances the accuracy to 97.46% with an AUC value of 0.9925, Sen = 0.9769, and Prec = 0.9809. Moreover, the classification performance on Warwick-QU is presented in Fig. 6 (see also Table 4). Once more, the CM textural features of both magnitude and RP lead to the best accuracy of 95.70% (with corresponding AUC = 0.9860, Sen = 0.9446, and Prec = 0.9589) among our proposed individual feature representations. Part (3) shows that Fusion #2 leads to the highest accuracy of 96.29% with AUC = 0.9860, Sen = 0.9571, and Prec = 0.9589. It is worth noting that the standard deviations of the classification performance on the Warwick-QU dataset are higher than on the other datasets because the number of validation examples in this dataset is significantly smaller.
Further, Fig. 7 shows the classification performance when finding a reduced set of our feature space for each dataset. In the previous sections, we identified the best-achieving fusions of descriptors; we therefore further process these descriptors with PCA, aiming to retrieve the PCs that maintain or improve the classifier performance compared to using all attributes. We established from Tables 1, 2, 3 and 4 that the SVM model has strong capabilities for classifying histopathological tissues, so it is coupled with the best fusion of the corresponding dataset to find a reduced set of features. However, although we are capable of achieving reasonable accuracy across the four histopathological image datasets, the fusion that achieves the highest accuracy on a given histological image dataset is not always the same (as shown in Fig. 7). Considering Fusion #3 of our Shearlet-based descriptors expressed in principal components, we can observe different patterns of classification performance when using increments of 50 PCs in Fig. 9. Therefore, our Shearlet-based descriptors appear to be general enough for any histological image analysis, and carefully choosing the type of fusion that maximizes the recognition of the tissue type is sufficient.
State-of-the-art techniques exist in the literature that use the same datasets as our study. We report their results in this section to show that our proposed techniques are on par with those techniques and achieve excellent performance in terms of accuracy and AUC. Our Shearlet-based descriptors can attain robust classification performance in different scenarios; hence, they can form an appealing system for medical image classification in clinical settings. In comparison, Wang et al. [15] proposed using a bilinear CNN model for classifying the multi-class Kather dataset. The reported accuracy was 92.6% (with AUC = 0.985); our proposed technique achieves similar accuracy, yet a better AUC of 0.9905.
When it comes to the BreakHis dataset, Jonnalagedda et al. [30] reported an AUC = 0.92; our approach on the BreakHis dataset achieves a higher AUC value, but with somewhat lower accuracy.
When it comes to the Epistroma dataset, Ramalho et al. [48] proposed a structural approach using co-occurrence statistics. Their approach led to an accuracy of ≈ 95%; our approach attains better classification accuracy.
In the case of the Warwick-QU dataset, Ribeiro et al. [7] proposed to extract descriptors from the spatial and curvelet domains (i.e., fractal measures and Haralick features), for which they report a higher AUC of 0.994; however, they did not report accuracy.
Evidently, our proposed descriptors compete with the state of the art, including CNN-based methods, in terms of classification accuracy and AUC. In contrast to resource-intensive CNN-based techniques, we hand-engineer our descriptors, yet realize similar results. Our approach, based on different descriptors in the Shearlet domain, is capable of handling the scarcity of biomedical datasets. The highest accuracies in our study are obtained using an SVM model, which is considered a black box; the classifications made by such models are difficult to interpret and explain [28]. In addition to SVM, we explored DTB, which achieves reasonable results and might be a more favorable approach in certain applications owing to its higher interpretability [49]. However, navigating through a rule to understand a decision made by a DTB is still a challenge, since our main descriptors are computed from Shearlet coefficients; linking the statistical information computed from the Shearlet transform to a particular region of interest in a tissue type can therefore be difficult. It is worth noting that our main concern in this study is to establish a robust approach for classifying histological images.
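For readers who wish to reproduce the DTB baseline, a minimal sketch follows; it assumes scikit-learn's bagging ensemble (whose default base estimator is a decision tree) as a stand-in for the DTB implementation used in our experiments, with synthetic features in place of the Shearlet descriptors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a Shearlet-based feature matrix.
X, y = make_classification(n_samples=300, n_features=50, random_state=0)

# Decision Tree Bagger: an ensemble of decision trees, each trained on a
# bootstrap resample of the training set (BaggingClassifier bags decision
# trees by default).
dtb = BaggingClassifier(n_estimators=100, random_state=0)
acc = cross_val_score(dtb, X, y, cv=5).mean()
print(f"DTB cross-validated accuracy: {acc:.3f}")
```

Individual trees in the ensemble can be inspected via `dtb.estimators_` after fitting, which is the source of the interpretability advantage discussed above.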

Conclusion and future work
In this study, we have constructed a novel feature representation for classifying histology tissues. Our technique integrates various sets of textural descriptors obtained in the complex Shearlet domain instead of computing the descriptors directly from the gray-level images. These sets of descriptors are based on methods that capture local and global statistics: descriptors from the co-occurrence matrix, local binary patterns, the local oriented statistic information booster, and segmentation-based textural features. As a result, we exploit the multi-directionality and multi-resolution of the complex Shearlet transform; hence, we investigate the benefits of using not only the magnitude but also the relative phase (RP) components of the complex Shearlet coefficients. We conclude that, in general, using both the magnitude and RP leads to rigorous and effective classification results on histopathological image datasets. We utilize PCA to obtain a reduced set of our proposed integrated feature representation in the Shearlet domain, which on some datasets reduces the feature set size while maintaining or increasing classification performance with traditional machine learning. We also show that the choice of machine learning method has only a limited influence on the results; hence, it is possible to use DTB if desired, as the decisions of that classifier are considered interpretable. Our proposed method attains state-of-the-art classification results on the four histopathological datasets utilized in this research. We expect our technique to generalize to other histopathological datasets as well.
In the future, we plan to further benchmark our proposed techniques with different classification models (e.g., the multilayer perceptron, Random Under Sampling Boost decision trees, and others). Additionally, we plan to investigate other feature reduction methodologies to aggressively reduce the number of descriptors without compromising classification performance. Finally, the problem of a highly imbalanced dataset (as observed in the BreakHis dataset) can be mitigated by applying sampling methods (e.g., the Synthetic Minority Oversampling Technique, SMOTE) to the training dataset.
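The core idea behind SMOTE can be sketched in a few lines of numpy: each synthetic minority sample is an interpolation between an existing minority sample and one of its k nearest minority neighbors. This is an illustrative sketch only; in practice a library implementation (e.g., imbalanced-learn's SMOTE) would be used, and crucially it must be applied to the training split only, never to the test data.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    a randomly chosen sample and one of its k nearest minority neighbors
    (the core idea behind SMOTE)."""
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Euclidean distances from sample i to all minority samples.
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbors)
        gap = rng.random()                   # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Toy minority class: 20 samples with 8 features; create 30 synthetic ones.
X_min = np.random.default_rng(0).normal(size=(20, 8))
X_new = smote_oversample(X_min, n_new=30, rng=1)
print(X_new.shape)  # (30, 8)
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class occupies the same region of feature space rather than merely duplicating observations.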