A deep semantic segmentation correction network for multi-model tiny lesion areas detection

Background: Semantic segmentation of white matter hyperintensities (WMHs) related to focal cerebral ischemia (FCI) and lacunar infarction (LACI) is important for the automatic screening of tiny cerebral lesions and the early prevention of LACI. However, existing studies on lesion segmentation in brain magnetic resonance imaging focus on large lesions with obvious features, such as glioma and acute cerebral infarction. Owing to the multi-modal, tiny nature of FCI and LACI lesion areas, reliable and precise segmentation and detection of these areas remains a significant challenge.
Methods: We propose a novel segmentation correction algorithm that estimates the lesion areas via segmentation and correction processes, for which we design two sub-models: a segmentation network and a correction network. The segmentation network was first used to extract and segment diseased areas on T2 fluid-attenuated inversion recovery (FLAIR) images. Subsequently, the correction network was used to classify these areas at the corresponding locations on T1 FLAIR images to distinguish between FCI and LACI. Finally, the results of the correction network were used to correct the segmentation results and achieve segmentation and recognition of the lesion areas.
Results: In our experiment on magnetic resonance images of 113 clinical patients, our method achieved a precision of 91.76% for detection and 92.89% for classification, indicating that it can effectively distinguish between small lesions such as FCI and LACI.
Conclusions: Overall, we developed a complete method for the segmentation and detection of WMHs related to FCI and LACI. The experimental results show that it has potential for clinical application. In the future, we will collect more clinical data and test more types of tiny lesions at the same time.


Background
White matter hyperintensities (WMHs) are features of very small vessel disease of the brain [1,2] and present as hyperintense regions on fluid-attenuated inversion recovery (FLAIR) images. Accurately identifying and classifying WMHs can help radiologists diagnose diseases and determine the course of such diseases [3,4].
Open Access. *Correspondence: wbz99@sina.com. †Yue Liu and Xiang Li have contributed equally to this work. 1 Center for Medical Artificial Intelligence, Shandong University of Traditional Chinese Medicine, Qingdao 266112, China. Full list of author information is available at the end of the article.
Among WMHs, the detection and recognition of signals related to focal cerebral ischemia (FCI) and lacunar infarction (LACI) are of great significance for the clinical diagnosis and prevention of lacunar cerebral infarction. FCI presents high-intensity signals on T2 FLAIR images and no obvious signal on T1 FLAIR images. These abnormal signals are mainly caused by demyelination [5,6]. In patients who have experienced an ischemic event, treating the underlying cause in time is critical for the prevention of further episodes. If it lasts long enough, FCI will lead to irreversible brain tissue necrosis or infarction in the ischemic areas [7]. In other words, FCI can generally be cured, whereas infarction cannot. LACI is a common type of infarction. Patients considered to have had LACI usually present high-intensity signals on T2 FLAIR images, similar to those observed in FCI, and low-intensity signals on T1 FLAIR images. These abnormal signals are caused by demyelination and structural changes. Thus, patients with LACI who undergo diagnostic imaging should be educated on common stroke symptoms and how to manage the onset of stroke [8]. In addition, continuous follow-up with a physician is necessary for these patients so that the physician can monitor drug dosage and risk factors [9]. Therefore, it is important to distinguish between these two abnormal signals.
Because the signals of FCI and LACI are indistinguishable on T2 FLAIR images, diagnosis with both T1 FLAIR and T2 FLAIR images is usually required [10]. Clinically, the conventional method of finding and distinguishing focal abnormal signals from a patient relies on careful examination by multiple radiologists. The manual method is time-consuming and can easily cause missed diagnosis and misdiagnosis, especially after doctors review a large number of radiological images at once. A practical tool that can assist radiologists in finding and distinguishing focal abnormal signals is urgently needed.
However, accurately detecting and distinguishing focal abnormal signals of the brain is challenging. Although many researchers have worked on lesion segmentation, little prior work addresses the segmentation of cerebral focal abnormal signals. First, focal abnormal signals are always tiny objects and are difficult to detect accurately; existing medical image analysis solutions often underperform on tiny objects. Second, lesions cannot be accurately diagnosed using single-modal magnetic resonance imaging (MRI) data, yet applying multi-modal MRI data effectively is itself a challenge. Most existing multi-modal methods use feature fusion, which is not suitable for focal abnormal signal segmentation. Figure 1a shows the T1 and T2 FLAIR images of a patient with both FCI and LACI. In clinical diagnosis, radiologists first examine the T2 FLAIR images for focal abnormal signals and then compare them with the same locations on the T1 FLAIR images. To our knowledge, no existing method follows a similar process. Finally, the size and number of lesions vary greatly between subjects. Figure 1b compares two patients with focal abnormal signals on T2 FLAIR images. It is difficult for a single model to reliably determine the number of lesions across such different subjects.
To overcome the aforementioned challenges, we herein propose a new framework to estimate the lesion areas via segmentation and correction processes, in which we simultaneously train two models: a segmentation network and a correction network. The segmentation network was first used to extract and segment potentially diseased areas on T2 FLAIR images. The correction network was then used to classify these areas at the corresponding locations on T1 FLAIR images to assess the probability that a patient has had LACI. Through the combination of these two networks, we achieved semantic segmentation of two different lesions.
Fig. 1 a Comparison of FCI and LACI signals on T2 FLAIR and T1 FLAIR images. This patient had both of these lesions on the same slice. It is observed that the signals of FCI and LACI can only be distinguished on T1 FLAIR images. b Two slices with strong differences in the number and brightness of abnormal signals. These differences make it difficult for the segmentation model to accurately segment both types of slices at the same time

Segmentation of white matter hyperintensities
Most related work focuses on automated segmentation of WMHs. The abnormal signals of FCI and LACI also belong to WMHs but are difficult to use for classifying and grading lesions. Moreover, existing work focuses on abnormal signals with large areas, and the segmentation of abnormal signals with very small areas remains largely unexplored. Existing methods can be divided into unsupervised, semi-supervised, and supervised methods.

Unsupervised segmentation
The advantage of unsupervised segmentation methods is that manual labeling is not required. Most of these methods use intensity-based clustering, such as fuzzy C-means methods [11], EM-based algorithms [12], and Gaussian mixture models [13]. Some studies have designed probabilistic generative models for stroke lesion segmentation [14,15]. Additionally, several studies have exploited the fact that WMHs are best observed on FLAIR magnetic resonance images and appear differently on T1-weighted magnetic resonance images [16,17]. These studies generated synthetic images and then compared them with real FLAIR images to detect any abnormalities. An important disadvantage of these methods is that they are not designed to find and distinguish between FCI and LACI. Therefore, they can neither accurately segment the diseased area nor determine whether an infarct has occurred in the segmented area.

Semi-supervised segmentation
Existing semi-supervised segmentation methods mainly depend on region-growing techniques. Kawata et al. [18] proposed a region-growing method that adaptively selects WMH regions using a support vector machine. Qin et al. [19] proposed an algorithm to optimize a kernel-based max-margin objective function. Although these methods are well motivated and have yielded some progress, transferring useful knowledge from unlabeled data remains a challenge. Therefore, semi-supervised WMH segmentation methods cannot completely replace supervised methods.

Supervised segmentation
Recently, a variety of convolutional neural networks (CNNs) have been widely used in the medical field and have often been reported to be the state of the art [20][21][22]. Guerrero et al. [23] used a CNN that is able to segment hyperintensities and differentiate between WMHs and stroke lesions. Brosch et al. [24] proposed a deep convolutional encoder network for the prediction of multiple sclerosis (MS) lesions. Kamnitsas et al. [25] proposed a dual-pathway 3D CNN for brain lesion segmentation. Ghafoorian et al. [26] proposed several deep CNN architectures that consider multi-scale patches or take explicit location features. However, none of these methods can achieve semantic segmentation of WMHs related to FCI and LACI.

Tiny objects
Processing tiny objects is notoriously challenging. The most common methods of distinguishing tiny objects are increasing the input image resolution [27] and fusing high-resolution features from low-resolution images [28,29]. However, these methods greatly increase the computational overhead and do not address the class imbalance between tiny objects and backgrounds. Li et al. [30] proposed a perceptual generative adversarial network (PGAN). The PGAN lifts representations of tiny objects to "super-resolved" ones, achieving characteristics similar to those of large objects. Another approach improves tiny object proposals by using layers of different resolutions in a region proposal network [31]. Ren et al. [32] leveraged context information to further improve the performance of tiny object detection.

Our contributions
The main contributions of this research are as follows:
• For the first time, we achieved accurate segmentation of and discrimination between FCI and LACI signals.
• We proposed a novel auxiliary network for discriminating between FCI and LACI signals in the T1 FLAIR modality.
• We proposed a series of oversampling and augmentation strategies to achieve tiny lesion segmentation.

Methods
Our proposed framework consisted of a primary network and a secondary network, as shown in Fig. 2. The primary network focused on the segmentation of tiny brain lesions. The secondary network then received each ROI extracted from either the FCI masks or the LACI masks and output a scalar of 0 or 1 per ROI to identify the type of lesion. Finally, the results of the secondary network were used to correct the segmentation results. In addition, to enable the model to process magnetic resonance images from different disease courses at the same time, we used a series of oversampling and augmentation strategies, each designed for the characteristics of tiny lesion segmentation in the brain.

Primary network
To segment the lesions accurately and reliably, we deployed a primary network based on T2 FLAIR images. The proposed primary network had a symmetric structure, as shown in Fig. 3, and was built on U-Net [33]. It consisted of an encoding path and a decoding path. The encoding path comprised 10 convolutions with a kernel size of 3 × 3 that generated a set of feature maps; each convolution was followed by batch normalization (BN) and a rectified linear unit (ReLU). After every two consecutive convolution + BN + ReLU stacks, a 2 × 2 max-pooling layer was applied for down-sampling. Each step of the decoding path up-sampled the feature map with a 2 × 2 deconvolution that halved the number of feature channels, followed by a BN layer and a ReLU layer; after up-sampling, each feature map was concatenated with the corresponding feature map from the encoding path. Thereafter, further feature extraction and selection were conducted on the concatenated feature map using consecutive convolution + BN + ReLU operations. In the final step, two consecutive 1 × 1 convolutions were applied for mapping

Secondary network
Although the primary network segmented the lesions, it did not by itself provide semantic segmentation of WMHs related to FCI and LACI. We deployed the secondary network to address this task based on T1 FLAIR images. The proposed secondary network comprised five convolutional layers with 3 × 3 kernels, five ReLUs, three average pooling layers, and two fully connected layers with dropout, as shown in Fig. 4. After every two consecutive convolution + BN + ReLU layers, a 2 × 2 max-pooling layer was applied for down-sampling. In the final step, two consecutive 1 × 1 convolutional layers were used to discriminate the class of the entire input image. During training, the secondary network first received either the predicted FCI masks or the predicted LACI masks from the primary network. Thereafter, it output a single scalar indicating whether the predicted masks had the characteristics of FCI or LACI. Once the secondary network had discriminated the type of the input image, this result was fed back to correct the segmentation results, and the semantic segmentation result was then generated.
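The correction step described above can be illustrated with a minimal sketch. It assumes that the lesions have already been extracted as lists of pixel coordinates and that the secondary network is available as a callable; the names `correct_segmentation` and `classify_roi`, the label ids, and the coding of the scalar output (1 for LACI, 0 for FCI) are assumptions made for illustration and are not taken from the paper.

```python
FCI_LABEL, LACI_LABEL = 1, 2  # assumed label ids; 0 is background


def correct_segmentation(shape, components, classify_roi):
    """Build a semantic mask from binary lesion components.

    shape        -- (height, width) of the slice
    components   -- list of lesions, each a list of (row, col) pixels
    classify_roi -- hypothetical stand-in for the secondary network;
                    returns 1 for LACI and 0 for FCI (assumed coding)
    """
    height, width = shape
    semantic = [[0] * width for _ in range(height)]
    for component in components:
        # The classifier decision overrides the label of the whole lesion.
        label = LACI_LABEL if classify_roi(component) == 1 else FCI_LABEL
        for row, col in component:
            semantic[row][col] = label
    return semantic


# Toy usage: a stub classifier that calls any lesion larger than 2 pixels LACI.
lesions = [[(0, 0), (0, 1)], [(2, 2), (2, 3), (3, 3)]]
semantic_mask = correct_segmentation((4, 4), lesions, lambda comp: int(len(comp) > 2))
```

In practice the callable would crop the T1 FLAIR ROI around each lesion and run the trained classifier; the stub only shows how a per-lesion decision is written back into the mask.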

Global optimization
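The network was updated using the cross-entropy loss function. For optimization, we used the RMSProp algorithm [34] as follows:

$$E[g^2]_t = \alpha E[g^2]_{t-1} + (1 - \alpha) g_t^2 \quad (1)$$

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} g_t \quad (2)$$

where $\theta_t$ denotes the network parameters at step $t$; $g_t$ denotes the gradient of the cost function; $E[g^2]_t$ denotes the moving average of the squared gradient at step $t$; $\alpha$ is the moving average parameter, set to 0.9; $\eta$ is the base learning rate, set to 0.01; and $\epsilon$ is a small constant added to prevent division by zero.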
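As a rough illustration of the RMSProp update, the sketch below applies the standard rule to a list of scalar parameters, using the moving average parameter 0.9 and the base learning rate 0.01 stated above. The value of the stabilizing constant (`eps=1e-8`) and the function name `rmsprop_step` are assumptions; the paper does not report them.

```python
import math


def rmsprop_step(theta, grad, avg_sq, lr=0.01, alpha=0.9, eps=1e-8):
    """One RMSProp update for a list of scalar parameters.

    avg_sq holds the running average of squared gradients per parameter.
    eps is an assumed value; the paper only says it prevents division by zero.
    """
    new_theta, new_avg_sq = [], []
    for p, g, s in zip(theta, grad, avg_sq):
        s = alpha * s + (1.0 - alpha) * g * g   # moving average of g^2
        p = p - lr * g / math.sqrt(s + eps)     # gradient step scaled by RMS
        new_theta.append(p)
        new_avg_sq.append(s)
    return new_theta, new_avg_sq


# Toy usage: minimise f(x) = x^2, whose gradient is 2x.
theta, avg_sq = [1.0], [0.0]
for _ in range(500):
    theta, avg_sq = rmsprop_step(theta, [2.0 * theta[0]], avg_sq)
```

Because each step is divided by the root of the running average, the step size stays close to the learning rate regardless of the raw gradient scale.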

Original data and ground truth
The proposed framework was evaluated on a dataset of 113 clinical patients (61 men and 52 women). The average age of the patients was 52 ± 26 years. All MRI data were acquired using the GE Signa Horizon HDxt 1.
(2) images), and 37 healthy patients (152 images). According to the imaging diagnostic criteria, the training set was labeled with the help of two experienced radiologists. The ground truth for classification was extracted from clinical reports and reviewed by two doctors.

Oversampling and augmentation strategies
The lesion regions of WMHs related to FCI and LACI are usually very small and hard to recognize. In the experiment, we found that if we simply selected images randomly for training, the model would classify all pixels as negative regardless of how the objective weights were set. Following Van Nguyen et al. [17], we speculate that this result is attributable to two factors: (1) few images contain lesions, and (2) the lesions are not conspicuous in the images that contain them. To solve this problem, we designed oversampling and augmentation strategies to encourage the model to focus on lesion regions.

Oversampling
We addressed the relative scarcity of lesion-containing images by oversampling those images during training [35]. Based on the characteristics of tiny objects, we first selected the slices with a large number of tiny lesions for pre-training. After the model converged, we fed all the training data to the network.
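The two-stage schedule can be sketched as follows. This is an illustrative sketch only: the lesion-count threshold (`min_lesions=5`), the repetition factor (`factor=3`), and both function names are assumptions, since the text does not state how many times lesion-containing slices were repeated or what counted as "a large number" of lesions for pre-training.

```python
import random


def select_pretraining_slices(slices, lesion_counts, min_lesions=5):
    """Keep slices with more than `min_lesions` lesions for pre-training."""
    return [s for s, n in zip(slices, lesion_counts) if n > min_lesions]


def oversample(slices, lesion_counts, factor=3, seed=0):
    """Repeat every lesion-containing slice `factor` times, then shuffle."""
    rng = random.Random(seed)
    sampled = []
    for s, n in zip(slices, lesion_counts):
        sampled.extend([s] * (factor if n > 0 else 1))
    rng.shuffle(sampled)
    return sampled


# Toy usage with hypothetical slice identifiers.
slice_ids = ["s0", "s1", "s2", "s3"]
lesion_counts = [0, 7, 2, 0]
stage_one = select_pretraining_slices(slice_ids, lesion_counts)  # lesion-rich only
stage_two = oversample(slice_ids, lesion_counts)                 # all data, rebalanced
```

The first list would drive pre-training until convergence, after which training continues on the second, rebalanced list.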

Dataset augmentation
To focus on lesion segmentation, we introduced a dataset augmentation strategy. After careful confirmation by two radiologists on different MRI modalities, we finally selected 48 slices that had more than 5 WMHs related to FCI and LACI from 113 clinical patients. Each slice was rotated by 5° to 10° clockwise or counterclockwise. All images and labels were shuffled and checked one by one to ensure that they had no errors after data augmentation.

Images used by secondary network
The images used by the secondary network were centered on the centroid of each connected component of the segmentation mask. Each image was an ROI of size 32 × 32 around a centroid. To facilitate computation, we up-sampled each ROI to a size of 64 × 64 before using it as the input to the secondary network.
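A minimal pure-Python sketch of this ROI extraction is given below. Several details are assumptions rather than statements from the paper: 4-connectivity for the connected components, zero padding where the 32 × 32 window crosses the image border, and nearest-neighbour interpolation for the up-sampling to 64 × 64. All function names are hypothetical.

```python
from collections import deque


def connected_components(mask):
    """Return 4-connected components of a binary mask as lists of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r0 in range(h):
        for c0 in range(w):
            if mask[r0][c0] and not seen[r0][c0]:
                seen[r0][c0] = True
                queue, comp = deque([(r0, c0)]), []
                while queue:
                    r, c = queue.popleft()
                    comp.append((r, c))
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < h and 0 <= nc < w
                                and mask[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
                comps.append(comp)
    return comps


def centroid(comp):
    """Rounded mean pixel position of one component."""
    n = len(comp)
    return (round(sum(r for r, _ in comp) / n),
            round(sum(c for _, c in comp) / n))


def crop_roi(image, center, size=32):
    """Crop a size x size window around center; pad with 0 outside the image."""
    h, w = len(image), len(image[0])
    top, left = center[0] - size // 2, center[1] - size // 2
    return [[image[r][c] if 0 <= r < h and 0 <= c < w else 0
             for c in range(left, left + size)]
            for r in range(top, top + size)]


def upsample2x(roi):
    """Nearest-neighbour up-sampling by a factor of 2 in each direction."""
    out = []
    for row in roi:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out
```

Chaining `connected_components`, `centroid`, `crop_roi`, and `upsample2x` yields one 64 × 64 patch per predicted lesion, taken from the T1 FLAIR slice at the location found on the T2 FLAIR mask.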
Owing to the very small size of these extracted images, the difference between the minimum and maximum pixel intensities was within 100. To make the features of these images easier to extract, we applied a gamma transformation to the image intensity. To moderately stretch the pixels with high intensity levels and compress the pixels with low intensity levels, we set the γ value to 1.5.
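The effect of the gamma transformation can be seen in a short sketch. It assumes that each ROI is first rescaled to the range [0, 1] by min-max normalization, which the text does not specify; only γ = 1.5 comes from the paper, and the function name `gamma_transform` is hypothetical.

```python
def gamma_transform(roi, gamma=1.5):
    """Apply s = r ** gamma after min-max scaling the ROI to [0, 1].

    The min-max scaling is an assumption; the paper only reports gamma = 1.5.
    """
    flat = [v for row in roi for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero on a constant patch
    return [[((v - lo) / span) ** gamma for v in row] for row in roi]


# Toy usage on a 2 x 2 patch whose intensity range is 100.
patch = [[0, 50], [100, 25]]
stretched = gamma_transform(patch)
```

With γ greater than 1, a normalized intensity of 0.25 maps to 0.125 while 1.0 stays at 1.0, so the dark range is compressed and the bright range is spread out.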

Experiments
We used the standard five-fold cross-validation method to evaluate the performance of our proposed method [36]. We divided the data into five parts, each time selecting four parts for training and one part for testing. When all the experiments were completed, we calculated the average value of each metric across the folds. In this way, all slices were used for both training and testing, and each slice was used for testing exactly once. We implemented the proposed method using Python 3.7 and the TensorFlow 1.13 library on a workstation equipped with NVIDIA Tesla V100 GPUs.
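The splitting scheme can be sketched in a few lines. This is a generic five-fold split over item indices under an assumed random shuffle with a fixed seed; whether the authors split by slice or by patient, and how they shuffled, is not stated, and the function names are hypothetical.

```python
import random


def k_fold_splits(n_items, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs covering every item once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    splits = []
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, folds[i]))
    return splits


def mean(values):
    """Average of per-fold metric values."""
    return sum(values) / len(values)


# Toy usage: 23 hypothetical slices split into five folds.
splits = k_fold_splits(23)
```

Each index appears in exactly one test fold, and the reported metric would be the `mean` of the five per-fold values.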

Performance evaluation
We evaluated our proposed method along three dimensions: segmentation, detection, and classification. The evaluation index for segmentation was the Dice coefficient, and that for detection and classification was precision. Precision is defined as the ratio of true positives to all predicted positives. A detection was counted as a true positive only if the predicted mask or bounding box had an intersection over union (IoU) higher than 0.6. In this study, our primary interest was precision in distinguishing each lesion region.
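These metrics can be written down compactly when each mask is represented as a set of pixel coordinates. The sketch below uses the standard definitions of the Dice coefficient and IoU together with the 0.6 threshold stated above; the set-based representation, the matching rule (a prediction counts if it overlaps any ground-truth lesion strongly enough), and the function names are assumptions.

```python
def dice(pred, truth):
    """Dice coefficient between two sets of (row, col) pixels."""
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def iou(a, b):
    """Intersection over union between two sets of pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def detection_precision(pred_lesions, true_lesions, threshold=0.6):
    """Fraction of predicted lesions whose IoU with some true lesion
    is strictly higher than the threshold."""
    if not pred_lesions:
        return 0.0
    true_positives = sum(
        1 for p in pred_lesions
        if any(iou(p, t) > threshold for t in true_lesions)
    )
    return true_positives / len(pred_lesions)


# Toy usage: one well-matched prediction and one spurious prediction.
truth = [{(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)}]
preds = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(9, 9)}]
precision = detection_precision(preds, truth)
```

In the toy example the first prediction has an IoU of 4/5 with the ground truth and counts as a true positive, while the second has no overlap, so the precision is 0.5.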

Results
We proposed a novel method for segmentation of WMHs related to FCI and LACI. In our experiment on 113 sets of clinical patients' MRI scans, our method achieved a precision of 91.76% for detection and 92.89% for classification. The results demonstrated the validity of the proposed method for semantic segmentation of WMHs related to FCI and LACI.
To investigate the effectiveness of our proposed method, we compared the experimental results with those from other networks. We analyzed the effects of the secondary network, oversampling, and augmentation on the detection results. Table 1 provides the experimental configurations and results. The results showed that our proposed method could distinguish the WMHs related to FCI and LACI effectively. Notably, there is no classification metric for the primary network in Table 1 because it only completes semantic segmentation according to the lesion area.

Discussion
The experimental results of this study are presented in Table 1. The experiment is similar to an ablation study, conducted to demonstrate the effect of each novel module. In the experiment, each index of the primary network had two results. This is because, in the early stage of this research, we first attempted to complete segmentation and classification simultaneously; however, the results obtained were poor (see task a in Table 1 for more details). Although the introduction of the data augmentation and oversampling strategies improved the results at a later stage, they still could not meet the requirements of high-precision segmentation and recognition. Therefore, we introduced a two-stage learning strategy, in which the primary network focused only on segmenting the lesion areas, and the task of identifying the lesions was completed by the secondary network (see task b in Table 1 for more details). This two-stage network originated from clinical practice and simulates the diagnostic process used by radiologists. The final results also support the effectiveness of our two-stage network. After deployment of the secondary network, the classification of FCI and LACI improved significantly. In addition, the experiments showed that oversampling is necessary for tiny lesions, which may be attributed to the small proportion of the image that the lesions occupy. Data augmentation effectively improved the ability of the model to detect lesions. This suggests that, in future research, collecting more data may be key to improving the accuracy of the model.

Conclusions
In this study, we developed a complete method for segmentation of WMHs related to FCI and LACI. This is the first method to distinguish between small lesions, such as FCI and LACI. The experiments with 113 sets of clinical data showed that our method is accurate and reliable. This method first leverages the primary network to achieve segmentation of the lesions. Thereafter, the secondary network is deployed to classify the lesion type. Although existing studies have achieved semantic segmentation of multiple lesions, some of them (e.g., tiny lesions) have not been fully considered. In the future, we will collect more clinical data and test more types of tiny lesions at the same time.
Table 1 Effectiveness of our proposed method on the Dice coefficient and precision, for methods including the primary network, two-stage network, and augmentation (Aug) + oversample strategies. Task a: segmentation by category information (e.g., background, FCI and LACI); Task b: segmentation by lesion area (e.g., background and lesion area)