Volume 17 Supplement 1

## Selected articles from the 6th Translational Bioinformatics Conference (TBC 2016): medical informatics and decision making

# Cascade recurring deep networks for audible range prediction

- Yonghyun Nam^{1}
- Oak-Sung Choo^{2}
- Yu-Ri Lee^{2}
- Yun-Hoon Choung^{2} (corresponding author)
- Hyunjung Shin^{1} (corresponding author)

**17(Suppl 1)**:56

https://doi.org/10.1186/s12911-017-0452-2

© The Author(s). 2017

**Published: **18 May 2017

## Abstract

### Background

Hearing aids amplify sounds at certain frequencies to help patients with hearing loss improve their quality of life. Variables affecting hearing improvement include the characteristics of the patient’s hearing loss, the characteristics of the hearing aids, and the characteristics of the frequencies. Although the first two characteristics have been studied, there are only limited studies predicting hearing gain after wearing hearing aids using all three characteristics. Therefore, we propose a new machine learning algorithm that can present the degree of hearing improvement expected from wearing hearing aids.

### Methods

The proposed algorithm combines a cascade structure, a recurrent structure, and a deep network structure. The cascade structure reflects the correlations between frequency bands. In the recurrent structure, the output variables of one frequency band’s network are reused as input variables for the other networks. Furthermore, the deep network structure has many hidden layers. We call such a network a cascade recurring deep network, whose training consists of two phases: a cascade phase and a tuning phase.

### Results

When applied to the medical records of 2,182 patients treated for hearing loss, the proposed algorithm reduced the error rate by 58% compared with other neural networks.

### Conclusions

The proposed algorithm is a novel algorithm that can be utilized for signal or sequential data. Clinically, it can serve as a medical assistance tool that fulfills patients’ satisfaction.

### Keywords

Hearing aids, Hearing improvement, Neural networks, Deep learning, Cascade structure, Recurrent structure

## Background

As individuals have longer life expectancies, quality of life has become increasingly important, and with it the auditory rehabilitation of hearing-impaired persons. In recent years, the need for hearing aids (HAs) among patients with hearing loss has been growing; however, satisfaction levels with HAs vary widely among individuals. One of the many causes is the composite interaction among the variables that affect HA outcomes. Walden et al. [1] investigated the correlations between patients’ demographic information and hearing test results in 50 patients who were successfully using HAs, and determined that age is a major variable for successful HA use. In a study of acute hearing loss, 83 patients were classified into four levels, in a range of 1 to 4, according to their degrees of recovery, and the factors affecting hearing recovery were analyzed with nonparametric statistical methods. The results indicated that the presence of tinnitus and/or dizziness, the duration of hearing loss, pure tone audiometry patterns, the degree of hearing loss, and age have statistically significant effects on recovery [2]. Although other studies have also reported reasons for HA failure and methods to improve HA outcomes, studies with large numbers of HA patients have not been conducted.

For successful use of HAs, a highly reliable prediction model that accounts for the specific characteristics of HAs and frequencies is essential. Although various HA manufacturers have developed fitting algorithms, objective information on the degree of hearing improvement obtained with different HAs is sometimes inaccurate or concealed. Mulrow et al. [3] proposed a logistic regression model over variables such as age, education, functional limitations, and degree of hearing loss using data from 176 patients, but this model showed low accuracy on training and test data (75–88% and 54–84%, respectively). Cvorovic et al. [4] presented a multiple linear regression model, built from 541 Swiss patients with sudden sensorineural hearing loss, as a diagnostic model. However, this model has limitations: it was restricted to sudden sensorineural hearing loss, its validity was not verified, and its applicability in the clinical field was not evaluated.

Therefore, the purpose of this study was to develop a new model that can present the expected degree of hearing gain following HA fitting, based on the variables that can affect HA outcomes. This model is expected to fulfill patients’ expectation levels, motivate patients to use and manage HAs, and help maximize hearing improvement through the application of HAs in clinical practice.

## Neural networks general

Given *n* instances, x = {*x*_{i} | *x*_{i} ∈ *R*^{d}, *i* = 1, 2, …, *n*}, and *k* target variables, y = {*y*_{j} | *y*_{j} ∈ *R*, *j* = 1, 2, …, *k*}, the learning process of a neural network is to estimate the weights w that connect the nodes. The estimate of w is computed in the direction that reduces the error, which is the difference between the output variables f = {*f*_{j} | *f*_{j} ∈ *R*, *j* = 1, 2, …, *k*} and the target variables y. Thus, the objective is to minimize the sum of squared errors, and the optimization problem is:

$$ \min_{\mathbf{w}} \; \sum_{i=1}^{n} \sum_{j=1}^{k} \left( y_{ij} - f_{j}(\mathbf{x}_{i}; \mathbf{w}) \right)^{2} $$
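As a concrete illustration of minimizing this sum-of-squared-errors objective, the sketch below trains a one-hidden-layer network by batch gradient descent. The layer sizes, learning rate, and synthetic data are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 5, 3                 # n instances, d inputs, k targets (assumed)
X = rng.normal(size=(n, d))
Y = rng.normal(size=(n, k))

h = 10                              # hidden nodes (assumed)
W1 = rng.normal(scale=0.1, size=(d, h))
W2 = rng.normal(scale=0.1, size=(h, k))

def forward(X):
    H = np.tanh(X @ W1)             # hidden-layer activations
    return H, H @ W2                # linear output layer f

_, F0 = forward(X)
sse0 = np.sum((Y - F0) ** 2)        # objective before training

lr = 0.1
for _ in range(500):
    H, F = forward(X)
    err = F - Y                     # f - y, the error being reduced
    gW2 = (H.T @ err) / n           # gradient of the (scaled) SSE w.r.t. W2
    gW1 = (X.T @ ((err @ W2.T) * (1 - H ** 2))) / n
    W2 -= lr * gW2
    W1 -= lr * gW1

_, F = forward(X)
sse = np.sum((Y - F) ** 2)          # objective after training
```

After training, `sse` is smaller than `sse0`, i.e., the weights w have moved in the direction that reduces the squared error.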

In this study, we propose a novel neural network that has three structural characteristics: a cascade structure, a recurrent structure, and a deep network structure. Previous studies concerning each of the three structures are reviewed below.

### Cascade structure

For a neural network to obtain high prediction performance, selecting an appropriate number of hidden nodes is important. Although the number of hidden nodes is generally selected by trial and error, there have been extensive studies on determining the number of hidden nodes algorithmically. One representative algorithm is the cascade-correlation neural network, which selects the optimal number of hidden nodes by adding a single hidden node at each step of the training process. In addition, it achieves a faster training process than general neural networks [9, 10].

### Recurrent structure

Neural networks with a recurrent structure are frequently utilized for the prediction of time series or sequential data. For instance, when predicting *k* target variables, y = [*y*_{1}, *y*_{2}, …, *y*_{k}], that have time-series characteristics, the output variables are correlated with each other. In such a case, higher performance can be obtained by utilizing the previous step *y*_{t − 1} as an input variable for predicting the *t*-th step *y*_{t}. This is consistent with the concept of the Recurrent Neural Network, which combines a general neural network with the notion of time series. In a recurrent neural network, hidden nodes are utilized as storage that preserves information from the previous training step [11, 12].
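The idea of feeding the previous step’s output back in as an input can be sketched as follows; the linear per-step predictor and its weights are stand-in assumptions, not the paper’s networks.

```python
def predict_sequence(x, weights, k):
    """Predict k correlated targets, reusing each prediction as an input.

    x: list of input features; weights[j]: coefficients for the j-th target,
    covering the original features plus the previous prediction.
    """
    outputs = []
    prev = 0.0                       # no previous output before the first target
    for j in range(k):
        features = x + [prev]        # augment inputs with the y_{t-1} estimate
        prev = sum(w * f for w, f in zip(weights[j], features))
        outputs.append(prev)
    return outputs
```

For example, with inputs `[1.0, 2.0]` and weights `[[1, 0, 0], [0, 1, 1]]`, the second target’s prediction depends on the first one through the appended feature.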

### Deep network structure

Recent works have shown that stacking many layers in a neural network leads to improvements in prediction performance. The basic concept of deep neural networks is to stack many hidden layers between the input layer and the output layer [13]. However, when a neural network becomes deep with many layers, learning the weights becomes difficult and the overfitting problem arises [5]. To address the overfitting problem, the Restricted Boltzmann Machine and the Deep Belief Network can be used. The Restricted Boltzmann Machine is a model obtained by removing the connections within each layer of a Boltzmann machine, and it updates the entire set of parameters by stacking hidden layers one by one [14]. The Deep Belief Network is a model that stacks layers pre-trained with unsupervised learning [15].

In this study, we propose a novel neural network algorithm that incorporates the advantages of the three neural network structures described above.

## Methods

The proposed Cascade Recurring Deep Network (CRDN) has *k* + 1 hidden layers for *k* output variables. CRDN consists of two phases: a cascade phase and a tuning phase. In the cascade phase, a neural network for one output variable is configured and sufficiently trained; the trained network is then used as a *base learner* for constructing the next neural network, and the process is carried out progressively for the other output variables. In the tuning phase, the errors in the output values from the cascade phase are corrected.

### Cascade phase

The cascade phase trains the output variables in a stepwise fashion. When predicting a specific output variable, another correlated output variable is utilized as an input variable. It is known that, when training neural networks, an increase in the number of nodes leads to higher computational cost and difficulty in training the weights w [5]. Therefore, in the cascade phase, to predict many output variables, we first construct a feedforward neural network for one output variable, and the trained network is used as a *base learner* for constructing the next neural network. The base learner is updated as the output variables change. This process drastically reduces training time by reusing previously trained feedforward neural networks. The cascade phase progresses in bidirectional order: an inclining step, which is a learning process in the forward direction, and a declining step, which is a learning process in the reverse direction.

#### Inclining step

Given *k* target variables, Y = {y_{j} | y_{j} ∈ *R*^{m}, *j* = 1, 2, …, *k*}, and *k* output variables, F = {f_{j} | f_{j} ∈ *R*^{m}, *j* = 1, 2, …, *k*}, we can construct a neural network to predict y_{j}. Each target variable may have several dimensions, but we only consider the one-dimensional case (*m* = 1) to simplify the description. In the inclining step, we first construct a neural network *net*_{1} that predicts y_{1}. The output f_{1} is obtained by

[*f*_{1}, w^{(1)}] = *net*_{1}(w^{(0)}, x^{(1)}, *y*_{1})

where w^{(1)} is the trained weight matrix of the neural network for predicting y_{1}, w^{(0)} is a randomly configured weight matrix for initialization, and x^{(1)} = [*x*_{1}, *x*_{2}, …, *x*_{d}] is the input vector for predicting y_{1}. When *net*_{1} has been sufficiently trained, we construct a neural network *net*_{2} that predicts y_{2}. For training *net*_{2}, the output value f_{1} is added to the input vector x. Since the two target variables y_{1} and y_{2} affect each other, the previously trained weight matrix w^{(1)} is utilized as a *base learner*. The output value f_{2} is then obtained by

[*f*_{2}, w^{(2)}] = *net*_{2}(w^{(1)}, x^{(2)}, *y*_{2})

where w^{(2)} is the trained weight matrix for predicting y_{2} and x^{(2)} = [*x*_{1}, *x*_{2}, …, *x*_{d}, *f*_{1}] is the input vector for predicting y_{2}. Neural networks for the remaining output variables are learned in the same manner. In general, the final predicted value f_{k} is calculated by

[*f*_{k}, w^{(k)}] = *net*_{k}(w^{(k − 1)}, x^{(k)}, *y*_{k})

where w^{(k)} is the resulting trained weight matrix for *net*_{k} and x^{(k)} = [*x*_{1}, *x*_{2}, …, *x*_{d}, *f*_{k − 1}] is the input vector for *net*_{k}.
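The inclining step can be sketched as follows. As an illustrative simplification, each “network” here is a closed-form linear least-squares model standing in for a trained feedforward net (so the weight-initialization aspect of the base learner is not shown); the mechanism that survives is the one described above: the output f_{j} of *net*_{j} is appended to the input vector of *net*_{j + 1}.

```python
import numpy as np

def inclining_step(X, Y):
    """X: (n, d) inputs; Y: (n, k) targets.

    Returns the per-target weights and the stacked outputs f_1..f_k,
    where each f_j is reused as an extra input for the next model.
    """
    n, k = X.shape[0], Y.shape[1]
    weights, outputs = [], []
    X_aug = X
    for j in range(k):
        A = np.hstack([X_aug, np.ones((n, 1))])        # add a bias column
        w, *_ = np.linalg.lstsq(A, Y[:, j], rcond=None)
        f_j = A @ w                                     # output f_j
        weights.append(w)
        outputs.append(f_j)
        X_aug = np.hstack([X_aug, f_j[:, None]])        # reuse f_j as input
    return weights, np.column_stack(outputs)
```

Each successive model sees one more input dimension than the previous one, mirroring x^{(k)} = [x_{1}, …, x_{d}, f_{k − 1}].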

#### Declining step

When the inclining step is completed, the declining step repeats the learning process in the reverse direction, starting from the final network of the inclining step. The output of *net*_{k − 1} is obtained by

[*f*_{k − 1}, w^{(k − 1)}] = *net*_{k − 1}(w^{(k)}, x^{(k − 1)}, *y*_{k − 1})

where w^{(k)} is the trained weight matrix from the final inclining step and x^{(k − 1)} = [*x*_{1}, *x*_{2}, …, *x*_{d}, *f*_{k}] is the input vector. When the declining step has been carried out for all output variables, the cascade phase is completed. The final output values F^{A} = [*f*_{1}, *f*_{2}, …, *f*_{k}] reflect the correlations between all adjacent variables.

### Tuning phase

The tuning phase corrects the errors in the final output values from the cascade phase by constructing an error-correction network. The final output values F^{A} = [*f*_{1}, *f*_{2}, …, *f*_{k}] from the cascade phase are the input nodes, and the target variables Y = [*y*_{1}, *y*_{2}, …, *y*_{k}] are the output nodes of the error-correction network. The constructed network is a type of auto-associative neural network in which the input nodes and output nodes resemble each other. The necessity of the tuning phase can be justified as follows. In the cascade phase, the output variable *f*_{k} of the *k*^{th} neural network *net*_{k} is utilized as an input variable of the (*k* + 1)^{th} neural network. Since every output variable carries error, the input variable f_{k} contains errors that must be corrected. It is known that error-correction networks can improve prediction performance in deep neural networks [16, 17].
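The tuning phase can be sketched under a linear simplification: fit a small correction map from the cascade outputs F^{A} to the targets Y. The paper uses an auto-associative neural network for this; ordinary least squares stands in for it here.

```python
import numpy as np

def tuning_phase(F_cascade, Y):
    """Fit a linear correction so that F_cascade @ W + b approximates Y.

    F_cascade: (n, k) cascade-phase outputs; Y: (n, k) targets.
    Returns a function applying the learned correction to new outputs.
    """
    n = F_cascade.shape[0]
    A = np.hstack([F_cascade, np.ones((n, 1))])        # add a bias column
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return lambda F: np.hstack([F, np.ones((F.shape[0], 1))]) @ W
```

If the cascade outputs carry a systematic error (e.g., a constant bias), the fitted correction removes most of it, which is exactly the role the tuning phase plays.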

## Experiments

### Data

**Data description**

| | |
| --- | --- |
| **Input variables** | |
| Patient information | Age, Sex, Underlying diseases, Experience of hearing aids, Side of hearing aids |
| Clinical evaluations | Unaided pure tone audiometry, Unaided hearing-in-noise test, Threshold per frequency, Category of hearing loss, Degree of hearing loss, Type of hearing loss, Tinnitus status, Average air conduction hearing threshold, Average bone conduction hearing threshold, Mean word recognition score |
| Hearing aid (HA) information | Model of HA, Number of channels, Type of HA, Tinnitus treatment option, Frequency transposition, Type of microphone, Ventilation, Feedback cancellation |
| **Target variables** | |
| Hearing gain | PTA after wearing HAs (250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz) |

To check the hearing gain, pure tone audiometry (PTA) and word recognition score (WRS) tests were performed before HA fitting and again 8 to 12 weeks after fitting, and the results were compared and analyzed. These results were used as the output variables of the proposed algorithm. The PTAs were measured using hearing thresholds from 0.25 to 8 kHz, and the average of the hearing thresholds at 0.5, 1, 2, and 3 kHz was calculated. For the WRS, patients were presented with test words at their most comfortable listening level and instructed to repeat or write down the words and numbers that were accurately heard and understood. These results were then converted into percentages as the WRS. Although the most comfortable listening levels before and after using HAs were not the same in some cases, the WRS values were still compared because they were obtained under the most-comfortable-level condition.
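The four-frequency PTA average described above is a simple mean of the thresholds at 0.5, 1, 2, and 3 kHz; the threshold values in the usage note below are illustrative, not from the study.

```python
def pta_average(thresholds):
    """Four-frequency PTA: mean threshold (dB HL) at 0.5, 1, 2, and 3 kHz.

    thresholds: dict mapping frequency in kHz -> hearing threshold in dB HL.
    """
    freqs = (0.5, 1, 2, 3)
    return sum(thresholds[f] for f in freqs) / len(freqs)
```

For instance, thresholds of 40, 50, 60, and 70 dB HL at those frequencies give a PTA of 55 dB HL.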

### Correlation in target variables

### Experimental settings

Each neural network *net*_{j} is a three-layered perceptron ([Input layer]-[Hidden layer]-[Output layer]) with 31 input nodes, 20~30 hidden nodes on average, and one output node [7]. In other words, it has a multi-layer perceptron structure of [Input: 31]-[Hidden: 20~30]-[Output: 1]. The Cascade Recurring Deep Network has the structure described in Fig. 2. In the inclining step of the cascade phase, the neural networks are learned in order from the low frequency (250 Hz) to the high frequency (8 kHz); in the declining step, they are learned in the reverse order, from 8 kHz down to 250 Hz. For instance, in the first inclining step, the neural network *net*_{250Hz} for predicting the 250 Hz frequency band is sufficiently trained. Then the network *net*_{500Hz} for predicting the 500 Hz frequency band utilizes *net*_{250Hz} as a *base learner*: for training *net*_{500Hz}, the output variable *f*_{250Hz} of *net*_{250Hz} is utilized as an input variable, and the sufficiently trained weight w^{(250Hz)} is utilized as the initial weight for *net*_{500Hz}. The process is carried out progressively up to the 8 kHz frequency band. In the declining step, the final network *net*_{8KHz} from the inclining step is used as a base learner, and a process similar to the inclining step is carried out in the reverse direction from 8 kHz down to 250 Hz. In the tuning phase, a neural network is trained by setting the final cascade-phase outputs *f*_{250Hz}, *f*_{500Hz}, …, *f*_{8KHz} as input nodes and the target variables *y*_{250Hz}, *y*_{500Hz}, …, *y*_{8KHz} as output nodes. Through the cascade phase and the tuning phase, we can construct a neural network that incorporates all the effects of the other frequency bands on one particular frequency band.

Figure 4a shows the neural networks, MLP_{1}, that predict each of the six frequency bands, where each network is an independent multi-layer perceptron with one output. Figure 4b shows the neural network, MLP_{6}, that predicts all six frequency bands at once. The six MLP_{1}s and MLP_{6} consider only patient information and clinical evaluations as input variables. For comparison of prediction performance, the optimal number of hidden nodes was selected between 20 and 30 for each network. The percentages of randomly sampled training, validation, and test sets were 40, 30, and 30%, respectively, and the whole experiment was repeated 10 times. Prediction performance was measured with the Mean Absolute Percentage Error (MAPE), where smaller values imply higher prediction performance [18]. MAPE is expressed in percentages and is given as:

$$ \mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_{i} - f_{i}}{y_{i}} \right| $$

where *n* is the total number of data points, *y*_{i} is the target value, and *f*_{i} is the output value.
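The MAPE formula above transcribes directly to code; note that it assumes the target values are nonzero, since the measure is undefined at *y*_{i} = 0.

```python
def mape(y, f):
    """Mean Absolute Percentage Error between targets y and outputs f."""
    n = len(y)
    return 100.0 / n * sum(abs((yi - fi) / yi) for yi, fi in zip(y, f))
```

For example, targets of 100 and 200 with outputs of 90 and 220 (errors of 10% each) give a MAPE of 10%.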

## Results and discussion

### Results for validity of CRDN

The error rates of CRDN were compared with those of the other models (the six MLP_{1}s and MLP_{6}) by frequency band. CRDN shows the lowest error rates in all frequency bands (avg. 9.2%). In contrast, the six MLP_{1}s and MLP_{6} showed average error rates of 21.8% and 16.8%, respectively. We deduce that CRDN, which reflects every correlation between target variables, shows the highest performance, followed by MLP_{6}, which reflects some proportion of the correlations between target variables. In addition, the average error rates of the inclining and declining steps in CRDN are 13.1% and 10.7%, respectively. Comparing these with the error rate of CRDN trained through the final step, the tuning phase (avg. 9.2%), we deduce that considering adjacent frequencies more often leads to improved performance.

The low error rate of CRDN has meaningful clinical implications. Since hearing tests measure only representative frequency bands, the actual PTA thresholds of adjacent frequency bands are difficult to measure; they are only approximated by the median of the measured PTA of adjacent bands. For instance, a 4 kHz dip from noise-induced hearing loss not only raises the PTA threshold at 4 kHz but also affects adjacent frequency bands. Since the CRDN structure accounts for such influence, it is well positioned to produce successful results.

### Results for utility of CRDN

#### Case I: patients with sensorineural hearing loss

Figure 7a shows a case of a patient with convex-type sensorineural hearing loss, with hearing loss in the mid-frequency bands (1 kHz, 2 kHz). Sensorineural hearing loss occurs due to abnormalities in the cochlea or disorders in the nerves that connect the inner ear with the brain.

In the convex type, the threshold of the medium frequencies is higher than that of the low and high frequencies by at least 15 dB. In such cases, patients with hearing loss cannot hear the sounds of daily conversation. To resolve the hearing loss, we applied CRDN to patients with sensorineural hearing loss. The expected degrees of hearing gain from CRDN matched the actual hearing gains, and the results show high satisfaction levels.

#### Case II: patients with conductive hearing loss

Figure 7b shows a case of a patient with conductive hearing loss. Conductive hearing loss results from damage to the path for delivery of sounds from the external ear to the middle ear. In the case of this patient, all sound thresholds from low frequency bands to high frequency bands exceed 65 dB and the degree of hearing loss is severe. Since the hearing thresholds for daily conversations are 20 ~ 70 dB, the patient has difficulties in daily life. The CRDN provided outstanding hearing gains by amplifying the frequencies to appropriate levels.

Although the two patients differ in hearing loss characteristics, patient information, and clinical evaluations, CRDN provided suitable hearing gains for each. On the basis of these results, CRDN can serve as a medical assistance tool that fulfills patients’ satisfaction levels and helps maximize hearing improvement through the application of HAs in clinical practice.

## Conclusion

In this study, we propose a novel neural network algorithm that provides the expected degree of hearing gain for patients with hearing loss who wear hearing aids. The proposed algorithm is a Cascade Recurring Deep Network that reflects the correlations between adjacent frequencies. It is a deep network that stacks as many hidden layers as there are target variables, so it can be utilized for signal or sequential data. It also takes the structural advantages of cascade-correlation networks and recurrent neural networks.

From an algorithmic perspective, CRDN is novel in the following aspects. CRDN has a scalable structure that can be applied to various signal or sequential data, and it achieves faster training because it progressively stacks the layers. By reusing previously trained neural networks, CRDN not only trains quickly but also reflects the correlations between output variables. In addition, since CRDN reuses previously learned weights, it reduces the time spent learning weights when predicting new output variables. In the experiments, the mean error rate of CRDN over the six frequency bands (250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz) was 9.2%; compared with other neural network models that do not reflect correlations between frequencies, the mean error rate was reduced by 58%. In this paper, we compared the performance of CRDN only with two types of MLP; as an extension of the comparison, other machine learning algorithms such as support vector machines and regularized regression could be considered.

From a clinical perspective, since CRDN provides more accurate information on the expected hearing gain after wearing hearing aids, it can reduce the gap between the expected and actual experience of wearing hearing aids. Therefore, it can serve as an effective medical assistance tool. Future work includes applying the proposed algorithm to other types of signal or sequential data.

## Declarations

### Acknowledgements

This material is based upon work supported by the Ministry of Trade, Industry & Energy (MOTIE, Korea) under the Industrial Technology Innovation Program (10049721). HJS gratefully acknowledges support from the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2015R1D1A1A01057178/2012-0000994). This work was supported by the Ajou University research fund.

### Funding

Publication of this article was funded by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2012-0000994).

### Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

### Authors’ contributions

HJS designed the idea and supervised the study process. YHN analyzed the data, implemented the results and wrote the manuscript. OSC, YRL, and YHC collected data and implemented the clinical results. All authors read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

### Consent for publication

No personally identifying information is reported in this article.

### Ethics approval and consent to participate

Following the approval from the Institutional Review Boards of the Ajou University School of Medicine, a retrospective chart review was done.

### About this supplement

This article has been published as part of *BMC Medical Informatics and Decision Making* Volume 17 Supplement 1, 2017: Selected articles from the 6th Translational Bioinformatics Conference (TBC 2016): medical informatics and decision making. The full contents of the supplement are available online at https://bmcmedinformdecismak.biomedcentral.com/articles/supplements/volume-17-supplement-1

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

## References

1. Walden BE, Surr RK, Cord MT. Real-world performance of directional microphone hearing aids. Hear J. 2003;56(11):40–2.
2. Ceylan A, Celenk F, Kemaloğlu Y, Bayazıt Y, Göksu N, Özbi S. Impact of prognostic factors on recovery from sudden hearing loss. J Laryngol Otol. 2007;121(11):1035–40.
3. Mulrow CD, Tuley MR, Aguilar C. Correlates of successful hearing aid use in older adults. Ear Hear. 1992;13(2):108–13.
4. Cvorovic L, Ðeric D, Probst R, Hegemann S. Prognostic model for predicting hearing recovery in idiopathic sudden sensorineural hearing loss. Otol Neurotol. 2008;29(4):464–9.
5. Bishop CM. Neural networks for pattern recognition. Oxford: Clarendon Press; 1995.
6. Bridle JS. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In: Neurocomputing. Springer; 1990. p. 227–36.
7. Auer P, Burgsteiner H, Maass W. A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Neural Netw. 2008;21(5):786–95.
8. Mézard M, Nadal J-P. Learning in feedforward layered networks: the tiling algorithm. J Phys A Math Gen. 1989;22(12):2191.
9. Fahlman SE. The recurrent cascade-correlation architecture. 1991.
10. Fahlman SE, Lebiere C. The cascade-correlation learning architecture. 1989.
11. Mikolov T, Karafiát M, Burget L, Cernocký J, Khudanpur S. Recurrent neural network based language model. In: Interspeech. 2010. p. 3.
12. Botvinick MM, Plaut DC. Short-term memory for serial order: a recurrent neural network model. Psychol Rev. 2006;113(2):201.
13. Deng L, Hinton G, Kingsbury B. New types of deep neural network learning for speech recognition and related applications: an overview. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE; 2013. p. 8599–603.
14. Salakhutdinov R, Mnih A, Hinton G. Restricted Boltzmann machines for collaborative filtering. In: Proceedings of the 24th International Conference on Machine Learning. ACM; 2007. p. 791–8.
15. Hinton GE, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527–54.
16. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. p. 770–8.
17. Levine S, Finn C, Darrell T, Abbeel P. End-to-end training of deep visuomotor policies. J Mach Learn Res. 2016;17(39):1–40.
18. Makridakis S. Accuracy measures: theoretical and practical concerns. Int J Forecast. 1993;9(4):527–9.