Cascade recurring deep networks for audible range prediction
© The Author(s). 2017
Published: 18 May 2017
Hearing aids amplify sounds at certain frequencies to help patients with hearing loss improve their quality of life. Variables affecting hearing improvement include the characteristics of the patients' hearing loss, the characteristics of the hearing aids, and the characteristics of the frequencies. Although the first two have been studied, only limited studies have predicted hearing gain after wearing hearing aids using all three. Therefore, we propose a new machine learning algorithm that presents the degree of hearing improvement expected from wearing hearing aids.
The proposed algorithm combines a cascade structure, a recurrent structure, and a deep network structure. The cascade structure reflects correlations between frequency bands; the recurrent structure reuses the output variables of one frequency band's network as input variables for the networks of other bands; and the deep network structure stacks many hidden layers. We denote such networks as cascade recurring deep networks, whose training consists of two phases: a cascade phase and a tuning phase.
When applied to the medical records of 2,182 patients treated for hearing loss, the proposed algorithm reduced the error rate by 58% compared with other neural networks.
The proposed algorithm is a novel algorithm applicable to signal or sequential data. Clinically, it can serve as a medical assistance tool that improves patient satisfaction.
As individuals live longer, quality of life has become increasingly important, and so has auditory rehabilitation for hearing-impaired persons. In recent years, the need for hearing aids (HAs) among patients with hearing loss has grown widely; however, satisfaction with HAs varies considerably among individuals. One of the many causes of this variation is the composite interaction among the variables that affect HA outcomes. Walden et al. investigated the correlations between demographic information and hearing test results in 50 patients who were successfully using HAs, and determined that age is a major variable for successful HA use. In a study of acute hearing loss, 83 patients were classified into four levels (1 to 4) according to their degree of recovery, and the factors affecting hearing recovery were analyzed using nonparametric statistical methods. The results indicated that the presence of tinnitus and/or dizziness, duration of hearing loss, pure tone audiometry patterns, degree of hearing loss, and age have statistically significant effects on recovery. Although other studies have also reported reasons for HA failure and methods to improve HA outcomes, large-scale studies of HA outcomes have not been conducted.
For successful use of HAs, a highly reliable prediction model that accounts for the specific characteristics of HAs and of frequencies is essential. Although HA manufacturers have developed fitting algorithms, objective information on the degree of hearing improvement obtained with different HAs is sometimes inaccurate or concealed. Mulrow et al. proposed a logistic regression model over variables such as age, education, functional limitations, and degree of hearing loss using data from 176 patients, but this model showed low accuracy on training and testing data (75–88% and 54–84%, respectively). Cvorovic et al. presented a multiple linear regression model, built from 541 Swiss patients with sudden sensorineural hearing loss, as a diagnosis model. However, this model has limitations: it was restricted to sudden sensorineural hearing loss, its validity was not verified, and its applicability in the clinical field was not evaluated.
Therefore, the purpose of this study was to develop a new model that presents the expected degree of hearing gain following HA fitting, based on the variables that can affect HA outcomes. This model is expected to meet patients' expectations, motivate them to use and manage HAs, and help maximize hearing improvement through the application of HAs in clinical fields.
Neural networks in general
In this study, we propose a novel neural network that has three structural characteristics: a cascade, a recurrent, and a deep network structure. Previous studies concerning each of the three structures are reviewed below.
For a neural network to achieve high prediction performance, selecting an appropriate number of hidden nodes is important. Although the number of hidden nodes is generally selected by trial and error, there have been extensive studies on determining it algorithmically. One representative algorithm is the cascade-correlation neural network, which selects the optimal number of hidden nodes by adding a single hidden node at each step of the training process. In addition, it trains faster than general neural networks [9, 10].
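As a rough, hypothetical illustration of this idea (not the implementation of [9, 10]), a network can be grown one hidden unit at a time, keeping a candidate unit only when it lowers validation error; random tanh features stand in here for trained hidden nodes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: y = sin(3x) plus noise.
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.05 * rng.normal(size=200)
X_tr, X_va = X[:150], X[150:]
y_tr, y_va = y[:150], y[150:]

def val_error(H_tr, H_va):
    # Fit linear output weights on the training features, score on validation.
    w, *_ = np.linalg.lstsq(H_tr, y_tr, rcond=None)
    return np.mean((H_va @ w - y_va) ** 2)

# Start from a bias column only; grow hidden units one at a time.
H_tr = np.ones((len(X_tr), 1))
H_va = np.ones((len(X_va), 1))
best = val_error(H_tr, H_va)
for _ in range(30):
    a, b = rng.normal(size=1), rng.normal()
    h_tr = np.tanh(X_tr @ a + b)[:, None]
    h_va = np.tanh(X_va @ a + b)[:, None]
    cand = val_error(np.hstack([H_tr, h_tr]), np.hstack([H_va, h_va]))
    if cand < best:  # keep the unit; earlier units stay frozen
        H_tr, H_va, best = np.hstack([H_tr, h_tr]), np.hstack([H_va, h_va]), cand
print(f"{H_tr.shape[1] - 1} hidden units kept; validation MSE {best:.4f}")
```

True cascade-correlation additionally trains each new unit to correlate with the current residual error before freezing it; the random candidates above only mimic the grow-and-freeze schedule.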
Neural networks with recurrent structures are frequently used to predict time series or sequential data. For instance, when predicting k target variables y = [y_1, y_2, …, y_k] with time-series characteristics, the output variables are correlated with each other. In such cases, higher performance can be obtained by using the previous step's output y_{t-1} as an input variable when predicting the output y_t at time step t. This is consistent with the concept of the recurrent neural network, which combines a general neural network with the notion of time series; in a recurrent neural network, hidden nodes serve as storage that preserves information from the previous training step [11, 12].
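The benefit of reusing the previous step's output as an input can be shown with a minimal sketch (a linear autoregressive fit on toy data, not the recurrent networks of [11, 12]):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sequence whose targets are serially correlated (an AR(1) process).
T = 300
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.9 * y[t - 1] + 0.1 * rng.normal()

# Predict y_t from a bias plus the previous output y_{t-1}.
X = np.column_stack([np.ones(T - 1), y[:-1]])
w, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
mse_recurrent = np.mean((X @ w - y[1:]) ** 2)

# Baseline that ignores the previous output (bias-only model).
mse_static = np.mean((y[1:] - y[1:].mean()) ** 2)
print(f"static MSE {mse_static:.4f} vs recurrent MSE {mse_recurrent:.4f}")
```

Because the targets are serially correlated, conditioning on y_{t-1} cuts the error well below the static baseline, which is exactly the effect the recurrent structure exploits.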
Deep network structure
Recent works have shown that stacking many layers in a neural network improves prediction performance. The basic concept of deep neural networks is to stack many hidden layers between the input layer and the output layer. However, when a neural network becomes deep with many layers, learning the weights becomes difficult and overfitting arises. To mitigate overfitting, the restricted Boltzmann machine and the deep belief network can be used. The restricted Boltzmann machine is a model obtained by removing within-layer connections from the Boltzmann machine, and it updates the entire set of parameters while stacking hidden layers one by one. The deep belief network is a model that stacks layers pre-trained with unsupervised learning.
In this study, we propose a novel neural network algorithm that incorporates the advantages of the three neural network structures described above.
The cascade phase trains the output variables in a stepwise fashion. When predicting a specific output variable, another correlated output variable is used as an input variable. It is known that, in training neural networks, an increase in the number of nodes leads to higher computational cost and difficulty in learning the weights w. Therefore, in the cascade phase, to predict many output variables, we first construct a feedforward neural network for one output variable, and the trained network is used as a base learner for constructing the next neural network. The base learner is updated as the output variables change. This process drastically reduces training time by reusing previously trained feedforward neural networks. The cascade phase progresses in bi-directional order: an inclining step, which learns in the forward direction, and a declining step, which learns in the reverse direction.
where w^(k) is the trained weight matrix for net_k, and x^(k) = [x_1, x_2, …, x_d, f_{k-1}] is the input vector for net_k.
where w^(k) is the trained weight matrix from the final inclining step, and x^(k-1) = [x_1, x_2, …, x_d, f_k] is the input vector. When the declining step has been carried out for all output variables, the cascade phase is complete. The final output values F_A = [f_1, f_2, …, f_k] reflect the correlations between all adjacent variables.
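A minimal sketch of the two cascade-phase sweeps, with plain linear models standing in for the paper's feedforward networks (the toy data, band count, and linear fits are illustrative assumptions, not the actual CRDN):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: d input features and K correlated target "bands".
n, d, K = 400, 5, 4
X = rng.normal(size=(n, d))
base = X @ rng.normal(size=(d, 1))
Y = np.hstack([base + 0.3 * rng.normal(size=(n, 1)) for _ in range(K)])

def fit_linear(A, t):
    # Least-squares fit with a bias column appended.
    w, *_ = np.linalg.lstsq(np.hstack([A, np.ones((len(A), 1))]), t, rcond=None)
    return w

def predict(A, w):
    return np.hstack([A, np.ones((len(A), 1))]) @ w

# Inclining step: band k is predicted from X plus the previous band's output f_{k-1}.
F = np.zeros((n, K))
for k in range(K):
    feats = X if k == 0 else np.hstack([X, F[:, [k - 1]]])
    F[:, k] = predict(feats, fit_linear(feats, Y[:, k]))

# Declining step: revisit the bands in reverse order, now conditioning on f_{k+1}.
for k in range(K - 2, -1, -1):
    feats = np.hstack([X, F[:, [k + 1]]])
    F[:, k] = predict(feats, fit_linear(feats, Y[:, k]))

print(f"cascade-phase training MSE: {np.mean((F - Y) ** 2):.4f}")
```

After both sweeps, every band's prediction has been conditioned on a neighboring band in each direction, mirroring how the inclining and declining steps propagate correlations across adjacent frequency bands.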
The tuning phase corrects the errors in the final output values from the cascade phase by constructing an error-correction network. The final output values F_A = [f_1, f_2, …, f_k] from the cascade phase are the input nodes, and the target variables Y = [y_1, y_2, …, y_k] are the output nodes of the error-correction network. The constructed network is a type of auto-associative neural network in which the input nodes and output nodes are similar to each other. The necessity of the tuning phase can be justified as follows. In the cascade phase, the output variable f_k of the k-th neural network net_k is used as an input variable of the (k+1)-th neural network. Since every output variable carries errors, the input variable f_k contains errors that must be corrected. It is known that error-correction networks can improve prediction performance in deep neural networks [16, 17].
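The tuning phase can likewise be sketched with a linear error-correction map standing in for the paper's correction network (the stand-in cascade outputs, bias, and noise levels below are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in cascade-phase outputs: true targets plus a systematic bias and noise.
n, K = 400, 4
Y = rng.normal(size=(n, K))
F = Y + 0.4 + 0.2 * rng.normal(size=(n, K))  # biased, noisy predictions

# Tuning phase: learn a correction map from the cascade outputs F back to Y
# (a single linear layer stands in for the error-correction network here).
A = np.hstack([F, np.ones((n, 1))])
W, *_ = np.linalg.lstsq(A, Y, rcond=None)
F_tuned = A @ W

mse_before = np.mean((F - Y) ** 2)
mse_after = np.mean((F_tuned - Y) ** 2)
print(f"MSE before tuning {mse_before:.4f}, after tuning {mse_after:.4f}")
```

Even this linear correction removes the systematic bias accumulated through the chain of reused outputs; the auto-associative network in the paper plays the same role with a nonlinear map.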
Age, Sex, Underlying Diseases, Experience of Hearing Aids, Side of Hearing Aids
Unaided pure tone audiometry, Unaided hearing-in-noise test, Threshold per frequency, Category of hearing loss, Degree of hearing loss, Type of hearing loss, Tinnitus status, Average air conduction hearing threshold, Average bone conduction hearing threshold, Mean word recognition score
Hearing Aid (HA) Information
Model of HA, Number of channels, Type of HA, Tinnitus treatment option, Frequency transposition, Type of microphone, Ventilation, Feedback cancellation
PTA after wearing HAs (250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz)
To assess hearing gain, PTA and WRS tests were performed before and 8 to 12 weeks after HA fitting, and the results were compared and analyzed. These results were used as output variables of the proposed algorithm. The PTAs used hearing thresholds from 0.25 to 8 kHz, and the average of the hearing thresholds at 0.5, 1, 2, and 3 kHz was calculated. For the WRS, patients were presented with test words at their most comfortable listening level and instructed to repeat or write down the words and numbers they accurately heard and understood; these results were then converted into percentages as the WRS. Although the most comfortable listening levels before and after using HAs differed in some cases, the WRSs were still compared because both were measured under the most comfortable listening condition.
Correlation in target variables
where n is the total number of data points, y_i are the target values, and f_i are the output values.
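The error-rate equation itself did not survive extraction. A common measure consistent with the surrounding description, and with the cited work of Makridakis on accuracy measures, is the mean absolute percentage error; the formula below is therefore an assumption, not a confirmed reproduction of the paper's equation:

```python
import numpy as np

def mean_absolute_percentage_error(y, f):
    """MAPE over n samples: (100/n) * sum_i |y_i - f_i| / |y_i|."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    return 100.0 * np.mean(np.abs(y - f) / np.abs(y))

# Hypothetical example: hearing thresholds in dB HL and their predictions.
y = [40.0, 50.0, 60.0]
f = [44.0, 45.0, 60.0]
print(mean_absolute_percentage_error(y, f))  # -> 6.666... (percent)
```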
Results and discussion
Results for validity of CRDN
The low error rate of CRDN has meaningful clinical implications. Since hearing tests sample only representative frequency bands, the actual PTA thresholds of adjacent frequency bands are difficult to measure; they are only approximated by the median of the measured PTA of adjacent bands. For instance, a 4-kHz dip from noise-induced hearing loss not only raises the PTA threshold at 4 kHz but also affects adjacent frequency bands. Since the CRDN structure accounts for such influence, it is well positioned to succeed.
Results for utility of CRDN
Case I: patients with sensorineural hearing loss
Figure 7a shows the case of a patient with convex-type sensorineural hearing loss, with hearing loss in the mid-frequency bands (1 kHz, 2 kHz). Sensorineural hearing loss occurs due to abnormalities in the cochlea or disorders of the nerves connecting the inner ear to the brain.
In the convex type, the threshold of the medium frequencies is higher than that of the low and high frequencies by at least 15 dB. In such cases, patients with hearing loss cannot hear the sounds of daily conversation. To address this, we applied CRDN to patients with sensorineural hearing loss. The expected degree of hearing gain from CRDN fitted the actual hearing gain well, and the results showed high satisfaction levels.
Case II: patients with conductive hearing loss
Figure 7b shows the case of a patient with conductive hearing loss, which results from damage to the path that delivers sound from the external ear to the middle ear. In this patient, all thresholds from the low to the high frequency bands exceed 65 dB, and the degree of hearing loss is severe. Since the hearing levels of daily conversation are 20–70 dB, the patient has difficulties in daily life. The CRDN provided outstanding hearing gains by amplifying the frequencies to appropriate levels.
Although the two cases differ in hearing loss characteristics, patient information, and clinical evaluations, CRDN provided suitable hearing gains for each patient. On this basis, CRDN can serve as a medical assistance tool that improves patient satisfaction and helps maximize hearing improvement through the application of HAs in clinical fields.
In this study, we propose a novel neural network algorithm that provides the expected degree of hearing gain for patients with hearing loss who wear hearing aids. The proposed algorithm is a cascade recurring deep network (CRDN) that reflects correlations between adjacent frequencies. It is a deep network that stacks as many hidden layers as there are target variables, so it can be applied to signal or sequential data, and it takes the structural advantages of cascade-correlation networks and recurrent neural networks.
From an algorithmic perspective, CRDN is novel in the following respects. It has a scalable structure applicable to various signal or sequential data, and it achieves faster training since it progressively stacks the layers. By reusing previously trained neural networks, CRDN not only trains quickly but also reflects the correlations between output variables. In addition, since CRDN reuses previously learned weights, it reduces the time needed to learn weights when predicting new output variables. In the experiments, the mean error rate of CRDN across six frequency bands (250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz) was 9.2%. Compared with other neural network models that do not reflect correlations between frequencies, the mean error rate was reduced by 58%. In this paper, we compared the performance of CRDN only with two types of MLP; this comparison could be extended to other machine learning algorithms, such as support vector machines and regularized regression.
From a clinical perspective, since CRDN provides more accurate information on the expected hearing gain after wearing hearing aids, it can reduce the gap between the expected and actual experience of wearing them, and can thus serve as an effective medical assistance tool. Future work includes applying the proposed algorithm to other types of signal or sequential data.
This material is based upon work supported by the Ministry of Trade, Industry & Energy (MOTIE, Korea) under the Industrial Technology Innovation Program (10049721). HJS would like to gratefully acknowledge support from the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2015R1D1A1A01057178/2012-0000994). This work was supported by the Ajou University research fund.
Publication of this article was funded by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2012-0000994).
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
HJS designed the idea and supervised the study process. YHN analyzed the data, implemented the results and wrote the manuscript. OSC, YRL, and YHC collected data and implemented the clinical results. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
No personally identifying information is reported in this article.
Ethics approval and consent to participate
Following the approval from the Institutional Review Boards of the Ajou University School of Medicine, a retrospective chart review was done.
About this supplement
This article has been published as part of BMC Medical Informatics and Decision Making Volume 17 Supplement 1, 2017: Selected articles from the 6th Translational Bioinformatics Conference (TBC 2016): medical informatics and decision making. The full contents of the supplement are available online at https://bmcmedinformdecismak.biomedcentral.com/articles/supplements/volume-17-supplement-1
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Walden BE, Surr RK, Cord MT. Real-world performance of directional microphone hearing aids. Hear J. 2003;56(11):40–2.
- Ceylan A, Celenk F, Kemaloğlu Y, Bayazıt Y, Göksu N, Özbi S. Impact of prognostic factors on recovery from sudden hearing loss. J Laryngol Otol. 2007;121(11):1035–40.
- Mulrow CD, Tuley MR, Aguilar C. Correlates of successful hearing aid use in older adults. Ear Hear. 1992;13(2):108–13.
- Cvorovic L, Ðeric D, Probst R, Hegemann S. Prognostic model for predicting hearing recovery in idiopathic sudden sensorineural hearing loss. Otol Neurotol. 2008;29(4):464–9.
- Bishop CM. Neural networks for pattern recognition. Oxford: Clarendon Press; 1995.
- Bridle JS. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In: Neurocomputing. Springer; 1990. p. 227–36.
- Auer P, Burgsteiner H, Maass W. A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Neural Netw. 2008;21(5):786–95.
- Mézard M, Nadal J-P. Learning in feedforward layered networks: the tiling algorithm. J Phys A Math Gen. 1989;22(12):2191.
- Fahlman SE. The recurrent cascade-correlation architecture. 1991.
- Fahlman SE, Lebiere C. The cascade-correlation learning architecture. 1989.
- Mikolov T, Karafiát M, Burget L, Cernocký J, Khudanpur S. Recurrent neural network based language model. In: Interspeech. 2010. p. 3.
- Botvinick MM, Plaut DC. Short-term memory for serial order: a recurrent neural network model. Psychol Rev. 2006;113(2):201.
- Deng L, Hinton G, Kingsbury B. New types of deep neural network learning for speech recognition and related applications: an overview. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE; 2013. p. 8599–603.
- Salakhutdinov R, Mnih A, Hinton G. Restricted Boltzmann machines for collaborative filtering. In: Proceedings of the 24th International Conference on Machine Learning. ACM; 2007. p. 791–8.
- Hinton GE, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527–54.
- He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. p. 770–8.
- Levine S, Finn C, Darrell T, Abbeel P. End-to-end training of deep visuomotor policies. J Mach Learn Res. 2016;17(39):1–40.
- Makridakis S. Accuracy measures: theoretical and practical concerns. Int J Forecast. 1993;9(4):527–9.