Data mining of audiology patient records: factors influencing the choice of hearing aid type
 Muhammad N Anwar^{1} and
 Michael P Oakes^{1}
https://doi.org/10.1186/1472-6947-12-S1-S6
© Anwar and Oakes; licensee BioMed Central Ltd. 2012
Published: 30 April 2012
Abstract
Background
This paper describes the analysis of a database of over 180,000 patient records, collected from over 23,000 patients, by the hearing aid clinic at James Cook University Hospital in Middlesbrough, UK. These records consist of audiograms (graphs of the faintest sounds audible to the patient at six different pitches), categorical data (such as age, gender, diagnosis and hearing aid type) and brief free text notes made by the technicians. This data is mined to determine which factors contribute to the decision to fit a BTE (worn behind the ear) hearing aid as opposed to an ITE (worn in the ear) hearing aid.
Methods
From PCA (principal component analysis), four main audiogram types are determined and related to the type of hearing aid chosen. The effects of age, gender, diagnosis, masker, mould and individual audiogram frequencies are combined into a single model by means of logistic regression. Some significant keywords are also discovered in the free-text fields using the chi-squared (χ^{2}) test, and these can also be included in the model. The final model can act as a decision support tool to help decide whether an individual patient should be offered a BTE or an ITE hearing aid.
Results
The final model was tested using 5-fold cross-validation, and was able to replicate audiologists' decisions on whether to fit an ITE or a BTE hearing aid with precision in the range 0.79 to 0.87.
Conclusions
A decision support system was produced to predict the type of hearing aid which should be prescribed, with an explanation facility explaining how that decision was arrived at. This system should prove useful in providing a "second opinion" for audiologists.
Background
This research looks for factors influencing the choice between two common hearing aid types: BTE (worn behind the ear) or ITE (worn in the ear). This choice is typically made by audiology technicians working in outpatient clinics, on the basis of audiogram results and consultation with the patient. In many cases, the choice is clear-cut, but at other times the technicians might benefit from a second opinion given by an automatic system with an explanation of how that second opinion was arrived at. The production of such a decision support system is the main goal of this paper. Our data set is unusual in that ITE hearing aids are not generally available on the British National Health Service in England, as they are more expensive than BTE hearing aids. However, both types of aid are prescribed at James Cook University Hospital in Middlesbrough, UK. The data, collected between 1992 and 2001, consists of the following types of records:

Audiograms (graphs of the auditory thresholds, or faintest sounds audible to the patient at six different pitches or frequencies, where 0 corresponds to normal hearing and higher thresholds show impaired hearing), e.g., 40, 35, 35, 35, 85, 70, 15, 20, 20, 30, 55, where the first six values are AC (air conduction) and the last five are BC (bone conduction). AC is measured by placing headphones over the ears, and determines the overall level of hearing. BC is measured by placing the sound source tightly on the mastoid bone behind the ear, and measures the level of hearing of the inner part of the ear. A constraint on the data is that the BC threshold must always be the same as or better (lower) than the AC threshold. The difference between the AC and BC thresholds is called the air-bone gap, and measures the hearing ability of the middle and outer parts of the ear.

Categorical data (such as gender, diagnosis and hearing aid type), e.g., M, TINNITUS, BE18.

Brief free text notes made by the technicians, e.g., IMPS. TAKEN FOR BINAURAL AIDS., where IMPS is an abbreviation for "impressions", and BINAURAL means "worn in both ears".
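The audiogram record layout described above can be sketched in Python. This is a minimal illustration, assuming each audiogram is stored as 11 numbers in the order shown (AC at 250, 500, 1000, 2000, 4000 and 8000 Hz, then BC at 250 to 4000 Hz, since the tables list no BC value at 8000 Hz); the function names are hypothetical. It splits a record, computes the air-bone gap at each frequency where both measurements exist, and flags any frequency that breaks the constraint that BC is never worse than AC.

```python
# Frequencies (Hz): AC is recorded at six, BC at five (no 8000 Hz BC value).
AC_FREQS = (250, 500, 1000, 2000, 4000, 8000)
BC_FREQS = (250, 500, 1000, 2000, 4000)


def split_audiogram(values):
    """Split an 11-value audiogram record into AC and BC threshold dicts."""
    if len(values) != len(AC_FREQS) + len(BC_FREQS):
        raise ValueError("expected 6 AC values followed by 5 BC values")
    ac = dict(zip(AC_FREQS, values[:6]))
    bc = dict(zip(BC_FREQS, values[6:]))
    return ac, bc


def air_bone_gap(ac, bc):
    """Return AC - BC per frequency, plus frequencies where BC is worse than AC.

    A lower threshold means better hearing, so a valid record has every gap >= 0.
    """
    gaps = {f: ac[f] - bc[f] for f in BC_FREQS}
    violations = [f for f, gap in gaps.items() if gap < 0]
    return gaps, violations


ac, bc = split_audiogram([40, 35, 35, 35, 85, 70, 15, 20, 20, 30, 55])
gaps, violations = air_bone_gap(ac, bc)
print(gaps)        # {250: 25, 500: 15, 1000: 15, 2000: 5, 4000: 30}
print(violations)  # []
```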
Methods
Principal component analysis on audiograms
Component coefficient vectors of PCA
Threshold  PC1  PC2  PC3  PC4

AC250  0.3001  0.3811  0.2988  0.1677 
AC500  0.3218  0.3619  0.2754  0.0166 
AC1000  0.3410  0.1999  0.2427  0.2643 
AC2000  0.3436  0.1440  0.1910  0.2697 
AC4000  0.3031  0.3673  0.2409  0.1742 
AC8000  0.2722  0.3186  0.2629  0.4684 
BC250  0.2510  0.2304  0.4890  0.5087 
BC500  0.2942  0.2404  0.4152  0.0846 
BC1000  0.3189  0.0760  0.3052  0.3595 
BC2000  0.3028  0.2699  0.2419  0.4088 
BC4000  0.2516  0.4870  0.2219  0.1299 
Each individual patient audiogram was classified into one of the main audiogram types identified, by least Euclidean distance. A chi-squared test was then performed to determine whether there was any association between the audiogram class of each patient and the type of hearing aid worn. This test was done on the set of 7,437 records where all AC and BC thresholds were available for the right ear and either a BTE or an ITE aid was specified. In the final logistic regression model, rather than simply using the identified broad audiogram types, each individual hearing threshold was used.
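A nearest-prototype classifier of this kind might look as follows. It is a sketch, assuming the distance is taken in the raw 11-threshold space against the representative audiograms listed in the thresholds table of the Results section; the original analysis may have measured distance in a different space (for example, on principal component scores). `PROTOTYPES` and `classify_audiogram` are illustrative names.

```python
import math

# Representative audiograms (AC 250-8000 Hz, then BC 250-4000 Hz), taken from
# the table of thresholds corresponding to the first four principal components.
PROTOTYPES = {
    "PC1": [42, 41, 40, 39, 42, 44, 45, 45, 42, 42, 45],  # flat loss
    "PC2": [37, 38, 48, 69, 82, 79, 46, 46, 55, 76, 89],  # high-tone loss
    "PC3": [78, 77, 75, 71, 75, 76, 31, 35, 42, 45, 47],  # flat air-bone gap
    "PC4": [50, 59, 76, 76, 50, 32, 29, 55, 82, 85, 52],  # low-tone air-bone gap
}


def classify_audiogram(thresholds, prototypes=PROTOTYPES):
    """Return the audiogram type whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda name: math.dist(thresholds, prototypes[name]))


print(classify_audiogram([40, 35, 35, 35, 85, 70, 15, 20, 20, 30, 55]))
```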
Use of the chi-squared test to discover other factors related to hearing aid type
In the previous section, it was shown that the choice of hearing aid type was related to the shape of the audiogram. This section describes how the simple chi-squared test was used to discover which of the categorical data fields were significantly associated with the choice of hearing aid type, and also to discover free-text keywords which were significantly associated with either BTE or ITE hearing aids.
Observed and expected frequencies for ITE/BTE aid with gender
Hearing aid type  Male  Female  Row total 

BTE  3196 (3369.38) [8.92]  3850 (3676.62) [8.17]  7046 
ITE  3647 (3473.62) [8.66]  3617 (3790.38) [7.93]  7264 
Column total  6843  7467  14310 
Most significant positive and negative keywords in records with BTE/ITE aid [11]
Positive keywords  Negative keywords  

BTE  mould, be34, map, gp, 92, audio, inf, be52, ref, staff, reqd, be36, contact  fta, reshel, appt, it, nn, nfa, 2001, rev, lacquer, hn, km, imp, review, 2000 
ITE  fta, reshel, appt, it, nn, nfa, 2001, rev, lacquer, hn, km, imp, review, 2000, nh, vent, progress, aid, dt, taken  mould, be34, map, gp, 92, audio, inf, be52, ref, staff, reqd, be36, contact, tri, n, order 
Logistic regression (LR) model for ITE/BTE right ear hearing aids
The logistic regression model combines the attributes of a patient's record into a single value L:
L = ln[(1 - p)/p] = b_{0} + b_{1}x_{1} + b_{2}x_{2} + ... + b_{k}x_{k}
In our case p is the probability that the patient should be fitted with an ITE aid, while (1 - p) is the probability that the patient should be given a BTE aid, so L is the log odds in favour of a BTE aid. b_{0} is a constant, and b_{1} to b_{k} are called the coefficients of the model. The values x_{1} to x_{k} are all either 1 or 0, depending on whether a given attribute in the patient's record is present or absent. L is greater than 0 if it is more likely that the patient should be given a BTE aid, and less than 0 if it is more likely that the patient should be given an ITE aid.
Logistic regression for BC4000
Regression coefficient b  Standard error se(b)  Z  P  

Constant  0.09  0.08  1.12  0.26 
BC4000_ind1  0.15  0.11  1.33  0.18 
BC4000_ind2  0.20  0.09  2.12  0.03 
BC4000_ind3  0.09  0.09  1.01  0.31 
Logistic regression for age
Regression coefficient b  Standard error se(b)  Z  P  

Constant  0.08  0.05  1.49  0.13 
Age_ind1  0.13  0.08  1.73  0.08 
Age_ind2  0.26  0.08  3.48  0.00 
Age_ind3  0.14  0.08  1.88  0.06 
Logistic regression for diagnosis
Regression coefficient b  Standard error se(b)  Z  P  

Constant  0.37  0.39  0.96  0.34 
Diagnosis  1.05  0.44  2.37  0.02 
Logistic regression for masker
Regression coefficient b  Standard error se(b)  Z  P  

Constant  0.41  0.25  1.60  0.11 
Masker_{(No_masker, OTHERS)}  0.91  0.50  1.83  0.07 
Logistic regression for keywords
Regression coefficient b  Standard error se(b)  Z  P  

Constant  -0.16  0.03  -5.63  0.00 
APPT  0.06  0.15  0.37  0.71 
FTA  0.77  0.19  4.05  0.00 
GP  0.62  0.13  4.75  0.00 
MAP  2.32  0.53  4.39  0.00 
NFA  0.93  0.32  2.93  0.00 
REV  0.12  0.10  1.12  0.26 
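The Z and P columns in the tables above are consistent with a Wald test, in which Z = b / se(b) and P is the two-tailed probability under a standard normal distribution; for example, Z = 2.12 gives p ≈ 0.034, printed as 0.03. The sketch below, using only the Python standard library, assumes that reading of the tables. Because b and se(b) are printed to two decimals, recomputing Z from them only approximately matches the printed Z, so the p-values here are computed from the reported Z.

```python
import math


def wald_z(b, se):
    """Wald statistic for a single logistic regression coefficient."""
    return b / se


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


# Reported Z values from the BC4000, age and diagnosis tables.
for label, z in [("BC4000_ind2", 2.12), ("Age_ind2", 3.48), ("Diagnosis", 2.37)]:
    print(f"{label}: Z = {z:.2f}, p = {two_tailed_p(z):.3f}")

# Recomputing Z from the rounded table entries only approximates the printed Z.
print(f"BC4000_ind1: b/se = {wald_z(0.15, 0.11):.2f} (table: 1.33)")
```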
Results
Principal component analysis (PCA)
The coefficients of the first PC (PC1) were all negative and approximately equal. This suggests that the main source of variation between the patients was simply the overall degree of hearing loss. The coefficients of the second PC (PC2) were negative for frequencies at or below 1000 Hz but positive for higher frequencies, for both air and bone conduction, and thus differentiated patients according to whether they had a predominantly high-frequency or low-frequency hearing loss. The coefficients of the third PC (PC3) were positive for air conduction at all frequencies but negative for bone conduction, showing a contrast between patients with and without an air-bone gap. The fourth component (PC4) was similar to the third, but corresponded to an air-bone gap at low frequencies. No clear patterns were seen for the fifth or subsequent principal components. The first four PCs corresponded to audiogram types frequently encountered in audiology clinics. The percentages of the overall variability in the data explained by the first four principal components were 59.5, 13.4, 9.7 and 5.2 respectively, giving a total of 87.8%.
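To illustrate why a first component with approximately equal coefficients of the same sign reflects the overall degree of hearing loss, the sketch below computes the first principal component of a covariance matrix by power iteration, in plain Python. The audiograms are synthetic and hypothetical: each of the 11 thresholds equals a patient-specific overall level plus small noise. The loadings come out approximately equal (close to 1/√11 ≈ 0.30) and the component explains most of the variance. The sign of a principal component is arbitrary, so all-negative and all-positive loadings describe the same component. A full analysis would normally use an eigen-decomposition routine from a numerical library rather than this loop.

```python
import random


def covariance_matrix(rows):
    """Sample covariance matrix of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(cov, iterations=100):
    """Leading eigenvector (loadings) and eigenvalue of cov by power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v'Cv gives the eigenvalue for a unit vector v.
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, eigenvalue


# Hypothetical audiograms: 11 thresholds = overall level + small noise.
rng = random.Random(0)
audiograms = []
for _ in range(200):
    level = rng.uniform(10, 90)
    audiograms.append([level + rng.gauss(0, 3) for _ in range(11)])

cov = covariance_matrix(audiograms)
loadings, eigenvalue = first_principal_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print([round(x, 2) for x in loadings])
print(f"PC1 explains {explained:.1%} of the variance")
```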
The thresholds (dB HL) corresponding to the first four Principal Components
Principal Component (PC)  Conduction  250 Hz  500 Hz  1000 Hz  2000 Hz  4000 Hz  8000 Hz

PC1: Flat hearing loss  AC  42  41  40  39  42  44
PC1: Flat hearing loss  BC  45  45  42  42  45  -
PC2: High tone sensorineural loss  AC  37  38  48  69  82  79
PC2: High tone sensorineural loss  BC  46  46  55  76  89  -
PC3: Air-bone gap (flat)  AC  78  77  75  71  75  76
PC3: Air-bone gap (flat)  BC  31  35  42  45  47  -
PC4: Air-bone gap (predominant at low tone)  AC  50  59  76  76  50  32
PC4: Air-bone gap (predominant at low tone)  BC  29  55  82  85  52  -
Observed values (O)
Hearing aid type  PCA1  PCA2  PCA3  PCA4 

ITE  2036  1341  476  75 
BTE  1119  1166  1165  59 
Expected values (E)
Hearing aid type  PCA1  PCA2  PCA3  PCA4 

ITE  1666.38  1324.12  866.73  70.77 
BTE  1488.62  1182.88  774.27  63.23 
(O - E)^{2}/E values
Hearing aid type  PCA1  PCA2  PCA3  PCA4 

ITE  81.99  0.22  176.14  0.25 
BTE  91.78  0.24  197.18  0.28 
Chi-squared test
The contingency table showing the relationship between gender and hearing aid type is shown in Table 2. In each cell, the observed frequency (O) is given first, without brackets; for example, 3196 male patients wore BTE hearing aids. The expected frequency (E) follows in parentheses, and the quantity (O - E)^{2}/E in square brackets. The overall chi-squared value (the sum of the bracketed values for all four cells) was 33.68, which for one degree of freedom is significant at p < 0.001. Males tended more to use ITE hearing aids and females tended more to use BTE hearing aids. For the relationship between hearing aid type and a diagnosis of tinnitus (ringing in the ear), the overall chi-squared value was 31.75, again significant at p < 0.001 for one degree of freedom. Patients with tinnitus tended more to wear ITE hearing aids. Among patients diagnosed with tinnitus, the relationship between wearing a tinnitus masker (a soothing sound source designed to drown out tinnitus) and hearing aid type gave an overall chi-squared value of 17.16, which for one degree of freedom was also significant at p < 0.001. The cross-tabulation of hearing aid type and age gave an overall chi-squared value of 10.53, which for one degree of freedom is significant at p < 0.01. Mould type was also cross-tabulated with hearing aid type, and the overall chi-squared value was 9844.18, which for 30 degrees of freedom was significant at p < 0.001. Thus all the categorical data types were significantly associated with hearing aid type. However, all the data in the patient records were used without considering confounding or the direction of influence: for example, it might have been the choice of hearing aid type that determined the choice of mould, rather than vice versa. We believe this may have been the case, since many mould types never occurred in conjunction with one or the other hearing aid type.
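The chi-squared statistic described above can be reproduced from the observed counts alone. The sketch below (plain Python; `chi_squared` is an illustrative name) computes each expected count as row total × column total / N, sums (O - E)^{2}/E over all cells, and returns the degrees of freedom (r - 1)(c - 1). Applied to the gender table it gives approximately 33.68 with one degree of freedom; it is also applied to the audiogram class by hearing aid type table (three degrees of freedom), whose summed value is not quoted in the text. For one degree of freedom, the conventional critical values are 3.84 (p = 0.05), 6.63 (p = 0.01) and 10.83 (p = 0.001).

```python
def chi_squared(observed):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency E
            statistic += (o - e) ** 2 / e         # (O - E)^2 / E
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Gender table: rows BTE, ITE; columns male, female.
gender_chi2, gender_dof = chi_squared([[3196, 3850], [3647, 3617]])
# Audiogram class table: rows ITE, BTE; columns PCA1..PCA4.
class_chi2, class_dof = chi_squared([[2036, 1341, 476, 75], [1119, 1166, 1165, 59]])

print(f"gender: chi2 = {gender_chi2:.2f}, dof = {gender_dof}")
print(f"audiogram class: chi2 = {class_chi2:.2f}, dof = {class_dof}")
```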
The sets of free-text keywords which occurred significantly more often (positive keywords) or significantly less often (negative keywords) in records of patients wearing BTE or ITE aids are shown in Table 3. These associations suggest the following. BTE aids were associated with high gain (amplification) models, e.g., be34, be36 and be52, and with cases where changes had been made to the ear mould. ITE aids were associated with the use of lacquer, vents, reshelling of ear impressions and changes made to the hearing aid itself; their wearers were also more often reviewed and recorded as making progress.
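One way to score a single free-text keyword against a hearing aid type is a 2 × 2 chi-squared test on keyword presence versus aid type, with the direction of the association decided by whether the keyword occurs more often than expected. The sketch below makes several assumptions not stated in the text: notes are split into lower-case alphanumeric tokens, no continuity correction is applied, and the six records are invented for illustration. It uses the shortcut formula N(ad - bc)^{2} / ((a + b)(c + d)(a + c)(b + d)), which equals the sum of (O - E)^{2}/E over the four cells.

```python
import re


def tokens(text):
    """Set of lower-cased alphanumeric tokens in a free-text note."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_association(records, keyword, aid_type):
    """2 x 2 chi-squared (no continuity correction) for keyword vs aid type.

    Returns (chi2, direction): "positive" if the keyword occurs more often
    than expected in records of aid_type, "negative" if less often.
    """
    a = b = c = d = 0
    for text, aid in records:
        has_keyword = keyword in tokens(text)
        if aid == aid_type:
            if has_keyword:
                a += 1
            else:
                b += 1
        elif has_keyword:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0, "none"  # keyword or aid type never (or always) present
    chi2 = n * (a * d - b * c) ** 2 / denominator
    if a * d == b * c:
        return chi2, "none"
    return chi2, "positive" if a * d > b * c else "negative"


# Invented example notes, in the style of the technicians' free text.
records = [
    ("MOULD REMADE", "BTE"),
    ("NEW MOULD", "BTE"),
    ("GP LETTER", "BTE"),
    ("FTA REVIEW", "ITE"),
    ("RESHEL", "ITE"),
    ("REVIEW", "ITE"),
]
print(keyword_association(records, "mould", "BTE"))
print(keyword_association(records, "mould", "ITE"))
```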
Logistic regression (LR) model
Logistic regression for gender
Regression coefficient b  Standard error se(b)  Z  P  

Constant  -0.23  0.04  -5.93  0.00 
Gender  0.16  0.05  3.08  0.00 
Logistic regression for AC250
Regression coefficient b  Standard error se(b)  Z  P  

Constant  -0.72  0.04  -17.23  0.00 
AC250_ind1  0.54  0.07  8.15  0.00 
AC250_ind2  1.29  0.07  17.26  0.00 
AC250_ind3  2.18  0.12  17.91  0.00 
Predicted Log odds for AC250
AC250 group  Logistic regression equation  Predicted log odds 

0 < AC250 ≤ 40  Log odds = b_{constant}  -0.72 
40 < AC250 ≤ 55  Log odds = b_{constant} + b_{AC250_ind1}  -0.18 
55 < AC250 ≤ 75  Log odds = b_{constant} + b_{AC250_ind2}  0.57 
75 < AC250  Log odds = b_{constant} + b_{AC250_ind3}  1.45 
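The banding above is a dummy (indicator) coding: the lowest band is the reference category, represented by the constant alone, and each higher band adds exactly one coefficient. The sketch below takes the constant as -0.72, the only sign consistent with the predicted log odds of 0.57 (= -0.72 + 1.29) and 1.45 (≈ -0.72 + 2.18) for the upper bands. Function and constant names are illustrative.

```python
# AC250 logistic regression coefficients; the 0-40 band is the reference
# category, so its log odds is the constant alone.
B_CONSTANT = -0.72
B_BAND = {1: 0.54, 2: 1.29, 3: 2.18}  # AC250_ind1 .. AC250_ind3


def ac250_band(threshold):
    """0 for <= 40, 1 for (40, 55], 2 for (55, 75], 3 for > 75."""
    if threshold <= 40:
        return 0
    if threshold <= 55:
        return 1
    if threshold <= 75:
        return 2
    return 3


def ac250_log_odds(threshold):
    """b_constant plus the coefficient of the single indicator set to 1, if any."""
    return B_CONSTANT + B_BAND.get(ac250_band(threshold), 0.0)


for t in (30, 50, 75, 90):
    print(t, round(ac250_log_odds(t), 2))
```

The top band gives 1.46 rather than the tabulated 1.45, presumably because the published coefficients are rounded to two decimals.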
Logistic regression: worked example
Candidate variables (database record)  Actual values  Predicted log odds  Overall predicted log odds 

Age  71  Not significant  0 
Gender  Male  -0.23  -0.23 
AC250  75  0.57  0.34 
AC500  70  0.72  1.06 
AC1000  80  2.08  3.14 
AC2000  90  1.19  4.33 
AC4000  100  0.40  4.73 
AC8000  100  0.09  4.82 
BC250  40  -0.03  4.79 
BC500  60  0.56  5.35 
BC1000  65  0.56  5.91 
BC2000  70  0.14  6.05 
BC4000  70  Not significant  6.05 
Diagnosis  Tinnitus  Not significant  6.05 
Hearing aid type  BTE  To be found  6.05 
Masker  No masker  Not significant  6.05 
Mould  2107  4.09  10.14 
Free-text words  REV  -0.16 + 0.12 = -0.04  10.10 
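The explanation facility amounts to reading back this running total. The sketch below replays the worked example, listing only the variables that contributed a non-zero log odds; the gender (-0.23), BC250 (-0.03) and free-text (-0.04) contributions are taken as negative because only those signs reproduce the running totals (for example, 0.34 after adding 0.57 for AC250). Following the rule stated for L, a positive total favours a BTE aid.

```python
# Non-zero per-variable log odds from the worked example (variables marked
# "Not significant" contribute nothing and are left out).
contributions = [
    ("Gender = Male", -0.23),
    ("AC250 = 75", 0.57),
    ("AC500 = 70", 0.72),
    ("AC1000 = 80", 2.08),
    ("AC2000 = 90", 1.19),
    ("AC4000 = 100", 0.40),
    ("AC8000 = 100", 0.09),
    ("BC250 = 40", -0.03),
    ("BC500 = 60", 0.56),
    ("BC1000 = 65", 0.56),
    ("BC2000 = 70", 0.14),
    ("Mould = 2107", 4.09),
    ("Free text: REV", -0.16 + 0.12),  # keyword constant + REV coefficient
]

running_total = 0.0
for name, log_odds in contributions:
    running_total += log_odds
    print(f"{name:<16} {log_odds:+.2f}  running total {running_total:+.2f}")

# A positive overall log odds favours a BTE aid, a negative one an ITE aid.
decision = "BTE" if running_total > 0 else "ITE"
print(f"Overall {running_total:.2f} -> {decision}")
```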
Overall results
Results  Number of records  Percentage 

Similar  1170  81.64 
Not similar  263  18.35 
Total  1433 
ITE/BTE aid precision, recall and F-score
Measure  ITE  BTE  

Precision  0.81  0.82 
Recall  0.86  0.76 
F-score  0.84  0.79 
ITE/BTE aid predicted results
Human (actual data) by row, machine results (logistic regression model) by column  ITE  BTE  Total

ITE  676 (86%)  106 (14%)  782 
BTE  157 (24%)  494 (76%)  651 
Total  833  600  1433 
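The precision, recall and F-score table can be derived from this confusion matrix, assuming rows are the audiologists' (actual) decisions and columns the model's predictions; that is the reading which reproduces the published figures (for example, ITE precision 676/833 ≈ 0.81). A short sketch with illustrative names:

```python
def precision_recall_f(confusion, label):
    """Precision, recall and F-score for one class.

    confusion[actual][predicted] holds record counts.
    """
    true_pos = confusion[label][label]
    predicted_as_label = sum(row[label] for row in confusion.values())
    actually_label = sum(confusion[label].values())
    precision = true_pos / predicted_as_label
    recall = true_pos / actually_label
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Outer keys: human (actual) decision; inner keys: model prediction.
confusion = {
    "ITE": {"ITE": 676, "BTE": 106},
    "BTE": {"ITE": 157, "BTE": 494},
}

for label in ("ITE", "BTE"):
    p, r, f = precision_recall_f(confusion, label)
    print(f"{label}: precision {p:.2f}, recall {r:.2f}, F-score {f:.2f}")

agreed = sum(confusion[k][k] for k in confusion)
total = sum(sum(row.values()) for row in confusion.values())
print(f"Agreement: {agreed}/{total}")
```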
The theoretical upper bound of classifier performance is the inter-annotator agreement [2], in our case the rate at which two expert audiologists would assign the same hearing aid to the same patient. Unfortunately, we do not have data on this. Five-fold cross-validation was performed: the data were partitioned into five non-overlapping test sets, and for each test set the model was built on the remaining records, giving an estimate of model accuracy on unseen data. Across the five runs, the overall similarity (agreement with the audiologists' decisions) was in the range 82 to 85%; precision was 0.79 to 0.87 for ITE and 0.82 to 0.85 for BTE; recall was 0.84 to 0.88 for ITE and 0.74 to 0.85 for BTE; and the F-measure was 0.83 to 0.86 for ITE and 0.79 to 0.83 for BTE. In most of the cross-validation runs, BC2000, BC4000, Diagnosis and Masker were discarded from the model, since the constants of these variables had p values greater than 0.05. These results show that both the final model and the success rates were similar across runs.
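A minimal sketch of five-fold cross-validation, assuming a plain random partition of the records (whether the folds were stratified by aid type is not stated). `fit` and `evaluate` are placeholders for model fitting and scoring; here they are a hypothetical majority-class baseline on invented labels, only to show the mechanics of training on four folds and testing on the fifth.

```python
import random
from collections import Counter


def k_fold_indices(n, k=5, seed=0):
    """Shuffle record indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(records, fit, evaluate, k=5, seed=0):
    """Fit on k-1 folds, score on the held-out fold; one score per fold."""
    scores = []
    for test_idx in k_fold_indices(len(records), k, seed):
        held_out = set(test_idx)
        train = [r for i, r in enumerate(records) if i not in held_out]
        test = [records[i] for i in test_idx]
        scores.append(evaluate(fit(train), test))
    return scores


def fit_majority(train):
    """Hypothetical baseline: always predict the most common aid type."""
    return Counter(train).most_common(1)[0][0]


def accuracy(model, test):
    return sum(label == model for label in test) / len(test)


labels = ["ITE"] * 60 + ["BTE"] * 40  # invented stand-in records
print([round(s, 2) for s in cross_validate(labels, fit_majority, accuracy)])
```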
Discussion
Although this LR model did not find age to be a significant factor, Meredith and Stephens [7] found that ITE hearing aids present handling problems only for subjects over 75 years of age. Dillon [8] also found that BTE aids are easier to operate because they are larger, and thus would be more popular with older people. The literature shows men and women preferring the two types for different reasons. Martin et al. [9] found that more males than females choose ITE aids, because they perceive them to be a more advanced technology, though in reality the same makes and specifications are available in both styles and neither is more advanced than the other. They also found that more females than males reported ITE aids to be easier to handle than BTE aids. Mueller et al. [10] found no difference in how embarrassed males and females feel about using a BTE aid. This LR model did not include diagnosis (as mentioned above for Table 6), although the authors previously found [11], using the chi-squared test, that there was a significant association between the choice of a BTE hearing aid and a diagnosis other than tinnitus (ringing in the ear). We also found, using the chi-squared test, that BTE hearing aids were atypical of patients with tinnitus who wore a masker. Other factors mentioned in the literature which could not be tested with these data are the greater cosmetic acceptability of the smaller ITE aids, comfort in wear, ease of use with spectacles, and sound quality [12].
Conclusions
The associations between hearing aid type and audiogram type were confirmed by both the PCA/chi-squared and LR experiments described in this paper, by the authors' previous work on associations between words found in the database and hearing aid type, and by previous findings of audiologists [3]. These approaches will form the basis of an audiology decision support system, to which unseen patient records would be presented and which would return the relative probability that the patient should be fitted with an ITE aid as opposed to a BTE hearing aid. The advantage of these techniques for combining evidence is that it is easy to see which variables contributed to the final decision.
It is planned to validate these results by obtaining feedback from a professional audiologist, and by using an approach (Bayesian networks) which constructs models with interactions between variables. A major advantage of both Naïve Bayes and logistic regression is that they enable an explanation facility to be incorporated into any decision support tool, since it is easy to read back exactly which variables contributed how much to the final decision of whether to fit a BTE aid or an ITE aid.
Declarations
Acknowledgements
We wish to thank Maurice Hawthorne, Graham Clarke and Martin Sandford at the Ear, Nose and Throat Clinic at James Cook University Hospital in Middlesbrough, UK, for making the large set of audiology records available to us.
This article has been published as part of BMC Medical Informatics and Decision Making Volume 12 Supplement 1, 2012: Proceedings of the ACM Fifth International Workshop on Data and Text Mining in Biomedical Informatics (DTMBio 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcmedinformdecismak/supplements/12/S1.
References
[1] Porter MF: An algorithm for suffix stripping. Program. 1980, 14 (3): 130-137. 10.1108/eb046814.
[2] Altman DG: Practical Statistics for Medical Research. 1991, Chapman & Hall, 351-358, 403-404.
[3] Stephens SD: Hearing-aid selection: an integrated approach. Br J Audiol. 1984, 18: 199-210. 10.3109/03005368409078949.
[4] Anwar MN, Oakes MP, Wermter S, Heinrich S: Clustering audiology data. 19th Annual Belgian-Dutch Conference on Machine Learning (BeneLearn 2010); Leuven, Belgium. 2010, (Last accessed: 8th August 2011), [http://dtai.cs.kuleuven.be/events/Benelearn2010/submissions/benelearn2010_submission_7.pdf]
[5] Manning CD, Raghavan P, Schütze H: Introduction to Information Retrieval. 2008, Cambridge University Press, 142-144.
[6] Manning CD, Schütze H: Foundations of Statistical Natural Language Processing. 1999, Cambridge, Massachusetts, London, England: The MIT Press, 233-234.
[7] Meredith R, Stephens D: In-the-ear and behind-the-ear hearing aids in the elderly. Scand Audiol. 1993, 22: 211-216. 10.3109/01050399309047471.
[8] Dillon H: Hearing Aids. 2001, Boomerang Press, 282-284.
[9] Martin H, Kane S: Do NHS patients still want ITE aids? (Poster), South Tees Hospitals, NHS Trust. British Academy of Audiology Conference (UK). 2008, 5.
[10] Mueller GH, Budinger AC: Selection of hearing aid style. Hearing Instrumentation and Technology. 1990, 2: 5-10.
[11] Anwar MN, Oakes MP, McGarry K: Chi-squared, Yule's Q and likelihood ratios in tabular audiology data. Electrical Engineering and Applied Computing. Edited by: Ao SL, Gelman L. 2011, Dordrecht: Springer Netherlands, 90: 465-376.
[12] Brooks DN: Some factors influencing the choice of type of hearing aid in the UK: behind-the-ear or in-the-ear. Br J Audiol. 1994, 28: 91-98. 10.3109/03005369409077919.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.