
Deafness mutation mining using regular expression based pattern matching



While keyword based queries of databases such as Pubmed are frequently of great utility, the ability to use regular expressions in place of a keyword can often improve the results output by such databases. Regular expressions can allow for the identification of element types that cannot be readily specified by a single keyword and can allow for different words with similar character sequences to be distinguished.


A Perl based utility was developed to allow the use of regular expressions in Pubmed searches, thereby improving the accuracy of the searches.


This utility was then utilized to create a comprehensive listing of all DFN deafness mutations discussed in Pubmed records containing the keywords "human ear".



Biological research has yielded a vast amount of data, which can often provide novel insights when viewed in an aggregated fashion, and thus recent studies have employed computational methods to extract information from the biomedical literature. These studies have dealt with a wide range of information types, including the names of genes and proteins [1], intermolecular relationships [2], and molecular biological descriptors [3].

Pubmed currently catalogs citation and abstract information for over 4,400 biomedical research journals and houses a citation database of over 12.8 million citations [4]. With any database of this size, returning relevant query results is a difficult task, given the large number of potential matches for any single query term. These difficulties are compounded by the fact that Pubmed records are natural language records, so searches cannot readily be conducted using a predefined set of terms, as is the case for many relational databases. Thus Pubmed employs a word-matching algorithm, which seeks to match query words to the contents of citation records and returns all records containing that word in order of publication, starting with the most recent.

For certain types of queries, such as mutations, basic word matching is an ineffective search strategy, since an effective query cannot be specified as a single word but is better expressed as a textual pattern, such as [Residue] [Position] [MutantResidue] [5]. Textual pattern matching has uses that extend beyond locating mutations within Pubmed records, however. For example, it can distinguish articles that discuss pKa values from articles that discuss Protein Kinase A (PKA), both of which would be returned by a Pubmed search for the word "pKa". These examples illustrate the two major applications that text patterns offer to Pubmed searching: 1) the identification of elements that cannot be specified by a single word and 2) distinguishing between two different words that are composed of a similar sequence of characters [6]. Textual patterns are commonly matched using regular expressions, and studies that use them to extract biochemical mutation data from the biomedical literature have demonstrated a high degree of success [5, 7]. This study seeks to develop a Perl based utility, Perl Regular Expressions for Pubmed (PREP; see Additional File 1), which allows Pubmed citation records to be searched for textual patterns and places the matching records into an HTML formatted output file. This utility is then used to construct a comprehensive listing of DFN mutations discussed in Pubmed records containing the keywords "human ear".


The PREP utility

The script interacts with Pubmed via NCBI's E-Utilities interface [4], and the LWP module handles all HTTP based communication. The script begins by using the ESearch method to query Pubmed for all records containing a user defined search term, such as "lysozyme" or "HIV". The Pubmed ID numbers of all matching records are temporarily stored on the Pubmed server and can be accessed using the EFetch method together with the Web environment variable and query key returned by the ESearch method. Records returned by the EFetch method are requested in XML format, since the well-defined hierarchical structure of XML documents greatly simplifies parsing tasks [6]. The script uses the XML::LibXML Perl module for XML parsing, and from each Pubmed record the article title, the journal information, the abstract, and the Pubmed ID are extracted based on their corresponding XML tag names.
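The retrieval steps described above can be sketched as follows. This is a minimal Python illustration rather than the PREP source, which is written in Perl with LWP and XML::LibXML. The function names are hypothetical, the tag names (PubmedArticle, PMID, ArticleTitle, AbstractText, Journal/Title) are assumed to follow the PubMed XML format as currently served by EFetch, and no network request is made: the sketch only builds the request URLs and parses XML text that has already been downloaded.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base address of the NCBI E-utilities (assumed current endpoint).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term):
    """Build an ESearch URL that stores the hit list on the NCBI history server."""
    query = {"db": "pubmed", "term": term, "usehistory": "y"}
    return f"{EUTILS}/esearch.fcgi?" + urlencode(query)


def parse_history(esearch_xml):
    """Extract the Web environment string and query key from an ESearch reply."""
    root = ET.fromstring(esearch_xml)
    return root.findtext("WebEnv"), root.findtext("QueryKey")


def efetch_url(web_env, query_key, retstart=0, retmax=500):
    """Build an EFetch URL that requests one batch of stored records as XML."""
    query = {
        "db": "pubmed",
        "WebEnv": web_env,
        "query_key": query_key,
        "retmode": "xml",
        "retstart": retstart,
        "retmax": retmax,
    }
    return f"{EUTILS}/efetch.fcgi?" + urlencode(query)


def parse_records(efetch_xml):
    """Pull the ID, title, journal and abstract out of each record by tag name."""
    root = ET.fromstring(efetch_xml)
    records = []
    for article in root.iter("PubmedArticle"):
        title_node = article.find(".//ArticleTitle")
        # Abstracts may be split into several AbstractText sections; join them.
        abstract = " ".join(
            "".join(node.itertext()) for node in article.iter("AbstractText")
        )
        records.append({
            "pmid": article.findtext("./MedlineCitation/PMID"),
            "title": "".join(title_node.itertext()) if title_node is not None else "",
            "journal": article.findtext(".//Journal/Title", default=""),
            "abstract": abstract,
        })
    return records
```

Keeping URL construction separate from parsing means the parsing step can be exercised on saved XML files without contacting NCBI.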

A user specified regular expression is then used to search the abstract and title fields of each record and look for a textual pattern match. Only the title and abstract fields are searched, since these are the fields in which pattern matches are most likely to be found, and the elimination of other fields reduces the potential for false positives. If a match occurs the journal information, the abstract, and title are output to an HTML file (Figure 1).

Figure 1. A screen shot of the PREP program output for a search for DFN deafness.

The title is output in the format of a hyperlink to the Pubmed record that corresponds to that article, to allow for easy retrieval of any additional information pertaining to the article that the output file does not provide or in certain cases easy retrieval of the entire article. The generation of an HTML output allows for the results to be easily shared among users working on disparate computing platforms. Records that contain no matches to the text pattern of interest are not written to an output file. On an AMD Athlon 2000+ the PREP script can process an average of 500 abstracts per minute.
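The matching and output stage can be illustrated with a short Python sketch. It assumes each record is a dictionary with pmid, title, journal and abstract keys; these names, the helper functions, and the link format are illustrative assumptions and are not taken from the PREP script itself.

```python
import html
import re

# Assumed link format for a Pubmed record; the original script may have used another.
PUBMED_LINK = "https://pubmed.ncbi.nlm.nih.gov/{pmid}/"


def matching_records(records, pattern):
    """Keep only the records whose title or abstract matches the pattern."""
    regex = re.compile(pattern)
    return [
        rec for rec in records
        if regex.search(rec["title"]) or regex.search(rec["abstract"])
    ]


def to_html(records):
    """Render matches as HTML, with each title linking to its Pubmed record."""
    parts = ["<html><body>"]
    for rec in records:
        link = PUBMED_LINK.format(pmid=rec["pmid"])
        parts.append(
            f'<p><a href="{link}">{html.escape(rec["title"])}</a><br>'
            f'<i>{html.escape(rec["journal"])}</i><br>'
            f'{html.escape(rec["abstract"])}</p>'
        )
    parts.append("</body></html>")
    return "\n".join(parts)
```

Records without a match are simply never passed to the rendering step, which mirrors the behaviour described above of writing only matching records to the output file.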

PREP can be run from the command line of any Linux or Unix machine that has the XML::LibXML Perl module installed. The regular expression is set by changing the value of the $regex variable within the script, as indicated by the code documentation. The script is executed using the standard Perl command line syntax of "perl Keywords". The utility was implemented as a command line tool since this makes it suitable for easy inclusion in more comprehensive data mining scripts where the search functionality of PREP may prove useful.

Validation of utility

As a test of the specificity obtainable by the PREP script, all Pubmed records that resulted from a search for the word "lysozyme" were checked for pattern matches to the Protein Kinase A abbreviation "PKA" by using the regular expression PKA within the PREP script. At the time the test was conducted there were 19,964 records returned, and the PREP script indicated that only 3 records contained the textual pattern "PKA". These findings were manually confirmed by going through all records returned by the lysozyme search. The textual pattern "pKa", however, is actually fairly common throughout the lysozyme record set, and the PREP script successfully eliminated these "pKa" containing records from the search results, whereas the standard Pubmed keyword search is unable to accomplish this. Thus, this test is indicative that with a well-formed regular expression a high degree of specificity and search refinement can be achieved between different words with like character compositions. While no false positives were noted during the manual confirmation of these search results, the potential source of false negatives for this search would be abstracts that discussed Protein Kinase A without mentioning the abbreviation PKA.
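The distinction exploited in this test comes down to case sensitivity, which a short Python sketch can make concrete. The word boundaries (\b) in the pattern and the keyword_hit helper, which stands in for case-insensitive keyword matching, are assumptions added for illustration; the test described above used the plain expression PKA.

```python
import re

# Regular expressions are case-sensitive by default, so "PKA" and "pKa" differ.
# The \b word boundaries are an added assumption to avoid matches inside longer tokens.
PKA_PATTERN = re.compile(r"\bPKA\b")


def mentions_pka(text):
    """True when the text contains the Protein Kinase A abbreviation."""
    return PKA_PATTERN.search(text) is not None


def keyword_hit(text, word="pka"):
    """Hypothetical stand-in for a case-insensitive keyword search."""
    return re.search(re.escape(word), text, re.IGNORECASE) is not None


abstracts = [
    "The pKa of Glu35 in lysozyme is unusually high.",
    "PKA phosphorylates the substrate in vitro.",
]
for text in abstracts:
    print(keyword_hit(text), mentions_pka(text), text)
```

Both sentences satisfy the case-insensitive keyword test, but only the second satisfies the case-sensitive pattern.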

The ability of the PREP script to identify elements that cannot be specified as a single word was tested by searching for mutations in the records returned by a search for "hen egg white lysozyme" using the regular expression:



This expression allows for the identification of mutations written in the single-letter amino acid notation as well as the three-letter notation. The "hen egg white lysozyme" search of Pubmed yielded 1146 records, of which PREP identified 62 as matching the above regular expression pattern; these matches were manually confirmed. Of these 62 matches, 36 (58%) records contained actual mutations while the remaining 42% contained false positives, such as the abbreviation for T4 Lysozyme (T4L). Examination of the false positives showed that many of the same false positives recurred across records. A simple filter was therefore created by defining a second regular expression, which explicitly matched these false positives and prevented them from being recorded in the program output. In this manner, the total number of PREP matches was reduced to 47, raising the percentage of valid positives to 77% and reducing the percentage of false positives to 23%. This indicates that the PREP script can be an effective tool for reducing the search space that requires manual processing, taking the 1146 initial records and narrowing the list of possible records down to 47, or 4% of the original search space. It should be further noted that the PREP script did not miss any records that contained matching patterns within the data set, and that the DFN prefix associated with deafness mutations is less likely to turn up false positives than the more generalized pattern associated with biochemical mutation data. This validation exercise, though, does demonstrate the utility of an application specific filter as a means of reducing false positives where warranted.
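The two-stage approach of a general mutation pattern followed by a filter for known false positives can be sketched in Python as below. The expressions here are illustrative assumptions built from the [Residue] [Position] [MutantResidue] form and are not the expressions used in the study; the percent helper simply reproduces the reported percentages, assuming that all 36 true positives remain after filtering.

```python
import re

# Hypothetical stand-ins for the study's expressions.
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
THREE_LETTER = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
                "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")

# Residue, position, mutant residue in single-letter or three-letter notation.
MUTATION = re.compile(
    rf"\b(?:[{ONE_LETTER}]\d+[{ONE_LETTER}]"
    rf"|(?:{THREE_LETTER})\d+(?:{THREE_LETTER}))\b"
)

# Second expression listing recurring false positives, such as T4L.
KNOWN_FALSE_POSITIVES = re.compile(r"^(?:T4L)$")


def mutations_in(text):
    """Return pattern matches that are not on the false positive list."""
    return [m for m in MUTATION.findall(text)
            if not KNOWN_FALSE_POSITIVES.match(m)]


def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


print(mutations_in("The D52N and Glu35Gln mutants were compared with T4L."))
print(percent(36, 62), percent(36, 47), percent(47, 1146))
```

A record would be written to the output only when mutations_in returns a non-empty list, so records whose sole match is a listed false positive are dropped.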


The textual pattern DFN[A-Z]\d+ was defined, where [A-Z] matches any single letter between A and Z and \d+ matches one or more numeric digits, and this pattern was used to search through records returned by a PubMed search for the keywords "human ear". The search yielded 61,371 Pubmed records, of which 117 contained a pattern match. All of the pattern matches corresponded to valid DFN deafness mutations and no false positives were returned. The DFN mutations found in the 117 matching records are summarized in Table 1. In cases where multiple records discussed a mutation, a representative record is listed in the source field, rather than every record, to limit table length.
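As a brief illustration, the DFN pattern can be applied to free text in Python as shown below. The pattern is written without a space between the prefix and the letter class, which is an assumption about the intended form, and the dfn_loci helper is a hypothetical name.

```python
import re

# "DFN", one capital letter, then one or more digits (e.g. DFNA9, DFNB35).
DFN_PATTERN = re.compile(r"DFN[A-Z]\d+")


def dfn_loci(text):
    """Return the distinct DFN designations found in the text, sorted."""
    return sorted(set(DFN_PATTERN.findall(text)))


sample = "Mutations at DFNB1 and DFNA9 were studied; DFNB1 is the most common."
print(dfn_loci(sample))
```

Collecting the distinct matches per record is one way to arrive at a table that lists each designation once with a representative source.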

Table 1 Mutations located in the PREP program results

Discussion & conclusion

The PREP script was able to process 61,371 Pubmed records containing the keywords "human ear" and narrow the relevant search space down to 117 articles that contain different DFN deafness mutations. This is slightly less than 0.2% of the original search space, demonstrating the utility of pattern matching in helping researchers obtain relevant information from the biomedical literature. Furthermore, the lack of false positives among the returned results demonstrates that the accuracy and utility of this approach are further enhanced when the defined pattern possesses a high degree of specificity. The PREP approach to literature searching would therefore allow researchers to uncover a diversity of information pertaining to deafness mutations in a single search, whereas uncovering the same 45 DFN deafness mutations (Table 1) by standard keyword searches would take considerably more time and effort. When using such an approach, however, it is important to consider not only false positives but also the keywords presented to Pubmed. For example, this "human ear" keyword search failed to uncover the DFNB35 mutation [8], since it does not appear in an abstract that contains the words "human" and "ear". Potential sources of false negatives among search results include papers that do not use the DFN based nomenclature to discuss the mutation and articles that mention the abbreviation in the text but not in the abstract. Based on the validation tests, however, the false negative rate is expected to be low. Even with these limitations, the textual pattern based search methodology presented here can be of great value to researchers in the otolaryngological sciences as well as in other biomedical disciplines, since regular expressions can also be created to match other biological patterns, such as DNA or protein sequences, ions, enzyme names, and numerous other possibilities.

Availability & requirements

Project Name: PREP: Perl Regular Expressions for PubMed

Project Home Page:

Operating Systems: Linux/Unix

Programming Language: Perl

Other Requirements: XML::LibXML Perl Module

License: Perl Artistic License


  1. Leonard JE, Colombe JB, Levy JL: Finding relevant references to genes and proteins in Medline using a Bayesian approach. Bioinformatics. 2002, 18: 1515-1522. 10.1093/bioinformatics/18.11.1515.


  2. Yoshida M, Fukuda K, Takagi T: PNAD-CSS: a workbench for constructing a protein name abbreviation dictionary. Bioinformatics. 2000, 16: 169-175. 10.1093/bioinformatics/16.2.169.


  3. Andrade MA, Bork P: Automated extraction of information in molecular biology. FEBS Lett. 2000, 476: 12-17. 10.1016/S0014-5793(00)01661-6.


  4. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Church DM, DiCuccio M, Edgar R, Federhen S, Helmberg W, Kenton DL, Khovayko O, Lipman DJ, Madden TL, Maglott DR, Ostell J, Pontius JU, Pruitt KD, Schuler GD, Schriml LM, Sequeira E, Sherry ST, Sirotkin K, Starchenko G, Suzek TO, Tatusov R, Tatusova TA, Wagner L, Yaschenko E: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2005, 33: D39-D45. 10.1093/nar/gki062.


  5. Horn F, Lau AL, Cohen FE: Automated extraction of mutation data from the literature: application of MuteXt to G protein-coupled receptors and nuclear hormone receptors. Bioinformatics. 2004, 20: 557-568. 10.1093/bioinformatics/btg449.


  6. Frenz CM: Pro Perl Parsing. 2005, New York: Springer-Verlag


  7. Rebholz-Schuhmann D, Marcel S, Albert S, Tolle R, Casari G, Kirsch H: Automatic extraction of mutations from Medline and cross-validation with OMIM. Nucleic Acids Res. 2004, 32: 135-142. 10.1093/nar/gkh162.


  8. Ansar M, Din MA, Arshad M, Sohail M, Faiyaz-Ul-Haque M, Haque S, Ahmad W, Leal SM: A novel autosomal recessive non-syndromic deafness locus (DFNB35) maps to 14q24.1–14q24.3 in large consanguineous kindred from Pakistan. Eur J Hum Genet. 2003, 11: 77-80. 10.1038/sj.ejhg.5200905.


  9. Lalwani AK, Jackler RK, Sweetow RW, Lynch ED, Raventos H, Morrow J, King MC, Leon PE: Further characterization of the DFNA1 audiovestibular phenotype. Arch Otolaryngol Head Neck Surg. 1998, 124: 699-702.


  10. Zou D, Silvius D, Rodrigo-Blomqvist S, Enerback S, Xu PX: Eya1 regulates the growth of otic epithelium and interacts with Pax2 during the development of all sensory areas in the inner ear. Dev Biol. 2006, 298: 430-441. 10.1016/j.ydbio.2006.06.049.


  11. Bolz H, Bolz SS, Schade G, Kothe C, Mohrmann G, Hess M, Gal A: Impaired calmodulin binding of myosin-7A causes autosomal dominant hearing loss (DFNA11). Hum Mutat. 2004, 24: 274-275. 10.1002/humu.9272.


  12. Verhoeven K, Van Laer L, Kirschhofer K, Legan PK, Hughes DC, Schatteman I, Verstreken M, Van Hauwe P, Coucke P, Chen A, Smith RJ, Somers T, Offeciers FE, Van de Heyning P, Richardson GP, Wachtler F, Kimberling WJ, Willems PJ, Govaerts PJ, Van Camp G: Mutations in the human alpha-tectorin gene cause autosomal dominant non-syndromic hearing impairment. Nat Genet. 1998, 19: 60-62. 10.1038/ng0598-60.


  13. De Leenheer EM, Bosman AJ, Kunst HP, Huygen PL, Cremers CW: Audiological characteristics of some affected members of a Dutch DFNA13/COL11A2 family. Ann Otol Rhinol Laryngol. 2004, 113: 922-929.


  14. McHugh RK, Friedman RA: Genetics of hearing loss: Allelism and modifier genes produce a phenotypic continuum. Anat Rec A Discov Mol Cell Evol Biol. 2006, 288: 370-381.


  15. Hertzano R, Montcouquiol M, Rashi-Elkeles S, Elkon R, Yucel R, Frankel WN, Rechavi G, Moroy T, Friedman TB, Kelley MW, Avraham KB: Transcription profiling of inner ears from Pou4f3(ddl/ddl) identifies Gfi1 as a target of the Pou4f3 deafness gene. Hum Mol Genet. 2004, 13: 2143-2153. 10.1093/hmg/ddh218.


  16. Parker LL, Gao J, Zuo J: Absence of hearing loss in a mouse model for DFNA17 and MYH9-related disease: the use of public gene-targeted ES cell resources. Brain Res. 2006, 1091: 235-242. 10.1016/j.brainres.2006.03.032.


  17. Kharkovets T, Dedek K, Maier H, Schweizer M, Khimich D, Nouvian R, Vardanyan V, Leuwer R, Moser T, Jentsch TJ: Mice with altered KCNQ4 K+ channels implicate sensory outer hair cells in human progressive deafness. EMBO J. 2006, 25: 642-652. 10.1038/sj.emboj.7600951.


  18. van Wijk E, Krieger E, Kemperman MH, De Leenheer EM, Huygen PL, Cremers CW, Cremers FP, Kremer H: A mutation in the gamma actin 1 (ACTG1) gene causes autosomal dominant hearing loss (DFNA20/26). J Med Genet. 2003, 40: 879-884. 10.1136/jmg.40.12.879.


  19. Morishita H, Makishima T, Kaneko C, Lee YS, Segil N, Takahashi K, Kuraoka A, Nakagawa T, Nabekura J, Nakayama K, Nakayama KI: Deafness due to degeneration of cochlear neurons in caspase-3-deficient mice. Biochem Biophys Res Commun. 2001, 284: 142-149. 10.1006/bbrc.2001.4939.


  20. Marcotti W, Erven A, Johnson SL, Steel KP, Kros CJ: Tmc1 is necessary for normal functional maturation and survival of inner and outer hair cells in the mouse cochlea. J Physiol. 2006, 574: 677-698. 10.1113/jphysiol.2005.095661.


  21. Xiao S, Yu C, Chou X, Yuan W, Wang Y, Bu L, Fu G, Qian M, Yang J, Shi Y, Hu L, Han B, Wang Z, Huang W, Liu J, Chen Z, Zhao G, Kong X: Dentinogenesis imperfecta 1 with or without progressive hearing loss is associated with distinct mutations in DSPP. Nat Genet. 2001, 27: 201-204. 10.1038/84848.


  22. Donaudy F, Ferrara A, Esposito L, Hertzano R, Ben-David O, Bell RE, Melchionda S, Zelante L, Avraham KB, Gasparini P: Multiple mutations of MYO1A, a cochlear-expressed gene, in sensorineural hearing loss. Am J Hum Genet. 2003, 72: 1571-1577. 10.1086/375654.


  23. Donaudy F, Snoeckx R, Pfister M, Zenner HP, Blin N, Di Stazio M, Ferrara A, Lanzara C, Ficarella R, Declau F, Pusch CM, Nurnberg P, Melchionda S, Zelante L, Ballana E, Estivill X, Van Camp G, Gasparini P, Savoia A: Nonmuscle myosin heavy-chain gene MYH14 is expressed in cochlea and mutated in patients affected by autosomal dominant hearing impairment (DFNA4). Am J Hum Genet. 2004, 74: 770-776. 10.1086/383285.


  24. Van Laer L, Pfister M, Thys S, Vrijens K, Mueller M, Umans L, Serneels L, Van Nassauw L, Kooy F, Smith RJ, Timmermans JP, Van Leuven F, Van Camp G: Mice lacking Dfna5 show a diverging number of cochlear fourth row outer hair cells. Neurobiol Dis. 2005, 19: 386-399. 10.1016/j.nbd.2005.01.019.


  25. Robertson NG, Cremers CW, Huygen PL, Ikezono T, Krastins B, Kremer H, Kuo SF, Liberman MC, Merchant SN, Miller CE, Nadol JB, Sarracino DA, Verhagen WI, Morton CC: Cochlin immunostaining of inner ear pathologic deposits and proteomic analysis in DFNA9 deafness and vestibular dysfunction. Hum Mol Genet. 2006, 15: 1071-1085. 10.1093/hmg/ddl022.


  26. Palmada M, Schmalisch K, Bohmer C, Schug N, Pfister M, Lang F, Blin N: Loss of function mutations of the GJB2 gene detected in patients with DFNB1-associated hearing impairment. Neurobiol Dis. 2006, 22: 112-118. 10.1016/j.nbd.2005.10.005.


  27. Masmoudi S, Charfedine I, Rebeh IB, Rebai A, Tlili A, Ghorbel AM, Belguith H, Petit C, Drira M, Ayadi H: Refined mapping of the autosomal recessive non-syndromic deafness locus DFNB13 using eight novel microsatellite markers. Clin Genet. 2004, 66: 358-364. 10.1111/j.1399-0004.2004.00311.x.


  28. Fukushima K, Nagai K, Tsukada H, Sugata A, Sugata K, Kasai N, Kibayashi N, Maeda Y, Gunduz M, Nishizaki K: Deletion mapping of split hand/split foot malformation with hearing impairment: a case report. Int J Pediatr Otorhinolaryngol. 2003, 67: 1127-1132. 10.1016/S0165-5876(03)00193-9.


  29. Verpy E, Masmoudi S, Zwaenepoel I, Leibovici M, Hutchin TP, Del Castillo I, Nouaille S, Blanchard S, Laine S, Popot JL, Moreno F, Mueller RF, Petit C: Mutations in a new gene encoding a protein of the hair bundle cause non-syndromic deafness at the DFNB16 locus. Nat Genet. 2001, 29: 345-349. 10.1038/ng726.


  30. Pilipenko VV, Reece A, Choo DI, Greinwald JH: Genomic organization and expression analysis of the murine Fam3c gene. Gene. 2004, 335: 159-168. 10.1016/j.gene.2004.03.026.


  31. Johnson KR, Gagnon LH, Webb LS, Peters LL, Hawes NL, Chang B, Zheng QY: Mouse models of USH1C and DFNB18: phenotypic and molecular analyses of two new spontaneous mutations of the Ush1c gene. Hum Mol Genet. 2003, 12: 3075-3086. 10.1093/hmg/ddg332.


  32. Ernest S, Rauch GJ, Haffter P, Geisler R, Petit C, Nicolson T: Mariner is defective in myosin VIIA: a zebrafish model for human hereditary deafness. Hum Mol Genet. 2000, 9: 2189-2196. 10.1093/hmg/9.14.2189.


  33. Zwaenepoel I, Mustapha M, Leibovici M, Verpy E, Goodyear R, Liu XZ, Nouaille S, Nance WE, Kanaan M, Avraham KB, Tekaia F, Loiselet J, Lathrop M, Richardson G, Petit C: Otoancorin, an inner ear protein restricted to the interface between the apical surface of sensory epithelia and their overlying acellular gels, is defective in autosomal recessive deafness DFNB22. Proc Natl Acad Sci USA. 2002, 99: 6240-6245. 10.1073/pnas.082515999.


  34. Ahmed ZM, Goodyear R, Riazuddin S, Lagziel A, Legan PK, Behra M, Burgess SM, Lilley KS, Wilcox ER, Riazuddin S, Griffith AJ, Frolenkov GI, Belyantseva IA, Richardson GP, Friedman TB: The tip-link antigen, a protein associated with the transduction complex of sensory hair cells, is protocadherin-15. J Neurosci. 2006, 26: 7022-7034. 10.1523/JNEUROSCI.1163-06.2006.


  35. Odeh H, Hagiwara N, Skynner M, Mitchem KL, Beyer LA, Allen ND, Brilliant MH, Lebart MC, Dolan DF, Raphael Y, Kohrman DC: Characterization of two transgene insertional mutations at pirouette, a mouse deafness locus. Audiol Neurootol. 2004, 9: 303-314. 10.1159/000080701.


  36. Shahin H, Walsh T, Sobe T, Abu Sa'ed J, Abu Rayan A, Lynch ED, Lee MK, Avraham KB, King MC, Kanaan M: Mutations in a novel isoform of TRIOBP that encodes a filamentous-actin binding protein are responsible for DFNB28 recessive nonsyndromic hearing loss. Am J Hum Genet. 2006, 78: 144-152. 10.1086/499495.


  37. Wilcox ER, Burton QL, Naz S, Riazuddin S, Smith TN, Ploplis B, Belyantseva I, Ben-Yosef T, Liburd NA, Morell RJ, Kachar B, Wu DK, Griffith AJ, Riazuddin S, Friedman TB: Mutations in the gene encoding tight junction claudin-14 cause autosomal recessive deafness DFNB29. Cell. 2001, 104: 165-172. 10.1016/S0092-8674(01)00200-8.


  38. Kanzaki S, Beyer L, Karolyi IJ, Dolan DF, Fang Q, Probst FJ, Camper SA, Raphael Y: Transgene correction maintains normal cochlear structure and function in 6-month-old Myo15a mutant mice. Hear Res. 2006, 214: 37-44. 10.1016/j.heares.2006.01.017.


  39. Walsh T, Walsh V, Vreugde S, Hertzano R, Shahin H, Haika S, Lee MK, Kanaan M, King MC, Avraham KB: From flies' eyes to our ears: mutations in a human class III myosin cause progressive nonsyndromic hearing loss DFNB30. Proc Natl Acad Sci USA. 2002, 99: 7518-7523. 10.1073/pnas.102091699.


  40. Albert S, Blons H, Jonard L, Feldmann D, Chauvin P, Loundon N, Sergent-Allaoui A, Houang M, Joannard A, Schmerber S, Delobel B, Leman J, Journel H, Catros H, Dollfus H, Eliot MM, David A, Calais C, Drouin-Garraud V, Obstoy MF, Tran Ba Huy P, Lacombe D, Duriez F, Francannet C, Bitoun P, Petit C, Garabedian EN, Couderc R, Marlin S, Denoyelle F: SLC26A4 gene is frequently involved in nonsyndromic hearing impairment with enlarged vestibular aqueduct in Caucasian populations. Eur J Hum Genet. 2006, 14: 773-779. 10.1038/sj.ejhg.5201611.


  41. Delmaghani S, del Castillo FJ, Michel V, Leibovici M, Aghaie A, Ron U, Van Laer L, Ben-Tal N, Van Camp G, Weil D, Langa F, Lathrop M, Avan P, Petit C: Mutations in the gene encoding pejvakin, a newly identified protein of the afferent auditory pathway, cause DFNB59 auditory neuropathy. Nat Genet. 2006, 38: 770-778. 10.1038/ng1829.


  42. Cho KI, Lee JW, Kim KS, Lee EJ, Suh JG, Lee HJ, Kim HT, Hong SH, Chung WH, Chang KT, Hyun BH, Oh YS, Ryoo ZY: Fine mapping of the circling (cir) gene on the distal portion of mouse chromosome 9. Comp Med. 2003, 53: 642-648.


  43. Shabbir MI, Ahmed ZM, Khan SY, Riazuddin S, Waryah AM, Khan SN, Camps RD, Ghosh M, Kabra M, Belyantseva IA, Friedman TB, Riazuddin S: Mutations of human TMHS cause recessively inherited non-syndromic hearing loss. J Med Genet. 2006, 43: 634-640. 10.1136/jmg.2005.039834.


  44. Guipponi M, Vuagniaux G, Wattenhofer M, Shibuya K, Vazquez M, Dougherty L, Scamuffa N, Guida E, Okui M, Rossier C, Hancock M, Buchet K, Reymond A, Hummler E, Marzella PL, Kudoh J, Shimizu N, Scott HS, Antonarakis SE, Rossier BC: The transmembrane serine protease (TMPRSS3) mutated in deafness DFNB8/10 activates the epithelial sodium channel (ENaC) in vitro. Hum Mol Genet. 2002, 11: 2829-2836. 10.1093/hmg/11.23.2829.


  45. Rodriguez-Ballesteros M, del Castillo FJ, Martin Y, Moreno-Pelayo MA, Morera C, Prieto F, Marco J, Morant A, Gallo-Teran J, Morales-Angulo C, Navas C, Trinidad G, Tapia MC, Moreno F, Del Castillo I: Auditory neuropathy in patients carrying mutations in the otoferlin gene (OTOF). Hum Mutat. 2003, 22: 451-456. 10.1002/humu.10274.




I would like to thank Xiao Meng for her help in testing early versions of the PREP script.

Author information



Corresponding author

Correspondence to Christopher M Frenz.

Additional information

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

CMF is responsible for the study and manuscript in their entirety.


Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Frenz, C.M. Deafness mutation mining using regular expression based pattern matching. BMC Med Inform Decis Mak 7, 32 (2007).
