Entity recognition from clinical texts via recurrent neural network
- Zengjian Liu†1,
- Ming Yang†2,
- Xiaolong Wang1,
- Qingcai Chen1,
- Buzhou Tang1,3,
- Zhe Wang3 and
- Hua Xu4
© The Author(s). 2017
Published: 5 July 2017
Entity recognition is one of the most fundamental steps in text analysis and has long attracted considerable attention from researchers. In the clinical domain, various types of entities, such as clinical entities and protected health information (PHI), are widespread in clinical texts. Recognizing these entities has become a hot topic in clinical natural language processing (NLP), and in the past few years a large number of traditional machine learning methods, such as support vector machines and conditional random fields, have been deployed to recognize entities from clinical texts. More recently, recurrent neural networks (RNNs), a class of deep learning methods that has shown great potential on many problems including named entity recognition, have also gradually been applied to entity recognition from clinical texts.
In this paper, we comprehensively investigate the performance of LSTM (long short-term memory), a representative variant of RNN, on clinical entity recognition and protected health information recognition. The LSTM model consists of three layers: an input layer, which generates a representation of each word of a sentence; an LSTM layer, which outputs another word representation sequence that captures the context information of each word in the sentence; and an inference layer, which makes tagging decisions according to the output of the LSTM layer, i.e., outputs a label sequence.
Experiments conducted on corpora of the 2010, 2012 and 2014 i2b2 NLP challenges show that LSTM achieves micro-averaged F1-scores of 85.81% on the 2010 i2b2 medical concept extraction task, 92.29% on the 2012 i2b2 clinical event detection task, and 94.37% on the 2014 i2b2 de-identification task, making it highly competitive with other state-of-the-art systems.
LSTM, which requires no hand-crafted features, has great potential for entity recognition from clinical texts. It outperforms traditional machine learning methods that suffer from tedious feature engineering. A possible future direction is integrating the knowledge bases widely available in the clinical domain into LSTM, which is part of our future work. How to use LSTM to recognize entities in specific formats is another possible direction.
With the rapid development of electronic medical record (EMR) systems, more and more EMRs are available for research and applications. Entity recognition, one of the most fundamental clinical natural language processing (NLP) tasks, has attracted considerable attention. As a large number of various types of entities exist widely in clinical texts, studies on entity recognition from clinical texts cover clinical entity recognition, clinical event recognition, protected health information (PHI) recognition, etc. Compared to the newswire domain, work on entity recognition in the clinical domain started later.
The early entity recognition systems in the clinical domain were mainly rule-based, such as MedLEE, SymText/MPlus [2, 3], MetaMap, KnowledgeMap, cTAKES, and HiTEX. In the past several years, many machine learning-based clinical entity recognition systems have been proposed, probably owing to the publicly available corpora provided by the organizers of shared tasks, such as the Center for Informatics for Integrating Biology & the Bedside (i2b2) 2009, 2010 [9–13], 2012 [14–18] and 2014 track 1 [19–23] datasets, the ShARe/CLEF eHealth Evaluation Lab (SHEL) 2013 dataset, and the SemEval (Semantic Evaluation) 2014 task 7, 2015 task 6, 2015 task 14, and 2016 task 12 datasets. The main machine learning algorithms used in these systems are those once widely used for entity recognition in the newswire domain, including the support vector machine (SVM), hidden Markov model (HMM), conditional random field (CRF) and structured support vector machine (SSVM). Among these algorithms, CRF is the most popular, and most state-of-the-art systems adopt it. For example, in the 2014 i2b2 de-identification challenge, 6 out of the 10 systems were based on CRF, including all top 4 systems. The key to CRF-based systems lies in a variety of hand-crafted features, which are time-consuming to engineer.
In recent years, deep learning, which largely removes the need for manual feature engineering, has been widely introduced into various fields, such as image processing, speech recognition and NLP, and has shown great potential. In NLP, deep learning has been deployed to tackle machine translation, relation extraction, entity recognition [31–35], word sense disambiguation, syntax parsing [37, 38], emotion classification, etc. Most related studies are limited to the newswire domain rather than other domains such as the clinical domain.
In this study, we comprehensively investigate entity recognition from clinical texts based on deep learning. Long short-term memory (LSTM), a representative variant of the recurrent neural network (a type of deep learning method), is deployed to recognize clinical entities and PHI instances in clinical texts. Specifically, we investigate the effects of two different types of character-level word representations on LSTM when they are used as part of the input of LSTM, and compare LSTM with CRF and other state-of-the-art systems. Experiments conducted on corpora of the 2010, 2012 and 2014 i2b2 NLP challenges show that: 1) each type of character-level word representation is beneficial to LSTM for entity extraction from clinical texts, but it is not easy to determine which one is better; 2) LSTM achieves micro-averaged F1-scores of 85.81% on the 2010 i2b2 medical concept extraction task, 92.29% on the 2012 i2b2 clinical event detection task, and 94.37% on the 2014 i2b2 de-identification task, outperforming CRF by 2.12%, 1.47% and 1.79% respectively; 3) compared with other state-of-the-art systems, the LSTM-based system is highly competitive.
The following sections are organized as follows: section 2 introduces RNN in detail, experiments and results are presented in section 3, section 4 discusses the experimental results, and section 5 draws conclusions.
At each time step t, the LSTM unit updates its gates and states as follows:

i_t = σ(W_xi x_t + W_hi h_{t−1} + W_ci c_{t−1} + b_i)
f_t = σ(W_xf x_t + W_hf h_{t−1} + W_cf c_{t−1} + b_f)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ tanh(W_xc x_t + W_hc h_{t−1} + b_c)
o_t = σ(W_xo x_t + W_ho h_{t−1} + W_co c_t + b_o)
h_t = o_t ⊙ tanh(c_t)

where σ is the element-wise sigmoid function, ⊙ is the element-wise product, i_t, f_t and o_t are the input, forget and output gates, c_t is the cell vector, W_i, W_f, W_c and W_o (with subscripts x, h and c) are the weight matrices for the input x_t, hidden state h_t and memory cell c_t respectively, and b_i, b_f, b_c and b_o denote the bias vectors.
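As an illustration, one step of the gated update described above can be sketched in NumPy. This is a toy implementation under our own assumptions, not the authors' code: the peephole weights (W_ci, W_cf, W_co) are taken as full matrices for simplicity, and all parameters are passed in a plain dictionary `p`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step with peephole connections. `p` maps names such as
    "W_xi" and "b_i" to weight matrices and bias vectors."""
    # Input and forget gates look at the current input, previous hidden
    # state, and previous cell state.
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["W_ci"] @ c_prev + p["b_i"])
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["W_cf"] @ c_prev + p["b_f"])
    # New cell state: forget part of the old memory, write part of the new.
    c_t = f_t * c_prev + i_t * np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])
    # Output gate peeks at the updated cell state.
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["W_co"] @ c_t + p["b_o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```

Because h_t is a product of a sigmoid gate and a tanh, every component of the hidden state stays strictly inside (−1, 1).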
A bidirectional LSTM is used to generate a context representation at every position. Given a sentence s = w_1 w_2 … w_n with each word w_t (1 ≤ t ≤ n) represented by x_t (i.e., the concatenation of the token-level and character-level representations of the word), the bidirectional LSTM takes the sequence of word representations x = x_1 x_2 … x_n as input and produces a sequence of context representations h = h_1 h_2 … h_n, where h_t = [h_ft^T, h_bt^T]^T (1 ≤ t ≤ n) is the concatenation of the outputs of the forward and backward LSTMs.
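The bidirectional pass itself is independent of the recurrent cell used, so it can be sketched generically. In this illustrative sketch a toy tanh update stands in for the LSTM step; the point is the concatenation h_t = [h_ft; h_bt] at each position.

```python
import numpy as np

def bidirectional(xs, step, h0):
    """Run `step(x, h)` over the sequence left-to-right and right-to-left,
    then concatenate the two hidden states at every position."""
    # Forward pass: h_f_1, h_f_2, ..., h_f_n
    fwd, h = [], h0
    for x in xs:
        h = step(x, h)
        fwd.append(h)
    # Backward pass over the reversed sequence, then restore order.
    bwd, h = [], h0
    for x in reversed(xs):
        h = step(x, h)
        bwd.append(h)
    bwd.reverse()
    # Context representation at t sees both the left and the right context.
    return [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]
```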
Y(x^(i)) denotes the set of possible label sequences for the input x^(i).
It is clear that if interactions between successive labels are not considered, the inference layer simplifies into a softmax output layer that classifies each token individually.
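The distinction can be made concrete with a small decoding sketch. Assuming per-token emission scores plus an additive label-transition matrix (a generic structured-inference illustration, not the authors' exact inference layer), Viterbi decoding picks the best whole sequence, while the softmax-style alternative tags each token on its own:

```python
import numpy as np

def viterbi(emissions, transitions):
    """Best label sequence under emission scores (n x L) plus
    label-to-label transition scores (L x L), by dynamic programming."""
    n, L = emissions.shape
    score = emissions[0].copy()          # best score ending in each label
    back = np.zeros((n, L), dtype=int)   # backpointers
    for t in range(1, n):
        # total[i, j]: best path ending in label i at t-1, then label j at t
        total = score[:, None] + transitions + emissions[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # Trace the best path backwards from the best final label.
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

def independent(emissions):
    """Softmax-style inference: each token classified individually."""
    return emissions.argmax(axis=1).tolist()
```

With transitions that strongly penalize label changes, the two decoders can disagree: the independent tagger follows each token's local score, while Viterbi smooths the sequence.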
In order to investigate the performance of LSTM on entity recognition from clinical texts, we start with two baseline systems: 1) a CRF-based system using rich features (denoted by CRF); 2) an LSTM-based system using only token-level word representations in the input layer (denoted by LSTM-BASELINE). We then compare them with LSTM-based systems using token-level word representations together with two different types of character-level word representations. Moreover, we also compare the LSTM-based systems with other state-of-the-art systems. Three benchmark datasets from three clinical NLP challenges, i2b2 (the Center for Informatics for Integrating Biology & the Bedside) 2010, 2012 and 2014, are used to evaluate all systems. Both the 2010 and 2012 i2b2 NLP challenges have a subtask of clinical entity recognition, and the 2014 i2b2 NLP challenge has a subtask of PHI recognition.
Datasets and evaluation
Statistics of entity recognition datasets used in our study
Evaluation criteria for the three entity recognition tasks
2010 i2b2 “exact”: entities have the same boundary and the same type.
2010 i2b2 “inexact”: entities overlap and have the same type.
2012 i2b2 “span”: entities overlap and have the same type.
2014 i2b2 “exact”: entities have the same boundary and the same type.
2014 i2b2 “token”: the “exact” criterion applied at token level.
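A micro-averaged F1 under the "exact" criterion can be sketched as follows, with entities represented as (start, end, type) triples. This is a generic illustration of the metric, not the challenges' official scoring scripts:

```python
def entity_f1(gold, pred):
    """Micro-averaged F1 under the "exact" criterion: a predicted entity
    is a true positive only if both its boundary (start, end) and its
    type match a gold entity."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For a corpus-level (micro-averaged) score, the gold and predicted sets of all documents are pooled before counting, rather than averaging per-document F1 values.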
Sentence splitting: sentences are separated at ‘\n’, ‘.’, ‘?’ and ‘!’.
Tokenization: sentences are first split into tokens at whitespace; each token composed of more than one type of character (letters, digits and other symbols) is then separated into smaller parts that each contain only one type of character. For example, “4/16/91CPT Code:” is first split into “4/16/91CPT” and “Code:”, and then further separated into “4”, “/”, “16”, “/”, “91”, “CPT”, “Code” and “:”.
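The splitting rule above can be sketched with a regular expression (an illustrative re-implementation, not the authors' exact tokenizer): split on whitespace, then break each chunk into maximal runs of letters, maximal runs of digits, and individual symbol characters.

```python
import re

def tokenize(sentence):
    """Whitespace-split, then break each chunk into runs of a single
    character class: letters, digits, or a lone other symbol."""
    tokens = []
    for chunk in sentence.split():
        tokens.extend(re.findall(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9]", chunk))
    return tokens
```

Applied to the paper's example, `tokenize("4/16/91CPT Code:")` yields `["4", "/", "16", "/", "91", "CPT", "Code", ":"]`.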
Hyperparameters chosen for all our experiments
Dimension of token-level word representation
Dimension of character representation
Character-level LSTM size
Character-level CNN filter size
Character-level CNN filter number
Token-level LSTM size
Performances of LSTM and CRF-based models for the three tasks (F1-score %)
2010 i2b2 challenge (Concept Extraction)
2012 i2b2 challenge (Event Detection)
2014 i2b2 challenge (De-Identification)
LSTM + char-LSTM
LSTM + char-CNN
LSTM + char-LSTM + CNN
When one type of character-level word representation (i.e., character-level word representations generated by LSTM or CNN, denoted by char-LSTM and char-CNN respectively in Table 4) is added to the input layer as shown in Fig. 1, the performance of LSTM improves slightly: LSTM with char-LSTM (i.e., LSTM + char-LSTM) performs a little better on the 2010 and 2012 i2b2 NLP challenge test sets, while LSTM with char-CNN (i.e., LSTM + char-CNN) performs a little better on the 2014 i2b2 NLP challenge. There is no clear indication of which character-level word representation is better. When both types of character-level word representations are added, the performance of LSTM does not improve further. The highest F1-scores of LSTM are 85.81% and 92.91% under the “exact” and “inexact” criteria on the 2010 i2b2 challenge test set, 92.29% and 86.94% under the “span” and “type” criteria on the 2012 i2b2 challenge test set, and 94.37% and 96.67% under the “exact” and “token” criteria on the 2014 i2b2 challenge test set.
Comparison of the performances of various systems on the three tasks (%)
LSTM + char-LSTM
Tang et al (2013) 
de Bruijn et al (2011)* 
Kim et al (2015) 
Jiang et al (2011) 
LSTM + char-LSTM
Xu et al. (2013)* 
Tang et al. (2013) 
CRFs + SVM
Sohn et al. (2013) 
Kovačević et al. (2013) 
LSTM + char-LSTM
Yang et al. (2015) 
He et al. (2015) 
Liu et al. (2015) 
CRFs + rule
Dehghan et al. (2015) 
CRFs + rule
In this study, we investigate the performance of LSTM on entity recognition from clinical texts. The LSTM-based system achieves F1-scores of 85.81% under the “exact” criterion on the 2010 i2b2 challenge test set, 92.29% under the “span” criterion on the 2012 i2b2 challenge test set, and 94.37% under the “exact” criterion on the 2014 i2b2 challenge test set, which are competitive with other state-of-the-art systems. The major advantage of the LSTM-based system is that it no longer relies on a large number of hand-crafted features. Similar to previous studies in the newswire domain, LSTM shows great potential for entity recognition in the clinical domain, outperforming most traditional state-of-the-art methods, such as CRF, that suffer from tedious feature engineering.
Experiments shown in Table 4 demonstrate that either type of character-level word representation is beneficial to entity recognition from clinical texts. The reason may be that both types of character-level word representations are able to capture morphological information of each word, such as suffixes and prefixes, which cannot be captured by the token-level word representation that relies on word context. Accordingly, when either character-level word representation is added to the input layer of LSTM, errors are fixed such as the “Test” event “URINE” being missed in “2014-11-29 05:11 PM URINE”, or the hospital “FPC” being correctly identified in “… have a PCP at FPC …” but missed in “… Dr. Harry Tolliver, FPC cardiology unit …”.
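As an illustration of how a character-level representation can pick up prefixes and suffixes, a char-CNN can be sketched as follows: embed each character, slide filters of a fixed width over the character sequence, and max-pool over positions, so a filter that fires on a suffix fires wherever that suffix occurs. The dimensions, the padding character, and the parameterization here are our own illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def char_cnn(word, char_vecs, filters, width=3):
    """Character-level word representation via CNN with max-over-time
    pooling. `char_vecs` maps characters to d-dim embeddings; `filters`
    has shape (width * d, n_filters)."""
    pad = "#" * (width - 1)                      # hypothetical pad symbol
    chars = pad + word + pad                     # so short words fit a window
    E = np.stack([char_vecs[c] for c in chars])  # (len, d)
    n_pos = len(chars) - width + 1
    # Each window is the flattened embeddings of `width` adjacent characters.
    windows = np.stack([E[i:i + width].ravel() for i in range(n_pos)])
    # Convolve, squash, then max-pool over positions -> (n_filters,)
    return np.tanh(windows @ filters).max(axis=0)
```

The char-LSTM alternative replaces the convolution with a (bidirectional) LSTM run over the character embeddings, typically keeping the final hidden states as the word representation.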
Although the LSTM-based system shows better overall performance than almost all the state-of-the-art systems mentioned in this study, it does not show better performance on all types of entities. For example, the best system on the 2012 i2b2 challenge corpus (i.e., Xu et al. (2013)) achieves a better “span” F1-score than the LSTM-based system on “Test” events (94.16% vs 93.69%), and the best system on the 2014 i2b2 challenge corpus (i.e., Yang et al. (2015)) achieves a better “exact” F1-score than the LSTM-based system on “ID” instances (92.71% vs 91.94%). There are two main reasons: 1) the current LSTM-based system does not use the knowledge bases widely available in the clinical domain, whereas the other state-of-the-art systems take full advantage of them; 2) although the character-level word representation is able to capture some morphological information of each word, it cannot cover the morphology of entities in specific formats, such as fixed-length digit strings. Therefore, in our opinion there are two possible directions for further improvement: 1) how to integrate widely available knowledge bases into the input of LSTM; 2) how to use LSTM to recognize entities in specific formats. We will explore both in the future.
In recent months, a few studies on deep learning for entity recognition from clinical text have also been proposed. For example, Jagannatha et al. proposed two RNN-based models for medical event detection on their own annotated dataset, one of which treats medical event detection as a classification problem and the other as a sequence labeling problem [48, 49]. Both RNN-based models adopt the traditional RNN, which is not as good as LSTM, and take only token-level word representations as input. Dernoncourt et al. deployed a similar RNN model for the de-identification task on the 2014 i2b2 NLP challenge corpus and the MIMIC dataset. According to the experimental results reported in this study and those similar studies, our LSTM appears to outperform theirs. For example, the F1-score of the RNN model proposed by Dernoncourt et al. on the 2014 i2b2 dataset, as reported, is 97.85% under the binary HIPAA token criterion (evaluating only the HIPAA-defined PHI instances under the “token” criterion). Under the same evaluation criterion, the corresponding F1-score of “LSTM + char-LSTM” is 98.05% on the 2014 i2b2 dataset. These results suggest that our LSTM outperforms the RNN proposed by Dernoncourt et al. The results reported in this study can therefore serve as a new benchmark for deep learning methods.
In this study, we comprehensively investigate the performance of a recurrent neural network (i.e., LSTM) on clinical entity recognition and protected health information (PHI) recognition. Experiments on the 2010, 2012 and 2014 i2b2 NLP challenge corpora show that 1) LSTM outperforms CRF; 2) introducing two types of character-level word representations into the input layer of LSTM improves it further; and 3) the final LSTM-based system is competitive with other state-of-the-art systems. Furthermore, we also point out two possible directions for further improvement.
This paper is supported in part by grants: National 863 Program of China (2015AA015405), National Natural Science Foundation of China (NSFC) grants (61573118, 61402128, 61473101, and 61472428), Strategic Emerging Industry Development Special Funds of Shenzhen (JCYJ20140508161040764, JCYJ20140417172417105, JCYJ20140627163809422 and JSGG20151015161015297), Innovation Fund of Harbin Institute of Technology (HIT.NSRIF.2017052) and a program of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education (93K172016K12). The publication costs were covered by NSFC grant 61402128.
Availability of data and materials
The datasets that support the findings of this study are available from the i2b2 challenges, but restrictions apply to their availability: they were used under license for the current study and are not publicly available. The data are, however, available from i2b2’s website (https://www.i2b2.org/NLP/DataSets/Main.php) upon reasonable application with a signed “data use and confidentiality agreement” submitted to the program manager, Barbara Mawn (e-mail: Barbara_Mawn@hms.harvard.edu). Ethics approval was not required, as all the data have been de-identified within the meaning of the Health Insurance Portability and Accountability Act of 1996 (HIPAA) privacy regulations.
The work presented here was carried out in collaboration between all authors. Z.L., M.Y. and B.T. designed the methods and experiments, and contributed to the writing of manuscript. X.W., Q.C., Z.W. and H.X. provided guidance and reviewed the manuscript critically. All authors have approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
About this supplement
This article has been published as part of BMC Medical Informatics and Decision Making Volume 17 Supplement 2, 2017: Selected articles from the International Conference on Intelligent Biology and Medicine (ICIBM) 2016: medical informatics and decision making. The full contents of the supplement are available online at https://bmcmedinformdecismak.biomedcentral.com/articles/supplements/volume-17-supplement-2.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
1. Friedman C, Alderson PO, Austin JH, Cimino JJ, Johnson SB. A general natural-language text processor for clinical radiology. J Am Med Inform Assoc. 1994;1:161–74.
2. Christensen LM, Haug PJ, Fiszman M. MPLUS: a probabilistic medical language understanding system. In: Proceedings of the ACL-02 workshop on natural language processing in the biomedical domain, volume 3. Stroudsburg: Association for Computational Linguistics; 2002. p. 29–36.
3. Koehler SB. SymText: a natural language understanding system for encoding free text medical data. Salt Lake City: The University of Utah; 1998.
4. Aronson AR, Lang F-M. An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc. 2010;17:229–36.
5. Denny JC, Irani PR, Wehbe FH, Smithers JD, Spickard III A. The KnowledgeMap project: development of a concept-based medical school curriculum database. AMIA Annu Symp Proc. 2003;2003:195–9.
6. Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, Chute CG. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 2010;17:507–13.
7. Zeng QT, Goryachev S, Weiss S, Sordo M, Murphy SN, Lazarus R. Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system. BMC Med Inform Decis Mak. 2006;6:1.
8. Uzuner Ö, Solti I, Cadag E. Extracting medication information from clinical text. J Am Med Inform Assoc. 2010;17:514–8.
9. Kim Y, Riloff E, Hurdle JF. A study of concept extraction across different types of clinical notes. In: AMIA Annual Symposium Proceedings. San Francisco: American Medical Informatics Association; 2015. p. 737–46.
10. Tang B, Cao H, Wu Y, Jiang M, Xu H. Recognizing clinical entities in hospital discharge summaries using structural support vector machines with word representation features. BMC Med Inform Decis Mak. 2013;13:1.
11. Uzuner Ö, South BR, Shen S, DuVall SL. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J Am Med Inform Assoc. 2011;18:552–6.
12. Jiang M, Chen Y, Liu M, Rosenbloom ST, Mani S, Denny JC, Xu H. A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. J Am Med Inform Assoc. 2011;18:601–6.
13. de Bruijn B, Cherry C, Kiritchenko S, Martin J, Zhu X. Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. J Am Med Inform Assoc. 2011;18:557–62.
14. Sun W, Rumshisky A, Uzuner O. Evaluating temporal relations in clinical text: 2012 i2b2 challenge. J Am Med Inform Assoc. 2013;20:806–13.
15. Xu Y, Wang Y, Liu T, Tsujii J, Chang EI-C. An end-to-end system to identify temporal relation in discharge summaries: 2012 i2b2 challenge. J Am Med Inform Assoc. 2013;20:849–58.
16. Tang B, Wu Y, Jiang M, Chen Y, Denny JC, Xu H. A hybrid system for temporal information extraction from clinical text. J Am Med Inform Assoc. 2013;20:828–35.
17. Sohn S, Wagholikar KB, Li D, Jonnalagadda SR, Tao C, Elayavilli RK, Liu H. Comprehensive temporal information detection from clinical text: medical events, time, and TLINK identification. J Am Med Inform Assoc. 2013;20:836–42.
18. Kovačević A, Dehghan A, Filannino M, Keane JA, Nenadic G. Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. J Am Med Inform Assoc. 2013;20:859–66.
19. Stubbs A, Kotfila C, Uzuner O. Automated systems for the de-identification of longitudinal clinical narratives: overview of 2014 i2b2/UTHealth shared task Track 1. J Biomed Inform. 2015;58:S11–9.
20. Yang H, Garibaldi JM. Automatic detection of protected health information from clinic narratives. J Biomed Inform. 2015;58:S30–8.
21. Liu Z, Chen Y, Tang B, Wang X, Chen Q, Li H, Wang J, Deng Q, Zhu S. Automatic de-identification of electronic medical records using token-level and character-level conditional random fields. J Biomed Inform. 2015;58:S47–52.
22. He B, Guan Y, Cheng J, Cen K, Hua W. CRFs based de-identification of medical records. J Biomed Inform. 2015;58:S39–46.
23. Dehghan A, Kovacevic A, Karystianis G, Keane JA, Nenadic G. Combining knowledge- and data-driven methods for de-identification of clinical narratives. J Biomed Inform. 2015;58:S53–9.
24. Suominen H, Salanterä S, Velupillai S, Chapman WW, Savova G, Elhadad N, Pradhan S, South BR, Mowery DL, Jones GJ. Overview of the ShARe/CLEF eHealth evaluation lab 2013. In: International Conference of the Cross-Language Evaluation Forum for European Languages. Berlin, Heidelberg: Springer; 2013. p. 212–31.
25. Pradhan S, Elhadad N, Chapman W, Manandhar S, Savova G. SemEval-2014 task 7: analysis of clinical text. SemEval. 2014;199:54.
26. Bethard S, Derczynski L, Savova G, Pustejovsky J, Verhagen M. SemEval-2015 task 6: clinical TempEval. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). 2015. p. 806–14.
27. Elhadad N, Pradhan S, Chapman W, Manandhar S, Savova G. SemEval-2015 task 14: analysis of clinical text. In: Proceedings of the Workshop on Semantic Evaluation. Association for Computational Linguistics; 2015. p. 303–10.
28. Bethard S, Savova G, Chen W-T, Derczynski L, Pustejovsky J, Verhagen M. SemEval-2016 task 12: clinical TempEval. In: Proceedings of SemEval. 2016. p. 1052–62.
29. Cho K, Van Merriënboer B, Bahdanau D, Bengio Y. On the properties of neural machine translation: encoder–decoder approaches. arXiv preprint arXiv:1409.1259. 2014.
30. Zeng D, Liu K, Lai S, Zhou G, Zhao J. Relation classification via convolutional deep neural network. In: COLING. 2014. p. 2335–44.
31. Ma X, Hovy E. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. arXiv preprint arXiv:1603.01354. 2016.
32. Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360. 2016.
33. Chiu JP, Nichols E. Named entity recognition with bidirectional LSTM-CNNs. Trans Assoc Comput Linguist. 2016;4:357–70.
34. Huang Z, Xu W, Yu K. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991. 2015.
35. dos Santos C, Guimarães V. Boosting named entity recognition with neural character embeddings. In: Proceedings of NEWS 2015, the Fifth Named Entities Workshop. 2015. p. 25.
36. Chen X, Liu Z, Sun M. A unified model for word sense representation and disambiguation. In: EMNLP. Doha; 2014. p. 1025–35.
37. Chen D, Manning CD. A fast and accurate dependency parser using neural networks. In: EMNLP. 2014. p. 740–50.
38. Collobert R. Deep learning for efficient discriminative parsing. In: AISTATS. 2011. p. 224–32.
39. Ng H-W, Nguyen VD, Vonikakis V, Winkler S. Deep learning for emotion recognition on small datasets using transfer learning. In: Proceedings of the 2015 ACM International Conference on Multimodal Interaction. New York: ACM; 2015. p. 443–9.
40. Goller C, Kuchler A. Learning task-dependent distributed representations by backpropagation through structure. In: IEEE International Conference on Neural Networks. IEEE; 1996. p. 347–52.
41. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput. 2000;12:2451–71.
42. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9:1735–80.
43. Pascanu R, Mikolov T, Bengio Y. On the difficulty of training recurrent neural networks. In: ICML (3). 2013;28:1310–8.
44. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw. 1994;5:157–66.
45. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems. 2013. p. 3111–9.
46. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86:2278–324.
47. Okazaki N. CRFsuite: a fast implementation of conditional random fields (CRFs). 2007. http://www.chokkan.org/software/crfsuite/.
48. Jagannatha AN, Yu H. Bidirectional RNN for medical event detection in electronic health records. In: Proceedings of NAACL-HLT. 2016. p. 473–82.
49. Jagannatha A, Yu H. Structured prediction models for RNN based sequence labeling in clinical text. arXiv preprint arXiv:1608.00612. 2016.
50. Dernoncourt F, Lee JY, Uzuner O, Szolovits P. De-identification of patient notes with recurrent neural networks. arXiv preprint arXiv:1606.03475. 2016.