
Attention-based deep residual learning network for entity relation extraction in Chinese EMRs



Electronic medical records (EMRs) contain a variety of valuable medical concepts and relations. The ability to recognize relations between medical concepts described in EMRs enables the automatic processing of clinical texts, resulting in an improved quality of health-related data analysis. Driven by the 2010 i2b2/VA Challenge Evaluation, the relation recognition problem in EMRs has been studied by many researchers to address this important aspect of EMR information extraction.


This paper proposes an Attention-Based Deep Residual Network (ResNet) model to recognize medical concept relations in Chinese EMRs.


Our model achieves an F1-score of 77.80% on a manually annotated Chinese EMR corpus and outperforms the state-of-the-art approaches.


The residual network-based model can reduce the negative impact of corpus noise on parameter learning, and combining it with a character position attention mechanism enhances the features that distinguish different types of entities.


EMRs are used by medical staff to record texts, symbols, charts, graphics, data, and other digital information generated by the HIS (hospital information system). With the tremendous growth in the adoption of EMRs, various sources of clinical information (including demographics, diagnostic history, medications, laboratory test results, and vital signs) are becoming available, which has established EMRs as a treasure trove for large-scale analysis of health data. Unstructured medical text in EMRs is one kind of narrative data, including clinical notes, surgical records, discharge records, radiology reports, and pathology reports. For convenience, we use EMR to refer to unstructured EMR text in the rest of this paper.

Identifying the semantic relations that exist among medical concepts in EMRs is of great importance to various health-related applications. These relations hold between medical problems, tests, and treatments. Table 1 presents two examples of semantic relations: one between the medical concepts e1=“cold” and e2=“fever” in sentence S1, and the other between e1=“Head MRI” and e2=“lacunar infarction” in sentence S2.

Table 1 Examples of the relations between medical entities

Because of the importance of this subject, the 2010 i2b2/VA NLP Challenge for Clinical Records presented a relation classification task focused on assigning relation types between medical concepts in EMRs. Since then, medical concept relation classification has received attention from more and more researchers.

In traditional natural language processing (NLP) research, semantic relations between named entities are used in many applications, including knowledge graph construction, sentiment analysis, and question answering [1]; relation extraction or classification has therefore always been an important issue [2]. In previous open-domain entity relation extraction studies, researchers applied many traditional machine learning models, including logistic regression, SVM and CRF, to recognize relations [3–7]. Li et al. used a CRF model to reduce the space of possible label sequences and to introduce long-range features for relation recognition [8]. Mintz et al. put forward a distant supervision method for relation classification, which generates adequate training data by aligning text with a knowledge base and thereby addresses the lack of training data [9]. Socher et al. were the first to employ a recursive neural network (RNN) for relation extraction, utilizing the syntactic structure of sentences [10]. Miwa et al. proposed a neural relation extraction architecture based on bidirectional LSTM and tree LSTM to encode entities and sentences simultaneously [11].

Drawing on these studies of open-domain relation extraction, a similar task on EMRs was formally defined in the 2010 i2b2/VA Challenge Evaluation [12], and researchers have since proposed various models for relation classification in EMRs. De Bruijn et al. used SVM to train multiple classifiers for different relation categories and improved classification performance [13]. Rink et al. used external dictionaries to improve entity relation recognition [14]. Fang et al. extracted relations from articles on Chinese herbal medicine using manually designed rules and created a relation database [15]. Zhou et al. utilized a bootstrapping framework to extract relations from medical articles and created a knowledge base [16]. Li et al. proposed a relation classification model for electronic health records based on CNN-LSTM [17]. Overall, the existing models mainly focus on English EMR texts, and their recognition performance is still not satisfactory. Given the increasing availability of digitized Chinese EMRs, this paper addresses the semantic relation identification problem among medical concepts in Chinese EMRs. We propose an attention-based deep residual network model to classify medical entity relations in Chinese EMRs. Experimental results on a manually labeled Chinese EMR corpus show that our model performs better than other methods, with an F1-score of 77.80%.


Our model is based on a CNN architecture, as shown in Fig. 1. The model consists of five parts: a vector representation layer, a convolution layer, a residual network layer, a position attention layer and an output layer.

Fig. 1 The architecture of our relation extraction model

Character embedding

Given a Chinese sentence \(S=(c_{1},c_{2},\ldots,c_{n})\) that contains two entities \(e_{1}\) and \(e_{2}\), each character \(c_{i}\) is mapped to a low-dimensional dense vector \(V_{i} = (V_{w}^{i}, V_{p}^{i})\), in which \(V_{w}^{i}\) represents the character vector and \(V_{p}^{i}\) is the vector of the character's position in the sentence. The character embedding is initialized with vectors pre-trained by word2vec, and \(d_{w}\) is the dimension of the character vector.

Position embedding

The position embedding \(V_{p}^{i}\) is also a low-dimensional vector. It encodes the position of a character in the sentence through the relative positions (see Fig. 2) of the current character to the first entity \(e_{1}\) and to the second entity \(e_{2}\). Each relative position corresponds to a position embedding \(V_{p}^{i} \in R^{d_{p}}\), where \(d_{p}\) is the dimension of the position embedding.

Fig. 2 An example of the relative distance between an entity and a character. The relative distances of a character to the medical entities “(cold)” and “(fever)” are 2 and -2, respectively

The vector \(V_{i} \in R^{d_{v}}\) is the concatenation of the character vector \(V_{w}^{i}\) and the two position vectors, where \(d_{v} = d_{w} + 2d_{p}\).
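The construction of \(V_{i}\) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the sizes `D_W`, `D_P`, `VOCAB_SIZE` and `MAX_DIST`, the random tables that stand in for word2vec and learned position vectors, the clipping of distances, the sharing of one position table between both entities, and the convention that characters inside an entity get distance 0.

```python
import numpy as np

rng = np.random.default_rng(0)

D_W, D_P = 8, 2        # hypothetical sizes for d_w and d_p
VOCAB_SIZE = 50        # hypothetical character vocabulary size
MAX_DIST = 10          # hypothetical clipping bound for relative distances

# Random tables stand in for word2vec-initialised character vectors and
# for learned position vectors (one row per clipped distance).
char_table = rng.normal(size=(VOCAB_SIZE, D_W))
pos_table = rng.normal(size=(2 * MAX_DIST + 1, D_P))


def relative_positions(n_chars, start, end):
    """Signed distance of each character index to the entity span [start, end]."""
    idx = np.arange(n_chars)
    return np.where(idx < start, idx - start, np.where(idx > end, idx - end, 0))


def embed(char_ids, e1_span, e2_span):
    """Build V_i = [V_w^i ; V_p^i(e1) ; V_p^i(e2)] for every character."""
    n = len(char_ids)
    parts = [char_table[np.asarray(char_ids)]]
    for span in (e1_span, e2_span):
        dist = np.clip(relative_positions(n, *span), -MAX_DIST, MAX_DIST)
        parts.append(pos_table[dist + MAX_DIST])   # shift so indices are non-negative
    return np.concatenate(parts, axis=1)


# Toy sentence of 7 characters; e1 is the character at index 1, e2 at index 5.
V = embed([3, 1, 4, 1, 5, 9, 2], (1, 1), (5, 5))
print(V.shape)  # (7, 12), i.e. d_v = d_w + 2 * d_p
print(relative_positions(7, 1, 1)[3], relative_positions(7, 5, 5)[3])  # 2 -2
```

The character at index 3 lies two positions after the first entity and two before the second, which reproduces the distances 2 and -2 described in the caption of Fig. 2.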


Convolution extracts effective local feature information from characters and their contexts. \(V_{j}\) is the vector corresponding to the j-th character in the sentence \(S=(V_{1},V_{2},\ldots,V_{n})\), where \(n\) is the sentence length. We use a filter \(W \in R^{h \times d_{v}}\) to extract local features from the sentence \(S\). A feature \(c_{j}\) is generated from a window of characters \(V_{j:j+h-1}\) by

$$ c_{j}=f(W \cdot V_{j:j+h-1}+b), $$

where \(b\) is a bias term and \(f\) is a non-linear function. We apply a dropout layer in the convolution layer to prevent the model from overfitting.
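As a rough illustration of the formula for \(c_{j}\), the sketch below slides a single filter over the character vectors. The choice of ReLU for \(f\), the absence of padding (so only the \(n-h+1\) full windows are used), and the toy input values are assumptions; dropout is omitted.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def conv_features(V, W, b, f=relu):
    """Return c_j = f(W . V[j:j+h] + b) for every full window of h characters."""
    n = V.shape[0]
    h = W.shape[0]
    # W * V[j:j+h] is an element-wise product over an (h, d_v) window;
    # summing it gives the inner product written as W . V_{j:j+h-1}.
    return np.array([f(np.sum(W * V[j:j + h]) + b) for j in range(n - h + 1)])


# Toy input: n = 4 characters with d_v = 3, filter height h = 2.
V = np.arange(12, dtype=float).reshape(4, 3)
W = np.ones((2, 3))
c = conv_features(V, W, b=-20.0)
print(c)  # window sums are 15, 33, 51, so c holds 0, 13 and 31
```

In a full model many such filters would be applied in parallel, each producing its own feature sequence.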

Residual networks

Residual learning connects low-level representations directly to high-level ones and addresses the vanishing gradient problem by superimposing an identity mapping on the network. In our model, each residual convolution block (see Fig. 3) has two convolutional layers, each followed by a ReLU activation, and we use shortcut connections between the residual convolution blocks. \(W_{1}, W_{2} \in R^{h \times 1}\) are two convolution filters, where \(h\) is the convolution kernel size.

Fig. 3 The residual convolution block

The first convolutional layer is

$$ {\tilde c_{j}} = f\left(W_{1} \cdot c_{j:j + h - 1} + b_{1}\right), $$

and the second is

$$ \hat c_{j} = f\left(W_{2} \cdot \tilde c_{j:j + h - 1} + b_{2} + c_{j}\right), $$

where \(b_{1}\) and \(b_{2}\) are bias terms. The output of the residual convolution block is the vector \(\hat c_{j}\). This block is stacked multiple times in our architecture, with consecutive blocks linked by shortcut connections.
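The two equations of the residual block can be sketched for a single feature sequence as follows. Zero-padding at the end of the sequence, used here so that the shortcut term \(c_{j}\) and the convolved sequence have the same length, is an assumption, as are the toy filter values.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def conv1d(x, w, b):
    """Compute w . x[j:j+h] + b for every j, zero-padding the end of x."""
    h = len(w)
    padded = np.pad(x, (0, h - 1))  # keeps the output as long as x
    return np.array([np.dot(w, padded[j:j + h]) for j in range(len(x))]) + b


def residual_block(c, w1, b1, w2, b2):
    """c_tilde = f(W1 . c + b1); c_hat = f(W2 . c_tilde + b2 + c)."""
    c_tilde = relu(conv1d(c, w1, b1))
    return relu(conv1d(c_tilde, w2, b2) + c)   # "+ c" is the shortcut connection


c = np.array([1.0, -2.0, 3.0])
w = np.array([1.0, 0.0, 0.0])      # toy filter that copies the first window element
out = residual_block(c, w, 0.0, w, 0.0)
print(out)  # relu(relu(c) + c), i.e. 2, 0 and 6
```

Because the block maps a sequence to a sequence of the same length, several blocks can be chained by feeding `out` into the next call.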

Position attention

Attention mechanisms have recently been widely used in machine learning and have achieved great success on various NLP problems. In this paper, we use position attention to enhance relation extraction. First, we apply max-pooling to the output of the residual layers. Second, as shown in Fig. 1, we concatenate the max-pooling results with the position embeddings of the entities. Finally, we use the attention mechanism to weight the resulting sentence representation.

$$ S_{i} = \sum\limits_{i} \alpha_{i} \times P_{i}, $$

where \(\alpha_{i}\) is the attention weight and \(P_{i}\) is the concatenation of the max-pooling results with the position embedding of the entity. Finally, we use the softmax function to normalize and to output the entity relation probability.

$$ \alpha_{i} = \frac{\exp \left(e_{j}^{i}\right)}{\sum\nolimits_{k} \exp \left(e_{j}^{k}\right)} $$
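A minimal sketch of the weighted sum and the softmax normalization is given below. The text does not define how the scores \(e\) are computed, so the dot product with a query vector `q` is purely a hypothetical choice for illustration, and the toy matrix `P` does not come from the paper.

```python
import numpy as np


def softmax(e):
    z = np.exp(e - np.max(e))      # subtract the max for numerical stability
    return z / z.sum()


def attend(P, q):
    """Weight the rows P_i by alpha_i = softmax(e)_i and sum them."""
    scores = P @ q                 # hypothetical scoring: e_i = P_i . q
    alpha = softmax(scores)
    return alpha, alpha @ P        # sum_i alpha_i * P_i


P = np.array([[1.0, 0.0],
              [0.0, 1.0]])         # two toy feature vectors P_1, P_2
alpha, S = attend(P, q=np.zeros(2))
print(alpha, S)                    # equal scores give weights 0.5 and 0.5
```

With equal scores the weights are uniform; a learned scoring function would shift weight toward the features most relevant to the entity pair.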


Dataset and evaluation metrics

With reference to the medical semantic relation annotation specification of the 2010 i2b2/VA Challenge, we established our own relation annotation specification for Chinese EMRs, in which semantic relations between medical concepts fall into five coarse-grained categories and fifteen fine-grained categories. The relation categories are detailed as follows.

Coarse-grained category 1: Treatment-Disease Relation. This category contains five fine-grained categories, including TrID (Treatment improves the disease), TrWD (Treatment worsens the disease), TrCD (Treatment causes the disease), TrAD (Treatment is administered for the disease), and TrNAD (Treatment is not administered because of the disease).

Coarse-grained category 2: Treatment-Symptoms Relation. This category also contains five fine-grained categories, including TrIS (Treatment improves the symptoms), TrWS (Treatment worsens the symptoms), TrCS (Treatment causes the symptoms), TrAS (Treatment is administered for the symptoms), and TrNAS (Treatment is not administered because of the symptoms).

Coarse-grained category 3: Test-Disease Relation. This category contains two fine-grained categories, including TeRD (Test reveals the disease) and TeCD (Test conducted to investigate the disease).

Coarse-grained category 4: Test-Symptoms Relation. This category also contains two fine-grained categories, including TeRS (Test reveals the symptoms) and TeBS (Test based on symptoms).

Coarse-grained category 5: Disease-Symptoms Relation. This category contains only one fine-grained category named as DCS (Disease causes symptoms).
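The category scheme above can be written down as a small lookup structure, which is convenient for mapping fine-grained labels back to their coarse-grained category. The variable names are illustrative only; the labels themselves are those listed above.

```python
# Coarse-grained category -> fine-grained relation labels, as listed above.
RELATION_SCHEMA = {
    "Treatment-Disease": ["TrID", "TrWD", "TrCD", "TrAD", "TrNAD"],
    "Treatment-Symptoms": ["TrIS", "TrWS", "TrCS", "TrAS", "TrNAS"],
    "Test-Disease": ["TeRD", "TeCD"],
    "Test-Symptoms": ["TeRS", "TeBS"],
    "Disease-Symptoms": ["DCS"],
}

# Reverse lookup: fine-grained label -> coarse-grained category.
FINE_TO_COARSE = {
    fine: coarse
    for coarse, fines in RELATION_SCHEMA.items()
    for fine in fines
}

print(len(RELATION_SCHEMA), len(FINE_TO_COARSE))  # 5 15
print(FINE_TO_COARSE["TrNAS"])                    # Treatment-Symptoms
```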

According to our specification, we manually annotated 3000 de-identified Chinese EMR texts from different clinical departments of a grade-A second-class hospital in Gansu Province, China. We use 2000 texts as training data, 500 as development data, and 500 as test data when evaluating our method on this dataset. The number of relations in every fine-grained category of this dataset is given in Table 2. Precision, recall and F1-score are used as evaluation metrics.
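For reference, the three metrics can be computed per relation label as in the sketch below. The gold and predicted label lists are toy values, not data from the corpus, and the way per-label scores are averaged into the overall figures is not specified in the text, so no averaging is shown.

```python
def precision_recall_f1(gold, pred, label):
    """Precision, recall and F1 for one relation label."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: one correct TeRS, one missed TeRS, one spurious TeRS.
gold = ["TeRS", "TeRS", "DCS", "TrAD"]
pred = ["TeRS", "DCS", "DCS", "TeRS"]
print(precision_recall_f1(gold, pred, "TeRS"))  # (0.5, 0.5, 0.5)
```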

Table 2 The relation number of every fine-grained category in the corpus

Models and parameters

We carry out experiments to compare the performance of our model with the models described below.

CNN-Max: This model was used by Sahu et al. [18]. It encodes the sentence vectors with a CNN and outputs the results after max-pooling and a softmax function.

BLSTM-Attention: This model was proposed by Li et al. [19]. It mainly consists of a bidirectional LSTM and an attention mechanism.

ResNet-Max: This model was proposed by Huang et al. [20]. Compared with our model, it does not include an attention mechanism.

ResNet-BLSTM: The basic framework of this method is close to our model. The difference is that this model combines the residual network with a Bi-LSTM.

ResNet-PAtt: This is the model presented in this paper. Table 3 gives the chosen hyper-parameters for all experiments. We tune the hyper-parameters on the development set by random search. We try to share as many hyper-parameters as possible in experiments.

Table 3 Hyper parameters of the residual neural network

Experimental results

Table 4 shows the overall classification performance of different models on our evaluation corpus. Our method, ResNet-PAtt, is better than the other methods in F1-score, while precision, recall and F1-score reach 79.16 and 77.80% respectively. Of the other methods, ResNet-BLSTM achieves the best F1-score, and our model improves on it by 2.97% F1-score, which indicates that our method is more effective. In addition, the residual network-based methods are overall better than the other relation extraction methods.

Table 4 Comparison of the overall relation classification results of different models


The reason our model achieves the best performance may be that the residual network-based model reduces the negative impact of corpus noise on parameter learning, and that the character position attention mechanism enhances the discriminative information for different types of entities. Table 5 gives the classification performance of our model on every fine-grained relation category. As these data show, our model performs best on category TeRS and worst on category TrNAS, which indicates that TrNAS is more difficult to recognize correctly. We also evaluate the training time of different models. Figure 4 shows the time consumed by these models when the number of epochs is set to 5, 10 and 20. Overall, our model takes the shortest time to complete parameter training, and the traditional machine learning method SVM takes the longest time to train.

Fig. 4 Comparison of the training time for different models

Table 5 Classification performance of our model on every fine-grained relation category.

Table 6 compares the F1-score of each model on every fine-grained relation category. Our model has better classification performance and faster response speed.

Table 6 Comparison of F1-score for each model on every fine-grained relation category


In this paper, we propose a deep residual network model based on the attention mechanism to classify the relations of entity pairs in Chinese EMRs. The method reduces the influence of data noise on model training and enhances entity discrimination features with the position attention mechanism, so that entity information can be combined effectively in relation extraction. Experimental results show that the model reaches an F1-score of 77.80% and significantly improves classification performance on categories with few instances. At present, most relation classification methods build on entity recognition and need the entities in the sentence to be specified. In the future, we will study the joint extraction of entities and entity relations to further improve the efficiency of recognizing both simultaneously.


  1. Lin Y, Liu Z, Sun M. Neural relation extraction with multi-lingual attention. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Vancouver: Association for Computational Linguistics: 2017. p. 34–43.


  2. Zheng S, Wang F, Bao H, et al. Joint extraction of entities and relations based on a novel tagging scheme. 2017. arXiv preprint arXiv:1706.05075 [cs.CL].

  3. Takamatsu S, Sato I, Nakagawa H. Reducing wrong labels in distant supervision for relation extraction. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Jeju Island: Association for Computational Linguistics: 2012. p. 721–9.


  4. Miller S, Fox H, Ramshaw L, Weischedel R. A novel use of statistical parsing to extract information from text. In: Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference. Seattle: Association for Computational Linguistics: 2000. p. 226–33.


  5. Kambhatla N. Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations. In: Proceedings of the ACL 2004 on Interactive poster and demonstration sessions. Barcelona: Association for Computational Linguistics: 2004. p. 22–25.


  6. Culotta A, McCallum A, Betz J. Integrating probabilistic extraction models and data mining to discover relations and patterns in text. In: Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics: 2006. p. 296–303.

  7. Wang T, Li Y, Bontcheva K, Cunningham H, Wang J. Automatic extraction of hierarchical relations from text. Eur Semantic Web Conf. 2006; 4011:215–29. Lecture Notes in Computer Science.


  8. Li Y, Jiang J, Chieu H, Chai K. Extracting relation descriptors with conditional random fields. In: Proceedings of 5th International Joint Conference on Natural Language Processing. Chiang Mai: Asian Federation of Natural Language Processing: 2011. p. 392–400.

  9. Mintz M, Bills S, Snow R, Jurafsky D. Distant supervision for relation extraction without labeled data. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2. Suntec: Association for Computational Linguistics: 2009. p. 1003–11.


  10. Socher R, Huval B, Manning C, Ng A. Semantic compositionality through recursive matrix-vector spaces. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning. Jeju Island: Association for Computational Linguistics: 2012. p. 1201–11.


  11. Miwa M, Bansal M. End-to-end relation extraction using LSTMs on sequences and tree structures. 2016. arXiv preprint arXiv:1601.00770 [cs.CL].

  12. Uzuner Ö., South B, Shen S, DuVall S. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J Am Med Inform Assoc. 2011; 18:552–6.


  13. de Bruijn B, Cherry C, Kiritchenko S, Martin J, Zhu X. Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. J Am Med Inform Assoc. 2011; 18:557–62.


  14. Rink B, Harabagiu S, Roberts K. Automatic extraction of relations between medical concepts in clinical texts. J Am Med Inform Assoc. 2011; 18:594–600.


  15. Fang Y, Huang H, Chen H, Juan H. TCMGeneDIT: a database for associated traditional Chinese medicine, gene and disease information using text mining. BMC Complement Alternat Med. 2008; 8:58–9.


  16. Zhou X, Liu B, Wu Z, Feng Y. Integrative mining of traditional Chinese medicine literature and MEDLINE for functional gene networks. Artif Intell Med. 2007; 41:87–104.


  17. Li L, Zhao S. Relation Classification in Electronic Health Records via CNN-LSTM Model. In: Proceedings of the 3rd China Health Information Processing Conference. Shenzhen: Chinese Information Processing Society of China: 2017. p. 40–6.


  18. Sahu S, Anand A, Oruganty K, Gattu M. Relation extraction from clinical texts using domain invariant convolutional neural network. 2016. arXiv preprint arXiv:1606.09370 [cs.CL].

  19. Li L, Nie Y, Han W, Huang J. A Multi-attention-Based Bidirectional Long Short-Term Memory Network for Relation Extraction. Int Conf Neural Inf Process. 2017; 10638:216–27. Lecture Notes in Computer Science.


  20. Huang Y, Wang W. Deep Residual Learning for Weakly-Supervised Relation Extraction. 2017. arXiv preprint arXiv:1707.08866 [cs.CL].



We thank the anonymous reviewers for their insightful comments.


Publication of the article is supported by the National Natural Science Foundation of China (No. 61762081, No. 61662067, No. 61662068) and the Key Research and Development Project of Gansu Province (No. 2017GS10781).

Availability of data and materials

The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.

About this supplement

This article has been published as part of BMC Medical Informatics and Decision Making Volume 19 Supplement 2, 2019: Proceedings from the 4th China Health Information Processing Conference (CHIP 2018). The full contents of the supplement are available online at URL.

Author information

Authors and Affiliations



ZZ and TZ took part in conceiving the study and contributed substantially to writing and revising the manuscript. YZ and YP analyzed the entity relation data of the Chinese electronic medical records and were major contributors in establishing the annotation specification. All authors read and reviewed the final manuscript.

Corresponding author

Correspondence to Zhichang Zhang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Zhang, Z., Zhou, T., Zhang, Y. et al. Attention-based deep residual learning network for entity relation extraction in Chinese EMRs. BMC Med Inform Decis Mak 19 (Suppl 2), 55 (2019).
