  • Research
  • Open Access

Chemical-induced disease extraction via recurrent piecewise convolutional neural networks

BMC Medical Informatics and Decision Making 2018, 18(Suppl 2):60

https://doi.org/10.1186/s12911-018-0629-3

Abstract

Background

Extracting relationships between chemicals and diseases from unstructured literature has attracted considerable attention because these relationships are useful for many biomedical applications, such as drug repositioning and pharmacovigilance. A number of machine learning methods have been proposed for chemical-induced disease (CID) extraction, enabled by publicly available annotated corpora. Most of them, except deep learning methods, suffer from time-consuming feature engineering. In this paper, we propose a novel document-level deep learning method, called recurrent piecewise convolutional neural networks (RPCNN), for CID extraction.

Results

Experimental results on a benchmark dataset, the CDR (Chemical-induced Disease Relation) dataset of the BioCreative V challenge for CID extraction, show that the highest precision, recall and F-score of our RPCNN-based CID extraction system are 65.24%, 77.21% and 70.77% respectively, which is competitive with other state-of-the-art systems.

Conclusions

A novel deep learning method is proposed for document-level CID extraction, in which domain knowledge, a piecewise strategy, an attention mechanism and multi-instance learning are combined. The effectiveness of the method is demonstrated by experiments conducted on a benchmark dataset.

Keywords

  • Chemical-induced disease
  • Relation extraction
  • Deep learning
  • Convolutional neural network

Background

Nowadays, more and more literature rich in domain knowledge is being published. The first step in reusing this literature is to extract biomedical information from it. Chemical-induced disease (CID), which refers to adverse drug reactions, is an important type of information that can be used for drug safety monitoring and medicine development [1], and it has attracted increasing attention.

During the last decade, a large number of methods have been proposed for CID extraction [2]. They fall into three categories: 1) statistics-based methods, 2) rule-based methods, and 3) machine learning-based methods. Statistics-based methods determine CIDs according to the distributions of chemicals and diseases. For example, Chen et al. [3] discovered drug side effects by analyzing co-occurrences of drugs and adverse reactions in biomedical literature, and Mao et al. [4] used a similar method to mine drug side effects from social media. Statistics-based methods usually achieve high recall, but their precision is low. Rule-based methods rely on hand-written patterns. For example, Khoo et al. [5] used manually constructed graphical patterns derived from syntactic parse trees to extract causal relations between drugs and adverse events in MEDLINE abstracts. Rule-based methods usually need domain experts, constructing rules is time-consuming, and manually crafted rules are not easily applicable to other corpora. To increase the generalizability of rules, Xu and Wang [6] learned syntactic patterns from sentences containing known drug side effect pairs and used them to extract drug side effects from biomedical literature. Machine learning-based methods became feasible for CID extraction once manually annotated corpora, such as the corpus of the BioCreative V chemical-induced disease relation (CDR) challenge [7], became publicly available. Support vector machine (SVM) is the most commonly used machine learning method. Xu et al. [8] won the BioCreative V CDR challenge using an SVM-based system, but such systems depend on laborious feature engineering. To avoid this feature engineering, deep learning methods were applied to CID extraction [9], including convolutional neural networks (CNN) [10] and long short-term memory neural networks (LSTM) [11].
These systems do not consider domain knowledge about adverse drug reactions or techniques widely used in other domains, such as the piecewise strategy [12] and the attention mechanism [13]. Subsequently, Li et al. [14] adopted a piecewise CNN to extract both intra-sentence and inter-sentence chemical-disease relations with a uniform model. Gu et al. [15] improved the CNN model by adding cross-sentence syntactic information, which further improved performance. However, all these methods extract chemical-disease relations from single sentences or adjacent sentences, and none of them considers document-level information. In a document, two entities usually appear more than once, and it is difficult to determine which sentence or paragraph describes a relation. To facilitate efficient document-level relation extraction from biological text, Verga et al. [16] proposed Bi-affine Relation Attention Networks (BRAN), a combination of network architecture, multi-instance learning and multi-task learning. In this paper, we propose a novel document-level deep learning method for CID extraction, called recurrent piecewise convolutional neural networks (RPCNN). It should be noted that this paper is an extension of our previous paper [14].

Methods

Overview

There are usually two steps in chemical-induced disease extraction: 1) candidate generation – generating all possible related pairs of chemicals and diseases, denoted by <chemical, disease>; 2) candidate classification – determining whether each <chemical, disease> pair generated in the previous step is related.

Candidate generation

Given a biomedical record with m chemical mentions and n disease mentions, all m × n <chemical, disease> pairs are treated as candidate pairs. In this study, we merge the <chemical, disease> pairs that share the same chemical and disease identifiers into a single candidate, denoted by <chemical identifier, disease identifier>. An example of candidate generation is shown in Table 1. The record contains 2 chemical mentions (“terbutaline” × 2) and 4 disease mentions (“Cardiovascular complications”, “cardiovascular complications” and “preterm labor” × 2). The two chemical mentions have the same MeSH (Medical Subject Headings) [17] identifier (D013726), and the 4 disease mentions correspond to 2 MeSH identifiers (cardiovascular complications: D002318; preterm labor: D007752), so two candidates are generated: <D013726, D002318> and <D013726, D007752>. Each candidate is a document-level candidate corresponding to multiple <chemical, disease> pairs, and each <chemical, disease> pair is an instance. Therefore, there are eight instances corresponding to two candidates in Table 1.
Table 1

An example of candidate generation (Literature with chemical and disease mentions and their identifiers)

| Start | End | Mention | Label | Identifier (MeSH) |
|---|---|---|---|---|
| 0 | 28 | Cardiovascular complications | Disease | D002318 |
| 45 | 56 | terbutaline | Chemical | D013726 |
| 71 | 84 | preterm labor | Disease | D007752 |
| 93 | 121 | cardiovascular complications | Disease | D002318 |
| 169 | 180 | terbutaline | Chemical | D013726 |
| 185 | 198 | preterm labor | Disease | D007752 |

| Identifier (MeSH) | Chemical start | Chemical end | Chemical mention | Disease start | Disease end | Disease mention |
|---|---|---|---|---|---|---|
| <D013726, D002318> | 45 | 56 | terbutaline | 0 | 28 | Cardiovascular complications |
| <D013726, D002318> | 45 | 56 | terbutaline | 93 | 121 | cardiovascular complications |
| <D013726, D002318> | 169 | 180 | terbutaline | 0 | 28 | Cardiovascular complications |
| <D013726, D002318> | 169 | 180 | terbutaline | 93 | 121 | cardiovascular complications |
| <D013726, D007752> | 45 | 56 | terbutaline | 71 | 84 | preterm labor |
| <D013726, D007752> | 45 | 56 | terbutaline | 185 | 198 | preterm labor |
| <D013726, D007752> | 169 | 180 | terbutaline | 71 | 84 | preterm labor |
| <D013726, D007752> | 169 | 180 | terbutaline | 185 | 198 | preterm labor |

Cardiovascular complications associated with terbutaline treatment for preterm labor

Abstract: Severe cardiovascular complications occurred in eight of 160 patients treated with terbutaline for preterm labor. Associated corticosteroid therapy and twin gestations appear to be predisposing factors. Potential mechanisms of the pathophysiology are briefly discussed
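
The grouping step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the tuple layout and the function name `generate_candidates` are assumptions made for the example, and the mention data are copied from Table 1.

```python
from collections import defaultdict
from itertools import product

# Mentions from Table 1 as (start, end, text, label, MeSH identifier).
mentions = [
    (0, 28, "Cardiovascular complications", "Disease", "D002318"),
    (45, 56, "terbutaline", "Chemical", "D013726"),
    (71, 84, "preterm labor", "Disease", "D007752"),
    (93, 121, "cardiovascular complications", "Disease", "D002318"),
    (169, 180, "terbutaline", "Chemical", "D013726"),
    (185, 198, "preterm labor", "Disease", "D007752"),
]


def generate_candidates(mentions):
    """Group every <chemical, disease> mention pair by its identifier pair.

    Hypothetical helper: each key is a document-level candidate and each
    value is the list of instances (mention pairs) belonging to it.
    """
    chemicals = [m for m in mentions if m[3] == "Chemical"]
    diseases = [m for m in mentions if m[3] == "Disease"]
    candidates = defaultdict(list)
    for chem, dis in product(chemicals, diseases):
        candidates[(chem[4], dis[4])].append((chem, dis))
    return dict(candidates)


candidates = generate_candidates(mentions)
for key, instances in candidates.items():
    print(key, len(instances))
```

With the Table 1 data this yields two candidates with four instances each, matching the eight instances counted in the text.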

Candidate classification

A four-layer recurrent piecewise convolutional neural network (RPCNN) is proposed for CID extraction, as shown in Fig. 1. A piecewise CNN (the same as in Li et al. [14]) is used to represent each instance of a candidate, and an RNN is used to combine the representations of all of the candidate’s instances in a record into the document-level representation of the candidate.
Fig. 1

Architecture of recurrent piecewise convolutional neural networks (RPCNN) for multi-instance learning

Input layer

Given a candidate, the corresponding multiple instances I0, I1, …, Im are arranged in descending order of the length of the context between the two entity mentions, measured by the number of words within the context. For each instance, we take the two entity mentions, the context between them, and the context before and after them in the same sentence as the instance’s input. To distinguish chemical mentions from disease mentions, they are enclosed in “<ENTC> ... </ENTC>” and “<ENTD> ... </ENTD>” respectively. Then, an instance’s input is divided into three parts: 1) S−1: context before the first entity mention (e.g., “Severe ... with” before “<ENTC> terbutaline </ENTC>” in Table 2); 2) S0: context between the two entity mentions (e.g., “for” in Table 2); and 3) S1: context after the second entity mention (e.g., “.” after “<ENTD> preterm labor </ENTD>” in Table 2). Each word of an instance’s input is represented by a word embedding and by embeddings of its positions relative to the chemical and disease mentions (see Table 2). For convenience, the lengths of all instances’ inputs (i.e., the numbers of words within the inputs) are set to the maximum length (denoted by l), and shorter inputs are padded to that length. Given an instance <c, a> with input \( S={w}_1{w}_2\dots {w}_l \), suppose that the positions of c and a in S are \( {p}_c \) and \( {p}_a \) respectively. Word \( {w}_i \) is then represented by \( {x}_i={\left[{e}_{w_i}^{\mathrm{T}},{e}_{d_{ic}}^{\mathrm{T}},{e}_{d_{ia}}^{\mathrm{T}}\right]}^{\mathrm{T}} \), where \( {e}_{w_i} \) (with \( {w}_i\in V \)), \( {e}_{d_{ic}} \) and \( {e}_{d_{ia}} \) correspond to a \( {d}_w \)-dimensional word embedding, a \( {d}_{p^c} \)-dimensional position embedding and a \( {d}_{p^a} \)-dimensional position embedding, \( {d}_{ic}=i-{p}_c \) and \( {d}_{ia}=i-{p}_a \) are the relative distances from \( {w}_i \) to c and a respectively (\( -n+1\le {d}_{ic},{d}_{ia}\le n-1 \)), and V is the word vocabulary.
Then \( S={w}_1{w}_2\dots {w}_l \) is represented by a matrix \( x=\left[{x}_1,{x}_2,\dots, {x}_l\right]\in {R}^{\left({d}_w+{d}_{p^c}+{d}_{p^a}\right)\times l} \).
Table 2

Example of chemical position and disease position
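
As a rough illustration of the input layer, the sketch below splits a tokenized instance into S−1, S0 and S1 and computes the relative distances d_ic = i − p_c and d_ia = i − p_a. It is a sketch under assumptions: whitespace tokenization, the treatment of the <ENTC>/<ENTD> markers as ordinary tokens, the choice of the first token of each mention as its position, and the helper names are all hypothetical and not taken from the paper.

```python
def split_instance(tokens, first_span, second_span):
    """Split tokens into (S_-1, S_0, S_1) around two entity spans.

    Spans are (start, end) token indices with end exclusive, and
    first_span is assumed to occur before second_span.
    """
    before = tokens[:first_span[0]]
    between = tokens[first_span[1]:second_span[0]]
    after = tokens[second_span[1]:]
    return before, between, after


def relative_positions(length, entity_pos):
    """Return d_i = i - entity_pos for every token index i."""
    return [i - entity_pos for i in range(length)]


sentence = ("Severe cardiovascular complications occurred in eight of 160 "
            "patients treated with <ENTC> terbutaline </ENTC> for "
            "<ENTD> preterm labor </ENTD> .")
tokens = sentence.split()

chem_span = (tokens.index("<ENTC>"), tokens.index("</ENTC>") + 1)
dis_span = (tokens.index("<ENTD>"), tokens.index("</ENTD>") + 1)
s_before, s_between, s_after = split_instance(tokens, chem_span, dis_span)

# Assumption: a mention's position is the index of its first word.
p_c = chem_span[0] + 1
p_a = dis_span[0] + 1
d_c = relative_positions(len(tokens), p_c)
d_a = relative_positions(len(tokens), p_a)
print(s_before, s_between, s_after)
```

Each distance would then index a learned position-embedding table, and the word and position embeddings would be concatenated to form x_i.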

Piecewise convolutional layer

The convolutional layer takes the matrix x of each instance’s input and generates high-level feature vectors by convolving filters at multiple scales across x, where the filters need to be learnt. Given a filter of size k, \( t\in {R}^{\left({d}_w+{d}_{p^c}+{d}_{p^a}\right)\times k} \), for example, a feature vector \( f={\left[{f}_1,{f}_2,\dots, {f}_{l-k+1}\right]}^{\mathrm{T}}\in {R}^{l-k+1} \) is generated by sliding filter t across x with a convolution operator followed by a non-linear activation (the rectified linear unit function, Relu, in this example) as follows:
$$ {f}_i= Relu\left(t\bullet {x}_{i:i+k-1}+b\right), $$
where \( {x}_{i:i+k-1}={\left[{x}_i,{x}_{i+1},\dots, {x}_{i+k-1}\right]}^{\mathrm{T}} \) is the context representation of \( {w}_i{w}_{i+1}\dots {w}_{i+k-1} \) within a k-word window, and \( b\in R \) is a bias. Each filter corresponds to one high-level feature vector, so the number of filters determines the number of feature vectors obtained.
To reduce the spatial size of the representation of each instance, and with it the number of parameters and the amount of computation, max pooling is adopted to select the most important features from all the features generated in the convolutional layer:
$$ \overline{f_t}=\max \left\{{f}_{t,1},\kern0.5em {f}_{t,2},\dots, {f}_{t,l-k+1}\right\}, $$

where \( \left({f}_{t,1},{f}_{t,2},\dots, {f}_{t,l-k+1}\right) \) is the feature vector corresponding to filter t, and \( \overline{f_t} \) is the maximum feature. If there are q filters, a new q-dimensional vector is generated to represent S, denoted by \( z={\left[\overline{f_1},\overline{f_2},\dots, \overline{f_q}\right]}^{\mathrm{T}} \). In addition, the piecewise strategy, which applies pooling to the individual parts (i.e., S−1, S0 and S1) and concatenates the outputs of all pooling layers, is also adopted in our study.
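
The convolution and piecewise pooling for a single filter can be sketched with NumPy as follows. This is a minimal sketch, not the authors' implementation: the random inputs, the function names and the segment boundaries (which feature indices belong to S−1, S0 and S1) are assumptions chosen only for illustration.

```python
import numpy as np


def conv_features(x, t, b):
    """Compute f_i = ReLU(t . x_{i:i+k-1} + b) for one filter t of width k.

    x has shape (d, l) and t has shape (d, k); the result has l - k + 1
    entries, one per k-word window.
    """
    _, l = x.shape
    k = t.shape[1]
    f = np.array([np.sum(t * x[:, i:i + k]) + b for i in range(l - k + 1)])
    return np.maximum(f, 0.0)


def piecewise_max_pool(f, boundaries):
    """Max-pool f separately over the segments delimited by boundaries."""
    pieces = np.split(f, boundaries)
    return np.array([p.max() if p.size else 0.0 for p in pieces])


rng = np.random.default_rng(0)
d, l, k = 40, 20, 3                # d = d_w + d_pc + d_pa = 30 + 5 + 5
x = rng.normal(size=(d, l))        # stand-in for the embedded instance
t = rng.normal(size=(d, k))        # one filter (learnt in practice)
f = conv_features(x, t, b=0.1)

# Hypothetical boundaries separating the features of S_-1, S_0 and S_1.
z_piece = piecewise_max_pool(f, boundaries=[11, 15])
print(f.shape, z_piece.shape)
```

With q filters, concatenating the three pooled values of every filter gives the piecewise representation of the instance.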

Before pooling, attention mechanism is used to measure feature importances for each class as follows:
$$ {\mathrm{G}}_t={f_t}^T\mathbf{M}\ {\boldsymbol{W}}^{\boldsymbol{classes}}, $$
$$ {A}_{i,j}=\frac{\exp \left({G}_{i,j}\right)}{\sum_{k=1}^n\exp \left({G}_{k,j}\right)}, $$

where G is a correlation matrix between the features f of each filter t and the relation class embedding Wclasses, M and Wclasses are weight matrices that need to be learnt, A is an attention matrix, and Ai, j and Gi, j are the (i, j)-th entries of A and G, respectively. We use a uniform distribution to initialize M, and an identity matrix to initialize Wclasses.

When the attention mechanism is adopted, the output of the pooling layer becomes:
$$ \overline{{f_{t,i}}^A}={\mathit{\max}}_j{\left({f}_tA\right)}_{i,j}, $$

where \( \overline{{f_{t,i}}^A} \) and (ftA)i, j are the i-th item of \( \overline{{f_t}^A} \) and the (i, j)-th item of ftA, respectively.
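
One possible reading of the attention-based pooling is sketched below. It rests on assumptions that the text does not state explicitly: the feature vectors of all q filters are stacked into a matrix F of shape (q, n), where n is the number of window positions; M has shape (q, c) for c relation classes; and the softmax is taken over positions for each class. The function name and the sizes are hypothetical.

```python
import numpy as np


def attention_pool(F, M, W_classes):
    """Attention-weighted pooling under the assumptions stated above.

    G = F^T M W_classes has shape (n, c); A is a softmax of G over the
    n positions for each class; the pooled value of filter i is
    max_j (F A)_{i,j}.
    """
    G = F.T @ M @ W_classes
    G = G - G.max(axis=0, keepdims=True)          # numerical stability
    expG = np.exp(G)
    A = expG / expG.sum(axis=0, keepdims=True)
    pooled = (F @ A).max(axis=1)
    return pooled, A


rng = np.random.default_rng(0)
q, n, c = 4, 18, 2                                # hypothetical sizes
F = rng.normal(size=(q, n))
M = rng.uniform(-0.1, 0.1, size=(q, c))           # uniform initialization
W_classes = np.eye(c)                             # identity initialization
pooled, A = attention_pool(F, M, W_classes)
print(pooled.shape, A.shape)
```

Each column of A sums to one, so F A replaces the hard maximum over positions with a class-specific weighted average before the final maximum over classes.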

RNN layer

In this layer, an RNN is used to model the multiple instances of a candidate. For each instance Ii, the corresponding RNN cell takes the output of the piecewise convolutional layer (i.e., zi) and the previous hidden vector hi − 1 as input, and outputs the hidden vector hi using a non-linear transformation function ρ, that is, hi = ρ(zi, hi − 1). The last hidden vector hm is used as the representation of the multiple instances of a candidate, which is a document-level representation.
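
The recurrence hi = ρ(zi, hi − 1) can be sketched as a simple fold over the instance representations. For brevity this sketch uses a plain tanh cell as ρ, which is a substitution made here for illustration; the weight names, sizes and random values are likewise assumptions.

```python
import numpy as np


def document_representation(Z, W_z, W_h, b):
    """Fold instance vectors z_1..z_m into the last hidden vector h_m.

    Uses h_i = tanh(W_z z_i + W_h h_{i-1} + b) as a stand-in for rho.
    """
    h = np.zeros(W_h.shape[0])
    for z in Z:                      # instances in their sorted order
        h = np.tanh(W_z @ z + W_h @ h + b)
    return h


rng = np.random.default_rng(0)
m, q, hidden = 8, 6, 5               # hypothetical sizes
Z = rng.normal(size=(m, q))          # one row per instance of a candidate
W_z = rng.normal(scale=0.1, size=(hidden, q))
W_h = rng.normal(scale=0.1, size=(hidden, hidden))
b = np.zeros(hidden)
h_m = document_representation(Z, W_z, W_h, b)
print(h_m.shape)
```

Because only the final hidden vector is kept, candidates with different numbers of instances all map to a representation of the same size.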

Softmax layer

In this layer, a fully connected neural network is used for classification. The neural network takes the following two parts as input: 1) hm from the RNN layer presented above; 2) features extracted from four domain knowledge bases, the same as Xu et al.’s system [8], as follows:
  1. The CTD repository [18], which contains relationships between drugs and diseases, such as inferred-association, therapeutic and marker/mechanism, manually summarized by experts.
  2. The Drugs and Indications Database (MEDI) [19], which records common drugs with common indications.
  3. SIDER (Drug Side Effects Database) [20], which records common drugs with common side effects.
  4. Medical Subject Headings (MeSH), which records hierarchical (superordinate and subordinate) relationships between drugs and diseases.

The one-hot features extracted from the domain knowledge bases are first converted into dense features (denoted by v) by a 1-layer neural network. For candidate classification, we use the sigmoid function as follows:
$$ O\left({\boldsymbol{v}}^{\prime}\right)={\left(1+{e}^{\boldsymbol{u}\bullet {\boldsymbol{v}}^{\prime }}\right)}^{-1}, $$

where \( {\boldsymbol{v}}^{\prime }={\left[{\boldsymbol{h}}_{\boldsymbol{m}}^{\mathbf{T}},{\boldsymbol{v}}^{\mathrm{T}}\right]}^{\mathrm{T}} \), and u is a weight vector.
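
The classification step can be sketched directly from the formula as printed. Note that the exponent here has no minus sign, whereas the conventional logistic function is written with one; since u is learnt, the two forms differ only in the sign of u. The function name and the example values are assumptions.

```python
import numpy as np


def classify(h_m, v, u):
    """Score a candidate as O(v') = (1 + exp(u . v'))^-1, v' = [h_m; v]."""
    v_prime = np.concatenate([h_m, v])
    return 1.0 / (1.0 + np.exp(u @ v_prime))


h_m = np.array([0.2, -0.4, 0.1])     # document-level representation
v = np.array([1.0, 0.0])             # dense knowledge features
u = np.array([0.5, -0.3, 0.8, -1.2, 0.4])
score = classify(h_m, v, u)
print(score)
```

A candidate would typically be labeled as a CID relation when the score exceeds a threshold such as 0.5, although the threshold is not specified in the text.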

Dataset

Our method is evaluated on the CDR corpus of the BioCreative V challenge. This corpus contains 1500 manually annotated PubMed records: 1000 records are used as the training and development sets, and the remaining 500 records as the test set. In the training and development sets, there are 10,550 chemical mentions and 8426 disease mentions, corresponding to 3829 and 2973 MeSH identifiers respectively, and 2050 relations. In the test set, there are 5385 chemical mentions and 4424 disease mentions, corresponding to 1988 and 1435 MeSH identifiers respectively, and 1066 relations.

Experimental settings

We start with a simple CNN-based baseline system that selects only the last instance of every candidate in the input layer and uses none of the domain knowledge, the piecewise strategy or the attention mechanism. We then compare it with CNN-based systems that add these components step by step, and with RPCNN. In addition, our best CNN-based and RPCNN-based systems are compared with other state-of-the-art systems that use a single machine learning method. Precision (P), recall (R) and F-score (F) are used to measure the performance of all systems, and they are calculated by the official evaluation tool of the BioCreative V organizer.
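
For reference, the three measures can be computed from sets of predicted and gold candidates as in the sketch below. This is the standard definition, not a reproduction of the official evaluation tool, and the identifiers in the example are made up.

```python
def precision_recall_f(predicted, gold):
    """Return (P, R, F) for predicted and gold sets of candidates."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                     # true positives
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical <chemical identifier, disease identifier> candidates.
gold = {("chem_a", "dis_x"), ("chem_b", "dis_y")}
predicted = {("chem_a", "dis_x"), ("chem_a", "dis_z")}
p, r, f = precision_recall_f(predicted, gold)
print(p, r, f)
```

The F-score is the harmonic mean of precision and recall, so it is high only when both are high.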

10-fold cross-validation is used to optimize all hyperparameters of our system on the training and development sets. Finally, dw, \( {d}_{p^c} \) and \( {d}_{p^a} \) are set to 30, 5 and 5 respectively. CBOW is used to initialize the word embeddings on a large-scale unannotated corpus from Medline, and the position embeddings are initialized by a uniform distribution. Filters at scales of 3 and 4 are selected, and the number of filters at each scale is set to 150. In the RNN layer, we use an LSTM cell with a hidden state size of 150 as the RNN cell. In the softmax layer, we follow Srivastava et al.’s work [21] and randomly drop out units from the network to prevent overfitting during training, with the dropout probability set to 0.25. The number of units of the neural network for knowledge feature conversion is set to 120.

Results

The precision, recall and F-score of the baseline system (CNN in Table 3, where the best performance in each column is in bold) are 50.47%, 55.61% and 52.92%. As in [8], the CNN-based systems are significantly improved by the domain knowledge. Taking the baseline system as an example, when the domain knowledge is added, the system’s F-score improves by 15.72 percentage points (52.92% vs 68.64%). Both the piecewise strategy and the attention mechanism are beneficial to the CNN-based systems, and they are complementary to each other. For example, when the piecewise strategy is added to the baseline system (CNN + piecewise in Table 3), the system’s F-score increases from 52.92% to 54.20%, while when the attention mechanism is added to the baseline system before pooling (CNN + attention), the F-score increases slightly from 52.92% to 52.99%. When both the piecewise strategy and the attention mechanism are added to the baseline system (CNN + attention + piecewise), the system’s F-score is further improved to 55.94%. When the domain knowledge is added, the effects of the piecewise strategy and the attention mechanism decrease. For example, the F-score difference between CNN and CNN + piecewise is 0.39 percentage points with domain knowledge, but 1.28 percentage points without it. Among all CNN-based systems, the system that uses domain knowledge, the piecewise strategy and the attention mechanism achieves the highest F-score, which is 69.09%. The RPCNN-based system (RPCNN) outperforms CNN + attention + piecewise. Without domain knowledge, RPCNN achieves an F-score of 59.10%, which is 3.16 percentage points higher than that of CNN + attention + piecewise; with domain knowledge, RPCNN achieves an F-score of 70.77%, which is 1.68 percentage points higher.
Table 3

Performance of our CNN-based and RPCNN-based systems for chemical-induced disease extraction

| Methods | Without domain knowledge: P (%) | R (%) | F (%) | With domain knowledge: P (%) | R (%) | F (%) |
|---|---|---|---|---|---|---|
| CNN | 50.47 | 55.61 | 52.92 | 63.70 | 74.40 | 68.64 |
| CNN + piecewise | 54.48 | 53.91 | 54.20 | 63.83 | 75.16 | 69.03 |
| CNN + attention | 48.40 | 58.54 | 52.99 | 62.28 | 76.58 | 68.69 |
| CNN + attention + piecewise | 57.80 | 54.20 | 55.94 | 59.97 | 81.49 | 69.09 |
| RPCNN | 55.17 | 63.63 | 59.10 | 65.24 | 77.21 | 70.77 |

Moreover, our best CNN-based and RPCNN-based systems are compared with other state-of-the-art systems that use a single machine learning method, including Xu et al.’s system developed for the CDR task of the BioCreative V challenge [8], Zhou et al.’s LSTM-based and CNN-based systems [9], Gu et al.’s CNN-based system [15] and Verga et al.’s BRAN-based system [16]. Table 4 lists the results of the comparison, where “/” denotes that no result was reported, and the best performance in each column is in bold. Compared with Xu et al.’s system, our systems achieve a much higher F-score whether or not domain knowledge is used. Without domain knowledge, our best CNN-based system is higher by 5.21 percentage points (55.94% vs 50.73%) and our RPCNN-based system by 8.37 percentage points (59.10% vs 50.73%); with domain knowledge, our RPCNN-based system is higher by 3.61 percentage points (70.77% vs 67.16%). Compared with Zhou et al.’s systems, our systems also achieve a much higher F-score: the difference ranges from 2.84 to 8.78 percentage points for our best CNN-based system and from 6.00 to 11.94 percentage points for our RPCNN-based system. Compared with Gu et al.’s system, our CNN-based system does not perform better, but our RPCNN-based system is better by 1.90 percentage points in F-score. Verga et al.’s BRAN-based system achieves an F-score 3.00 percentage points higher than that of our RPCNN-based system when it takes entity recognition into account, which significantly improves the performance of relation extraction. Without the entity recognition multi-task objective, the F-score of the BRAN-based system is only 55.50%.
Table 4

Comparison between our systems and other state-of-the-art systems

| Methods | Without domain knowledge: P (%) | R (%) | F (%) | With domain knowledge: P (%) | R (%) | F (%) |
|---|---|---|---|---|---|---|
| Xu et al. [8] | 59.60 | 44.00 | 50.73 | 65.80 | 68.57 | 67.16 |
| Zhou et al. (LSTM) [9] | 54.91 | 51.41 | 53.10 | / | / | / |
| Zhou et al. (CNN) [9] | 41.13 | 55.25 | 47.16 | / | / | / |
| Gu et al. (CNN) [15] | 59.70 | 55.00 | 57.20 | / | / | / |
| Verga et al. (BRAN) [16] | 55.60 | 70.80 | 62.10 | / | / | / |
| Our CNN | 57.80 | 54.20 | 55.94 | 59.97 | 81.49 | 69.09 |
| Our RPCNN | 55.17 | 63.63 | 59.10 | 65.24 | 77.21 | 70.77 |

Discussion

In this paper, we propose RPCNN for CID extraction, where domain knowledge, piecewise strategy, attention mechanism and multi-instance learning are naturally combined. The RPCNN-based system on a benchmark corpus shows state-of-the-art performance.

Similar to previous studies on CNN-based relation extraction in other domains, the piecewise strategy and the attention mechanism are effective in our CNN-based system. The attention mechanism gives the system some ability to handle cases in which the chemical mention is far away from the disease mention, especially when they are not in the same sentence. The piecewise strategy helps when one part of the context is much longer than the others. For example, the candidate <“AKI”, “cisplatin”> with the context “The primary outcome was acute kidney injury (<ENTD> AKI </ENTD>). RESULTS: We evaluated 143 patients who received single-agent <ENTC> cisplatin </ENTC>”, where S1 is much longer and more complex than S−1 and S0, is wrongly labeled as 0 without the piecewise strategy but correctly labeled as 1 with it. However, both types of cases remain challenging. When we evaluate our system (CNN + attention + piecewise in Table 3) only on cases in which the chemical mention and the disease mention are not in the same sentence, the precision, recall and F-score are only 53.15%, 26.07% and 34.99% respectively.

Compared with CNN-based systems, our RPCNN-based system performs better. The main reason is that RPCNN provides a document-level representation for every candidate as all corresponding instances are considered, while CNN only selects one instance to represent a candidate by removing other instances where there may be different descriptions about relations.

Our study has two limitations. 1) The chemical mentions and disease mentions themselves are ignored in the input layer, although they may be helpful for CID extraction. In future work, we will try to integrate the chemical and disease mentions into the input layer for further improvement. 2) The effectiveness of our method is validated on an independent test set from the same resource (the BioCreative V challenge), but not on the latest papers. We will manually label a corpus from PubMed that includes the latest papers as a separate test set for further validation.

Conclusion

In this paper, we propose a novel document-level deep learning method for CID extraction. The proposed method naturally combines domain knowledge, piecewise strategy, attention mechanism and multi-instance learning together. The effectiveness of the method is validated on a benchmark corpus, and the system based on the proposed method shows competitive performance with other state-of-the-art systems.

Notes

Abbreviations

CDR: Chemical-induced Disease Relation

CID: Chemical-induced Disease

CNN: Convolutional Neural Network

LSTM: Long Short Term Memory Neural Networks

RNN: Recurrent Neural Network

Declarations

Funding

This paper is supported in part by grants: National Natural Science Foundations of China (61573118, 61473101), Special Foundation for Technology Research Program of Guangdong Province (2015B010131010), Strategic Emerging Industry Development Special Funds of Shenzhen (JCYJ20160531192358466 and JCYJ20170307150528934) and Innovation Fund of Harbin Institute of Technology (HIT.NSRIF.2017052). The publication fee of this paper is supported by JCYJ20160531192358466. The funding agency was not involved in the design of this study, the analysis and interpretation of data, or the writing of the manuscript.

Availability of data and materials

The codes used in the experiments are now available at https://github.com/wglassly/CID_ATTCNN.

About this supplement

This article has been published as part of BMC Medical Informatics and Decision Making Volume 18 Supplement 2, 2018: Selected extended articles from the 2nd International Workshop on Semantics-Powered Data Analytics. The full contents of the supplement are available online at https://bmcmedinformdecismak.biomedcentral.com/articles/supplements/volume-18-supplement-2.

Authors’ contributions

HL, MY, QC and BT designed the study together. HL and QC performed the experiments. HL, MY and BT analyzed the results. HL and BT wrote the manuscript. XW and JY reviewed and edited the manuscript. All authors read and approved the manuscript.

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Authors’ Affiliations

(1)
Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology, Shenzhen, Guangdong, China
(2)
Shenzhen Calligraphy Digital Simulation Technology Engineering Laboratory, Harbin Institute of Technology, Shenzhen, Guangdong, China
(3)
Pharmacy Department, Shenzhen Second People’s Hospital, First Affiliated Hospital of Shenzhen University, Shenzhen, Guangdong, China
(4)
Yidu Cloud (Beijing) Technology Co., Ltd, Beijing, China

References

  1. Kang N, Singh B, Bui C, Afzal Z, van Mulligen EM, Kors JA. Knowledge-based extraction of adverse drug events from biomedical text. BMC Bioinformatics. 2014;15(1):64.
  2. Zhou D, Zhong D, He Y. Biomedical relation extraction: from binary to complex. Comput Math Methods Med. 2014.
  3. Chen ES, Hripcsak G, Xu H, Markatou M, Friedman C. Automated acquisition of disease–drug knowledge from biomedical and clinical documents: an initial study. J Am Med Inform Assoc. 2008;15(1):87–98.
  4. Mao JJ, Chung A, Benton A, Hill S, Ungar L, Leonard CE, et al. Online discussion of drug side effects and discontinuation among breast cancer survivors. Pharmacoepidemiol Drug Saf. 2013;22(3):256–62.
  5. Khoo CS, Chan S, Niu Y. Extracting causal knowledge from a medical database using graphical patterns. In: Proceedings of the 38th annual meeting on Association for Computational Linguistics. Association for Computational Linguistics; 2000. p. 336–43.
  6. Xu R, Wang Q. Automatic construction of a large-scale and accurate drug-side-effect association knowledge base from biomedical literature. J Biomed Inform. 2014;51:191–9.
  7. Li J, Sun Y, Johnson RJ, Sciaky D, Wei C-H, Leaman R, et al. BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database. 2016;2016:baw068.
  8. Xu J, Wu Y, Zhang Y, Wang J, Lee H-J, Xu H. CD-REST: a system for extracting chemical-induced disease relation in literature. Database. 2016;2016:baw036.
  9. Zhou H, Deng H, Chen L, Yang Y, Jia C, Huang D. Exploiting syntactic and semantics information for chemical–disease relation extraction. Database J Biol Databases Curation. 2016.
  10. Zhang X, Zhao J, LeCun Y. Character-level convolutional networks for text classification. Adv Neural Inf Proces Syst. 2015;1:649–57.
  11. Liu P, Qiu X, Huang X. Recurrent neural network for text classification with multi-task learning. arXiv preprint arXiv:1605.05101. 2016.
  12. Zeng D, Liu K, Chen Y, Zhao J. Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks, in Proceedings of EMNLP 2015, Lisbon, Portugal, September; 2015:17–21.
  13. Zhou P, Shi W, Tian J, Qi Z, Li B, Hao H, et al. Attention-based bidirectional long short-term memory networks for relation classification. In: The 54th annual meeting of the Association for Computational Linguistics; 2016.
  14. Li H, Chen Q, Tang B, Wang X. Chemical-induced disease extraction via convolutional neural networks with attention. In: 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Kansas City, MO, USA; 2017. p. 1276–9.
  15. Gu et al. Chemical-induced disease relation extraction via convolutional neural network. Database (Oxford). 2017;2017:bax024.
  16. Verga P, Strubell E, McCallum A. Simultaneously self-attending to all mentions for full-abstract biological relation extraction. Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (HLT/NAACL). 2018.
  17. Lipscomb CE. Medical subject headings (MeSH). Bull Med Libr Assoc. 2000;88(3):265.
  18. Davis AP, Grondin CJ, Johnson RJ, Sciaky D, King BL, McMorran R, et al. The comparative toxicogenomics database: update 2017. Nucleic Acids Res. 2017;45(D1):D972–8.
  19. Wei WQ, Cronin RM, Xu H, Lasko TA, Bastarache L, Denny JC. Development and evaluation of an ensemble resource linking medications to their indications. J Am Med Inform Assoc. 2013;20:954–61.
  20. Kuhn M, Letunic I, Jensen LJ, Bork P. The SIDER database of drugs and side effects. Nucleic Acids Res. 2016;44(Database issue):D1075–9. https://doi.org/10.1093/nar/gkv1075.
  21. Srivastava N, Hinton GE, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15(1):1929–58.
