
Detection of medical text semantic similarity based on convolutional neural network



Imaging examinations, such as ultrasonography, magnetic resonance imaging and computed tomography scans, play key roles in healthcare settings. To assess and improve the quality of imaging diagnosis, we need to manually find and compare pre-existing imaging and pathology reports that cover overlapping body sites in electronic medical records (EMRs). Retrieving those reports is time-consuming. In this paper, we propose a convolutional neural network (CNN) based method that makes better use of the semantic information in report texts to accelerate the retrieval process.


We included 16,354 imaging and pathology report-pairs from 1926 patients who were admitted to Shanghai Tongren Hospital and had ultrasonic examinations between 1st May 2017 and 31st July 2017. We adapted the CNN model to calculate the similarity within each report-pair and identify target report-pairs with overlapping body sites, and compared its performance with six conventional models: keyword mapping, latent semantic analysis (LSA), latent Dirichlet allocation (LDA), Doc2Vec, Siamese long short-term memory (LSTM) and a model based on named entity recognition (NER). We also used a graph embedding method to enhance the word representations by capturing semantic relations from medical ontologies. Additionally, we used the LIME algorithm to identify which features (or words) are decisive for the prediction results, which improved model interpretability.


Experimental results showed that our CNN model significantly outperformed all the conventional models on area under the receiver operating characteristic curve (AUROC), precision, recall and F1-score on our test dataset. The AUROC of our CNN models improved by approximately 3–7%. The AUROC of the CNN model with graph-embedding and ontology based medical concept vectors was 0.8% higher than that of the model with randomly initialized vectors and 1.5% higher than that of the one with pre-trained word vectors.


Our study demonstrates that a CNN model with pre-trained medical concept vectors can accurately identify target report-pairs with overlapping body sites and can potentially accelerate the retrieval process for imaging diagnosis quality measurement.



Imaging examinations are common and efficient diagnostic tools in clinical practice worldwide. Radiologists or sonographers perform examinations, observe images and write reports containing meaningful findings, conclusions and opinions. Imaging examinations are a highly operator-dependent modality, and many factors influence the interpretation of the images, such as patients’ demographics, current health status and medical histories. Such complicated and heterogeneous information can contain discrepancies (e.g., the diagnosis in a patient’s radiology report differs from the condition the patient actually has), which may lead to imprecise clinical decisions [1]. Although such discrepancies may be inevitable given the complexity of imaging diagnosis, quality measurement and improvement are still needed to minimize avoidable errors via a manual verification process. A common, objective and standardized verification process is to retrospectively compare the reports of prior imaging examinations with those of follow-up pathology examinations [2]. However, only a few patients who receive an imaging examination of a certain body site will have a surgical or pathologic biopsy on the same site. To find these patients, quality control staff regularly and manually review electronic medical records (EMRs) and scan related examination reports, which is inefficient and time-consuming. In this study, we propose a machine learning based approach to retrieve these patients from EMRs more efficiently.

Formally, we aim to predict which of the provided report-pairs, each consisting of an imaging report and a pathologic report, contain overlapping body sites or regions based on their textual semantic similarity. This is slightly different from conventional text similarity tasks, in which researchers care about whether two sentences have the same meaning [3, 4]. We care about similar “body sites”, that is, the site on the patient’s body where the anomaly has been detected, rather than similar syntax or semantics in general. For example, Table 1 shows a report-pair from our study in the original Chinese with an English translation. This report-pair contains an overlapping body site, the parotid gland, but only the pathology report mentions “parotid gland” (腮腺), whereas the imaging report describes the condition of the “maxillofacial region” (颌面部). The parotid gland is anatomically located in the maxillofacial region, and thus the report-pair has a similar “body site” and should be picked up. Although the goal differs, the methodology remains unchanged: the model should extract features from the texts, judge against certain criteria whether the pair shares enough common information, and then assign the pair to a nominal category (match or mismatch).

Table 1 A report-pair in this study

We believe that a well-designed semantic similarity algorithm should consider three main aspects: textual features, algorithm and domain knowledge. Previous works that use textual features to calculate similarity are mainly corpus-based methods such as bag-of-words and word embeddings. Bag-of-words models, including the vector space model (VSM) [5], latent semantic analysis (LSA) [6] and latent Dirichlet allocation (LDA) [7], treat the entire text as a set of words and calculate a weight for each word, thereby transforming texts into real-valued vectors on which the similarity is then calculated [8,9,10]. These methods need handcrafted features and external lexical resources, which makes them difficult to apply in domains with little readily available knowledge. Word embeddings are low-dimensional real-valued vectors trained from large-scale unlabeled text. Those vectors are able to capture semantic relationships among free text documents [4, 11].

A convolutional neural network (CNN) is a typical artificial neural network that can automatically learn, filter, cluster and combine features without much human effort. It was originally invented for computer vision [12] and has subsequently been shown to be effective in natural language processing (NLP) tasks, such as sentence modeling [13], search query retrieval [14] and semantic parsing [15]. CNN models can represent the hierarchical structures of sentences with their layer-by-layer convolution and pooling, and so capture semantic patterns at different layers [16]. CNNs have also been shown to capture the semantic similarities between text pairs effectively and thus to perform well on text matching tasks [17, 18].

Moreover, domain ontologies contain lots of semantic relations which represent the body of knowledge. In the medical domain, there are a number of popular biomedical ontologies, such as MeSH (Medical Subject Headings) for indexing literature, the ICD taxonomy (International Classification of Diseases) for public health surveillance and billing purposes, and SNOMED-CT for aggregating medical terms across sites of healthcare. All these ontologies use graph structures [19] to represent the relationships among medical concepts. However, it is not straightforward to use this extra knowledge by conventional machine learning methods. Graph embedding technology embeds edge and node information of graphs into low dimensional dense vectors [20], and we believe it has a great potential to facilitate the utility of those ontologies.

In this study, we propose an end-to-end solution based on a CNN to help physicians and clinical quality control staff efficiently retrieve patients’ examination reports for the imaging diagnosis verification process. The input of the model is an imaging and pathology report-pair from a given patient, and the output is a label indicating whether the report-pair contains overlapping body sites. We compared the accuracy of our model (with different word embedding methods) with conventional approaches such as keyword mapping, latent semantic analysis (LSA) [6], latent Dirichlet allocation (LDA) [7], Doc2Vec [21], Siamese LSTM [22] and a method based on named entity recognition (NER) [23]. Moreover, we applied the LIME algorithm [24] to identify the features that contribute most to the final results, which improved model interpretability.


Technical workflow

Figure 1 shows the workflow for identifying matching body sites from medical report-pairs. We removed all punctuation, numbers and stop words from the raw report texts, then used Jieba, a Chinese word segmentation tool, to transform each text into a sequence of words for CNN model training. The study and data use were approved by the Human Research Ethics Committees of Tongren Hospital, Shanghai Jiao Tong University, Shanghai, China.

Fig. 1

Workflow of detecting text semantic similarity
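The preprocessing step in this workflow can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the stop-word list is hypothetical, and the segmenter is passed in as a function so that the sketch runs without Jieba installed (with Jieba available, `jieba.lcut` could be passed instead of `str.split`).

```python
import re
from typing import Callable, Iterable, List

# Hypothetical stop-word list; the study's actual list is not given in the text.
STOP_WORDS = {"的", "及", "与", "of", "the", "and"}

def preprocess(text: str,
               segment: Callable[[str], Iterable[str]],
               stop_words=STOP_WORDS) -> List[str]:
    """Strip numbers and punctuation, segment into words, drop stop words."""
    text = re.sub(r"\d+", " ", text)      # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation (ASCII and CJK)
    tokens = (tok.strip() for tok in segment(text))
    return [tok for tok in tokens if tok and tok not in stop_words]

# Whitespace splitting stands in for a Chinese word segmenter here.
tokens = preprocess("Mass of the left parotid gland, 23 x 15 mm.", str.split)
print(tokens)
```

Keeping the segmenter as a parameter makes it easy to swap in a tool with a medical dictionary, which matters because general-purpose segmenters can split medical terms incorrectly.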

Data description

We included 4262 imaging reports and 2141 pathology reports from the EMRs of 1926 patients who were admitted to Shanghai Tongren Hospital and had ultrasonic examinations between 1st May 2017 and 31st July 2017, which resulted in 16,354 report-pairs. All report texts were de-identified. Each pair contained two reports: one imaging report and one pathology report. Three physicians were recruited to independently annotate whether each report-pair contained overlapping body sites; the kappa coefficients between the three pairs of physicians were 0.95, 0.95 and 0.97, respectively. The overall rate of positive pairs (which contain overlapping body sites) was 14.8% (2415/16,354). We randomly split the data into 80% for training and 20% for testing.
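Pairwise annotator agreement of the kind reported above can be computed as in the sketch below. It assumes Cohen's kappa, which the text does not name explicitly, and the labels are toy values rather than study data.

```python
from collections import Counter
from typing import Sequence

def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("annotations must be non-empty and equally long")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each annotator's label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Toy labels (1 = overlapping body site); not data from the study.
physician_1 = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
physician_2 = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
kappa = cohen_kappa(physician_1, physician_2)
print(round(kappa, 3))
```

With three annotators, the function would be called once for each of the three annotator pairs.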

CNN model for text similarity detection

The structure of our model (shown in Fig. 2) can be divided into three parts: input layer, feature extraction layer and fully connected layer.

Fig. 2

CNN-based neural network for text similarity detection

The input layer mapped each word to a dense vector (with 128 dimensions) and transformed each report into a dense matrix. Each dense vector represented the semantic information of the corresponding word, and its values could be updated during training. We used two strategies to initialize the word vectors: random initialization, and pre-trained word vectors (a word2vec model trained with the skip-gram and negative sampling methods) built from Baidu Encyclopedia corpora and obtained from GitHub. We set the window lengths of the convolution filters to 3, 4 and 5, and used 32 convolution filters for each window size. We then applied max-pooling operations to obtain a new feature vector for each of the two reports. We concatenated the two feature vectors and passed the result to a fully connected layer and an output layer to calculate the likelihood that the pair contains overlapping body sites. We used cross-entropy as the loss function and trained the model with mini-batch stochastic gradient descent.
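The forward pass of such a model can be sketched with NumPy as below. The embedding size (128), window lengths (3, 4, 5) and filter count (32) follow the text. Everything else is an assumption made for illustration: the vocabulary size, the hidden layer width, the ReLU activations, the absence of bias terms, the sigmoid output, and the use of one shared encoder for both reports. Training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB_DIM, N_FILTERS, WINDOWS = 1000, 128, 32, (3, 4, 5)
HIDDEN_DIM = 64  # assumed; the hidden layer width is not stated in the text

# Randomly initialised parameters (all would be updated during training).
embedding = rng.normal(0.0, 0.1, size=(VOCAB, EMB_DIM))
filters = {w: rng.normal(0.0, 0.1, size=(N_FILTERS, w, EMB_DIM)) for w in WINDOWS}
W1 = rng.normal(0.0, 0.1, size=(2 * N_FILTERS * len(WINDOWS), HIDDEN_DIM))
W2 = rng.normal(0.0, 0.1, size=(HIDDEN_DIM, 1))

def encode(token_ids):
    """Embed one report, convolve with each window size, max-pool over time."""
    x = embedding[np.asarray(token_ids)]                # (seq_len, EMB_DIM)
    pooled = []
    for w, kernel in filters.items():
        # All length-w windows of the report: (n_windows, w, EMB_DIM).
        windows = np.stack([x[i:i + w] for i in range(len(x) - w + 1)])
        feature_maps = np.maximum(np.einsum("nwd,fwd->nf", windows, kernel), 0.0)
        pooled.append(feature_maps.max(axis=0))         # (N_FILTERS,)
    return np.concatenate(pooled)                       # (3 * 32,) = (96,)

def predict(imaging_ids, pathology_ids):
    """Estimated probability that the two reports share a body site."""
    pair = np.concatenate([encode(imaging_ids), encode(pathology_ids)])
    hidden = np.maximum(pair @ W1, 0.0)                 # fully connected layer
    logit = (hidden @ W2).item()
    return 1.0 / (1.0 + np.exp(-logit))                 # output layer

p = predict(rng.integers(0, VOCAB, size=20), rng.integers(0, VOCAB, size=12))
print(p)
```

Each report must contain at least five tokens here, since the widest filter spans five words; a real implementation would pad shorter reports.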

Medical concept vectors using ontology-based graph embedding

We used a graph embedding method as a third word vector initialization strategy, to enhance the word representations by capturing semantic relations from medical ontologies. We used CMeSH (Chinese Medical Subject Headings), a Chinese version of MeSH that contains about 391,892 medical concepts and 2,047,749 relations, to train our medical concept vectors. We randomly generated word sequences of length 10 by sampling neighboring concepts along the relation edges in CMeSH. The sampling process basically follows the procedure in node2vec [20] and is composed of two major steps: 1) for every node (medical concept) \( V \), add its direct (1st order) neighbors to the sampling set \( {\mathcal{M}}_V \); 2) let \( V_m \) be the m-th order neighbor of \( V \) and \( {V}_m^1 \) be the direct neighborhood of \( V_m \), then randomly sample one node from \( {V}_m^1 \) and add it to \( {\mathcal{M}}_V \). In our experiments, we set m to 9 and sampled a word sequence for each node. We then fed the sequence set \( {\mathcal{M}}_V \) into a word2vec model with the skip-gram method [25] to train the medical concept vectors.
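One reading of this sampling procedure is a uniform random walk of length 10 that starts at each concept and repeatedly moves to a randomly chosen neighbor. The sketch below follows that reading; it is an assumption, and node2vec's biased return and in-out parameters are left out. The toy graph reuses concept names from the CMeSH example discussed in this paper and is not an extract of CMeSH itself.

```python
import random
from typing import Dict, List, Optional

def sample_walk(graph: Dict[str, List[str]], start: str, length: int = 10,
                rng: Optional[random.Random] = None) -> List[str]:
    """Sample one concept sequence by walking along ontology relations."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        neighbours = graph.get(walk[-1], [])
        if not neighbours:
            break  # isolated concept: stop early
        walk.append(rng.choice(neighbours))
    return walk

# Toy undirected fragment; edges are stored in both directions.
toy_graph = {
    "salivary glands": ["parotid gland", "salivary ducts", "sublingual gland",
                        "exocrine glands", "mouth"],
    "parotid gland": ["salivary glands"],
    "salivary ducts": ["salivary glands"],
    "sublingual gland": ["salivary glands"],
    "exocrine glands": ["salivary glands"],
    "mouth": ["salivary glands"],
}

rng = random.Random(0)
walks = [sample_walk(toy_graph, node, rng=rng) for node in toy_graph]
print(walks[1])
```

The resulting sequences play the role of sentences, so they could be passed to any skip-gram word2vec implementation to obtain one vector per concept.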

Model evaluation

We compared the performance of our CNN model with the following six baseline models:

  • Keyword mapping. We used the vocabulary from CMeSH as a medical dictionary to filter the original text. All words outside the dictionary were discarded, and the Jaccard similarity coefficient was calculated on the keywords remaining in the two report texts.

  • Latent Semantic Analysis (LSA). We collected all reports and constructed a bag-of-words representation vector for each of them. Singular value decomposition was then performed on the matrix formed by concatenating all bag-of-words vectors to reduce the dimensionality of the vector representations, and cosine similarity was measured on the vectors in the reduced-dimension space.

  • Latent Dirichlet Allocation (LDA). This approach constructed bag-of-words representations for the reports. It assumed that each report was a mixture of a set of “topics” and each topic was a mixture of the words in the vocabulary. Cosine similarity was measured on the topic composition vectors of the reports.

  • Doc2Vec. Doc2Vec is an extension of the Word2Vec model [26], in which a document vector is trained together with the word vectors in the continuous bag-of-words model. Cosine similarity was measured on the learned document vectors.

  • Siamese long short-term memory (LSTM). Siamese LSTM is often used in text similarity systems. It uses two LSTM networks to encode the two sentences, then calculates the Manhattan distance between the encoded hidden vectors to decide whether the two sentences are similar. The training process is supervised.

  • Named Entity Recognition (NER). We used another annotated Chinese clinical EMR corpus from Shanghai Tongren Hospital. This corpus contains 46,665 sentences and 89,231 entities of four types: symptoms, diseases, lab tests and body structures. We trained a DNN-based NER model with randomly initialized word embeddings [23] and then used this model to identify all the entities in the original report texts. We kept only these entity words and constructed a bag-of-words representation vector for each report. Cosine similarity was measured on the entity representation vectors.
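The two similarity measures used by these baselines can be sketched in a few lines. The example shows Jaccard similarity on dictionary-filtered keywords and cosine similarity on raw bag-of-words counts; the LSA, LDA and Doc2Vec baselines apply the same cosine formula to their own reduced or learned vectors. The small dictionary is a stand-in for the CMeSH vocabulary, and the token lists are invented.

```python
import math
from collections import Counter
from typing import Iterable, Set

def jaccard_keywords(tokens_a: Iterable[str], tokens_b: Iterable[str],
                     dictionary: Set[str]) -> float:
    """Jaccard coefficient over the dictionary words found in each text."""
    a = set(tokens_a) & dictionary
    b = set(tokens_b) & dictionary
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine_bow(tokens_a: Iterable[str], tokens_b: Iterable[str]) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

dictionary = {"thyroid", "nodule", "parotid gland"}  # stand-in for CMeSH terms
imaging = ["thyroid", "nodule", "left", "lobe"]
pathology = ["thyroid", "tissue", "nodule", "nodule"]

jac = jaccard_keywords(imaging, pathology, dictionary)
cos = cosine_bow(imaging, pathology)
print(jac, round(cos, 4))
```

Both measures depend on exact word overlap, so neither would relate “parotid gland” to “maxillofacial region” without additional anatomical knowledge.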

All models were trained on the training set and evaluated on the testing set. We performed receiver operating characteristic (ROC) curve analysis for each model and calculated the AUC score. We calculated precision, recall and F1-score using a cutoff value equal to the ratio of positive pairs in the whole dataset. In all of our models, report-pairs with a similarity score higher than the cutoff value were labeled positive. We used the bootstrapping method with 50 repeated samplings to estimate the mean and standard deviation (std) of model performance. Because of data imbalance, we reported both overall performance (macro average) and performance for each class.
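The evaluation described here can be illustrated with the sketch below. It computes AUC by pairwise comparison, precision, recall and F1 at a cutoff equal to the positive rate, and a bootstrap estimate of the mean and standard deviation of AUC. Resampling the test predictions is an assumption, since the text does not say exactly what was resampled, and the labels and scores are toy values.

```python
import random
import statistics
from typing import Sequence, Tuple

def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    # Quadratic in the number of pairs; adequate for a small illustration.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_recall_f1(labels, scores, cutoff) -> Tuple[float, float, float]:
    """Metrics for the positive class; scores above the cutoff are positive."""
    preds = [int(s > cutoff) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def bootstrap_auc(labels, scores, n_rounds=50, seed=0) -> Tuple[float, float]:
    """Mean and standard deviation of AUC over bootstrap resamples."""
    rng = random.Random(seed)
    n, values = len(labels), []
    while len(values) < n_rounds:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample needs both classes for AUC to be defined
        values.append(auc(ys, [scores[i] for i in idx]))
    return statistics.mean(values), statistics.stdev(values)

labels = [1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.7, 0.1, 0.8, 0.3, 0.05]
cutoff = sum(labels) / len(labels)  # positive rate, as in the text
print(auc(labels, scores), precision_recall_f1(labels, scores, cutoff))
print(bootstrap_auc(labels, scores))
```

A macro average would repeat the precision, recall and F1 calculation with the negative class treated as the target and then average the two sets of values.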

Model interpretability

We further explored the LIME algorithm to improve the interpretability of our model. LIME, proposed by Ribeiro et al. [24], can be used to explain the predictions of machine learning models. The basic idea of the LIME algorithm is to define an “interpretation” using another model, usually a linear model or a decision tree. We adopted the LIME algorithm to identify which keywords in a report-pair our CNN model relied on to produce its result. Specifically, for a given report-pair, we first fixed the content of the imaging report and generated new samples of the pathology report by randomly deleting words. We then trained a LIME model on the generated pairs and calculated the relative importance of each word in the pathology report. Similarly, we fixed the content of the pathology report, randomly generated perturbed imaging reports and trained another LIME model for the imaging report. We presented the relative importance of the keywords visually.
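The perturbation idea can be illustrated with a simplified linear surrogate. The sketch below randomly deletes words from one report, queries a black-box scoring function, and fits an ordinary least-squares model whose coefficients serve as word importances. It is not the LIME library: the proximity weighting and feature selection of the full algorithm are omitted, the tokens are assumed to be unique, and `toy_score` is a hypothetical stand-in for the CNN prediction with the other report held fixed.

```python
import numpy as np

def word_importance(tokens, score_fn, n_samples=500, seed=0):
    """Fit a linear surrogate to a black-box score under random word deletions."""
    rng = np.random.default_rng(seed)
    # Each row is a keep (1) / delete (0) mask over the report's tokens.
    masks = rng.integers(0, 2, size=(n_samples, len(tokens)))
    masks[0] = 1  # always include the unperturbed report
    targets = np.array([
        score_fn([tok for tok, keep in zip(tokens, row) if keep])
        for row in masks
    ])
    design = np.hstack([masks, np.ones((n_samples, 1))])  # last column: intercept
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return {tok: float(c) for tok, c in zip(tokens, coef[:-1])}

def toy_score(pathology_tokens):
    """Hypothetical black box standing in for the trained CNN."""
    return (0.1 + 0.6 * ("thyroid" in pathology_tokens)
            + 0.2 * ("nodule" in pathology_tokens))

weights = word_importance(["thyroid", "nodule", "tissue", "sent"], toy_score)
print(weights)
```

Because the toy scoring function is itself linear in word presence, the surrogate recovers its coefficients; for a real CNN the fit would only be a local approximation.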


Model performance

Table 2 shows both average and class-level performance for all models and Fig. 3 shows the corresponding ROC curves. The AUC scores of our CNN models with both randomly initialized vectors and pre-trained word vectors were superior to those of all baseline models, with an improvement of approximately 3–7%. In particular, the AUC score of the CNN model with medical concept vectors was 0.8% higher than that of the model with randomly initialized vectors and 1.5% higher than that of the one with pre-trained word vectors. We applied a t-test to the AUC results from 50 independent runs of the CNN with and without pre-trained medical concept vectors; the p-value was smaller than 0.001, which suggests the improvement is significant. Not surprisingly, the keyword mapping model had the worst performance among all models.
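A comparison of AUC values across repeated runs can be sketched as below. The text does not say which variant of the t-test was applied, so Welch's unequal-variance statistic is used here as an assumption, and the AUC lists are toy values rather than results from the study. Converting the statistic to a p-value would additionally require the t distribution, which is left out.

```python
import math
import statistics
from typing import Sequence

def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for the difference between two sample means."""
    standard_error = math.sqrt(statistics.variance(a) / len(a)
                               + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

# Toy AUC values for two model variants; not results from the study.
auc_with_concept_vectors = [0.95, 0.96, 0.97]
auc_with_random_vectors = [0.93, 0.94, 0.95]
t_stat = welch_t(auc_with_concept_vectors, auc_with_random_vectors)
print(round(t_stat, 3))
```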

Table 2 Performance comparison of different models including Precision/Recall/F1-score
Fig. 3

ROC Curve of different models

LIME experiment

We randomly selected two report-pairs containing overlapping body sites from the test set and processed them with the LIME model. Table 3 shows the original text of the two sample pairs (sample pair No. 1 and No. 2) and Table 4 shows the corresponding results. For sample pair No. 1, the importance scores of the words “fetal membrane” (胎膜) and “umbilical cord” (脐带) in the pathology report, and “fetus” (胎儿) and “fetal heart” (胎心) in the imaging report, were relatively high, at 0.15, 0.14, 0.12 and 0.03 respectively. The result indicates that the presence of these four words might account for the positive judgement, with a prediction of 0.77, by our CNN model. “Fetal membrane”, “umbilical cord” and “fetal heart” are all body structures belonging to the “fetus”, which shows that our CNN model was able to automatically and reasonably extract semantic features from texts and make judgements. For sample pair No. 2, the word “Thyroid” (甲状腺), which appears in both reports, contributed most to the result, with scores of 0.16 and 0.19 respectively. “Tubercle” (结节) and “Glandular body” (腺体) are sub-structures of the thyroid gland and also contributed substantially to the final result. The LIME algorithm could efficiently locate the most relevant words in text pairs and provide meaningful explanations of our model's behavior.

Table 3 The original text of selected samples
Table 4 Sample-level feature importance of sample pair 1 and 2 for both imaging and pathologic report provided by LIME algorithm


In this paper, we proposed a direct end-to-end CNN model to judge whether two reports contain matching body sites. Compared with conventional language models based on handcrafted textual features (keywords and bag-of-words), automatically generated features (bag-of-words extracted by our NER model and word embeddings) and a neural network model with an LSTM structure, our CNN model provided more flexibility in exploring the semantic information contained in medical documents and yielded better performance. In addition, we compared three strategies for generating word vectors for our CNN model: randomly initialized vectors, pre-trained word vectors, and medical concept vectors based on graph embedding and CMeSH. Our CNN model with medical concept vectors outperformed the other two methods, and a significant improvement was observed.

Many factors might contribute to the advantage of the CNN model. First, our CNN model is a supervised learning model and can automatically adapt feature representations to the task objective. For LSA, LDA and Doc2Vec, feature representations are learned in an unsupervised way, so the semantic information and co-occurrence relationships of words or characters are only weakly correlated with the learning objective. Second, our CNN model can extract syntactic and semantic information from both local semantic patterns and the hierarchical structures of sentences. For example, physicians may describe body sites using anatomical terms and their relative locations. Thus, information at the word level or chunk level is more important than information at the sentence level or document level. This could explain why our CNN model performed better than the Siamese LSTM model. Even though Siamese LSTM achieved higher precision than the CNN model with randomly initialized vectors, its F1-score was significantly lower than that of our CNN. Third, we used an end-to-end training strategy, which updated feature representations and optimized weights simultaneously.

We used a graph embedding method to exploit domain knowledge from CMeSH and gained a significant performance boost. CMeSH, like other domain-specific ontologies, organizes and represents a body of knowledge using concepts and their relations. For example, the concept “parotid gland” is a sub-class of the concept “salivary glands”, which is in turn a sub-class of both “exocrine glands” and “mouth”, and which contains the sub-class concepts “parotid gland”, “salivary ducts” and “sublingual gland”. In our study, the affiliation information of anatomical terms extracted by the graph embedding method was quite useful for judging overlapping body sites, and could explain the higher performance.

To validate that our model could correctly find related semantic or anatomical information and make judgements as expected, we used the LIME algorithm and analyzed two concrete examples. The results show that our CNN model chose reasonable keywords as the basis for its predictions. In the real world, these explanations of model behavior can be incorporated into a computer-aided decision support system to show clinical quality control staff why our model gives a particular result.

Our CNN model still has several limitations. We performed an error analysis of our model and found several typical misclassifications. Table 5 shows two sample pairs from the analysis: sample pair No. 3 is a false negative and No. 4 is a false positive. Sample pair No. 3 indicates that our model could not correctly identify spatial relationships between body structures. In sample pair No. 3, the imaging report described a mass observed on the ventral side of the inferior pole of the left kidney, and the pathologic report described a lesion from the left adrenal gland. This report-pair does not share common anatomical terms, but the left kidney and the adrenal gland are adjacent body structures in local anatomical space, and thus the report-pair shares a common body site. Sample pair No. 4 indicates that our model is insensitive to direction information. In sample pair No. 4, both the imaging report and the pathology report described a lymph node of the armpit, but the one in the imaging report is from the left armpit and the one in the pathology report is from the right armpit, and thus this report-pair shares no common region. There are other limitations: first, Chinese word segmentation using Jieba was imperfect and might introduce segmentation errors, especially for medical terms; second, there is no Chinese version of SNOMED-CT or UMLS (Unified Medical Language System), and thus we only performed graph embedding on CMeSH, which has a relatively small number of concepts and relations; third, we only evaluated our model on Chinese medical reports, although it could easily be transferred to other languages without language-specific optimizations.

Table 5 Sample pairs from error analysis

In this paper we focused only on identifying whether a pair of reports contains overlapping body sites. We treated it as a binary classification problem, trained a CNN model and used graph embedding based on the CMeSH ontology. In the future, we could: first, treat it as a ranking problem, annotating data and training a machine learning model to identify whether report A is more similar to report B than to report C (e.g., by checking the number of overlapping body parts); second, try different graph embedding methods and combinations of medical ontologies; third, validate the end-to-end architecture on tasks in other languages.

The technique proposed in this paper can be used to match reports of medical images from different sources, helping to consolidate heterogeneous patient clinical information and improve the efficiency of clinicians. Fundamentally, our study provides a generalizable architecture for detecting information discrepancies between different sources of routinely collected clinical data. With the increasing secondary use of clinical data, many commercial software products have been developed for similar purposes but with different data sources and algorithms, for example, IBM Watson Imaging Clinical Review. Moreover, as Wang et al. [27] have envisioned, improving the quality of clinical data is one key aspect of making artificial intelligence tools really useful in clinical practice. The effective consolidation of clinical data can help us better reconcile them, detect potential errors and thus improve data quality.


In this paper we proposed a convolutional neural network-based model that detects semantic similarity to identify report-pairs of imaging and pathologic examinations that contain overlapping body sites. Our model exhibited superior performance compared to conventional models such as keyword mapping, LSA, LDA, Doc2Vec, Siamese LSTM and a method based on NER. We also used a graph embedding method to exploit external information from medical ontologies and gained further improvement. In addition, we adopted the LIME algorithm to analyze our model's behavior visually. The results indicated that our model was able to automatically and reasonably extract semantic features from texts and make accurate judgements. It could help retrieve patients or reports for imaging diagnosis quality measurement more efficiently.

Availability of data and materials

Although we obtained permission from the institutional ethics committee to use the data and de-identified the data to a degree, we did not obtain informed consent from patients to disclose medical history data. Because China has no equivalent of the HIPAA Act, we cannot determine to what extent sharing the de-identified data without patient consent conforms to Chinese laws and regulations. In view of the legal risks, we prefer not to share the data publicly. However, we will do our best to help other scholars who are interested in our research and hope to reproduce the results.





Abbreviations

AUC: Area under the ROC curve
CMeSH: Chinese Medical Subject Headings
CNN: Convolutional neural network
Doc2Vec: Document to vector method
EMR: Electronic medical records
ICD: International Classification of Diseases
LDA: Latent Dirichlet allocation
LSA: Latent semantic analysis
LSTM: Long short-term memory
MeSH: Medical Subject Headings
NER: Named entity recognition
NLP: Natural language processing
VSM: Vector space model


  1. Brady AP. Error and discrepancy in radiology: inevitable or avoidable? Insights Imaging. 2017;8(1):171–82.

  2. Bruno MA, Walker EA, Abujudeh HH. Understanding and confronting our mistakes: the epidemiology of error in radiology and strategies for error reduction. Radiographics. 2015;35(6):1668–76.

  3. He H, Gimpel K, Lin J. Multi-perspective sentence similarity modeling with convolutional neural networks. In: Proceedings of the 2015 conference on empirical methods in natural language processing; 2015. p. 1576–86.

  4. Ye X, Shen H, Ma X, et al. From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the 38th international conference on software engineering. Austin: ACM; 2016. p. 404–15.

  5. Salton G, Wong A, Yang CS. A vector space model for automatic indexing. Commun ACM. 1975;18(11):613–20.

  6. Deerwester S, Dumais ST, Furnas GW, et al. Indexing by latent semantic analysis. J Am Soc Inf Sci. 1990;41(6):391–407.

  7. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3:993–1022.

  8. Yih W, Toutanova K, Platt JC, et al. Learning discriminative projections for text similarity measures. In: Proceedings of the fifteenth conference on computational natural language learning. Portland: Association for Computational Linguistics; 2011. p. 247–56.

  9. Guo Q. The similarity computing of documents based on VSM. In: International conference on network-based information systems. Berlin: Springer; 2008. p. 142–8.

  10. Wang ZZ, He M, Du YP. Text similarity computing based on topic model LDA. Computer Science. 2013;40(12):229–32.

  11. Kusner MJ, Sun Y, Kolkin NI, et al. From word embeddings to document distances. In: Proceedings of the 32nd international conference on machine learning; 2015.

  12. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Adv Neural Inf Proces Syst. 2012;25:1097–105.

  13. Kalchbrenner N, Grefenstette E, Blunsom P. A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188; 2014.

  14. Shen Y, He X, Gao J, et al. Learning semantic representations using convolutional neural networks for web search. In: Proceedings of the 23rd international conference on world wide web. Seoul: ACM; 2014. p. 373–4.

  15. Yih W, He X, Meek C. Semantic parsing for single-relation question answering. In: Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (Volume 2: Short Papers), vol. 2; 2014. p. 643–8.

  16. Hu B, Lu Z, Li H, et al. Convolutional neural network architectures for matching natural language sentences. In: Advances in neural information processing systems; 2014. p. 2042–50.

  17. Severyn A, Moschitti A. Learning to rank short text pairs with convolutional deep neural networks. In: Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval. Santiago: ACM; 2015. p. 373–82.

  18. Yin W, Schütze H, Xiang B, Zhou B. Abcnn: attention-based convolutional neural network for modeling sentence pairs. Trans Assoc Computational Linguistics. 2016;4:259–72.

  19. Auer S, Bizer C, Kobilarov G, Lehmann J, Cyganiak R, Ives Z. Dbpedia: a nucleus for a web of open data. In: The semantic web. Springer; 2007. p. 722–35.

  20. Grover A, Leskovec J. node2vec: scalable feature learning for networks. 2016. arXiv:1607.00653. Accessed 06 Aug 2019.

  21. Le Q, Mikolov T. Distributed representations of sentences and documents. In: International conference on machine learning; 2014. p. 1188–96.

  22. Mueller J, Thyagarajan A. Siamese recurrent architectures for learning sentence similarity. In: Thirtieth AAAI conference on artificial intelligence; 2016.

  23. Wu Y, Jiang M, Lei J, Xu H. Named entity recognition in Chinese clinical text using deep neural network. Stud Health Technol Inform. 2015;216:624.

  24. Ribeiro MT, Singh S, Guestrin C. Why should i trust you?: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM; 2016.

  25. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems (NIPS). Lake Tahoe, Nevada, United States; 2013. p. 3111–9.

  26. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781; 2013.

  27. Wang F, Casalino LP, Khullar D. Deep learning in medicine—promise, progress, and challenges. JAMA Intern Med. 2019;179(3):293–94.


We would like to thank Ken Chen (Physician/R&D director, Department of Medical AI, Synyi Research) for his comments and suggestions on our study.


Not applicable.

Author information




TZ and HM supervised the entire project. TZ and SZ conceptualized and designed the study. CF and XF collected the data. YG and SZ implemented the method. TZ and YG performed the data analysis and drafted the initial version. ML and YZ contributed to the design of the research, the interpretation of the data, and the reviewing and structuring of the work. FW proofread and revised the manuscript iteratively for improving the language. All authors edited the paper and gave final approval for the version to be published. TZ takes primary responsibility for the research reported here.

Corresponding author

Correspondence to Handong Ma.

Ethics declarations

Ethics approval and consent to participate

The study and data use were approved by the Human Research Ethics Committees of Tongren Hospital, Shanghai Jiao Tong University, Shanghai, China.

Consent for publication

Not Applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


Cite this article

Zheng, T., Gao, Y., Wang, F. et al. Detection of medical text semantic similarity based on convolutional neural network. BMC Med Inform Decis Mak 19, 156 (2019).



  • Text similarity
  • Convolutional neural network
  • LIME
  • Natural language processing