EOESGC: predicting miRNA-disease associations based on embedding of embedding and simplified graph convolutional network
BMC Medical Informatics and Decision Making volume 21, Article number: 319 (2021)
Abstract
Background
A large number of biological studies have shown that miRNAs are inextricably linked to many complex diseases. Studying miRNA-disease associations can reveal the root causes of the underlying pathogenesis, which in turn promotes drug development. However, traditional biological experiments are time-consuming and costly. Therefore, we propose an efficient computational model to address this challenge.
Results
In this work, we propose a deep learning model called EOESGC to predict potential miRNA-disease associations based on embedding of embedding and a simplified graph convolutional network. Firstly, the integrated disease similarity, the integrated miRNA similarity, and the miRNA-disease association network are used to construct a coupled heterogeneous graph, and edges with low similarity are removed to simplify the graph structure and ensure that the remaining edges are meaningful. Secondly, the Embedding of Embedding (EOE) model is used to learn edge information in the coupled heterogeneous graph. The model is trained so that associated nodes lie close to each other and unassociated nodes lie far apart. Based on this rule, the learned edge information is added to the node embeddings as supplementary information. Then, the node embeddings produced by the EOE model serve as new features for miRNAs and diseases, and information is aggregated by a simplified graph convolution (SGC) model, in which each convolution layer can aggregate multi-hop neighbor information. In this step, we use only the miRNA-disease association network, which further simplifies the graph structure and reduces the computational complexity. Finally, the feature embeddings of miRNA and disease are concatenated and fed into a multilayer perceptron (MLP) for prediction. In the evaluation, EOESGC achieves an AUC, AUPR, and F1-score of 0.9658, 0.8543, and 0.8644, respectively, under 5-fold cross-validation. Compared with recently published models, our model shows better results. In addition, we predict the top 20 potential miRNAs for breast cancer and lung cancer, most of which are validated in the dbDEMC and HMDD3.2 databases.
Conclusion
The comprehensive experimental results show that EOESGC can effectively identify potential miRNA-disease associations.
Background
As a kind of non-coding RNA (ncRNA), miRNA was once thought to be transcriptional noise in the path from RNA to protein [1,2,3,4]. However, this idea was proved wrong, and non-coding RNA has been shown to play an important role in various biological processes [1, 5]. MiRNAs are endogenous, evolutionarily conserved, single-stranded ncRNAs that regulate gene expression through complementary base pairing with target messenger RNA (mRNA) sequences [6,7,8]. A growing number of studies have shown that miRNAs are closely related to the development of complex diseases, such as various cancers, diabetes, and Alzheimer’s disease [9,10,11,12,13]. In particular, miRNAs act as oncogenes or tumor suppressors in the development and metastasis of some cancers, including breast cancer [11] and lung cancer [13]. An important goal of medical data modeling and classification is to make predictions based on training data and available features. Medical data sets with a high-dimensional feature space and relatively few samples pose a key problem for machine learning [14]. Therefore, more and more researchers hope to use computational models to predict potential associations between miRNAs and diseases from experimentally confirmed data. Most of the methods proposed so far rely on the hypothesis that functionally similar miRNAs tend to be associated with similar diseases [15]. Below we review several methods for predicting miRNA-disease associations based on graph autoencoders, random walk, machine learning, and graph convolutional neural networks.
Nowadays, graph neural networks such as graph autoencoders have shown superior performance. Ji et al. [16] proposed a semi-supervised model (SVAEMDA), a novel feature learning approach that obtains feature representations from an integrated set of miRNA and disease similarity networks. SVAEMDA used known miRNA-disease associations in the form of concatenated dense vectors to train predictors based on variational autoencoders. The reconstruction probability of the predictors was used to score miRNA-disease associations. In addition, the model did not need negative samples, which reduced noisy data. Zhang et al. [17] proposed an unsupervised deep learning framework with a variational autoencoder (VAE) to predict miRNA-disease associations. Two spliced matrices were constructed as VAE inputs; the VAE learned a latent representation of the input and reconstructed the data from the learned distribution. The miRNA-disease association scores were obtained from the trained VAE model. Liu et al. [18] proposed a framework based on a stacked autoencoder and XGBoost to predict potential miRNA-disease associations (SMALF). This model differs from the two previous models in that it used the autoencoder to extract latent feature vectors of miRNAs and diseases rather than as a classifier, and used XGBoost to predict potential miRNA-disease associations. Ding et al. [19] proposed a computational model based on a variational graph autoencoder with matrix factorization (VGAMF) for miRNA-disease association prediction. The innovation of this model is the use of two autoencoders to obtain miRNA and disease feature representations on the miRNA similarity network and the disease similarity network, respectively, an approach that no other model has used.
Secondly, motivated by word2vec, random walk algorithms are used on graphs to obtain node sequences and thus embedding representations of the nodes. Numerous studies have confirmed that random walk algorithms can effectively predict miRNA-disease associations. Niu et al. [20] constructed a prediction model based on random walk and binary regression, which extracted miRNA features by random walk with restart and used binary logistic regression to score new miRNA-disease associations. Li et al. [21] proposed a three-layer heterogeneous network combined with an unbalanced random walk for miRNA-disease association prediction (TCRWMDA). The three-layer heterogeneous network enriched the information in the basic network and enabled more effective information to be mined between the networks. Dai et al. [22] proposed a logistic weighted profile-based bi-random walk method for exploring miRNA-disease associations (LWBRW). The special feature of this method lies in the network construction process, where a logistic function was used to extract valuable information. Weighted K nearest known neighbors (WKNKN) was used to preprocess the known association matrix, and new miRNA-disease associations were inferred by bi-random walk on the miRNA network and the disease network.
Thirdly, traditional machine learning methods are simple but still achieve good results. The random forest algorithm has made outstanding contributions to miRNA-disease association prediction. Chen et al. [23] proposed a random forest-based method to predict miRNA-disease associations (RFMDA), using feature selection based on positive and negative sample feature frequencies to reduce the dimension of the sample space. A random forest model was trained to obtain an association score between each miRNA and disease. Later, Yao et al. [24] proposed an improved random forest model (IRFMDA). Different from Chen’s multi-attribute decision analysis method, this model used the variable importance scores of the random forest for feature selection, which could effectively reduce the influence of redundant and noisy information and select more valuable features to represent samples, thus improving the predictive ability of the model. Zheng et al. [25] proposed a machine learning approach (MLMDA) to predict and validate miRNA-disease associations by integrating heterogeneous information sources. This model used a k-mer sparse matrix to extract miRNA sequence information, combined it with other similarity information, and then applied an autoencoder to extract the most representative features. Finally, a random forest classifier was used to predict miRNA-disease associations. Chen et al. [26] proposed a ranking-based KNN method for miRNA-disease association prediction (RKNNMDA). The K-nearest neighbor (KNN) algorithm was used to search for the nearest neighbors of each miRNA and disease. These neighbors were then re-ranked according to an SVM ranking model. Finally, weighted voting was conducted to obtain a final ranking of all possible miRNA-disease associations.
Finally, graph convolutional neural networks have shown powerful advantages in processing complex graphs, which has led an increasing number of researchers to adopt them. Peng et al. [27] implemented a convolutional neural network-based framework (MDACNN) for predicting miRNA-disease associations by combining similarities between miRNAs, similarities between diseases, and interactions between proteins. Chu et al. [28] proposed a new graph sampling method that uses a feature graph and a topology graph to identify miRNA-disease associations through graph convolution (MDA-GCNFTG). This method modeled both the potential associations in feature space and the structural relationships of miRNA-disease association data, so it could predict not only new miRNA-disease associations but also new disease-related miRNAs under unbalanced sample distributions. Tang et al. [29] proposed a multi-view multichannel attention graph convolutional network to predict potential miRNA-disease associations (MMGCN). GCN was used to extract miRNA and disease features from different similarity views, and the model used node embeddings enhanced by multichannel attention to make association predictions. Li et al. [30] proposed neural inductive matrix completion with a graph convolutional network (NIMCGCN) to predict miRNA-disease associations. First, a graph convolutional network (GCN) was used to learn latent feature representations of miRNAs and diseases. Then, the learned features were input into a neural inductive matrix completion (NIMC) model to generate the completed association matrix. The approach used supervised end-to-end learning to effectively predict miRNA-disease associations.
In conclusion, most existing miRNA-disease association prediction frameworks learn node embeddings with a single model. They ignore the edge information of the coupled heterogeneous graph, even though the edges between networks can act as supplementary information for nodes. This supplementary information is important because it makes the latent features more complete and accurate. The framework we propose fills that gap. We use the link-based EOE model to learn edge features and add them to the node embeddings as supplementary information. The SGC model is then used for information aggregation. By combining the two models, learning edge information and aggregating neighbor information, each node embedding contains richer information, which lays the foundation for effective prediction of potential miRNA-disease associations.
Methods
We present a novel framework for predicting potential miRNA-disease associations. As shown in Fig. 1, the framework consists of four steps in total:

The first step is to construct the coupled heterogeneous graph, where we use the disease similarity, miRNA similarity, and confirmed miRNA-disease association networks to construct the graph and remove the edges with low similarity to reduce the complexity of the graph.

The second step is to use the link-based node embedding model, EOE, to add network edge information to the node features.

The third step is to use the SGC model for feature aggregation to fully learn the structural information of the graph and obtain the final low-dimensional embedding of each node.

The last step is to concatenate the final embeddings and feed them into the MLP for prediction.
Database
A coupled heterogeneous graph consists of two distinct but related subnetworks connected by inter-network edges [31]. The term “distinct” implies that the vertices of the two subnetworks are of different node types. The term “related” implies that the vertices of the two subnetworks interact with each other. To construct a miRNA-disease coupled heterogeneous graph, we downloaded data from the HMDD2.0 database [32] containing 495 miRNAs, 383 diseases, and 5430 confirmed miRNA-disease associations. We use the adjacency matrix A to represent miRNA-disease associations, where \(A_{ij}=1\) means there is a confirmed association between miRNA i and disease j, while \(A_{ij}=0\) means no association is known. In the experiment stage, we used the dbDEMC [33] and HMDD3.2 databases to verify the predictions of the proposed EOESGC model.
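The construction of the adjacency matrix A can be illustrated with a minimal sketch. The function name `build_adjacency` and the toy index pairs are hypothetical; with the HMDD2.0 data described above, A would have 495 rows, 383 columns, and 5430 entries equal to 1.

```python
def build_adjacency(pairs, n_mirna, n_disease):
    """Return an n_mirna x n_disease 0/1 matrix from (miRNA index, disease index) pairs."""
    A = [[0] * n_disease for _ in range(n_mirna)]
    for i, j in pairs:
        A[i][j] = 1  # confirmed association between miRNA i and disease j
    return A

# Toy data (hypothetical): 3 miRNAs, 2 diseases, 3 confirmed associations.
toy_pairs = [(0, 0), (1, 1), (2, 0)]
A_toy = build_adjacency(toy_pairs, n_mirna=3, n_disease=2)
```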
Disease similarity network
We combine disease semantic similarity with disease Gaussian interaction profile kernel similarity to construct the disease similarity network. To ensure that the edges among disease nodes are valid, we set a threshold and remove the links below it. Therefore, the disease similarity is calculated as follows:
The first semantic similarity is \(DSS^{1}\), the second semantic similarity is \(DSS^{2}\), and the Gaussian interaction profile kernel similarity is DGS; \(\alpha\) represents a scaling factor. After removing entries with low similarity according to the threshold h, the disease similarity becomes:
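As a minimal sketch of these two steps, suppose the integration takes the form \(DS = \alpha \cdot (DSS^{1}+DSS^{2})/2 + (1-\alpha ) \cdot DGS\) and that entries below h are set to zero. Both the exact weighting and the function names `integrate_similarity` and `prune` are assumptions made for illustration, not a statement of the authors' implementation.

```python
def integrate_similarity(dss1, dss2, dgs, alpha):
    """Assumed form: alpha * mean(DSS1, DSS2) + (1 - alpha) * DGS, entry by entry."""
    n = len(dgs)
    return [[alpha * (dss1[i][j] + dss2[i][j]) / 2 + (1 - alpha) * dgs[i][j]
             for j in range(n)] for i in range(n)]

def prune(sim, h):
    """Remove weak edges: similarities below the threshold h become 0."""
    return [[s if s >= h else 0.0 for s in row] for row in sim]

# Toy 2 x 2 similarity matrices (hypothetical values).
dss1 = [[1.0, 0.2], [0.2, 1.0]]
dss2 = [[1.0, 0.4], [0.4, 1.0]]
dgs = [[1.0, 0.1], [0.1, 1.0]]
ds = integrate_similarity(dss1, dss2, dgs, alpha=0.5)
ds_pruned = prune(ds, h=0.3)
```

In this toy case the off-diagonal similarity is about 0.2, which falls below h = 0.3, so the edge between the two diseases is removed while the diagonal is kept.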
Disease semantic similarity model 1
Medical Subject Headings (MeSH) [34] is the authoritative subject list compiled by the United States National Library of Medicine. It is a normalized and expandable dynamic thesaurus. MeSH is a collection of more than 18,000 medical topics that we use to study the relationships between diseases. A disease d can be described as a directed acyclic graph \(DAG(d) = (N_{d}, E_{d})\), where \(N_{d}\) is the node set containing d and its ancestor nodes and \(E_{d}\) is the edge set [35]. Figure 2 shows the DAGs of two diseases.
To calculate the semantic similarity of two diseases based on their DAGs, we need to calculate the semantic contribution score for each disease in the graph. We define the contribution score of disease d to disease D in DAG(D) as:
where \(\Delta =0.5\) is a decay factor indicating that the more distant nodes from disease D contribute less to the semantics of disease D. The semantic value of disease D is calculated based on the semantic contribution score of the disease nodes in DAG(D).
If DAG(A) and DAG(B) share disease nodes, we consider disease A and disease B to be similar. Therefore, the first semantic similarity between two diseases is defined as:
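A minimal sketch of model 1 follows, assuming the widely used formulation of Wang et al. [35]: the contribution of D to itself is 1, the contribution of an ancestor d is \(\Delta\) times the largest contribution among its children in DAG(D), and the similarity is the summed contributions of the shared nodes divided by the sum of the two semantic values. The `parents` dictionary and the disease labels are hypothetical.

```python
def contributions(parents, disease, delta=0.5):
    """Semantic contribution of every node in DAG(disease) to `disease`."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for p in parents.get(node, []):
            value = delta * contrib[node]
            # Keep the largest contribution reachable through any child.
            if value > contrib.get(p, 0.0):
                contrib[p] = value
                frontier.append(p)
    return contrib

def semantic_similarity(parents, a, b, delta=0.5):
    ca = contributions(parents, a, delta)
    cb = contributions(parents, b, delta)
    shared = set(ca) & set(cb)
    semantic_values = sum(ca.values()) + sum(cb.values())
    return sum(ca[t] + cb[t] for t in shared) / semantic_values

# Toy DAG (hypothetical): A and B are both children of P, and P is a child of R.
toy_parents = {"A": ["P"], "B": ["P"], "P": ["R"]}
sim_ab = semantic_similarity(toy_parents, "A", "B")
```

Here each disease has semantic value 1 + 0.5 + 0.25 = 1.75, the shared nodes P and R contribute 1.5 in total, and the similarity is 1.5 / 3.5.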
Disease semantic similarity model 2
Xuan et al. [36] defined a second disease semantic similarity, which differs from the first in how the semantic contribution of disease nodes is calculated. Suppose the ancestor nodes of disease D include d1 and d2; if d1 appears in fewer DAGs than d2, then d1 is more specific and we consider it to make a greater semantic contribution to disease D. Therefore, the semantic contribution score of disease node d to disease D is defined as:
As in model 1, the semantic value of each disease and the semantic similarity of the two diseases are defined as:
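The sketch below illustrates model 2 under the assumption that the contribution of node d is \(-\log\) of the fraction of disease DAGs that contain d, so rarer nodes contribute more. The function names and the toy hierarchy are hypothetical.

```python
import math

def dag_nodes(parents, disease):
    """Nodes of DAG(disease): the disease itself plus all of its ancestors."""
    seen, stack = {disease}, [disease]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def information_content(parents, diseases):
    """Return the DAG of each disease and -log(frequency) for every node."""
    dags = {d: dag_nodes(parents, d) for d in diseases}
    counts = {}
    for nodes in dags.values():
        for t in nodes:
            counts[t] = counts.get(t, 0) + 1
    ic = {t: -math.log(c / len(diseases)) for t, c in counts.items()}
    return dags, ic

def semantic_similarity_2(parents, diseases, a, b):
    dags, ic = information_content(parents, diseases)
    shared = dags[a] & dags[b]
    denom = sum(ic[t] for t in dags[a]) + sum(ic[t] for t in dags[b])
    # A shared node contributes the same score to both diseases, hence the factor 2.
    return 2 * sum(ic[t] for t in shared) / denom if denom else 0.0

# Toy hierarchy (hypothetical): R is the root and appears in every DAG.
toy_parents_2 = {"A": ["P"], "B": ["P"], "C": ["R"], "P": ["R"]}
toy_diseases = ["A", "B", "C", "P", "R"]
sim2_ab = semantic_similarity_2(toy_parents_2, toy_diseases, "A", "B")
```

In the toy hierarchy the root R occurs in all five DAGs and therefore contributes nothing, while the rarer node P carries the shared signal between A and B.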
Disease Gaussian interaction profile kernel similarity
Since not all diseases can be found in MeSH, we use the disease Gaussian interaction profile (GIP) kernel similarity as a supplement. GIP similarity is calculated for miRNAs and diseases separately using the method proposed by Zhao et al. [37]. In the miRNA-disease adjacency matrix \(A \in R^{m *n}\), each column represents a disease; the column of disease d is defined as its interaction profile IP(d). Then, the Gaussian interaction profile kernel similarity between diseases \(d_{i}\) and \(d_{j}\) is defined as:
where \(\gamma _{d}\) controls the kernel bandwidth. It is derived from the parameter \(\gamma _{d}^{\prime }\), which is usually set to 0.5, and is defined as:
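A minimal sketch of the GIP kernel follows, assuming the standard form \(\exp (-\gamma _{d} \Vert IP(d_{i})-IP(d_{j})\Vert ^{2})\) with \(\gamma _{d}\) equal to \(\gamma _{d}^{\prime }\) divided by the mean squared norm of the interaction profiles. The function name and the toy matrix are hypothetical, and the sketch assumes at least one known association so that the mean norm is nonzero.

```python
import math

def gip_similarity(A, gamma_prime=0.5):
    """GIP kernel between the columns (diseases) of a 0/1 association matrix A."""
    n_rows, n_cols = len(A), len(A[0])
    # Interaction profile of disease c = column c of A.
    profiles = [[A[r][c] for r in range(n_rows)] for c in range(n_cols)]
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n_cols
    gamma = gamma_prime / mean_sq_norm

    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    return [[math.exp(-gamma * sq_dist(p, q)) for q in profiles] for p in profiles]

# Toy matrix (hypothetical): 3 miRNAs (rows) x 3 diseases (columns).
A_gip = [[1, 0, 1],
         [0, 1, 1],
         [1, 0, 0]]
K_gip = gip_similarity(A_gip)
```

Applying the same function to the transpose of A would give the miRNA-side kernel.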
MiRNA similarity network
We use miRNA functional similarity and Gaussian interaction profile kernel similarity to construct the miRNA similarity network. The Gaussian interaction profile kernel similarity is calculated in the same way as in the previous section. The miRNA similarity is defined as:
where \(\alpha\) is the scaling factor and FS is the miRNA functional similarity. We set a threshold h and assume there is no association between miRNAs whose similarity is less than h. Therefore, the final miRNA similarity network is defined as:
According to the study of Wang et al. [35], miRNAs with similar functions are often associated with semantically similar diseases, and the relationship between different diseases can be represented by a directed acyclic graph (DAG) structure. The functional similarity of two miRNAs is inferred by measuring the similarity of the DAGs of their related diseases. Firstly, the similarity of disease \(d_{t}\) to the disease set DT is defined as:
If the disease set associated with \(m_{1}\) is \(DT_{1}\) and the disease set associated with \(m_{2}\) is \(DT_{2}\), then the functional similarity between \(m_{1}\) and \(m_{2}\) is defined as:
where \(d_{i}\) belongs to \(DT_{1}\), \(d_{j}\) to \(DT_{2}\), m is the number of diseases contained in \(DT_{1}\), and n is the number of diseases contained in \(DT_{2}\).
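The definitions above can be sketched as follows, assuming the usual best-match form: the similarity of a disease to a set is its maximum semantic similarity to any member, and the functional similarity averages these best matches over the m + n diseases of both sets. The function names and the toy disease similarity matrix are hypothetical.

```python
def best_match(d, disease_set, ds):
    """Similarity of disease d to a disease set: best semantic match in the set."""
    return max(ds[d][t] for t in disease_set)

def functional_similarity(dt1, dt2, ds):
    """Average best-match similarity over the diseases of both miRNAs."""
    total = (sum(best_match(d, dt2, ds) for d in dt1)
             + sum(best_match(d, dt1, ds) for d in dt2))
    return total / (len(dt1) + len(dt2))

# Toy disease semantic similarity matrix (hypothetical values).
ds_toy = [[1.0, 0.5, 0.2],
          [0.5, 1.0, 0.4],
          [0.2, 0.4, 1.0]]
dt1, dt2 = [0, 1], [1, 2]  # disease indices associated with m1 and m2
fs = functional_similarity(dt1, dt2, ds_toy)
```

For the toy sets the best matches are 0.5, 1.0, 1.0, and 0.4, so the functional similarity is 2.9 / 4 = 0.725.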
EOESGC model
We combine two embedding models to obtain the node embeddings. The first is the link-based graph embedding model Embedding of Embedding (EOE), whose authors proposed a new graph type called the coupled heterogeneous graph; the miRNA-disease network essentially belongs to this type. The EOE model emphasizes that linked vertices should be close to each other and unlinked vertices should be far away from each other. The latter rule is as important as the former, so the model sets different loss terms for the two cases. A harmony matrix M is introduced to calculate the proximity between nodes of different types. The link-based embedding model can learn the edge features of the graph well and add them to the node features as supplementary information, which is effective and easy to implement. Then, we input the obtained embeddings and the miRNA-disease association network into the simplified graph convolution network (SGC) to continue learning node features. SGC turns the nonlinear GCN [38] into a simple linear model: it reduces the additional complexity of GCN by removing the nonlinearities between GCN layers and collapsing the resulting function into a single linear transformation. This simplified linear model has fewer parameters and is more efficient than GCN and some other GNNs on many tasks. The convolution-based embedding model can also effectively capture the neighbor information of each node. The EOESGC model does not concatenate the embeddings of the two models; instead, the embedding obtained from the first model is fed into the second model for training. The experiments show that this method can effectively learn node embeddings.
Embedding of embedding
The EOE model uses proximity to measure whether there is a link between two nodes. The larger the proximity, the more similar two nodes of the same type are, and the more likely two nodes of different types are to be associated. We input the similarity matrices of the nodes as the original features, and define the proximity between two nodes of the same type as follows:
where \(d_{i}\) represents row i of the disease similarity matrix, \(d_{j}\) represents row j of the disease similarity matrix, \(m_{i}\) represents row i of the miRNA similarity matrix, \(m_{j}\) represents row j of the miRNA similarity matrix.
For nodes of different types, the matrix \(M \in R^{m*n}\) is introduced in the calculation of proximity, since features that lie in different feature spaces cannot be compared directly. Thus the proximity between a pair of nodes of different types is defined as:
Linked node pairs that are assigned a small probability and unlinked pairs that are assigned a large probability should receive greater penalties. The loss function is therefore defined as:
where \(E_{d}\) is the set of edges between diseases, \(E_{m}\) is the set of edges between miRNAs, \(E_{dm}\) is the edge set between disease and miRNA, \(W_{d}\) is the similarity matrix of disease, \(W_{m}\) is the similarity matrix of miRNA, and \(W_{dm}\) is the weight between disease and miRNA.
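The behaviour of the proximity and the penalty can be illustrated with a small sketch. It assumes a sigmoid of the inner product for same-type pairs, a sigmoid of \(d^{T} M m\) for disease-miRNA pairs, a weighted negative log-likelihood for linked pairs, and an unweighted one for unlinked pairs. These forms follow the general EOE idea but are assumptions here, as are all function names and toy vectors.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proximity_same(u, v):
    """Proximity of two nodes of the same type (assumed: sigmoid of inner product)."""
    return sigmoid(dot(u, v))

def proximity_cross(d, M, m):
    """Proximity of a disease and a miRNA: M maps m into the space of d."""
    mapped = [dot(row, m) for row in M]
    return sigmoid(dot(d, mapped))

def pair_loss(p, linked, weight=1.0, eps=1e-12):
    """Linked pairs are penalised for low p, unlinked pairs for high p."""
    if linked:
        return -weight * math.log(p + eps)
    return -math.log(1.0 - p + eps)

# Toy vectors (hypothetical): a 2-dimensional disease and a 3-dimensional miRNA.
d_vec = [1.0, 0.0]
m_vec = [0.5, 0.5, 1.0]
M_toy = [[1.0, 0.0, 1.0],
         [0.0, 1.0, 0.0]]
p_dm = proximity_cross(d_vec, M_toy, m_vec)
```

With these definitions a linked pair with p = 0.1 is penalised more than one with p = 0.9, and the opposite holds for unlinked pairs, which matches the rule stated above.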
Simplifying graph convolutional network
In the traditional GCN, each layer can only aggregate the information of directly connected neighbors, while in SGC each layer can be set to aggregate the information of K-hop neighbors. SGC consists of two parts, a fixed feature extractor and a linear logistic regression classifier. In our proposed framework, only the feature extractor is used to obtain the embedded representation of nodes. Because the miRNA and disease embeddings learned by the EOE model still belong to two different feature spaces, they are first mapped to the same feature space.
We map diseases and miRNAs into the Z-dimensional feature space as follows:
where \(x_{m}\) and \(x_{d}\) are the miRNA embedding and disease embedding output by EOE, and \(W^{M}\), \(W^{D} \in R^{Z}\) are the mapping matrices. Then, the feature embeddings of the diseases and miRNAs are fed into the SGC. The convolution operation for each layer is as follows:
where A is the adjacency matrix of the graph, I is the identity matrix, D is the degree matrix of A, and K is the number of propagation steps (hops).
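As a minimal sketch of the fixed feature extractor, assume the usual SGC propagation \(S^{K}X\) with \(S = \tilde{D}^{-1/2}(A+I)\tilde{D}^{-1/2}\), where \(\tilde{D}\) is taken here as the degree matrix of \(A+I\) (the common GCN convention, which is an assumption relative to the definition above). The graph is a toy path of three nodes; for the miRNA-disease network, A would presumably be the square adjacency matrix over all miRNA and disease nodes. Function names are hypothetical, and the learned linear map that follows propagation is omitted.

```python
import math

def normalized_adjacency(A):
    """S = D^-1/2 (A + I) D^-1/2, with D the degree matrix of A + I."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def sgc_features(A, X, K=2):
    """Propagate the features X over K hops with no nonlinearity in between."""
    S = normalized_adjacency(A)
    H = X
    for _ in range(K):
        H = matmul(S, H)
    return H

# Toy path graph 0 - 1 - 2; only node 2 carries a nonzero feature.
A_path = [[0, 1, 0],
          [1, 0, 1],
          [0, 1, 0]]
X_toy = [[0.0], [0.0], [1.0]]
H1 = sgc_features(A_path, X_toy, K=1)
H2 = sgc_features(A_path, X_toy, K=2)
```

After one step node 0 has received nothing from node 2, which is two hops away; after two steps it has, which is the multi-hop aggregation that setting K = 2 provides.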
Finally, the output disease embedding and miRNA embedding are concatenated and fed into the MLP to make predictions. This step uses the cross-entropy loss function to optimize the model.
where y is the edge label, \(\tilde{y}\) is the predicted score.
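The final step can be sketched as follows, assuming the standard binary cross-entropy \(-[y\log \tilde{y} + (1-y)\log (1-\tilde{y})]\) averaged over the samples. Whether the loss is averaged or summed is an assumption, and the function names are hypothetical. The MLP itself is omitted; only the input concatenation and the loss are shown.

```python
import math

def pair_feature(mirna_emb, disease_emb):
    """Concatenate the two embeddings into one MLP input vector."""
    return list(mirna_emb) + list(disease_emb)

def binary_cross_entropy(labels, scores, eps=1e-12):
    """Mean cross-entropy between edge labels y and predicted scores."""
    total = 0.0
    for y, p in zip(labels, scores):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

good = binary_cross_entropy([1, 0], [0.9, 0.1])
bad = binary_cross_entropy([1, 0], [0.1, 0.9])
```

Confident correct scores give a small loss and confident wrong scores a large one, which is the signal used to train the classifier.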
Results
We combine the EOE and SGC models to learn the node embeddings, and the two models are trained separately. The main purpose of the EOE model is to add edge information from the coupled heterogeneous graph to the nodes, with the similarity matrices of miRNAs and diseases as the original feature input. The model relies on the loss function to train the feature matrix of miRNAs, the feature matrix of diseases, and the harmony matrix M. For the graph convolutional network, we stack two simplified graph convolutional layers; each layer aggregates two-hop neighbor information (K = 2), and the output dimension is 64. The MLP consists of two fully connected layers, the first of which contains 64 neurons. The details are shown in Fig. 3.
Experimental approaches and evaluation criteria
To verify the validity of the proposed EOESGC model, we conduct experiments on the HMDD2.0 database and evaluate model performance using 5-fold and 10-fold cross-validation. Because unlabeled pairs far outnumber positive samples, we randomly select 5 negative samples for each positive sample to form the experimental data, which limits the class imbalance. The results are shown in Fig. 4. Under 5-fold cross-validation, our model achieves an AUC of 0.9658 and an AUPR of 0.8543; under 10-fold cross-validation, the AUC is 0.9644 and the AUPR is 0.8540.
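The sampling and splitting procedure can be sketched as below. The function names, the fixed random seed, and the toy matrix are hypothetical, and the sketch assumes that negatives are drawn uniformly from the pairs with no known association.

```python
import random

def sample_negatives(A, ratio, rng):
    """Draw `ratio` unlabeled pairs per positive pair from a 0/1 matrix A."""
    positives = [(i, j) for i, row in enumerate(A) for j, v in enumerate(row) if v == 1]
    unknown = [(i, j) for i, row in enumerate(A) for j, v in enumerate(row) if v == 0]
    negatives = rng.sample(unknown, min(len(unknown), ratio * len(positives)))
    return positives, negatives

def k_fold(samples, k, rng):
    """Yield (train, test) splits; every sample is tested exactly once."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    folds = [shuffled[f::k] for f in range(k)]
    for f in range(k):
        train = [s for g in range(k) if g != f for s in folds[g]]
        yield train, folds[f]

# Toy association matrix (hypothetical): 4 miRNAs x 5 diseases, 3 positives.
A_cv = [[1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0]]
rng = random.Random(0)
positives, negatives = sample_negatives(A_cv, ratio=5, rng=rng)
samples = [(p, 1) for p in positives] + [(n, 0) for n in negatives]
splits = list(k_fold(samples, k=5, rng=rng))
```

With the HMDD2.0 figures given earlier, 5430 positives and a ratio of 5 would correspond to 27,150 sampled negatives out of the 495 × 383 − 5430 unlabeled pairs.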
Comparisons with the state-of-the-art methods
To demonstrate the advantage of the proposed model, we compare it with several strong, recently proposed models: LWPCMF [39], VGAMF [19], SMALF [18], CEMDA [40], and ICFMDA [41]. The average AUC under 5-fold cross-validation is used as the evaluation index, and the results are shown in Table 1. Among the baselines, SMALF, which uses a stacked autoencoder to learn node features, performs comparatively well, with an AUC of 0.9505. The proposed EOESGC framework performs better still, with an AUC of 0.9658, about 1.5 percentage points higher than that of SMALF.
Parameter sensitivity analysis
Different embedding dimensions lead to different training speeds and costs. To select the optimal embedding dimension, we conduct 5-fold cross-validation experiments with different dimensions. The experimental results are shown in Fig. 5. When the embedding dimension is less than 64, the AUC, AUPR, and F1-score show an upward trend; when the embedding dimension is greater than 64, the evaluation indexes tend to be stable, but training slows down significantly. Therefore, 64 is selected as the feature dimension of the nodes.
Comparison of different combination types
To verify the effectiveness of node embedding learning in the combined EOESGC model, we conducted two kinds of ablation experiments. Category 1 verifies the effectiveness of the EOE model by comparing the full framework with a model that uses only the simplified graph convolutional network. Category 2 verifies the effectiveness of combining the EOE and SGC embedding models by replacing SGC with three other commonly used graph convolutional neural networks, namely GCN [35], TAG [42], and GraphSage [43]. As shown in Table 2, if edge information is not added as supplementary information for node embedding, SGC performs poorly. In addition, combining the EOE model with the other commonly used convolution models gives worse results. Therefore, the experimental results support the validity of this framework.
Case study
Breast neoplasms are common cancers that threaten women’s health worldwide and are one of the leading causes of cancer death in women [44]. In recent years, gene diagnosis and gene therapy of breast cancer have become hot topics. Studies have shown that miRNAs, as regulatory factors, play an important role. For example, low expression of miR-195 can be readily observed in breast cancer cell lines and in tissue samples from chemotherapy-sensitive or drug-resistant patients [44]. In addition, miR-195 can decrease the survival rate and increase the apoptosis of breast tumor cells by downregulating the expression of Raf-1, Bcl-2, and P-glycoprotein [44]. It is therefore valuable to use computational methods to predict potential miRNAs related to breast neoplasms, so we predict the top 20 miRNAs related to breast neoplasms, as shown in Table 3. All the miRNAs we predict can be found in the validation databases.
Lung neoplasms are the most common type of cancer and have a high mortality rate. Previous studies have shown that miRNAs are involved in almost every process of lung cancer, including tumor progression, angiogenesis, invasion, and metastasis. For example, the expression level of miR-29s was found to be inversely correlated with DNA methyltransferase 3A (DNMT3A) and DNA methyltransferase 3B (DNMT3B) in lung cancer tissues; by regulating methylation, miR-29s induce the re-expression of tumor suppressor genes and inhibit tumorigenesis [45]. The top 20 miRNAs associated with lung cancer were predicted using our proposed framework, as shown in Table 4, among which the first 19 miRNAs are successfully verified.
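The ranking step behind such a case study can be sketched as follows. The function name, the toy score matrix, and the choice to exclude already known associations from the ranking are assumptions made for illustration.

```python
def top_candidates(scores, disease, k, known=None):
    """Rank miRNAs for one disease by predicted score, skipping known pairs."""
    known = known or set()
    candidates = [i for i in range(len(scores)) if (i, disease) not in known]
    ranked = sorted(candidates, key=lambda i: scores[i][disease], reverse=True)
    return ranked[:k]

# Toy predicted scores (hypothetical): 4 miRNAs (rows) x 2 diseases (columns).
scores_toy = [[0.9, 0.1],
              [0.4, 0.8],
              [0.7, 0.3],
              [0.2, 0.6]]
top2 = top_candidates(scores_toy, disease=0, k=2, known={(0, 0)})
```

The returned indices would then be looked up in independent databases such as dbDEMC and HMDD3.2 to check how many of the top-ranked candidates are confirmed.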
Conclusions
Experiments show that our proposed EOESGC framework can effectively predict potential miRNA-disease associations. In the coupled heterogeneous graph, EOE is used to add edge information to the node embeddings, which makes them richer and more comprehensive. Then the SGC model is used to aggregate the node information. Finally, the results are predicted using an MLP. We combine the EOE and SGC models for the first time. The two models play different roles, but both serve to learn effective feature embeddings of nodes. To reduce the computational complexity and ensure edge validity in the coupled heterogeneous graph, we simplify the graph structure twice. The AUC of the EOESGC model under 5-fold cross-validation is 0.9658, which is higher than that of previous methods. The top 20 potential miRNAs are predicted for the breast cancer and lung cancer case studies. The dbDEMC and HMDD3.2 databases are used for validation, and 20 and 19 of these miRNAs, respectively, are confirmed. Therefore, the EOESGC framework is effective for predicting potential miRNA-disease associations.
Although our proposed framework can effectively predict potential miRNA-disease associations, it cannot predict the miRNAs associated with new diseases. If the original data contain no known miRNAs for a disease, we cannot predict its unknown miRNAs. Therefore, in the next step, we need to solve the problem of how to effectively predict potential miRNAs for new diseases.
Availability of data and materials
The datasets used and/or analysed during the study are available from the corresponding author on reasonable request. Data can be downloaded from the Human miRNA Disease Database: http://www.cuilab.cn/hmdd/.
References
Chen X. Constructing lncRNA functional similarity network based on lncRNA-disease associations and disease semantic similarity. Sci Rep. 2015;5:11338.
Chen G, Wang Z, Wang D, Qiu C, Liu M, Xing C, Zhang Q, Yan G, Cui Q. LncRNADisease: a database for long-non-coding RNA-associated diseases. Nucleic Acids Res. 2013;D1:983–6.
Ponting CP, Oliver PL, Reik W. Evolution and functions of long noncoding RNAs. Cell. 2009;136(4):629–41.
Esteller M. Noncoding RNAs in human disease. Nat Rev Genet. 2011;12(12):861–74.
Doench JG, Peterson CP, Sharp PA. The functions of animal microRNAs. Nature. 2004;7006(431):350–5.
Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993;75:843–54.
Lee RC, Ambros V. An extensive class of small RNAs in Caenorhabditis elegans. Science. 2001;5543(294):862–4.
Mir SM, Rajasekaran P. Oncomirs: microRNAs with a role in cancer. Am Math Soc Contem Math. 1993;53–72.
Li CF. MicroRNA signatures in human cancers. Nat Rev Cancer. 2006;6(11):857–66.
Huang Q, Gumireddy K, Schrier M, Sage CL, Nagel R, Nair S, Egan DA, Li A, Huang G, Klein-Szanto AJ. The microRNAs miR-373 and miR-520c promote tumour invasion and metastasis. Nat Cell Biol. 2008;10(2):202–10.
Iorio MV, Ferracin M. MicroRNA gene expression deregulation in human breast cancer. Cancer Res. 2005;16(65):7065–70.
Latronico M, Catalucci D, Condorelli G. Emerging role of microRNAs in cardiovascular biology. Circ Res. 2007;101(12):1225–36.
Yanaihara N, Bowman E, Caplen N. Unique microRNA molecular profiles in lung cancer diagnosis and prognosis. Cancer Res. 2006;8(66):189–98.
Rostami M, Forouzandeh S, Berahmand K, Soltani M. Integration of multi-objective PSO based feature selection and node centrality for medical datasets. Genomics. 2020;112(6):4370–84.
Chen X, Xie D, Zhao Q, You ZH. MicroRNAs and complex diseases: from experimental results to computational models. Brief Bioinform. 2019;20(2):515–39.
Ji C, Wang YT, Gao Z, Li L, Zheng CH. A semi-supervised learning method for miRNA-disease association prediction based on variational autoencoder. IEEE/ACM Trans Comput Biol Bioinform. 2021. https://doi.org/10.1109/TCBB.2021.3067338.
Zhang L, Chen X, Yin J. Prediction of potential miRNA-disease associations through a novel unsupervised deep learning framework with variational autoencoder. Cells. 2019;8(9):1040.
Liu D, Huang Y, Nie W, Zhang J, Deng L. SMALF: miRNA-disease associations prediction based on stacked autoencoder and XGBoost. BMC Bioinform. 2021;22(1):1–18.
Ding Y, Lei X, Liao B, Wu F. Predicting miRNA-disease associations based on multi-view variational graph autoencoder with matrix factorization. IEEE J Biomed Health Inform. 2021. https://doi.org/10.1109/JBHI.2021.3088342.
Niu YW, Wang GH, Yan GY, Chen X. Integrating random walk and binary regression to identify novel miRNA-disease association. BMC Bioinform. 2019;20(1):1–13.
Yu L, Shen X, Zhong D, Yang J. Three-layer heterogeneous network combined with unbalanced random walk for miRNA-disease association prediction. Front Genet. 2019;10:1316.
Dai LY, Liu JX, Zhu R, Wang J, Yuan SS. Logistic weighted profile-based bi-random walk for exploring miRNA-disease associations. J Comput Sci Technol. 2021;36(2):276–87.
Chen X, Wang CC, Yin J, You ZH. Novel human miRNA-disease association inference based on random forest. Mol Ther Nucleic Acids. 2018;13:568–79.
Yao D, Zhan X, Kwoh CK. An improved random forest-based computational model for predicting novel miRNA-disease associations. BMC Bioinform. 2019;20:1–14.
Zheng K, You ZH, Wang L, Zhou Y, Li ZW. MLMDA: a machine learning approach to predict and validate microRNA-disease associations by integrating of heterogenous information sources. J Transl Med. 2019;17(1):1–14.
Chen X, Wu QF, Yan GY. RKNNMDA: ranking-based KNN for miRNA-disease association prediction. RNA Biol. 2017;14(7):952–62.
Peng J, Hui W, Bolin Q, Jianye C, Qinghua H. A learning-based framework for miRNA-disease association identification using neural networks. Bioinformatics. 2019;35(21):4364–71.
Chu Y, Wang X, Dai Q, Wang Y, Wei DQ. MDA-GCNFTG: identifying miRNA-disease associations based on graph convolutional networks via graph sampling through the feature and topology graph. Brief Bioinform. 2021;22(6).
Tang X, Luo J, Shen C, Lai Z. Multi-view multichannel attention graph convolutional network for miRNA-disease association prediction. Brief Bioinform. 2021;22(6).
Jin L, Sai Z, Tao L, Chenxi N, Zhuoxuan Z, Wei Z. Neural inductive matrix completion with graph convolutional networks for miRNA-disease association prediction. Bioinformatics. 2020;36(8):2538–46.
Xu L, Wei X. Embedding of embedding (EOE): joint embedding for coupled heterogeneous networks. ACM. 2017;9:741–9.
Yang L, Qiu C, Tu J, Geng B, Yang J, Jiang T, Cui Q. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2014;42(D1):1070.
Zhen Y, Fei R, Liu C, He S, Gang S, Qian G, Lei Y, Zhang Y, Miao R, Ying C. dbDEMC: a database of differentially expressed miRNAs in human cancers. BMC Genom. 2010;11(Suppl 4):1–8.
Lipscomb CE. Medical subject headings (MeSH). Bull Med Libr Assoc. 2000;88(3):265–6.
Cui Q. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics. 2010;26(13):1644–50.
Ping X, Ke H, Guo M, Guo Y, Li J, Jian D, Yong L, Dai Q, Jin L, Teng Z. Correction: Prediction of microRNAs associated with human diseases based on weighted k most similar neighbors. PLOS ONE. 2013;8:e70204.
Zhao Y, Chen X, Yin J. Adaptive boosting-based computational model for predicting potential miRNA-disease associations. Bioinformatics. 2019;36(1):330.
Wu F, Zhang T, Souza A, Fifty C, Yu T, Weinberger KQ. Simplifying graph convolutional networks. 2019.
Yin MM, Cui Z, Gao MM, Liu JX, Gao YL. LWPCMF: Logistic weighted profile-based collaborative matrix factorization for predicting miRNA-disease associations. IEEE/ACM Trans Comput Biol Bioinform. 2021;18(3):1122–9.
Liu B, Zhu X, Zhang L, Liang Z, Li Z. Combined embedding model for miRNA-disease association prediction. BMC Bioinform. 2021;22(1):1–22.
Chen X, Wang L, Jia Q, Guan NN, Li JQ. Predicting miRNA-disease association based on inductive matrix completion. Bioinformatics. 2018;24:4256–65.
Du J, Zhang S, Wu G, Moura J, Kar S. Topology adaptive graph convolutional networks. 2017.
Hamilton WL, Ying R, Leskovec J. Inductive representation learning on large graphs. 2017.
Ji W, Kim E. microRNAs in breast cancer: regulatory roles governing the hallmarks of cancer. Biol Rev. 2016;9(2):409.
Nicolson M. The impact of comorbidity upon determinants of outcome in patients with lung cancer. Lung Cancer J Int Assoc Study Lung Cancer. 2015;87(2):186–92.
Acknowledgements
Thanks to PSC and WXZ for correcting the paper.
Funding
This work was supported by the National Natural Science Foundation of China under Grant No. 61873281.
Author information
Contributions
ZY conceived the prediction method and wrote the paper; PSC, WXZ and QSB revised the paper; ZY and WFY completed the code implementation. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
All authors declare that they have no competing interests as defined by BMC, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Pang, S., Zhuang, Y., Wang, X. et al. EOESGC: predicting miRNA-disease associations based on embedding of embedding and simplified graph convolutional network. BMC Med Inform Decis Mak 21, 319 (2021). https://doi.org/10.1186/s12911-021-01671-y
DOI: https://doi.org/10.1186/s12911-021-01671-y