
# An inference method from multi-layered structure of biomedical data

Myungjun Kim^{1}, Yonghyun Nam^{1} and Hyunjung Shin^{1} (corresponding author)

*BMC Medical Informatics and Decision Making* **17 (Suppl 1)**:52

https://doi.org/10.1186/s12911-017-0450-4

© The Author(s). 2017

**Published:** 18 May 2017

## Abstract

### Background

A biological system is a multi-layered structure of omics layers, including the genome, epigenome, transcriptome, metabolome, and proteome, and can be extended to clinical/medical layers such as the diseasome, drugs, and symptoms. One advantage of this structure is that an unknown component, or one of its traits, can be inferred from known omics components, either from components in the same layer or from components in different layers.

### Methods

To implement the inference process, an algorithm that can be applied to a multi-layered complex system is required. In this study, we develop a semi-supervised learning algorithm for such systems. To verify the validity of the inference, the algorithm was applied to the problem of predicting disease co-occurrence with a two-layered network composed of a symptom layer and a disease layer.

### Results

The symptom-disease layered network achieved an AUC of 0.74, a noticeable improvement over the AUC of 0.59 obtained with the single-layered disease network. If extended to the whole layered structure of omics, the proposed method is expected to produce more promising results.

### Conclusion

The novelty of this research is a new integrative algorithm that incorporates the vertical structure of omics data, in contrast to existing methods that integrate data in a parallel fashion. The results can provide an enhanced guideline for disease co-occurrence prediction, and the method can thereby serve as a valuable tool for inference in multi-layered biological systems.

## Keywords

- Integrative inference on biomedical data
- Semi-supervised learning
- Semi-supervised learning for multiple networks
- Symptom-disease multi-layered network
- Disease co-occurrence prediction

## Background

Omics is the comprehensive study of a specific layer in a cellular system [1], and the molecular components in each layer constitute the biological system. These layers include the genome, epigenome, transcriptome, metabolome, and proteome, and can be extended to clinical/medical layers such as the diseasome, drugs, and symptoms. Complex interactions exist between layers, such as translation, transcription, and reactions, and these interactions allow us to view a biological system as a multi-layered structure of omics. In recent years, great advances in high-throughput experimental techniques have brought an influx of omics data, including DNA sequence data, mRNA, miRNA, and methylation patterns [2]. While there have been many works on a single layer of omics data, complex interactions between layers prevent a single-layer analysis from capturing comprehensive information about the whole system. Therefore, comprehensive analysis of multiple omics is required for a deeper understanding of the biological system as a whole [3]. One integrative approach for multiple levels of information that is receiving much attention is the network-based, or graph-based, approach. A network or graph of omics data consists of nodes and edges, where nodes represent biological components, such as genes or diseases, and edges represent relationships or interactions among them [4]. The main reason for the popularity of network-based analysis of biological systems is that the network structure can capture associations among biological components while handling large amounts of data [5]. Single-layered networks include gene co-expression networks [6–9], protein networks [10–13], metabolic networks [14, 15], disease networks [16, 17], and many more, while multi-layered networks can be created by connecting layers using data that reflect interactions between them [18].

Given a multi-layered network, one can extend its use by applying machine learning algorithms to predict traits (or labels) of interest. While many traits have been discovered in numerous studies, there remains much room for finding unknown traits of biological components. Instead of leaving unknown components in the dark, one can utilize both known and unknown components with semi-supervised learning. Semi-supervised learning (SSL), in general, deals with both labeled and unlabeled data, where labeled data are scarce compared to the vast amount of unlabeled data and obtaining labels for unknown traits is costly. In this sense, SSL can serve as a cost-effective tool for prediction [19]. For SSL in the network setting [20–24], the key idea is ‘label propagation’ [25], where known labels propagate to neighboring unlabeled data points through edges. Through label propagation and a basic graph kernel based on the graph Laplacian [26], we obtain predictive values for unlabeled data, which can be used for prediction in networks of biological systems.

SSL has been studied extensively for various omics data. In [27–29], a graph integration method, which finds a convex combination of graph Laplacians, is applied together with SSL to four different types of yeast protein networks to predict protein functions; this line of work is extended in [30] by incorporating a deletion process for noisy connections. For more practical purposes on clinical data, [31–33] apply graph integration methods to multiple graphs built from CNA, methylation, miRNA, and gene expression data, along with SSL, to predict clinical outcomes of cancer. In [34], SSL schemes are applied to predict disease genes from a protein-protein interaction network constructed from multiple proteomic and genomic data. In [35], SSL is applied to predict synthetic genetic interactions from an integrated network of protein-protein interaction, protein complex, and gene expression data. For inter-layer relationships, [36] provides algorithms for reconstructing intra-layer relations by utilizing SSL and inter-layer relations between different levels of genomic data. In [37], the authors infer miRNA-disease associations with an SSL algorithm. In [38], SSL is applied to disease comorbidity scoring on a complemented disease network of the metabolic disease group.

Most of the above works, however, integrate multiple sources of data only in a parallel fashion, ignoring the hierarchical, or vertical, structure of multi-omics data. Furthermore, only a few machine learning algorithms, SSL included, deal with networks of vertical structure. The purpose of this paper is to develop a semi-supervised learning algorithm for multi-layered networks that utilizes matrix separation and graph integration in a vertical fashion. For biological systems, however, the vast number of components in each layer and the countless unknown relations between layers raise issues of computational complexity and sparseness when analyzing multi-layered networks. To alleviate these problems, we propose an efficient matrix inversion algorithm that combines the Nyström method [39] and the Woodbury formula [40]. The remainder of the paper is organized as follows. In Methods, we discuss graph-based semi-supervised learning for multi-layered networks. In Experiments and Results and Discussion, we present experimental results of the proposed algorithm applied to the disease co-occurrence prediction problem on a two-layered network of symptoms and diseases.

## Methods

### Graph based semi-supervised learning

A graph is denoted by \( G(V, E) \), which consists of nodes (\( V \)) and edges (\( E \)). Given a graph \( G(V, E) \) for \( n \) data points, nodes represent data points with \( V = \{x_1, x_2, \dots, x_n\} \), and edges represent similarities between data points. The similarities are given by the weight matrix \( W \), where the element \( W_{ij} \) represents the strength of the connection between nodes \( x_i \) and \( x_j \). The problem of semi-supervised learning on graph \( G(V, E) \) deals with labeled and unlabeled nodes, where the labeling is given by \( Y = \{Y_l, Y_u\} \) with \( Y_l \in \{-1, 1\} \) for labeled nodes and \( Y_u = 0 \) for unlabeled nodes. Through the learning process, we determine the output vector \( f = (f_1, f_2, \dots, f_n)^T \) using the available information and minimizing the following quadratic cost functional [41]:

\[ \min_f \; \sum_{i=1}^{n} (f_i - y_i)^2 + \frac{\mu}{2} \sum_{i,j=1}^{n} W_{ij} (f_i - f_j)^2, \tag{1} \]

or, in matrix form,

\[ \min_f \; (f - y)^T (f - y) + \mu f^T L f, \tag{2} \]

where \( y = (y_1, \dots, y_n)^T \) collects the labels in \( Y \) and \( L \) is the graph Laplacian [26] defined as \( L = D - W \) for \( D = \mathrm{diag}(d_i) \) and \( d_i = \sum_j W_{ij} \). In (2), the first term is the loss term for consistency with the initial labeling, the second term is the smoothness term for consistency with the geometry of the data, and \( \mu \) is a parameter for the trade-off between the loss term and the smoothness term [41]. The solution to minimization problem (2) is given by:

\[ f = (I + \mu L)^{-1} y. \tag{3} \]
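As a minimal sketch of the closed-form solution \( f = (I + \mu L)^{-1} y \) of the quadratic cost above, the following Python snippet builds the graph Laplacian of a toy four-node path graph and solves the linear system with NumPy. The function names `graph_laplacian` and `ssl_scores`, the toy weights, and the choice \( \mu = 1 \) are illustrative assumptions, not part of the original study.

```python
import numpy as np

def graph_laplacian(W):
    """Return L = D - W for a symmetric, non-negative weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

def ssl_scores(W, y, mu):
    """Solve (I + mu * L) f = y, the minimizer of the quadratic cost."""
    L = graph_laplacian(W)
    return np.linalg.solve(np.eye(len(y)) + mu * L, y)

# Toy path graph 0 - 1 - 2 - 3 (hypothetical data, not from the paper).
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
y = np.array([1., 0., 0., -1.])   # node 0 labeled +1, node 3 labeled -1
f = ssl_scores(W, y, mu=1.0)
print(f)
```

On this toy graph the scores are 4/7, 1/7, −1/7 and −4/7, decaying smoothly from the positively labeled end to the negatively labeled end, which illustrates how the smoothness term spreads label information along edges.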

### Semi-supervised learning for multi-layered biomedical data

A multi-layered graph is denoted by \( G(V, E, S) \), which consists of nodes (\( V \)), edges (\( E \)), and strata (\( S \)). In addition to nodes and edges, the strata in \( G(V, E, S) \) denote \( K \) distinct layers with \( S = \{S_1, S_2, \dots, S_K\} \). Each \( G(V, E, S) \) contains intra- and inter-layer relations, where the former characterize relations between two nodes in the same layer and the latter characterize relations between two nodes that belong to different, adjacent layers. Given a graph \( G(V, E, S) \) with \( K \) layers and \( n_k \) data points in each layer \( k \), the weight matrix \( W \) is an \( N \times N \) block tri-diagonal matrix, where \( N = n_1 + n_2 + \dots + n_K \), with \( 3K - 2 \) non-zero blocks: \( K \) symmetric diagonal blocks represent intra-layer relations and \( 2K - 2 \) rectangular banded diagonal blocks represent inter-layer relations. Figure 1 depicts a multi-layered graph with three layers together with the structure of its corresponding weight matrix. An exemplary network would be a multi-layered network with \( S_1 \), \( S_2 \), and \( S_3 \) as symptoms, diseases, and proteins, respectively, in the context of disease co-occurrence prediction. To incorporate graph-based semi-supervised learning into multi-layered omics systems, we first apply matrix separation to the weight matrix \( W \) and then implement the graph integration method [28].

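Given the weight matrix \( W \) in a multi-layered graph, let \( {W}^{\left\{{S}_p,\ {S}_q\right\}} \) be a matrix that contains only the sub-block of \( W \) associated with strata \( S_p \) and \( S_q \), with all other blocks masked to zero. Then, we have

\[ W = \sum_{p=1}^{K} \sum_{q=1}^{K} W^{\{S_p,\ S_q\}}, \tag{4} \]

where \( S_p = S_q \) denotes a sub-matrix for the intra-layer relation of \( S_p \) (or \( S_q \)) and \( S_p \ne S_q \) denotes a sub-matrix for the inter-stratum relation of two different strata, \( S_p \) and \( S_q \). Since the effects of label propagation can differ for intra-layer and inter-layer connections, we want to look at them separately. Using (4), we have

\[ W = W^{\{\mathrm{intra}\}} + W^{\{\mathrm{inter}\}}, \]

where \( W^{\{\mathrm{intra}\}} \) consists of the \( K \) diagonal blocks of intra-layer relations and \( W^{\{\mathrm{inter}\}} \) consists of the \( 2K - 2 \) banded diagonal blocks of inter-layer relations. By accounting for different parameters \( \mu_a \) (≥0) and \( \mu_b \) (≥0) for \( W^{\{\mathrm{intra}\}} \) and \( W^{\{\mathrm{inter}\}} \), respectively, the formalization (1) becomes

\[ \min_f \; \sum_{i=1}^{N} (f_i - y_i)^2 + \frac{\mu_a}{2} \sum_{i,j=1}^{N} W^{\{\mathrm{intra}\}}_{ij} (f_i - f_j)^2 + \frac{\mu_b}{2} \sum_{i,j=1}^{N} W^{\{\mathrm{inter}\}}_{ij} (f_i - f_j)^2. \tag{5} \]

Since \( W^{\{\mathrm{intra}\}} \) and \( W^{\{\mathrm{inter}\}} \) themselves are weight matrices, each has a graph Laplacian, denoted \( L^{\{\mathrm{intra}\}} \) and \( L^{\{\mathrm{inter}\}} \), respectively. This implies that we can translate problem (5) into

\[ \min_f \; (f - y)^T (f - y) + \mu_a f^T L^{\{\mathrm{intra}\}} f + \mu_b f^T L^{\{\mathrm{inter}\}} f. \tag{6} \]

Note that \( \mu_a L^{\{\mathrm{intra}\}} + \mu_b L^{\{\mathrm{inter}\}} \) is positive semidefinite. This means that the optimization problem (6) is a convex problem, whose solution is given as

\[ f = \left( I + \mu_a L^{\{\mathrm{intra}\}} + \mu_b L^{\{\mathrm{inter}\}} \right)^{-1} y. \tag{7} \]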
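The matrix separation and the resulting solution \( f = (I + \mu_a L^{\{\mathrm{intra}\}} + \mu_b L^{\{\mathrm{inter}\}})^{-1} y \) can be sketched as follows. This is a minimal illustration on a hypothetical two-layer toy network (two symptom nodes followed by three disease nodes); the helper names, the node ordering, and all weights are assumptions made for the example.

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def split_intra_inter(W, sizes):
    """Separate a block tri-diagonal W into its diagonal (intra-layer)
    and off-diagonal (inter-layer) parts."""
    layer = np.repeat(np.arange(len(sizes)), sizes)   # layer index of each node
    same = layer[:, None] == layer[None, :]
    return np.where(same, W, 0.0), np.where(same, 0.0, W)

def multilayer_ssl(W, sizes, y, mu_a, mu_b):
    """Solve (I + mu_a * L_intra + mu_b * L_inter) f = y."""
    W_intra, W_inter = split_intra_inter(W, sizes)
    M = np.eye(len(y)) + mu_a * laplacian(W_intra) + mu_b * laplacian(W_inter)
    return np.linalg.solve(M, y)

# Hypothetical toy network: 2 symptoms, then 3 diseases.
sizes = [2, 3]
W_ss = np.array([[0.0, 0.5], [0.5, 0.0]])                    # symptom-symptom
W_dd = np.array([[0.0, 0.2, 0.0],
                 [0.2, 0.0, 0.7],
                 [0.0, 0.7, 0.0]])                           # disease-disease
W_sd = np.array([[1.0, 1.0, 0.0],
                 [0.0, 1.0, 1.0]])                           # symptom-disease (binary)
W = np.block([[W_ss, W_sd], [W_sd.T, W_dd]])

y = np.array([1.0, 0.0, 1.0, 0.0, 0.0])   # one labeled symptom, one target disease
f = multilayer_ssl(W, sizes, y, mu_a=0.5, mu_b=2.0)
print(f)
```

Using separate parameters lets inter-layer edges propagate labels more (or less) strongly than intra-layer edges; with \( \mu_a = \mu_b \) the sketch reduces to ordinary single-graph SSL on \( W \).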

### Revised matrix inversion method for multi-layered biomedical data

In eq. (7), the matrix inversion requires \( O(N^3) \) computation for \( N \) data points. For the multi-layered structure of omics, the data can be very large, which makes computing (7) expensive. To overcome this difficulty, various inversion algorithms for block tri-diagonal matrices, such as those in [42–45], can be considered. These algorithms, however, require square banded diagonal blocks and are therefore not applicable here, since the non-zero blocks \( {W}^{\left\{{S}_p,\ {S}_q\right\}} \) can be rectangular because different omics layers differ in size (\( n_p \ne n_q \)). In addition, the sparseness of the multi-layered structure of omics and of the block tri-diagonal matrix can make the matrix inversion in (7) inefficient.

The revised matrix inversion method combines the Nyström method [39] and the Woodbury formula [40]. The idea is to apply a low-rank approximation to \( L^{\{\mathrm{inter}\}} \) with the Nyström method and then use the Woodbury formula to obtain the solution to problem (6). First, let us look at the Nyström method and the Woodbury formula.

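Nyström method: given a symmetric positive semidefinite matrix \( H \) of size \( n \), randomly sample \( r \ll n \) columns, namely \( C \). By defining \( Q \) as the intersection of \( C \) and its corresponding rows in \( H \), the Nyström approximation \( \hat{H} \) is given by

\[ \hat{H} = C Q^{+} C^T, \tag{8} \]

where \( Q^{+} \) is the pseudo-inverse of \( Q \), so that the rank of \( \hat{H} \) is at most \( r \).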
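A minimal sketch of the Nyström approximation \( C Q^{+} C^T \) in NumPy is given below. The function name `nystrom`, the synthetic rank-5 matrix, and the pseudo-inverse cutoff `rcond=1e-10` are assumptions chosen for illustration. The example uses a matrix whose rank is smaller than the number of sampled columns, a case in which the approximation reproduces the matrix up to rounding error and therefore serves as a convenient sanity check.

```python
import numpy as np

def nystrom(H, r, rng):
    """Nystrom approximation C Q^+ C^T of a PSD matrix H from r sampled columns."""
    idx = rng.choice(H.shape[0], size=r, replace=False)
    C = H[:, idx]                    # n x r block of sampled columns
    Q = H[np.ix_(idx, idx)]          # r x r intersection of sampled rows and columns
    # rcond discards numerically zero singular values of Q (illustrative cutoff).
    return C @ np.linalg.pinv(Q, rcond=1e-10) @ C.T, idx

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
H = X @ X.T                          # synthetic PSD matrix of rank 5
H_hat, idx = nystrom(H, r=10, rng=rng)
print(np.abs(H - H_hat).max())       # tiny: the sampled columns span the range of H
```

For a full-rank matrix the approximation is no longer exact, and its quality depends on how well the sampled columns represent the dominant part of the spectrum.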

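Woodbury formula: suppose \( A \) is an \( n \times n \) invertible matrix, \( B \) is an \( r \times r \) (\( r \) not necessarily equal to \( n \)) invertible matrix, and \( U \) is an \( n \times r \) matrix. Suppose furthermore that \( B^{-1} + U^T A^{-1} U \) is invertible. Then,

\[ \left( A + U B U^T \right)^{-1} = A^{-1} - A^{-1} U \left( B^{-1} + U^T A^{-1} U \right)^{-1} U^T A^{-1}. \tag{9} \]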

The Woodbury formula is useful when \( A^{-1} \) is cheap to obtain and the total matrix has a sparse structure [43].
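The Woodbury identity \( (A + U B U^T)^{-1} = A^{-1} - A^{-1} U (B^{-1} + U^T A^{-1} U)^{-1} U^T A^{-1} \) can be checked numerically with a short sketch. The function name `woodbury_inverse` and the random test matrices are hypothetical; \( A \) is taken to be diagonal here only to mimic the situation where \( A^{-1} \) is cheap.

```python
import numpy as np

def woodbury_inverse(A_inv, U, B_inv):
    """Return (A + U B U^T)^{-1} given A^{-1} and B^{-1}.

    Only an r x r system is solved, where r is the number of columns of U.
    """
    AU = A_inv @ U
    small = B_inv + U.T @ AU                         # r x r matrix
    return A_inv - AU @ np.linalg.solve(small, U.T @ A_inv)

rng = np.random.default_rng(1)
n, r = 6, 2
a = rng.uniform(1.0, 2.0, size=n)
A = np.diag(a)                                       # cheap to invert
U = rng.normal(size=(n, r))
B = np.array([[2.0, 0.5],
              [0.5, 1.0]])                           # small invertible matrix

M_inv = woodbury_inverse(np.diag(1.0 / a), U, np.linalg.inv(B))
print(np.abs(M_inv - np.linalg.inv(A + U @ B @ U.T)).max())
```

The point of the identity is that the only dense solve involves an \( r \times r \) matrix, so the cost is dominated by obtaining \( A^{-1} \) when \( r \) is small.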

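\( L^{\{\mathrm{inter}\}} \) is a positive semidefinite matrix by the property of the graph Laplacian [26], and thus the Nyström method is applicable. By applying the Nyström method to \( L^{\{\mathrm{inter}\}} \), we obtain

\[ L^{\{\mathrm{inter}\}} \approx C Q^{+} C^T, \tag{10} \]

where \( C \) is an \( N \times r \) (\( r \ll N \)) matrix and \( Q^{+} \) is an \( r \times r \) matrix. Substituting the result into eq. (7) yields

\[ f = \left( I + \mu_a L^{\{\mathrm{intra}\}} + \mu_b C Q^{+} C^T \right)^{-1} y. \tag{11} \]

Letting \( A = I + \mu_a L^{\{\mathrm{intra}\}} \) and \( B = \mu_b Q^{+} \), by the Woodbury formula we have the final solution to problem (6) in the form

\[ f = \left[ A^{-1} - A^{-1} C \left( B^{-1} + C^T A^{-1} C \right)^{-1} C^T A^{-1} \right] y. \tag{12} \]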
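Putting the two pieces together, the sketch below approximates \( L^{\{\mathrm{inter}\}} \) by \( C Q^{+} C^T \) and applies the Woodbury formula with \( A = I + \mu_a L^{\{\mathrm{intra}\}} \), \( U = C \) and \( B = \mu_b Q^{+} \). It assumes that \( Q \) is invertible, so that \( B^{-1} = Q / \mu_b \); the toy network, the fixed column indices `idx`, and all function names are hypothetical choices for the example rather than the authors' implementation.

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def random_symmetric(n, rng):
    """Symmetric weight block with zero diagonal and positive off-diagonal entries."""
    T = np.triu(rng.uniform(0.1, 1.0, size=(n, n)), k=1)
    return T + T.T

def nystrom_woodbury_ssl(L_intra, L_inter, y, mu_a, mu_b, idx):
    """Approximate f = (I + mu_a L_intra + mu_b L_inter)^{-1} y.

    L_inter is replaced by C Q^{-1} C^T built from the columns in idx
    (Q is assumed invertible), and the inverse follows from Woodbury.
    """
    C = L_inter[:, idx]
    Q = L_inter[np.ix_(idx, idx)]
    A = np.eye(len(y)) + mu_a * L_intra           # block diagonal in practice
    A_inv_y = np.linalg.solve(A, y)
    A_inv_C = np.linalg.solve(A, C)
    small = Q / mu_b + C.T @ A_inv_C              # B^{-1} + C^T A^{-1} C, r x r
    return A_inv_y - A_inv_C @ np.linalg.solve(small, C.T @ A_inv_y)

rng = np.random.default_rng(0)
n1, n2 = 3, 4                                     # two toy layers
Z12 = np.zeros((n1, n2))
W_intra = np.block([[random_symmetric(n1, rng), Z12],
                    [Z12.T, random_symmetric(n2, rng)]])
R = rng.uniform(0.1, 1.0, size=(n1, n2))          # rectangular inter-layer block
W_inter = np.block([[np.zeros((n1, n1)), R],
                    [R.T, np.zeros((n2, n2))]])
L_intra, L_inter = laplacian(W_intra), laplacian(W_inter)

y = np.zeros(n1 + n2)
y[[0, n1]] = 1.0                                  # one labeled node per layer
idx = np.array([0, 3, 5])                         # sampled columns (fixed for the example)
f = nystrom_woodbury_ssl(L_intra, L_inter, y, mu_a=0.5, mu_b=0.5, idx=idx)
print(f)
```

The result agrees with directly inverting \( I + \mu_a L^{\{\mathrm{intra}\}} + \mu_b C Q^{-1} C^T \); how close it is to the exact solution of problem (6) depends on the quality of the low-rank approximation.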

### Overview of the proposed method

Note that \( A \), defined as \( I + \mu_a L^{\{\mathrm{intra}\}} \), is a block diagonal matrix and that the total matrix has a sparse structure arising from the block tri-diagonal form. Since the inverse of a block diagonal matrix is cheap to obtain and the total matrix is sparse, we can infer from [43] that the Woodbury formula is an effective approach for obtaining the inverse in eq. (11). The complexity of the Woodbury formula (in fact, the overall complexity) is given by eq. (13), where \( n_k \) denotes the size of stratum \( S_k \) and \( r \ll N \).
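The remark that a block diagonal matrix is cheap to invert can be illustrated with a short sketch: each diagonal block is inverted on its own, so a dense implementation costs on the order of \( \sum_k n_k^3 \) operations instead of \( N^3 \). The function `block_diag_inverse` and the random toy blocks are assumptions for illustration only.

```python
import numpy as np

def block_diag_inverse(blocks):
    """Invert a block diagonal matrix one diagonal block at a time."""
    N = sum(b.shape[0] for b in blocks)
    out = np.zeros((N, N))
    start = 0
    for b in blocks:
        stop = start + b.shape[0]
        out[start:stop, start:stop] = np.linalg.inv(b)
        start = stop
    return out

rng = np.random.default_rng(0)
mu_a = 0.5
blocks = []
for n_k in (3, 4):                                   # two toy layers
    T = np.triu(rng.uniform(size=(n_k, n_k)), k=1)
    W_k = T + T.T                                    # intra-layer weights
    L_k = np.diag(W_k.sum(axis=1)) - W_k             # intra-layer Laplacian
    blocks.append(np.eye(n_k) + mu_a * L_k)          # diagonal block of A

A_inv = block_diag_inverse(blocks)
print(A_inv.shape)
```

Each block \( I + \mu_a L_k \) is symmetric positive definite, so the per-block inverses always exist.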

Regarding the Nyström method, a natural question is why \( L^{\{\mathrm{inter}\}} \) alone is selected for low-rank approximation. It is true that we could apply the Nyström method to \( \mu_a L^{\{\mathrm{intra}\}} + \mu_b L^{\{\mathrm{inter}\}} \), as the sum of positive semidefinite matrices is still positive semidefinite. This approach, however, could lose the structure and properties of each layer, since the graph Laplacian would be approximated with randomly sampled columns. By selecting only \( L^{\{\mathrm{inter}\}} \) for the Nyström method, we prevent such loss. In addition, in contrast to the various inversion algorithms for block tri-diagonal matrices, the Nyström method can make use of rectangular banded diagonal blocks, relying only on the properties of the graph Laplacian.

Finally, with respect to integrative analysis of multi-omics data, the overall complexity (13) is lower than \( O(N^3) \), which yields faster matrix inversion. Since multi-omics data can become very large, the proposed method can adapt effectively to the multi-layered structure of omics.

## Experiments

### Data

To validate the proposed method, we compared the performance of the multi-layered network under the proposed method with that of a non-hierarchical, single-layered network under an ordinary semi-supervised learning scheme. As the problem setting, we chose disease co-occurrence prediction on a two-layered network consisting of a symptom layer and a disease layer. Disease co-occurrence prediction is important in practice for treatment and prevention [46]. For example, disease co-occurrence in cancer, which has a high co-occurrence rate, can serve as a crucial prognostic factor for patients with cancer [47] and has a direct influence on their treatment [48]. Disease co-occurrence has therefore been studied, but only on a single layer of omics [38]. In our study, we exploit the fact that knowing the common symptoms of two diseases can aid disease co-occurrence prediction. For instance, knowing that a patient has a cough can lead to a diagnosis of both flu and pneumonia, which are co-occurring diseases.

For intra-stratum relations of diseases, \( W^{\{\mathrm{Disease}\}} \), we utilized similarity between diseases in terms of shared proteins (out of 15,777 proteins). For similarity measurement, we used the Tanimoto kernel [51], which is given as

\[ K(x_i, x_j) = \frac{x_i^T x_j}{x_i^T x_i + x_j^T x_j - x_i^T x_j}, \]

where \( x_i \) and \( x_j \) are given as bit vectors. For intra-stratum relations of symptoms, \( W^{\{\mathrm{Symptom}\}} \), we utilized similarity between symptoms in terms of the diseases accompanying the symptoms. The Tanimoto kernel was also used as the similarity measurement for symptom relations. For inter-layer relations of symptoms and diseases, we used the symptom-disease data with binary weights, where \( W_{ij}^{\{\mathrm{inter}\}} = 1 \) if co-occurrence is present and \( W_{ij}^{\{\mathrm{inter}\}} = 0 \) otherwise, for \( i \in \mathit{Disease} \) and \( j \in \mathit{Symptom} \). Table 1 summarizes the data.

Table 1 Data source for symptom-disease stratified network and disease co-occurrence information

| Data | Number of data | Sources |
|---|---|---|
| Symptom-Disease | 319 symptoms / 2,454 diseases | Supplementary information in [17] |
| Disease | 4,318 diseases / 15,777 proteins | CTD, GAD, OMIM, PharmGKB, TTD |
| Disease Co-occurrence | 1,015 diseases | HuDiNe |
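The Tanimoto similarity used for the intra-layer weights can be sketched for bit vectors as the ratio of shared bits to the size of the union. In the snippet below, the function name `tanimoto_kernel`, the tiny disease-by-protein matrix, and the convention of returning 0 for two all-zero vectors are assumptions made for illustration.

```python
import numpy as np

def tanimoto_kernel(X):
    """Pairwise Tanimoto similarity between the rows of a 0/1 matrix X."""
    X = np.asarray(X, dtype=float)
    inner = X @ X.T                          # number of shared bits
    norms = np.diag(inner)                   # number of set bits per row
    union = norms[:, None] + norms[None, :] - inner
    with np.errstate(divide="ignore", invalid="ignore"):
        # Pairs with an empty union get similarity 0 (a convention for this sketch).
        return np.where(union > 0, inner / union, 0.0)

# Hypothetical example: 3 diseases described by 4 proteins.
X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 0, 1]])
K = tanimoto_kernel(X)
print(K)
```

Here the first two diseases share one of three distinct proteins, so their similarity is 1/3, while the third disease shares none and gets similarity 0.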

### Experimental setting

For the disease co-occurrence prediction problem, we employ the disease scoring setting of [38], in which the semi-supervised learning algorithm provides scores for diseases. With the two-layered network of symptoms and diseases, we first selected a target disease and gave it the label ‘1’, indicating the presence of the disease. The other, unlabeled diseases were given the label ‘0’. Then, we randomly gave the label ‘1’ to between 0% and 100% of the related symptoms, in steps of 20%, and gave ‘0’ to unrelated symptoms. The setting with 0% of symptoms labeled represents the reference network, that is, the single disease network. We assume that 20% of the co-occurring diseases are known a priori, and therefore randomly select 20% of the co-occurring diseases and assign them the label ‘1’. Note that these percentages can be changed, but the effect is similar for both the single-layered and the multi-layered network. The parameters \( \mu_a \) and \( \mu_b \) were determined in the range {0.01, …, 100}, and the performance of the two-layered network of symptoms and diseases was compared to that of the reference network. Performance was measured by the area under the ROC curve (AUC) [52], which compares the prediction output \( f = (f_1, f_2, \dots, f_n)^T \) with the true labels. For validation, the leave-one-out method [53] was used, and the experiment was repeated 10 times.
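The labeling scheme and the AUC measure described in this setting can be sketched as follows. The ordering of nodes (symptoms first, then diseases), the helper names `make_labels` and `auc`, and the toy indices are assumptions for illustration; the AUC is computed with the standard rank-based (Mann-Whitney) formulation rather than any specific routine from the study.

```python
import numpy as np

def make_labels(n_sym, n_dis, target, related_sym, cooccurring,
                sym_frac, known_frac, rng):
    """Build the initial label vector y (symptom nodes first, then disease nodes)."""
    y = np.zeros(n_sym + n_dis)
    y[n_sym + target] = 1.0                                  # target disease
    related_sym = np.asarray(related_sym, dtype=int)
    cooccurring = np.asarray(cooccurring, dtype=int)
    k_sym = int(round(sym_frac * related_sym.size))
    y[rng.permutation(related_sym)[:k_sym]] = 1.0            # randomly labeled symptoms
    k_dis = int(round(known_frac * cooccurring.size))
    known = rng.permutation(cooccurring)[:k_dis]
    y[n_sym + known] = 1.0                                   # co-occurring diseases known a priori
    return y, known

def auc(scores, labels):
    """Area under the ROC curve as the fraction of correctly ordered
    positive/negative pairs; ties count one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

rng = np.random.default_rng(0)
y, known = make_labels(n_sym=6, n_dis=8, target=0,
                       related_sym=[0, 2, 3, 4, 5],
                       cooccurring=[1, 2, 4, 5, 7],
                       sym_frac=0.4, known_frac=0.2, rng=rng)
print(y, known)
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))   # 3 of 4 pairs ordered correctly
```

Setting `sym_frac=0.0` leaves all symptom nodes unlabeled, which corresponds to the reference (single disease network) setting.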

## Results and Discussion

### Results on validity of the proposed algorithm

### Enrichment analysis: relevance of use of symptom data for disease co-occurrence

Table 2 Results for statistical evaluation with one-sided t-test for difference in means

| | Total list of diseases | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|---|
| *p*-value | <0.001 | <0.001 | <0.001 | <0.001 |
| T-statistic | 11.238 | 5.558 | 12.131 | 6.391 |
| Degrees of freedom | 1,014 | 100 | 738 | 174 |
| Standard deviation | 3.654 | 8.822 | 2.378 | 0.368 |

Fig. 4 shows that, in each group, the average number of shared symptoms is higher for co-occurring diseases than for non-co-occurring diseases. In Table 2, the t-tests allow us to reject the null hypothesis in every case with *p*-value <0.001, in favor of the alternative. Thus, we can deduce that shared symptoms between diseases are clearly relevant to disease co-occurrence.
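As a rough sketch of how such a test statistic is computed, the snippet below evaluates a paired t-statistic on per-disease differences. The paired reading is an assumption suggested by the degrees of freedom in Table 2 (1,014 for 1,015 diseases); the function name and the four-element toy arrays are synthetic and do not reproduce the study's data.

```python
import numpy as np

def paired_t_statistic(a, b):
    """t-statistic and degrees of freedom for the mean of the differences a - b."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, n - 1

# Synthetic example: average shared symptoms with co-occurring (a)
# versus non-co-occurring (b) diseases for four hypothetical target diseases.
a = [5, 7, 6, 9]
b = [3, 4, 6, 5]
t, df = paired_t_statistic(a, b)
print(t, df)
```

For a one-sided test, the *p*-value is the upper tail of the t distribution with `df` degrees of freedom evaluated at `t`; if SciPy is available, `scipy.stats.ttest_rel(a, b, alternative="greater")` performs the same paired test.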

*f*, in eq. (12). These values represent the relative closeness of each disease to being labeled as co-occurring with the target disease. Fig. 5 shows that a higher number of shared symptoms yields a relatively higher predictive output for disease co-occurrence. This reinforces the relevance of symptoms for predicting disease co-occurrence.

## Conclusion

In this paper, we develop a graph-based semi-supervised learning algorithm for prediction in multi-layered biomedical systems. The algorithm involves matrix separation and graph integration, but issues of computational complexity and sparseness must be solved. To resolve these issues, we devise a revised matrix inversion scheme consisting of the Nyström method and the Woodbury formula. Theoretically, the proposed method can reduce computational complexity by coping with sparseness while preserving the innate structure and properties of each layer.

To test the proposed algorithm, it was applied to a two-layered system of symptoms and diseases to predict disease co-occurrence. The results showed an improvement in prediction in terms of AUC, from 0.59 for the single disease network to 0.74 for the symptom-disease network. Furthermore, the results showed the relevance of symptoms for disease co-occurrence prediction, with statistical evidence that the average number of shared symptoms is higher for co-occurring diseases than for non-co-occurring diseases. From a theoretical perspective, although the proposed algorithm was applied to a two-layered network in our experiments, it is scalable: it is applicable to multi-layered structures with large amounts of biomedical data and achieves faster inversion than ordinary matrix inversion.

As an extension of this research, since the disease co-occurrence prediction problem has been studied for many years, the proposed method could be compared with other works such as [56]. In addition, we can consider adding further layers that convey relevant information. In the case of disease co-occurrence prediction, including additional layers of phenotype/clinical data would be beneficial, as they provide important information for constructing a comorbidity map. From a different perspective, we can also consider cases outside the central dogma of biology, where a multi-layered network can exist in a non-hierarchical structure.

The novelty of this research is a new integrative algorithm that incorporates the vertical structure of omics data, in contrast to existing methods that integrate data in a parallel fashion. Moreover, the experimental results not only reflect the viewpoint of practitioners, who observe or look for symptoms as a primary step in diagnosis, but also provide an enhanced guideline for disease co-occurrence prediction, which is important in practice for treatment and prevention. Thus, the proposed algorithm can serve as a valuable tool for inference in multi-layered biological systems.

## Declarations

### Acknowledgments

The authors would like to gratefully acknowledge support from the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2012-0000994/2015R1A5A7037630) and the Ajou University Research Fund.

### Funding

Publication of this article was funded by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2012-0000994).

### Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

### Authors’ contributions

HJS designed the idea and supervised the study process. MJK and YHN analyzed the data, implemented the results and wrote the manuscript. SJS provided implications and interpretations of the results. All of the authors read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

### Consent for publication

Not Applicable.

### Ethics approval and consent to participate

Not Applicable.

### About this supplement

This article has been published as part of BMC Medical Informatics and Decision Making Volume 17 Supplement 1, 2017: Selected articles from the 6th Translational Bioinformatics Conference (TBC 2016): medical informatics and decision making. The full contents of the supplement are available online at <https://bmcmedinformdecismak.biomedcentral.com/articles/supplements/volume-17-supplement-1>.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


## References

1. Ishii N, Tomita M. Multi-omics data-driven systems biology of E. coli. In: Systems biology and biotechnology of Escherichia coli. Springer Netherlands; 2009. p. 41–57.
2. Ritchie MD, Holzinger ER, Li R, Pendergrass SA, Kim D. Methods of integrating data to uncover genotype-phenotype interactions. Nat Rev Genet. 2015;16(2):85–97.
3. Bersanelli M, Mosca E, Remondini D, Giampieri E, Sala C, Castellani G, Milanesi L. Methods for the integration of multi-omics data: mathematical aspects. BMC Bioinformatics. 2016;17(2):167.
4. Berger B, Peng J, Singh M. Computational solutions for omics data. Nat Rev Genet. 2013;14(5):333–46.
5. Kim S. Network based approaches to the analysis of omics data. Methods (San Diego, Calif). 2015;83:1–2.
6. Presson AP, Sobel EM, Papp JC, Suarez CJ, Whistler T, Rajeevan MS, Vernon SD, Horvath S. Integrated weighted gene co-expression network analysis with an application to chronic fatigue syndrome. BMC Syst Biol. 2008;2(1):1.
7. Stuart JM, Segal E, Koller D, Kim SK. A gene-coexpression network for global discovery of conserved genetic modules. Science. 2003;302(5643):249–55.
8. Weirauch MT. Gene coexpression networks for the analysis of DNA microarray data. Appl Stat Netw Biol. 2011:215–250.
9. Butte AJ, Kohane IS. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. In: Pacific Symposium on Biocomputing. 2000;5:418–429.
10. Dreze M, Monachello D, Lurin C, Cusick ME, Hill DE, Vidal M, Braun P. High-quality binary interactome mapping. Methods Enzymol. 2010;470:281–315.
11. Rual J-F, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N. Towards a proteome-scale map of the human protein–protein interaction network. Nature. 2005;437(7062):1173–8.
12. Venkatesan K, Rual J-F, Vazquez A, Stelzl U, Lemmens I, Hirozane-Kishikawa T, Hao T, Zenkner M, Xin X, Goh K-I. An empirical framework for binary interactome mapping. Nat Methods. 2009;6(1):83–90.
13. Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S. A human protein-protein interaction network: a resource for annotating the proteome. Cell. 2005;122(6):957–68.
14. Duarte NC, Becker SA, Jamshidi N, Thiele I, Mo ML, Vo TD, Srivas R, Palsson BØ. Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc Natl Acad Sci. 2007;104(6):1777–82.
15. Ma H, Sorokin A, Mazein A, Selkov A, Selkov E, Demin O, Goryanin I. The Edinburgh human metabolic network reconstruction and its functional analysis. Mol Syst Biol. 2007;3(1):135.
16. Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L. The human disease network. Proc Natl Acad Sci. 2007;104(21):8685–90.
17. Zhou X, Menche J, Barabási A-L, Sharma A. Human symptoms–disease network. Nat Commun. 2014;5.
18. Yugi K, Kubota H, Hatano A, Kuroda S. Trans-Omics: How To Reconstruct Biochemical Networks Across Multiple ‘Omic’ Layers. Trends Biotechnol. 2016;34(4):276–90.
19. Stanescu A, Caragea D. An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets. BMC Syst Biol. 2015;9(5):1.
20. Belkin M, Matveeva I, Niyogi P. Regularization and semi-supervised learning on large graphs. In: International Conference on Computational Learning Theory: 2004. Springer. p. 624–638.
21. Belkin M, Niyogi P, Sindhwani V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res. 2006;7(Nov):2399–434.
22. Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B. Learning with local and global consistency. Adv Neural Inf Proces Syst. 2004;16(16):321–8.
23. Zhu X, Ghahramani Z, Lafferty J. Semi-supervised learning using gaussian fields and harmonic functions. In: Proceedings of the Twenty-first International Conference on Machine Learning (ICML). 2003;3:912–919.
24. Chapelle O, Weston J, Schölkopf B. Cluster kernels for semi-supervised learning. In: Proceedings of the Advances in Neural Information Processing Systems 15 (NIPS). 2002;585–592.
25. Zhu X, Ghahramani Z. Learning from labeled and unlabeled data with label propagation. Citeseer; 2002.
26. Chung FR. Spectral graph theory. Issue 92 in Regional Conference Series in Mathematics. Providence RI. American Mathematical Soc. 1997.
27. Shin H, Tsuda K, Schölkopf B. Protein functional class prediction with a combined graph. Expert Syst Appl. 2009;36(2):3284–92.
28. Shin H, Tsuda K, Schölkopf B, Zien A. Prediction of protein function from networks. In: Semi-supervised learning. MIT press; 2006. p. 361–76.
29. Tsuda K, Shin H, Schölkopf B. Fast protein classification with multiple networks. Bioinformatics. 2005;21 suppl 2:ii59–65.
30. Shin H, Lisewski AM, Lichtarge O. Graph sharpening plus graph integration: a synergy that improves protein functional classification. Bioinformatics. 2007;23(23):3217–24.
31. Kim D, Shin H, Song YS, Kim JH. Synergistic effect of different levels of genomic data for cancer clinical outcome prediction. J Biomed Inform. 2012;45(6):1191–8.
32. Kim D, Joung J-G, Sohn K-A, Shin H, Park YR, Ritchie MD, Kim JH. Knowledge boosting: a graph-based integration approach with multi-omics data and genomic knowledge for cancer clinical outcome prediction. J Am Med Inform Assoc. 2015;22(1):109–20.
33. Kim D, Shin H, Sohn K-A, Verma A, Ritchie MD, Kim JH. Incorporating inter-relationships between different levels of genomic data into cancer clinical outcome prediction. Methods. 2014;67(3):344–53.
34. Nguyen T-P, Ho T-B. Detecting disease genes based on semi-supervised learning and protein–protein interaction networks. Artif Intell Med. 2012;54(1):63–71.
35. You Z-H, Yin Z, Han K, Huang D-S, Zhou X. A semi-supervised learning approach to predict synthetic genetic interactions by combining functional and topological properties of functional gene network. BMC Bioinformatics. 2010;11(1):1.
36. Kim D, Shin H, Joung J-G, Lee S-Y, Kim JH. Intra-relation reconstruction from inter-relation: miRNA to gene expression. BMC Syst Biol. 2013;7(3):1.
37. Chen X, Yan G-Y. Semi-supervised learning for potential human microRNA-disease associations inference. Sci Rep. 2014;4:5501.
38. Nam Y, Kim M, Lee K, Shin H. CLASH: Complementary Linkage with Anchoring and Scoring for Heterogeneous biomolecular and clinical data. BMC Med Inform Decis Mak. 2016;16(3):72.
39. Williams C, Seeger M. Using the Nyström method to speed up kernel machines. In: Proceedings of the 14th annual conference on neural information processing systems: 2001. p. 682–688.
40. Woodbury MA. Inverting modified matrices. Memorandum Rep. 1950;42:106.
41. Bengio Y, Delalleau O, Le Roux N. Label propagation and quadratic criterion. Semi-supervised Learn. 2006;10.
42. Boffi NM, Hill JC, Reuter MG. Characterizing the inverses of block tridiagonal, block Toeplitz matrices. Comput Sci Discov. 2014;8(1):015001.
43. Hager WW. Updating the inverse of a matrix. SIAM Rev. 1989;31(2):221–39.
44. Meurant G. A review on the inverse of symmetric tridiagonal and block tridiagonal matrices. SIAM J Matrix Anal Appl. 1992;13(3):707–28.
45. Terekhov AV. A fast parallel algorithm for solving block-tridiagonal systems of linear equations including the domain decomposition method. Parallel Comput. 2013;39(6):245–58.
46. Degenhardt L, Hall W, Lynskey M. What is comorbidity and why does it occur? Comorbid Mental disorders and substance use disorders: Epidemiology, prevention and treatment. 2003;10–25.
47. Piccirillo JF, Tierney RM, Costas I, Grove L, Spitznagel Jr EL. Prognostic importance of comorbidity in a hospital-based cancer registry. JAMA. 2004;291(20):2441–7.
48. Piccirillo JF. Importance of comorbidity in head and neck cancer. Laryngoscope. 2000;110(4):593–602.
49. U.S. National Library of Medicine, Medical Subject Headings (www.ncbi.nlm.nih.gov/mesh, Accessed 5 Jan 2016)
50. HuDiNe (www.hudine.neu.edu, Accessed 17 Jan 2016)
51. Tanimoto TT. An elementary mathematical theory of classification and prediction. New York; 1958.
52. Swets JA. Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. New York. Psychology Press; 2014.
53. Fukunaga K, Hummels DM. Leave-one-out procedures for nonparametric error estimates. IEEE Trans Pattern Anal Mach Intell. 1989;11(4):421–3.
54. McDonald V, Scully M. Causes of thrombocytopenia. Medicine. 2009;3(37):149–54.
55. Warkentin TE, Levine MN, Hirsh J, Horsewood P, Roberts RS, Gent M, Kelton JG. Heparin-induced thrombocytopenia in patients treated with low-molecular-weight heparin or unfractionated heparin. N Engl J Med. 1995;332(20):1330–6.
56. Sun K, Gonçalves JP, Larminie C, Pržulj N. Predicting disease associations via biological network analysis. BMC Bioinformatics. 2014;15(1):1.