Product limit estimators
The primary objective of survival analysis is to explore the time until a particular event; hence, we describe the stochastic behavior of a time-to-event outcome variable. Survival, density, hazard, and mean or median residual life functions are commonly used for this purpose. As these functions can be obtained from one another, the choice among them is driven only by interpretability. The product-limit estimator estimates the survival function, which is defined as the probability of an individual surviving beyond a given time point t:
$$S\left( t \right) = {\text{P}}(T > t)$$
where T is a random variable denoting the time at which the event of interest occurs. Kaplan and Meier partitioned the observed times into intervals according to the unique event times and proposed the following estimator for all t values in the range of the observed data, where \(t_{1}\) represents the first event time [1]:
$$\hat{S}\left( t \right) = \left\{ {\begin{array}{*{20}l} 1 \hfill & {if\;t < t_{1} } \hfill \\ {\mathop \prod \limits_{{t_{i} \le t}} \left[ {1 - \frac{{d_{i} }}{{Y_{i} }}} \right] } \hfill & {if\;t_{1} \le t} \hfill \\ \end{array} } \right.$$
(1)
where \(d_{i}\) and \(Y_{i}\) denote the number of failures at time \(t_{i}\) and the number of persons at risk just before \(t_{i}\), respectively. The product-limit estimator is therefore a discrete approach that yields a step function changing only at event times. The Greenwood formula is the standard approach to estimating the variance of the estimator [1]:
$$\hat{V}\left[ {\hat{S}\left( t \right)} \right] = \hat{S}\left( t \right)^{2} \mathop \sum \limits_{{t_{i} \le t}} \frac{{d_{i} }}{{Y_{i} \left( {Y_{i} - d_{i} } \right)}}$$
(2)
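To make Eqs. 1 and 2 concrete, the following minimal Python sketch computes the product-limit estimate and its Greenwood variance for a small hypothetical sample; the data format and example values are our own illustrative assumptions, not data from the paper.

```python
# A minimal sketch of the Kaplan-Meier estimator (Eq. 1) with the
# Greenwood variance (Eq. 2); inputs are hypothetical.
import numpy as np

def kaplan_meier(times, events):
    """times: observed times; events: 1 = failure, 0 = censored."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])       # unique failure times t_i
    s_hat, greenwood_sum, estimates = 1.0, 0.0, []
    for t_i in event_times:
        y_i = np.sum(times >= t_i)                    # number at risk Y_i
        d_i = np.sum((times == t_i) & (events == 1))  # failures d_i
        s_hat *= 1.0 - d_i / y_i                      # product-limit update
        # Greenwood term; undefined if Y_i = d_i (all at-risk subjects fail)
        greenwood_sum += d_i / (y_i * (y_i - d_i))
        estimates.append((t_i, s_hat, s_hat**2 * greenwood_sum))
    return estimates  # (t_i, S_hat(t_i), Var_hat[S_hat(t_i)])

# Example: five subjects with times 2, 3+, 4, 4, 6+ ('+' marks censoring)
print(kaplan_meier([2, 3, 4, 4, 6], [1, 0, 1, 1, 0]))
```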
Bayesian network
Every Bayesian Network (BN) corresponds to a Directed Acyclic Graph (DAG) and a joint distribution, which are the graphical and probabilistic aspects of the model, respectively. The DAG consists of nodes corresponding to random variables and edges representing conditional dependencies. According to the domain of the random variables, a BN can be discrete, Gaussian, or hybrid; the BN in this study is discrete. Therefore, for a set of discrete random variables \({\varvec{X}} = \left( {X_{1} ,X_{2} , \ldots ,X_{D} } \right){ }\) taking values in a discrete and finite D-dimensional domain, the BN is defined as the pair \({\mathcal{M}} = \left( {{{\mathcal{g}}},\left( {P\left( {X_{d} |{\mathbf{\mathcal{P}}}_{{\varvec{X}}} \left( {X_{d} } \right)} \right)} \right)_{1 \le d \le D} } \right)\) where \({{\mathcal{g}}} = \left( {{\varvec{X}},{\varvec{\varepsilon}}} \right)\) is the DAG representation of the random variables \({\varvec{X}}\) with edge set \({\varvec{\varepsilon}}\), \({\mathbf{\mathcal{P}}}_{{\varvec{X}}} \left( {X_{d} } \right)\) is the set of parents of \(X_{d}\) in \({\varvec{X}}\), and \(\left( {P\left( {X_{d} |{\mathbf{\mathcal{P}}}_{{\varvec{X}}} \left( {X_{d} } \right)} \right)} \right)_{1 \le d \le D}\) are the conditional probabilities of each node \(X_{d}\) given its parents in \({\varvec{X}}\).
An appealing feature of a BN is that it summarizes the complex joint probability distribution of \({\varvec{X}}\) in the following parsimonious way, using the conditional independence and Markov properties:
$$P\left( {X_{1} ,X_{2} , \ldots ,X_{D} } \right) = \mathop \prod \limits_{d = 1}^{D} P\left( {X_{d} |{\mathbf{\mathcal{P}}}_{{\varvec{X}}} \left( {X_{d} } \right)} \right)$$
(3)
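To make the factorization in Eq. 3 concrete, the toy Python sketch below computes the joint distribution of a hypothetical two-node network A → B; all node names and probability values are illustrative assumptions.

```python
# A minimal sketch of Eq. 3 for a toy discrete BN with edge A -> B;
# the CPTs below are illustrative assumptions.
p_a = {0: 0.7, 1: 0.3}                       # P(A)
p_b_given_a = {0: {0: 0.9, 1: 0.1},          # P(B | A = 0)
               1: {0: 0.4, 1: 0.6}}          # P(B | A = 1)

def joint(a, b):
    # P(A, B) = P(A) * P(B | A): product over nodes given their parents
    return p_a[a] * p_b_given_a[a][b]

# The factorization stores 2 + 4 numbers instead of a full joint table,
# and the joint still sums to one:
assert abs(sum(joint(a, b) for a in (0, 1) for b in (0, 1)) - 1.0) < 1e-12
print(joint(1, 0))  # 0.3 * 0.4 = 0.12
```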
Dynamic Bayesian network
The classical BN is not suited to time-dependent processes such as survival analysis [32]. The Dynamic Bayesian Network (DBN) [33] was therefore introduced as an extension to such processes. In this context, time-dependent random variables \(\left( {{\varvec{X}}_{t} } \right)_{t \ge 1} = \left( {X_{1,t} , \ldots ,X_{D,t} } \right)_{t \ge 1}\) are defined, where t is a discrete time index formally called a slice. The DBN uses the Markov property, which states that the future of a stochastic process is independent of its past given its current state or several lags before it; the number of lags determines the order of the Markov process. This study only requires a Markov process of order 1, which leads to a 2-slice Temporal Bayesian Network (2-TBN). In this regard, we assume \(X_{t - 1} \bot X_{t + 1} |X_{t}\) for all \(t \ge 2\). A 2-TBN can be defined as a pair of BNs \(\left( {{\mathcal{M}}_{1} ,{\mathcal{M}}_{ \to } } \right)\), where \({\mathcal{M}}_{1}\) is the joint distribution of the initial process \({\varvec{X}}_{1} = \left( {X_{1,1} , \ldots ,X_{D,1} } \right)\) and \({\mathcal{M}}_{ \to }\) represents the transition model. The joint probability distribution \({\mathcal{M}}_{1}\) is easily derived from the BN factorization in Eq. 3:
$$P\left( {{\varvec{X}}_{1} } \right) = \mathop \prod \limits_{d = 1}^{D} P\left( {X_{d,1} |{\mathbf{\mathcal{P}}}_{{\varvec{X}}} \left( {X_{d,1} } \right)} \right)$$
(4)
In the transition model, the joint distribution of \({\varvec{X}}_{t}\) depends only on the random variables belonging to the set of parents of \({\varvec{X}}_{t}\) at slice \(t - 1\), in the form:
$$P\left( {{\varvec{X}}_{t} {|}{\varvec{X}}_{t - 1} } \right) = P\left( {X_{1,t} , \ldots ,X_{D,t} {|}X_{1,t - 1} , \ldots ,X_{D,t - 1} } \right) = \mathop \prod \limits_{d = 1}^{D} P\left( {X_{d,t} |{\mathbf{\mathcal{P}}}_{{{\varvec{X}}_{t} }} \left( {X_{d,t} } \right)} \right)$$
(5)
Hence, the joint probability distribution of the 2-TBN is obtained by combining Eqs. 4 and 5:
$$P\left( {{\varvec{X}}_{1 \le t \le T} } \right) = \mathop \prod \limits_{d = 1}^{D} P(X_{d,1} |{\mathbf{\mathcal{P}}}_{{\varvec{X}}} \left( {X_{d,1} } \right))\mathop \prod \limits_{t = 2}^{T} \mathop \prod \limits_{d = 1}^{D} P\left( {X_{d,t} |{\mathbf{\mathcal{P}}}_{{{\varvec{X}}_{t} }} \left( {X_{d,t} } \right)} \right)$$
(6)
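The following minimal sketch evaluates Eq. 6 for a single binary node whose only parent is its own value at the previous slice; the initial and transition probabilities are illustrative assumptions.

```python
# A minimal sketch of Eq. 6 for a 2-TBN with one binary node X_t whose
# only parent is X_{t-1}; all probability values are assumptions.
p_init = {0: 0.2, 1: 0.8}                    # M_1: P(X_1)
p_trans = {0: {0: 0.95, 1: 0.05},            # M_->: P(X_t | X_{t-1} = 0)
           1: {0: 0.30, 1: 0.70}}            #       P(X_t | X_{t-1} = 1)

def trajectory_probability(xs):
    """P(X_1, ..., X_T) = P(X_1) * prod_{t=2}^{T} P(X_t | X_{t-1})."""
    prob = p_init[xs[0]]
    for prev, cur in zip(xs, xs[1:]):
        prob *= p_trans[prev][cur]
    return prob

print(trajectory_probability([1, 1, 1, 0]))  # 0.8 * 0.7 * 0.7 * 0.3
```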
To include the time-stationary covariates \({\varvec{Z}} = \left( {Z_{1} , \ldots ,Z_{q} } \right)\) in the model, we can extend the parent sets in both the initial model \({\mathcal{M}}_{1}\) and the transition model \({\mathcal{M}}_{ \to }\). In this manner, the initial probability distribution can be written as:
$$P\left( {{\varvec{X}}_{1} } \right) = \mathop \prod \limits_{d = 1}^{D} P\left( {X_{d,1} |{\mathbf{\mathcal{P}}}_{{\varvec{X}}} \left( {X_{d,1} } \right),{\mathbf{\mathcal{P}}}_{{\varvec{Z}}} \left( {X_{d,1} } \right)} \right)$$
where \({\mathbf{\mathcal{P}}}_{{\varvec{X}}} \left( {X_{d,1} } \right)\) and \({\mathbf{\mathcal{P}}}_{{\varvec{Z}}} \left( {X_{d,1} } \right)\) represent the sets of parents of \(X_{d,1}\) in \({\varvec{X}}\) and \({\varvec{Z}},\) respectively. Similarly, the transition probability distribution is modified to:
$$P\left( {{\varvec{X}}_{t} {|}{\varvec{X}}_{t - 1} ,Z_{1} , \ldots ,Z_{q } } \right) = P\left( {X_{1,t} , \ldots ,X_{D,t} {|}X_{1,t - 1} , \ldots ,X_{D,t - 1} ,Z_{1} , \ldots ,Z_{q} } \right) = \mathop \prod \limits_{d = 1}^{D} P\left( {X_{d,t} |{\mathbf{\mathcal{P}}}_{{{\varvec{X}}_{t} }} \left( {X_{d,t} } \right),{\mathbf{\mathcal{P}}}_{{\varvec{Z}}} \left( {X_{d,t} } \right)} \right)$$
Dynamic Bayesian network interpretation of product limit estimators
A DBN of type 2-TBN can efficiently carry out the calculation of the product-limit estimators. The product-limit approach treats time as discrete intervals between consecutive observed failure times and counts the individuals at risk and the failures in each interval. Equivalently, we define the discrete Kaplan–Meier intervals as slices together with two time-dependent binary status variables: the survival state variable \(N_{i,t}\) equals 1 if individual \(i\) survives at least to slice t, and the censoring state variable \(Q_{i,t}\) equals 1 if individual \(i\) is censored before or at slice t. Defining these variables enables us to form the DBN in Fig. 1 to analyze survival data. In addition, we can enter other fixed-effect covariates \({\varvec{Z}}\) into the model and examine their importance through the structure learning algorithms.
The DBN estimator of the survival probability at each time t, according to Eq. 6, is:
$$\hat{S}\left( t \right) = P\left( {N_{1} = 1|{\mathbf{\mathcal{P}}}_{{\varvec{Z}}} \left( {N_{1} } \right)} \right)\mathop \prod \limits_{t = 2}^{T} P\left( {N_{t} = 1|N_{t - 1} = 1,Q_{t - 1} = 0,{\mathbf{\mathcal{P}}}_{{\varvec{Z}}} \left( {N_{t} } \right)} \right)$$
(7)
Moreover, the discrete covariate set \({\mathbf{\mathcal{P}}}_{{\varvec{Z}}} \left( {N_{t} } \right)\) is found by the structure learning algorithms.
For simplicity, suppose there are no covariates in the model; then Eq. (7) can be rewritten using the counting function N(·):
$$\begin{aligned} \hat{S}\left( t \right) = P\left( {N_{1} = 1} \right)\mathop \prod \limits_{t = 2}^{T} P\left( {N_{t} = 1{|}N_{t - 1} = 1,Q_{t - 1} = 0} \right) = & \frac{{N\left( {N_{i1} = 1} \right)}}{N}\mathop \prod \limits_{t = 2}^{T} \frac{{N\left( {N_{it} = 1} \right)}}{{N\left( {N_{i,t - 1} = 1, Q_{i,t - 1} = 0} \right)}} \\ = & \hat{S}\left( 1 \right)\mathop \prod \limits_{t = 2}^{T} \left[ {1 - \frac{{d_{t} }}{{Y_{t} }}} \right] \\ \end{aligned}$$
(8)
This is the same as the product-limit estimator in Eq. (1). Therefore, we can use the Greenwood formula in Eq. (2) to calculate the variance of the survival estimates. In addition, the common bootstrapping approaches in the BN context, such as likelihood weighting and logic sampling, offer an alternative route to this variance [34].
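To see the counting argument of Eq. 8 in action, the sketch below derives the slice-wise conditional probabilities from the states \(N_{i,t}\) and \(Q_{i,t}\) implied by hypothetical (time, event) data and reproduces the product-limit estimate computed earlier.

```python
# A minimal sketch of Eq. 8: estimate each P(N_t = 1 | N_{t-1} = 1,
# Q_{t-1} = 0) by counting within the slice-wise risk set; inputs reuse
# the hypothetical example from the Kaplan-Meier sketch above.
import numpy as np

def dbn_survival(times, events):
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    slices = np.unique(times[events == 1])   # one slice per unique event time
    prob, estimates = 1.0, []
    for t in slices:
        # Risk set: neither died nor censored before slice t,
        # i.e. N_{t-1} = 1 and Q_{t-1} = 0.
        at_risk = times >= t
        died_now = at_risk & (times == t) & (events == 1)
        prob *= 1.0 - died_now.sum() / at_risk.sum()  # counting estimate
        estimates.append((t, prob))
    return estimates

# Matches the product-limit values from kaplan_meier() step by step:
print(dbn_survival([2, 3, 4, 4, 6], [1, 0, 1, 1, 0]))
```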
Simulation study
We conducted a simulation study to assess the performance of our model in comparison with the Kaplan–Meier and Cox proportional hazards regression. We defined various scenarios according to the sample size (N = 800, 5000, 10,000), censoring rate (R = 25%, 40%, 60%), and shape parameters of the survival (\(\alpha_{S}\)) and censoring (\(\alpha_{C}\)) distributions. We considered five mutually independent covariates with binomial distributions and different probability parameters \(\left[ {X_{i} \sim B\left( {N,P_{i} } \right),\; i = 1, \ldots ,5;\; P_{i} = 0.1, 0.2, 0.5, 0.7, 0.9} \right]\).
The survival and censoring times were generated from Weibull distributions. The scale parameter of the survival time distribution was reparametrized as the sum of the covariates \(\left[ {\theta_{S} = \mathop \sum \limits_{i = 1}^{5} X_{i} } \right]\). Using numerical methods and assuming a fixed censoring rate, we found the scale parameter of the censoring time distribution. We set the shape parameters of the survival and censoring distributions to 0.5 (decreasing event/censoring rate), 1 (constant rate), and 2 (increasing rate). In this manner, we obtained nine different scenarios for the shapes of the survival/censoring times.
In brief, if the survival time \(S \ge 0\) follows the Weibull distribution below:
$$f\left( {s;\alpha_{S} ,\theta_{S} } \right) = \frac{{\alpha_{S} }}{{\theta_{S} }}\left( {\frac{s}{{\theta_{S} }}} \right)^{{\alpha_{S} - 1}} {\text{exp}}\left( { - \left( {\frac{s}{{\theta_{S} }}} \right)^{{\alpha_{S} }} } \right)$$
where the scale parameter \(\theta_{S}\) and shape parameter \(\alpha_{S}\) are as defined above, then the cumulative distribution function of S is:
$$F\left( {s;\alpha_{S} ,\theta_{S} } \right) = 1 - {\text{exp}}\left( { - \left( {\frac{s}{{\theta_{S} }}} \right)^{{\alpha_{S} }} } \right)$$
Since the Weibull distribution is continuous, \(F\left( {S;\alpha_{S} ,\theta_{S} } \right)\sim U\left( {0,1} \right)\). Therefore, we generated N samples \(u_{i} \sim U\left( {0,1} \right)\) for \(i = 1, \ldots ,N\) and then computed \(s_{i} = \theta_{S} \left( { - \ln u_{i} } \right)^{{1/\alpha_{S} }}\). A similar approach was used to generate the censoring times \(c_{i}\).
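A minimal sketch of this generation step follows; the sample size, shape parameters, and the fixed censoring scale are our illustrative choices (the paper solves for the censoring scale numerically).

```python
# Inverse-transform sampling of Weibull survival and censoring times.
# We read X_i ~ B(N, P_i) as one Bernoulli(P_i) draw per subject; all
# parameter values below are illustrative assumptions for one scenario.
import numpy as np

rng = np.random.default_rng(seed=1)
n, alpha_s, alpha_c = 800, 2.0, 2.0          # sample size, shape parameters
x = rng.binomial(1, [0.1, 0.2, 0.5, 0.7, 0.9], size=(n, 5))  # covariates
theta_s = x.sum(axis=1)                      # survival scale = sum of covariates
theta_c = 3.0                                # assumed; found numerically in the paper

s = theta_s * (-np.log(rng.uniform(size=n))) ** (1.0 / alpha_s)  # survival times
c = theta_c * (-np.log(rng.uniform(size=n))) ** (1.0 / alpha_c)  # censoring times
time = np.minimum(s, c)                      # observed follow-up time
event = (s <= c).astype(int)                 # 1 = death observed, 0 = censored
# Note: theta_s = 0 (all covariates zero) yields s = 0 in rare draws.
```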
We fitted all three models to the simulated data and estimated the survival probability at the 20%, 50% (median), and 80% percentiles of the actual survival times. The bias and root mean squared error (RMSE) were calculated over 1000 randomly generated samples in each scenario.
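As a brief sketch of this summary step, assuming per-replicate estimates of S(t0) at a fixed time point t0 have already been collected (the replicate loop and estimator calls are placeholders, not code from the paper):

```python
# Bias and RMSE of the estimated survival probability at a fixed time t0,
# aggregated over simulation replicates (the paper uses 1000 per scenario).
import numpy as np

def bias_rmse(estimates, true_value):
    err = np.asarray(estimates) - true_value          # estimation errors
    return err.mean(), np.sqrt((err ** 2).mean())     # (bias, RMSE)

# Hypothetical usage with the Weibull truth S(t0) = exp(-(t0/theta)^alpha):
# bias, rmse = bias_rmse(estimates_1000_reps, np.exp(-(t0 / theta) ** alpha))
```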
Real data analysis
We applied our proposed method to real-world survival data. A total of 760 patients diagnosed with gastric cancer at the Iran Cancer Institute who had undergone gastrectomy between 1995 and 2012 entered the study. This historical cohort of patients was followed until the death event was observed. Censoring occurred in case of loss to follow-up or being alive at the end of the observation time. All variables except follow-up duration were time-stationary and were collected at the time of surgery.
We conducted the ordinary Kaplan–Meier survival analysis, the DBN, and the Cox proportional hazards (PH) model. The Cox PH model was included to compare our findings with a regression model that can handle covariate effects in survival analysis. In addition, censoring probability plots were generated using the Kaplan–Meier approach; for this purpose, we defined censoring as the primary event and death as the alternative status.
Structure learning
We used Hill-Climbing (HC) and Tabu search, two of the most popular score-based learning algorithms, to find the structure of the DAG [35]. The last event occurred 15 years after surgery, and there was no event in year 13 post-surgery; therefore, 14 eligible slices \(t\), corresponding to the unique event years, were defined in the DBN structure. The prior and transition networks in Fig. 1 relating to \(N_{i,t}\) and \(Q_{i,t}\) were treated as the whitelist, and the algorithms learned the structure over the stationary variables Z. The structure learning was validated by the repeated hold-out technique with ten subsamples, each of size 30% of the original data [36]. The posterior classification error based on likelihood weighting was set as the loss function [37]. The validation process was repeated with different score functions, including the logarithm of the likelihood, AIC, BIC, BDe, BDs, and \(K^{2}\) [38].
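The sketch below shows how such a score-based search could be run in Python with pgmpy; this is an assumption-laden illustration (the column names, the input file, and the use of `fixed_edges` to mimic a bnlearn-style whitelist are ours), not the paper's actual pipeline.

```python
# A hedged sketch of score-based structure learning with pgmpy
# (API as in pgmpy 0.1.x). Column names and the CSV file are hypothetical.
import pandas as pd
from pgmpy.estimators import HillClimbSearch, BicScore

data = pd.read_csv("gastric_discrete.csv")             # discretized survival data
forced = {("N_prev", "N_curr"), ("Q_prev", "N_curr")}  # transition backbone
search = HillClimbSearch(data)
dag = search.estimate(
    scoring_method=BicScore(data),   # could swap in K2Score, BDeuScore, ...
    fixed_edges=forced,              # whitelist-like: edges always retained
    tabu_length=100,                 # tabu list makes the search Tabu-like
)
print(sorted(dag.edges()))
```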