Grid multicategory response logistic models
BMC Medical Informatics and Decision Making volume 15, Article number: 10 (2015)
Abstract
Background
Multicategory response models are very important complements to binary logistic models in medical decision-making. Decomposing model construction by aggregating computation developed at different sites is necessary when data cannot be moved outside institutions due to privacy or other concerns. Such decomposition makes it possible to conduct grid computing to protect the privacy of individual observations.
Methods
This paper proposes two grid multicategory response models for ordinal and multinomial logistic regressions. Grid computation to test model assumptions is also developed for these two types of models. In addition, we present grid methods for goodness-of-fit assessment and for classification performance evaluation.
Results
Simulation results show that the grid models produce the same results as those obtained from corresponding centralized models, demonstrating that it is possible to build models using multicenter data without losing accuracy or transmitting observation-level data. Two real data sets are used to evaluate the performance of our proposed grid models.
Conclusions
The grid fitting method offers a practical solution for resolving privacy and other issues caused by pooling all data in a central site. The proposed method is applicable for various likelihood estimation problems, including other generalized linear models.
Background
In biomedical research, data sharing plays an important role in accelerating scientific discoveries. For example, networks based on information from electronic health records (EHRs) [1,2] have been established for this purpose. However, due to privacy concerns, patient-level data cannot always be exchanged across different institutions. In these circumstances, grid computing, which avoids sharing patient-level data among multiple institutions, can be used to build a global model.
For example, logistic regression models have been used in a variety of clinical applications, such as scoring candidates for liver transplant using the Model for End-stage Liver Disease [3], producing estimates related to myocardial infarction diagnosis [4], and detecting suspicious accesses to electronic health records [5]. In their classical setups, these applications have difficulty handling multicenter data, because the training phase requires access to the entire dataset.
Our previous work [6,7] proposed privacy-preserving models through the aggregation of non-sensitive intermediary results (i.e., the gradient and Hessian matrix of the log-likelihood function), but those models only deal with binary responses. Response variables with more than two categorical values occur very often in medical models. For example, cancer progression is often categorized into 4 or 5 phases. One simple method to deal with multiple response categories is to fit a binary logistic model for each pair of categories. However, this approach is very inconvenient, and the performance of each binary logistic model might be degraded when the sample size is insufficient. Some researchers extended the binary logistic model to handle multicategory response problems. Among existing approaches, ordinal logistic [8] and multinomial logistic [9] regression are the two most popular multicategory response logistic models for ordinal and nominal responses, respectively. Both methods are widely used to fit data with a multicategory response. However, methods for assessing binary model fit may not be applicable to multicategory problems. Hosmer and Lemeshow [9] introduced novel methods to evaluate the goodness-of-fit of multicategory logistic models. The Area Under the ROC Curve (AUC) [10] is an important measure for checking the classification performance of binary outcome models. Hand and Till [11] generalized the original AUC measure to deal with classification methods for multicategory outcome cases. The AUC for binary logistic regression is given by
\( \widehat{A}=\frac{R-{n}_1\left({n}_1+1\right)/2}{{n}_1{n}_2} \)
where n_{1} and n_{2} are the numbers of observations with Y = 1 and with Y = 0, respectively, and R is the sum of the ranks, among all observations, of the predicted probabilities of Y = 1 for the observations with Y = 1. Van Calster et al. [12] described several AUC score estimation methods for the ordinal logistic model, one of which is to use the mean of the AUC scores from the K − 1 binary logistic regression estimations as the AUC score for the ordinal logistic model. Hand and Till [11] defined \( \widehat{A}\left({k}_1\Big|{k}_2\right) \) in the same way for observations with Y ∈ {k_{1}, k_{2}} and proposed a generalized AUC for the multinomial logistic model as \( \frac{2}{K\left(K-1\right)}{\displaystyle {\sum}_{k_1<{k}_2}}\ \left[\widehat{A}\left({k}_1\Big|{k}_2\right)+\widehat{A}\left({k}_2\Big|{k}_1\right)\right]/2 \) for 1 ≤ k_{1}, k_{2} ≤ K. Yang and Carlin [13] generalized the ROC curve to a surface and used the volume under the ROC surface (VUC) to measure the accuracy of a diagnostic test based on multicategory response models. Dreiseitl et al. [14] proposed a three-way ROC analysis for the same goal. Van Calster et al. [12] suggested an ordinal c-index measure (ORC) and discussed the relationship between the new measure and the VUC and other measures based on assessing pairs of cases.
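As an illustration of the rank-sum formula for the binary AUC, the following Python sketch computes the score from outcomes and predicted probabilities. The function names and the use of average ranks for tied predictions are assumptions made for this illustration, not details taken from the cited methods.

```python
import numpy as np

def midranks(values):
    """1-based ranks; tied values share the average of their ranks (assumed convention)."""
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)               # largest rank inside each tie group
    return (last_rank - (counts - 1) / 2.0)[inverse]

def binary_auc(y, prob):
    """AUC = (R - n1 (n1 + 1) / 2) / (n1 n2), with R the rank sum of the Y = 1 cases."""
    y = np.asarray(y)
    ranks = midranks(np.asarray(prob, dtype=float))
    n1 = int(np.sum(y == 1))                    # number of observations with Y = 1
    n2 = int(np.sum(y == 0))                    # number of observations with Y = 0
    R = ranks[y == 1].sum()
    return (R - n1 * (n1 + 1) / 2.0) / (n1 * n2)
```

For instance, `binary_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])` gives 0.75, because three of the four (Y = 1, Y = 0) pairs are ordered correctly by the predicted probability.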
In this article, we introduce grid ordinal and multinomial logistic models to handle multicenter modeling of multicategory response, including model assumption checking. We also propose to use the grid AUC score to evaluate the added value of the grid model fitting when compared to models fitted by separate subdatasets. The remainder of this article is organized as follows. The second Section briefly reviews ordinal logistic [8] and multinomial logistic [9] models and their model assumptions, and also discusses model coefficient estimation methods for both models and the statistical test for checking the ordinal logistic assumption. The third Section discusses grid maximum likelihood estimation and grid computing for the ordinal logistic model assumption test statistics. The fourth Section provides technical details for grid model fitting assessment. The fifth Section elaborates on grid AUC score computing. The sixth Section describes simulation studies to evaluate the theoretical results. The seventh Section carries out additional experiments on two real datasets to demonstrate our proposed methods. The eighth Section discusses the generalization of the proposed grid models and the limitations of this work.
Methods
Ordinal and multinomial logistic models
Before introducing our method, we describe the ordinal and multinomial logistic models in more detail. Many ordinal logistic models have been studied; they differ in how the response categories are split. In this article, however, we focus only on the proportional odds logistic model to deal with multicategory problems. The proposed method will be extended to other multicategory logistic regression models in the future. Suppose the response Y can take values 1, ⋯, K (for K categories) with K ≥ 3. There are m features in the model and n observations. The predictor matrix can be expressed as X^{T} = (x_{1}, ⋯, x_{n}) with \( {x}_i^T=\left({x}_{i,1},\cdots, {x}_{i,m}\right) \) for 1 ≤ i ≤ n. Let us define p(w, i) ≜ Pr(Y ≤ w | x_{i}) for 1 ≤ i ≤ n and 1 ≤ w ≤ K − 1. The ordinal logistic regression [8] can be defined as
\( \log \frac{p\left(w,i\right)}{1-p\left(w,i\right)}={\alpha}_w+{x}_i^T\beta \)   (1)
with parameters β^{T} = (b_{1}, ⋯, b_{m}). The conditional likelihood function is given by
\( {L}_O={\displaystyle {\prod}_{i=1}^n{\displaystyle {\prod}_{w=1}^K{\left[p\left(w,i\right)-p\left(w-1,i\right)\right]}^{{\mathrm{I}}_{\left[{\mathrm{y}}_{\mathrm{i}}=\mathrm{w}\right]}}}} \)   (2)
where p(0, i) = 0 and p(K, i) = 1 by convention, and \( {\mathrm{I}}_{\left[{\mathrm{y}}_{\mathrm{i}}=\mathrm{w}\right]} \) is the indicator function, with value 1 if y_{i} = w and 0 otherwise. Let θ = (α_{1}, α_{2}, ⋯, α_{K − 1}, β^{T})^{T}, and let the log-likelihood function for the proportional odds logistic model be denoted as l_{O}(θ). The maximum likelihood estimate (MLE) \( \widehat{\theta} \) for l_{O}(θ) is usually computed using the Newton method for efficiency. The variance-covariance matrix for \( \widehat{\theta} \) is estimated by \( {\left[-{\partial}^2{l}_O\left(\theta \right)/\left(\partial \theta \partial {\theta}^T\right){\Big|}_{\widehat{\theta}}\right]}^{-1} \).
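To make the proportional odds model concrete, the sketch below turns cumulative logits into category probabilities and evaluates the log-likelihood. It assumes the parameterization in which the cumulative logit of Pr(Y ≤ w | x_i) is α_w + x_i^T β, which matches the log-odds expressions used later in the simulation section; the function names are hypothetical.

```python
import numpy as np

def po_category_probs(alpha, beta, X):
    """Pr(Y = w | x_i) for w = 1..K under the proportional odds model.

    alpha: (K-1,) increasing intercepts; beta: (m,) shared slopes; X: (n, m).
    Assumes logit Pr(Y <= w | x_i) = alpha_w + x_i^T beta.
    """
    eta = alpha[None, :] + (X @ beta)[:, None]          # (n, K-1) cumulative logits
    cum = 1.0 / (1.0 + np.exp(-eta))                    # Pr(Y <= w | x_i), w < K
    n = X.shape[0]
    cum = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])  # add p(0,i)=0, p(K,i)=1
    return np.diff(cum, axis=1)                         # successive differences, (n, K)

def po_loglik(alpha, beta, X, y):
    """Log-likelihood l_O; y holds labels in {1, ..., K}."""
    P = po_category_probs(alpha, beta, X)
    return float(np.log(P[np.arange(len(y)), y - 1]).sum())
```

With β = 0 and intercepts equal to the logits of 0.25, 0.5 and 0.75, every observation receives probability 0.25 for each of the four categories.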
Equation (1) assumes that the non-intercept model coefficients β remain the same for 1 ≤ w ≤ K − 1. This assumption is called the proportional odds assumption [15], and it usually needs to be justified when fitting an ordinal logistic model. The score test is a common way to test the proportional odds assumption. To perform the score test, we first introduce the generalized ordered logit model [16], which generalizes the ordinal logistic model by allowing the non-intercept model coefficients to differ across w. The generalized ordered logit model is given by
\( \log \frac{p\left(w,i\right)}{1-p\left(w,i\right)}={\alpha}_w+{x}_i^T{\beta}_w \)   (3)
with \( {\beta}_w^T=\left({b}_{w,1},\cdots, {b}_{w,m}\right) \) for 1 ≤ i ≤ n and 1 ≤ w ≤ K − 1. Let us denote \( \psi ={\left({\alpha}_1,{\beta}_1^T,\cdots, {\alpha}_{K-1},{\beta}_{K-1}^T\right)}^T \). The log-likelihood function for this generalized model, l_{G}(ψ), is obtained by combining (3) and (2). From its definition, we see that the generalized ordered logistic model requires more parameters than the proportional odds model. Hence, model fitting for data with a small sample size is a major concern for the generalized ordered logistic model. To check the proportional odds assumption, we need to test whether β_{1} = ⋯ = β_{K − 1}. As mentioned previously, suppose \( \widehat{\theta}=\left\{{\widehat{\alpha}}_1,\cdots, {\widehat{\alpha}}_{K-1},\widehat{\beta}\right\} \) is the MLE for l_{O}(θ). Let \( \overset{\sim}{\psi }=\left\{{\widehat{\alpha}}_1,\widehat{\beta},\cdots, {\widehat{\alpha}}_{K-1},\widehat{\beta}\right\} \). The score test statistic is
\( {T}_o={\left[\frac{\partial {l}_G\left(\psi \right)}{\partial \psi }{\Big|}_{\overset{\sim}{\psi }}\right]}^T{\left[-\frac{\partial^2{l}_G\left(\psi \right)}{\partial \psi \partial {\psi}^T}{\Big|}_{\overset{\sim}{\psi }}\right]}^{-1}\left[\frac{\partial {l}_G\left(\psi \right)}{\partial \psi }{\Big|}_{\overset{\sim}{\psi }}\right] \)   (4)
Under the null hypothesis β_{1} = ⋯ = β_{K − 1}, T_{o} asymptotically follows \( {\chi}_{m\left(K-2\right)}^2 \).
The multinomial logistic model mainly deals with a nominal response with unordered categories. It does not require the proportional odds assumption. Using the multinomial model on ordered data disregards the information inherent in the ordering of the response categories and is not, in general, recommended. Suppose the response variable and predictors are the same as described for the proportional odds model, except that the proportional odds assumption does not hold. Let us denote \( \tilde{p}\left(w,i\right)\triangleq Pr\left(Y=w\Big|{x}_i\right) \). In the multinomial logistic model, for 1 ≤ i ≤ n and 1 ≤ w ≤ K − 1,
\( \log \frac{\tilde{p}\left(w,i\right)}{\tilde{p}\left(K,i\right)}={\alpha}_w+{x}_i^T{\beta}_w \)   (5)
The likelihood function is then given by
\( {L}_M={\displaystyle {\prod}_{i=1}^n{\displaystyle {\prod}_{w=1}^K\tilde{p}{\left(w,i\right)}^{{\mathrm{I}}_{\left[{\mathrm{y}}_{\mathrm{i}}=\mathrm{w}\right]}}}} \)   (6)
As previously mentioned, \( \psi ={\left({\alpha}_1,{\beta}_1^T,\cdots, {\alpha}_{K-1},{\beta}_{K-1}^T\right)}^T \). The log-likelihood function for multinomial logistic regression is denoted as l_{M}(ψ). The MLE \( \widehat{\psi} \) for multinomial logistic regression can also be obtained by the Newton method, and the variance-covariance matrix for \( \widehat{\psi} \) is estimated by \( {\left[-{\partial}^2{l}_M\left(\psi \right)/\left(\partial \psi \partial {\psi}^T\right){\Big|}_{\widehat{\psi}}\right]}^{-1} \). It is worth noting that the multinomial logistic model requires the same number of parameters as the generalized ordered logistic model.
Grid ordinal and multinomial logistic models
This section first proposes the grid Newton method for the MLE, which can be used for both the grid proportional odds and the multinomial logistic regression models. Then, we develop the grid test of the proportional odds assumption for proportional odds logistic regression.
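Suppose that we want to find the MLE \( \widehat{\theta} \) for the log-likelihood function l(θ), with θ being a column vector. We can apply the Newton method as
\( {\theta}^{\left(J+1\right)}={\theta}^{(J)}-{\left[\frac{\partial^2l\left(\theta \right)}{\partial \theta \partial {\theta}^T}{\Big|}_{\theta^{(J)}}\right]}^{-1}\frac{\partial l\left(\theta \right)}{\partial \theta }{\Big|}_{\theta^{(J)}} \)   (7)
for J = 0, 1, 2, ⋯, where θ^{(J)} approaches \( \widehat{\theta} \) as J increases. Because the Newton method is very efficient, fewer than 15 iterations (J < 15) are usually enough to achieve a tolerance of 10^{−6} for θ^{(J)}.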
Suppose the data are split by observations into U parts, each containing the same variables. Let l(θ) be the log-likelihood function for the data combined from the U parts, which can be decomposed by observations. Hence
\( l\left(\theta \right)={\displaystyle {\sum}_{u=1}^U{l}_u\left(\theta \right)} \)   (8)
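where l_{u}(θ) is the log-likelihood function for the data of part u, with u = 1, ⋯, U. For the gradient and Hessian matrix of l(θ), we have
\( \frac{\partial l\left(\theta \right)}{\partial \theta }={\displaystyle {\sum}_{u=1}^U\frac{\partial {l}_u\left(\theta \right)}{\partial \theta }} \)   (9)
and
\( \frac{\partial^2l\left(\theta \right)}{\partial \theta \partial {\theta}^T}={\displaystyle {\sum}_{u=1}^U\frac{\partial^2{l}_u\left(\theta \right)}{\partial \theta \partial {\theta}^T}} \)   (10)
respectively. We get the following grid Newton method from (9), (10) and (7):
\( {\theta}^{\left(J+1\right)}={\theta}^{(J)}-{\left[{\displaystyle {\sum}_{u=1}^U\frac{\partial^2{l}_u\left(\theta \right)}{\partial \theta \partial {\theta}^T}{\Big|}_{\theta^{(J)}}}\right]}^{-1}{\displaystyle {\sum}_{u=1}^U\frac{\partial {l}_u\left(\theta \right)}{\partial \theta }{\Big|}_{\theta^{(J)}}} \)   (11)
Equation (11) tells us that each Newton update can be completed by combining the gradients and Hessian matrices of the partial log-likelihood functions based on the corresponding subdatasets. This equation suggests the following model fitting process, in which the separate datasets do not need to be pooled.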

1. Compute gradients and Hessian matrices at the current coefficient estimates, separately on each partial dataset.
2. Find the overall gradient and Hessian matrix by combining the partial results obtained from Step 1, then update the coefficient estimates.
Starting from an initial value for the model coefficients, the MLE can be obtained by repeating Step 1 and Step 2 until convergence.
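The two steps can be sketched in Python for the multinomial logistic model. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the design matrix is assumed to include an intercept column, category K is the reference, the iteration starts from zero, and the sites are simulated as in-memory arrays rather than remote parties.

```python
import numpy as np

def site_grad_hess(theta, X, y, K):
    """Step 1 at one site: local gradient and Hessian of the multinomial log-likelihood.

    theta: flat vector of length (K-1)*d; X: (n, d) with intercept column; y in {1..K}.
    """
    n, d = X.shape
    B = theta.reshape(K - 1, d)
    expo = np.exp(X @ B.T)                                   # (n, K-1)
    P = expo / (1.0 + expo.sum(axis=1, keepdims=True))       # Pr(Y = w | x), w < K
    Y = (y[:, None] == np.arange(1, K)[None, :]).astype(float)
    grad = ((Y - P).T @ X).ravel()
    hess = np.zeros(((K - 1) * d, (K - 1) * d))
    for a in range(K - 1):
        for b in range(K - 1):
            w = P[:, a] * (float(a == b) - P[:, b])          # per-observation weights
            hess[a * d:(a + 1) * d, b * d:(b + 1) * d] = -(X * w[:, None]).T @ X
    return grad, hess

def grid_newton(sites, K, tol=1e-6, max_iter=25):
    """Step 2 at the coordinator: sum the site contributions and take a Newton step."""
    d = sites[0][0].shape[1]
    theta = np.zeros((K - 1) * d)
    for _ in range(max_iter):
        parts = [site_grad_hess(theta, X, y, K) for X, y in sites]
        grad = sum(g for g, _ in parts)                      # overall gradient
        hess = sum(h for _, h in parts)                      # overall Hessian
        step = np.linalg.solve(hess, grad)
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    # Covariance from the Hessian of the final iteration (within tol of the MLE).
    return theta, np.linalg.inv(-hess)
```

Only the summed gradient and Hessian enter each update, so calling `grid_newton` with one pooled dataset or with any partition of the same observations should give the same estimates up to floating-point rounding.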
The above grid Newton method is used for both ordinal and multinomial logistic model coefficient estimation. The variance-covariance matrix of the MLE \( \widehat{\theta} \) based on the log-likelihood function l(θ) is given by \( {\left[-{\partial}^2l\left(\theta \right)/\left(\partial \theta \partial {\theta}^T\right){\Big|}_{\widehat{\theta}}\right]}^{-1} \). Using (9) and (10), we get the grid variance-covariance matrix estimate of \( \widehat{\theta} \). This is a typical grid method for a variance-covariance matrix, and it is suitable for both proportional odds and multinomial logistic regression. The gradients and Hessian matrices for both regression models are presented in Additional file 1.
For the grid computation of the proportional odds assumption test statistic T_{o} in (4), we first compute the grid MLE \( \widehat{\theta} \) based on the log-likelihood l_{O} of ordinal logistic regression; T_{o} is then produced by using (9) and (10) to evaluate the gradient and Hessian matrix of l_{G} at \( \overset{\sim}{\psi } \), where \( \overset{\sim}{\psi } \) comes from rearranging the entries of \( \widehat{\theta} \) as introduced in the previous section.
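The aggregation step of the grid score test can be sketched as follows. The inputs `site_grads` and `site_hessians` are hypothetical: they stand for each site's gradient and Hessian of l_G evaluated at the rearranged estimate, whose formulas are given in Additional file 1 and are not reproduced here.

```python
import numpy as np

def grid_score_statistic(site_grads, site_hessians):
    """Score statistic from per-site gradients and Hessians of l_G at psi-tilde."""
    grad = np.sum(site_grads, axis=0)        # overall gradient: sum over sites
    hess = np.sum(site_hessians, axis=0)     # overall Hessian: sum over sites
    # Quadratic form grad^T (-hess)^{-1} grad, via a linear solve.
    return float(grad @ np.linalg.solve(-hess, grad))

def score_test_df(m, K):
    """Degrees of freedom m(K - 2) of the reference chi-square distribution."""
    return m * (K - 2)
```

A p-value would then be the upper-tail probability of a chi-square distribution with `score_test_df(m, K)` degrees of freedom, evaluated at the statistic.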
Grid model fit assessment
Assessment of goodness-of-fit for the ordinal logistic model can be done using methods for binary logistic regression on each of the K − 1 regressions. Additionally, Fagerland and Hosmer [17] proposed a Hosmer-Lemeshow type goodness-of-fit test for the proportional odds model. To handle the multinomial logistic model, Hosmer and Lemeshow [9] modified several existing measures, including Pearson's residual and R-square. Alternatively, Fagerland et al. [18] modified the Hosmer-Lemeshow (HL) test for the same goal. Some of these methods can be used for grid models.
We use the HL test as an example to explain grid model fit assessment. For binary logistic regression, the HL test statistic is calculated as follows. First, the observations are sorted by their predicted probability of Y = 1 and split into g groups. E_{c,k} equals the sum of the predicted probabilities of Y = k (k = 0, 1) in group c, and O_{c,k} equals the number of observations with Y = k in group c. Then the test statistic is given by
\( HL={\displaystyle {\sum}_{c=1}^g{\displaystyle {\sum}_{k=0}^1{\left({O}_{c,k}-{E}_{c,k}\right)}^2/{E}_{c,k}}} \)
which asymptotically follows \( {\chi}_{g-2}^2 \). In the modified statistic, the g groups are formed from the sorted values of the predicted probability of Y < K for all observations. The extended HL (EHL) test statistic is defined as \( H{L}_m={\displaystyle {\sum}_{c=1}^g{\displaystyle {\sum}_{k=1}^K{\left({O}_{c,k}-{E}_{c,k}\right)}^2/{E}_{c,k}}} \), where O_{c,k} and E_{c,k} are defined in the same way as above. The new statistic asymptotically follows \( {\chi}_{\left(g-2\right)\left(K-1\right)}^2 \). O_{c,k} and E_{c,k} only require the response value and the predicted probability of Y = k for all observations. Grid HL_{m} computation can therefore be completed by pooling the Y values and the corresponding predicted probability values from the separate subdatasets after grid model fitting.
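A small sketch of the extended HL statistic computed from pooled values is given below. It assumes that groups are formed by splitting the sorted observations into g nearly equal parts with `numpy.array_split`; the grouping rule in the cited tests may differ in detail, and the function name is hypothetical.

```python
import numpy as np

def extended_hl(y, prob, g=10):
    """Extended HL statistic and its degrees of freedom.

    y: labels in {1..K}; prob: (n, K) predicted probabilities, column k-1 for Y = k.
    Observations are sorted by Pr(Y < K) = 1 - Pr(Y = K) and split into g groups.
    """
    K = prob.shape[1]
    order = np.argsort(1.0 - prob[:, K - 1], kind="stable")
    stat = 0.0
    for idx in np.array_split(order, g):
        for k in range(1, K + 1):
            observed = np.sum(y[idx] == k)          # O_{c,k}
            expected = prob[idx, k - 1].sum()       # E_{c,k}, assumed positive
            stat += (observed - expected) ** 2 / expected
    return float(stat), (g - 2) * (K - 1)
```

In the grid setting, `y` and `prob` would be the concatenation of the response values and predicted probabilities sent by each site after the grid model has been fitted, for example with `np.concatenate`.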
Grid Area under the ROC Curve
The rationale of grid model fitting is based on the assumption that the grid model outperforms models fitted by separate subdatasets. However, this is not always true and actually depends on data structures. The Area Under ROC Curve (AUC) is a very popular measurement to assess model classification performance, so we propose to use the AUC to check the value of a grid model.
For ordinal logistic regression, we adopt the idea proposed by Van Calster et al. [12] to use the mean of the K − 1 AUC scores for assessing the model. For multinomial logistic regression, we adopt the Hand and Till [11] AUC estimation method. For both grid models, the AUC scores can be obtained by pooling the response values and predicted probabilities for the necessary observations from the separate subdatasets after model fitting. To check the added value, we need to compare the grid AUC score with the AUC score for each subdataset.
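The Hand and Till generalized AUC on pooled values can be sketched as below. The pairwise AUC is computed by direct comparison of cases, with ties counted as one half, which is an assumed tie convention; the function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def pair_auc(score, positive):
    """Probability that a positive case outscores a negative one (ties count 1/2)."""
    pos, neg = score[positive], score[~positive]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def hand_till_auc(y, prob):
    """Generalized AUC: average over category pairs of [A(k1|k2) + A(k2|k1)] / 2.

    y: labels in {1..K}; prob: (n, K) predicted probabilities, column k-1 for Y = k.
    """
    K = prob.shape[1]
    total = 0.0
    for k1, k2 in combinations(range(1, K + 1), 2):
        mask = (y == k1) | (y == k2)                 # keep observations in {k1, k2}
        yy, pp = y[mask], prob[mask]
        a12 = pair_auc(pp[:, k1 - 1], yy == k1)      # A(k1 | k2)
        a21 = pair_auc(pp[:, k2 - 1], yy == k2)      # A(k2 | k1)
        total += (a12 + a21) / 2.0
    return 2.0 * total / (K * (K - 1))
```

For the grid AUC, `y` and `prob` would be concatenated across sites before calling `hand_till_auc`; applying the same function to one site's values alone gives the per-site score used for comparison.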
Results
Simulation
The derivation of the grid method clearly implies that the grid method gives results identical to those of the centralized method (i.e., the method in which the subdatasets are pooled). Additionally, only the total sample size from all sites matters for the fitting results; for a fixed total sample size, the sample size at an individual site does not affect them. We conducted simulation studies to evaluate the accuracy of the proposed grid model estimation and to compare it with the classical centralized fitting method. Four simulation studies in different settings were performed to compare the various grid multicategory models against the two corresponding centralized models. For all studies, the simulated data were split into two pieces, one for model fitting and the other for AUC score evaluation and the HL test. The first two studies were designed for the ordinal logistic model and the other two for the multinomial logistic model. In Studies 1 and 2, data were simulated from an ordinal logistic model with total sample sizes of 1800 and 900, respectively. In Studies 3 and 4, data were simulated from a multinomial logistic model with total sample sizes of 1800 and 900, respectively. The HL tests for all binary logistic regression estimations were performed in Studies 1 and 2; the extended HL test was performed in Studies 3 and 4. In addition, an average AUC score or extended AUC score [11] was evaluated for each study.
In all studies, we simulated data with 4 outcome categories (Y ∈ {1, 2, 3, 4}). For Studies 1 and 3, we used a total sample size of 1800 for the centralized models and split the data into 3 separate parts in three different ways for the grid models: (600, 600, 600), (100, 200, 1500) and (50, 50, 1700). For Studies 2 and 4, we used a total sample size of 900 for the centralized models and split the data into 3 separate parts in three different ways for the grid models: (300, 300, 300), (50, 100, 750) and (24, 26, 850). For all studies, each subset was further split in half, one half for model fitting and the other for AUC evaluation and the HL or extended HL tests. We chose two continuous covariates x_{1} and x_{2} and two binary covariates x_{3} and x_{4} (i.e., 5 coefficients for 4 covariates and an intercept) in these studies. Simulation data were generated in two steps. First, we generated x_{1} and x_{2} independently from a standard normal distribution and generated x_{3} and x_{4} independently from a Bernoulli distribution with p = 0.5. For Studies 1 and 2, we generated the response y from an ordinal distribution assuming that
and
For Studies 3 and 4 we generated the response y from a multinomial distribution assuming that
and
We conducted the simulations with 1000 runs in all studies. In Studies 1 and 2, the estimate of the log odds \( log\frac{Pr\left(Y\le k\right)}{Pr\left(Y>k\right)} \) equals \( {\widehat{\alpha}}_k+{\widehat{\beta}}_1{x}_1+{\widehat{\beta}}_2{x}_2+{\widehat{\beta}}_3{x}_3+{\widehat{\beta}}_4{x}_4 \), for k = 1, 2, 3. In Studies 3 and 4, the estimate of the log odds \( log\frac{Pr\left(Y=k\right)}{Pr\left(Y=4\right)} \) equals \( {\widehat{\alpha}}_k+{\widehat{\beta}}_{1,k}{x}_1+{\widehat{\beta}}_{2,k}{x}_2+{\widehat{\beta}}_{3,k}{x}_3+{\widehat{\beta}}_{4,k}{x}_4 \), for k = 1, 2, 3. Table 1 presents the results for Studies 1 and 2 and Table 2 presents the results for Studies 3 and 4. We show the average biases (Bias) and standard errors (Se) for the estimates in both tables. Table 3 provides the passing rates of the proportional odds assumption (POA) test, the HL test and both (POA&HL) tests for Studies 1 and 2. Table 3 also gives the results of the EHL test in Studies 3 and 4 among the 1000 runs. Figure 1 shows the box plots of the AUC scores for the four studies.
Note that, as expected, all four studies show that the three grid methods and the corresponding centralized method produce identical results. Hence, each table or figure presents the common results for the three grid models and the corresponding centralized model.
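The covariate generation and the site partitions of the simulation design can be sketched as follows. Only the covariates and the splits are illustrated, because the response coefficients are not available in this text; the function names and the random permutation used to assign observations to sites are assumptions of this sketch.

```python
import numpy as np

def simulate_covariates(n, rng):
    """x1, x2 ~ N(0, 1) and x3, x4 ~ Bernoulli(0.5), all independent."""
    continuous = rng.standard_normal((n, 2))
    binary = rng.binomial(1, 0.5, size=(n, 2))
    return np.column_stack([continuous, binary])

def split_sites(n, sizes, rng):
    """Partition indices 0..n-1 into sites of the given sizes, then halve each site.

    Returns one (fit_indices, eval_indices) pair per site.
    """
    assert sum(sizes) == n
    perm = rng.permutation(n)
    sites = np.split(perm, np.cumsum(sizes)[:-1])
    return [(idx[: len(idx) // 2], idx[len(idx) // 2:]) for idx in sites]
```

For example, `split_sites(1800, (50, 50, 1700), rng)` reproduces the most unbalanced partition of Studies 1 and 3, with half of each site reserved for model fitting and the other half for evaluation.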
Two examples
In addition to the simulation studies, we used two split public datasets to test our core model-fitting algorithm. The purpose was to illustrate how our core grid-fitting algorithm works. Note that these are not real multicenter studies; they are used for illustration purposes only.
The first example uses the low birth weight dataset, which was obtained from Hosmer and Lemeshow [9] and contains 189 observations with 9 non-redundant variables. We picked 8 variables from the dataset (AGE, RACE, SMOKE, PTL, HT, UI, FTV and BWT) and modified several of them to create a new dataset as follows. RACE, a three-category variable, was replaced by two binary variables, OTHERvsWHITE and BLACKvsWHITE. PTL is the number of premature labors, with values of 0, 1, etc., and was dichotomized into 0 and greater than 0. FTV is the number of physician visits and was also dichotomized into 0 and greater than 0. BWT is the birth weight in grams and was categorized into 4 values (1, 2, 3, 4) using the cutoffs 3500, 3000 and 2500. AGE, SMOKE, HT and UI were kept as in the original data, where AGE is continuous, SMOKE is binary, HT is a binary variable for "History of hypertension", and UI is a binary variable for "Presence of uterine irritability". We denote the new dataset as LBW.
To test the grid model fitting, we randomly picked 95 observations from LBW to create dataset LBW1 and used the remaining 94 observations to create LBW2. BWT was chosen as the 4-category response variable and the remaining variables as covariates. Since the response is ordinal, we fitted a grid ordinal logistic model without pooling LBW1 and LBW2. Suppose the fitted value for \( log\frac{Pr\left(BWT\le k\right)}{Pr\left(BWT>k\right)} \) (k = 1, 2, 3) is
Table 4 shows the model coefficient estimates (Est) and their standard errors (Se), with z-values (Zval) equal to the ratios of the Est values to the corresponding Se values and p-values (Pval) for testing whether each coefficient is significantly different from 0.
The grid proportional odds assumption test was also performed and resulted in a p-value of 0.366. Hence, there is no evidence that the assumption for the ordinal logistic model is violated. To justify the grid model fitting, the ordinal logistic model was also fitted for LBW1 and for LBW2 separately. The grid AUC score (GAUC), the AUC score for the model fitted on LBW1 (AUC1), and the AUC score for the model fitted on LBW2 (AUC2) were all evaluated by 10-fold cross validation:
Note that in this example the data are randomly split, so every subset has the same underlying population. Hence, smaller AUC values result only from the smaller sample sizes (in the subgroups). In addition, a grid HL test for the grid model and HL tests for the two separate models were performed using 10-fold cross validation with the same data partitions. Unfortunately, none of these models passed the HL test. This may be related to nonlinear effects of the continuous variable AGE, or to omitted interaction terms. However, as shown in the simulation studies, failing the HL test does not necessarily mean that the goodness-of-fit of these models is very poor.
The second example uses the mammography experience data, which was also obtained from Hosmer and Lemeshow [9] and contains 412 observations with 6 variables. We kept the original dataset and only replaced multicategory variables by multiple binary variables. The new dataset was denoted as MAM and contained 9 variables: ME, SYMPT1, SYMPT2, SYMPT3, PB, HIST, BSE, DETC1 and DETC2. ME denotes mammography experience with "3 = never", "2 = within a year" and "1 = over a year ago". The original SYMPT was a 4-category variable that recorded the 4 responses, from "strongly agree" to "strongly disagree", to the statement "you do not need a mammograph unless you develop symptoms". It was replaced by the binary variables SYMPT1, SYMPT2 and SYMPT3. PB is a continuous variable for the degree of "perceived benefit of mammography". HIST is a binary variable for the response to whether "mother or sister has breast cancer history". BSE is the binary response to "Has anyone taught you how to examine your own breasts?". The original DETC was a 3-category variable recording the response to "How likely is it that a mammogram could find a new case of breast cancer?". It was replaced by the binary variables DETC1 and DETC2.
We first randomly picked 206 observations from MAM to create dataset MAM1, and used the remaining 206 to create MAM2. We used ME as the response and fitted a grid multinomial logistic model without pooling MAM1 and MAM2. For k = 1, 2, suppose the fitted value for \( log\frac{Pr\left( ME=k\right)}{Pr\left( ME=3\right)} \) is
Table 5 shows the model coefficient estimates (Est) and their standard errors (Se), with z-values (Zval) and p-values (Pval).
To justify the grid model fitting, a multinomial logistic model was also fitted for MAM1 and for MAM2 separately. However, both separate models produced invalid estimates (with very large standard errors). The invalid estimates are probably due to the small number of subjects with ME = 2 after splitting the dataset, and the large number of parameters. This clearly shows the need for grid model fitting based on datasets MAM1 and MAM2 when they are not allowed to be pooled. Tenfold cross validation was used to evaluate the extended AUC score, and we performed the extended HL test for the grid fitted model. The grid AUC score was 0.626 and the grid fitted model passed the extended HL test.
Discussion
While our focus was on multicategory logit models, the grid MLE method is applicable to grid computing for various likelihood-type estimation problems, including other generalized linear models and generalized estimating equation models. However, when the likelihood cannot be separated by observations, the grid MLE may not work. For example, Cox proportional hazards regression adopts a partial likelihood that cannot be split by observations. Hence, more effort is necessary to design a grid model for Cox proportional hazards regression, which was discussed in our recent publication [19].
For the proposed grid HL test and grid AUC, Y values are pooled directly and not protected. To protect Y, the patient outcome values, we could adopt the methods proposed by Wu et al. [6] for the Grid HL test and the AUC score calculation, which avoid exchanging Y values. These methods are accomplished through using transmitted locally predicted probabilities and their orders. Details are given in Algorithms 1 and 2 in Wu et al. [6].
In practice, grid model fitting using multisite data is more complicated than what is described in this manuscript (we focused on the model-fitting step). Very often, it is necessary to preprocess the data before model fitting. For example, gender may be coded differently at different sites. Hence, data harmonization is necessary before the grid model can be fitted. Another issue is missing data. One way to mitigate the problem is to deal with missing data during the preprocessing step, using the same grid protocol across all sites. Another approach is to handle missing data in the grid model-fitting step, which would be cumbersome. Additionally, sometimes there are too many variables to fit the model, so variable selection may be needed. Variable selection usually requires the construction of models, and it can be incorporated into the model-fitting step. Different sites may have different variables, so choosing the common variables and harmonizing their values needs to be done before the model-fitting step. For the proposed grid models, we assumed that the data were uniformly distributed across local clinical sites, and treated the data from each local site as a random sample from the whole dataset. However, this assumption may not hold, and we will consider cluster effects from different sites in our future work. As described above, Steps 1 and 2 of the grid model-fitting procedure need to be repeated until convergence. Each site needs to send its gradient and Hessian matrix multiple times, which means that reliable data transmission is necessary for successfully fitting a grid model. Recently we produced a reliable web service called WebGLORE for binary logistic grid fitting [20]. In our setting the data transmission was adequate, but there may be settings in which this is not the case.
Conclusion
In the proposed grid methods, individual-level observation data are never shared during the model fitting process. This offers a practical solution for mitigating the privacy issues caused by pooling all data at a central site. Grid ordinal and multinomial logistic models were introduced in detail. In terms of increasing sample sizes, grid computing is more valuable for multicategory response logistic models than for binary logistic regression, since the larger number of coefficients in multicategory models requires more observations. A small sample size might result in estimates with very large bias or standard error. The ordinal logistic model addresses only ordinal response data. The multinomial logistic model deals with nominal response data; it requires even more coefficients, and hence more observations for proper estimation, than the ordinal logistic model. The theory guarantees that the proposed grid Newton method produces the same estimates as the classical centralized Newton method. This is consistent with the simulation study results. As shown in the simulation studies, the HL test and its extension might be too strict for assessing model fit and might produce falsely significant test results. These are limitations of the HL test, which are discussed by Vittinghoff et al. [21]. Hence, other model fit assessment methods introduced by Hosmer and Lemeshow [9] could be used in addition to the extended HL test for the multinomial logistic model, and other methods for binary logistic model fit assessment could be used in addition to the HL test for the ordinal logistic model.
References
Ohno-Machado L, Agha Z, Bell DS, Dahm L, Day ME, Doctor JN, et al. pSCANNER team: patient-centered Scalable National Network for Effectiveness Research. J Am Med Informatics Assoc. 2014;21:amiajnl–2014. doi:10.1136/amiajnl2014002751
Crandall W, Kappelman MD, Colletti RB, Leibowitz I, Grunow JE, Ali S, et al. ImproveCareNow: The development of a pediatric inflammatory bowel disease improvement network. Inflamm Bowel Dis. 2011;17:450–7. doi:10.1002/ibd.21394.
Kamath PS, Kim W. The model for end-stage liver disease (MELD). Hepatology. 2007;45:797–805.
Kennedy RL, Burton AM, Fraser HS, McStay LN, Harrison RF. Early diagnosis of acute myocardial infarction using clinical and electrocardiographic data at presentation: derivation and evaluation of logistic regression models. Eur Heart J. 1996;17:1181–91.
Boxwala AA, Kim J, Grillo JM, Ohno-Machado L. Using statistical and machine learning to help institutions detect suspicious access to electronic health records. J Am Med Inf Assoc. 2011;18:498–505.
Wu Y, Jiang X, Kim J, Ohno-Machado L. Grid Binary LOgistic REgression (GLORE): building shared models without sharing data. J Am Med Inform Assoc. 2012;2012:758–64. doi:10.1136/amiajnl2012000862.
Wang S, Jiang X, Wu Y, Cui L, Cheng S, Ohno-Machado L. EXpectation Propagation LOgistic REgRession (EXPLORER): Distributed Privacy-Preserving Online Model Learning. J Biomed Inform. 2013;46:480–96.
McCullagh P. Regression Models for Ordinal Data. J Royal Stat Soc Series B. 1980;42:109–42.
Hosmer DW, Lemeshow S. Applied logistic regression. New York: Wiley-Interscience 2000. http://books.google.com/books?hl=en&lr=&id=Po0RLQ7USIMC&oi=fnd&pg=PA1&dq=Applied+logistic+regression&ots=Dn7Usc1kAR&sig=vR7mj7OsZ8DMsnvS19BsT30Ad8c (accessed 15 Mar 2012).
Bradley AP. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997;30:1145–59.
Hand DJ, Till RJ. A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach Learn. 2001;45:171–86.
Van Calster B, Van Belle V, Vergouwe Y, Steyerberg EW. Discrimination ability of prediction models for ordinal outcomes: Relationships between existing measures and a new measure. Biometrical J. 2012;54:674–85.
Yang H, Carlin D. ROC surface: a generalization of ROC curve analysis. J Biopharm Stat. 2000;10:183–96.
Dreiseitl S, Ohno-Machado L, Binder M. Comparing three-class diagnostic tests by three-way ROC analysis. Med Decis Mak. 2000;20:323–31.
Brant R. Assessing Proportionality in the Proportional Odds Model for Ordinal Logistic Regression. Biometrics. 1990;46:1171–8.
Williams R. Generalized ordered logit/partial proportional odds models for ordinal dependent variables. Stata J. 2006;6:58–82.
Fagerland MW, Hosmer DW. A goodness-of-fit test for the proportional odds regression model. Stat Med. 2013;32:2235–49.
Fagerland MW, Hosmer DW, Bofin AM. Multinomial goodness-of-fit tests for logistic regression models. Stat Med. 2008;27:4238–53.
Lyles RH. Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models. J Am Stat Assoc. 2006;101:403–4.
Lu C, Wang S, Ji Z, Wu Y, Xiong L, Jiang X, et al. WebDISCO: a Web service for DIStributed COx model learning without patient-level data sharing. In: Translational Bioinformatics Conference (accepted). 2014.
Jiang W, Li P, Wang S, Wu Y, Xue M, Ohno-Machado L, et al. WebGLORE: a web service for Grid LOgistic REgression. Bioinformatics. 2013;29:3238–40. doi:10.1093/bioinformatics/btt559.
Acknowledgements
We owe thanks to the Editor and two reviewers for their helpful and constructive comments and suggestions that helped improve the manuscript from earlier versions.
Publication of this article has been funded in part by NIH grants U54HL108460, K99HG008175, R00LM011392, R21LM012060, and PCORI contract CDRN130604819.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
YW drafted the majority of the manuscript and developed the models. XJ and SW provided detailed edits and discussion about the proposed model. WJ and PL helped on the implementation. LOM guided the experimental design and provided detailed edits to the manuscript. All authors read and approved the final manuscript.
Additional file
Additional file 1:
Gradients and Hessian matrices. In this file we provide the gradients and the Hessian matrices for all log-likelihood functions used in this manuscript.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Wu, Y., Jiang, X., Wang, S. et al. Grid multicategory response logistic models. BMC Med Inform Decis Mak 15, 10 (2015). https://doi.org/10.1186/s129110150133y
DOI: https://doi.org/10.1186/s129110150133y