Adapted time-varying covariates Cox model for predicting future cirrhosis development performs well in a large hepatitis C cohort

Background Patients with hepatitis C virus (HCV) frequently remain at risk for cirrhosis after sustained virologic response (SVR). Existing cirrhosis predictive models for HCV do not account for dynamic antiviral treatment status and are limited by fixed laboratory covariates and short follow-up time. Advanced fibrosis assessment modalities, such as transient elastography, remain inaccessible in many settings. Improved cirrhosis predictive models are needed.
Methods We developed a laboratory-based model to predict progression of liver disease after SVR. This prediction model used a time-varying covariates Cox model adapted to utilize longitudinal laboratory data and to account for antiviral treatment. Individuals were included if they had a history of detectable HCV RNA and at least 2 AST-to-platelet ratio index (APRI) scores available in the national Veterans Health Administration (VHA) from 2000 to 2015. Observation time extended through January 2019. We excluded individuals with preexisting cirrhosis. Covariates included baseline patient characteristics and 16 time-varying laboratory predictors. SVR, defined as permanently undetectable HCV RNA after antiviral treatment, was modeled as a step function of time. Cirrhosis development was defined as two consecutive APRI scores > 2. We predicted cirrhosis development at 1, 3, and 5 years of follow-up.
Results In a national sample of HCV patients (n = 182,772) with a mean follow-up of 6.32 years, 42% (n = 76,854) achieved SVR before 2016 and 16.2% (n = 29,566) subsequently developed cirrhosis. The model demonstrated good discrimination for predicting cirrhosis across all combinations of laboratory data windows and cirrhosis prediction intervals. AUROCs ranged from 0.781 to 0.815, with moderate sensitivity (0.703–0.749) and specificity (0.723–0.767).
Conclusion A novel adaptation of time-varying covariates Cox modeling technique using longitudinal laboratory values and dynamic antiviral treatment status accurately predicts cirrhosis development at 1-, 3-, and 5-years among patients with HCV, with and without SVR. It improves upon earlier cirrhosis predictive models and has many potential population-based applications, especially in settings without transient elastography available. Supplementary Information The online version contains supplementary material available at 10.1186/s12911-021-01711-7.


Background
Approximately 20–25% of patients with untreated hepatitis C virus (HCV) infection will progress to cirrhosis within 30 years, though individual rates of progression vary depending on comorbidities and other risk factors [1]. Modern direct-acting antiviral (DAA) medications have outstanding efficacy and can eradicate HCV in nearly all cases [2]. Viral eradication clearly reduces progression to cirrhosis and lowers the risk of mortality [3][4][5][6]. However, the rate of hepatic fibrosis regression varies between individuals. In some, liver disease may continue progressing even after successful antiviral treatment, particularly given the emergence of non-alcoholic fatty liver disease [7]. Hepatic fibrosis and necroinflammation are powerful predictors of future disease progression [8]. Liver biopsy, historically the criterion standard for assessment of hepatic fibrosis, is invasive, costly, and associated with complications, making it impractical for routine monitoring of all patients with HCV [9]. Transient elastography, while promising, is not universally available and may not be obtainable in low-resource settings [10]. Many cross-sectional studies have tried to use laboratory data or other non-invasive methods to stage hepatic fibrosis in individuals with HCV at a single time point, yet few have aimed to predict future liver disease progression and none have incorporated dynamic antiviral treatment status into predictive models [11]. As a result, available cirrhosis prediction models have unknown generalizability to the expanding group who achieve sustained virologic response (SVR) after HCV antiviral therapy. No available laboratory-based models accurately predict the risk of progression to cirrhosis after SVR, with the result that life-threatening liver disease complications, such as hepatocellular carcinoma or esophageal varices, could develop and progress undetected after antiviral treatment.
The few cirrhosis predictive models that do exist have methodologic limitations and only achieve marginal discrimination in predicting fibrosis progression [12][13][14]. Most importantly, earlier models using traditional regression-based methods reduced laboratory data to a single value (e.g., baseline value, mean, maximum, minimum, etc.) [12][13][14][15]. This approach obscures the trajectory of key laboratory data, often an important clue to the ongoing development of cirrhosis [14]. We sought to develop an improved cirrhosis prediction model using survival analysis, an ideal technique given the variable length of time until development of cirrhosis, while also incorporating the full spectrum of laboratory data and HCV treatment status.

Study population and data collection
We obtained data from the VHA Corporate Data Warehouse, a continually updated electronic repository of demographic, laboratory, pharmaceutical, and other clinical data for Veterans under VHA care. We identified all patients in the VHA system with a history of HCV, defined as the lifetime presence of at least one positive HCV RNA test from January 1, 2000, to January 1, 2016 (n = 280,494). We defined HCV treatment as the receipt of at least one dose of an antiviral medication approved by the US Food and Drug Administration for the treatment of HCV on or before December 31, 2015. SVR was defined as of December 31, 2016, as the permanent absence of detectable HCV RNA after antiviral treatment. Patients were followed for the development of cirrhosis or death through January 1, 2019. We required patients to have at least two AST-to-platelet ratio index (APRI) scores (n = 231,566). APRI is a widely used noninvasive method for assessing fibrosis stage among patients with HCV, with excellent accuracy in detecting advanced fibrosis and cirrhosis. We defined APRI using the standard formula APRI = 100*(AST (U/L)/40)/platelet count (1000/µL) [16]. Component laboratory values were required to be drawn within 30 days of one another and could occur in inpatient or outpatient settings.
Cohort entry was defined as the date of the first APRI. Time-zero was defined as the time of entry into the cohort (Fig. 1). We excluded patients with known or suspected cirrhosis at baseline or a history of hepatocellular carcinoma, defined by relevant International Classification of Diseases (ICD) codes prior to or within 1 year after cohort entry (n = 18,650) (Additional file 1: Table S1) or baseline APRI > 2.0 (n = 30,144) [16]. Our final cohort contained 182,772 patients with both HCV and at least two APRI scores who were classified as noncirrhotic at baseline (Fig. 2). The study was reviewed by the Institutional Review Board of the Ann Arbor VA Healthcare Systems and was granted a waiver of informed consent.
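As an illustration of the APRI formula given above, the following is a minimal Python sketch. The function name, argument names, and example values are hypothetical and are not taken from the study's code; only the formula itself (AST upper limit of normal fixed at 40 U/L, platelet count in 1000/µL) comes from the text.

```python
def apri(ast_u_per_l: float, platelets_k_per_ul: float, ast_uln: float = 40.0) -> float:
    """AST-to-platelet ratio index: 100 * (AST / ULN) / platelet count (1000/uL)."""
    if platelets_k_per_ul <= 0:
        raise ValueError("platelet count must be positive")
    return 100.0 * (ast_u_per_l / ast_uln) / platelets_k_per_ul


# Hypothetical patient: AST 80 U/L, platelets 100 x 1000/uL.
score = apri(ast_u_per_l=80.0, platelets_k_per_ul=100.0)
print(score)  # 2.0, i.e. not strictly above the exclusion threshold of 2.0
```

Under the exclusion rule described above, a patient would be removed only when the baseline score is strictly greater than 2.0.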

Predictor variables
Predictors of interest were selected a priori based on our prior work, biological plausibility, and expert clinician opinion [13][14][15]. Demographic variables included age at cohort entry, sex, race, and Hispanic ethnicity. SVR was modeled as a step function of time whereby the variable value remained 0 until antiviral treatment, at which point it became 1. Laboratory predictors included aspartate aminotransferase (AST), alanine aminotransferase (ALT), AST/ALT ratio, albumin, total bilirubin, creatinine, blood urea nitrogen, glucose, hemoglobin, platelet count, white blood cell count, sodium, potassium, and chloride. INR and total protein were excluded due to a high degree of missingness at baseline (50% and 17% missing, respectively). We used all available laboratory data points for each patient. We modeled each longitudinal laboratory predictor using a step function in which the value between two consecutive time points was imputed by the lab value measured at the earlier time point. Specifically, if a lab was not measured at time-zero, we imputed the missing value with the median of all values measured across all patients at time-zero. After time-zero, any lab values missing during the accrual window (2- or 4-year window) were imputed by the most recent measured value prior to the missing value. We considered including additional comorbidities such as alcohol use and diabetes in the model. However, in prior work we found that these additional characteristics did not significantly contribute to prediction of cirrhosis after accounting for longitudinal laboratory results (e.g., AST, glucose) already included in our models [15]. In the same earlier study, we systematically evaluated a variety of parameters for body mass index (BMI) (e.g., most recent, minimum, maximum) and found they ranked at or near the bottom of statistical importance relative to the laboratory variables.
Therefore, in the current study we report these patient characteristics but limit our models to laboratory data to enhance reproducibility across systems. We avoid variables such as "alcohol use", which may be documented inconsistently and depend on the accuracy of patient reporting as well as the definitions and diagnostic criteria used [15].
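The step-function handling of laboratory values and treatment status described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the function names, the use of years since time-zero as the time unit, and the example values are all hypothetical.

```python
from bisect import bisect_right
from statistics import median


def step_value(times, values, t, baseline_fill):
    """Lab value at time t under last-observation-carried-forward.

    times: sorted measurement times (assumed here to be years since time-zero)
    values: lab values aligned with `times`
    baseline_fill: cohort median at time-zero, used when nothing was measured by t
    """
    i = bisect_right(times, t)  # number of measurements taken at or before t
    return values[i - 1] if i > 0 else baseline_fill


def treatment_indicator(t, switch_time):
    """Step function: 0 before switch_time, 1 from switch_time onward.

    switch_time is None for patients whose indicator never switches.
    """
    return 1 if switch_time is not None and t >= switch_time else 0


# Hypothetical AST series for one patient with no measurement at time-zero.
cohort_baseline_ast = median([30.0, 40.0, 50.0])  # stand-in for the cohort median
ast_times = [0.5, 1.2, 3.0]
ast_values = [40.0, 55.0, 70.0]

print(step_value(ast_times, ast_values, 0.0, cohort_baseline_ast))  # 40.0 (median fill)
print(step_value(ast_times, ast_values, 2.0, cohort_baseline_ast))  # 55.0 (carried forward)
print(treatment_indicator(2.0, switch_time=1.5))                    # 1
```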

Outcome variable
We defined our primary outcome, cirrhosis development, as two consecutive APRI scores > 2, as described in previous work by our group [15]. APRI has been previously validated against liver biopsy in patients with HCV and has outstanding discrimination based on the area under the receiver operating characteristic curve (AUROC), with performance similar to transient elastography for detecting cirrhosis [17]. Furthermore, APRI is less sensitive to the effects of age than other non-invasive markers of fibrosis, such as the FIB-4 index, and performs at least as well as FIB-4 in predicting cirrhosis after SVR [18,19]. The observation period for each patient started on the date of the first recorded APRI and ended with occurrence of cirrhosis or censoring due to death or loss to VHA follow-up.
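The outcome rule (two consecutive APRI scores above the threshold) can be illustrated with a short Python sketch. The function name and example series are hypothetical. The text does not state whether the event date is the first or the second score of the qualifying pair, so returning the time of the second, confirming score is an assumption made here for illustration only.

```python
def cirrhosis_onset(apri_series, threshold=2.0):
    """Return the time at which two consecutive APRI scores exceed `threshold`.

    apri_series: list of (time, apri) tuples sorted by time.
    Returns the time of the second (confirming) score, or None if the
    criterion is never met. Using the second score is an assumption.
    """
    for (_, prev_score), (time, score) in zip(apri_series, apri_series[1:]):
        if prev_score > threshold and score > threshold:
            return time
    return None


# Hypothetical series: one isolated high score, then two consecutive high scores.
series = [(0.0, 0.8), (1.0, 2.5), (2.0, 1.9), (3.0, 2.1), (4.0, 2.6)]
print(cirrhosis_onset(series))  # 4.0
```

The same function covers the sensitivity analysis at a lower cut-off by passing `threshold=1.0`.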

Statistical analysis
Time-varying covariates Cox model: In classical time-varying covariates Cox models, prediction of an outcome (or "event") in the future is not possible because computing the survival probability at a future time would require knowledge of future covariate values. To predict future cirrhosis using a time-varying covariates Cox model, we redefined the notion of an event in survival analysis. Traditionally, an event is the occurrence of an outcome at the current time. We redefined it as the occurrence of cirrhosis K years after the accrual time, where K (1, 3, or 5 years) is the length of the prediction window specified by the user. The hazard function in our time-varying covariates Cox model characterizes the conditional probability that cirrhosis will develop after K years of additional follow-up, given no previous occurrence of cirrhosis (Fig. 1). Model parameters were estimated by maximizing the partial likelihood. We fit the time-varying covariates Cox model using the survival package in R version 3.6.1 (R Project for Statistical Computing).
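One way to write down the redefined event is sketched below. The notation is illustrative and is not taken from the paper: it assumes that shifting each observed cirrhosis time back by K years is an adequate formalization of "occurrence of cirrhosis K years after the accrual time", and it uses the standard Cox proportional hazards form and partial likelihood.

```latex
% Illustrative notation (assumed, not from the paper):
%   T_i     observed time of cirrhosis for patient i, in years since time-zero
%   K       prediction window (1, 3, or 5 years)
%   Z_i(t)  covariate vector at time t (baseline characteristics,
%           step-function laboratory values, treatment indicator)
%   R(t)    set of patients still at risk at time t
\begin{align}
  \tilde{T}_i &= T_i - K
    && \text{(shifted event time)} \\
  h\bigl(t \mid Z_i(t)\bigr) &= h_0(t)\,\exp\bigl(\beta^{\top} Z_i(t)\bigr)
    && \text{(hazard of the shifted event)} \\
  L(\beta) &= \prod_{i \,:\, \text{event}}
    \frac{\exp\bigl(\beta^{\top} Z_i(\tilde{T}_i)\bigr)}
         {\sum_{j \in R(\tilde{T}_i)} \exp\bigl(\beta^{\top} Z_j(\tilde{T}_i)\bigr)}
    && \text{(partial likelihood)}
\end{align}
```

Under this reading, the covariates entering the likelihood at the shifted event time are those known K years before cirrhosis, which is what allows prediction without future covariate values.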

Model evaluation
To evaluate the discriminative performance of the model, we used the area under the receiver operating characteristic curve (AUROC) to compare the predicted probability of developing cirrhosis with each patient's observed outcome (an AUROC of 1.0 represents perfect discrimination). We predicted cirrhosis development at 1, 3, and 5 years using laboratory data accrual windows of 2 and 4 years after the first APRI. For example, if a patient had 4 years of laboratory data and the accrual window was 2 years, we used the first 2 years of data to make predictions for the subsequent years. Accrual windows of 2 and 4 years were chosen because a 2-year period approximates the initial trajectory of a new patient entering the health system, whereas a 4-year period gives a more longitudinal view that captures long-term changes. Patients censored before 1, 3, or 5 years were removed since their true outcomes were not available. We evaluated each prediction setting in terms of specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV).
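The evaluation measures named above can be computed as in the following minimal Python sketch. The function names and toy data are hypothetical. AUROC is computed here by pairwise comparison of scores (equivalent to the Mann-Whitney statistic), which is a standard approach but not necessarily the one used in the study.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, and NPV from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auroc(y_true, scores):
    """Probability that a random case scores higher than a random non-case (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = developed cirrhosis within the prediction window.
y_true = [1, 1, 0, 0]
risk = [0.9, 0.4, 0.5, 0.1]
y_pred = [1 if r >= 0.45 else 0 for r in risk]  # 0.45 is an arbitrary cut-off
metrics = confusion_metrics(y_true, y_pred)
print(auroc(y_true, risk))  # 0.75
print(metrics)              # every metric equals 0.5 for this toy example
```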

Training and testing cohorts
We created model training and testing datasets by randomly splitting the sample into 70% and 30% subsets. The random splitting process was performed 30 times to produce a more stable evaluation and to generate confidence intervals. Under each split, the time-varying covariates Cox model was fitted on the training set and evaluated on the testing set. The AUROC measures for all outcome windows and predictor windows were averaged over 30 splits. We report the representative split of training and testing data with the AUROC closest to the average AUROC over 30 splits. The best cut-off was selected by choosing the point on the ROC curve closest to the point where both sensitivity and specificity equal one. Specifically, we selected the cut-off that minimizes (1 - sensitivity)^2 + (1 - specificity)^2.
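The repeated 70/30 splitting and the cut-off rule described above can be sketched in Python as follows. This is an illustration under assumptions, not the study's code: the function names, the seed, the convention that a score at or above the cut-off counts as a positive prediction, and the toy data are all hypothetical.

```python
import random


def repeated_splits(n, n_splits=30, train_frac=0.7, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits of n patients."""
    rng = random.Random(seed)
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        cut = int(round(train_frac * n))
        yield idx[:cut], idx[cut:]


def best_cutoff(y_true, scores):
    """Cut-off minimizing (1 - sensitivity)^2 + (1 - specificity)^2.

    Assumes y_true contains at least one case (1) and one non-case (0).
    Returns (cutoff, sensitivity, specificity); the first minimum wins on ties.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= c)
        tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < c)
        sens, spec = tp / n_pos, tn / n_neg
        dist = (1 - sens) ** 2 + (1 - spec) ** 2
        if best is None or dist < best[0]:
            best = (dist, c, sens, spec)
    return best[1], best[2], best[3]


# Toy data: four non-cases and two cases with hypothetical predicted risks.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.5, 0.9]
print(best_cutoff(y_true, scores))  # (0.5, 1.0, 0.75)

splits = list(repeated_splits(10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 30 7 3
```

In a full pipeline, the model would be fitted on each training subset and `best_cutoff` applied to the predicted risks on the corresponding testing subset.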

Sensitivity analysis
For prediction of cirrhosis, we used an APRI cut-off > 2 as our primary outcome to maximize specificity and PPV; however, we also performed a sensitivity analysis using APRI > 1 given variation in thresholds across prior studies.

Model performance
We predicted cirrhosis development at 1, 3, and 5 years using laboratory covariate time windows of 2 and 4 years. The average, standard deviation, and 95% confidence intervals for AUROC over 30 random splits are summarized in Table 2. The misclassification results for all 6 combinations of outcome prediction windows and covariate time windows are shown in Table 3.
To investigate the effect and significance of each predictor, we fit the 1-, 3-, and 5-year outcome prediction models on the full cohort. The summary of model fitting is shown in Additional file 1: Tables S2-S4. The p values

Discussion
A Cox model using time-varying covariates and a flexible time accrual window for longitudinal laboratory data achieved excellent discrimination for cirrhosis prediction at 1, 3, and 5 years among patients with HCV. Our study is the first to successfully use a large administrative dataset with a time-varying covariates model to predict future cirrhosis outcomes in HCV patients with and without SVR. This approach achieved high AUROCs for predicting the development of cirrhosis, as assessed by serial APRI score, and performed well at up to five years compared to previous models that were limited by fixed laboratory covariates and shorter follow-up time [15]. We developed a novel approach to prediction by transforming longitudinal laboratory variables into time-varying covariates, allowing us to use each patient's full spectrum of laboratory data instead of reducing the laboratory data to summary values. Unlike earlier models constructed exclusively for patients with viremic HCV, we included antiviral treatment as a time-varying covariate. Our model is therefore generalizable to both treated and viremic patients with HCV. All six combinations of laboratory data windows (2 or 4 years) and cirrhosis prediction windows (1, 3, or 5 years) produced excellent AUROCs. Taken together, our method accurately predicted risk of cirrhosis without inducing obvious bias due to the selection of the prediction window length.
Our study benefited from a very large HCV population drawn from the VHA healthcare network, which oversees the largest single cohort of patients with HCV in the US. We had access to comprehensive laboratory, demographic, and pharmacy data for all patients. VHA users tend to be older and more likely to be male than the general US population, so results should be extrapolated cautiously to other cohorts. Our conclusions are tempered by the use of a laboratory surrogate (two consecutive APRI scores > 2) to mark the development of cirrhosis rather than liver biopsy or transient elastography results, though prior studies have confirmed APRI as an excellent surrogate for biopsy-proven cirrhosis [17]. We selected this method due to the unknown validity of transient elastography values after HCV treatment and the small proportion of patients undergoing serial liver biopsy after antiviral therapy. In addition, we sought a surrogate cirrhosis endpoint that would be practical for others to replicate in administrative datasets and in resource-limited settings. Nevertheless, although APRI is considered a reliable laboratory marker of cirrhosis, a small amount of cirrhosis misclassification likely occurred. As a linear model, the time-varying covariates Cox model can only reflect a linear effect between the predictors and the outcome and therefore may not fully represent a non-linear relationship. We note that approximately 30% of the treated patients in our cohort received an interferon-based regimen due to the time period involved. Though such regimens are obsolete, there is no scientific reason to suspect that the type of regimen used would alter the risk of subsequent cirrhosis development after SVR or change the conclusions of the study. Finally, our data sources lacked results for laboratory testing or antiviral treatment conducted outside the VHA system. This model may not generalize to non-Veteran populations, and future external validation studies are needed to assess performance.

Conclusions
Our model has many potential applications for predicting cirrhosis given the expanding population of patients with HCV now achieving SVR after antiviral treatment. For example, as more HCV patients successfully achieve SVR, practitioners will need tools to identify those at continued risk for cirrhosis despite antiviral therapy. Incorporating predictive models into HCV registries or