Concordance and predictive value of two adverse drug event data sets

Background: Accurate prediction of adverse drug events (ADEs) is an important means of controlling and reducing drug-related morbidity and mortality. Since no single “gold standard” ADE data set exists, a range of different drug safety data sets are currently used for developing ADE prediction models. There is a critical need to assess the degree of concordance between these various ADE data sets and to validate ADE prediction models against multiple reference standards.
Methods: We systematically evaluated the concordance of two widely used ADE data sets: Lexi-comp from 2010 and SIDER from 2012. The strength of the association between ADE (drug) counts in Lexi-comp and SIDER was assessed using Spearman rank correlation, while the differences between the two data sets were characterized in terms of drug categories, ADE categories and ADE frequencies. We also performed a comparative validation of the Predictive Pharmacosafety Networks (PPN) model using both ADE data sets. The predictive power of PPN using each of the two validation sets was assessed using the area under the Receiver Operating Characteristic curve (AUROC).
Results: The correlations between the counts of ADEs and drugs in the two data sets were 0.84 (95% CI: 0.82-0.86) and 0.92 (95% CI: 0.91-0.93), respectively. Relative to an earlier snapshot of Lexi-comp from 2005, Lexi-comp 2010 and SIDER 2012 introduced a mean of 1,973 and 4,810 new drug-ADE associations per year, respectively. The difference between these two data sets was most pronounced for Nervous System and Anti-infective drugs, Gastrointestinal and Nervous System ADEs, and postmarketing ADEs. A minor difference of 1.1% was found in the AUROC of PPN when SIDER 2012 was used for validation instead of Lexi-comp 2010.
Conclusions: The ADE and drug counts in the Lexi-comp and SIDER data sets were highly correlated, and the choice of validation set did not greatly affect the overall prediction performance of PPN. Our results also suggest that it is important to be aware of the differences that exist among ADE data sets, especially in modeling applications focused on specific drug and ADE categories.


Background
Predictive modeling of adverse drug events (ADEs) is attracting growing interest. The high ADE-related costs in the US have been known for many years [1,2], and recent studies conducted in other developed countries underscore the importance of this problem worldwide [3][4][5]. To address these large and growing ADE-related economic and public health concerns, a wide array of ADE identification and prevention methods have been implemented. These include early-stage drug toxicity prediction and testing [6][7][8], clinical trials for evaluating a drug's safety profile, and postmarket surveillance methods for detecting abnormally high ADE rates [9,10]. Still, many types of ADEs can go undetected for years after a drug has been on the market [11,12], necessitating constant additions of label warnings and, in extreme cases, drug withdrawals [13,14]. At the same time, toxicity and clinical safety concerns remain leading causes of the high attrition rates in the drug development process [15], as the cost of bringing New Molecular Entities (NMEs) to the market continues to increase [16]. All these factors have spurred a noticeable expansion in research on adverse event prediction, research that critically relies on good data. The emerging field of system pharmacology is being increasingly recognized as a promising new approach for predicting ADEs [17]. System pharmacological approaches typically rely on the integration of diverse types of data, such as chemical, biological and taxonomic, followed by the application of quantitative models to extract information from these data. Often, the data are represented and integrated through network models [18,19]. In recent years, a number of system pharmacology predictive models for ADEs have been proposed [20][21][22][23][24][25][26][27].
While all these predictive approaches rely critically on "known" drug-ADE associations, there is currently no "gold standard" source for drug safety data. As a result, several different ADE data sets have been used to develop predictive pharmacological models. Often, the drug-ADE associations listed in these data sets are primarily extracted from drug package inserts. This is the case, for instance, with SIDER, a widely used public database, and Lexi-comp, a widely used commercial database. Alternatively, the listed drug-ADE pairs may be extracted from post-marketing databases, such as FAERS (formerly AERS; http://www.fda.gov/Drugs). Thus, the drug safety data used to train the above models may contain drug-ADE associations supported by strong evidence (e.g. associations for which a causal link between the drug and ADE has been rigorously established) as well as associations supported by weaker evidence (e.g. associations based solely on post-marketing reports).
Recent work has highlighted various types of inconsistencies in the reporting of drug safety information, including discrepancies among the reports for bioequivalent drugs, or reports used in different countries [28][29][30]. However, systematic comparisons of the major safety data sets used in system pharmacological models and assessment of the impact of data-set choice on prediction performance are lacking. In this paper, we systematically compare the Lexi-comp and SIDER ADE data sets. While the choice of an ADE data set can also have diverse clinical and economic consequences, we focus on its implications for predictive models. As a case study, we use the Predictive Pharmacosafety Networks (PPN) model.

Framework overview
Figure 1 shows an overview of the study framework. First, we integrated data from multiple sources, including data on drug-ADE associations from two snapshots of Lexi-comp and one snapshot of SIDER, drug and ADE taxonomies, and intrinsic drug properties. Next, we carried out a number of steps to standardize and integrate these data, including mapping the Lexi-comp and SIDER ADE names to MedDRA High Level Terms (HLTs), standardizing the drug names in Lexi-comp and SIDER, and constructing bipartite network representations of the drug-ADE associations. Next, we assessed the strength of the association between the counts of ADEs (per drug) and drugs (per ADE) in the Lexi-comp and SIDER data sets. We also identified the difference between the two data sets and characterized it in terms of drug categories, ADE categories and ADE frequencies.
Finally, we trained a PPN model using a 2005 version of Lexi-comp data and validated the prediction performance of PPN using both Lexi-comp 2010 and SIDER 2012 as reference standards.

Figure 1 Overview of the study framework. First, data were integrated from multiple sources, including data on drug-ADE associations from two different sources (Lexi-comp and SIDER), drug and ADE taxonomies, and intrinsic drug properties. Next, a number of steps to standardize and integrate these data were carried out. The strength of the association between the counts of ADEs and drugs in Lexi-comp and SIDER data sets and the differences between these two data sets were assessed. Finally, a PPN model was trained using a 2005 version of Lexi-comp and validated using both Lexi-comp 2010 and SIDER 2012 as reference standards.

Data description
The NCGC Pharmaceutical Collection (NPC) resource [31] was used to identify different forms of drug names that refer to a common "active pharmaceutical ingredient". The Lexi-comp ADE data were extracted from Lexi-Drugs® (http://www.lexi.com), a commercial database widely used in hospitals today. For this study, we had access to data on 809 Lexi-comp drugs. For each drug, we were provided with two text fields extracted, respectively, from the 2005 and 2010 versions of Lexi-Drugs®. Each of these text fields integrates ADE information from drug package inserts, relevant clinical trials, case studies and post-marketing reports. We extracted the ADE names contained in each field and mapped them to the MedDRA Preferred Term (PT) level. Each PT was further mapped to one or more High-Level Terms (HLTs) in the MedDRA hierarchy [21]. As an example, the PT "myocardial infarction" was mapped to the following two HLTs: "ischaemic coronary artery disorders" and "coronary necrosis and vascular insufficiency". Since several different PTs typically map to the same HLT, this mapping compresses the space of possible drug-ADE pairs, thereby reducing the computational time needed to train and validate the prediction model. The main trade-off of such a mapping is that more complex follow-up investigations would be needed to evaluate a predicted drug-HLT pair, because only a subset of the PTs mapping to the predicted HLT may actually be associated with the drug. After these pre-processing steps, the final Lexi-comp data used in our study consisted of two lists of pairs of the form (drug name, HLT name): one list corresponding to 2005 and another to 2010.
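The PT-to-HLT compression described above can be illustrated with a minimal sketch. The lookup table, the drug name and the function name below are hypothetical placeholders; in practice the mapping would come from the MedDRA hierarchy itself.

```python
# Hypothetical excerpt of a PT -> HLT lookup (illustration only; the real
# mapping is defined by the MedDRA hierarchy).
PT_TO_HLTS = {
    "myocardial infarction": [
        "ischaemic coronary artery disorders",
        "coronary necrosis and vascular insufficiency",
    ],
    "angina pectoris": ["ischaemic coronary artery disorders"],
}


def to_hlt_pairs(drug_pt_pairs, pt_to_hlts=PT_TO_HLTS):
    """Expand (drug, PT) pairs into a de-duplicated set of (drug, HLT) pairs."""
    hlt_pairs = set()
    for drug, pt in drug_pt_pairs:
        # PTs without an entry in the lookup are silently skipped in this sketch.
        for hlt in pt_to_hlts.get(pt.lower(), []):
            hlt_pairs.add((drug, hlt))
    return hlt_pairs


# Two PTs for the same (placeholder) drug collapse onto two distinct HLTs.
pairs = to_hlt_pairs([
    ("drug_a", "Myocardial infarction"),
    ("drug_a", "Angina pectoris"),
])
```

Because both PTs share the HLT "ischaemic coronary artery disorders", the three PT-level links reduce to two (drug, HLT) pairs, which is the compression effect noted in the text.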
The SIDER data used here are from the most recent version of the publicly available SIDER2 database, released in October 2012. We downloaded these data from the FTP site ftp://sideeffects.embl.de/SIDER/2012-10-17/. Although earlier versions of SIDER going back to 2009 are available, SIDER2 is the first version of the database providing MedDRA-coded ADEs, making it suitable for this comparative study. SIDER integrates ADE information contained in package inserts and post-marketing reports using five main public sources, including the British Columbia Cancer Agency (www.bccancer.bc.ca), Facts@FDA (http://www.fda.gov), the FDA Center for Drug Evaluation and Research (http://www.fda.gov), and FDA MedWatch (http://www.fda.gov/) [32].
To enable the comparison of Lexi-comp and SIDER data, we standardized the drug names in both data sets using the NPC resource [31], and mapped SIDER ADE names from the Preferred Term level to the High-Level Term level of MedDRA. The final pre-processed SIDER data used in our study consisted of a list of pairs of the form (drug name, HLT name) corresponding to year 2012.

Overview of PPNs
Predictive Pharmacosafety Networks (PPN) [21] are predictive models that exploit the overall network structure of all known drug-ADE relationships and combine it with inherent attributes of drugs and adverse events in order to predict unknown adverse events. Rather than waiting for sufficient post-marketing evidence to accumulate for a specific ADE, this predictive approach relies on leveraging contextual information from previously known drug-safety relationships, and thus has the potential to predict certain candidate ADEs earlier than they can be detected by existing pharmacovigilance methods.
Here, we provide a brief overview of the PPN model; complete details of the model, including a full specification of the data sets used to train the model, are given in Cami et al. [21]. Using logistic regression, we model the presence or absence of each drug-ADE association Y_ij (i = 1, …, number of drugs; j = 1, …, number of ADEs) as a Bernoulli random variable whose probability is a function of three types of covariates. Network covariates depend only on the structure of the bipartite drug-ADE network. Taxonomic covariates depend on the structure of the drug-ADE network and on ATC and MedDRA codes. Intrinsic covariates depend on the structure of the drug-ADE network and on the intrinsic drug properties. Model fitting is carried out by maximum likelihood. After the model is estimated, each drug-ADE pair (i, j) not reported to be an association in the training data is scored using the predicted probabilities generated by the model.
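For readers who prefer symbols, the logistic regression described above can be written in the standard form below. The notation is an assumption introduced here for illustration and is not taken from Cami et al. [21]: p_ij denotes the probability of an association between drug i and ADE j, x_ijk the k-th covariate (network, taxonomic or intrinsic) for that pair, and K the number of covariates.

```latex
\[
Y_{ij} \sim \mathrm{Bernoulli}(p_{ij}), \qquad
\log\frac{p_{ij}}{1 - p_{ij}} = \beta_0 + \sum_{k=1}^{K} \beta_k \, x_{ijk},
\]
\[
\hat{p}_{ij} = \frac{1}{1 + \exp\!\left(-\hat{\beta}_0 - \sum_{k=1}^{K} \hat{\beta}_k \, x_{ijk}\right)} .
\]
```

The first line is the model that is fitted by maximum likelihood; the second gives the predicted probability used to score each pair (i, j) that is absent from the training data.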
Cami et al. [21] used the Lexi-comp 2005 ADE data to form a bipartite network that contained 39,591 links among 809 drugs and 852 HLTs. The drug and ADE attributes described above were integrated with the nodes of this network. Twelve predictor variables were then computed and a logistic regression (LR) model was estimated. This estimated LR model achieved an area under the Receiver Operating Characteristic curve (AUROC) of 0.87 in predicting the 10,845 drug-ADE associations that were newly reported in the 2010 version of Lexi-comp.
As case studies, eight prominent drug-ADE associations discovered during the period 2006-2010 were identified by two pharmacologists. For each case study, the specificity and the positive predictive value corresponding to the score generated by PPN were computed. The specificities corresponding to the model-generated scores were consistently high, providing additional support for the utility of the model. For example, the pair (norfloxacin, tendon ruptures) achieved a specificity of 0.95, while the pair (zonisamide, suicidal ideation) achieved a specificity of 0.93.
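One way to read "the specificity and the positive predictive value corresponding to a score" is to treat the score of the case-study pair as a decision threshold and classify every candidate pair at or above it as a predicted association. The sketch below follows that reading; it is an assumption about the procedure, and the function name and toy data are hypothetical.

```python
def specificity_and_ppv(scores, labels, threshold):
    """Specificity and PPV when pairs scoring >= threshold are called positive.

    scores: predicted probabilities for candidate drug-ADE pairs.
    labels: 1 if the pair is a newly reported association, else 0.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return specificity, ppv


# Toy example: the case-study pair has score 0.6, which is used as threshold.
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
toy_labels = [1, 0, 1, 0, 0, 0]
spec, ppv = specificity_and_ppv(toy_scores, toy_labels, threshold=0.6)
```

In this toy example three of the four true negatives fall below the threshold and two of the three predicted positives are true associations.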

Comparative analysis
Since the Lexi-comp data available to us consisted of two snapshots from 2005 and 2010, ideally the comparison of Lexi-comp and SIDER would have been based on two SIDER snapshots from 2005 and 2010. However, the MedDRA-coded SIDER data available to us consisted of only one snapshot from 2012. Due to this restriction, we designed the comparative analysis as follows. We first identified the Lexi-comp 2010 drugs and ADEs from the study by Cami et al. [21] that were also included in SIDER 2012. Our goal was to assess the concordance of the sets of associations newly reported between these common drugs and ADEs in Lexi-comp 2010 and SIDER 2012, as well as the impact of data set choice on the prediction performance of PPN. In this analysis, we did not address any discordance between the sets of drug-ADE associations formed by drugs or ADEs that were included in only one of the two data sets.
In the first part of the comparative analysis, we assessed the strength of the association between ADE (or, drug) counts in Lexi-comp 2010 and SIDER 2012 by computing Spearman rank correlation. Next, we computed the difference between the Lexi-comp 2010 and SIDER 2012 data sets and characterized it in terms of drug categories, ADE categories and ADE frequencies.
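The correlation step can be sketched as follows: count the ADEs linked to each common drug in each data set, then compute the Spearman rank correlation of the two count vectors. This is a minimal pure-Python illustration on toy data, not the code used in the study; all names are hypothetical, and the confidence intervals reported in the Results are not reproduced here.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def ade_counts_per_drug(pairs, drugs):
    """Number of (drug, HLT) pairs per drug, in the order given by `drugs`."""
    counts = {drug: 0 for drug in drugs}
    for drug, _hlt in pairs:
        if drug in counts:
            counts[drug] += 1
    return [counts[drug] for drug in drugs]


# Toy data: SIDER lists one more ADE than Lexi-comp for every drug.
drugs = ["d1", "d2", "d3", "d4"]
lexi = {("d1", "h1"), ("d2", "h1"), ("d2", "h2"),
        ("d3", "h1"), ("d3", "h2"), ("d3", "h3")}
sider = lexi | {("d1", "h2"), ("d2", "h3"), ("d3", "h4"), ("d4", "h1")}
rho = spearman(ade_counts_per_drug(lexi, drugs), ade_counts_per_drug(sider, drugs))
```

The toy example also shows why a rank correlation can be high even when one data set reports systematically more associations: the counts differ, but the ordering of drugs is preserved. Drug counts per ADE are obtained the same way by swapping the roles of the two elements of each pair.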
As described earlier, both Lexi-comp and SIDER use package inserts as the primary source of information. However, we expected that these data could differ for a number of reasons. First, SIDER 2012 may include new drug-ADE associations that were discovered after 2010 and thus could not be reported in the Lexi-comp data sets. Second, both Lexi-comp and SIDER supplement the package inserts with other information extracted from various sources, as described earlier. Third, the mapping of ADE names to MedDRA, which was independently implemented in the two data sets, is a non-deterministic process. Numerous ADE names that appear in the original data sources have no exact match in MedDRA, and for these the most appropriate MedDRA code is determined either algorithmically or based on expert opinion.
In the second part of the comparative analysis, we trained a PPN model using Lexi-comp 2005 and then assessed the prediction performance of the model using Lexi-comp 2010 and SIDER 2012, respectively, as validation sets.
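The comparative validation can be sketched as scoring every candidate pair once and then computing the AUROC twice, labeling the pairs against each reference set in turn. The sketch below uses the rank-sum (Mann-Whitney) identity for the AUROC; the scores, pair names and the definition of relative change (absolute difference divided by the Lexi-comp value) are illustrative assumptions.

```python
def auroc(scores, labels):
    """AUROC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores for pairs absent from the training snapshot.
candidate_scores = {
    ("d1", "h2"): 0.9,
    ("d1", "h3"): 0.2,
    ("d2", "h1"): 0.7,
    ("d2", "h3"): 0.4,
}
# Hypothetical newly reported associations in each reference set.
new_in_lexi_2010 = {("d1", "h2")}
new_in_sider_2012 = {("d1", "h2"), ("d2", "h3")}

pairs = sorted(candidate_scores)
scores = [candidate_scores[p] for p in pairs]
auc_lexi = auroc(scores, [int(p in new_in_lexi_2010) for p in pairs])
auc_sider = auroc(scores, [int(p in new_in_sider_2012) for p in pairs])
relative_change = abs(auc_sider - auc_lexi) / auc_lexi
```

The same scores are reused for both evaluations, so any change in AUROC reflects only the change of reference standard.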
In the Discussion section, we explain how the data restriction mentioned earlier, i.e. the availability of only one SIDER snapshot taken at a different time point from either Lexi-comp snapshot, impacts the interpretation of our results.

Common drugs and ADEs
Out of 809 Lexi-comp drugs used in the study by Cami et al. [21], 695 (86%) were also included in SIDER; out of 852 HLTs used in [21], 765 (90%) were also included in SIDER. In the remainder of this section, we compare the drug-ADE associations reported in Lexi-comp 2010 and SIDER 2012 between these common 695 drugs and 765 HLTs.
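Restricting both data sets to their common drugs and HLTs amounts to two set intersections followed by a filter. A minimal sketch on toy data is given below; it assumes drug and HLT names have already been standardized, and the function name is hypothetical.

```python
def restrict_to_common(pairs_a, pairs_b):
    """Keep only pairs whose drug and HLT both occur in the two data sets."""
    common_drugs = {d for d, _ in pairs_a} & {d for d, _ in pairs_b}
    common_hlts = {h for _, h in pairs_a} & {h for _, h in pairs_b}

    def keep(pairs):
        return {(d, h) for d, h in pairs if d in common_drugs and h in common_hlts}

    return common_drugs, common_hlts, keep(pairs_a), keep(pairs_b)


# Toy (drug, HLT) pairs standing in for the two data sets.
lexi_pairs = {("d1", "h1"), ("d2", "h2"), ("d3", "h1")}
sider_pairs = {("d1", "h1"), ("d1", "h2"), ("d4", "h3")}

drugs, hlts, lexi_common, sider_common = restrict_to_common(lexi_pairs, sider_pairs)
sider_only = sider_common - lexi_common  # associations reported only in SIDER
```

Pairs involving a drug or HLT that appears in only one data set (here d2, d3, d4 and h3) are dropped before any comparison, mirroring the restriction stated above.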

Correlation between drug and ADE counts
The Spearman correlation between the ADE counts in the Lexi-comp 2010 and SIDER 2012 data sets was 0.84 (95% CI: 0.82-0.86) (Figure 2(A)), while the Spearman correlation between the drug counts in these two data sets was 0.92 (95% CI: 0.91-0.93) (Figure 2(B)). For comparison, we also computed the Spearman correlations of ADE and drug counts between the Lexi-comp 2005 and Lexi-comp 2010 data sets. For these two data sets, the correlations were, respectively, 0.87 and 0.99. Figure 2 indicates that drug and ADE counts generated from SIDER 2012 are generally higher than the corresponding counts from Lexi-comp 2010. In aggregate, we found that relative to the Lexi-comp 2005 data snapshot, Lexi-comp 2010 and SIDER 2012 introduced new drug-ADE associations at a mean rate of 1,973 and 4,810 per year, respectively. To better understand this difference between Lexi-comp 2010 and SIDER 2012, we computed the percentage of SIDER-only associations in each ATC top-level category, each MedDRA top-level category, and each ADE frequency group ("postmarketing", "frequent", "infrequent", "rare", "potential", or exact number).
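The two summary quantities used here, the mean number of new associations per year and the breakdown of SIDER-only associations by category, can be sketched as below. Two assumptions are made for illustration: the yearly rate divides the number of new associations by the years elapsed since the 2005 snapshot (5 for Lexi-comp 2010, 7 for SIDER 2012), and the percentage for a category is its share of all SIDER-only associations. The data and the drug-to-category lookup are toy placeholders.

```python
from collections import Counter


def new_per_year(baseline_pairs, later_pairs, years):
    """Mean number of associations per year that are absent from the baseline."""
    return len(later_pairs - baseline_pairs) / years


def share_by_category(pairs, category_of_drug):
    """Percentage of the given pairs that falls in each drug category."""
    counts = Counter(category_of_drug[drug] for drug, _hlt in pairs)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Toy snapshots of (drug, HLT) pairs.
lexi_2005 = {("d1", "h1")}
lexi_2010 = lexi_2005 | {("d1", "h2"), ("d2", "h1")}
sider_2012 = lexi_2010 | {("d1", "h3"), ("d2", "h2"), ("d3", "h1"), ("d3", "h2")}

rate_lexi = new_per_year(lexi_2005, lexi_2010, years=5)
rate_sider = new_per_year(lexi_2005, sider_2012, years=7)

# Hypothetical ATC top-level codes: N = nervous system, J = anti-infectives.
atc_top_level = {"d1": "N", "d2": "J", "d3": "N"}
sider_only_share = share_by_category(sider_2012 - lexi_2010, atc_top_level)
```

The same `share_by_category` helper applies to MedDRA top-level categories or frequency groups by swapping in a lookup keyed on the ADE or on the pair instead of the drug.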

Differences between Lexi-comp and SIDER data sets
We found that the drug categories with the highest percentages of SIDER-only ADEs were drugs targeting the Nervous System and Anti-infective drugs (21.9% and 11.4%, respectively, Table 1), while the ADE categories with the highest percentages of SIDER-only drugs were Nervous System Disorders and Gastrointestinal Disorders (8.9% and 8.7%, respectively, Table 2). With regard to ADE frequency classes, we found that the frequency of the ADE was missing for 63% of SIDER-only associations. Of the remaining SIDER-only associations, the type having the highest percentage was "postmarketing" (46%), followed by "infrequent" (16%), "exact number" (16%), "rare" (11%), "potential" (8%), and "frequent" (3%).
When SIDER 2012 was used for validation instead of Lexi-comp 2010, the AUROC of PPN changed by approximately 1.1% (Table 3). In fact, the relative change in AUROC was less than 5% for all but six MedDRA System Organ Classes (SOCs), which are rather general and not specifically related to a body organ or system: 1) Congenital, familial and genetic disorders (relative change 34%), 2) Surgical and medical procedures (relative change 20%), 3) General disorders (relative change 12%), 4) Injury, poisoning and procedural disorders (relative change 7%), 5) Investigations (relative change 7%), 6) Pregnancy, puerperium and perinatal conditions (relative change 5.1%).

Discussion
This study aimed to systematically assess the concordance between the Lexi-comp and SIDER ADE data sets, as well as the impact of the choice of data set on the prediction performance of the PPN model. Our main result was that ADE and drug counts in the Lexi-comp 2010 and SIDER 2012 data sets were highly correlated and that the AUROC of the PPN model changed very little (approximately 1.1%) when SIDER 2012 was used for validation instead of Lexi-comp 2010. While we found overall concordance, there were also differences between the Lexi-comp 2010 and SIDER 2012 data sets. These differences were most pronounced for Nervous System and Anti-infective drugs, for Gastrointestinal and Nervous System ADEs, and for "postmarketing" ADEs. Our results suggest that the differences between the two data sets do not simply arise from the two-year time lag between them. Indeed, the correlations of drug and ADE counts were higher between the two Lexi-comp snapshots (separated by five years) than they were between Lexi-comp 2010 and SIDER 2012. Further, relative to Lexi-comp 2005, SIDER 2012 introduces new associations at a higher rate than Lexi-comp 2010. As discussed earlier, other factors that could have introduced these differences include the use of various sources to supplement package-insert information and the independent mapping of ADE names to MedDRA.
The observations of high overall concordance between Lexi-comp and SIDER, and high robustness of the PPN model under two different validation sets, are not affected by the time lag between the two data sets. In fact, the presence of the time lag makes the concordance and robustness conclusions even stronger than they would be if the same results were obtained by comparing data from the same year. On the other hand, the differences between the two data sets should be interpreted with caution as it is not clear to what extent they are accounted for by the time lag and to what extent by other factors. The interpretation of these differences is also hindered by the high proportion of missing ADE frequency data (63%).
While this manuscript was under preparation, Lin et al. [27] published a new study in which they developed an "external link prediction" method for unknown drug-ADE associations. Using two snapshots of data based on the intersection of SIDER with FAERS 2005 and FAERS 2011, respectively, they carried out a simulated prospective validation of a subset of PPN covariates analogous to the validation by Cami et al. [21]. The training set in the study by Lin et al. consisted of 422 drugs and 462 ADEs. These authors found that in that data set, the chosen subset of PPN covariates achieved an AUROC of 0.75, while the "external link prediction" method achieved an AUROC of 0.83. Thus, the study by Lin et al., which used a different ADE data source, a different validation year and different drug and ADE sets, provides an independent confirmation of the robustness of PPN variables with respect to the choice of ADE data set. Recently, Tatonetti et al. [24] published a method to extract potentially significant drug-ADE associations from FAERS and a new accompanying data set of such associations (OFFSIDES). Similarly, Cheng et al. [25] developed a new drug-ADE data set named MetaADEDB by integrating information from SIDER, CTD (ctdbase.org), and OFFSIDES, and utilizing Medical Subject Headings (MeSH) to annotate compounds and diseases. We believe that these data integration, standardization and annotation efforts are important steps toward the development of improved reference standards for drug-ADE associations.

Conclusions
In summary, we have conducted a study that systematically compared two drug safety data sets and assessed the impact of data set choice on the prediction performance of the PPN predictive model. Overall, we found a high concordance between the two data sets and only a minor impact on the prediction performance of PPN. However, we also identified a number of key differences between the two data sets. We believe it is important for researchers, drug safety professionals and public health officials to be aware of such differences, especially in modeling applications aimed at specific drug and ADE categories and, more broadly, in studies that develop or validate ADE prediction models.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
AC and BR acquired the data, developed the proposed methods and designed the study. AC implemented the methods and performed the experiments. AC and BR wrote the manuscript. Both authors read and approved the final manuscript.