Materials and data sources
We applied the proposed strategy for detecting high-order directional DDI effects on ADEs to a publicly available database, the FDA Adverse Event Reporting System (FAERS: https://open.fda.gov/data/faers/). Specifically, we applied our method to the myopathy event, using ADE reporting records from FAERS to investigate the directional effects of high-order DDIs on myopathy.
Myopathy is a relatively frequent (around 3.64% in our dataset) and clinically important ADE, and has been listed as a side effect of more than 80 FDA approved drugs. Given its high frequency and its close and complex associations with drugs, myopathy is an appropriate testbed for evaluating how well our method detects directional effects of high-order DDIs. Below, we describe the data preprocessing and present summary statistics of the FAERS dataset used in this study.
FAERS database
FAERS is a database that contains information on adverse event and medication error reports submitted to the FDA. The data used for this analysis included FAERS reports collected between Q1 2004 and Q3 2012. Reports were obtained from the FAERS database and preprocessed as described in [7]. Briefly, the most recent report from each individual was extracted, and the reports were organized as a list of records, where each record consisted of an ADE and the corresponding administered drugs.
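The "most recent report per individual" step can be illustrated with a minimal sketch. The record layout here (dict keys `case_id`, `date`, `ade`, `drugs`, and ISO-formatted date strings) is hypothetical and chosen only for illustration; the actual preprocessing is the one described in [7].

```python
def latest_reports(reports):
    """Keep only the most recent report for each individual.

    `reports` is an iterable of dicts with hypothetical keys:
    'case_id' (individual identifier), 'date' (ISO string, so string
    comparison orders dates), 'ade', and 'drugs'.
    """
    latest = {}
    for rep in reports:
        current = latest.get(rep["case_id"])
        # Replace the stored report if this one is newer.
        if current is None or rep["date"] > current["date"]:
            latest[rep["case_id"]] = rep
    return list(latest.values())


example = [
    {"case_id": 1, "date": "2010-03-01", "ade": "nausea", "drugs": ["A"]},
    {"case_id": 1, "date": "2011-07-15", "ade": "myopathy", "drugs": ["A", "B"]},
    {"case_id": 2, "date": "2009-01-20", "ade": "headache", "drugs": ["C"]},
]
kept = latest_reports(example)
```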
Myopathy-related case-control dataset
As this analysis focused on the myopathy-related ADE, we first derived the ADEs grouped under “myopathy”. We then assembled a case-control dataset by labeling a record as a “case” if its ADE was in the “myopathy” group, and as a “control” otherwise. To avoid confusing causal effects with bystander effects, we included only drugs reported as primary or secondary suspects, and removed drugs reported as concomitant or interacting. We use T to denote the set of all records from the FAERS database, and Tm and Tnm to denote the sets of case and control records, respectively. In total, |T|=4,077,447 records were analyzed, including |Tm|=136,860 cases and |Tnm|=3,940,587 controls, covering 1,763 unique FDA approved drugs (see Fig. 1a-b).
The number of drugs in a single record ranges from 1 to 103, with a mean of 2.98 drugs per record in FAERS. However, the number of drugs taken differs significantly between the two groups (independent t-test p-value <2.2E-16), with a mean of 4.18 drugs in myopathy cases and 2.94 drugs in non-myopathy controls. Overall, 25.82% of individuals in the FAERS dataset took four or more drugs together; this proportion is 36.27% in myopathy cases and 25.45% in non-myopathy controls. Among these records with more than three drugs, a significant difference is also observed between the two groups (independent t-test p-value <2.2E-16), with a mean of 8.79 drugs in the myopathy group and 7.30 drugs in the non-myopathy group.
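The labeling and filtering rules above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the role codes `"PS"`, `"SS"`, `"C"` and `"I"` stand for primary suspect, secondary suspect, concomitant and interacting, the input layout is hypothetical, and dropping records left with no suspect drug is our assumption.

```python
# Assumed role codes: PS = primary suspect, SS = secondary suspect.
SUSPECT_ROLES = {"PS", "SS"}


def build_case_control(reports, myopathy_terms):
    """Split records into cases (Tm) and controls (Tnm).

    `reports` is an iterable of (ade_term, [(drug, role), ...]) pairs
    (a hypothetical layout). Each returned record is a frozenset of
    suspect drugs.
    """
    cases, controls = [], []
    for ade, drugs in reports:
        # Keep only primary/secondary suspect drugs.
        kept = frozenset(d for d, role in drugs if role in SUSPECT_ROLES)
        if not kept:
            continue  # assumption: skip records with no suspect drug
        if ade in myopathy_terms:
            cases.append(kept)
        else:
            controls.append(kept)
    return cases, controls


reports = [
    ("rhabdomyolysis", [("simvastatin", "PS"), ("aspirin", "C")]),
    ("nausea", [("metformin", "PS"), ("lisinopril", "SS")]),
    ("headache", [("ibuprofen", "C")]),
]
cases, controls = build_case_control(reports, {"rhabdomyolysis", "myopathy"})
```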
Methods for mining high-order directional DDI effects
We use DC to denote a drug combination, and sup(DC,T) to represent the support (i.e., count of occurrences) of DC in dataset T. To evaluate the risk of developing myopathy when drugs are added to an existing drug combination, for example, taking DC2 = (Di+1,...,Dn) in addition to DC1 = (D1,...,Di), we formulate the problem as follows: 1) the baseline population is defined as those who take DC1 = (D1,...,Di), regardless of whether they take other drugs; 2) the exposed population is defined as those who take DC2 in addition to DC1, i.e., those who take DC3, where DC3 = DC1∪DC2; and 3) the unexposed population is defined as those who take DC1 but do not take at least one drug from DC2. See Fig. 1e-f for a schematic example.
We then employ the odds ratio (OR) to measure the directional DDI effect of adding DC2 to an existing DC1, framing the DDI effect problem as mining the association between the myopathy event and exposure to the drug combination. In practice, the OR compares the odds of exposure to DC2 among cases with the odds of exposure to DC2 among controls, within the baseline population who all take DC1. Accordingly, for a given drug combination of interest, we need to count the exposed and unexposed individuals in both cases and controls before calculating the OR.
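The three population definitions translate directly into subset tests. The sketch below assumes each record is represented as a set of drug names; the function name is ours, not the paper's.

```python
def split_populations(records, dc1, dc2):
    """Return (baseline, exposed, unexposed) record lists.

    baseline : records containing every drug in DC1
    exposed  : baseline records that also contain every drug in DC2
               (i.e., they contain DC3 = DC1 ∪ DC2)
    unexposed: baseline records missing at least one drug of DC2
    """
    dc1, dc2 = frozenset(dc1), frozenset(dc2)
    dc3 = dc1 | dc2
    baseline = [r for r in records if dc1 <= r]
    exposed = [r for r in baseline if dc3 <= r]
    unexposed = [r for r in baseline if not (dc2 <= r)]
    return baseline, exposed, unexposed


records = [frozenset("AB"), frozenset("ABC"), frozenset("ABCD"),
           frozenset("AC"), frozenset("ABD")]
baseline, exposed, unexposed = split_populations(records, "AB", "CD")
```

By construction the exposed and unexposed sets partition the baseline, which is what allows the unexposed counts to be obtained later by subtraction.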
The following sections present the framework in order. First, we describe the algorithm for constructing candidate drug combinations. Second, we present the algorithm for extracting the supports of drug combinations in the case and control datasets. We then discuss the calculation of the OR for estimating the directional effect of drug combinations. Finally, we present the novel and scalable tool we developed for visualizing high-order DDIs. Figure 1 shows the workflow of this study.
Construct candidate drug combinations from T
We first created a set of drug combinations with their supports from our FAERS dataset T. To avoid possibly misleading results from low-frequency drug combinations, we restricted our analysis to DCs with a minimum support of MinSup=250 records in T, which we call candidate drug combinations. Algorithm 1 summarizes the procedure for constructing candidate drug combinations from T (see Fig. 1c).
Briefly, we applied Apriori, an influential algorithm for mining frequent itemsets, to T to discover frequent DCs with sup(DC,T)>MinSup that involved up to seven drugs. Apriori was used in our previous work [7] for mining frequent drug combinations from both T and Tm, using MinSup=1000 and MinSup=1, respectively. However, due to time and space complexity, our previous strategy could not generate drug combinations containing more than three drugs. In this work, instead of applying Apriori to both T and Tm, we applied it only to T to generate candidate DCs.
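For readers unfamiliar with Apriori, the level-wise idea (grow combinations one drug at a time, and keep a k-drug combination only if all of its (k-1)-drug subsets are frequent) can be sketched as below. This is a compact, unoptimized illustration and not Algorithm 1 itself. It assumes each record is a set of drug names and treats the threshold as inclusive (support ≥ `min_sup`), which is an assumption, since the text writes the condition as a strict inequality.

```python
from collections import Counter
from itertools import combinations


def apriori(records, min_sup, max_size=7):
    """Return {frozenset(drugs): support} for frequent combinations."""
    # Level 1: frequent single drugs.
    counts = Counter(d for r in records for d in r)
    frequent = {frozenset([d]): c for d, c in counts.items() if c >= min_sup}
    result = dict(frequent)
    k = 2
    while frequent and k <= max_size:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-sets that have exactly k drugs.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count support of the surviving candidates.
        counts = Counter()
        for r in records:
            for c in candidates:
                if c <= r:
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_sup}
        result.update(frequent)
        k += 1
    return result


toy = [frozenset("ABC"), frozenset("AB"), frozenset("AC"),
       frozenset("BC"), frozenset("ABC")]
found = apriori(toy, min_sup=3)
```

In the toy data every pair appears three times and is kept, while the triple appears only twice and is discarded.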
Computing supports for case records and control records
For each candidate drug combination obtained above, we extract its counts of occurrence in both cases and controls, in order to construct the contingency table for OR estimation. As mentioned before, the computational time and space required to run Apriori on Tm with MinSup=1 limited our previous work to combinations of up to three drugs (see gray part in Fig. 1). In this work, we develop a more efficient strategy that calculates supports only for the candidate DCs, instead of mining all possible drug combinations that appear in Tm. Algorithm 2 describes how to extract the case and control supports from Tm and Tnm, respectively (see Fig. 1d).
Estimating directional DDI effects
We organize the results from Algorithms 1 and 2 and construct a table of drug combinations (DC); see Fig. 1e. Each record in the table stores the counts of the corresponding DC in the entire studied FAERS set, the case subset and the control subset, respectively. Based on this information, a contingency table is constructed for each candidate DC and then used for OR calculation, where the four counts a, b, c and d can be calculated as shown in Fig. 1f.
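One way to count supports only for a fixed list of candidates is an inverted index from each drug to the records containing it, so that a combination's support is the size of an intersection. The sketch below illustrates that idea under our own assumptions; it is not Algorithm 2, whose details are not reproduced in this section, and all function names are hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """Map each drug to the set of record ids that contain it."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        for drug in rec:
            index[drug].add(rid)
    return index


def support(dc, index):
    """Number of records containing every drug in `dc`."""
    # Intersect the smallest posting lists first.
    postings = sorted((index.get(d, set()) for d in dc), key=len)
    if not postings:
        return 0
    return len(set.intersection(*postings))


def case_control_supports(candidates, cases, controls):
    """Return {dc: (support in Tm, support in Tnm)} for candidate DCs."""
    case_idx, ctrl_idx = build_index(cases), build_index(controls)
    return {dc: (support(dc, case_idx), support(dc, ctrl_idx))
            for dc in candidates}


cases = [frozenset("AB"), frozenset("ABC")]
controls = [frozenset("AB"), frozenset("A"), frozenset("BC")]
cands = [frozenset("AB"), frozenset("BC"), frozenset("AD")]
sups = case_control_supports(cands, cases, controls)
```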
Figure 1e-g shows our procedure for estimating the directional effect of adding DC2 to DC1 on myopathy, including contingency table construction and OR calculation. The baseline, exposed and unexposed populations are the sets of individuals who take DC1, who take DC3, and who take DC1 but not at least one drug in DC2, respectively. The numbers of exposed individuals with and without myopathy can be extracted directly from Fig. 1e: a=y3 exposed individuals with myopathy and b=x3−y3 exposed individuals without myopathy. The numbers of unexposed individuals with and without myopathy (i.e., c and d) can then be obtained as the difference between the baseline and exposed populations. That is, unexposed individuals with myopathy are those in the baseline population who are not in the exposed population. Based on Fig. 1e, given y1 individuals with myopathy in the baseline population, the number of unexposed individuals with myopathy is c=y1−a=y1−y3. Similarly, the number of unexposed individuals without myopathy is d=(x1−y1)−b=(x1−y1)−(x3−y3).
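The four counts follow mechanically from the supports. The sketch below uses the symbols from the text (x1, y1 for the total and case supports of DC1; x3, y3 for those of DC3); the numeric values in the example are made up for illustration.

```python
def contingency(x1, y1, x3, y3):
    """Build (a, b, c, d) from total and case supports of DC1 and DC3.

    x1, y1: support of DC1 in T and in Tm (baseline population)
    x3, y3: support of DC3 in T and in Tm (exposed population)
    """
    a = y3                      # exposed, myopathy
    b = x3 - y3                 # exposed, no myopathy
    c = y1 - y3                 # unexposed, myopathy
    d = (x1 - y1) - (x3 - y3)   # unexposed, no myopathy
    if min(a, b, c, d) < 0:
        raise ValueError("inconsistent supports: DC3 must be nested in DC1")
    return a, b, c, d


# Illustrative numbers only.
a, b, c, d = contingency(x1=1000, y1=100, x3=300, y3=60)
```

The four cells always sum to x1, the size of the baseline population, which is a convenient sanity check.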
With the above calculation, the OR estimation of directional effect of DC1 to DC3 on myopathy can be computed as follows:
$$ OR_{DC_{1}\rightarrow DC_{3}}=\frac{a/b}{c/d}=\frac{ad}{bc}. $$
(1)
Here ORs of the ADE for adding one to seven drugs are examined in this study.
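As a small worked illustration, the ratio (a/b)/(c/d) simplifies to ad/(bc). The counts below are invented for the example, and raising an error on an empty cell is our choice rather than something the text specifies.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a/b)/(c/d) = (a*d)/(b*c) for a 2x2 table."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Illustrative counts: 60/240 exposed vs 40/660 unexposed.
example_or = odds_ratio(60, 240, 40, 660)
```

An OR above 1 indicates that, within the baseline population taking DC1, adding DC2 is associated with higher odds of myopathy.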
A chi-square test is used in this work to evaluate the significance of the association between each drug combination and the myopathy ADE, where a p-value and a confidence interval corresponding to each odds ratio are obtained. Multiple comparison correction is then performed using the Bonferroni strategy.
Sunburst visualization for directional DDI findings
Another important aspect of DDI mining is visualization. In our previous work, we proposed a tree structure to visualize directional DDIs involving up to three drugs. However, the tree grew exponentially, making it infeasible to read for combinations involving four or more drugs. In this paper, we develop a novel tool to organize and visualize high-order directional DDIs using a D3 sunburst diagram (https://d3js.org/).
Specifically, given a candidate drug combination S and the set C of all subsets of S, we organize the pairwise relationships among the elements of C and arrange them into a series of concentric rings in a hierarchical manner, as shown in Fig. 1h. Each ring sector represents a drug combination, and the outer ring sectors radiating from it indicate the directional DDIs from inner to outer. The sector color indicates the effect size (i.e., OR value). In addition, we include a zooming function to enable more effective visualization via interactive exploration, where one can select a drug combination as the baseline to (1) zoom in and see the details or (2) zoom out and see an overall picture.
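The testing step might look like the following sketch. Several choices here are assumptions, since the text does not specify them: the Pearson chi-square statistic is computed without continuity correction, the confidence interval uses the standard log-OR (Woolf) approximation, and the function names are ours. The p-value uses the identity that, for one degree of freedom, the chi-square survival function equals erfc(sqrt(x/2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) and p-value."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def or_confidence_interval(a, b, c, d, z=1.959964):
    """Approximate 95% CI for the OR on the log scale (assumed method)."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Illustrative counts only.
chi2, p = chi_square_2x2(60, 240, 40, 660)
low, high = or_confidence_interval(60, 240, 40, 660)
p_adj = bonferroni(p, n_tests=1000)
```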
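A sunburst is typically fed a nested hierarchy, so the arrangement described above can be illustrated by building one. The sketch below is a simplification under our own assumptions: each ring adds exactly one drug to its parent sector, nodes use `name` and `children` keys (the default shape read by `d3.hierarchy`), and `or_lookup` is a hypothetical dictionary keyed by (baseline, exposed) pairs of frozensets. It is not the authors' tool.

```python
import json


def sunburst_tree(drugs, or_lookup, prefix=()):
    """Nested dict in which each node is a drug combination and each
    child adds one more drug (an inner-to-outer directional step)."""
    children = []
    for drug in drugs:
        if drug in prefix:
            continue
        combo = prefix + (drug,)
        child = sunburst_tree(drugs, or_lookup, combo)
        # OR of moving from the parent combination to the child, if known.
        child["or"] = or_lookup.get((frozenset(prefix), frozenset(combo)))
        children.append(child)
    node = {"name": "+".join(sorted(prefix)) or "root"}
    if children:
        node["children"] = children
    return node


# Illustrative OR value only.
lookup = {(frozenset("A"), frozenset("AB")): 2.0}
tree = sunburst_tree(("A", "B", "C"), lookup)
as_json = json.dumps(tree)  # can be handed to a D3 front end
```

Because every ordering of the drugs yields its own path, this naive tree has n! leaves for n drugs, so a practical tool would likely restrict it to the candidate combinations and directions of interest.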