#### Dual-tree complex wavelet decomposition

For a signal expressed as a function of time, *t*, the wavelet transform is described by the following basis set:

Here, *S* gives the wavelet’s width and *l* gives its position. The ‘mother function’, *Φ*, is a decaying wave-like function that is scaled and shifted to form the basis, subject to the constraint that all members of the set are orthonormal, which yields a linearly independent set of functions. In the Discrete Wavelet Transform (DWT), the scaling function, defined as follows, plays a central role in forming the basis.

where the *C*_{k} are the wavelet coefficients, and *k* and *M* stand for time shift and signal length, respectively. Traditional DWT suffers from shift variance: a small shift of the input signal can substantially change the coefficients. Notably, each subject contributes multiple signal segments (one for each shock). Shift variance can yield spurious features that have false correlations with outcomes. As a result, the predictive model generalizes poorly or, put another way, is not discriminative. Under certain conditions, complex wavelet decomposition can be approximately shift-invariant without a considerable increase in computational complexity for low-dimensional signals (in our case, one-dimensional). Here, the mother function and the scaling function each have a real and an imaginary component.

Specifically, when *Φ*_{r} and *Φ*_{i} form a Hilbert transform pair, the decomposition coefficients approach the desired shift-invariant property. This version of the Complex Wavelet Transform was implemented using a ‘dual-tree’ decomposition as previously proposed [17]. Multiple attributes were then derived from the resulting coefficients at each level of decomposition, including mean, median, standard deviation, energy and entropy. Entropy was calculated as follows.

Here, *V* is the total number of unique discrete values that the signal takes, and *C* is the number of times the signal takes a particular value *i*.
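The shift variance of the ordinary DWT, and the per-level attributes listed above, can be illustrated with a minimal sketch. It uses a single-level Haar DWT rather than the dual-tree complex transform, and it assumes the entropy is the Shannon entropy of the relative frequencies of the unique coefficient values (log base 2, values rounded before counting). The function names and the toy signals are hypothetical.

```python
import numpy as np

def haar_dwt_level(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = x[:-1]  # drop the last sample so that samples pair up
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def shannon_entropy(values, decimals=6):
    """Entropy over unique (rounded) values, from the count of each value.

    Assumption: relative frequencies count / total and log base 2.
    """
    _, counts = np.unique(np.round(values, decimals), return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))

def coefficient_features(coeffs):
    """Summary attributes of the coefficients at one decomposition level."""
    coeffs = np.asarray(coeffs, dtype=float)
    return {
        "mean": float(np.mean(coeffs)),
        "median": float(np.median(coeffs)),
        "std": float(np.std(coeffs)),
        "energy": float(np.sum(coeffs ** 2)),
        "entropy": shannon_entropy(coeffs),
    }

# The same step edge, once aligned with the Haar pairs and once shifted by one sample.
step = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
shifted = np.array([0, 0, 0, 0, 0, 1, 1, 1], dtype=float)

_, d_step = haar_dwt_level(step)
_, d_shift = haar_dwt_level(shifted)
f_step = coefficient_features(d_step)
f_shift = coefficient_features(d_shift)

print(f_step["energy"], f_shift["energy"])
```

A one-sample shift moves the detail energy from zero to a nonzero value even though the underlying event is the same. Features that vary with alignment in this way are what an approximately shift-invariant transform is meant to avoid.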

#### RPD-PD through Non-linear Non-deterministic time-series analysis

FT, as utilized by others [10], performs a linear transformation of a function space such that the original signal (function) is decomposed into multiple sinusoids that are globally averaged. Characterizing a short-term, non-stationary, pathological signal requires the assumptions of linearity and periodicity to be relaxed. Limitations of Fourier-based analysis have also been discussed in other studies [11, 18]. As with most nonlinear time-series analyses, we begin by projecting our data *x(t)* onto a state space *p(t)*. Here, each dimension of the state space represents a time delay. The concept of recurrence [19] can be interpreted as measuring the level of aperiodicity in the data.

Here, the data projected onto the state space is *p(t)*, and *r* is the radius of a hypersphere defined around a state *p(n)* (where *n* is a specific value of *t*). Following the trajectory of the data in state space, *δt* is the recurrence time: the time after which the data falls within the sphere once again, having left it. Periodicity is a special case of recurrence in which *r* = 0 and all ‘states’ exhibit the same *δt*. Time-delay embedding is used to project the data series into multiple dimensions of a phase space. Each dimension *m* corresponds to a multiple of the time delay *τ*.
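The embedding and recurrence-time definitions above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study. It assumes a Euclidean distance in state space and normalizes the histogram of recurrence times into a density. The function names, the test sinusoid and the parameter values are hypothetical.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Rows are states p(t) = [x(t), x(t + tau), ..., x(t + (m - 1) * tau)]."""
    x = np.asarray(x, dtype=float)
    n_states = len(x) - (m - 1) * tau
    if n_states <= 0:
        raise ValueError("signal too short for this m and tau")
    return np.column_stack([x[i * tau : i * tau + n_states] for i in range(m)])

def recurrence_times(states, r):
    """For each state p(n), steps until the trajectory re-enters the
    ball of radius r around p(n) after first having left it."""
    times = []
    for n in range(len(states)):
        dist = np.linalg.norm(states[n:] - states[n], axis=1)
        outside = np.flatnonzero(dist > r)
        if outside.size == 0:
            continue  # trajectory never leaves the ball
        first_out = outside[0]
        back = np.flatnonzero(dist[first_out:] <= r)
        if back.size == 0:
            continue  # trajectory never returns
        times.append(first_out + back[0])
    return np.array(times, dtype=int)

def recurrence_period_density(times, t_max):
    """Normalized histogram of recurrence times for periods 0..t_max."""
    hist = np.bincount(times, minlength=t_max + 1)[: t_max + 1].astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy check: a sinusoid with a period of 20 samples.
t = np.arange(200)
x = np.sin(2 * np.pi * t / 20)
states = delay_embed(x, m=2, tau=5)
times = recurrence_times(states, r=0.05)
density = recurrence_period_density(times, t_max=50)
print(int(np.argmax(density)))
```

For a strictly periodic signal the density collapses onto a single period, as in this toy case. Aperiodic signals spread their mass over many recurrence times.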

Autocorrelation and mutual information have been suggested [19] for selecting a proper combination of dimensions *m*, time delay *τ*, and radius *r*. However, our objective is to separate the two classes, ‘successful’ and ‘unsuccessful’, as far as possible based on a distance metric and the given data, without losing generalization power. Neither class presents apparently periodic signals. As such, the novel parameter selection regime proposed here finds a ‘structure’ in the signal, defined by dimensions *m* and time delay *τ*, whose pseudo-periodicities would differ significantly between the two classes. Proper parameter selection is essential in rendering this method useful.

Four post-defibrillation signals that exhibited regular, sustained sinus rhythms with narrow complexes were selected as successful prototypes. Four defibrillation signals that induced minimal change in the ECG, or were immediately followed by smooth VF after shock with no conversion, were selected as unsuccessful prototypes. Note that selection of pre-defibrillation signals is based solely on post-defibrillation segments. Considerable variability was observed in prototypes of the unsuccessful class. Selecting more prototypes, at least for this class, should result in better tuning of parameters for RPD-PD (by the procedure described in the next paragraph). However, this desire for more prototypes had to be balanced against the need for a relatively unbiased sample set, given the relatively small size of our dataset. Thus, the number of prototypes per class for this study was kept to four.

For 10-fold cross-validation and a dataset with *n* instances, each training set contains *n* − (*n*/10) samples, with the remaining *n*/10 held out as the test set. A range of possible values was defined for each parameter. Recurrence period density was then calculated for each combination of parameter values and each signal in the training set (TS) and prototype set (PS). We define the metric *KD* (Equation 7) to calculate the pairwise distances from each TS density to all PS densities.
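The bookkeeping for this step can be sketched as follows: each fold holds out *n*/10 instances, and every parameter combination is evaluated on the remaining training instances only. The parameter ranges below are hypothetical placeholders, not the ranges used in this study, and the function name is an assumption.

```python
import itertools
import numpy as np

def ten_fold_indices(n, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each test fold holds about n / n_folds items."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    for k in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, folds[k]

# Hypothetical search ranges for embedding dimension, time delay and radius.
param_grid = {"m": [2, 3, 4], "tau": [1, 5, 10], "r": [0.05, 0.1]}
combos = [dict(zip(param_grid, values))
          for values in itertools.product(*param_grid.values())]

splits = list(ten_fold_indices(n=100))
train_idx, test_idx = splits[0]
print(len(train_idx), len(test_idx), len(combos))
```

With *n* = 100, each training set has 90 instances and each test set has 10. Tuning parameters on the training indices alone keeps the held-out fold from influencing parameter selection.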

Here, *s* stands for a given signal while *c* can stand for any of the other signals; *D*^{c}_{i} and *D*^{s}_{i} are the density values at a certain period *i*. *KD*, being inspired by the Kullback–Leibler distance, is biased towards the characteristics of *c* but, unlike KL, can also serve to measure the distance between two discrete distributions. Given classes A and B, a density from class A is subdivided into non-overlapping windows or ranges, which are compared (by *KD*) with the respective windows of other densities. Therefore, our optimization is performed over a total of four variables, *m*, *τ*, *r*, and window, as follows.
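The windowed comparison can be sketched as below. The sketch does not implement *KD* itself: it substitutes the plain Kullback–Leibler divergence that inspired it, weighted by the density of *c*, purely as a stand-in. Renormalizing each window to sum to one, the small constant `eps`, and all function names are assumptions made for illustration.

```python
import numpy as np

def split_windows(density, window):
    """Non-overlapping windows of `window` periods; a short trailing remainder is dropped."""
    density = np.asarray(density, dtype=float)
    n_windows = len(density) // window
    return density[: n_windows * window].reshape(n_windows, window)

def kl_divergence(d_s, d_c, eps=1e-12):
    """KL divergence of d_s from d_c, weighted by d_c (stand-in for KD, not KD itself)."""
    d_s = np.asarray(d_s, dtype=float) + eps  # eps avoids log(0) and division by zero
    d_c = np.asarray(d_c, dtype=float) + eps
    d_s /= d_s.sum()
    d_c /= d_c.sum()
    return float(np.sum(d_c * np.log(d_c / d_s)))

def windowed_divergence(d_s, d_c, window):
    """One divergence value per pair of corresponding windows."""
    return np.array([
        kl_divergence(w_s, w_c)
        for w_s, w_c in zip(split_windows(d_s, window), split_windows(d_c, window))
    ])

# Two toy densities over four periods, compared in windows of two periods.
d_s = np.array([0.1, 0.2, 0.3, 0.4])
d_c = np.array([0.4, 0.3, 0.2, 0.1])
per_window = windowed_divergence(d_s, d_c, window=2)
same = windowed_divergence(d_s, d_s, window=2)
print(per_window, same)
```

Comparing window by window lets the search pick out the range of periods in which the two classes differ most, rather than averaging the difference over the whole density.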

Classes are maximally separated by maximizing the quantity *sep* (Equation 8).

*Sep* represents the closeness of all TS signals to PS signals in their own class (and their remoteness from the opposite class), while also accounting for differential variation in within-class distances for the two classes. We deem this normalization necessary, as data in one class may be more homogeneous than data in the other.

Here, *L* is the total number of TS instances/defibrillations. For a given *i*, *KD*^{B} and *KD*^{W} are the means of between-class and within-class distances, respectively, to instances in PS. *C*^{B} and *C*^{W} are the total numbers of PS instances in the opposite class and in *i*’s own class, respectively.

Each input signal from the test set is then compared to each prototype in both classes. The following distance is calculated as two features, *sKD*_{B} and *sKD*_{W}, for a signal *s*.

Here, *Q* is the total number of signals in PS for a given class, *T* is the longest period in the chosen window, *D*^{P} and *D*^{S} are vectors representing the densities of the prototype and *s*, respectively, and *sgn* is the sign (signum) function. The average *sKD* for each class serves as an attribute of a given signal.