Given the H1N1 strain emergence and in anticipation of the significance of the 2009–2010 influenza season, the co-authors launched a previously-described [24] pilot study in the midwestern Canadian city of Saskatoon to electronically collect contact patterns between 36 participants, in addition to their influenza-related health status information. The study was conducted between November 9th, 2009 and February 9th, 2010 – a time period coinciding with the second rise of reported H1N1 cases in the province of Saskatchewan [30]. Each participant was requested to carry a proximity sensor whenever awake during the study period, and to respond to a sequence of weekly health surveys via a web browser. Participants filled out an informed consent form prior to joining the study, as required by the university research ethics board.
To study the impact of network representation on simulation outcomes, an H1N1 infection transmission model [24] was created and parameterized with empirical characteristics of the H1N1 pathogen [31]. The model simulated the spread of infection between agents through their daily interactions. Agents’ daily connectivity – either from empirically collected or aggregate approximations – were imported into the model and defined the potential for infection between discordant agents. Overall, 616 different scenarios – each positing a different connectivity network derived from the data – were simulated, with each such scenario being run for 10,000 realizations by replaying a sequence of the 265,000 thirty-second time slots in the study period. Based on the 100,000 run ensembles with the same model and dynamic network reported in [24] we concluded that 10,000 runs were sufficient, as the maximum depth of infection in [24] was reached 10 times, implying that division by 10 would have a high probability of capturing even the most extreme infection events, while maintaining the probability distribution at significantly reduced computational cost. Because each run could be conducted relatively swiftly (typical run-times for an ensemble were 30 s) having a large number of runs was not computationally prohibitive given the resources available to us.
The underlying disease parameters and distributions were held constant across scenarios. Within a given scenario, stochastics associated with exogenous and endogenous infection transmission and duration of different phases in the natural history of infection induced variability in simulation output.
Data collection
The Flunet experiment covered 57 weekdays and 33 weekends/holidays (including Christmas break). Participants were asked to carry a small wireless sensor (or “mote”) capable of short-range wireless communication [8]. When two motes were in close proximity, they would record a contact with a minimum resolution of 30 seconds. Each contact record represented a contact session between two motes, which included the start and end time of a contact, and the distance between the adjacent motes binned by the received signal strength indicator (RSSI, a measure of the wireless signal strength) into close (< 5 m), medium (5–15 m) and far (> 15 m) bins [8]. Contacts longer than 7 hours (0.03% of total reported contacts) were removed, as we assumed that most contact of this duration was due to sensors abandoned near each other making the contact duration distribution broadly consistent with other long-term datasets [8, 23, 32]. Participants were asked to fill out a sequence of weekly health surveys, which included symptoms and diagnoses of ILI, reported date of H1N1 vaccination, and self-reported contact patterns. Demographic data was collected in a single survey at the conclusion of the study.
Transmission model
As the data collection occurred during the H1N1 outbreak, we used an agent based H1N1 SEIR transmission model to simulate the infection dynamics. This section includes a short description of the model; interested readers are referred to a detailed specification in [24]. The simulation model classified each individual in the sample population into one of seven states: Susceptible, Latent, Asymptomatic Infectious, Symptomatic Infectious, Symptomatic Non-Infectious, and Recovered. All the agents in the model started in the Susceptible state, consistent with limited pre-existing population-level immunity to H1N1. A susceptible individual could contract the infection either from exogenous or endogenous sources. Exogenous sources are defined as the population outside the study who were in contact with Flunet participants and could transmit the infection to the monitored individuals, while endogenous sources are other Flunet participants in an infectious state within the simulation. Both endogenous and exogenous infection forces of infection were calculated using data from the 2009 outbreak [30], a prominent H1N1 model [31], and collected contact data [24].
A susceptible agent contracting the infection from either exogenous or endogenous sources transitions to the Latent state. When an individual enters the Latent period, the model computed the duration for each of the subsequent four stages of illness (Figure 1). In determining these durations, we sought to reproduce the observed variability in H1N1 progression by drawing the duration of incubation and duration of symptoms from two log-normal distributions [24]. The duration for other stages were calculated using these two values.
Each infected agent experienced the four illness states sequentially with the passage of time. A person in the Asymptomatic Infectious or Symptomatic Infectious state was considered infective. At each time the infective person triggered potentially transmitting events (e.g. coughs or sneezes) with a specified likelihood. A potentially transmitting event had a given probability of transmission to each susceptible in contact. The simulation model did not consider H1N1 mortality, self-quarantine, antiviral administration, or hospitalization outcomes. While the H1N1 model employed has been previously used the capture the impact of vaccination [24], vaccination was not considered in this study.
Connectivity patterns
This work seeks to analyze and quantify the impact of contact pattern representation on transmission outcomes in agent-based simulation models. Because we had recourse to data offering considerably greater temporal span than most other past contributions in this area, we could more readily examine the impact of varying levels of temporal aggregation on model outcomes. To do so, we looked at three different experimental manipulations, each associated with additional parameter variations focused on a particular type of network representation. A combination of manipulation and parameter variation is termed a scenario.
The first baseline manipulation focused on two reference scenarios, each representing different extremes in the aggregation spectrum. The first employed the Flunet empirical data directly, as it captures the contact patterns between agents with the greatest fidelity. In the dynamic baseline graph, edges have weights of 1 or 0 – that is, they correspond to the existence or absence of a connection during a given timeslot. The dynamic contact network can be visualized as a series of network where the nodes represent participants and each connections represents a pair of participants that were proximate to each other during a time step. This construct is often called a dynamic graph – drawing from the field of Graph Theory – where the participants are nodes, and the connections are the edges. This graph can either be realized in practice as a sparse dynamic graph with edges appearing and disappearing, or a time series of symmetric matrices with binary constituents, with 0 indicating the absence and 1 indicating the presence of an undirected edge.
The second baseline manipulation scenario collapsed the dynamic Flunet graph down to a single static graph. In the static graph scenario, edge weights are replaced by the time averaged contact density between a specific pair of participants over the entire study. This is trivially constructed as the sum of all the dynamic symmetric matrices divided by the number of timesteps. This aggregation approach captures an extreme form of aggregation, in which no changes are made in contact graph structure over time. If contact dynamics had no significant impact on the results, then simulations using this graph should echo the fully dynamic case.
The second manipulation examined typical day approaches. As a method of temporally extrapolating graphs collected over shorter time horizons, both [9] and [15] employed a typical day [9] hypothesis, positing that the day(s) captured in the experiment were generally representative of longer-term contact patterns and could be duplicated through time to extend the dynamic graph until the simulated outbreak dissipated or reached a quasi-static endemic equilibrium. Once an interval was selected, the day-by-day duplication of that interval could be accomplished in one of two ways [9, 25]. In the first approach, the collected contacts are aggregated over that interval – in a manner similar to how the static graph is aggregated over the course of the study – with the aggregated contact network then being applied across the entire simulation time horizon. Another appoach simply replicates the fully dynamic graph for that interval throughout the study period. To evaluate the impact of the typical day hypothesis, the second manipulation consisted of a series of simulations with duplicated days using both the fully dynamic and aggregated typical day scenarios.
Reflecting the fact that many researchers lack recourse to high-fidelity empirical contact data for model integration, the third manipulation focused on fitted synthetic networks. Past contributions have generated random contact networks using small-world, scale free or other network topologies, often with contact probabilities represented as edge weights randomly drawn from other independent distributions. To analyze the impact of such an aggregate network representation on the spread of infection across the simulated population, the dynamic Flunet graph was reduced to distributions for edge weights and node degree and a set of new small-world contact graphs based on these distributions were created. To determine the impact of model fit on simulation results, several edge weight distributions were employed that provided increasingly accurate fit, at the cost of decreasing theoretical rigour.
Unweighted small-world connectivity graphs with 36 nodes were generated using the Watts and Strogatz model [33] in R [34, 35]. Average path length and cluster coefficient were used as measures of similarity with respect to the aggregate empirical Flunet graph. Five hundred possible parameterizations of small-world networks based on different connectivity ranges and rewiring probabilities were each generated with 10,000 realizations. For each such realization, the average of each of the similarity measures were calculated. The connectivity range and rewiring probability parameterization that yielded the highest average similarity measure was selected as the base for unweighted graph generation. The parameters were then used to generate final unweighted small-world networks. Figure 2 shows the resulting distributions of networks selected using this process.
Weights were assigned to the edges of the unweighted small-world network by drawing the value for each edge from a distribution. For each of the scenarios within the synthetic network manipulation, we examined the effects of using three different fitted distributions, as well as drawing from the normalized empirical histogram of pairwise Flunet aggregated contact durations (ACDs).
Fitting of distributions was performed using linear regression in MATLAB. For linear regression to perform properly, data underwent log-log (power law fitting) or log-linear (exponential fitting) transformation prior to performing the piecewise fit. For fitted distributions, the fitted curves were required to achieve a R2 value exceeding 99%. Piecewise breakpoints were selected by iteratively changing the breakpoints, performing the regression, and manually selecting the point at which error began to increase sharply, but which still maintained a minimum R2 of 99% value for all the piecewise components.
Figure 3 shows one of the curve fits implicitly and three explicitly. The red and blue lines correspond to the two and three piece fits for the data. The data points themselves form a discrete empirical distribution, and the three piece plus the additional three points with the highest aggregated contact duration form a combined functional-discrete distribution, which can be mathematically considered a 6 part piecewise function, where the discrete outliers are described as Dirac delta functions.
Infectious events were generated according to a Poisson process as described in [24]. When such event occurred in the FullD and DayD cases, the state of the contact graph was queried, and connected agents were infected based on the probability of infection for the pathogen. For aggregate representations a joint probability of the edge weight from the static graph and infection probability was used to determine infection probability. This is mathematically equivalent to randomly sampling the contact records from either the DayD or FullD contact records for the DayA and FullA cases.
Detailed scenario description
The three primary experimental manipulations (baseline, typical day and synthetic network representations) are each represented by a set of scenarios describing the method for generating the contact graphs. Each scenario in turn is composed of a set of cases; for example, there are 57 possible typical day pairs to investigate using our methodology; by contrast, each baseline scenario is associated with just a single case. Each experimental case was simulated for 10,000 realizations, where a single realization is an agent-based simulation of the entire 92 day study period, yielding a total of 6.16 million realizations across all scenarios. Here we describe each scenario in detail.
-
1.
Full-Detailed Network (FullD): FullD is the first scenario in the baseline experimental manipulations. The connectivity pattern in this case uses the complete contact information of participants throughout the study as a dynamic graph, preserving the chronological order of contacts. For each participant, his/her observed contacts with other individuals in the study at each of 265,000 thirty-second time-slots across the 3 months of the Flunet study were imported into the model to represent the connectivity pattern of the corresponding agent. In each realization, the model stepped through all timeslots sequentially, simulating infection transmission using the corresonding inter-agent connectivity graph.
-
2.
Full-Aggregated Network (FullA): FullA provides an upper bound on the impact of temporal aggregation. The connectivity pattern of this scenario is similar to the FullD scenario, but the contacts are aggregated over time. To generate the connectivity pattern, the study-wide per-timeslot contact likelihood between any two given agents was calculated based on the Flunet database, and imported into the model. Assuming the probability of two given nodes contacting each other across the entire 92 day study period is p, the connectivity between those two agents in a given timestep model was drawn from a Bernoulli distribution with success rate of p. Before starting the simulation, 265,000 samples were drawn from the distribution for each pair of nodes, representing the simulation connectivity patterns between those nodes during each successive timestep. Note that, in this scenario, contact patterns are regenerated at the beginning of each realization, and while the contact likelihood between pairs of agents are held invariant between realizations, the network dynamics and chronological order of the contacts are not preserved.
-
3.
Day-Detailed Network (DayD): The Flunet study covered 57 weekdays and 34 weekends/holidays. The DayD scenario – the first scenario in the typical day manipulation – abstracts this as 57 cases, each related to a unique pair of weekday-weekend/holiday. The first 34 weekdays paired with each of the first 34 weekend/holidays, and the remainder of the weekdays are paired by repeating the first 23 weekend/holidays. Subsequently, each pair was used to generate the connectivity pattern between agents by replicating the weekday of the pair 57 times (for the non-holiday weekdays during the study period) and replicating weekend/holiday of the pair 34 times (during the weekends and holidays during the study period). Therefore, in each case of this scenario, the ordered contacts of a particular weekday-weekend pair were replicated to cover all remaining days of the simulation as well.
-
4.
Day-Aggregated Network (DayA): This scenario – the second scenario in the typical day experimental manipulation – consists of a hybridization between DayD and FullA. Like DayD, it also consists of 57 cases, each based on one of the 57 weekday-weekend pairs. Similar to FullA, the contact network used is aggregated over time (here, over a day) and regenerated by each realization prior to simulation. For each of the 57 weekday-weekend pairs, we derived the weekday-specific per-timestep contact probability between any two given nodes based on the specific contact patterns seen in the weekday from that pair. A weekend-specific per-timestep contact probability was analogously derived for each pair of nodes. For each pair of participants, and each timeslot of each (non-holiday) weekday throughout the study period, samples were drawn from a Bernoulli distribution with the specified weekday-specific contact probability for that pair. Contacts in weekend timeslots were similarly defined.
-
5.
Small-World Network with Power Law-Exponential Fitted ACD (SW2P): In the first scenario in the synthetic network experimental manipulation, the selected connectivity range and rewiring probability (described in the previous section) were used to construct 100 unweighted small-world networks. To determine the weights for edges, a distribution was generated using a two-piece Power Law-Exponential distribution fitted to the Flunet ACD curve based on [36], where 4 data points from the head and 18 data points from tail of the empirical distribution were cut to improve the fit. The connectivity of network i was determined by drawing the weight associated with each edge in the unweighted i th small-world network from the 2 piece distribution.
-
6.
Small-World Network with Power Law-Exponential-Exponential (SW3P): To construct connectivity patterns in the second scenario of the synthetic network manipulation an approach similar to SW2P was followed, but a 3-piece Power Law-Exponential-Exponential distribution was used to fit the Flunet ACD empirical distribution. Here, 3 data points from the head and 4 data points from the tail of the Flunet distribution were removed to improve the fit. As with SW2P, 100 connectivity networks were created.
-
7.
Small-World Network with Power Law-Exponential-Exponential True Tail Replaced (SW3PTT): In the curve-fitting for SW3P – the third scenario in the synthetic network manipulation – we cut 4 points from the tail to better fit the distributions. Although the eliminated tail portion is approximately 1% of all data points in the Flunet empirical distribution, their contact duration was substantially larger than that of other data points, resulting in a greater importance to the network-wide transmission of infection [24]. Similar to SW3P, this scenario also generates 100 connectivity networks by drawing from a 3-piece fitted distribution, but randomly selects 4 weights within each such network and replaces them with the 4 empirical values to preserve the important contacts in the tail.
-
8.
Small-World Network with Empirical ACD (SWE): The fourth and final scenario in the synthetic network manipulation uses a specified connectivity range and rewiring probability to generate 100 small-world networks, where the weights for the edges in each network are drawn from the empirical Flunet ACD distribution.