Our primary data source was Flunet [10]. While other contact datasets are available [6, 12], and others with health information have been described [13], Flunet is unique in providing longitudinal information on contact patterns, occurrence of influenza-like illness (ILI) symptomology, and vaccination status obtained during an epidemic outbreak in the region under observation. Our micro-contact data contained detailed inter-participant contact patterns but lacked data about the threat of infection from non-participants, which we modeled using reported H1N1 incidence data for the same time and province as the contact data [8]. A system diagram outlining the flow of data and state changes is shown in Figure 1.

The simulation model is encapsulated in the dashed box. Agents remained in a susceptible state unless acted on by an infection event. Such stochastic events were triggered externally using the exogenous pressure data derived from case reports [8], or internally through contact with an infected agent. The likelihood of endogenous infection was governed by contact and vaccination data from [10] and infection risk information drawn from [11]. Because we assumed H1N1 re-infection risk to be negligible, agents in the recovered or vaccinated state remained there until the simulation ended. If an agent became infected, the infection ran its course deterministically through the stages of infection. The duration of each infection stage was drawn from the distributions in [11] and derived quantities.

We performed Monte Carlo ensembles of stochastic dynamic simulations operating on the contact data, where the primary variables drawn from distributions were disease stage durations, exogenous infection event rates and the probability of transmission from infected endogenous contacts. Ensembles were selected in a memoryless fashion based only on the disease parameters. In every realization (simulation run), the contact record was stepped through like an animation, creating exactly the same sequence of contacts in the course of every Monte Carlo realization, which we have termed a Groundhog Day technique, after the 1993 movie of the same name. This is similar to the technique we employed in [7], but different from the technique employed by [6] where a single daily aggregate contact network was employed. While our technique is computationally more expensive, it captures inter- and intra-day heterogeneity in the contact network. While agents repeatedly relived the same sequence of contacts in different realizations, the stochastics associated with infection progression and transmission gave rise to differences in disease spread.

### Contact data

The Flunet study population consisted of 36 participants, each carrying a small wireless sensor (or “mote”) capable of short-range wireless communication [10]. Participants were asked to carry the sensors with them at all times during the experiment period. In addition, 11 stationary motes were deployed at fixed high-traffic locations, picked by experimenters, in order to study the contact patterns between people and places [14]. Three of these stationary motes were also connected to a networked PC and acted as data sinks, opportunistically receiving accumulated data from nearby mobile motes.

When two motes were in close proximity, they would each record a contact with a minimum resolution of 30 seconds. Each contact record represented a contact session between two motes, which included the start and end time of a contact, and the distance between the adjacent motes. A contact’s distance was estimated by binning the received signal strength indicator (RSSI, a measure of the wireless signal strength) into close (< 5 m), medium (5–15 m) and far (>15 m) bins [10]. Although contacts between participants and stationary nodes at various ranges were recorded, only close contact data from person-person interactions are considered here. Participants were asked to fill out a sequence of weekly health surveys which included symptoms and diagnoses of ILI, reported date of H1N1 vaccination, and self-reported contact patterns. Demographic data was collected in a single survey at the conclusion of the study.

A preliminary analysis of the dataset is provided in [10]. Analysis in Figure 2a and 2b has been reproduced here for the readers’ convenience while additional analysis germane to the application of the dataset to agent-based modeling of infectious disease (Figure 2c and 2d) has been added.

Figure 2a shows that contacts are tightly clustered throughout the workday, with staff arriving in the morning and graduate students trickling in throughout the day. Sporadic contacts are recorded throughout the evening and night. This figure also illustrates that the contact data contains primarily workplace relationships. Figure 2b shows the complementary cumulative distribution function (CCDF) of contact duration. In addition to the CCDF for all the collected data (solid line), we removed contacts with durations exceeding 7 hours (0.03% of total reported contacts) from the raw dataset (dashed line) because we assumed contact of this duration was due to sensors abandoned near each other. Removing this section of data yields considerable differences in the distribution’s tail. The CCDF is broadly consistent with other long-term datasets of this nature [15]. The heterogeneity of the contact distribution is important for our hypotheses and assumption - that contact dynamics have significant impacts on infection rate as initially noted by [6] – which in turn drove the simulation design. In Figure 2b, contact duration spans more than two orders of magnitude. If we posit that an infectious individual gives rise to contagious events (e.g., sneeze or cough) with some stochastic arrival probability independent of the contact duration, a susceptible is likely to experience more contagious events in a prolonged contact than in a shorter one, a property assumed in some other modeling studies [16].

To visually highlight the impact of cliques and place on the dataset, Figure 2c plots the relationship between contacts which existed for an average of 18 min/day. This threshold was to represent a plausible amount of time per day that a regular contact might have occurred over the course of the study, bearing in mind that weekends and holidays are included in the denominator. Black nodes represent stationary nodes associated with a location, and are included in this graph for illustrative purposes only. As is apparent in the graph, nodes that are generally collocated have a high degree of contact with each other. Nodes that are not collocated have much lower connectivity, with the exception of a few bridging individuals.

Given the importance of network structure, we consider the span of the network in Figure 2d, which is closely related to degree centrality (see Network Metrics Section). This graph is shown for two scenarios: a scenario where only close proximity constitutes a contact, and a scenario where any detectable presence qualifies. When limiting the analysis to close contacts, the histogram is both more peaked and has a lower mean than when considering all the possible contacts. The modes are 22 and 31, respectively, implying that many participants saw most of the other participants at least once. However, because it is not saturated at the maximum (as would be the case if all participants saw all other participants) it is logical to hypothesize that some partially isolated cliques exist, and that the close contact network is more strongly cliqued.

### Transmission model

#### Model design

The simulation model classified each individual in the sample population into one of seven states: *Susceptible*, *Latent*, *Asymptomatically Infectious*, *Symptomatic Infectious*, *Symptomatic Non-Infectious*, *Recovered*, and *Vaccinated*. All the agents in the model started in the *Susceptible* state, consistent with limited pre-existing population immunity to H1N1. A susceptible individual could contract the infection either from exogenous or endogenous sources. Exogenous sources are defined as the population outside the study who were in contact with Flunet participants and could transmit the infection to the monitored individuals, while endogenous sources are other Flunet participants in an infectious state.

Dynamic transmission models differ in their treatment of contacts. For some epidemiological contexts, the contacts underlying transmission are of defined or bounded duration – for example, needle sharing, sexual encounters, and blood transfusions. For this class of contacts, the frequency rather than the duration of contacts is the primary source of variability in transmission risk. For air-borne infections, however, the likelihood of transmission rises not only with contact frequency, but also with contact duration [6, 16].

For the case of H1N1 influenza transmission, our model assumes that on-going contact between two discordant individuals provides a conduit for transmission, where the likelihood density of transmission is a constant independent of contact duration. More specifically, we posit that an infectious individual gives rise to potentially contagious events (e.g., sneeze or cough) at a fixed rate *v*. We further treat any susceptible individual in contact with that person as having a likelihood *β* of contracting the infection for each such infectious event. Similar to the analysis of [6, 16], infections are more likely to occur in a longer contact than in a shorter one.

Given this model, the basic reproductive number *R*
_{
0
} (the average number of secondary infections caused by an infective individual, in an otherwise susceptible population) is as follows:

{R}_{0}=\overline{{T}_{i}}\overline{{F}_{C}}v\beta

(1)

where \overline{{T}_{i}} is the mean duration of infectiousness in days, \overline{{F}_{c}} represents the average daily cumulative contact duration of an individual in time slots (summed regardless of concurrency), *v* is the number of potentially infecting events per time slot of contact between two individuals, and *β* is the mean likelihood of a susceptible will be infected by a given infecting event.

For endogenous infections, we assumed that the mean of the basic reproductive number R_{0} for our study population was equal to 1.31, identical to that reported in a prominent Canadian H1N1 study [11]. For an infective person, the infection hazard of infecting an adjacent susceptible individual per time step (*βv*) was thus determined as follows:

\beta v=\frac{{R}_{0}}{\overline{{T}_{i}}\overline{{F}_{c}}}

(2)

Where *T*
_{
i
} is set to 3.38 [11], and \overline{{F}_{c}} is computed using recorded data in Flunet dataset according to the following formula:

\overline{{F}_{c}}=\frac{{\sum}_{i=1}^{{N}_{p}}{\sum}_{j=1}^{{N}_{p}j\ne i}{T}_{c}(i,j)c}{{N}_{p}{N}_{d}}

(3)

where *N*
_{
p
} represents the number of study participants, *N*
_{
d
} gives the study duration, and *T*
_{
c
}(*i*, *j*) indicates the total duration of contacts (in days) between nodes *i* and *j* during *N*
_{
d
} days.

The per-week infection hazard of acquiring the infection from exogenous sources was determined according to the following formula:

{P}_{x}\left(k\right)=\frac{{N}_{k}}{{N}_{U}-{\sum}_{i=1}^{k-1}{N}_{i}}

(4)

where *k* refers to the week number since the start of the simulation, *N*
_{
i
} gives the number of laboratory confirmed H1N1 cases in the province of Saskatchewan during the *i*
^{th} week of the 2009–2010 influenza season, and *N*
_{
U
} refers to the total population of the province. The denominator represents an estimate of the number of susceptible individuals in the province. The entire denominator thus estimates the number of susceptible individuals who remain at risk in week *k*. We note here the computation of susceptible individuals in the denominator consider neither the vaccination status of the population (for which reliable data was not available at the time of simulation), nor the infections that took place during the previous influenza season. This formulation therefore systematically underestimates the actual infection pressure. We compensate for this systematic error by providing a thorough exploration of the space of possible transmission probabilities (Figure 3).

A susceptible agent receiving the infection from either exogenous or endogenous sources transitions to the latent state. Before starting the Latent period, the model computed the duration for each of the subsequent four stages of illness (Figure 1). In determining these durations, we sought to reproduce the observed variability in H1N1 progression by drawing the duration of incubation (*T*
_{
Inc
}) and duration of symptoms (*T*
_{
S
}) from two log-normal distributions with parameters from [11]. The illness duration *T*
_{
Ill
} was calculated by adding *T*
_{
Inc
} and *T*
_{
S
}. Using these three values, the total duration of infectiousness *T*
_{
Inf
} was calculated as:

{T}_{\mathit{Inf}}=\frac{{T}_{\mathit{Ill}}\overline{{T}_{\mathit{Inf}}}}{\overline{{T}_{\mathit{Inc}}}+\overline{{T}_{S}}}

(5)

where *T*
_{
Ill
} gives the computed duration of illness, \overline{{T}_{\mathit{Inf}}} is the average duration of infectiousness, \overline{{T}_{\mathit{Inc}}} shows the average incubation period, and \overline{{T}_{S}} is average duration of symptoms. Following the computation of the duration of infectiousness, the duration of the *Symptomatic Infectious* state *T*
_{
sInf
}was estimated using:

\frac{{T}_{\mathit{sInf}}}{{T}_{\mathit{Inf}}}\approx \frac{\overline{{T}_{\mathit{sInf}}}}{\overline{{T}_{\mathit{Inf}}}}

(6)

The remainder of the durations were computed using following subtractions:

{T}_{\mathit{aInf}}={T}_{\mathit{Inf}}-{T}_{\mathit{sInf}}

(7)

{T}_{\mathit{lat}}={T}_{\mathit{Inc}}-{T}_{\mathit{aInf}}

(8)

{T}_{\mathit{nInf}}={T}_{S}-{T}_{\mathit{sInf}}

(9)

where *T*
_{
aInf
} represents the asymptomatically infectious duration, *T*
_{
lat
} shows the latent period duration, and *T*
_{
nInf
} shows the symptomatic non-infectious duration.

Each infected agent experienced the four illness states sequentially with the passage of time. A person in the *Asymptomatically Infectious* or *Symptomatic Infectious* state was considered infective, and could infect other susceptible adjacent individuals with infection hazard *βv*. At each time step and for each adjacent susceptible, the infective person transmitted the infection with likelihood density *βv*. A susceptible receiving H1N1 vaccination transitioned to the *Vaccinated* state, and was thereafter considered immune.

For the sake of the simulation, we assumed no H1N1 mortality. Our study lacked sufficient data to predict whether a specific individual would elect to self-quarantine given a symptomatic infection, and did not consider hospitalization outcomes. Given these assumptions, we chose to regard an individual’s contact patterns as unaffected by the health status of that individual and those around them. To examine the degree to which these assumptions might shape simulation results, we simulated an additional Monte Carlo ensemble examining the extreme situation in which individuals removed themselves from circulation for the duration of their symptomatic period. Finally, in light of the dominance of the H1N1 strain during the Saskatchewan 2009–2010 influenza season, only one strain of influenza was considered.

#### Simulation setup

The model described in the previous sections was implemented in Network Simulator 3 (ns-3), a discrete-event simulator. A network of 36 agents was created, where each agent represented one individual. The Flunet study data was discretized into 30-second time slots, and at each time slot the connectivity between each pair of individuals was updated based on the contacts recorded in the Flunet dataset. This dynamic contact network can be visualized as a time-varying graph where edges appear or disappear every time step depending on whether the two participants were in contact. The network could also be effectively encoded as a fully connected graph where edge weights at every time step have a value of 0 (unconnected) or 1 (connected). The fully connected graph representation can easily be implemented as a time series of sparse symmetric matrices (one for every time step) where 0 represents no connection between the node (*i,j*) and 1 represents a connection. A variant of this representation of the connectivity pattern was employed. A flat text file for each agent was created containing a vector representing the contact between an agent and all other agents for every time step in the experiment. Before starting a realization, each agent loaded the connectivity file, and at each time step referenced the appropriate vector (line in the file) to infer its connectivity with other agents at a given time.

To estimate *P*
_{
x
}, the model required a time series of the laboratory-confirmed H1N1 cases in Saskatchewan. This data was extracted from the Public Health Agency of Canada FluWatch [8] on a weekly basis. As it is shown in Figure 3, the number of reported cases for the 13 weeks of the simulation declines monotonically to zero after the 9^{th} week (January 4^{th} 2010), and therefore *P*
_{
x
} in the model for 10^{th} week to the end of the simulation was zero.

Each susceptible agent drew from a distribution at each time-step to determine whether it was infected by exogenous sources. If it contracted the infection – whether from exogenous or endogenous sources – the agent determined the integral duration (in units of time-steps) for each state of the infection based on the equations explained in Model Design section, and proceeded to remain each state for the determined number of time-steps. During the infectious period, the agent drew from a distribution to to determine whether it infected other nearby susceptible agents.

#### Scenarios

The simulation explored a three-dimensional scenario space that examined the impact on model outputs of four distinct assumptions. The first two assumptions related to the exogenous and endogenous forces of infection (FOI). An exogenous FOI coefficient linearly scaled *P*
_{
x
} to values that were 1, 2, 4, 8, 16, and 32 times the baseline. Similarly, the endogenous FOI coefficient scaled *βv* by 0.5, 1, 1.5, 2, 2.5, and 3 times the baseline value.

The third assumption varied was whether the H1N1 vaccination status from [10] was considered during simulation. For the case without H1N1 vaccination, none of the self-reported H1N1 vaccination data was considered. For the scenarios that account for H1N1 vaccination, participants started susceptible but transitioned to the *Vaccinated* state according the time they reported an H1N1 vaccination in Flunet health surveys. Individuals who did not report vaccination in the surveys never entered the *Vaccinated* state. We assumed a negligible benefit of vaccination if the agent was infected at the time of vaccination, and allowed the infection to run its course.

One supplementary baseline scenario explored the impact of participants removing themselves from circulation during their symptomatic period. Note that to compute *βv* for this scenario, \overline{{T}_{i}}was replaced with \overline{{T}_{\mathit{aInf}}}, where \overline{{T}_{\mathit{aInf}}} was calculated as the average duration of asymptomatically infectiousness of all the previous baseline and alternative scenarios.

In total, the scenario space consisted of three baseline scenarios and 72 additional scenarios. Each baseline scenario was simulated using 100,000 Monte Carlo realizations; the other 72 alternative scenarios were each simulated using 2,500 Monte Carlo realizations. Exploration of the scenario space (including the baselines and alternative scenarios) required running 480,000 different realizations.

#### Metrics for contact networks structure

While static representations of social networks are convenient, popular, and can yield powerful insights [17–20], the temporal aggregation involved may obscure features of real contact networks that serve important roles in the transmission of infectious disease. Our experimental and simulation design provided us with rich information to study network dynamics. However, because network structure – particularly evolving network structure – is difficult to represent concisely, derivative measures are often employed [19]. To quantify the structure of our network, we employed four centrality measures: betweenness, degree, time degree, and log time degree.

Betweenness centrality is a classic measure of network structure that attempts to capture the importance of the node to the graph’s connectivity, by summing the number of times a node lies on the shortest path between two other nodes, calculated using:

{C}_{B}\left(v\right)=\sum _{a\in Nodes}\sum _{b\in Nodes-\left\{a\right\}}\frac{{\Sigma}_{\mathit{ab}}\left(v\right)}{{\Sigma}_{\mathit{ab}}}

(10)

where *v* is the vertex in question, *Σ*
_{
ab
} is the number of shortest paths between *a* and *b*, and *Σ*
_{
ab
}(*v*) is the number of shortest paths between vertices *a* and *b* that pass through *v,* summed over all pairs of vertices in the graph.

While betweenness captures a global picture of the network by examining shortest paths, degree centrality only considers a node’s number of one-hop neighbors. For a static graph, degree centrality is calculated according to:

{C}_{D}\left(v\right)=\frac{deg\left(v\right)}{n-1}

(11)

where deg (*v*) is the number of edges incident on *v*, and *n* is the number of nodes in the graph. Degree centrality can capture the local conditions of a node more accurately, but does not take into account the heterogeneous nature of the contact patterns and durations evident in our dataset. As a result, we defined two additional centrality measures to capture ∈s of network dynamics.

Time degree centrality (TD) for a node can be defined as the average over all time slots of the fraction of all other agents with whom that node is in contact in a given time slot (analogous to the “strength” metric proposed in [6]). This is computed in our case by summing up the count of that node’s contacts over all 30-second time slots in the entire study, and then dividing by both the number of time slots in the whole study and by the number of participants − one. This measure captures the duration of (potentially concurrent) interpersonal contact patterns. People who encounter many others for short periods would have a large degree centrality, and a similar TD centrality to individuals who spend a great deal of time in a smaller group and have a smaller degree centrality. Because time degree aggregates over contact times, and because contact times are often characterized by exponential or power law distributions Contact Data Section [6, 10, 15], we also introduce log time degree (LTD) centrality as a measure of contact density. TD and LTD centralities are calculated discretely as:

{C}_{\mathit{TD}}\left(v\right)=\frac{1}{{N}_{k}}\sum _{k}{C}_{D}(v,k)

(12)

{C}_{\mathit{LDT}}\left(v\right)=ln\left({C}_{\mathit{TD}}\right(v\left)\right)

(13)

where *N*
_{
k
} is the total number of time slots in the period and *C*
_{
D
}
*(v,k)* is the degree centrality of the *v*
^{th} vertex at time *k*. The LTD is simply the natural logarithm of the time degree. Time degree is normalized and therefore always less than or equal to one, causing log time degree to be always negative, with more negative numbers indicating a lower centrality.

If the heterogeneity of the system is dependent on the network structure, then the likelihood of a participant’s infection at some point during the study should be correlated with appropriate network structure metrics. We ran Pearson and Spearman correlations using the MATLAB statistical toolbox against the probability of infection in two baseline simulations (with and without vaccination) against the four measures of centrality described above. Given that an individual’s network location may also shape their likelihood of transmitting a pathogen when infected, for the same scenarios as above we ran correlations of the four centrality measures against the average number of secondary endogenous infections directly caused by a node once it was infected. Finally, to better understand the effect of vaccination status on the correlations derived above, we also used Student’s *t*-test to examine the difference in the four measures of centrality between participants who did and who did not report vaccination.