The web-based Delphi process deployment
The Delphi method involves the anonymous completion of a questionnaire by a panel of experts on successive occasions, called rounds. In a typical Delphi study, each expert fills in the questionnaire during the first round. During subsequent rounds, experts are invited to fill in the questionnaire again and may revise their earlier answers in light of feedback on the group's responses in the previous round (as shown in Figure 1) [11].
This study was deployed using a previously developed PHP/MySQL-based application dedicated to online Delphi surveys [14]. We extended this application to display influenza epidemic curves, as illustrated in Figure 1.
Ethics statement
The protocol was conducted in accordance with the Declaration of Helsinki and was approved by the ethics committee (CPP Ile de France V). We obtained authorization from the French Data Protection Agency (CNIL, registration number #471393) covering all non-publicly available data included in this study.
Selection of the expert panel
Four complementary sources were used to build the list of experts invited to take part in the study: members of the WHO Euroflu network (key officials actively involved in European influenza surveillance) [15]; North American experts selected by an American influenza expert from the National Institutes of Health; members of the “French influenza working group” within the French Public Health Council (HCSP); and influenza experts identified through a systematic search of Medline [16].
An information sheet describing the study and inviting participation was emailed to the 322 potential participants. At each round, reminders were sent to those who had neither answered nor declined the invitation.
Questionnaire
The first-round questionnaire included 34 questions (Figure 2), 32 of which were time-series graphs. For each of the 32 graphs, experts were asked to indicate the starting and ending weeks of the influenza epidemic period using a slider (Figure 1).
Each graph included data from the French Sentinelles Network (http://www.sentiweb.fr) [3, 4] and from WHO-FluNet [17] (Figure 3): weekly national ILI incidence rates, weekly national proportions of samples testing positive for influenza (virological data, available since 1997–1998), and the number of virological samples analyzed per week. Data were shown from the beginning of July (week 27) of one year to the end of June (week 26) of the following year, a period hereafter called a season. Weeks were not numbered on the graphs, and the experts were given no indication of the year.
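Because each graph spans a season running from week 27 to week 26, a slider position can be related to a calendar week. The sketch below assumes ISO 8601 week numbering (not stated in the text) and a hypothetical 0-based position index in which position 0 is week 27. It relies on the fact that 28 December always falls in the last ISO week of its year, so a season contains 52 or 53 weekly positions.

```python
import datetime


def iso_weeks_in_year(year: int) -> int:
    """Number of ISO 8601 weeks (52 or 53) in a calendar year."""
    # 28 December always falls in the last ISO week of its year.
    return datetime.date(year, 12, 28).isocalendar()[1]


def season_position(year: int, week: int) -> tuple[int, int]:
    """Map an ISO (year, week) to (season start year, 0-based position in the season).

    Hypothetical convention: a season runs from week 27 of its start year
    to week 26 of the following year, and position 0 is week 27.
    """
    if week >= 27:
        return year, week - 27
    start = year - 1
    # Weeks 27 to the last week of the start year occupy the first positions.
    return start, iso_weeks_in_year(start) - 27 + week
```

For example, week 1 of 2010 maps to position 27 of the 2009 season, one slot later than it would in a 52-week year, because 2009 had 53 ISO weeks.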
Of the 32 graphs, 26 each corresponded to one of the French influenza seasons from 1985 to 2011, and six repeated some of these seasons (Figure 2) in order to measure within-expert variation. Five of the six duplicates concerned seasons (hereafter partial duplicate seasons) that were presented twice at each round but with different information: once with both ILI incidence and virological data, and once with ILI incidence data only. The sixth concerned a season (hereafter the full duplicate season) that was presented twice at each round, both times with incidence and virological data, as an internal quality-control tool.
The full and partial duplicate seasons were chosen at random (the same seasons for all experts and all rounds). The order of the 32 graphs was randomized for each expert and each round, except for the duplicate seasons, whose positions were non-consecutive, selected at random, and fixed for all experts and all rounds so that duplicates never appeared next to each other. To prevent the exceptional 2009–2010 influenza curve (H1N1 pandemic) from influencing the assessment of the other epidemic curves, the corresponding graph was always presented last.
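The ordering rule above (random order per expert and round, duplicates pinned to pre-chosen slots, 2009–2010 always last) can be sketched as follows. The authors report using R for their analyses and do not describe their implementation; this Python version, its integer graph ids, and the per-expert `random.Random` seeding are hypothetical. Choosing non-consecutive slots for the duplicates is assumed to happen beforehand.

```python
import random
from typing import Sequence


def presentation_order(graph_ids: Sequence[int],
                       fixed_positions: dict[int, int],
                       last_id: int,
                       rng: random.Random) -> list[int]:
    """Randomize graph order while pinning some graphs to fixed slots.

    fixed_positions maps a graph id to its 0-based slot (identical for every
    expert and round); last_id is always shown in the final slot.
    """
    n = len(graph_ids)
    slots = list(fixed_positions.values())
    if len(set(slots)) != len(slots) or any(not 0 <= s < n - 1 for s in slots):
        raise ValueError("fixed slots must be distinct and precede the last slot")
    order: list = [None] * n
    for gid, pos in fixed_positions.items():
        order[pos] = gid
    order[n - 1] = last_id
    # Every remaining graph is shuffled into the free slots.
    free_ids = [g for g in graph_ids if g not in fixed_positions and g != last_id]
    rng.shuffle(free_ids)
    free_slots = [i for i, g in enumerate(order) if g is None]
    for slot, gid in zip(free_slots, free_ids):
        order[slot] = gid
    return order
```

Seeding one `random.Random` per (expert, round) pair would make each questionnaire reproducible while still differing between experts and rounds.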
Finally, to collect participants' self-assessed level of expertise and their specialty, we added two questions at the end of the first round (Figure 2): “How would you rate your level of expertise in influenza outbreak detection?” (answers from “1 - Low level of expertise” to “5 - High level of expertise”) and “What is your main occupation?” (multiple-choice question: Epidemiologist/Virologist/Modelling specialist/Clinician/Other). The mean level of expertise of the experts who participated in the entire study was compared with that of the experts who participated only partially, using the non-parametric Mann–Whitney test.
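With 5-point ratings, ties are frequent, so the Mann–Whitney test is usually computed with average ranks and a tie-corrected normal approximation. The study used R; the stdlib-only Python sketch below is an illustration of that standard procedure, not the authors' code. It omits the continuity correction that R's `wilcox.test` applies by default, so its p-values can differ slightly.

```python
import math
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Return (U statistic for x, two-sided p-value), normal approximation with tie correction."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0  # sum of t^3 - t over groups of tied values
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))
```

A call such as `mann_whitney_u(full_ratings, partial_ratings)` would compare the two groups' ratings; the variable names are placeholders.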
Number of rounds and stopping criterion
The first round began at the end of May 2012; the third and last round ended in August 2012. A graph was no longer presented at the next round once the answers collected for it at a given round were stable compared with the previous round. We considered stability reached when at least 75% of the experts (hereafter stable experts) had moved neither the start nor the end date of the epidemic period by more than one week. This stopping criterion was applied to each graph separately, and the study ended when it had been reached for every graph.
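The per-graph stopping criterion can be written as a short check. This is a hypothetical Python sketch (the study used R): it assumes answers are expressed as integer week positions and that the two answer lists are aligned by expert, containing only experts who answered both rounds; the text does not say how non-respondents were handled.

```python
from typing import Sequence

Answer = tuple[int, int]  # (start week, end week) as slider positions


def graph_is_stable(previous: Sequence[Answer], current: Sequence[Answer],
                    max_shift: int = 1, threshold: float = 0.75) -> bool:
    """True if at least `threshold` of experts moved neither bound by more than `max_shift` weeks."""
    if len(previous) != len(current) or not current:
        raise ValueError("expected non-empty, expert-aligned answer lists")
    stable_experts = sum(
        1
        for (s0, e0), (s1, e1) in zip(previous, current)
        if abs(s1 - s0) <= max_shift and abs(e1 - e0) <= max_shift
    )
    return stable_experts / len(current) >= threshold


def study_finished(last_two_rounds: dict[str, tuple[Sequence[Answer], Sequence[Answer]]]) -> bool:
    """The study ends once every graph meets the criterion on its last two rounds."""
    return all(graph_is_stable(prev, curr) for prev, curr in last_two_rounds.values())
```

Note that the comparison is `>=`, so exactly 75% stable experts is enough, matching "at least 75%".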
Determination of the start and end of influenza epidemics and definition of the level of consensus
As previously proposed [14], because extreme values may correspond to erroneous answers, we discarded, for each graph at each round, the 5% lowest and 5% highest values. For the remaining 90% of answers, we calculated the mode (most frequently given answer), the median, the range and the interquartile range (IQR) of each bound (start and end of each epidemic period). The mode and the median were used to estimate the central tendency of the panel's answers, while the IQR was used to measure the variability across the panel.
The mode was considered the most informative statistic and was used to define the timing of each influenza outbreak. When there were several modes, the one closest to the median was chosen. The main analysis defining the period of each influenza outbreak concerned the 26 initial graphs; for each partial duplicate season, the graph including virological data was the one used in the main analysis.
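The trimming and summary step can be sketched in Python as below. The text does not say how 5% is rounded to a whole number of answers; the sketch assumes rounding down (so nothing is trimmed with fewer than 20 answers). For the IQR it uses `statistics.quantiles(..., method="inclusive")`, which matches R's default quantile definition (type 7); whether the authors used that default is an assumption.

```python
import statistics
from typing import Sequence


def trim(values: Sequence[int], percent: int = 5) -> list[int]:
    """Drop the `percent`% lowest and highest answers, rounding the count down."""
    data = sorted(values)
    k = len(data) * percent // 100  # integer arithmetic avoids float rounding
    return data[k:len(data) - k]


def summarize(values: Sequence[int]) -> dict[str, float]:
    """Median, range and IQR of one bound after trimming (needs at least 2 kept answers)."""
    data = trim(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
    }
```

With 20 answers numbered 1 to 20, one answer is dropped at each end, leaving 2 to 19, whose median is 10.5, range 17 and IQR 8.5.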
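The mode selection rule, including the tie-break toward the median, can be sketched as follows, applied to the already-trimmed answers for one bound. The text does not cover the case of two modes equally close to the median; choosing the earlier week there is a hypothetical convention of this Python sketch.

```python
import statistics
from collections import Counter
from typing import Sequence


def panel_mode(values: Sequence[int]) -> int:
    """Most frequent answer; among several modes, the one closest to the median."""
    counts = Counter(values)
    top = max(counts.values())
    modes = [v for v, c in counts.items() if c == top]
    median = statistics.median(values)
    # Hypothetical tie-break when two modes are equally close: the earlier week.
    return min(modes, key=lambda v: (abs(v - median), v))
```

For answers 10, 10, 12, 12 and 13, weeks 10 and 12 are both modes; the median is 12, so week 12 is kept.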
To define consensus among respondents, we used the cut-offs previously used by Norder et al. [18] and defined three levels of consensus according to the percentage of experts whose answer differed from the mode by no more than one week: high when this percentage was at least 75%, medium when it was between 50% and 75%, and low when it was below 50%. To evaluate the coherence of each expert's answers, we computed, for each expert, the differences between the modes and his or her individual answers, and then the standard deviation (SD) of these differences.
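Both measures in the preceding paragraph reduce to a few lines. In this hypothetical Python sketch, exactly 50% counts as medium (since low is defined as below 50%), the consensus share is computed over all answers given for the graph (the text does not say whether trimmed answers are excluded), and the SD is the sample standard deviation, as returned by R's `sd`.

```python
import statistics
from typing import Sequence


def consensus_level(answers: Sequence[int], mode: int, tolerance: int = 1) -> str:
    """High/medium/low consensus from the share of answers within `tolerance` weeks of the mode."""
    share = sum(1 for a in answers if abs(a - mode) <= tolerance) / len(answers)
    if share >= 0.75:
        return "high"
    if share >= 0.50:
        return "medium"
    return "low"


def expert_coherence_sd(answers: Sequence[int], modes: Sequence[int]) -> float:
    """Sample SD of one expert's deviations (answer - mode) across graphs and bounds."""
    if len(answers) != len(modes):
        raise ValueError("answers and modes must be aligned")
    return statistics.stdev([a - m for a, m in zip(answers, modes)])
```

A small SD indicates an expert who deviates from the panel by a roughly constant amount, even if that amount is not zero.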
Reproducibility and influence of virological data
We compared the answers given to the two copies of each duplicate season (mode, median, range, IQR). For the full duplicate season, we examined, for each expert, whether the same answers were given for both graphs. For the partial duplicate seasons, we compared the level of consensus achieved on the graphs with ILI incidence and virological data with that achieved on the graphs with ILI incidence data only.
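For the full duplicate season, the within-expert check can be summarized as the share of experts whose two answers match. This Python sketch assumes "the same answers" means identical start and end weeks on both copies, with answer lists aligned by expert; both are assumptions, and the study itself used R.

```python
from typing import Sequence

Answer = tuple[int, int]  # (start week, end week)


def identical_answer_share(copy_a: Sequence[Answer], copy_b: Sequence[Answer]) -> float:
    """Share of experts giving exactly the same (start, end) answer on both copies."""
    if len(copy_a) != len(copy_b) or not copy_a:
        raise ValueError("expected non-empty, expert-aligned answer lists")
    same = sum(1 for a, b in zip(copy_a, copy_b) if a == b)
    return same / len(copy_a)


def per_expert_shifts(copy_a: Sequence[Answer], copy_b: Sequence[Answer]) -> list[Answer]:
    """Absolute differences in weeks between the two copies, per expert: (|d start|, |d end|)."""
    return [(abs(a[0] - b[0]), abs(a[1] - b[1])) for a, b in zip(copy_a, copy_b)]
```

Reporting the shifts alongside the exact-match share distinguishes experts who differ by one week from those who answer the two copies very differently.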
All analyses were performed with the R software, version 2.8.1 [19].