BMC Medical Informatics and Decision Making
EURISWEB – Web-based Epidemiological Surveillance of Antibiotic-resistant Pneumococci in Day Care Centers

Background: EURIS (European Resistance Intervention Study) was launched as a multinational study in September 2000 to identify the many complex risk factors that contribute to the high carriage rate of drug resistant Streptococcus pneumoniae strains in children attending Day Care Centers in several European countries. Access to the very large volume of data required the development of a web-based infrastructure – EURISWEB – that includes a relational online database, coupled with a query system for data retrieval, and allows integrative storage of demographic, clinical and molecular biology data generated in EURIS.

far more demanding than simple queries, eventually including artificial intelligence predictive models.

Background
Social forces that produced Day Care Centers (DCCs) for preschool age children in many developed countries have, ironically, also created in these structures one of the major ecological reservoirs of drug resistant strains of Streptococcus pneumoniae, which spread globally and began to create serious complications in the chemotherapy of diseases caused by this dangerous pathogen [1][2][3]. Day Care Centers bring into close physical proximity children of an age group that is characterized by a high rate of carriage of S. pneumoniae, an immature immune system and frequent viral and bacterial respiratory tract infections, leading to extensive use of antimicrobial agents, which provides a powerful selective milieu for the emergence of resistant strains [4][5][6][7]. The best evidence that such strains can cause both pediatric and adult disease came from molecular epidemiological studies, which demonstrated that resistant clones of S. pneumoniae most frequently identified in disease [8,9] were also the ones frequently carried in the nasopharynx of healthy children in DCCs [10][11][12].
If DCCs are ecological reservoirs of resistant S. pneumoniae, then a reduction in the rate of carriage of such strains in DCCs should also have an impact on the frequency of infections caused by resistant pneumococci. Testing the efficacy of such a novel strategy was the purpose of the multinational initiative EURIS (European Resistance Intervention Study: Reducing Resistance in Respiratory Tract Pathogens in Children) [13], launched by the European Community in September 2000 and running until 2003. Investigators from four countries (France, Iceland, Portugal and Sweden), supported by scientists from Germany and the USA, joined forces to test, in carefully controlled studies, the effect of a variety of intervention methods (e.g. reduction in drug prescriptions; changing antibiotic dosing; improving hygienic conditions in DCCs etc.) on the frequency of nasopharyngeal colonization by resistant pneumococci.
The structure of EURIS is composed of four centers where strain collections and interventions are carried out: Portugal (Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa); Iceland (Landspitali University Hospital); Sweden (Swedish Institute for Infectious Diseases Control); France (Institut National de la Santé et de la Recherche Médicale). However, EURISWEB represents data generated in only three of the four collection centers: Portugal, Iceland and Sweden, the three countries in which the timing, the age groups of the children and the methods used were fully harmonized. The French initiative, while addressing the same issues, was not directly comparable, as it involved different age groups, a different mode of sampling, and schools rather than Day Care Centers. Therefore the French data were deposited in a different database. Four additional collaborating units assist as reference centers for the harmonization of methods in clinical microbiology (Iceland: Landspitali University Hospital); molecular epidemiology of antibiotic resistance genes and clones (USA: Laboratory of Microbiology, The Rockefeller University; Germany: University of Kaiserslautern); and data management and mathematical modeling of epidemiological aspects of EURIS (Portugal: Instituto de Biologia Experimental e Tecnológica).
The risk factors, i.e. the nature and number of the factors that influence the rate of carriage of drug resistant S. pneumoniae in preschool age children and their quantitative contribution to the degree of colonization, are not well understood. Furthermore, major risk factors for nasopharyngeal colonization may differ significantly from one setting to another [14][15][16], which makes analysis of data generated by a multinational study like EURIS more complex. The evaluation and comparison of such massive amounts of surveillance data necessitated the construction of a computerized infrastructure organized in such a manner that it would assure not only data storage and retrieval but also an eventual bioinformatics analysis. The purpose of this communication is to describe such a web-based infrastructure specially designed to fit the purposes of EURIS: EURISWEB.
Several potentially conflicting attributes had to be accommodated in the design of such an infrastructure. On the one hand, it was to provide full integration of data from different countries in a common normalized repository, fully accessible to all EURIS participants. On the other hand, it was also supposed to exhibit the properties of a local database with full separation between the countries involved. Finally, it was anticipated that EURISWEB would be made available at a later stage for wider usage in research and public health management, with steep requirements of stability, scalability, security, user-friendly access, low cost portability, and transparent implementation for subsequent independent development. In the design of EURISWEB we took into account the multiple goals of such a web-based infrastructure, which now includes a relational online database coupled with data retrieval and analysis tools, where registered users can access data and tools by using a personal login name and password through a standard web browser. Ultimately three of the four centers (Iceland, Portugal and Sweden), in which the nature of the pediatric population and mode of sampling were most comparable, chose to combine all data for deposition in EURISWEB, which now covers a large number of participant institutions (16 DCCs in Portugal, 30 DCCs in Iceland and 25 DCCs in Sweden) and a wide variety of data sources, including demographic and socio-economic factors; clinical data; patterns and types of drug use and drug prescription; and microbiological data on the antibiotypes and serotypes as well as molecular types of the pneumococcal isolates and DNA fingerprints of resistance genes.

Availability
A demo version of EURISWEB is available to the general public [17], accessible with username euris and password welcome. Those who wish to receive the e-mails sent by the query agent (see User-Friendly Query System) should request a personal account from the authors. As any modifications applied to the current implementation of the infrastructure will automatically be reflected in the demo version, new features may already be apparent when compared with the illustrations and examples used in this manuscript.

Data and data acquisition
The diverse surveillance data (demographic, clinical, microbiological etc.) generated in project EURIS are used to fill five different types of Questionnaires which serve as the source of information to be introduced into the EURISWEB database. The relationship between the five questionnaires is described and illustrated later in this report (see Database structure and Database tables versus online forms). Typically each site will update surveillance information at least once per year. Questionnaires 1 and 2 are provided by the staff of each participating DCC.
• Questionnaire 1 contains information regarding physical features of the DCC (address, number of rooms and windows, area inside and outside the facility, number of children and staff, hygiene protocols and practice); see Figure 1.
• Questionnaires 2 provide the same type of information for each room (also referred to as "unit") in the particular DCC; see Figure 1.
• Questionnaires 3 are filled at least once every year by the parents of the children. They contain demographic information on the household and environment where the child lives, including number and age of siblings, shared bedrooms, and specific conditions such as smoking in the house.
• Questionnaires 4 are filled by the parents just prior to each strain collection. They provide information on antibiotic consumption prior to sampling (type of antibiotic, taken when and for how long). Also provided are data on illness and hospitalizations of the child.
• Questionnaires 5 are filled by the participating microbiology and molecular biology laboratories. They contain characterization of each S. pneumoniae isolate for serotype; antibiotype (susceptibility to oxacillin, chloramphenicol, erythromycin, clindamycin, tetracycline, sulfamethoxazole-trimethoprim, and levofloxacin); MIC values for penicillin and ceftriaxone; molecular type by PFGE (Pulsed-Field Gel Electrophoresis) and MLST (Multilocus Sequence Typing) (for selected isolates); DNA probes for antibiotic resistance factors; and RFLP (Restriction Fragment Length Polymorphism) for pbp (penicillin binding protein) genes of selected penicillin resistant isolates. All data in questionnaire 5 are obtained by common harmonized methods.
Figure 1. Data entry forms. Data entry forms for the DCC and room (unit within the DCC) questionnaires. Upon pressing the Insert button, the validation procedure checks the data and either inserts it in the database or informs the user about errors in the data.

Database conception
Although there was an effort towards the normalization of data acquisition taking place in the different participant countries, the questionnaires delivered to the DCCs and to the parents contain various questions that reflect realities specific to the country involved. Accordingly, some questions only appear in the questionnaires of some countries. Also, the frequency with which updated information is collected differs between countries. Since discarding data was to be avoided at all costs in order not to confront local practices, the normalization process had to be extended to database conception itself. Instead of designing an optimal database structure for each country, an iterative consulting process was followed for nearly a year to produce a normalized database structure that fits the reality presented by all the countries involved. The final structure of the EURISWEB database accommodates both country specificity and common European health management practices.

Data retrieval
Besides providing comprehensive data storage, the web-based data management infrastructure must also allow easy querying and retrieval of the data it contains. Since some of the retrieval requests may generate large amounts of data, or may require intensive computation, the requests are processed as background processes managed by a software agent that sends e-mails to the user with information on the execution state of each request and, finally, a link to the completed report. User-friendliness was the primary concern in building the interface available to make these requests, with the current version reflecting extensive user feedback.

Software and Hardware
All the software used to implement this infrastructure is Open Source and is provided under public license.

Database structure
The basic internal structure of the database consists of 9 tables with an average of 14 fields per table. Figure 2 shows a simple model of this structure, where boxes represent tables and lines represent the relations between them. There is only one type of relation in this structure, which is "one-to-many", the "one" side being represented by the single line and the "many" side being represented by the forked line. A one-to-many relation between two tables means that one record from one table can be associated with several records from the other table. For example, one DCC can be associated with several rooms (units) in the same DCC; one unit can be associated with several children; and one child can be associated with several siblings.
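The one-to-many chain described above can be sketched with a few tables. The following is a minimal illustration using Python's built-in sqlite3 module, not the EURISWEB schema itself: the table names, column names and sample codes are assumptions made for the example, and only three of the nine tables are imitated.

```python
import sqlite3

# Hypothetical, heavily reduced schema: table and column names are
# illustrative assumptions, not the actual EURISWEB definitions.
SCHEMA = """
CREATE TABLE dcc (
    dcc_id   INTEGER PRIMARY KEY,
    code     TEXT NOT NULL
);
CREATE TABLE unit (
    unit_id  INTEGER PRIMARY KEY,
    dcc_id   INTEGER NOT NULL REFERENCES dcc(dcc_id),   -- many units per DCC
    code     TEXT NOT NULL
);
CREATE TABLE child (
    child_id INTEGER PRIMARY KEY,
    unit_id  INTEGER NOT NULL REFERENCES unit(unit_id), -- many children per unit
    code     TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with one DCC, two units and three children."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the relations
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dcc VALUES (1, 'DCC-A')")
    conn.executemany("INSERT INTO unit VALUES (?, ?, ?)",
                     [(1, 1, "U1"), (2, 1, "U2")])
    conn.executemany("INSERT INTO child VALUES (?, ?, ?)",
                     [(1, 1, "C1"), (2, 1, "C2"), (3, 2, "C3")])
    conn.commit()
    return conn


def children_per_unit(conn, dcc_id):
    """Follow the one-to-many relations from a DCC down to its children."""
    rows = conn.execute(
        """SELECT unit.code, COUNT(child.child_id)
           FROM unit LEFT JOIN child ON child.unit_id = unit.unit_id
           WHERE unit.dcc_id = ?
           GROUP BY unit.unit_id
           ORDER BY unit.code""",
        (dcc_id,),
    ).fetchall()
    return dict(rows)
```

Calling children_per_unit(build_demo_db(), 1) walks from the "one" side (the DCC) to the "many" sides (units, then children) and returns the number of children in each unit.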
The description illustrated in Figure 2 is country specific. A separate set of tables was defined for each of the three participant countries (Portugal, Iceland and Sweden), all inside the same database, but not formally connected to each other. Although the questionnaires for the different countries have significant differences, as some countries may lack many fields or even whole tables of this structure, the critical feature is that all the common fields can be found in exactly the same location in each country-specific structure. Equally critical, the key fields are obligatorily shared by all countries, a feature that can only be easily achieved if a common host infrastructure is in place, which is the case in EURISWEB. Because the frequency with which updated information is collected differs between countries (see Data and data acquisition, and Database conception), many of the key fields are related to the specification of the sampling periods, playing an important role as temporal normalization features. Once the access restrictions are lowered, the conservation of ontology and structure enables intersection between country-specific structures to produce comprehensive data sets jointly describing epidemiological data, which are valid for all the participating countries. Furthermore, because all countries also share the same data retrieval system (see User-Friendly Query System), queries already built by different countries produce compatible results that can be promptly joined after removing the country-specific fields.
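Joining results from different countries after removing the country-specific fields can be pictured as follows. This is a small sketch under stated assumptions: the function names, the field names (sampling_period, serotype, smoking_at_home) and the representation of query results as lists of dictionaries are all hypothetical and chosen only for illustration.

```python
def common_fields(*field_lists):
    """Fields present in every country's result, in the order of the first list."""
    shared = set(field_lists[0]).intersection(*field_lists[1:])
    return [name for name in field_lists[0] if name in shared]


def merge_country_rows(rows_by_country):
    """Combine per-country result rows, keeping only the fields all countries share.

    rows_by_country maps a country name to a list of dicts (one dict per record).
    Each merged record is tagged with the country it came from.
    """
    field_lists = [list(rows[0].keys()) for rows in rows_by_country.values() if rows]
    if not field_lists:
        return []
    shared = common_fields(*field_lists)
    merged = []
    for country, rows in rows_by_country.items():
        for row in rows:
            record = {"country": country}
            record.update({name: row[name] for name in shared})
            merged.append(record)
    return merged
```

In this sketch a field such as smoking_at_home that is recorded in only one country is dropped from the joint data set, while the shared key fields (here the sampling period) are kept for every record.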

Online interface
The interface between the database and the users is made of standard HTML pages (no external applications, or "plugins", are needed on the client side). Data entry is performed through five online forms that mimic the original paper questionnaires, to facilitate the insertion task (see examples of two forms in Figure 1). All data entered in the forms are submitted to online validation procedures before entering the database, thus avoiding some of the most common user errors that may cause integrity or consistency violations in the database. Upon pressing the Insert button to submit data, the user is promptly informed of all mistakes and given a chance to resolve them on the same page (example in Figure 3). Only after passing all the checks are the data effectively inserted in the database and fitted into the respective internal data structure (see Database structure). Searching and visualizing data can be done on a record-by-record basis, using the same five-form format, or by browsing a table that shows several records at the same time (example in Figure 4). Some simple statistics can also be requested online. For convenience, most of the tables presented can be directly viewed or saved in Excel format.
Data retrieval requests can also be made by filling in a simple online form in which the amount of typing required is kept to a minimum (see User-Friendly Query System). The results can be viewed and downloaded in delimited text format, also readily importable into Excel.
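The validation step, in which every mistake is reported at once rather than one at a time, might look like the sketch below. It is written in Python for illustration only (the system itself is described as PHP-based), and the field names, the date format and the sibling limit are assumptions, not the checks actually used in EURISWEB.

```python
from datetime import date


def validate_child_record(form):
    """Check a submitted child form and return every error found, keyed by field.

    An empty dict means the record passed all checks and may be inserted.
    Field names and limits are hypothetical.
    """
    errors = {}

    if not form.get("child_code", "").strip():
        errors["child_code"] = "required"

    try:
        birth = date.fromisoformat(form.get("birth_date", ""))
    except ValueError:
        errors["birth_date"] = "expected a date in YYYY-MM-DD format"
    else:
        if birth > date.today():
            errors["birth_date"] = "cannot be in the future"

    siblings = form.get("siblings", "")
    if not siblings.isdigit() or int(siblings) > 20:
        errors["siblings"] = "expected a whole number from 0 to 20"

    return errors
```

Collecting the errors in a dictionary lets the form be redisplayed with a warning next to each offending field, so the user can correct everything on the same page before resubmitting.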

Database tables versus online forms
The relationship between the internal database structure and the set of online forms available to the user is not a one-to-one association. Behind each form there can be more than one table, as shown in Figure 5. Although the mimicking of the original questionnaires by the online forms is meant to facilitate the user's adaptation to the data insertion and visualization, that is not the optimal data organization in a relational database. For example, the repeated set of questions about each antibiotic taken prior to sampling (see Data and data acquisition) should not result in a repeated set of fields in the same database table (table QUESTIONNAIRE, see Figure 2). Instead, each set of questions constitutes a row of fields in a different table (table ANTIBIOTICS, same figure).
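The mapping of one form onto two tables can be illustrated as follows. In this sketch, built on Python's sqlite3 module, a single submitted questionnaire with a repeated block of antibiotic questions becomes one row in a questionnaire table plus one row per antibiotic in a separate table. The column names are assumptions; only the two table roles (QUESTIONNAIRE and ANTIBIOTICS) come from the text.

```python
import sqlite3


def create_tables(conn):
    """Create reduced, hypothetical versions of the two tables behind one form."""
    conn.executescript("""
        CREATE TABLE questionnaire (
            questionnaire_id INTEGER PRIMARY KEY,
            child_code       TEXT NOT NULL,
            sampling_date    TEXT NOT NULL
        );
        CREATE TABLE antibiotics (
            antibiotic_id    INTEGER PRIMARY KEY,
            questionnaire_id INTEGER NOT NULL
                             REFERENCES questionnaire(questionnaire_id),
            drug             TEXT NOT NULL,
            days             INTEGER NOT NULL
        );
    """)


def save_questionnaire(conn, child_code, sampling_date, antibiotics):
    """Store one form submission: one questionnaire row, one row per antibiotic."""
    cursor = conn.execute(
        "INSERT INTO questionnaire (child_code, sampling_date) VALUES (?, ?)",
        (child_code, sampling_date),
    )
    questionnaire_id = cursor.lastrowid
    conn.executemany(
        "INSERT INTO antibiotics (questionnaire_id, drug, days) VALUES (?, ?, ?)",
        [(questionnaire_id, item["drug"], item["days"]) for item in antibiotics],
    )
    conn.commit()
    return questionnaire_id
```

With this layout a child who took no antibiotics simply has no rows in the second table, and a child who took several needs no extra columns in the first.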

Operational model
The operational model of the database interface is depicted in Figure 6, where the arrows represent flow of information between the various entities. The five online forms for record insertion and visualization, available to the user, are all built with the same general procedure (PHP engine). This program, written in PHP, reads files that contain all the information regarding the layout of the forms (layout files), designs the forms and manages all the interactions between the users and the database. Each layout file describes a form (for all countries) and consists of a few lines written in a subset of the PHP language, which indicate each field's properties, such as whether it is a numeric or Boolean field, a date or time field, and what range and type of values are allowed. This program and the subset of PHP used to define the layout files are the core of the surveillance system reported here. Accordingly, to alter an existing form, or generate a new one, all the database manager has to do is update or build a layout file.
The layout files also include the description of the connection between the form fields and the actual database fields. This information must be in accordance with the internal database structure, which is managed by SQL (Structured Query Language) code also stored in files (structure files). Therefore, the database manager will need to keep the layout files consistent with any changes in the structure files required by modifications in the online interface. These two simple tasks ensure both the automatic construction of personalized forms, together with online validation check procedures, and a smooth linkage between them and the internal structure of the database.
Figure 2. Database structure. Boxes represent tables and lines represent the relations between them. There is a similar set of tables for each country. There is only one type of relation in this structure, which is "one-to-many". The single line represents the "one" side and the split line represents the "many" side. A one-to-many relation between two tables signifies that one record from one table can be associated with several records from the other table.
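The idea of a layout file that drives form construction can be sketched as a declarative list of field descriptions from which the HTML form and the form-to-database mapping are both derived. The example below is in Python rather than the PHP subset used by EURISWEB, and every key, field name and column name in it is a hypothetical stand-in for whatever the real layout files contain.

```python
from html import escape

# Hypothetical layout for part of the DCC form. Each entry names the form
# field, its label, its type and allowed range, and the database column
# it is linked to.
DCC_LAYOUT = [
    {"name": "dcc_code", "label": "DCC code", "type": "text",
     "column": "dcc.code"},
    {"name": "rooms", "label": "Number of rooms", "type": "integer",
     "min": 1, "max": 50, "column": "dcc.rooms"},
    {"name": "outdoor_area", "label": "Has outdoor area", "type": "boolean",
     "column": "dcc.outdoor_area"},
]


def render_field(field):
    """Turn one layout entry into an HTML control wrapped in its label."""
    name = escape(field["name"], quote=True)
    label = escape(field["label"])
    if field["type"] == "boolean":
        control = f'<input type="checkbox" name="{name}">'
    elif field["type"] == "integer":
        control = (f'<input type="number" name="{name}" '
                   f'min="{field["min"]}" max="{field["max"]}">')
    else:
        control = f'<input type="text" name="{name}">'
    return f"<label>{label} {control}</label>"


def render_form(layout):
    """Build the whole form from the layout, ending with the Insert button."""
    lines = ['<form method="post">']
    lines.extend(render_field(field) for field in layout)
    lines.append('<input type="submit" value="Insert">')
    lines.append("</form>")
    return "\n".join(lines)


def column_map(layout):
    """Link between form fields and database fields, as stored in the layout."""
    return {field["name"]: field["column"] for field in layout}
```

Under this arrangement, adding a field to a form means adding one entry to the layout (and the matching column to the structure definition), with no change to the rendering code.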

User-Friendly Query System
Although SQL is the standard way to access data stored in a database, using it requires some prior knowledge and experience from the user. The User-Friendly Query System, available to all the EURISWEB users, is an interface that facilitates query construction in order to make the wide range of possibilities offered by SQL amenable to the untrained user. The users are presented with a series of selection boxes where they can select the fields they want to see, the restrictions they want to apply to the records returned, and how the returned records are to be grouped (Figure 7). The chosen options are then transformed into actual SQL formatted statements that are sent to the query management agent, through the PHP engine, as shown in Figure 8. The arrows in the figure represent flow of information between the various entities (see Figure 9 for the whole operational model).
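The translation from selection boxes to SQL can be illustrated with a small query builder. This is a sketch only: the table name, field names and operator list are assumptions, and the real system performs this step in its PHP engine. Restriction values are passed as bound parameters, and table and field names are checked against fixed lists before being placed in the statement.

```python
# Hypothetical names offered by the selection boxes.
ALLOWED_TABLES = {"microbiology"}
ALLOWED_FIELDS = {"serotype", "sampling_period", "penicillin_mic", "unit_code"}
ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}


def build_query(table, fields, restrictions=(), group_by=()):
    """Translate the user's selections into an SQL statement plus parameters.

    fields       -- columns to show
    restrictions -- (field, operator, value) triples, combined with AND
    group_by     -- columns to group by; when given, a count per group is
                    returned instead of the individual records
    """
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table}")
    names = list(fields) + list(group_by) + [r[0] for r in restrictions]
    unknown = [name for name in names if name not in ALLOWED_FIELDS]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")

    columns = list(fields)
    if group_by:
        columns = list(group_by) + ["COUNT(*) AS n"]
    sql = f"SELECT {', '.join(columns)} FROM {table}"

    params = []
    if restrictions:
        clauses = []
        for field, operator, value in restrictions:
            if operator not in ALLOWED_OPERATORS:
                raise ValueError(f"unknown operator: {operator}")
            clauses.append(f"{field} {operator} ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)

    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql, params
```

For instance, selecting the serotype field, restricting to one sampling period and grouping by serotype yields a statement that counts isolates per serotype for that period.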
The query agent manages all the requests and runs them exclusively in background, so that high usage rates and complex requests do not interfere with the normal usage of the database interface. The agent interacts with the database and informs the users, by e-mail, of when their requests start being processed and when they finish, including whether or not the query was successfully answered (the interface gives users enough freedom to request impossible things); if it was not, the results presented are an empty text page. For security reasons, the results of queries are never sent by e-mail: they can only be downloaded from the server via an SSL connection.
Users can rerun, edit, or delete saved queries. They can also group queries into reports, so that a single request will yield all the results from the several queries of that report. Furthermore, users can save frequently used restrictions and apply them to other queries. To minimize the time and effort required of the users, we have provided several pre-made queries, already aggregated into several logical reports. This feature may prove particularly useful if standard reporting formats become a regulatory requirement.
Figure 3. Validation checks. Example of data entry form for the child records with error warnings to the user. The user must correct all the errors before being able to insert the record.
Figure 4. Browsing records. Browsing page for microbiology records. Here the user can browse through several records of the same table. Selection of visible fields and ordering by one or two fields is possible. Direct access to individual records is provided by the View, Edit, and Delete buttons.
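The behaviour of the query agent described in this section (requests queued, executed in background, and the user notified when each one starts and finishes) can be sketched with a worker thread and a queue. This is an illustration under assumptions, not the agent's implementation: the class and method names are invented, notifications go to a caller-supplied function standing in for e-mail, and a failed request stores an empty result, mirroring the empty page mentioned above.

```python
import queue
import sqlite3
import threading


class QueryAgent:
    """Runs submitted queries one at a time on a background thread."""

    def __init__(self, db_path, notify):
        self.db_path = db_path
        self.notify = notify          # stand-in for sending an e-mail
        self.results = {}             # request id -> list of rows
        self.jobs = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, request_id, user, sql, params=()):
        """Queue a request and return immediately."""
        self.jobs.put((request_id, user, sql, tuple(params)))

    def wait(self):
        """Block until every queued request has been processed."""
        self.jobs.join()

    def _run(self):
        # The connection is opened inside the worker thread that uses it.
        conn = sqlite3.connect(self.db_path)
        while True:
            request_id, user, sql, params = self.jobs.get()
            self.notify(user, f"request {request_id} started")
            try:
                self.results[request_id] = conn.execute(sql, params).fetchall()
                self.notify(user, f"request {request_id} finished")
            except sqlite3.Error as exc:
                self.results[request_id] = []   # failed: empty result
                self.notify(user, f"request {request_id} failed: {exc}")
            finally:
                self.jobs.task_done()
```

Because submit returns as soon as the request is queued, a slow or impossible query delays only the other queued requests, not the interactive pages.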

Usage
The EURIS online database has been adopted as the data storage standard by three of the EURIS participant countries: Portugal, Iceland, and Sweden. Growing steadily since its launch in February 2001, it now has 24 registered users and contains a total of 213 DCC records, 720 unit records, 10991 children records, 13207 questionnaire records, and 13504 microbiology records, totaling more than 25 megabytes of data. The User-Friendly Query System, available since April 2002, now contains 400 pre-made and 786 user-made queries, aggregated into several reports.

Privacy and security precautions
In EURISWEB, each user registry includes not only the user's login name and password, but also a country identification, which completely blocks access to data belonging to other countries. In fact, by using different table sets for each country, the central database can behave like several different local databases, and users are never aware that their access is restricted to only a subset of the complete system. In the near future, the fact that all countries use the same normalized structure will allow simple queries and complex data analysis to be performed on the common data, as if we were dealing with a single country.
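One way to picture the country separation is a lookup that resolves every logical table name to the table set of the logged-in user's country. The sketch below is an assumption about how such a mapping could work, not a description of the EURISWEB code: the prefixes, the table names and the shape of the user record are all hypothetical.

```python
# Hypothetical prefixes for the three country-specific table sets.
COUNTRY_PREFIX = {"Portugal": "pt", "Iceland": "is", "Sweden": "se"}
LOGICAL_TABLES = {"dcc", "unit", "child", "questionnaire", "microbiology"}


def table_for(user, logical_table):
    """Resolve a logical table name to the user's own country-specific table.

    The country comes from the user's registry entry, never from the request,
    so a user cannot name a table belonging to another country.
    """
    if logical_table not in LOGICAL_TABLES:
        raise ValueError(f"unknown table: {logical_table}")
    prefix = COUNTRY_PREFIX[user["country"]]
    return f"{prefix}_{logical_table}"
```

Since the same logical names exist in every country's table set, the interface code stays identical for all users and only the resolved table differs.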
The user registration process also includes an access level number that defines restrictions for each type of user. Although the system was built anticipating this need, other precautions proved sufficient to monitor and to recover from possible destructive actions. All user inputs are scanned for invalid characters to prevent SQL code injection, and a record of all actions performed in the database is kept, including who did what, and when. Any accidentally deleted record can be promptly restored by the database manager; all the updates a record has undergone since its insertion can be tracked; and many database usage statistics can be easily computed.
Figure 5. Tables versus forms. Relationship between database tables and online forms. The optimal data organization in a relational database may not be agreeable to the user. There may be several database tables behind each online form.
Intrusion by unauthorized parties (hackers) is repelled by the need to log on with login name and password, and subsequent identification of the user with cookies protected by SSL, without which no page is ever shown and no query is run. A brute force attack is also limited by a delay introduced in the password checking cycle and by resource consumption at the server. Additionally, page accesses are monitored on a daily basis. Repeated login attempts would therefore be promptly detected before a sufficiently high number of probes take place. Furthermore, a firewall protects the server from being accessed on ports other than the HTTPS port, and the server software (Apache, PHP, kernel, etc.) is promptly updated if any security breach is detected in the current versions.
In all cases, names of children and DCCs are not kept in the database, instead being replaced by codes and acronyms manually assigned prior to insertion.

Scalability
The operational model described in the Results section is the basis for easy improvements and extensions to the whole infrastructure of EURISWEB. As a consequence of the design described in that section, database management can be fully dealt with by manipulation of the layout and structure files. The core element of this operational functionality is supported by the PHP engine described in the Results section. As a result, both maintenance and development scale well with increasing usage, particularly since availability of high performance hardware and Internet access have ceased to be an issue. It is noteworthy that the layout and structure files are particularly suited for extensions to the current model, including integrating new types of data into the set already stored and accommodating new countries and new country-specificities, while retaining previous accessibility and privacy. The demo version of the database shell and query system, made publicly available (see Availability for URL and login directions), was built by configuring a fictitious new country structure, which was achieved by performing minor modifications in the layout files.
This illustrates that a possibly useful extension to this system would be to allow selected users to manage their own tables and forms by providing a web-based interface to the layout and structure files. These users with management access permissions would not need to know the PHP or SQL languages, but would simply interact with online forms of the same level of complexity as the regular database query forms. In our experience, most of the tasks requested of the database manager are simple and pose no risk whatsoever to the data, like adding an item to a dropdown selection field, resizing a text edit field, or even adding a new field to an already existing table. Given the wide geographic distribution of the EURISWEB users, describing what needs to be done to the database manager is as time consuming as specifying it in such idealized management forms. The database manager could then be left with only the more complex and "dangerous" tasks. Figure 9 shows how the implemented operational model, including the query system, could be configured to greatly reduce the need for low level data management.
Figure 6. Operational model - database. Operational model of the EURISWEB database. The arrows represent flow of information between the various entities. The database manager builds the layout files, used by the PHP engine to build the online forms, and the structure files, used to define the structure of tables and fields in the database. Layout and structure must be in accordance with each other. The PHP engine manages all the interactions between the database and the users.
Figure 7. User-Friendly Query System interface. Interface where the users build their queries, which are then translated into SQL. Saved queries can be rerun, edited, deleted, or grouped into reports. Commonly used sets of restrictions can also be saved and later applied to new queries.

Future directions
The extension of data management to include data analysis is particularly suited for web-based implementations such as EURISWEB, since all computation takes place on the server side. This enables a bioinformatics approach to establish itself alongside data storage. As a consequence, advanced data mining tools such as multivariate statistical analysis or the identification of artificial intelligence predictive models using neural networks [25] and rule extraction by genetic algorithms can be made available alongside the data itself. This is mutually beneficial for usage and development and, on the other
Figure 8. Operational model - query system. Operational model of the EURISWEB query system. The arrows represent flow of information between the various entities. The user makes a request through the interface, which is translated into SQL by the PHP engine and sent to the query management agent. The query agent interacts with the database in background and informs the users, by e-mail, about the state of their requests.
Figure 9. Operational model - the whole picture and hypothetical scenario. Operational model of the whole EURISWEB infrastructure, obtained by joining Figures 6 and 8, plus the hypothetical scenario where the database manager could be replaced by an extension to the PHP engine that would allow privileged users to manage both the layout and structure files through web pages.