This article has Open Peer Review reports available.
Autonomic care platform for optimizing query performance
© Steurbaut et al.; licensee BioMed Central Ltd. 2013
Received: 22 February 2013
Accepted: 16 October 2013
Published: 27 October 2013
As the amount of information in electronic health care systems increases, data operations become more complicated and time-consuming. Intensive Care platforms require timely processing of data retrievals to guarantee the continuous display of recent patient data. Physicians and nurses rely on these data for their decision making. Manual optimization of query executions has become difficult due to the increased number of queries across multiple sources. Hence, more automated management is necessary to increase the performance of database queries. The autonomic computing paradigm promises an approach in which the system adapts itself and acts as a self-managing entity, taking actions while limiting human intervention. Despite the usage of autonomic control loops in network and software systems, this approach has not been applied so far to health information systems.
We extend the COSARA architecture, an infection surveillance and antibiotic management service platform for the Intensive Care Unit (ICU), with self-managed components to increase the performance of data retrievals. We used real-life ICU COSARA queries to analyse slow performance and measure the impact of optimizations. Each day more than 2 million COSARA queries are executed. Three control loops, which monitor the executions and take action, have been proposed: reactive, deliberative and reflective control loops. We focus on improvements of the execution time of microbiology queries directly related to the visual displays of patients’ data on the bedside screens.
The results show that autonomic control loops are beneficial for optimizing data executions in the ICU. The application of the reactive control loop results in a reduction of 8.61% in the average execution time of microbiology results. The combined application of the reactive and deliberative control loops results in an average query time reduction of 10.92%, and the combination of reactive, deliberative and reflective control loops provides a reduction of 13.04%.
We found that a controlled reduction of query executions improves the performance for the end user. The implementation of autonomic control loops in an existing health platform, COSARA, has a positive effect on the timely visualization of data for the physician and nurse.
With the continued growth of clinical support services and data sources, clinical information service platforms are becoming more and more complex. The emergence of medical devices that monitor and collect data at high frequency, the availability of data in numerous databases and the increased use of electronic patient data to support physicians’ clinical decisions demand high-speed data processing. Physicians and nurses trust electronic medical records to evaluate patients’ conditions and to take therapeutic decisions. Slow data retrievals force the physician to wait longer for results on the current state of the patient. Due to the large number of data variables, and hence the high number of database queries, manual maintenance operations are no longer feasible. For example, manually disabling time-consuming non-priority data retrievals in case of high load on the system is difficult. Moreover, in the medical environment the contents of the database are constantly changing with inserts of medical data or updates of existing values from medical devices that monitor the patient at high frequency or analyse the patient’s laboratory samples. Despite system administrators’ efforts to maintain critical health systems, symptoms of data slowdown cannot be detected in time and actions cannot be taken quickly enough to prevent performance decrease or system failure. This leads to a degradation of service quality and availability. Therefore, the manual handling of such slow processes undermines the robustness and performance of the complete system.
Adding autonomic capabilities to the COSARA system
The autonomic computing paradigm aims to develop systems capable of self-management, which make decisions on their own and respond with appropriate actions to system failures or optimization opportunities. This concept is in analogy with the autonomic nervous system, which manages the vital functions in our body without conscious direction. In autonomic computing, an autonomic manager implements control loops in which the managed element and the environment are monitored, data is analyzed, and actions are taken if components are in an undesirable state. It envisions a self-aware software system. In this article, we extend the existing COSARA health care platform with autonomic components. COSARA is an infection surveillance and antibiotic management service platform for the Intensive Care Unit (ICU). We propose extensions to COSARA by introducing multiple autonomic control loops. The reactive control loop takes an immediate action when slow data query executions are detected. In the deliberative control loop, the decision to act is evaluated by an anomaly detection algorithm that detects anomalies in the execution times of data retrievals. Anomalies are also predicted in the reflective control loop by detecting temporal periods with slow performance. A detailed analysis has been performed based on real-life data logs from the COSARA platform in the ICU of Ghent University Hospital. This article is structured as follows. In Section 'Related work’, an overview of autonomic computing architectures is presented and specific models from the health care domain are explored. The problem of managing COSARA data queries is thoroughly explained in Section 'Problem statement’. The extended architecture of the COSARA service platform is presented in Section 'Architecture’. Section 'Design of FOCALE-based control loops in the COSARA architecture’ describes the multiple control loops, which enhance performance of data queries.
This includes a reactive loop, a deliberative loop that takes a decision by executing an anomaly detection algorithm and a reflective control loop that takes a proactive approach by detecting temporal patterns. Subsequently, the optimizations are evaluated in detail in Section 'Results and discussion’. Finally, Section 'Conclusions’ presents the conclusions of this paper.
Although autonomic management has received attention in enterprise-wide network platforms, only a limited number of studies apply autonomic management to health care platforms. In this section, we examine related work in both domains.
Autonomic management in health care
Autonomic computing has already been applied in body area networks in health care. On-body sensors monitor the patient’s vital functions, such as heartbeat, body temperature or electrocardiogram (ECG), in a body area network and transmit the signals to a processing unit. Since this equipment is hard for its developers to maintain, the system should adapt automatically to changes. Telemonitoring applications that continuously monitor patients’ health conditions require the self-management ability that autonomic systems offer. An event service for autonomic management support of e-health systems has been proposed using Self-Managed Cells (SMCs). SMCs are autonomic systems that are able to add or remove components, detect failures of sensors automatically and adapt the system. This has been described as an architectural pattern to provide Autonomic Management of Ubiquitous e-health Systems (AMUSE). The system needs to be self-configuring and self-managing with limited user interaction and autonomously adapts to changes in user activity, device failure and service addition. The SMC consists of an event bus for communication between devices and management services, a discovery service and a policy service. The policy service specifies the adaptation strategy (adaptation, authorization policies and event-condition-action rules), whereas the discovery service implements the protocol to search for and integrate new devices in the SMC and maintains the connections. Changes in the environment are indicated by events, which trigger policies in the policy service and hence perform the action. In the publish-subscribe mechanism used, messages are published on the event bus and delivered to subscribers, instead of being delivered directly. In the VESTA system, the AMUSE system is extended with security support and policy management for authentication and access control.
An autonomic model for the management of health care applications has been presented that adopts the MAPE control loop. This control loop consists of monitor, analyse, plan and execute phases and interacts with a knowledge layer. The model has been used to assure process quality of the medical information system and to supervise the compliance of medical decisions with protocols. It has been applied to the treatment planning of diabetes. The prediction service in this system, which predicts the patient’s diagnosis using multiple regression, is implemented as a web service. Autonomic computing has also been applied in the hospital’s emergency department to maintain optimal quality of service and optimize the performance of operations. These departments suffer from a high workload due to an increased demand on health resources and limited clinical staff. Sensors monitor the state of the environment (for example by using optical sensors, radio-frequency identifiers (RFIDs) and counters for people and workload). However, related work on autonomic health care mainly concentrates on architectural models. To the authors’ best knowledge, no previous studies have designed, implemented and evaluated autonomic control loops in the intensive care unit with the aim of increasing the performance of data retrievals.
Autonomic architectures have been applied in industry systems to find early indications of failures and to investigate fault causes. The MAGNETO project (2010) focuses on probabilistic fault diagnosis to find the cause of service problems, such as service degradation and service breakdowns, in home area networks. The causes of network failures and observed network variables are modeled in a Bayesian network, which can infer the probability of the cause of a service failure. Several initiatives for building autonomic network architectures have been investigated, including hierarchical architectures, flat autonomic architectures and self-organizing networks. One of the hierarchical architectures is the Autonomic Internet project (AutoI), which deals with autonomic management for the future internet, applied to the management of virtual resources. The Component-ware for Autonomic, Situation-aware Communications, and Dynamically Adaptable Services (CASCADAS) project deals with the development of an autonomic framework for creating, executing, and provisioning situation-aware and dynamically adaptable communication services. An anomaly detection framework has been proposed to analyze and detect anomalies in runtime data of cloud systems by applying (i) data transformation, (ii) feature selection and (iii) outlier detection. Anomalies or outliers are patterns in data that do not conform to a well-defined notion of normal behavior. Detection techniques have been developed to find these patterns, which often represent exceptions, indications of system failure or interesting data that should lead to actions. Anomaly detection has been used in a variety of domains, such as credit card fraud detection, fault detection in safety-critical systems, insurance, health care and military surveillance, using a diversity of techniques such as statistical methods, data mining and machine learning. Rabatel et al.
addressed the problem of maintaining complex systems through preventive maintenance, which detects abnormal behavior by collecting and analysing sensor data, and found that these anomalies may lead to failure. In our case, we want to detect low data query performance.
The major advantage of the FOCALE cognitive control loop approach described above is the high variety in pro-activeness it offers. As it consists of multiple control loops with different characteristics (reactive, deliberative and reflective), urgent tasks can be handled by a less complex control loop (e.g., a reactive variant) and iteratively improved later on by a more complex control loop (i.e., a deliberative or reflective control loop). The FOCALE cognitive control loops have mainly been implemented in the area of network and service management. In previous work, we applied them to manage multimedia services by extending them with semantic capabilities. In that work, the focus was on combining different elements to jointly manage a service, such as the streaming of multimedia in a computer network. Choi et al. have embedded the FOCALE control loop in their HiMang architecture. Their focus is more on the architectural aspect of the FOCALE architecture and less on the algorithmic implementations of the different control loops: they investigate the integration with policies and information models. Moreover, their application domain is different from ours, as they study cloud-based networks, Quality of Service management and fault management. Kim et al. have implemented the FOCALE control loops to manage OpenFlow-based networks (i.e., the protocol that steers the Software-Defined Networking paradigm). Their solution uses the FOCALE control loops to set up and maintain paths in OpenFlow, even if unexpected link failures occur. Their approach focuses on maintaining datapath connectivity, while our approach has query optimization and management as its primary goal. For this reason, the algorithmic approach is completely different. The same authors have also implemented the FOCALE control loops to prioritize and group alarms raised by a network management system. As this corresponds to a classification problem, it is more related to our approach.
However, we use a semi-supervised learning approach through an anomaly detection algorithm, while they propose a more static rule-based approach.
Control theory approaches to query optimization
In this article, we propose an autonomic management approach to query optimization in a health information system. We present anomaly-detection-based algorithms that implement the aforementioned FOCALE cognitive control loops. The concept of control loops stems from control theory, a paradigm that dynamically manages a system based on the maximization of an objective and on periodic or continuous feedback from the managed system. In the past, control theory has been successfully applied to many application domains (e.g., resource allocation, web server management, application server management). Typically, these control loop approaches try to meet predefined service level objectives. Parekh et al. describe a methodology for designing control loops for managing service level objectives in performance management. Through the design of a statistical model, fit to historical measurements, they avoid the need for large and complex mathematical models. Our proposed system, and more specifically the reflective and deliberative control loops, uses the same approach: by detecting temporal patterns in historical data, optimal actions are learned without requiring a model of the complete system. Hellerstein et al. also introduced the concept of such a statistical approach to predict future demands of software systems (e.g., a web server). Our approach extends this idea and introduces different levels of learning.
Optimization of query processing often re-uses concepts of control theory and has mainly been studied in the context of grids. More recently, with the growing attention towards Big Data, new application domains of this research have been found. These approaches typically focus on large scale and distributed databases. As such, they are complementary to our approach as they can be used if the scale of the database system itself increases (e.g., introducing a higher level of replication). Also in the area of query optimization, dynamic management approaches have been proposed. Paton et al. propose an adaptive query processing algorithm with the same goal as our approach: reducing the overall response times of query processing. They focus mainly on the joint optimization of multiple queries as they are often grouped, requested by a single user. Park et al. present an approach where queries consisting of multiple joins are optimized. Their main approach consists of developing multiple candidate processing plans and only selecting the best plan after some initial pre-processing steps. Avnur et al. focus on adaptive query processing in large-scale and federated databases by continuously reordering joins inside a single query based on the observed - and highly dynamic - response times of subqueries in such a federated database. The above approaches can be seen as complementary solutions to our approach as they focus on the optimization of specific queries. As such, they investigate the structure of each query but are agnostic to the application demands regarding these queries. Instead, we focus on the application demands and allow the application to prioritize the queries based on the application logic and user expectations. By disabling less important queries we can already considerably improve the response time. This response time could even be improved further by applying the aforementioned techniques.
More generally speaking, the problem investigated in this article relates to dynamic scheduling in a resource-constrained environment. In this area, several well-known techniques exist for scheduling requests (in our case: queries), such as earliest deadline first, first come first served, etc. We refer to the work of Suresh et al. for a complete survey of these approaches. Also in grids and, more recently, cloud systems, job and application scheduling algorithms have been successfully applied. These techniques typically have a very broad application domain but also often require parameters to be set (e.g., the deadline of every request). In our approach, these techniques could be used as an alternative to the currently applied optimization action: the disablement of less important queries. We chose not to use these actions as we observed little effect on performance compared to the higher complexity in configuring the algorithm.
In summary, compared to the current state of the art, our approach is novel for the following reasons. First, the adopted three-layer approach, inspired by the FOCALE cognitive control loops, provides important flexibility in the level of proactiveness that can be achieved. Compared to other data management approaches, we are able to react quickly to local problems and at the same time carry out more complex optimizations on a larger time scale. Second, to the best of the authors’ knowledge, our approach is the first that implements these control loops for data management. As such, the proposed anomaly detection algorithms and their integration with the three different loops are completely novel and fundamentally different from previous approaches, which focused more on network and service management.
COSARA is a platform for infection surveillance and antibiotic management in the intensive care unit. It is used by physicians and nurses at the ICU of Ghent University Hospital as part of the clinical workflow. COSARA is designed as a service-oriented architecture and manages the antibiotic consumption and infection-related information in the ICU. The COSARA system collects data from the laboratory, the clinical information system and its own historical COSARA database, processes these data, and presents the information or medical advice on a bedside computer, a desktop in the physician’s office or a mobile device.
The most frequently consulted data on the bedside computers consist of the patient’s clinical values and the microbiology results in this ICU. COSARA has a module offering a clinical overview with the values of temperature, white blood cell count (WBC), thrombocytes, organ failure score, and prescribed antibiotics, and a module giving all microbiology results (samples with cultures, antibiogram and blood analyses). The COSARA system is designed in such a way that each time a physician or nurse requests the clinical values of a patient, all necessary information is requested through a series of queries (called a query group). This typically results in a burst of queries each time a patient record is requested.
Note that we chose to disable the queries instead of opting for more complex scheduling algorithms, as discussed in Section 'Related work’. The reason is the following: priority-based algorithms would assign lower priorities to less relevant queries. The particular problem tackled in this manuscript, however, has to deal with long peak periods. As new high-priority queries are constantly being introduced during peak periods, low-priority queries are continuously ignored until their relevance becomes obsolete. As a result, their execution becomes useless and the effect is similar to the action we chose, which is disabling the less important queries from the start.
Query selection process
As described above, the approach presented in this article aims at reducing the response times of queries during peak hours in the COSARA architecture by temporarily disabling less important queries. A first step in achieving this is the selection of queries, which have lower priority for the users of the COSARA system. This requires domain knowledge about the application. In this section, we first describe how this knowledge is modelled and subsequently present an algorithm for selecting candidate queries.
We model the domain knowledge of the COSARA application using RDF, which is a standardized model for data interchange. One advantage of RDF is that models for well-known and generic concepts are available in RDF or in the ontology language OWL, which uses an RDF-based serialization. For this reason, we re-use existing models for incorporating the relevant knowledge as much as possible.
More specifically, we use the IntelLEO Workflow ontology to model how users and background processes interact with the COSARA system. The IntelLEO Workflow ontology defines concepts such as roles and users, and allows defining a sequence of activities and tasks. As will be described below, this is used to define the clinical decision process of the users in combination with the COSARA system. In defining these workflows, we often need to refer to medical terminology. Therefore, we use the Galen ontology, which defines this terminology and its links. Finally, we need to model the application-specific knowledge linked with the COSARA application (e.g., which queries are executed when). We do this by linking newly defined concepts to the tasks and activities defined in the workflow. More specifically, we allow a task to refer to a set of queries: each query can have subqueries and has attributes such as a value defining its complexity and priority.
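As a purely illustrative sketch of this knowledge model, the structure below mirrors the activity/task/query hierarchy in plain Python. The class and attribute names are our own shorthand, not terms from the IntelLEO or Galen ontologies, and the actual COSARA model is expressed in RDF/OWL rather than code:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Query:
    name: str
    complexity: int          # higher values mean a more expensive query
    priority: int            # higher values mean less important to the user
    subqueries: List["Query"] = field(default_factory=list)

@dataclass
class Task:
    name: str
    optional: bool
    queries: List[Query]

@dataclass
class Activity:
    name: str
    optional: bool
    user_facing: bool        # False for background processes
    tasks: List[Task]

# Example: consulting microbiology results triggers a query group
microbiology = Activity(
    name="consult microbiology results",
    optional=False,
    user_facing=True,
    tasks=[Task(
        name="load sample cultures",
        optional=False,
        queries=[Query("cultures", complexity=3, priority=1)],
    )],
)
```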
Identification of candidate queries for disablement
In this section, we present an assisted algorithm for identifying less important queries. The algorithm ranks the different queries in the COSARA system based on their importance for the users. This ranking is then given to an administrator as an aid in selecting the queries that can be disabled when a scarcity of resources occurs.
Overall, the ranking is based on three factors: (i) the visibility of the queries in the application, (ii) the query’s complexity and (iii) the importance of the queries in the clinical decision process. The algorithm provides a formal implementation of a manual heuristic, which was initially carried out by the administrator of the COSARA system. An overview of the algorithm is given in Algorithm 1. As shown, the algorithm iterates over all activities and the corresponding tasks and queries in the workflow. Each query is given an initial rank of 1, which corresponds to the highest importance. However, several factors can increase the rank value (i.e., lower the query’s importance). For example, background processes and optional activities or tasks are penalized with a factor (PenaltyNonUser, PenaltyOptionalActivity and PenaltyOptionalTask, respectively). Moreover, high complexity and priority values further increase the rank value. The set of queries and their corresponding ranks are stored, sorted and finally presented to the administrator for revision. The query selection process only determines the order of the queries and leaves it up to the administrator to select the actual queries for disablement.
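Since Algorithm 1 itself is not reproduced in this excerpt, the following sketch illustrates one possible reading of the ranking procedure. How the penalty factors combine with the complexity and priority values is our assumption (multiplicative penalties, additive complexity and priority); the default penalty values and dictionary keys are likewise hypothetical:

```python
def rank_queries(activities,
                 penalty_non_user=2.0,
                 penalty_optional_activity=1.5,
                 penalty_optional_task=1.5):
    """Rank all queries in the workflow. Rank 1 is the most important;
    higher rank values mark queries that are safer to disable."""
    ranked = []
    for activity in activities:
        for task in activity["tasks"]:
            for query in task["queries"]:
                rank = 1.0                            # highest importance
                if not activity["user_facing"]:
                    rank *= penalty_non_user          # background process
                if activity["optional"]:
                    rank *= penalty_optional_activity # optional activity
                if task["optional"]:
                    rank *= penalty_optional_task     # optional task
                # high complexity and priority values lower the importance
                rank += query["complexity"] + query["priority"]
                ranked.append((query["name"], rank))
    # the sorted list is handed to the administrator for revision
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

The administrator then reviews this ordering and decides which of the top-ranked (least important) queries may actually be disabled.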
Design of FOCALE-based control loops in the COSARA architecture
Following the FOCALE cognition model, we define three dynamic control loops with the aim of optimizing the performance of query execution. The goal of all three control loops is to keep the response time of a query group below a threshold, which corresponds to an acceptable delay. Note that the quantification of an acceptable delay is a subjective matter, as it relates to how the users of the system perceive its quality (i.e., the so-called Quality of Experience). Furthermore, what is acceptable also depends on both the type of application and the type of operation performed (i.e., the type of query group). For our COSARA system, an acceptable delay for the most frequently occurring query group (i.e., the microbiology query group) is up to approximately 25 seconds. This value was determined through discussions with the users of the COSARA system and based on their day-to-day experience with the system.
Reactive control loop
The goal of the reactive control loop is to detect the occurrence of a large disruption of the system. Only if the performance of the system is severely affected is an immediate action taken, corresponding to the disablement of less important queries. To this end, the reactive control loop continuously monitors the delay of page loads, as observed by the physician, in the COSARA application. The execution times of all queries in the DLS component are monitored (FOCALE’s monitor step) and summed to a total delay, as several queries are responsible for a single page load (observe step). When this total delay is unacceptably high, denoted by the threshold t reactive , an alarm is raised in the control loop (compare step). The effect of this alarm is twofold: on the one hand, the administrator is notified of the data problem, allowing a closer look at the root cause of the anomaly. On the other hand, an automatic action is taken to ensure a graceful degradation of the system: the automatic execution of a subset of queries corresponding to less important data retrievals (e.g., cron jobs, side information) is disabled for a time window W reactive . As the total number of queries decreases, the goal of the reactive control loop is to considerably reduce the overall perceived delay. For example, to improve the execution of microbiology samples, redundant queries of urine sediment are disabled because these queries are not shown in the module. If the physician wants to consult this urine value, a warning informs him that the query is disabled temporarily. The physician can retrieve the value by clicking a request button, in which case the value is retrieved using a duplicate urine sediment query (which can only be executed on request and is not filtered). As these urine sediment queries will only be executed when the data is actually required by the physician, the number of queries will be considerably reduced.
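A minimal sketch of this reactive loop might look as follows. The concrete threshold and window values, the class name and the method signatures are assumptions for illustration; the article only fixes the roughly 25-second acceptable delay for the microbiology query group:

```python
T_REACTIVE = 25.0   # seconds; acceptable total page-load delay (assumed)
W_REACTIVE = 600.0  # seconds; disablement window W reactive (assumed value)

class ReactiveLoop:
    def __init__(self, disableable_queries):
        self.disableable = set(disableable_queries)  # ranked low-priority queries
        self.disabled_until = 0.0

    def on_page_load(self, query_times, now):
        """query_times: execution times of the queries in one query group."""
        total_delay = sum(query_times)       # observe: perceived page delay
        if total_delay > T_REACTIVE:         # compare against the threshold
            self.disabled_until = now + W_REACTIVE  # act: graceful degradation
            return True                      # alarm raised; notify the admin
        return False

    def is_enabled(self, query, now):
        """Low-priority queries stay disabled until the window expires."""
        return query not in self.disableable or now >= self.disabled_until
```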
Deliberative control loop
In the deliberative control loop, decisions and actions are made using an anomaly detection algorithm. In this loop there is an explicit evaluation of the decision before acting. This control loop continuously monitors the query execution time of each individual query and groups them according to the query type. Note that, in contrast to the reactive control loop, the monitoring is based on each individual query and not on the grouped perceived page load. By monitoring each query type, a specific model is built that represents the typical expected query execution time of that query type. Based on this model, an anomaly detection algorithm can detect out of profile behaviour, i.e., outliers. If the share of recently detected outliers becomes abnormally high, a query disablement action similar to that of the reactive control loop is executed. Queries are proactively disabled when a disruption of the system is likely to occur (i.e., signalled by an increased share of abnormal individual query executions). This is in contrast to the reactive control loop, where queries are disabled after a system disruption is detected. Therefore, the deliberative control loop detects patterns that typically occur just before a shortage of resources (leading to high response times) and enforces a pro-active disablement of queries to avoid the resource shortage. Additionally, the deliberative control loop incorporates knowledge from a domain expert to detect the outliers. As such, the control loop consists of a training phase, where the system is trained to build the model and detect outliers, and a deployment phase, where the outliers are detected on-line and appropriate actions are taken. We discuss the algorithmic details of both phases in the remainder of this section.
Here, μ(D train ) denotes the mean of D train , while σ(D train ) is the standard deviation of D train . The larger the calculated z-score is, the more likely the sample x is to be an outlier. However, it is difficult to define a threshold for this, as it depends on the distribution of the dataset, which is unknown. To address this, in a third and final step, the remaining dataset D test ≡ D∖D train is used for determining a threshold z t . Based on this threshold, a random sample x can be classified as an outlier (if z(x) > z t ) or not. Note that D outlier ⊂ D test : hence, D test will contain both normal and out of profile query execution times. For each x ∈ D test , the z-score as defined in Equation (1) is calculated. Furthermore, for several possible values of z t , the samples x are classified and the classification is compared with the labelling of D outlier and D normal by the domain expert. By comparing the classification for a given z t parameter configuration with the classification by the domain expert, the best z t parameter configuration can be chosen. More specifically, we select the z t parameter that maximises the precision and recall values, two metrics that are used to assess the accuracy of a classification system. Precision is calculated as the number of true positives (i.e., the number of correctly detected outliers) divided by the total number of elements classified as outliers by the algorithm (thus also counting elements that were flagged as outliers but are not actual outliers). Recall is defined as the number of true positives divided by the total number of actual outliers (thus also counting outliers that were missed by the detection).
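The threshold selection described above can be sketched as follows. Combining precision and recall via the F1 score is our assumption, as the article does not state how the two metrics are traded off; the function names and argument shapes are likewise illustrative:

```python
from statistics import mean, stdev

def z_score(x, d_train):
    """Distance of x from the training population, in standard deviations."""
    return abs(x - mean(d_train)) / stdev(d_train)

def select_threshold(d_train, d_test, expert_outliers, candidates):
    """Pick the z_t whose classification best matches the expert labelling.

    expert_outliers: the execution times in d_test labelled as outliers
    (D outlier); candidates: the possible z_t values to evaluate.
    """
    best_zt, best_f1 = None, -1.0
    for zt in candidates:
        predicted = {x for x in d_test if z_score(x, d_train) > zt}
        tp = len(predicted & expert_outliers)     # true positives
        precision = tp / len(predicted) if predicted else 0.0
        recall = tp / len(expert_outliers) if expert_outliers else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        if f1 > best_f1:
            best_zt, best_f1 = zt, f1
    return best_zt
```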
The result of the training phase is (i) a model for each query type, defined through the mean and standard deviation of a population of that type, that defines the normal behaviour of query execution times for that query type and (ii) a threshold z t for each query type that can be used to perform an on-line outlier detection in the deployment phase.
Once trained, the calculated model and threshold can be used to detect the occurrence of outliers on-line for each query type and act accordingly. To this end, the deliberative control loop continuously monitors the query execution times and classifies each execution time as normal or out of profile according to the trained configuration. Next, the share of outliers in the total set of queries executed in the last time window W_delib is continuously calculated. If the calculated share exceeds a predefined threshold s_outlier, the deliberative control loop assumes that there is a high risk of system degradation. As a reaction, it decides to execute actions that can reduce the typical query execution times (e.g., disabling other queries as discussed in Section 'Identification of candidate queries for disablement').
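The windowed outlier-share check can be sketched as follows. This is a hypothetical sketch of the mechanism described above: the class name, the `classify` callback (standing in for the trained z-score model), and the default values of W_delib and s_outlier are our assumptions.

```python
from collections import deque
import time

class DeliberativeLoop:
    """Sliding-window outlier-share check of the deliberative control loop."""

    def __init__(self, classify, w_delib=300, s_outlier=0.2):
        self.classify = classify    # returns True if an execution time is an outlier
        self.w_delib = w_delib      # window W_delib, in seconds (assumed value)
        self.s_outlier = s_outlier  # share threshold s_outlier (assumed value)
        self.window = deque()       # (timestamp, is_outlier) observations

    def observe(self, exec_time_ms, now=None):
        """Record one query execution; return True if action should be taken."""
        now = time.time() if now is None else now
        self.window.append((now, self.classify(exec_time_ms)))
        # Evict observations that fell out of the time window W_delib.
        while self.window and now - self.window[0][0] > self.w_delib:
            self.window.popleft()
        share = sum(1 for _, o in self.window if o) / len(self.window)
        # Share above s_outlier signals a high risk of system degradation,
        # triggering actions such as disabling low-priority queries.
        return share > self.s_outlier
```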
Reflective control loop
In the reflective control loop, the long-term memory is taken into account to take proactive actions (i.e., before the execution of the actual queries). The reflective loop detects temporal patterns in the occurrence of outliers and proactively disables low-priority queries. In practice, the COSARA system often experiences quality degradations during peak periods (e.g., at the beginning and end of the work day of physicians). The goal of the reflective control loop is to autonomically detect these peak periods and disable the queries accordingly. The reflective control loop therefore has a goal similar to that of the deliberative control loop: it proactively disables queries to avoid high response times. The main difference is that the reflective control loop focuses on diurnal effects (i.e., patterns that can be observed on a daily basis) and disables queries for a longer time period (i.e., at least 30 minutes) based on historical patterns detected over a very long time frame (i.e., several weeks).
The reflective control loop works as follows. Based on the data set D_outlier, constructed in the deliberative control loop using the z-score-based anomaly detection algorithm, a new data set D_refl is derived. As discussed, D_outlier contains all queries (with their response time and time of execution) that the deliberative control loop identified as having abnormally high execution times (i.e., as being outliers).
Based on this set of outliers, the data set D_refl records the frequency of outliers as a function of the time of day. To determine the frequency, we use bins corresponding to 30-minute time windows; hence, D_refl contains 48 elements. Note that two outliers occurring on completely different days but at the same time of day are assigned to the same bin. The relevance of this newly constructed data set is the following: we observed that peak periods often occur at the same moment in time across days. This is because physicians and nurses often use the COSARA system, as part of their routine, at the same time each day. The typical busy hours correspond with the start and end of every working day as well as the lunch break (around noon).
In this context, an outlier signifies a 30-minute time window in which the deliberative control loop has found an abnormal number of high response times. Hence, this builds further on the knowledge learned in the previous control loop. If such a time window is identified as an outlier, the queries are disabled during that window. This is done proactively on every following day, until the reflective control loop no longer flags it as a busy period.
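The diurnal binning described above can be sketched as follows. This is an illustrative sketch under our own assumptions: the function names and the simple minimum-count rule for flagging a bin are hypothetical stand-ins for the reflective loop's pattern detection.

```python
from collections import Counter

def busy_bins(outlier_times, min_count):
    """Map outlier occurrences onto the 48 half-hour bins of D_refl and
    flag bins that recur often enough to count as a daily busy period.

    outlier_times: (hour, minute) pairs of outliers collected over several weeks
    min_count:     assumed frequency threshold for flagging a bin
    """
    counts = Counter((h * 60 + m) // 30 for h, m in outlier_times)
    return sorted(b for b, c in counts.items() if c >= min_count)

def disabled_now(hour, minute, bins):
    """Low-priority queries are proactively disabled whenever the current
    time of day falls inside a flagged 30-minute bin."""
    return (hour * 60 + minute) // 30 in bins
```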
Results and discussion
In this section we study the influence of the actions of the FOCALE-based reactive and deliberative control loops on the query execution time. The queries executed on a randomly chosen day in January 2012 were replayed in a test environment, as described in the evaluation setup, and the duration of each COSARA query was measured. In these experiments we evaluated the impact of an immediate action in the reactive control loop, as well as the decision to take action by the described anomaly detection algorithm in the deliberative loop and the impact of that decision.
Performance evaluation of the control loops
Performance evaluation of the reactive control loop
In the reactive control loop an action is taken immediately if the performance is severely affected. We set parameter t_reactive to 90,000 ms and parameter W_reactive to 2 minutes. The value of t_reactive corresponds to an execution time that COSARA users consider very obstructive. By setting a 2-minute time window for the action, similarly high execution times are expected to be prevented.
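The reactive mechanism with these parameter settings can be sketched as follows. This is a hypothetical sketch: the class and method names are ours, and the return value stands in for the actual disable/enable hooks on low-priority queries.

```python
import time

class ReactiveLoop:
    """Immediate action of the reactive control loop: once any query exceeds
    t_reactive, low-priority queries stay disabled for the window W_reactive."""

    T_REACTIVE_MS = 90_000  # t_reactive = 90,000 ms (value from the evaluation)
    W_REACTIVE_S = 120      # W_reactive = 2 minutes (value from the evaluation)

    def __init__(self):
        self.disabled_until = 0.0

    def on_query_finished(self, exec_time_ms, now=None):
        """Record one finished query; return True while low-priority
        queries should remain disabled."""
        now = time.time() if now is None else now
        if exec_time_ms > self.T_REACTIVE_MS:
            # Very obstructive execution time observed: act immediately.
            self.disabled_until = now + self.W_REACTIVE_S
        return now < self.disabled_until
```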
Performance evaluation of the deliberative control loop
Performance evaluation of the reflective control loop
The proposed solution introduces an alternative to simply increasing the amount of resources by upgrading the physical infrastructure. The control loop solution is viable because the response times in query processing exhibit pronounced peaks during certain moments in time (e.g., the beginning and end of the day). Upgrading the physical infrastructure to accommodate these peaks is possible, but also costly, as the infrastructure would often be idle during less busy periods. As our solution takes care of the abnormally high peaks in response time, the result is a flatter response-time profile over the day. Therefore, if the system's usage were to increase further, deploying alternative solutions such as database replication in combination with our proposed solution would be more advantageous.
Overall, the solution has a high scalability for several reasons. All three control loops rely on detecting peaks based on summarized data, which means that the memory consumption does not grow linearly with an increasing number of users of the system. Furthermore, the control loops introduce only a marginal overhead in terms of computational complexity. Finally, in the design of all three loops, care has been taken to maintain good scalability. For example, the reactive control loop was deliberately kept relatively simple in terms of computational complexity and memory consumption, as it needs to run at a very high frequency (i.e., in the order of seconds). Control loops that run less frequently, such as the daily reflective control loop, are allowed to introduce a higher complexity. Note that, as the reflective control loop mainly relies on clustering, it also scales well as the number of users increases.
This paper presents the extension of the existing health care platform COSARA in the ICU with autonomic control loops. The introduced control loops provide an automated mechanism to detect low performance and to take action, thereby limiting human technical interventions. The monitoring of the execution times of the data queries of this real-life intensive care platform allows the investigation of low performance. A reactive, a deliberative and a reflective control loop have been proposed to optimize the data query performance and thus the page load of the microbiology module. In the reactive control loop the action is taken immediately when the performance of the system is affected. The action disables less important queries not relevant for the display of microbiology data. In the deliberative control loop we use an anomaly detection algorithm with an explicit evaluation of the decision before the action is taken. In the reflective control loop, proactive actions are taken after temporal patterns of outliers are detected. We evaluated the impact of the reactive, deliberative and reflective control loops on the query execution of the microbiology data. The results show a reduction of 8.61% in the average query execution time by the reactive control loop. The addition of the deliberative control loop reduced the average query execution time by 10.92%, and by combining the three control loops the average execution time was reduced by 13.04%.
The authors would like to thank the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT) for supporting the COSARA research project.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/13/120/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.