Accessing the public MIMIC-II intensive care relational database for clinical research
© Scott et al.; licensee BioMed Central Ltd. 2013
Received: 15 August 2012
Accepted: 31 December 2012
Published: 10 January 2013
The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database is a free, public resource for intensive care research. The database was officially released in 2006, and has attracted a growing number of researchers in academia and industry. We present the two major software tools that facilitate accessing the relational database: the web-based QueryBuilder and a downloadable virtual machine (VM) image.
QueryBuilder and the MIMIC-II VM have been developed successfully and are freely available to MIMIC-II users. Simple example SQL queries and the resulting data are presented. Clinical studies pertaining to acute kidney injury and prediction of fluid requirements in the intensive care unit are shown as typical examples of research performed with MIMIC-II. MIMIC-II has also provided data for annual PhysioNet/Computing in Cardiology Challenges, including the 2012 Challenge “Predicting Mortality of ICU Patients”.
QueryBuilder is a web-based tool that provides easy access to MIMIC-II. For more computationally intensive queries, one can locally install a complete copy of MIMIC-II in a VM. Both publicly available tools provide the MIMIC-II research community with convenient querying interfaces and complement the value of the MIMIC-II relational database.
The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database (http://physionet.org/mimic2) is a public research archive of data collected from patients in intensive care units (ICUs). Although other clinical research databases exist [2, 3], such databases are often privately owned, have highly restricted access or require fees for access. MIMIC-II has been fully deidentified in a Health Insurance Portability and Accountability Act (HIPAA) compliant manner and is available free of charge for public use, subject to completion of an appropriate online human-subjects training course and signing of a data use agreement. The database is available via PhysioNet [4, 5], a web-based resource for the study of physiologic data.
The data comprising MIMIC-II was collected at the Beth Israel Deaconess Medical Center in Boston, MA, USA from patients who were admitted from 2001 to 2008. The available clinical information includes: patient demographics, laboratory test results, vital sign recordings, fluid and medication records, charted parameters and free-text reports such as nursing notes, imaging reports and discharge summaries. There is a second component of MIMIC-II consisting of high resolution waveform recordings of electrocardiograms, blood pressures, pulse plethysmograms and other monitored signals that were archived from bedside monitors for a subset of the patients. The waveforms, and derived trends and alarms are the subject of much research interest [6–8]. Here, however, we focus primarily on the “clinical data”, stored in a relational database. For a detailed description of the MIMIC-II database, please see . The MIMIC-II project was approved by the Institutional Review Boards of the Beth Israel Deaconess Medical Center and the Massachusetts Institute of Technology (Cambridge, MA, USA). The requirement for individual patient consent was waived because clinical care was not affected and all protected health information (PHI) was deidentified.
As of the end of 2012, over 500 users have been approved for access to the MIMIC-II relational database, which reflects researchers’ interest in the clinical data of MIMIC-II. Numerous innovative and significant studies on a broad range of topics are based on MIMIC-II and establish its importance. The software tools that make it feasible for a large worldwide community of investigators to draw on MIMIC-II are essential contributors to its value and utility for intensive care research. Providing public access to a relational database for users who are geographically separated and come from a wide range of backgrounds is a challenging task. While there are tools available for web-based database administration, such as phpPgAdmin, and even for searching clinical data, they are not appropriate in every situation. For MIMIC-II, we have developed an easy-to-use, read-only interface capable of performing exploratory searches and a more powerful tool for complex data processing. These two access tools currently serve as the main gateways to the MIMIC-II relational database. In the present article, we describe their implementations and vital roles in conducting clinical research using MIMIC-II.
The MIMIC-II relational database (version 2.6) contains records from over 32,000 subjects, including over 7,000 neonatal patients. The raw data is stored in various base tables, generally organized by subject, hospital and ICU-stay IDs. Several database views, which summarize and collate information, have been generated to allow users to become familiar with the available data and to find records of interest. Users can access the database via a web-based online tool (QueryBuilder) and a downloadable virtual machine (VM) image, which are discussed in the ensuing sections. Flat file exports of the database tables and a PostgreSQL compatible dump file are also available, but are not discussed here. To the best of our knowledge, there are no other publicly available software tools that allow users to query a clinical database in SQL (Structured Query Language), either in a web-browser or in a virtual machine environment.
QueryBuilder, which is accessible through desktop or mobile web browsers, allows users to explore the structure of the various tables and views in the database and to examine the relationships among them. SQL queries allow users to examine and process the data as desired; the resulting datasets can be exported in CSV (Comma-Separated Values) format for further processing. In order to prevent a given user from excessively consuming shared resources on QueryBuilder (e.g., exporting all tables in MIMIC-II), we limited the maximum number of exportable rows to 1,000.
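The effect of such an export cap can be sketched on a local copy of the data. The example below is a minimal illustration in Python, assuming an in-memory SQLite table as a stand-in for MIMIC-II; the rows are invented, and `export_csv` and `MAX_EXPORT_ROWS` are hypothetical names. It does not describe how QueryBuilder itself is implemented.

```python
import csv
import io
import sqlite3

MAX_EXPORT_ROWS = 1000  # mirrors the export cap described above


def export_csv(conn, query, max_rows=MAX_EXPORT_ROWS):
    """Run `query`; return (csv_text, truncated) with at most `max_rows` data rows."""
    cur = conn.execute(query)
    header = [col[0] for col in cur.description]
    # Fetch one row beyond the cap so that truncation can be detected.
    rows = cur.fetchmany(max_rows + 1)
    truncated = len(rows) > max_rows
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[:max_rows])
    return buf.getvalue(), truncated


# Toy stand-in data (invented; not MIMIC-II records).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE icustay_detail (icustay_id INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO icustay_detail VALUES (?, ?)",
    [(i, "F" if i % 2 else "M") for i in range(1, 1501)],
)

csv_text, truncated = export_csv(
    conn, "SELECT icustay_id, gender FROM icustay_detail ORDER BY icustay_id"
)
```

Here the toy table holds 1,500 rows, so the export is cut at the cap and `truncated` signals that the full result would need to be retrieved another way.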
The increasing number of users, and their desire to run more complex queries, has begun to overload the computer systems hosting QueryBuilder. To mitigate this problem, we have developed a system allowing users to run a copy of the relational database on their own computers, providing much faster, uncongested access using a VM. A VM is a completely isolated operating system installation that can be run within a host environment. The MIMIC-II VM employs Oracle’s VirtualBox virtualization environment, providing an Ubuntu 10.04 Linux operating system distribution and a pre-configured PostgreSQL 8.4 database server.
To use the MIMIC-II VM, users must first install the VirtualBox host software, and download the MIMIC-II VM image for import into VirtualBox. Once the VM has been started, a simple script will download and import the MIMIC-II database into the local PostgreSQL server. The resulting system contains a complete clone of the MIMIC-II relational database that can be queried using a command line client, a GUI (Graphical User Interface) desktop application (pgAdmin III), and JDBC interfaces. The VM also includes an SQL cookbook which is a compilation of example SQL queries that users can use as a starting point for their research studies.
We also created a demo VM (and, to suit users’ preferences, a bootable ISO image) containing data from 4,000 patients who have been deceased for two years or more. Since the demo VM and the ISO image contain neither PHI, nor free text, nor any data from recently living individuals, they are exempt from HIPAA restrictions, and interested researchers may download them freely.
Both QueryBuilder and the VM are routinely used by researchers around the world. As of November 6, 2012, 508 MIMIC-II users had a QueryBuilder account. Between January 1 and November 7, 2012, an average of 4.5 users logged into QueryBuilder per day and a total of 221 unique users logged in at least once. The VM with complete data was downloaded 129 times between July 2011 and October 2012, whereas the demo VM and bootable image were downloaded 2,234 times between August 2011 and October 2012.
Although QueryBuilder and the VM provide immediate access to the MIMIC-II relational database, the user needs a working knowledge of both SQL and the MIMIC-II database schema. SQL is a rare skill among clinicians, and becoming familiar with the structure of the MIMIC-II clinical data requires substantial time and effort. In order to guide new MIMIC-II users, we present a few example queries and research studies in the subsequent sections. We recommend using QueryBuilder for the simple example queries in Section “Example usage” and using the VM for the computationally expensive studies in Section “Applications”.
The ICUSTAY_DETAIL view summarizes ICU stays for all patients and can be used to obtain general statistics for the entire population of MIMIC-II. The following example query obtains ICU mortality statistics broken down by gender.
SELECT gender, icustay_expire_flg, COUNT(*)
FROM mimic2v26.icustay_detail
WHERE subject_icustay_total_num = 1
GROUP BY gender, icustay_expire_flg;
Table 1. ICU mortality statistics
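The grouped counts returned by this query can be turned into per-gender mortality fractions with a few lines of post-processing. The sketch below runs the same query against an in-memory SQLite stand-in; the rows are invented, the 'Y'/'N' coding of `icustay_expire_flg` is an assumption, and `mortality` is a hypothetical name. The real database runs on PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database so that the schema-qualified name
# mimic2v26.icustay_detail used in the article's query resolves.
conn.execute("ATTACH DATABASE ':memory:' AS mimic2v26")
conn.execute(
    "CREATE TABLE mimic2v26.icustay_detail ("
    "icustay_id INTEGER, gender TEXT, icustay_expire_flg TEXT, "
    "subject_icustay_total_num INTEGER)"
)
# Invented rows; 'Y'/'N' flag values are an assumption.
toy_rows = [
    (1, "F", "N", 1), (2, "F", "N", 1), (3, "F", "Y", 1), (4, "F", "N", 1),
    (5, "M", "N", 1), (6, "M", "Y", 1),
    (7, "M", "Y", 2), (8, "M", "N", 2),  # excluded: subject has two ICU stays
]
conn.executemany("INSERT INTO mimic2v26.icustay_detail VALUES (?, ?, ?, ?)", toy_rows)

counts = conn.execute(
    "SELECT gender, icustay_expire_flg, COUNT(*) "
    "FROM mimic2v26.icustay_detail "
    "WHERE subject_icustay_total_num = 1 "
    "GROUP BY gender, icustay_expire_flg "
    "ORDER BY gender, icustay_expire_flg"
).fetchall()

# Convert grouped counts into a per-gender ICU mortality fraction.
totals, deaths = {}, {}
for gender, expire_flg, n in counts:
    totals[gender] = totals.get(gender, 0) + n
    if expire_flg == "Y":
        deaths[gender] = deaths.get(gender, 0) + n
mortality = {g: deaths.get(g, 0) / totals[g] for g in totals}
```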
The ICUSTAY_DETAIL view can also be used to obtain patient cohorts by using the WHERE clause to restrict the query to the ICU stays of interest. The second example query obtains all ICU stays for patients who have a SAPS I (Simplified Acute Physiology Score I) score between 15 and 20, are between 20 and 30 years old, had 2 ICU admissions in total and died in the hospital.
SELECT icustay_id, subject_id, gender, dob, dod
FROM mimic2v26.icustay_detail
WHERE icustay_admit_age BETWEEN 20 AND 30
  AND sapsi_first BETWEEN 15 AND 20
  AND subject_icustay_total_num = 2
  AND hospital_expire_flg = 'Y';
Table 2. Specific patient cohort
The patient’s age was not between 20 and 30 for one of his two ICU admissions.
We can obtain all of the ICU stays for the subjects who were returned in Table 2 by querying for specific subject IDs.
SELECT icustay_id, subject_id, icustay_admit_age, sapsi_first, icustay_intime, icustay_outtime
FROM mimic2v26.icustay_detail
WHERE subject_id IN (4828, 24431, 27109);
icustay_intime | icustay_outtime
3081-01-02 14:13:00 -05:00 | 3081-01-02 21:39:00 -05:00
3081-08-30 21:41:00 -05:00 | 3081-09-03 04:09:00 -05:00
2726-09-22 00:17:00 -05:00 | 2726-09-29 20:58:00 -05:00
2726-10-02 18:44:00 -05:00 | 2726-10-02 18:51:00 -05:00
3061-06-17 17:07:00 -05:00 | 3061-06-20 13:43:00 -05:00
3061-06-24 03:44:00 -05:00 | 3061-07-09 21:51:00 -05:00
The MIMIC-II database contains complex, detailed data and apparently simple queries can return unexpected results. The rich, detailed information it contains has stimulated a variety of research interests.
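One source of such surprises is that a WHERE clause on ICUSTAY_DETAIL filters individual ICU stays rather than patients, so a subject can enter a cohort through one stay even when another of their stays fails the same condition. The sketch below contrasts the two readings on an in-memory SQLite stand-in with invented rows; it is one possible way to phrase a stricter per-subject filter, not a query taken from the MIMIC-II cookbook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE icustay_detail ("
    "icustay_id INTEGER, subject_id INTEGER, icustay_admit_age REAL)"
)
# Invented rows: subject 1 is 29.8 and then 31.2 years old at admission;
# subject 2 is within the 20-30 range for both stays.
conn.executemany(
    "INSERT INTO icustay_detail VALUES (?, ?, ?)",
    [(10, 1, 29.8), (11, 1, 31.2), (20, 2, 25.0), (21, 2, 25.4)],
)

# Per-stay filter: keeps stay 10 although subject 1's other stay is out of range.
per_stay = conn.execute(
    "SELECT icustay_id FROM icustay_detail "
    "WHERE icustay_admit_age BETWEEN 20 AND 30 "
    "ORDER BY icustay_id"
).fetchall()

# Per-subject filter: keep a stay only if no stay of the same subject is out of range.
per_subject = conn.execute(
    "SELECT d.icustay_id FROM icustay_detail d "
    "WHERE NOT EXISTS ("
    "  SELECT 1 FROM icustay_detail o "
    "  WHERE o.subject_id = d.subject_id "
    "  AND o.icustay_admit_age NOT BETWEEN 20 AND 30) "
    "ORDER BY d.icustay_id"
).fetchall()
```

Which reading is appropriate depends on the research question; the point is only that the choice should be made deliberately.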
MIMIC-II has attracted research in data mining, pattern recognition and signal processing. There have been a wide variety of publications based on the data contained within MIMIC-II and its public availability encourages reproducible research and permits comparison of results. We now discuss two of the recent research problems that have been investigated using data from MIMIC-II. Subsequently, the PhysioNet/Computing in Cardiology (CinC) Challenges that utilized MIMIC-II are also described. The examples below illustrate what kinds of clinical research are possible with the MIMIC-II relational database.
Acute kidney injury (AKI) is a serious and frequent condition in critically ill patients. There are established criteria defining three severities of AKI based on patient urine output over 6, 12 or 24 hour periods and increases in serum creatinine levels over a two-day window. MIMIC-II contains hourly urine output measurements and daily serum creatinine laboratory test results that permit a thorough investigation into the AKI classifications. Using the data in MIMIC-II, we were able to determine AKI stages for all patients and build multivariate logistic regression models to determine whether AKI stages can be used as biomarkers of increased hospital mortality. Owing to the high temporal resolution of the data, we were able to build models for a large range of urine output thresholds and durations to determine that the existing Acute Kidney Injury Network (AKIN) definitions employ clinically meaningful criteria.
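To make the idea of threshold and duration rules concrete, the following sketch assigns a urine-output-based stage from hourly measurements. It is a simplified illustration, not the method used in the cited studies: the thresholds in `ASSUMED_RULES` are the commonly quoted AKIN urine output values and should be checked against the published criteria, the anuria and serum creatinine criteria are omitted, and "consecutive hours below threshold" is only one possible reading of the duration requirement. All function names are hypothetical.

```python
def longest_run_below(hourly_rates, threshold):
    """Length of the longest run of consecutive hours with rate below threshold."""
    longest = current = 0
    for rate in hourly_rates:
        current = current + 1 if rate < threshold else 0
        longest = max(longest, current)
    return longest


# (stage, threshold in mL/kg/h, minimum consecutive hours), most severe first.
# ASSUMPTION: commonly quoted AKIN urine output values; verify before use.
ASSUMED_RULES = [(3, 0.3, 24), (2, 0.5, 12), (1, 0.5, 6)]


def urine_output_stage(hourly_ml, weight_kg, rules=ASSUMED_RULES):
    """Return the most severe stage whose rule is met, or 0 if none is met."""
    rates = [ml / weight_kg for ml in hourly_ml]
    for stage, threshold, hours in rules:
        if longest_run_below(rates, threshold) >= hours:
            return stage
    return 0


# Invented example: an 80 kg patient producing 30 mL/h (0.375 mL/kg/h)
# for 7 hours, followed by 5 hours at 100 mL/h.
example_stage = urine_output_stage([30] * 7 + [100] * 5, 80.0)
```

Because the rules are passed as a parameter, the same function can be re-run over a grid of thresholds and durations, which is the kind of exploration that hourly data makes possible.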
The first 72 hours after admission are critical for ICU patients. Suboptimal fluid management during this period can result in episodes of hypotension, leading to reduced organ perfusion. In practice, clinicians perform the difficult task of estimating maintenance fluid requirement by estimating fluid loss. Providing an accurate prediction of a patient’s fluid requirements would assist clinicians in making their decision.
MIMIC-II contains detailed fluid input/output measurements as well as vasopressor administration, demographics and physiologic variables. Using data from the first day of a patient’s ICU admission in a linear regression model combined with a Bayesian network, Celi et al. were able to accurately estimate patient fluid requirements for day two (see endnote a).
The annual PhysioNet/CinC Challenges (http://www.physionet.org/challenge/) invite participants to tackle clinically interesting problems. The 2009 challenge, on predicting hypotensive episodes in the ICU, and the 2010 challenge, on reconstructing missing or corrupted signals, both used data from the MIMIC-II database. The 2012 PhysioNet challenge, entitled “Predicting Mortality of ICU Patients”, also used MIMIC-II data and asked participants to develop a patient-specific prediction of in-hospital mortality. The dataset consisted of MIMIC-II records from 12,000 ICU stays, each at least 48 hours in duration, providing up to 41 different variables. Five of the 41 were “general descriptors” (recordID, age, gender, height and weight), recorded once, on admission. The remainder were “time series” variables such as vital signs and laboratory test results and were recorded multiple times throughout the 48 hour period. The aim of the challenge was to predict, for each patient, whether they died in the hospital. Participants discussed their approaches to the challenge problem during the CinC 2012 conference (http://physionet.org/challenge/2012/).
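The split between once-recorded descriptors and repeated time series measurements can be illustrated with a small parser. The sketch below assumes a hypothetical long-format layout with one `Time,Parameter,Value` row per observation; the layout, the parameter spellings and all values in `SAMPLE` are assumptions made for illustration and should be checked against the challenge documentation.

```python
import csv
import io
from collections import defaultdict

# Descriptor names follow the list given in the text; spelling is an assumption.
GENERAL_DESCRIPTORS = {"RecordID", "Age", "Gender", "Height", "Weight"}

# Invented sample record in the assumed long format.
SAMPLE = """Time,Parameter,Value
00:00,RecordID,1
00:00,Age,54
00:00,Gender,0
00:00,Height,170
00:00,Weight,80
00:07,HR,73
01:07,HR,77
01:07,Temp,36.6
"""


def parse_record(text):
    """Split one record into (descriptors, time series keyed by parameter)."""
    descriptors = {}
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        hours, minutes = row["Time"].split(":")
        minute_offset = int(hours) * 60 + int(minutes)  # minutes since admission
        value = float(row["Value"])
        if row["Parameter"] in GENERAL_DESCRIPTORS:
            descriptors[row["Parameter"]] = value
        else:
            series[row["Parameter"]].append((minute_offset, value))
    return descriptors, dict(series)


descriptors, series = parse_record(SAMPLE)
```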
The MIMIC-II database is a valuable research tool that is gaining popularity as it is expanded and improved over time. Its clinical data can be accessed using a variety of methods, including the web-based QueryBuilder and standalone virtual machine technology. These publicly available software tools play a vital role in connecting a broad community of researchers to MIMIC-II, providing them with immediate access to a one-of-a-kind ICU database and making it feasible for them to perform a wide variety of innovative studies with it.
Explore the clinical data in MIMIC-II using the demo VM and conduct a feasibility test for an envisioned research study.
Use QueryBuilder to conduct a further feasibility test by looking in the tables that are not part of the demo VM and by checking cohort size.
Write and debug an appropriate SQL query in QueryBuilder to extract desired patient data.
If the final SQL query requires substantial computing time or the results contain more than 1,000 rows, run the query in the VM with complete data.
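The last step of this workflow depends on knowing the size of a result before exporting it. A minimal sketch of that check follows, assuming an in-memory SQLite stand-in with invented rows; `result_size`, `suggested_tool` and `EXPORT_LIMIT` are hypothetical names, and the computing-time criterion is not modeled.

```python
import sqlite3

EXPORT_LIMIT = 1000  # QueryBuilder export cap described in the article


def result_size(conn, query):
    """Count the rows a query would return (query must not end with ';')."""
    return conn.execute(f"SELECT COUNT(*) FROM ({query}) AS q").fetchone()[0]


def suggested_tool(conn, query, export_limit=EXPORT_LIMIT):
    """Suggest the VM when the result would exceed the export cap."""
    return "VM" if result_size(conn, query) > export_limit else "QueryBuilder"


# Toy stand-in data (invented; not MIMIC-II records).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE icustay_detail (icustay_id INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO icustay_detail VALUES (?, ?)",
    [(i, "F" if i % 2 else "M") for i in range(1, 1201)],
)

small = suggested_tool(conn, "SELECT * FROM icustay_detail WHERE gender = 'F'")
large = suggested_tool(conn, "SELECT * FROM icustay_detail")
```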
In our experience, clinical research such as that presented in this article is best approached using an inter-disciplinary team combining clinicians who provide the research direction and interpretation of results with engineers who provide data extraction and statistical modeling.
Being a web-based tool, QueryBuilder ensures minimal setup time and effort. MIMIC-II users only need a web browser and Internet connection to be able to launch QueryBuilder. Installing a complete MIMIC-II VM on a local computer involves more steps and requires a longer time, but is an effective method when the shared resources for QueryBuilder become the bottleneck in conducting a research study.
We are working to introduce and improve tools for searching and visualizing the data available in MIMIC-II. Our existing QueryBuilder and VM require users to know or to learn SQL; our next generation of tools will provide an intuitive graphical interface that will be immediately accessible to a wider user community that includes many more clinicians. We are also expanding the database by adding patient records and enlarging the records of existing patients. Improved tools and expansion of the database will further support retrospective clinical research.
In the present article, we have discussed simple example SQL queries as well as representative clinical studies that have been performed using MIMIC-II. We have also described the PhysioNet/CinC 2012 Challenge “Predicting Mortality of ICU Patients”. These examples hint at the range of problems that can be studied using MIMIC-II. They illustrate how investigators can formulate and answer research questions using open-source tools to explore the rich contents of the first (and so far the only) large and publicly available database for intensive care research.
MIMIC-II is an invaluable public database for intensive care research, and we have successfully developed two freely available tools that facilitate accessing MIMIC-II. QueryBuilder is a web-based tool that allows a user to query MIMIC-II in SQL. For more computationally intensive queries, one can locally install a complete copy of MIMIC-II in a VM. A demo VM is also available for interested users who wish to explore MIMIC-II with minimal setup time. We believe that QueryBuilder and the MIMIC-II VM are integral parts of the MIMIC-II research community, which is corroborated by extensive utilization of both tools by MIMIC-II users.
Project name: QueryBuilder and MIMIC-II virtual machine
Project home page: http://physionet.org/mimic2
Operating system(s): Platform independent
Programming language: Java, SQL
Other requirements: Any web browser, Oracle VirtualBox
License: Open source
Any restrictions to use by non-academics: None
a. The provided accuracy was 77.8%, which is the percentage of correctly estimated fluid requirements when the actual fluid requirements in the test dataset were divided into quartiles.
This research was supported by grants R01-EB001659 and U01-EB008577 from the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health (NIH). JL also holds a Postdoctoral Fellowship from the Natural Sciences and Engineering Research Council of Canada (NSERC).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.