
An automated ICU agitation monitoring system for video streaming using deep learning classification



To address the challenge of assessing sedation status in critically ill patients in the intensive care unit (ICU), we aimed to develop a non-contact automatic classifier of agitation using artificial intelligence and deep learning.


We collected the video recordings of ICU patients and cut them into 30-second (30-s) and 2-second (2-s) segments. All of the segments were annotated with the status of agitation as “Attention” and “Non-attention”. After transforming the video segments into movement quantification, we constructed the models of agitation classifiers with Threshold, Random Forest, and LSTM and evaluated their performances.


The video recording segmentation yielded 427 30-s and 6405 2-s segments from 61 patients for model construction. The LSTM model achieved remarkable accuracy (ACC 0.92, AUC 0.91), outperforming other methods.


Our study proposes an advanced monitoring system combining LSTM and image processing to ensure mild patient sedation in ICU care. LSTM proves to be the optimal choice for accurate monitoring. Future efforts should prioritize expanding data collection and enhancing system integration for practical application.



Assessing sedation in non-communicative critically ill patients is crucial. Excessive sedation can prolong mechanical ventilation and increase morbidity and mortality, while insufficient sedation may cause agitation, anxiety, and pain [1, 2]. Hence, a sedation assessment tool is needed to monitor the sedation levels of critically ill patients. Sedation tools, such as the Bispectral index (BIS) [3], are recommended but not universally available. Currently, nurse-protocolized (N-P) targeted sedation protocols, employing scales like the Richmond Agitation-Sedation Scale (RASS), are commonly used [4,5,6]. Unfortunately, these protocols cannot continuously monitor sedation levels to guide the titration of sedatives.

The COVID-19 pandemic has prompted a heightened emphasis on wireless sensing technologies to reduce human interactions and prioritize non-contact healthcare, particularly for healthcare workers, to mitigate virus spread [7]. Utilizing continuous and remote non-contact monitoring systems has proven effective in detecting various health conditions such as sleep disorders, heart failure, arrhythmia, activity levels, and stress [8,9,10,11]. This approach aligns with infection control measures and enables real-time optimization of care through fine-tuned treatment strategies.

Recent advances in deep learning and AI models have gained popularity in clinical applications for disease diagnosis and prevention. Fang et al. developed a video-based non-invasive respiration monitoring system that detects infants’ respiratory frequency to alert caregivers to potential incidents and mitigate Sudden Infant Death Syndrome (SIDS) risks [12]. Another study demonstrated the effectiveness of deep learning-based pain classifiers using facial expressions for automated pain assessment in critically ill patients, achieving promising accuracy in both image- and video-based classifiers. Additionally, deep learning can be applied to screen for depression, observe behaviors, track posture, and monitor epilepsy [8, 13].

Our study aimed to design an AI-assisted automatic classifier of agitation, which could be applied in a non-contact, continuous sedation monitoring system. The system could aid nurses in assessing and monitoring the movement of intensive care unit patients and facilitate timely intervention and treatment based on the assessment outcomes. Using artificial intelligence and deep learning, we successfully extracted the features of real-time video and constructed the models to classify the agitation status automatically.

Materials & methods


This study was conducted in the intensive care units of Taichung Veterans General Hospital (TCVGH), a 1530-bed medical center in central Taiwan. The study was approved by the Institutional Review Board and Ethics Committee of TCVGH (IRB No. CG21307B). Informed consent was obtained, and digital video recordings of ICU patients were taken without disrupting standard care. Patients under 20 years of age, pregnant individuals, and patients with HIV were excluded. Each patient’s age, sex, RASS, and restraint status were also recorded.

Research framework

The study consisted of seven major steps:

(1) patient video collection in the ICU, (2) video segmentation, (3) annotation, (4) patient movement quantification (MediaPipe, Background Subtractor MOG2), (5) Data preprocessing (data replacement, normalization), (6) model construction with three methods, (7) evaluation of model performance (Fig. 1).

Fig. 1

Research framework consisted of 7 steps: (1) patient video collection in ICU, (2) video segmentation, (3) annotation, (4) patient movement quantification (MediaPipe, Background Subtractor MOG2), (5) Data preprocessing (data replacement, normalization), (6) model construction with three methods, (7) evaluation of model performance

Patient video collection

The digital video recordings were captured with a 4K webcam at 1080p/30fps from patients in the ICUs of Taichung Veterans General Hospital. On average, each patient had 8 min of recorded video footage.

Video segmentation

Patients with a sedation level of RASS ≤ -3 were excluded because of deep sedation and no movement. The study ultimately included 61 patients. Video recordings were cut into 30-second (30-s) segments for continuous observation and categorization. Subsequently, each 30-s segment was cut into 2-second (2-s) sub-segments for single-action classification. Segments with more than 10 s of interference within the 30 s, such as caregiver interventions or camera shake, were excluded. In total, 427 30-s segments for continuous observation and 6405 2-s sub-segments for single-action classification were obtained (Fig. 2).
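The segmentation arithmetic can be checked with a short sketch. The function names and the per-second interference flags are illustrative assumptions, not the authors' code; only the 30-s and 2-s lengths, the 30 fps frame rate, the 8-min average recording length, and the 10-s exclusion rule come from the text.

```python
FPS = 30                 # frame rate stated for the recordings
SEGMENT_S = 30           # continuous-observation segment length (seconds)
SUBSEGMENT_S = 2         # single-action sub-segment length (seconds)
MAX_INTERFERENCE_S = 10  # segments with more interference than this are excluded


def split_frames(total_frames, length_s, fps=FPS):
    """Return (start, end) frame ranges of full, non-overlapping windows."""
    step = length_s * fps
    return [(s, s + step) for s in range(0, total_frames - step + 1, step)]


def keep_segment(interference_flags):
    """interference_flags: one boolean per second of a 30-s segment
    (a hypothetical annotation of caregiver intervention or camera shake)."""
    return sum(interference_flags) <= MAX_INTERFERENCE_S


# An 8-minute recording yields 16 candidate 30-s segments before exclusion.
segments = split_frames(8 * 60 * FPS, SEGMENT_S)
# Each 30-s segment (900 frames) contains 15 two-second sub-segments.
subsegments = split_frames(SEGMENT_S * FPS, SUBSEGMENT_S)

print(len(segments), len(subsegments), 427 * len(subsegments))  # 16 15 6405
```

The last value shows that the reported 6405 sub-segments are exactly 15 sub-segments for each of the 427 retained 30-s segments.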

Fig. 2

Data collection from patient enrollment to video recording segmentation


Based on clinical experience, movements in different body regions pose varying levels of risk. For instance, raising the hands was considered high-risk, while lifting the feet was perceived as lower risk. This differentiation aids the model in learning movement patterns more precisely, ensuring a more accurate assessment of the patient’s condition.

Three experienced ICU nurses were invited for annotation. Before annotating, they discussed the features of agitation in postures and movements (head, trunk, and lower limbs) (Table 1) and reached a consensus as follows. Ten cases were randomly selected and marked by two nurses based on patient activity. A third nurse helped reach consensus for cases in which the annotations of the first two nurses differed. They then annotated another 20 cases to validate their consensus in classifying “attention” and “no attention”. “Attention” was defined as video recordings showing patients resisting the restraint belt or moving their limbs or head out of the bed with agitation and safety risks, approximately equivalent to RASS 2 to 4. Recordings without these conditions were labeled “no attention”. The nurses labeled all of the 30-s and 2-s segments.

Table 1 Definitions for Body Regions in Evaluation Criteria

Patient movement quantification

The MediaPipe machine learning framework developed by Google Research is highly valuable in the healthcare field. It is used to track hand movements and assess tremor in Parkinson’s disease, as well as diagnose low back pain by tracking joint positions in the body [14, 15]. Designed specifically for RGB video footage, the Pose model annotates 33 key joint positions for precise measurement.

This study determined the patient’s recumbent position (horizontal or vertical) by analyzing the distance between the y-coordinates of the left and right shoulders and between the right shoulder and right hip. For patients lying horizontally, the next step involved determining the head orientation by comparing the x-coordinates of the left shoulder node to the hip node. The head was above the coordinates of the right shoulder, the trunk was between the coordinates of the right shoulder and the right hip, and the lower limbs were below the coordinates of the right hip (Fig. 3).
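A minimal sketch of this region logic is given below. The landmark indices are those of MediaPipe Pose (11 and 12 for the left and right shoulders, 24 for the right hip). The comparison rule in `recumbent_position`, the function names, and the assumption that the head points toward the top of the image in the vertical case are illustrative guesses, since the text does not state the exact criteria.

```python
# MediaPipe Pose landmark indices for the joints used in this sketch.
LEFT_SHOULDER, RIGHT_SHOULDER, RIGHT_HIP = 11, 12, 24


def recumbent_position(landmarks):
    """landmarks: dict mapping landmark index -> (x, y), with y increasing
    downward as in image coordinates.

    Assumed rule: the body axis is vertical in the image when the
    shoulder-to-hip y-distance exceeds the shoulder-to-shoulder y-distance.
    """
    shoulder_dy = abs(landmarks[LEFT_SHOULDER][1] - landmarks[RIGHT_SHOULDER][1])
    torso_dy = abs(landmarks[RIGHT_SHOULDER][1] - landmarks[RIGHT_HIP][1])
    return "vertical" if torso_dy > shoulder_dy else "horizontal"


def region_of(y, landmarks):
    """Map a y-coordinate to a body region for a vertically positioned
    patient whose head is assumed to point toward the top of the image."""
    shoulder_y = landmarks[RIGHT_SHOULDER][1]
    hip_y = landmarks[RIGHT_HIP][1]
    if y < shoulder_y:
        return "head"
    if y <= hip_y:
        return "trunk"
    return "lower limbs"


# Hypothetical normalized coordinates for two camera views.
upright = {11: (0.60, 0.30), 12: (0.40, 0.31), 24: (0.42, 0.60)}
sideways = {11: (0.30, 0.40), 12: (0.31, 0.60), 24: (0.60, 0.58)}

print(recumbent_position(upright), recumbent_position(sideways))
print([region_of(y, upright) for y in (0.10, 0.45, 0.80)])
```

For a horizontally positioned patient, the same boundaries would be applied along the x-axis after deciding on which side the head lies.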

Fig. 3

Patient’s posture determined using MediaPipe Pose in various cases: vertical position. Considering patient privacy, this paper presents only body parts outside the head, with the patient’s head represented by a circular symbol

The OpenCV BackgroundSubtractorMOG2 algorithm uses Gaussian Mixture Models (GMM) for background separation in videos [16]. It learns the background and isolates moving foreground objects by modeling each image pixel with a mixture of Gaussian distributions. The weight of each distribution reflects how long a color has been present, which helps identify the background. The process involves motion detection, converting each video frame into a black-and-white image. White areas indicate patient movement, and higher feature values represent more significant movement. These values are calculated by summing and averaging frames within each two-second interval (Fig. 4).
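The quantification step might look roughly like the sketch below. `cv2.createBackgroundSubtractorMOG2`, `apply`, `cv2.VideoCapture`, and `cv2.countNonZero` are real OpenCV calls; everything else is an assumption made for illustration, including the choice of the percentage of white pixels as the per-frame feature, the row-based region boundaries, and the function names. The paper does not specify the exact feature definition.

```python
def average_per_window(per_frame_values, fps=30, window_s=2):
    """Average per-frame motion values over consecutive full windows."""
    size = fps * window_s
    n_windows = len(per_frame_values) // size
    return [
        sum(per_frame_values[i * size:(i + 1) * size]) / size
        for i in range(n_windows)
    ]


def frame_motion_values(video_path, row_start, row_end):
    """Yield, for each frame, the percentage of foreground (white) pixels
    in the image rows [row_start, row_end). Requires OpenCV."""
    import cv2  # imported lazily so the averaging helper has no dependencies

    # detectShadows=False keeps the mask strictly binary (0 or 255).
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    capture = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            mask = subtractor.apply(frame)  # 0 = background, 255 = foreground
            region = mask[row_start:row_end, :]
            yield 100.0 * cv2.countNonZero(region) / region.size
    finally:
        capture.release()


# Averaging works on any per-frame sequence; a partial final window is dropped.
demo = [0.0] * 60 + [2.0] * 60 + [5.0] * 10
print(average_per_window(demo))  # [0.0, 2.0]
```

With a real recording, `average_per_window(list(frame_motion_values(path, r0, r1)))` would give one value per 2-s interval for the chosen body region, where `path`, `r0`, and `r1` are placeholders.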

Fig. 4

Background subtraction and patient motion quantification by OpenCV Background Subtractor MOG2 algorithm and Gaussian Mixture Models (GMM) [16]

Data preprocessing

After converting the video into numerical data, this study replaced the values of segments affected by external factors with the preceding adjacent values so that the model’s learning was not influenced. Additionally, the data was normalized to optimize the model parameters for this particular case.
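The two preprocessing steps can be sketched as follows. The replacement rule follows the description above, whereas min-max scaling is only one possible normalization, chosen here as an assumption because the text does not name the method. Function names are hypothetical.

```python
def replace_affected(values, affected):
    """Replace flagged values with the nearest preceding unflagged value.
    Flagged values at the very start have no predecessor and are kept."""
    cleaned, last_good = [], None
    for value, bad in zip(values, affected):
        if bad and last_good is not None:
            cleaned.append(last_good)
        else:
            cleaned.append(value)
            if not bad:
                last_good = value
    return cleaned


def min_max_normalize(values):
    """Scale values linearly to the range [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# The second value is flagged as affected by, for example, a caregiver.
motion = [1.0, 9.0, 3.0, 5.0]
flags = [False, True, False, False]

cleaned = replace_affected(motion, flags)
print(cleaned)                     # [1.0, 1.0, 3.0, 5.0]
print(min_max_normalize(cleaned))  # [0.0, 0.0, 0.5, 1.0]
```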

Model construction

After preprocessing, the data was provided to the classification model. Thresholds and random forests were used for single-action classification (2-s).

  1. Threshold:

    The threshold method classified the head, trunk, and lower limbs into three movement severity levels: no movement (1 point), bed movement (2 points), and significant off-bed movement (3 points). The scores for the three body parts were then summed (3 to 9 points). Thresholds for each body part and for the classification outcome were determined using box plots, and confusion matrix indicators were used to confirm that clinical requirements were met.

  2. Random forests:

    The Random Forest (RF) algorithm, a highly effective classification method, excels in accuracy for big data scenarios [17]. Utilizing ensemble learning, RF constructs multiple decision trees during training and derives predictions from the patterns they identify. This study applied RF to the quantified movement data to classify patients every two seconds. Key parameters were set as follows: n_estimators = 100, max_features = auto, criterion = Gini.

  3. Long Short-Term Memory (LSTM):

    LSTM simulated continuous observations and classified patients every 30 s. The LSTM model used in this study featured 20 hidden units, 2 stacked layers, an input size of 4, and a time step of 15. Model hyperparameters were fine-tuned, including the data split ratios, the activation function, and whether to normalize the data. The training process was visualized by calculating the loss and accuracy on the validation set after each epoch and recording metrics for both the training and validation sets. Ultimately, the model with the best stability and performance was chosen. The data was split into 80% training and 20% testing, and 10% of the training set was used as validation data. We used the categorical cross-entropy loss function, the Adam optimizer, and the softmax activation function.
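The time step of 15 matches the number of 2-s sub-segments in one 30-s segment. The sketch below shows how the sub-segment features could be grouped into LSTM inputs and split as described. The function names, the shuffling, the seed, and the rounding of split sizes are assumptions, and the text does not say which four features form each input vector, so zeros are used as placeholders.

```python
import random

TIME_STEPS = 15  # 30-s segment divided into 2-s sub-segments
INPUT_SIZE = 4   # features per time step, as reported


def build_sequences(features, time_steps=TIME_STEPS):
    """Group consecutive 2-s feature vectors into fixed-length sequences."""
    n = len(features) // time_steps
    return [features[i * time_steps:(i + 1) * time_steps] for i in range(n)]


def split_indices(n_samples, test_ratio=0.2, val_ratio=0.1, seed=0):
    """Shuffle sample indices, hold out test_ratio for testing, then hold out
    val_ratio of the remaining training indices for validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_ratio)
    test, train = indices[:n_test], indices[n_test:]
    n_val = round(len(train) * val_ratio)
    return train[n_val:], train[:n_val], test


# Placeholder features: 6405 sub-segments with 4 values each.
features = [[0.0] * INPUT_SIZE for _ in range(6405)]
sequences = build_sequences(features)
train_idx, val_idx, test_idx = split_indices(len(sequences))

# Each sample has shape (15, 4); 6405 sub-segments give 427 samples.
print(len(sequences), len(sequences[0]), len(sequences[0][0]))
print(len(train_idx), len(val_idx), len(test_idx))
```

The split sizes printed here depend on the rounding convention and need not match the counts used in the study.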

Evaluation of model performance

In this study, confusion matrices and ROC curves were used to evaluate the models. The metrics included accuracy (ACC), precision (P), recall (R), and F1 score, and 10-fold cross-validation (k-fold = 10) was applied to assess model stability. The ROC curve depicts the relationship between sensitivity and specificity, and the area under the ROC curve (AUC) was calculated.
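For reference, these metrics can be computed from predictions as in the sketch below. The formulas are the standard definitions; the function names, the toy labels, and the contiguous fold layout are illustrative assumptions and do not reproduce the study's data or tooling.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc_score(y_true, scores, positive=1):
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count as half); assumes both classes are present."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Toy labels: 1 = attention, 0 = no attention.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
print(metrics)  # every metric equals 0.75 for this toy example
print(auc_score([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))  # 0.75
```

In a cross-validation run, a model would be trained on each `train_idx` and scored on the matching `test_idx`, and the ten accuracies would then be averaged.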

In addition to evaluating model performance, this study used line charts to analyze patient movement over 30 s. The goal was to confirm if the model’s assessments and image quantification align with real-world scenarios. Two representative cases have been selected. Cases of attention involved significant cross-zone movements, posing potential risks, while cases of no attention related to bed movements. Through motion analysis, the study clearly illustrated these distinctions and provided quantified results.


System configuration

All experiments conducted in this paper were completed using the system configuration outlined in Table 2.

Table 2 System configuration

Patients and video recording

We collected the video recordings of 150 patients. Only the 61 patients with RASS scores ≥ -2 were enrolled for analysis. The average age was approximately 60 years. Male patients predominated (M/F 46/15), particularly in videos featuring patients with a RASS score of 0 (Table 3). The video recordings of the 61 patients were cut into 427 30-s and 6405 2-s segments (Fig. 2).

Table 3 Agitation status of the 61 patients

Model construction

Thresholds definition

Figure 5 presents the threshold definitions by using a box plot. The specified cut-off values for different body parts were set at 0.8 and 5 for the head, 0.8 and 14 for the trunk, and 0.8 and 11 for the lower limbs (Figure 5A). Additionally, the aggregate scores for all body parts were subjected to a cut-off value of 5 (Figure 5B).
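Using these cut-off values, the threshold classifier can be sketched as below. The cut-offs are taken from the text, but the handling of values exactly on a boundary and the rule that an aggregate score of 5 or more means “Attention” are assumptions, since the text does not state the direction of the comparisons. Function names and the example motion values are hypothetical.

```python
# (lower, upper) cut-offs per body part, as reported for Figure 5A.
CUTOFFS = {"head": (0.8, 5), "trunk": (0.8, 14), "lower limbs": (0.8, 11)}
AGGREGATE_CUTOFF = 5  # reported for Figure 5B


def region_score(value, low, high):
    """1 = no movement, 2 = bed movement, 3 = significant off-bed movement.
    Assumption: values equal to a cut-off fall into the higher level."""
    if value < low:
        return 1
    if value < high:
        return 2
    return 3


def classify(motion):
    """motion: dict mapping body part -> quantified motion for one 2-s sub-segment.
    Assumption: an aggregate score at or above the cut-off means Attention."""
    total = sum(region_score(motion[part], *CUTOFFS[part]) for part in CUTOFFS)
    label = "Attention" if total >= AGGREGATE_CUTOFF else "Non-attention"
    return total, label


print(classify({"head": 0.2, "trunk": 0.5, "lower limbs": 0.3}))  # (3, 'Non-attention')
print(classify({"head": 6.0, "trunk": 2.0, "lower limbs": 0.1}))  # (6, 'Attention')
```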

Fig. 5

The threshold definitions by using a box plot. (A) Illustrates the threshold definitions for different body parts, with box plots encompassing quantified movement values and corresponding categories of no movement, in-bed movement, and significant out-of-bed movement. The bottom of the box plots presents the threshold definitions for various levels of movement. (B) Illustrates the threshold definitions for the aggregate score, with box plots comprising the aggregate scores for all body parts and their corresponding classification results (no attention and attention). The bottom of the box plots presents the threshold definitions for different categories

Model construction

In this study, the three classification methods were compared using confusion matrices and ROC curves. The average cross-validation accuracy (k-fold = 10) of RF and LSTM was 0.90. The LSTM model achieved the highest accuracy (ACC = 0.92). LSTM, which uses time series data for classification, yielded the highest sensitivity (recall) for patients requiring attention and significantly improved various performance evaluation metrics (Table 4).

Table 4 Model performance among the models of Threshold, Random Forest, and LSTM

Additionally, the ROC curves showed that the AUC of the LSTM model surpassed that of the other methods (AUC = 0.91) (Fig. 6). This result emphasizes the outstanding performance of the LSTM model in simulating time series data of patient clinical observations.

Fig. 6

AUROC of Threshold, Random Forest, and LSTM. AUROC is the highest in the LSTM model

Patient movement analysis

Patient movement analysis of all cases

We classified all cases into “Attention” and “Non-attention” and further stratified them into “Non-attention Without Restraint Belt,” “Non-attention With Restraint Belt,” “Attention Without Restraint Belt,” and “Attention With Restraint Belt.”

Referring to Fig. 7, it becomes apparent that the outcomes of image analysis align with clinical observations. There exists a notable contrast in movement between patients classified as “Attention” and those as “Non-attention.” Patients in the “Attention” category exhibit significantly more extensive movements, including those spanning different body regions. Within the “Attention” category, a noteworthy distinction surfaces between patients with and without restraint belts, with patients under restraint belts displaying reduced movement in the trunk area.

Fig. 7

Patient movement analysis for All Cases: the x-axis illustrates motion quantified value, the y-axis denotes the time axis (seconds), and distinct colored lines represent different body parts

Patient movement analysis of the representative cases

Patient 49 was categorized as “Non-attention,” with the image module detecting minimal head and limb movement (Fig. 8a). Patient 58 was classified as “Attention,” with motion quantification revealing significant head and limb movements, including inter-regional motion (Fig. 8b). The analysis results from the image module align with the observed patient movements, demonstrating its accurate detection of displacement in each region. Due to privacy considerations, the patient’s head is not shown in the video. These movements are correlated with the analytical data, and the corresponding videos are included in the supplementary material.

Fig. 8

Patient movement analysis (left) and video (right) of individual cases. (a) Patient 49 (b) Patient 58


Ensuring mild patient sedation in ICU care is crucial, but current clinical assessment methods encounter challenges like low frequency, subjectivity, and evolving professional standards, emphasizing the need for advanced, continuous monitoring methods [18, 19]. This study proposes a monitoring system that combines LSTM and image processing to address challenges such as ICU lighting variations and effective activity detection even when the patient is covered. The integrated AI technology enhances system accuracy, compensating for current monitoring limitations. Results align with expert observations.

Previous studies used cameras for agitation and sedation monitoring in the ICU. Chase et al. captured limb movements and quantified sedation and agitation levels using fuzzy logic methods [20]. Becouze et al. used cameras to record facial expressions, measuring agitation levels in critically ill patients [21]. Martinez et al. employed multiple cameras to observe patient behavior in the ICU for sedation control and accident prevention [22]. However, those studies faced detection issues and lacked detailed evaluation metrics.

This study compared three methods, and the results indicated that LSTM is the optimal choice. LSTM is known for selectively retaining relevant information and discarding unnecessary details, which can enhance performance [23]. Lipton et al.’s research demonstrated that expert-level diagnostic differentiation of various diseases can be achieved using electronic health records (EHR) and recurrent neural networks (RNN) [24]. LSTM technology aids healthcare professionals in diagnosis, prediction, and treatment, potentially enhancing efficiency and accuracy in the medical field and ultimately improving patient experiences and outcomes.

Despite this progress, the study has certain limitations. Because of patient safety and privacy concerns, the length and number of video segments were restricted. The limitations include a relatively small number of cases, few cases with agitation (89 cases were excluded), and annotation with only two labels, attention and no attention. However, practical judgment by clinical personnel confirms that this method holds clinical value in enhancing patient safety through continuous monitoring. Currently, video segments affected by interventions from healthcare professionals must be trimmed manually. Addressing these challenges requires improvements in smart device integration and workflows. Standardized methods, image transmission connections, and enhanced system security are crucial for implementing the monitoring system and ensuring the legality, privacy, and reliability of the results. Future efforts can focus on expanding data collection, increasing the automation of medical interventions, and improving system integration and security to enhance practicality.

This study still holds significant value in clinical applications and provides solutions for future challenges. Despite existing challenges and risks, the potential benefits in patient care and reducing complications make these advancements promising for future clinical applications.


Our study proposes an advanced monitoring system combining LSTM and image processing to address challenges in ICU care. It offers continuous and accurate monitoring, crucial for ensuring mild patient sedation amidst evolving standards and subjective assessments. LSTM emerges as the optimal choice, leveraging its information retention capabilities for enhanced performance, as seen in other medical applications.

While limitations exist due to patient safety and privacy concerns, our system holds clinical value in enhancing patient safety through continuous monitoring. Addressing these challenges requires improvements in device integration, workflows, and system security. Future efforts should focus on expanding data collection and enhancing system integration and security for practicality.

Data availability

All data supporting the findings of this study are available within the paper and its Supplementary Information.


  1. Page V, McKenzie C. Sedation in the Intensive Care Unit. Curr Anesthesiology Rep. 2021;11(2):92–100.

  2. Jackson DL, et al. The incidence of sub-optimal sedation in the ICU: a systematic review. Crit Care. 2009;13(6):R204.

  3. Devlin JW, et al. Clinical practice guidelines for the Prevention and Management of Pain, Agitation/Sedation, Delirium, Immobility, and sleep disruption in adult patients in the ICU. Crit Care Med. 2018;46(9):e825–73.

  4. Sessler CN, et al. The Richmond agitation–sedation scale. Am J Respir Crit Care Med. 2002;166(10):1338–44.

  5. Ely EW, et al. Monitoring sedation Status Over Time in ICU patients. JAMA. 2003;289(22):2983.

  6. Riker RR, Picard JT, Fraser GL. Prospective evaluation of the sedation-agitation scale for adult critically ill patients. Crit Care Med. 1999;27(7):1325–9.

  7. Saeed U, et al. Machine learning empowered COVID-19 patient monitoring using non-contact sensing: an extensive review. J Pharm Anal. 2022;12(2):193–204.

  8. Jakkaew P, Onoye T. Non-contact respiration monitoring and body movements detection for Sleep using Thermal Imaging. Sensors (Basel). 2020;20(21):6307. PMID: 33167556; PMCID: PMC7663997.

  9. Hu M, et al. Combination of near-infrared and thermal imaging techniques for the remote and simultaneous measurements of breathing and heart rates under sleep situation. PLoS ONE. 2018;13(1):e0190466.

  10. Block VAJ, et al. Remote physical activity monitoring in neurological disease: a systematic review. PLoS ONE. 2016;11(4):e0154335.

  11. Wei J, et al. Transdermal Optical Imaging reveal basal stress via heart rate variability analysis: a novel methodology comparable to Electrocardiography. Front Psychol. 2018;9:98.

  12. Fang CY, Hsieh HH, Chen SW. A Vision-Based Infant Respiratory Frequency Detection System. In: 2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA). 2015.

  13. Ahmed I, et al. Internet of health things driven deep learning-based system for non-invasive patient discomfort detection using time frame rules and pairwise keypoints distance feature. Sustainable Cities Soc. 2022;79:103672.

  14. Güney G, Jansen TS, Dill S, Schulz JB, Dafotakis M, Hoog Antink C, Braczynski AK. Video-Based Hand Movement Analysis of Parkinson Patients before and after medication using high-frame-rate videos and MediaPipe. Sensors. 2022;22:7992.

  15. Hustinawaty T, Rumambi, Hermita M. Motion Detection Application to Measure Straight Leg Raise ROM Using MediaPipe Pose. In: 2022 4th International Conference on Cybernetics and Intelligent System (ICORIS), Prapat, Indonesia. 2022. pp. 1–5.

  16. OpenCV. Background Subtraction. Accessed 21 Feb 2023.

  17. Shrivastava D, et al. Bone cancer detection using machine learning techniques. Smart Healthcare for Disease diagnosis and Prevention. Academic; 2020. pp. 175–83.

  18. Sessler CN, Gosnell MS, Grap MJ, Brophy GM, O’Neal PV, Keane KA, Tesoro EP, Elswick RK. The Richmond Agitation-Sedation Scale: validity and reliability in adult intensive care unit patients. Am J Respir Crit Care Med. 2002;166(10):1338–44. PMID: 12421743.

  19. Grap MJ, Hamilton VA, McNallen A, Ketchum JM, Best AM, Isti Arief NY, Wetzel PA. Actigraphy: analyzing patient movement. Heart Lung. 2011;40(3):e52–9.

  20. Chase JG, et al. Quantifying agitation in sedated ICU patients using digital imaging. Comput Methods Programs Biomed. 2004;76(2):131–41.

  21. Becouze P, et al. Measuring facial grimacing for quantifying patient agitation in critical care. Comput Methods Programs Biomed. 2007;87(2):138–47.

  22. Martinez M, Stiefelhagen R. Automated multi-camera system for long term behavioral monitoring in intensive care units. In: MVA. 2013. pp. 97–100.

  23. Shung D, Huang J, Castro E, et al. Neural network predicts need for red blood cell transfusion for patients with acute gastrointestinal bleeding admitted to the intensive care unit. Sci Rep. 2021;11:8827.

  24. Lipton ZC, Kale DC, Elkan C, Wetzel RC. Learning to Diagnose with LSTM Recurrent Neural Networks. CoRR. 2015;abs/1511.03677.


This study was supported by Taichung Veterans General Hospital (TCVGH-1114404 C) and the National Science and Technology Council (NSTC 111-2634-F-A49-014-1 and NSTC 112-2321-B-075 A-001-1-1). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.



Author information




P.D. was involved in research conception and design, data analysis and interpretation, article writing, and manuscript preparation. Y.W. was involved in data analysis and interpretation, article writing, and manuscript preparation. R.S. and C.W. oversaw the overall implementation of the project and were involved in research conception and design. P.L., W.C., G.L., C.W., and L.C. helped to write and review this work. All authors gave final approval of the version to be published.

Corresponding author

Correspondence to Chieh-Liang Wu.

Ethics declarations

Ethics approval

The study was approved by the Institutional Review Board (IRB) of Taichung Veterans General Hospital (IRB No. CG21307B). Video recordings were obtained from consenting patients admitted to the adult intensive care unit at Taichung Veterans General Hospital from December 1, 2022, to December 31, 2023. All procedures were conducted in accordance with the Declaration of Helsinki of 1964 and its further modifications. Participants were informed of the study’s purpose, the data’s confidentiality, and the voluntary nature of their participation. All participants provided informed consent.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Dai, PY., Wu, YC., Sheu, RK. et al. An automated ICU agitation monitoring system for video streaming using deep learning classification. BMC Med Inform Decis Mak 24, 77 (2024).
