
Predicting high blood pressure using machine learning models in low- and middle-income countries

Abstract

Responding to the rising global prevalence of noncommunicable diseases (NCDs) requires improvements in the management of high blood pressure. Therefore, this study aims to develop an explainable machine learning model for predicting high blood pressure, a key NCD risk factor, using data from the STEPwise approach to NCD risk factor surveillance (STEPS) surveys. Nationally representative samples of adults aged 18-69 years were acquired from 57 countries spanning six World Health Organization (WHO) regions. Data harmonization and processing were performed to standardize the selected predictors and synchronize features across countries, yielding 41 variables, including demographic, behavioural, physical, and biochemical factors. Five machine learning models - logistic regression, k-nearest neighbours, random forest, XGBoost, and a fully connected neural network - were trained and evaluated at global, regional, and country-specific levels using an 80/20 train-test split. The models’ performance was assessed using accuracy, precision, recall, and F1 score. Feature importance analysis identified age, weight, heart rate, waist circumference, and height as key predictors of blood pressure. Across the 57 countries studied, model performance varied considerably, with accuracy ranging from as low as 58.96% in some models for specific countries to as high as 81.41% in others, underscoring the need for region- and country-specific adaptations in modelling approaches. The explainable model offers an opportunity for population-level screening and continuous risk assessment in resource-limited settings.


Background

High blood pressure affects over 1.2 billion people globally, with two-thirds of them residing in low- and middle-income countries (LMICs) [1,2,3]. High blood pressure is a risk factor for premature death and disability due to cardiovascular diseases, stroke, and chronic kidney disease [4]. In LMICs, the economic impact of high blood pressure and its related complications is significant, often amounting to several times per capita health expenditure. For example, the average cost of managing high blood pressure can range from $500 to $1,500 per episode, in stark contrast to the more modest health budgets in these regions [5]. This financial strain is compounded by broader economic effects, including substantial productivity losses due to the disease.

While individual blood pressure measurements are relatively straightforward to obtain, conducting widespread screenings and maintaining long-term monitoring across large populations, especially in resource-limited settings, remains a significant challenge. This study aims to develop an explainable machine learning model for predicting blood pressure levels using demographic, lifestyle, and other data available across contexts.

An explainable machine learning model that predicts blood pressure across settings using data that can be collected virtually can serve multiple purposes. Firstly, it can function as an efficient initial screening tool to identify high-risk individuals to prioritise for direct intervention in resource-limited settings. Secondly, the model has the potential for early risk identification, potentially flagging individuals at high risk of developing hypertension in the future based on current risk factors. Furthermore, the integration of such a predictive model with existing health data systems could provide continuous risk assessment without requiring frequent direct measurements. Lastly, the model’s ability to predict blood pressure trends over time could offer valuable insights into population health trajectories.

In light of these considerations, this study aims to develop an explainable machine learning model that predicts high blood pressure using clinical and demographic data from a large, diverse population across multiple LMICs.

Methods

The STEPS noncommunicable disease risk factor surveillance instrument

The STEPwise approach to noncommunicable disease (NCD) risk factor surveillance (STEPS) constitutes a standardized tool, designed for low- and middle-income countries (LMICs), to systematically collect, analyze, and disseminate data on key NCD risk factors [6,7,8,9]. This dataset encompasses behavioural risk factors, including tobacco and alcohol consumption, physical inactivity, and unhealthy dietary patterns, as well as biological risk factors such as overweight and obesity, high blood pressure, high blood glucose, and dyslipidemia.

STEPS employs a multistage cluster sampling methodology to generate a nationally representative sample of adults between the ages of 18 and 69 years. Data collection is conducted via in-person interviews with selected respondents at their residences. The survey comprises three distinct levels or “steps”, as detailed in Table 1.

Table 1 The STEPS survey encompasses distinct levels of risk-factor assessment [10]

For this study, STEPS data from 57 countries spanning six WHO regions were acquired, as delineated in Fig. 1. Of the 71 countries in the WHO STEPS dataset, the 57 included are those whose questionnaires were administered in English or could be readily translated into English. The country-level sample size ranged from 275 in Liberia to 9,183 in Ethiopia.

Fig. 1

LMIC countries and their sample sizes in the STEPS dataset

Data harmonization

For the harmonization of features of interest in predicting high blood pressure, we utilized author-led expert surveys and analyzed the STEPS questionnaire to identify relevant variables for predicting blood pressure. These variables were then adjusted for consistency across all participating countries based on the harmonization strategy outlined in our feature-engineering plan.

The harmonization process was critical in ensuring that the data collected from different countries could be accurately compared and analyzed. This involved aligning variable definitions, categories, and measurement techniques across different sections of the STEPS questionnaire:

  1. Demographic information: We condensed earnings-related questions into a single “Earnings per year” variable.

  2. Tobacco use: We merged several questions to create more concise variables, such as “Length of time smoking” and “Number of tobacco products per day.” We also consolidated different types of tobacco products, recognizing their similar health risks.

  3. Alcohol consumption: We condensed frequency and quantity questions into more manageable variables like “How often do you drink alcohol?” and “How many alcoholic drinks do you consume per day?”

  4. Diet: We consolidated fruit and vegetable consumption questions into a single “How many fruit/vegetables do you eat per day?” variable.

  5. Physical activity: We simplified work intensity and physical activity questions to capture key information more efficiently.

  6. Medical history: We focused on key variables related to blood pressure, diabetes, cholesterol, and cardiovascular diseases, removing redundant or less relevant questions.

  7. Physical measurements: We retained key measurements like blood pressure readings, height, weight, waist circumference, and relevant biochemical measurements.

This harmonization strategy allowed us to create a more streamlined and consistent dataset across all countries, focusing on the most relevant predictors of blood pressure while reducing redundancy and potential inconsistencies in data collection across different settings.

The study dataset included 48 variables, as illustrated in Table 2. Among these variables, 11 represent demographic factors, including sex, age, years of schooling, educational level, marital status, and employment status. There are 24 variables associated with behavioural measurements, including factors like smoking habits, alcohol consumption, fruit and vegetable intake, work intensity, and treatment for hypertension and heart disease. Five variables represent the physical measurements of the respondents, including height, weight, waist circumference, and hip circumference. Additionally, eight variables are related to the biochemical measurements of the respondents, including fasting blood glucose, cholesterol, urinary sodium, urinary creatinine, and the average systolic and diastolic measurements from three readings. The final study dataset comprises 27 numeric and 21 categorical variables in total.

Table 2 Variable descriptions

Data processing

For numeric variables, the Z-score method was utilized to identify and remove outliers by setting a threshold value. Observations with a Z-score exceeding this threshold were considered outliers and were removed from the dataset [11, 12]. Note, however, that this approach targeted only extreme outliers, not all values below Q1 or above Q3, ensuring that extreme values do not unduly influence model training.
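As a concrete illustration, a minimal sketch of the Z-score filter described above. The threshold value is not reported in the paper, so the common choice of 3.0 is assumed, and the data are synthetic:

```python
import numpy as np
import pandas as pd

def remove_zscore_outliers(df, column, threshold=3.0):
    """Drop rows whose value in `column` lies more than `threshold`
    standard deviations from the column mean (Z-score method)."""
    z = (df[column] - df[column].mean()) / df[column].std()
    return df[z.abs() <= threshold]

# 200 plausible heights plus one nonsensical entry (999 cm)
heights = pd.DataFrame(
    {"height_cm": np.append(np.tile([150, 160, 170, 180], 50), 999)}
)
cleaned = remove_zscore_outliers(heights, "height_cm")
# Only the 999 cm entry exceeds the threshold and is dropped
```

Note that with very small samples the Z-score of any single point is bounded, so this filter is only effective on reasonably sized datasets such as the country-level STEPS samples.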

To handle categorical variables, a dictionary was created, containing mappings of categorical encodings for each variable. This dictionary facilitated the transformation of the categorical data into numerical values, aiding further analysis. Missing values within categorical columns were replaced with a designated “no response” category to ensure these instances were still accounted for in the dataset.
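A minimal sketch of this encoding step; the variable names and category codes below are hypothetical, since the study's actual dictionary is not reproduced here:

```python
import pandas as pd

# Hypothetical encoding dictionary in the spirit of the paper's approach
CATEGORY_MAPS = {
    "smoking_status": {"never": 0, "former": 1, "current": 2, "no response": -1},
    "sex": {"female": 0, "male": 1, "no response": -1},
}

def encode_categoricals(df, maps):
    out = df.copy()
    for col, mapping in maps.items():
        # Missing values become the designated "no response" category
        out[col] = out[col].fillna("no response").map(mapping)
    return out

raw = pd.DataFrame({
    "smoking_status": ["never", None, "current"],
    "sex": ["male", "female", None],
})
encoded = encode_categoricals(raw, CATEGORY_MAPS)
```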

To create the target variable, we averaged the three systolic and the three diastolic readings taken during the STEPS survey, minimizing measurement variability and improving the reliability of the predictive models. Based on CDC [13] and AHA [14] guidelines, a comprehensive blood pressure classification system was then derived from the averaged systolic and diastolic values. The criteria were:

  1. Blood pressure was considered normal if the systolic reading was below 120 mm Hg and the diastolic was below 80 mm Hg.

  2. The “normal” classification also applied if the systolic reading was 120 to 129 mm Hg and the diastolic was below 80 mm Hg.

  3. Blood pressure was classified as high if the systolic reading was between 130 and 139 mm Hg or the diastolic reading was between 80 and 89 mm Hg.

  4. The “high” classification also applied if the systolic reading was 140 mm Hg or above, or the diastolic reading was 90 mm Hg or above.

  5. Finally, “high” status was given if the systolic reading was 180 mm Hg or above, or the diastolic reading was 120 mm Hg or above.
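The classification rules above can be sketched as a small Python function; this is an illustrative reimplementation, not the study's published code:

```python
def classify_bp(systolic, diastolic):
    """Binary blood-pressure status following the CDC/AHA-based
    criteria described above (readings in mm Hg)."""
    # Normal: systolic < 120 and diastolic < 80
    if systolic < 120 and diastolic < 80:
        return "normal"
    # Also normal: systolic 120-129 with diastolic < 80
    if 120 <= systolic <= 129 and diastolic < 80:
        return "normal"
    # Everything else is "high": this covers 130-139 / 80-89,
    # >= 140 / >= 90, and the >= 180 / >= 120 range
    return "high"

examples = [(118, 75), (125, 78), (135, 85), (150, 95), (122, 82)]
labels = [classify_bp(s, d) for s, d in examples]
```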

A robust approach was applied to address nonsensical outliers in the dataset, thereby enhancing the reliability of the analysis. This approach involved the use of two complementary methods, aiming to identify and replace extreme values that lay outside the bounds of the upper and lower whiskers of the data distribution.

The first method was focused on handling outliers situated above the upper whisker. This boundary was determined by computing the third quartile (Q3) of the data distribution and adding a product of a predefined constant multiplier and the interquartile range (IQR). This is formally represented as \(U = Q3 + k \times IQR\), where U is the upper boundary, Q3 is the third quartile, k is a predefined constant multiplier, and IQR is the interquartile range, calculated as \(Q3 - Q1\). Upon identification of these upper-bound outliers, they were replaced with random numbers \(R_U\) that fell within the interquartile range, or alternatively, between the mean and Q3 (i.e., \(\mu \le R_U \le Q3\)), ensuring a more representative value in line with the general data distribution.

In a similar manner, the second method was aimed at outliers residing below the lower whisker. This boundary was established by calculating the first quartile (Q1) and subtracting a product of a predefined constant multiplier and the IQR. This is formally presented as \(L = Q1 - k \times IQR\), where L is the lower boundary, Q1 is the first quartile, k and IQR are already defined. Once these lower-bound outliers were identified, they were replaced with random numbers either within the interquartile range or, alternatively, between Q1 and the mean of the data distribution (i.e., \(Q1 \le R_L \le \mu\)). This procedure ensured the replaced values were more harmonious with the overall data distribution, leading to a more accurate and reliable dataset for further analysis. For further reading on robust methods for handling outliers and their theoretical justification, see [15] and [16], which discuss the principles and application of these techniques in statistical analysis.
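Both replacement rules can be sketched together. This version draws replacements from within the interquartile range, one of the two options described above, and assumes the conventional multiplier k = 1.5, which the paper does not state:

```python
import numpy as np

def replace_whisker_outliers(values, k=1.5, seed=0):
    """Replace values beyond Q1 - k*IQR or Q3 + k*IQR with random
    draws from inside the interquartile range [Q1, Q3]."""
    rng = np.random.default_rng(seed)
    v = np.asarray(values, dtype=float).copy()
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = (v < lower) | (v > upper)
    v[mask] = rng.uniform(q1, q3, size=mask.sum())
    return v

data = [10, 12, 11, 13, 12, 11, 14, 13, 250]  # 250 is a nonsensical outlier
cleaned = replace_whisker_outliers(data)
# The 250 is replaced by a value inside the IQR; the rest are untouched
```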

Model design and evaluation

A comprehensive approach was employed to predict blood pressure status, using multiple machine learning models. The methodology encompassed global, regional, and country-specific levels. This allowed for the tailoring of predictions to each level’s unique characteristics.

The process began with data preprocessing. One-hot encoding was performed for categorical variables, a technique that converts categorical data into a binary matrix representation. This allows machine learning algorithms to work with categorical data in a numerical format. For example, a categorical variable “work status” with categories “employed,” “unemployed,” and “student” would be transformed into three binary columns.

Numerical variables were scaled using the StandardScaler method, which standardizes features by removing the mean and scaling to unit variance. This preprocessing step ensures that all numerical features contribute equally to the model and prevents features with larger magnitudes from dominating the learning process.

This preprocessing stage was crucial to ensure that the models could effectively learn from the data without being unduly influenced by the varying scales of different features. The dataset was split into training and testing subsets using the ‘split’ column, which was randomly assigned to each data point. This allowed for an unbiased evaluation of the models’ performance on unseen data. Table 3 shows the data split for the countries.

Table 3 Country-wise test and train data distribution

At each level (global, regional, country), we applied a diverse set of machine learning algorithms to capture different aspects of the data and provide a comprehensive comparison. The chosen models represent a spectrum of approaches in machine learning:

  1. Logistic Regression: A linear model serving as a baseline and representing traditional statistical approaches [17].

  2. K-Nearest Neighbours (KNN): A non-parametric method that can capture local patterns in the data [18].

  3. Random Forest: An ensemble tree-based method known for handling non-linear relationships and interactions [19].

  4. XGBoost: A gradient boosting algorithm that often achieves state-of-the-art performance in structured data problems [20].

  5. Fully Connected Neural Network (FCNN): A deep learning approach capable of learning complex patterns and representations [21].

This selection allows us to compare linear (Logistic Regression) vs. non-linear (all others) models, tree-based ensemble methods (Random Forest, XGBoost) vs. other approaches, and traditional machine learning (first four) vs. deep learning (FCNN) techniques. By including this diverse set, we aim to comprehensively evaluate different modeling paradigms and identify which approaches are most effective for blood pressure prediction across various geographical scales.

Model performance was evaluated using the testing subset, with metrics including accuracy, precision, recall, and F1 score. This multi-metric approach provides a holistic view of model performance, considering both the ability to correctly identify positive cases (precision and recall) and overall predictive accuracy.
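These four metrics can be computed directly with scikit-learn; the labels below are a toy example, not study data:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # 1 = high blood pressure, 0 = normal
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)    # (TP + TN) / all = 6/8 = 0.75
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3/4 = 0.75
rec = recall_score(y_true, y_pred)      # TP / (TP + FN) = 3/4 = 0.75
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall
```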

For the global models, the entire dataset (comprising all countries) was used for training and evaluation. At the regional level, models were developed for each of the six WHO regions, allowing for a more nuanced understanding of factors influencing blood pressure status in different geographical areas. Country-specific models were trained using data from each individual country to capture unique aspects of blood pressure patterns, maximizing reliability and accuracy of results.

Class imbalance was addressed by computing class weights inversely proportional to class frequencies, which were then incorporated into model training. This ensured that the models were not biased towards the majority class and could effectively learn from minority classes.
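Such inverse-frequency weights can be computed with scikit-learn's compute_class_weight; the class distribution below is illustrative:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Imbalanced toy labels: 80% "normal" (0), 20% "high" (1)
y = np.array([0] * 80 + [1] * 20)
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)

# "balanced" weights are n_samples / (n_classes * class_count):
# class 0 -> 100 / (2 * 80) = 0.625, class 1 -> 100 / (2 * 20) = 2.5
class_weight = dict(zip([0, 1], weights))
```

The resulting dictionary can be passed to estimators that accept a `class_weight` argument (e.g. logistic regression or random forest), so that errors on the minority class are penalized more heavily during training.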

By employing this comprehensive, multi-level approach to model design and evaluation, the study aimed to provide accurate and reliable predictions of blood pressure status at global, regional, and country-specific levels.

Model explainability

In this section, the interpretability of predictor variables in a random forest global model is investigated by assessing the feature importance of each respective variable. While machine learning literature often refers to this as “feature importance”, it’s crucial to distinguish this concept from the epidemiological notion of “risk factors associated with raised BP”. Feature importance in a random forest model quantifies the statistical contribution of each variable to the model’s predictive accuracy. A higher value indicates that the feature plays a more significant role in the model’s predictions.

While feature importance can help identify variables that contribute substantially to the model’s predictive power, it’s important to note that high feature importance does not necessarily equate to clinical significance or causality in the context of blood pressure risk factors. Feature importance provides insights into the model’s decision-making process but should be interpreted alongside clinical knowledge and epidemiological evidence.

In the present study, feature importance is computed through two complementary techniques: mean decrease in impurity (MDI) and feature permutation. MDI measures the total decrease in node impurity averaged over all trees of the forest, while feature permutation assesses the decrease in model performance when a feature’s values are randomly shuffled.
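Both techniques are available for a random forest in scikit-learn; the sketch below uses synthetic data, so the importance values themselves are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# MDI: impurity-based importances, computed on the training data
mdi = rf.feature_importances_

# Permutation importance: drop in test-set score when each feature
# is shuffled, averaged over n_repeats shuffles
perm = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
```

Computing permutation importance on held-out data, as here, avoids the known bias of MDI toward high-cardinality features.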

Results

Study sample characteristics

The study dataset included 184,674 participants, with a mean age of 40.06 years and an average of 7.6 years of education. The average participant lived in a household with 3.01 members, and participants had a mean annual income of 1,727.08 USD.

On average, participants started smoking at 18.65 years, consumed 7.62 tobacco products per day, and quit smoking at 29.87 years. They consumed 4.76 alcoholic drinks and had 10.91 servings of fruits and vegetables daily. Vigorous exercise was reported for 4.66 days and moderate exercise for 5.64 days per week. Participants spent 60.23 minutes walking or bicycling and 206.03 minutes sedentary daily.

The average height was 162.12 cm, with an average weight of 66.62 kg, waist circumference of 84.89 cm, and hip circumference of 95.89 cm. Mean fasting blood glucose was 39.67 mg/dL, total cholesterol 76.42 mg/dL, urinary sodium 121.13 mmol/L, and urinary creatinine 55.04 mg/dL. Triglycerides averaged 84.16 mg/dL, HDL cholesterol 17.67 mg/dL, systolic blood pressure 126.91 mmHg, diastolic blood pressure 80.27 mmHg, and resting heart rate 77.48 bpm.

The dataset was divided into training (n=147,739) and test (n=36,935) datasets for model development and validation, with similar characteristics in both sets, ensuring adequate representation (see Table 4).

Table 4 Study sample characteristics between the train and test dataset for the numeric variables

In the study, 53.55% of participants were female, 36.40% had completed elementary school, 58.84% were married, and 45.12% were employed. In addition, 80.00% did not smoke and 54.77% did not consume alcohol; 29.11% reported normal salt consumption, and 29.76% reported vigorous-intensity work.

Blood pressure measurements were reported by 53.06% of the total population, while 80.60% did not respond about taking drugs for raised blood pressure. Blood sugar measurements were reported by 29.30% of the population, with 89.76% not responding about taking diabetes drugs. Cholesterol measurements were reported by 10.93% of the population, and only 0.07% reported taking oral cholesterol treatment. These characteristics are summarized in Table 5.

Table 5 Baseline characteristics between the train and test dataset for the categorical variables

Model performance

Performance globally

The global models were trained and evaluated on the entire dataset, encompassing all 57 countries from the six WHO regions. Table 6 presents the performance metrics for each model, including accuracy, F1 score, precision, and recall.

Table 6 Global model results

Among the five models, XGBoost achieved the highest accuracy of 68.52%, followed closely by the Fully Connected Neural Network (FCNN) with an accuracy of 68.25% and Logistic Regression with 68.20%. The Random Forest model performed comparably to Logistic Regression, with an accuracy of 68.14%. The K-Nearest neighbours (KNN) model had the lowest accuracy at 63.21%.

The F1 score, which is the harmonic mean of precision and recall, followed a similar trend to accuracy. XGBoost had the highest F1 score of 67.43%, while FCNN and Logistic Regression had scores of 67.66% and 67.19%, respectively. The Random Forest model had an F1 score of 66.67%, and KNN had the lowest score at 62.32%.

Precision, which measures the proportion of true positive predictions among all positive predictions, was highest for XGBoost at 67.85%, followed by FCNN at 67.64% and Logistic Regression at 67.51%. Random Forest had a precision of 67.56%, and KNN had the lowest precision at 62.39%.

Recall, which measures the proportion of true positive predictions among all actual positive instances, was highest for FCNN at 67.69%, followed by XGBoost at 67.27% and Logistic Regression at 67.06%. Random Forest had a recall of 66.51%, and KNN had the lowest recall at 62.28%.

The global model results demonstrate that XGBoost and FCNN consistently outperformed the other models across all performance metrics. Logistic Regression and Random Forest also showed competitive performance, while KNN had the lowest scores in all metrics.

Performance per region

The regional models were trained and evaluated on subsets of the dataset, each corresponding to one of the six WHO regions. Table 7 presents the performance metrics for each model in each region, including accuracy, precision, recall, and F1 score.

Sub-Saharan Africa

In the Sub-Saharan Africa region, the Fully Connected Neural Network (FCNN) achieved the highest accuracy at 64.96%, followed by Logistic Regression at 64.89% and XGBoost at 64.44%. The Random Forest model had an accuracy of 64.39%, while the K-Nearest neighbours (KNN) model had the lowest accuracy at 59.99%.

East Asia and Pacific

For the East Asia and Pacific region, FCNN outperformed the other models with an accuracy of 71.16%, followed by Random Forest at 71.02% and Logistic Regression at 70.09%. XGBoost had an accuracy of 70.04%, and KNN had the lowest accuracy at 64.66%.

South Asia

In South Asia, FCNN achieved the highest accuracy at 68.70%, followed by Logistic Regression at 68.38% and XGBoost at 68.17%. Random Forest had an accuracy of 68.13%, and KNN had the lowest accuracy at 64.32%.

Middle East and North Africa

For the Middle East and North Africa region, FCNN had the highest accuracy at 68.59%, followed by Logistic Regression and Random Forest, both at 68.23%. XGBoost had an accuracy of 65.97%, and KNN had the lowest accuracy at 62.77%.

Europe and Central Asia

In Europe and Central Asia, Logistic Regression achieved the highest accuracy at 77.75%, followed by Random Forest at 77.53% and FCNN at 77.19%. XGBoost had an accuracy of 76.77%, and KNN had the lowest accuracy at 72.87%.

Latin America and Caribbean

For the Latin America and Caribbean region, Random Forest outperformed the other models with an accuracy of 70.29%, followed by Logistic Regression at 69.55% and FCNN at 69.39%. XGBoost had an accuracy of 66.90%, and KNN had the lowest accuracy at 64.08%.

The regional model results demonstrate that model performance varies across regions, with FCNN and Logistic Regression generally performing well in most regions. Random Forest also showed strong performance in some regions, particularly East Asia and Pacific and Latin America and the Caribbean. XGBoost generally trailed the top performers at the regional level, and KNN consistently had the lowest accuracy across all regions.

Table 7 Region model results

Performance per country

The country-specific models were trained and evaluated on subsets of the dataset corresponding to each of the 57 countries included in the study. Table 8 presents the performance metrics for each model in each country, including accuracy, precision, recall, and F1 score.

Table 8 Country model results

The performance of the models varied considerably across countries, with some models consistently outperforming others in certain countries, while the reverse was true in other countries. For example, in Ethiopia, the Logistic Regression model achieved the highest accuracy at 62.55%, precision at 62.58%, recall at 62.31%, and F1 score at 62.22%. In contrast, the KNN model had the lowest scores across all metrics in Ethiopia, with an accuracy of 58.96%, precision of 58.88%, recall of 58.83%, and F1 score of 58.82%.

In some countries, such as Georgia and Belarus, the FCNN model performed well, with accuracies of 72.51% and 81.41%, respectively. However, in other countries like the Bahamas and Barbados, the Random Forest model achieved the highest accuracies at 71.49% and 73.21%, respectively.

XGBoost demonstrated strong performance in certain countries, such as Tokelau, where it achieved the highest accuracy (77.67%), precision (77.52%), recall (76.46%), and F1 score (76.80%) among all models. However, its performance was less impressive in other countries, like Chad and Grenada, where it had lower scores compared to other models.

The Logistic Regression model showed consistent performance across many countries, often ranking among the top models in terms of accuracy, precision, recall, and F1 score. For instance, in Mongolia, Logistic Regression achieved an accuracy of 72.98%, precision of 73.30%, recall of 72.92%, and F1 score of 72.86%.

The KNN model generally had lower scores compared to the other models across most countries. However, there were a few exceptions, such as in Grenada, where KNN achieved the highest accuracy at 66.85%, precision at 64.49%, recall at 61.81%, and F1 score at 61.94%.

Model explainability

Mean decrease in impurity (MDI)

Mean decrease in impurity (MDI) serves as a method for evaluating feature importance in decision-tree-based models. This approach quantifies the average reduction in impurity (such as entropy or the Gini index) resulting from the use of a specific feature to partition the dataset. A greater decrease in impurity corresponds to a higher degree of importance for the feature; in essence, the feature that induces the largest reduction in impurity is deemed the most significant in the dataset. As illustrated in Fig. 2, the top five features include age, weight, hip circumference, waist circumference, and sex (male).

Fig. 2

Using mean decrease in impurity on the global model

Feature permutation importance

Feature permutation importance evaluates the impact on model performance when a specific feature is randomly altered by introducing noise. The importance of that particular feature for the model’s predictions can be estimated by contrasting the performance of the model utilizing the permuted or modified feature with the performance of the model employing the original feature. A greater change in performance implies increased importance for the feature. This method has been implemented for a random forest classifier in the global model.

For each feature within the dataset, the values were randomly permuted and predictions were generated using the trained model. Age, heart rate in beats per minute, weight, waist circumference, and hip circumference emerged as the top five features contributing to the change in model error after permutation relative to the original model. The results for the feature permutation importance measure are depicted in Fig. 3.

Fig. 3

Using feature permutation importance on the global model

Discussion

This study applied several machine learning models to predict blood pressure status using the WHO STEPS dataset, with a nationally representative sample of 184,674 participants from 57 low- and middle-income countries. The XGBoost and FCNN models performed slightly better than the logistic regression and random forest models across various metrics, while KNN consistently underperformed. Notably, model performance varied significantly across regions and countries, highlighting the need for context-specific approaches.

Our feature importance analysis identified age, weight, heart rate, waist circumference, and height as the most important blood pressure predictors, aligning with previous research findings [22,23,24]. The model’s explainability is crucial for facilitating its adoption and trust among healthcare professionals.

Our findings have several implications for policymakers, clinical care, and researchers focused on NCD prevention and control in resource-limited settings. Predictive machine learning models calibrated using health information systems can identify high-risk populations and prioritise sub-national regions for in-person hypertension screening and interventions. Similarly, clinicians can use these models to screen their patient database and identify those with a higher risk of raised blood pressure for intensive monitoring or earlier intervention.

The variability in model performance across countries reinforces the importance of developing and validating country-specific models for increased accuracy and more tailored interventions. While this study does not introduce new ML algorithms, it highlights the potential of applying existing techniques to large-scale datasets in LMICs to advance public health objectives.

The strengths of this study include the use of a large, nationally representative dataset [25, 26], evaluation of multiple machine learning models, and validation using a separate testing dataset. While this study provides valuable insights, it also has limitations. The models were applied to retrospective data and not tested prospectively in clinical settings or population settings. Future studies should validate these findings in prospective, real-world scenarios. Additionally, the study focused solely on predicting blood pressure status and did not extend to other NCD risk factors or blood pressure control over time.

Despite these limitations, this study contributes to the literature on using machine learning for chronic disease management. Future research should focus on validation of these models in clinical settings, developing country-specific models to improve prediction accuracy, expanding the target to other NCD risk factors and long-term blood pressure control, and more broadly, the integration of ML-based prediction tools into health systems in LMICs.

Conclusion

This study demonstrates the potential of applying machine learning techniques to large-scale health datasets for predicting blood pressure status in LMICs. The variability in model performance across countries underscores the need for context-specific approaches in addressing hypertension. Policymakers and healthcare providers in LMICs could potentially use these models as tools for population-level risk stratification and resource allocation, complementing rather than replacing direct blood pressure measurements.

By addressing the identified limitations and expanding the geographical coverage to include more diverse populations, researchers can develop more comprehensive and reliable models for predicting blood pressure status. The integration of such models into clinical practice, coupled with further validation and refinement, has the potential to substantially improve the management of hypertension and other non-communicable diseases in resource-limited settings.

In conclusion, this study lays the foundation for future research on the use of machine learning in the context of global health and non-communicable disease management. The explainable machine learning model developed herein serves as a valuable tool for supporting clinical decision-making and improving blood pressure control in low- and middle-income countries. With continued efforts to address the limitations and expand upon this work, the application of machine learning in healthcare can contribute significantly to the achievement of better health outcomes for populations worldwide.

Availability of data and materials

Data and code that support the findings of this study are available on Github: https://github.com/SiliconBlast/bpc-prediction-lmics.

References

  1. World Health Organization, et al. Noncommunicable diseases progress monitor. 2022. https://iris.who.int/bitstream/handle/10665/353048/9789240047761-eng.pdf.

  2. Centers for Disease Control and Prevention. About Global NCDs. 2021. https://www.cdc.gov/globalhealth/healthprotection/ncd/global-ncd-overview.html#one. Accessed 6 July 2024.

  3. UNICEF. Non-communicable diseases. 2023. https://www.unicef.org/health/non-communicable-diseases. Accessed 6 July 2024.

  4. Aikaeli F, Njim T, Gissing S, Moyo F, Alam U, Mfinanga SG, et al. Prevalence of microvascular and macrovascular complications of diabetes in newly diagnosed type 2 diabetes in low-and-middle-income countries: A systematic review and meta-analysis. PLOS Global Public Health. 2022;2(6):e0000599.

  5. Gheorghe A, Griffiths U, Murphy A, Legido-Quigley H, Lamptey P, Perel P. The economic burden of cardiovascular disease and hypertension in low-and middle-income countries: a systematic review. BMC Public Health. 2018;18(1):1–11.

  6. World Health Organization. Noncommunicable Disease Surveillance, Monitoring and Reporting. 2023. https://www.who.int/teams/noncommunicable-diseases/surveillance/systems-tools/steps. Accessed 6 July 2024.

  7. Wamai RG, Kengne AP, Levitt N. Non-communicable diseases surveillance: overview of magnitude and determinants in Kenya from STEPwise approach survey of 2015. BMC Public Health. 2018;18(3):1–8.

  8. World Health Organization, et al. The WHO STEPwise approach to chronic disease risk factor surveillance. Geneva: World Health Organization; 2005.

  9. Riley L, Guthold R, Cowan M, Savin S, Bhatti L, Armstrong T, et al. The World Health Organization STEPwise approach to noncommunicable disease risk-factor surveillance: methods, challenges, and opportunities. Am J Public Health. 2016;106(1):74–8.

  10. Bonita R, Winkelmann R, Douglas KA, de Courten M. The WHO stepwise approach to surveillance (Steps) of non-communicable disease risk factors. In: McQueen DV, Puska P, editors. Global behavioral risk factor surveillance. Boston: Springer; 2003. pp. 9–22. https://doi.org/10.1007/978-1-4615-0071-1_3.

  11. Kalaivani B, Ranichitra A. Unveiling the impact of outliers: an improved feature engineering technique for heart disease prediction. In: International Conference on IoT based control networks and intelligent systems. Singapore: Springer Nature Singapore; 2023. pp. 469–78.

  12. Aggarwal V, Gupta V, Singh P, Sharma K, Sharma N. Detection of spatial outlier by using improved Z-score test. In: 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI). Tirunelveli: IEEE; 2019. pp. 788–90. https://doi.org/10.1109/ICOEI.2019.8862582.

  13. Centers for Disease Control and Prevention. High blood pressure symptoms and causes. 2021. https://www.cdc.gov/bloodpressure/about.htm. Accessed 6 July 2024.

  14. American Heart Association. Understanding blood pressure readings. 2022. https://www.heart.org/en/health-topics/high-blood-pressure/understanding-blood-pressure-readings. Accessed 6 July 2024.

  15. Rousseeuw PJ, Hampel FR, Ronchetti EM, Stahel WA. Robust statistics: the approach based on influence functions. New York: Wiley; 1986.

  16. Huber PJ. Robust estimation of a location parameter. In: Kotz S, Johnson NL, editors. Breakthroughs in statistics. New York: Springer Series in Statistics; 1992. pp. 492–518. https://doi.org/10.1007/978-1-4612-4380-9_35.

  17. Hosmer Jr DW, Lemeshow S, Sturdivant RX. Applied Logistic Regression. Wiley; 2013.

  18. Altman NS. An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. Am Stat. 1992;46(3):175–85.

  19. Breiman L. Random Forests. Mach Learn. 2001;45(1):5–32.

  20. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). New York: Association for Computing Machinery; 2016. pp. 785–94. https://doi.org/10.1145/2939672.2939785.

  21. Goodfellow I, Bengio Y, Courville A. Deep learning. MIT Press; 2016. http://www.deeplearningbook.org.

  22. Islam SMS, Talukder A, Awal MA, Siddiqui MMU, Ahamad MM, Ahammed B, et al. Machine learning approaches for predicting hypertension and its associated factors using population-level data from three south asian countries. Front Cardiovasc Med. 2022;9:839379. https://doi.org/10.3389/fcvm.2022.839379.

  23. Martinez-Ríos E, Montesinos L, Alfaro-Ponce M, Pecchia L. A review of machine learning in hypertension detection and blood pressure estimation based on clinical and physiological data. Biomed Signal Process Control. 2021;68:102813.

  24. Wu X, Yuan X, Wang W, Liu K, Qin Y, Sun X, et al. Value of a machine learning approach for predicting clinical outcomes in young patients with hypertension. Hypertension. 2020;75(5):1271–8.

  25. Nasir N, Oswald P, Barneih F, Alshaltone O, AlShabi M, Bonny T, et al. Hypertension classification using machine learning part II. In: 2021 14th International Conference on Developments in eSystems Engineering (DeSE). Sharjah: IEEE; 2021. pp. 459–63. https://doi.org/10.1109/DeSE54285.2021.9719408.

  26. Bani-Salameh H, Alkhatib SM, Abdalla M, Al-Hami M, Banat R, Zyod H, et al. Prediction of diabetes and hypertension using multi-layer perceptron neural networks. Int J Model Simul Sci Comput. 2021;12(02):2150012.

Acknowledgements

Not applicable.

Funding

Not applicable.

Author information

Contributions

E.B. and A.C. conceptualized the study. E.B. and G.O. acquired and processed the data. E.B. and A.C. designed and performed the data analyses with support from N.J., P.P., and E.B. E.B. and A.C., together with N.J., P.P., and E.B., drafted the manuscript. All authors contributed to the interpretation of the results and critically revised the manuscript for important intellectual content. A.C. supervised the study. All authors approved the final version of the manuscript for submission.

Authors’ information

Not applicable.

Corresponding authors

Correspondence to Ekaba Bisong or Adanna Chukwuma.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Bisong, E., Jibril, N., Premnath, P. et al. Predicting high blood pressure using machine learning models in low- and middle-income countries. BMC Med Inform Decis Mak 24, 234 (2024). https://doi.org/10.1186/s12911-024-02634-9

