A machine learning classifier approach for identifying the determinants of under-five child undernutrition in Ethiopian administrative zones



Undernutrition is the main cause of child death in developing countries. This paper aimed to explore the efficacy of machine learning (ML) approaches in predicting under-five undernutrition in Ethiopian administrative zones and to identify the most important predictors.


The study employed ML techniques using retrospective cross-sectional survey data from Ethiopia, nationally representative data collected in 2000, 2005, 2011, and 2016. We explored six commonly used ML algorithms: logistic regression, Least Absolute Shrinkage and Selection Operator (LASSO; L1-regularized logistic regression), ridge (L2 regularization), elastic net, neural network, and random forest (RF). Sensitivity, specificity, accuracy, and area under the curve were used to evaluate the performance of these models.


Based on the different performance measures, the RF algorithm was selected as the best ML model. In order of importance, urban–rural settlement, literacy rate of parents, and place of residence were the major determinants of disparities in the nutritional status of under-five children among Ethiopian administrative zones.


Our results showed that the considered machine learning classification algorithms can effectively predict the under-five undernutrition status in Ethiopian administrative zones. Persistent under-five undernutrition status was found in the northern part of Ethiopia. The identification of such high-risk zones could provide useful information to decision-makers trying to reduce child undernutrition.


Proper nutrition is crucial for a healthy life. Malnutrition, particularly undernutrition, is a global concern for the health and survival of children [1,2,3,4,5]. Almost half of the deaths of children in developing countries are directly or indirectly linked to malnutrition [3, 6]. Malnourished children are more vulnerable to different illnesses than their well-nourished counterparts [1,2,3,4,5,6]. A considerable number of studies have investigated malnutrition among under-five children and its associated risk factors. These studies employed classical models such as generalized linear (mixed) models [4, 5, 7,8,9,10]. Their findings showed, among other things, that the nutritional status of children in this age group has gradually improved over the last two decades in Ethiopia. In particular, the prevalence of underweight among under-five children in Ethiopia was 47.1% in 2000, 38.5% in 2005, 28.8% in 2011, 23.3% in 2016, and 20.56% in 2019, while the prevalence of stunting was 51.22% in 2000, 46.5% in 2005, 44.3% in 2011, 38.3% in 2016, and 36.9% in 2019. Similarly, 10.7% of under-five children were wasted in 2000, 10.5% in 2005, 9.9% in 2011, 10.1% in 2016, and 7% in 2019. The prevalence of having at least one undernutrition indicator, measured by the composite index for anthropometric failure (CIAF), was 61.38% in 2000, 56.58% in 2005, 51.58% in 2011, 46.49% in 2016, and 42.4% in 2019. The CIAF groups the different forms of anthropometric failure as follows: B, wasting only; C, wasting and underweight; D, wasting, stunting, and underweight; E, stunting and underweight; F, stunting only; and Y, underweight only. The CIAF is calculated by aggregating these six (B–Y) categories [11,12,13,14,15]. 
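The grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the boolean inputs are hypothetical, the label "A" for children with no failure is assumed (it is not listed in the text), and the binary indicator follows the "at least one indicator" definition used for the prevalence figures.

```python
# Sketch only: names and inputs are hypothetical, not the study's code.
CIAF_GROUPS = {
    # (stunted, wasted, underweight): group letter
    (False, False, False): "A",  # no failure (assumed label, not in the text)
    (False, True, False): "B",   # wasting only
    (False, True, True): "C",    # wasting and underweight
    (True, True, True): "D",     # wasting, stunting and underweight
    (True, False, True): "E",    # stunting and underweight
    (True, False, False): "F",   # stunting only
    (False, False, True): "Y",   # underweight only
}


def ciaf_group(stunted, wasted, underweight):
    """Return the CIAF group letter, or None for a combination not listed."""
    return CIAF_GROUPS.get((bool(stunted), bool(wasted), bool(underweight)))


def ciaf_binary(stunted, wasted, underweight):
    """1 if the child has at least one anthropometric failure, else 0."""
    return int(bool(stunted) or bool(wasted) or bool(underweight))


def ciaf_prevalence(children):
    """Percentage of children with at least one anthropometric failure."""
    flags = [ciaf_binary(*child) for child in children]
    return 100.0 * sum(flags) / len(flags)
```

For the six listed groups, the binary indicator equals 1 exactly when the child falls in one of the groups B to Y, so summing it over children reproduces the aggregated CIAF count.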
Most such studies conducted in Ethiopia described the effects of socio-economic and demographic covariates associated with under-five undernutrition using classical regression models [4, 5, 7, 8]. These traditional models are widely used for causal inference and typically rely on a relatively small number of pre-selected covariates [16, 17]. Correlation between covariates (multicollinearity) and a large number of factors are common analytical challenges in traditional modeling [18,19,20,21]. Compared with these classical models, machine learning (ML) methods can use a larger number of predictors, require fewer assumptions, incorporate “multi-dimensional correlations”, and allow more flexible relationships between the predictor variables and the outcome variable [16,17,18, 20,21,22]. In addition, ML models built for prediction have shown superiority over classical approaches in handling classification problems [16,17,18, 21, 23]. In the present paper, we focus on predicting CIAF in Ethiopia with ML methods, drawing on nationally representative data. Machine learning employs methods developed within statistics, computer science, mathematics, and artificial intelligence that allow the construction of algorithms that can learn from data and make predictions [24,25,26,27,28,29]. As such, it is applicable in different disciplines, for example in medical sciences for diagnosis and outcome prediction [23, 30,31,32,33,34,35,36,37,38,39,40,41,42,43,44], disease modeling [33], disease prediction [34,35,36,37], and child mortality [23, 38], and it is also used in industrial applications [39,40,41]. Only a few studies have investigated the use of ML to build prediction models of childhood malnutrition [42,43,44]. Moreover, this study is conducted at the level of administrative zones in Ethiopia. 
This is because the zonal health departments in the country have the mandate to plan, follow up, monitor, and evaluate the health activities of Woreda health offices, and the different Woredas in the same Zone are relatively similar in many respects. Moreover, the administrative Zones are mainly ethnic-based, and an assessment at the Zone level reflects cultural practices regarding staple food and the geographic environment of the community in the Zones [45,46,47,48]. Hence, detecting undernutrition and its variation among administrative Zones provides deeper insight into health priorities, which helps policymakers to design focused intervention strategies. The main objective of this study was, therefore, to evaluate ML algorithms for predicting childhood CIAF and to identify the important covariates that underlie its spatial variation among the 72 Ethiopian administrative zones.

Materials and methods

This study addressed disparities in malnutrition in Ethiopia. With a surface area of 1.1 million km², the country shares borders with Eritrea in the north, Djibouti and Somalia in the east, Sudan and South Sudan in the west, and Kenya in the south. It is divided into 11 administrative units (regions), including Addis Ababa, the capital city. The regions are further divided into 72 second-level administrative units called zones [49] (Fig. 1).

Fig. 1 Map of Ethiopia with regions and zones: (A) the 11 regions of the country and (B) the 72 administrative zones

Data sources and analysis tools

We conducted the analysis using the four Ethiopian Demographic and Health Survey (EDHS) datasets (2000, 2005, 2011, and 2016), a nationally representative household survey developed by the United States Agency for International Development (USAID) in the 1980s [50]. The outcome variable that we aimed to predict is the undernutrition status of under-five children, measured by the composite index for anthropometric failure (CIAF) and coded as a binary response: nourished (0) or undernourished (1). The covariates (features) were selected from the literature [4, 5, 7,8,9,10]. All categorical features were converted to numerical dummy variables by mapping each unique value to a number [4, 5, 7,8,9,10]. Zone boundaries (shapefiles) were used to define the second-level administrative zones and were merged with the survey dataset for analysis [51].


Model building

ML models have shown superiority in handling classification problems when compared with traditional models (such as generalized linear mixed models). Raw data are usually not in the form required for optimal performance of ML algorithms. The algorithms operate only on numerical values, so categorical variables must be transformed into numerical values. Hence, preprocessing is one of the most important steps in applying ML models [21, 23, 52,53,54]. The categorical features of the dataset were encoded as numerical values, and the continuous data were normalized. For the ML approaches, the dataset was randomly split into a training dataset, used to train the model, and a test dataset, on which the response variable is predicted and compared with the actual outcomes; a validation dataset was used to tune the parameters incorporated in the training models [24,25,26,27,28,29]. The influence of different training and testing ratios on the performance of the ML models was checked by dividing the data with two train/test ratios, 80/20 and 70/30. Popular statistical indicators were employed to evaluate the predictive capability of the models under each ratio. The results revealed that the 70/30 split was more advantageous for undernutrition classification than the 80/20 split. A variety of supervised ML algorithms, including Logistic Regression (LR) [55], Ridge regression [56], Least Absolute Shrinkage and Selection Operator (LASSO) regression [57], Elastic Net [27, 58], Artificial Neural Network (ANN) [59, 60], and Random Forest (RF) [27, 61], were included in the analysis.
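The preprocessing and splitting steps described above can be illustrated with a minimal Python sketch. It is not the study's code (the analysis was run in R). The helper names are hypothetical, one-hot coding is assumed as the dummy-variable scheme, and min-max scaling is assumed as the normalization, since the text does not specify either.

```python
import random


def one_hot(values):
    """Encode a categorical column as 0/1 dummy columns (assumed scheme)."""
    levels = sorted(set(values))
    encoded = [[1 if v == level else 0 for level in levels] for v in values]
    return encoded, levels


def min_max(values):
    """Rescale a continuous column to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def train_test_split(rows, test_fraction, seed=0):
    """Randomly split rows into (train, test) with the given test fraction."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


# The two ratios compared in the text: 70/30 and 80/20.
rows = list(range(100))
train_70, test_30 = train_test_split(rows, 0.30)
train_80, test_20 = train_test_split(rows, 0.20)
```

Fixing the seed makes the split reproducible, so both ratios can be compared on the same shuffled order.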

Ridge, lasso, and elastic net are very similar to LR, except that an additional penalty term (regularization) is used when estimating the regression coefficients [26, 27] to reduce over-fitting and the adverse effects of multicollinearity [26,27,28, 62]. The advantage of ridge, lasso, and elastic net modeling over the classical statistical methods is that, in addition to fitting optimized models, a penalty is applied to the predictors, causing the coefficients of covariates with little impact on the outcome variable to be shrunk or dropped from the final model. This reduces the model's complexity while increasing its generalizability.

Logistic regression (LR)

LR is a widely applied statistical model for classification problems. The model uses maximum likelihood estimation to estimate the parameters of interest. Let \(y_{i}\) be the response variable for the ith child. It is Bernoulli distributed, taking the value 1 with probability \(\pi_{i} = P(y_{i} = 1|{\varvec{x}}_{i} )\), where \({\varvec{x}}_{i} = \left( {x_{i1} , . . . , x_{ip} } \right)^{T}\) is the ith child’s covariate vector, and the value 0 with probability 1 − \(\pi_{i}\). The logistic regression model with the logit link function is then given by:

$$\pi_{i} = \frac{{{\text{exp}}\left( {\beta_{0} + {\varvec{x}}_{i}^{T} {\varvec{\beta}}} \right)}}{{1 + {\text{exp}}\left( {\beta_{0} + {\varvec{x}}_{i}^{T} {\varvec{\beta}}} \right)}} \tag{1}$$

where \(\beta_{0}\) is the intercept term and \({\varvec{\beta}} = \left( {\beta_{1} , . . . , \beta_{p} } \right)^{T}\) is a p × 1 vector of regression parameters on the logit scale. With \({\varvec{\theta}} = \left( {\beta_{0} ,{\varvec{\beta}}} \right)^{T}\), the corresponding log-likelihood function is given by the following equation, as shown in [55]:

$$l_{{\varvec{\theta}}} = \mathop \sum \limits_{i = 1}^{n} \left[ {y_{i} \log \left( {\pi_{i} } \right) + \left( {1 - y_{i} } \right)\log \left( {1 - \pi_{i} } \right)} \right] \tag{2}$$

Substituting \(\pi_{i}\) from Eq. (1) into Eq. (2), we have:

$$l_{{\varvec{\theta}}} = \mathop \sum \limits_{i = 1}^{n} \left[ {y_{i} \left( {\beta_{0} + {\varvec{x}}_{i}^{T} {\varvec{\beta}}} \right) - \log \left( {1 + {\text{exp}}\left( {\beta_{0} + {\varvec{x}}_{i}^{T} {\varvec{\beta}}} \right)} \right)} \right] \tag{3}$$

In the maximum likelihood method, the goal is to find the \({\varvec{\theta}}\) that maximizes Eq. (3). When the number of features (dimensionality) is large, traditional LR has several problems: over-fitting, multicollinearity, and computational difficulties. To address these problems, we used regularization, a GLM approach that imposes a penalty on the parameters to shrink them towards zero [27, 55,56,57,58, 63].
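As a numerical check of the derivation, the following Python sketch evaluates the log-likelihood in both forms, first with the probabilities \(\pi_{i}\) and then in the substituted form, and confirms that they agree. The function names and the toy data are illustrative assumptions, not part of the study.

```python
import math


def linear_predictor(beta0, beta, x):
    """beta0 + x^T beta for one child."""
    return beta0 + sum(b * v for b, v in zip(beta, x))


def predicted_probability(beta0, beta, x):
    """pi_i = exp(eta) / (1 + exp(eta)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(beta0, beta, x)))


def log_likelihood_from_probabilities(beta0, beta, X, y):
    """Sum of y*log(pi) + (1 - y)*log(1 - pi)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = predicted_probability(beta0, beta, x_i)
        total += y_i * math.log(p) + (1 - y_i) * math.log(1 - p)
    return total


def log_likelihood_substituted(beta0, beta, X, y):
    """Sum of y*eta - log(1 + exp(eta)) after substituting pi."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = linear_predictor(beta0, beta, x_i)
        total += y_i * eta - math.log(1.0 + math.exp(eta))
    return total


# Toy data: four children, two covariates (illustrative values only).
X = [[0.2, 1.0], [1.5, 0.0], [0.7, 1.0], [2.1, 0.0]]
y = [1, 0, 1, 0]
a = log_likelihood_from_probabilities(-0.3, [0.8, -0.5], X, y)
b = log_likelihood_substituted(-0.3, [0.8, -0.5], X, y)
```

The two values `a` and `b` agree up to floating-point error, which is what the substitution step asserts.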

Ridge regression (L2 regularization, which shrinks the coefficients of correlated covariates towards each other) is obtained by maximizing the log-likelihood with a penalty parameter \(\lambda\) applied to all the parameters except the constant (intercept) [55, 56]. The penalized likelihood for ridge regression is given by Eq. (4):

$$l_{\lambda }^{{\text{R}}} \left( {\varvec{\beta}} \right) = \mathop \sum \limits_{i = 1}^{n} \left[ {y_{i} \left( {{\varvec{x}}_{i}^{T} {\varvec{\beta}}} \right) - \log \left( {1 + {\text{exp}}\left( {{\varvec{x}}_{i}^{T} {\varvec{\beta}}} \right)} \right)} \right] - \lambda \mathop \sum \limits_{j = 1}^{p} \beta_{j}^{2} \tag{4}$$

When λ is very large (λ → ∞), all the coefficients tend to zero, whereas when λ = 0, ridge regression is equivalent to the traditional (unpenalized) LR.

The LASSO regression uses the L1 penalty for variable selection and shrinkage. If \(\lambda\) is large enough, it forces some coefficients to be exactly zero, which yields a model with fewer predictors [57]. The penalized likelihood for lasso regression is given by Eq. (5):

$$l_{\lambda }^{{\text{L}}} \left( {\varvec{\beta}} \right) = \mathop \sum \limits_{i = 1}^{n} \left[ {y_{i} \left( {{\varvec{x}}_{i}^{T} {\varvec{\beta}}} \right) - \log \left( {1 + {\text{exp}}\left( {{\varvec{x}}_{i}^{T} {\varvec{\beta}}} \right)} \right)} \right] - \lambda \mathop \sum \limits_{j = 1}^{p} \left| {\beta_{j} } \right| \tag{5}$$

For a given value of \(\lambda\), the lasso model is fitted iteratively to find the optimum values of all coefficients. The optimal regularization parameter (\(\lambda\)) was determined using n-fold cross-validation. The larger the \(\lambda\) value, the stronger the effect of regularization on the number of covariates (features) in the model and on their respective coefficients [26,27,28]. Thus, variables with non-zero estimates are considered the important covariates for the outcome variable of interest.
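How an L1 penalty sets coefficients exactly to zero can be seen in the simplest one-coefficient case. The sketch below is an illustration only and is not the logistic model fitted in the study: it assumes the scalar objective \(\tfrac{1}{2}(z - \beta)^{2} + \lambda |\beta|\), whose minimizer is the soft-thresholding operator. The function names and the example coefficients are hypothetical.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5*(z - b)**2 + lam*abs(b) over b."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def n_selected(unpenalized, lam):
    """Number of coefficients that remain non-zero at penalty lam."""
    return sum(1 for z in unpenalized if soft_threshold(z, lam) != 0.0)


# Illustrative unpenalized coefficients for three covariates.
unpenalized = [2.0, -0.6, 0.3]
selected_counts = {lam: n_selected(unpenalized, lam) for lam in (0.0, 0.5, 1.0)}
```

As the penalty grows, fewer coefficients survive, which is the sense in which variables with non-zero estimates are treated as the important covariates.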

The elastic net regularization combines the penalties of Eqs. (4) and (5) [27, 58]. This method can effectively handle groups of correlated features and also shrink the coefficients of non-informative features to zero [27, 58, 63, 64]. The elastic net penalized likelihood, with mixing parameter \(\alpha\), is given by Eq. (6):

$$l_{\lambda ,\alpha }^{{{\text{El}}}} \left( {\varvec{\beta}} \right) = \mathop \sum \limits_{i = 1}^{n} \left[ {y_{i} \left( {{\varvec{x}}_{i}^{T} {\varvec{\beta}}} \right) - \log \left( {1 + {\text{exp}}\left( {{\varvec{x}}_{i}^{T} {\varvec{\beta}}} \right)} \right)} \right] - \lambda \left[ {\left( {1 - \alpha } \right)\mathop \sum \limits_{j = 1}^{p} \beta_{j}^{2} + \alpha \mathop \sum \limits_{j = 1}^{p} \left| {\beta_{j} } \right|} \right] \tag{6}$$

All the ML algorithms, including logistic regression, were fitted in the R statistical software using the packages glmnet, pROC, caret, randomForest, ggplot2, and ROCit [65,66,67,68,69]. In this paper, we trained the penalized generalized linear model (GLM) estimators with \(\alpha\) values from the set \(\left\{ {0,0.5,1} \right\}\), where \(\alpha\) = 0, 0.5, and 1 refer to the ridge, elastic net, and lasso penalties, respectively [27, 58, 63].
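The role of \(\alpha\) can be made concrete with a short Python sketch of a penalized log-likelihood. It follows the convention stated above (\(\alpha\) = 0 for ridge, \(\alpha\) = 1 for lasso) and assumes the penalty \(\lambda [(1 - \alpha )\sum \beta_{j}^{2} + \alpha \sum |\beta_{j} |]\). The glmnet package additionally halves the ridge term, so this is a sketch of the idea rather than a reproduction of its objective. All function names are hypothetical.

```python
import math


def log_likelihood(beta0, beta, X, y):
    """Unpenalized logistic log-likelihood."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = beta0 + sum(b * v for b, v in zip(beta, x_i))
        total += y_i * eta - math.log(1.0 + math.exp(eta))
    return total


def elastic_net_penalty(beta, lam, alpha):
    """lam * [(1 - alpha) * sum(b^2) + alpha * sum(|b|)]; intercept excluded."""
    ridge_term = sum(b * b for b in beta)
    lasso_term = sum(abs(b) for b in beta)
    return lam * ((1.0 - alpha) * ridge_term + alpha * lasso_term)


def penalized_log_likelihood(beta0, beta, X, y, lam, alpha):
    """Objective to maximize: log-likelihood minus the penalty."""
    return log_likelihood(beta0, beta, X, y) - elastic_net_penalty(beta, lam, alpha)


# The three settings used in the text.
beta = [1.0, -2.0]
penalties = {alpha: elastic_net_penalty(beta, 0.5, alpha) for alpha in (0.0, 0.5, 1.0)}
```

With `lam = 0` the objective reduces to the ordinary log-likelihood, whatever the value of `alpha`.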

The random forest (RF) is a popular supervised ML approach in applied statistics because of its applicability to both classification and regression [70,71,72]. It is also used for variable screening and dimension reduction. It is a “tree-based” technique in which several decision trees are constructed, each from a random subset of covariates and a random subset of the samples, and used to predict an outcome label. It builds multiple trees (the forest), and the final decision is based on the majority vote over all the trees in the forest [70,71,72,73].
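The voting step can be illustrated with a deliberately small Python sketch. The "trees" here are hypothetical one-split stumps with hand-picked thresholds, not trees grown from data, so the example shows only how individual predictions are combined by majority vote.

```python
from collections import Counter


def make_stump(feature_index, threshold):
    """A one-split 'tree': predicts 1 when the feature exceeds the threshold."""
    def predict(x):
        return 1 if x[feature_index] > threshold else 0
    return predict


def majority_vote(labels):
    """Most frequent label among the trees' predictions."""
    return Counter(labels).most_common(1)[0][0]


def forest_predict(forest, x):
    """Class label chosen by the majority of trees in the forest."""
    return majority_vote([tree(x) for tree in forest])


# Three hypothetical stumps; an odd number of trees avoids tied votes.
forest = [make_stump(0, 0.5), make_stump(1, 0.3), make_stump(0, 0.8)]
label = forest_predict(forest, [0.6, 0.4])
```

In a real random forest each tree is grown on a bootstrap sample with a random subset of covariates considered at each split, which is what makes the votes differ.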

The neural network (NN) is a type of ML algorithm made up of layers of nodes: an input layer [74], hidden layers, and an output layer. It is set up with several input neurons (X) that represent the information extracted from each feature in the dataset. Back-propagation is the training process in which prediction errors are fed back through the NN to modify the weights of each neural connection until the error is minimized [59, 60]. The output of each node is computed as:

$$y = {\text{activation}}\left( {\sum \left( {{\text{weight}} \times {\text{input}}} \right) + {\text{bias}}} \right)$$
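A single node's computation, a weighted sum of its inputs plus a bias passed through an activation function, can be sketched as follows. The sigmoid activation is an assumption made for illustration, since the text does not state which activation function was used.

```python
import math


def sigmoid(z):
    """Logistic activation (assumed here for illustration)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(weights, inputs, bias, activation=sigmoid):
    """activation(sum(weight * input) + bias) for one node."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum + bias)


# Two inputs whose weighted contributions cancel, with zero bias.
output = neuron_output([0.5, -0.5], [1.0, 1.0], 0.0)
```

A layer applies this computation once per node, and the outputs of one layer become the inputs of the next.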

Model evaluation

Model performance

The performance of the ML models was evaluated using several measures, including sensitivity, specificity, and accuracy [24,25,26,27,28,29, 75], calculated using the observed data as the gold standard. The relationship between sensitivity and specificity is expressed using receiver operating characteristic (ROC) curves (Fig. 2).
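The three measures follow directly from the confusion matrix. The sketch below uses their standard definitions; the function names and the toy labels are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) with 1 = undernourished, 0 = nourished."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Share of truly undernourished children predicted as undernourished."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of truly nourished children predicted as nourished."""
    return tn / (tn + fp)


def accuracy(tp, fp, tn, fn):
    """Share of all children classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


# Toy observed and predicted labels (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
```

Each point on a ROC curve is one (1 − specificity, sensitivity) pair obtained at a particular probability cut-off.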

Fig. 2 Overview flow chart of the methodologies used

All curves plotted above and to the left of the diagonal line perform better than chance. The area under each curve (AUC) gives an aggregate value that can be interpreted as the probability that the model ranks a randomly chosen undernourished child above a randomly chosen nourished child [25, 76]. The AUC of the ROC curve was averaged over the 10 cross-validation folds (ten repeats) [25]; this procedure partitions the original sample into ten disjoint subsets, uses nine of those subsets in the training process, and then makes predictions for the remaining subset. The identified best-fit model is then used to predict undernutrition in another dataset, known as the test dataset [24,25,26,27,28,29].
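Both ideas in this paragraph can be sketched in plain Python. The AUC is computed here through its rank interpretation (the share of positive and negative pairs in which the positive case receives the higher score, with ties counted as one half), and the fold construction shows how a sample is partitioned into ten disjoint subsets. The function names and the toy scores are assumptions for illustration; the study itself used R packages for these steps.

```python
import random


def auc(y_true, scores):
    """Probability that a random positive case outscores a random negative one."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def k_fold_indices(n, k=10, seed=0):
    """Partition indices 0..n-1 into k disjoint folds after shuffling."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def mean(values):
    """Average of per-fold scores, such as fold-wise AUC values."""
    return sum(values) / len(values)


example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
folds = k_fold_indices(25, k=10)
```

In each round, one fold serves as the held-out subset and the other nine are used for training; averaging the ten held-out AUC values gives the cross-validated estimate.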

Covariate selection and ranking

Covariate selection is very important for prediction and interpretation, especially for high-dimensional datasets. To assess the importance of predictors in the selected model, the study employed two measures: Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDG). The variables with the largest decrease in model accuracy and in Gini impurity are, respectively, the most predictive and the most important variables for successful classification [77] (Table 1).
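The quantity behind MDG can be illustrated for a single split. The sketch below computes the Gini impurity of a node with binary labels and the decrease achieved by splitting it into two child nodes. In a random forest, such decreases are accumulated for each variable over all the splits that use it and averaged over the trees. The function names and toy labels are hypothetical.

```python
def gini_impurity(labels):
    """Gini impurity 1 - p1^2 - p0^2 of a node with 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# A split that separates the classes perfectly, and one that does not help.
perfect = gini_decrease([1, 1, 0, 0], [1, 1], [0, 0])
useless = gini_decrease([1, 1, 0, 0], [1, 0], [1, 0])
```

MDA is obtained differently: the values of one covariate are permuted and the resulting drop in prediction accuracy is recorded, so a large drop marks a covariate the model relies on.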

Table 1 The description of the response variable and the respective covariates included in the model


This analysis included data from 29,333 children aged 0–59 months. Of these, 15,281 (52.09%) had at least one form of undernutrition (stunting, wasting, or underweight) as measured by the CIAF. We examined the prevalence of CIAF among U5C across different child and mother-household level covariates. CIAF was more prevalent among children of parents with no formal education than among those whose parents had secondary or post-secondary education. Most of the undernourished children were from rural areas. Undernutrition was also more prevalent among children from households with a lower wealth index, with mothers having no media exposure, and with unimproved toilets and sanitation, compared with their counterparts. Covariates that were significant in the Chi-square tests were used to develop the ML algorithms on the training dataset (Table 2).

Table 2 Sample characteristics (n = 29,333)

Figures in the supplementary documents show the effect of the log of the regularization parameter (\(\lambda\)) for the ridge, elastic net, and lasso regressions. The dotted vertical lines (at x = − 4.51, x = − 7.84, and x = − 8.71, respectively) mark the values of log (\(\lambda\)) at which prediction accuracy was maximized. The coefficients of the model features are shown for different values of log (\(\lambda\)), with the selected value minimizing the mean squared error (MSE) established during cross-validation. The graph shows that as the log (\(\lambda\)) value decreases, the number of variables included in the model (those with nonzero coefficients) increases (Additional file 1).

Performance comparisons

Accuracy and AUC were used to evaluate the efficiency of the ML algorithms. The comparison of the ML algorithms with the traditional LR is shown in Fig. 3 and Table 3. All the ML algorithms considered in this study performed better than the classical logistic regression model in predicting undernutrition status. More detail is given in Additional file 1.

Fig. 3 ROC curves for LR, L-1 regularization, L-2 regularization, elastic net regularization, ANN, and RF in predicting undernutrition among under-five children

Table 3 The performance of the prediction models based on different classifications on the independent tests for two ratios

The six models were compared under 70% training/30% validation and 80% training/20% validation splits using several statistical measures and the area under the receiver operating characteristic curve. Although the models had almost identical performance metrics under the two train-test split ratios, the 70/30 split was chosen as the most appropriate for undernutrition classification. Moreover, the prediction model based on RF performed best, with an AUC of up to 0.761, followed by LASSO (AUC = 0.717), while the prediction model using the traditional model (LR) was the least efficient (AUC = 0.653). Hence, the RF model was chosen as the classification engine to construct the prediction model for under-five undernutrition in Ethiopian administrative zones (Table 3).

In machine learning prediction, identifying important attributes is also crucial. Feature importance scores represent the contribution of each feature to the trees' decisions. The random forest (the best algorithm for childhood undernutrition in our study) gives the MDA and MDG measures of the relative importance of covariates in the model, which are summarized in Fig. 4. Urban–rural settlement (ur), the total number of the under-five population, BMI, literacy rates of parents, and zones were the most important predictors of CIAF, whereas household size, age of mother, parity, and autonomy were the least predictive variables in our model (Fig. 4).

Fig. 4 Relative variable importance from the best model (random forest)

The predicted and actual values of undernutrition among the 72 administrative zones are mapped in Fig. 5. Using the best predictive model (RF), which yielded the highest AUC, we further predicted the undernutrition status of under-five children by administrative zone. Both the crude and predicted undernutrition values were merged with the second-level administrative (zone) shapefiles. A visual comparison confirms that, while discrepancies exist for a few zones, the overall patterns of the observed prevalence are in line with those of the predicted prevalence of undernutrition. The degree of agreement between the actual and predicted values indicates that the two are strongly correlated. Moreover, the third map shows the difference between the crude and predicted CIAF of U5C: a positive difference in a zone indicates that the crude prevalence is less than the predicted value, and vice versa (Fig. 5).
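The zone-level comparison can be sketched as follows. The zone names and values are invented for illustration, the difference is taken as predicted minus crude so that a positive value means the crude prevalence is lower than the predicted one, and Pearson's coefficient is assumed as the measure of agreement, since the text does not name the statistic used.

```python
import math
from collections import defaultdict


def prevalence_by_zone(records):
    """records: (zone, 0/1 value) pairs; returns percentage of 1s per zone."""
    totals = defaultdict(lambda: [0, 0])
    for zone, value in records:
        totals[zone][0] += value
        totals[zone][1] += 1
    return {zone: 100.0 * s / n for zone, (s, n) in totals.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observed and predicted labels for children in three zones.
observed = [("Zone A", 1), ("Zone A", 0), ("Zone B", 1), ("Zone B", 1), ("Zone C", 0), ("Zone C", 0)]
predicted = [("Zone A", 1), ("Zone A", 1), ("Zone B", 1), ("Zone B", 1), ("Zone C", 0), ("Zone C", 1)]

crude = prevalence_by_zone(observed)
pred = prevalence_by_zone(predicted)
zones = sorted(crude)
difference = {z: pred[z] - crude[z] for z in zones}
agreement = pearson([crude[z] for z in zones], [pred[z] for z in zones])
```

The per-zone values in `crude`, `pred`, and `difference` are what would be joined to the zone shapefiles to draw the three maps.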

Fig. 5 Mapping the predicted and actual prevalence of undernutrition outcomes based on the test data


Previous studies on this subject reported that Ethiopia is one of the countries with the highest number of undernourished under-five children in the world [2, 4, 8, 78, 79]. Further, the studies indicated that, while the prevalence of under-five undernutrition in the nation has declined over time, more effort is needed to facilitate this decline and to contain its negative consequences. In this study, we described spatial disparities in under-five undernutrition and predicted under-five undernutrition among Ethiopian administrative zones. The spatial maps show evidence of considerable zonal disparities in under-five undernutrition rates, similar to what has been reported in different countries [80,81,82]. The continuous data in this study were normalized and the categorical variables were encoded. Machine learning models are regarded as advanced techniques for quick and accurate prediction in real-world problems. In this paper, we investigated the influence of the training/testing ratio on the performance of six popular ML models in predicting undernutrition of under-five children. The performance of the ML models changed only slightly between the two ratios. The results revealed that 70/30 was the most suitable ratio for training and validating the ML models. This finding is in line with previously published studies [18, 23, 30,31,32,33,34,35,36,37,38,39,40,41,42,43,44, 83,84,85,86]. ML tools can offer insight into the identification of novel factors associated with under-five undernutrition that can serve as targets for intervention. Among the six predictive models built using these techniques, the random forest (RF) model showed higher predictive power than the other ML models, including logistic regression. 
The RF model revealed that the urban–rural settlement ratio, the literacy level of parents, the under-five population, BMI of mothers, location (zones, place of residence), and rainfall distribution were the most important predictors of under-five undernutrition in Ethiopia. This finding is consistent with previous studies [4, 42, 79, 81]. Moreover, the selected ML algorithm revealed effects of the covariates consistent with those of classical generalized linear models, which show that the educational level of parents, the age of the child, sex of the child, birth order, dietary diversity, types of the birthplace of residence, women’s autonomy, household sanitation, and a clean water supply were the most significant variables for undernutrition [4, 6, 7, 10, 21, 79,80,81,82]. The child’s residence (zone) was one of the important risk factors for the U5C CIAF rate, which varied significantly across zones. Moreover, this paper explored the spatial variation in under-five child undernutrition and the predicted under-five undernutrition risk factors in Ethiopia using different machine learning approaches. Hence, we produced spatial maps of the crude prevalence and the predicted (RF) rate of under-five undernutrition by zone to document the zonal disparities in under-five undernutrition in the country.


Since many ML algorithms, such as RF, provide no regression coefficients or directional effects, their results are difficult to interpret [21, 23, 87]. In the current study, the ML models predict or classify, and rank variables by the importance of their contribution to determining under-five undernutrition; they do not support causal inference. More types of classification ML algorithms could also have been used [21, 23, 28, 38, 59].


The main objective of this study was to compare and evaluate the performance of different machine learning (ML) algorithms, considering the influence of two train-test split ratios, in classifying under-five undernutrition. Popular statistical indicators, such as accuracy and area under the curve, were employed to evaluate the predictive power of the ML models under the different testing and training ratios. The higher the accuracy of a model, the better its performance. Our results confirm that ML models can effectively predict under-five undernutrition status and hence may be useful as decision tools for the concerned bodies. The best model was the RF, with accuracy and AUC of 68.2% and 76.2%, respectively. The findings of this paper showed that considerable zonal disparities in under-five undernutrition status persist in the northern part of Ethiopia. When implementing health policies aimed at the reduction of child undernutrition in Ethiopian administrative zones, zone characteristics must be taken into account.

Availability of data and materials

The dataset used and analyzed during the current study is available from the corresponding author on reasonable request.



Abbreviations

AUC: Area under the curve
EDHS: Ethiopian Demographic and Health Survey
BMI: Body mass index
ML: Machine learning
RF: Random forest
ANN: Artificial neural network
ROC: Receiver operating characteristic
U5C: Under-five children
LR: Logistic regression


  1. Phalkey RK, et al. Systematic review of current efforts to quantify the impacts of climate change on undernutrition. Proc Natl Acad Sci. 2015;112(33):E4522–9.

  2. World Health Organization. The state of food security and nutrition in the world 2019: safeguarding against economic slowdowns and downturns, vol 2019. Food & Agriculture Org; 2019.

  3. El-Ghannam AR. The global problems of child malnutrition and mortality in different world regions. J Health Soc Policy. 2003;16(4):1–26.

  4. Fenta HM, et al. Determinants of stunting among under-five years children in Ethiopia from the 2016 Ethiopia demographic and Health Survey: application of ordinal logistic regression model using complex sampling designs. Clin Epidemiol Glob Health. 2020;8(2):404–13.

  5. Kassie GW, Workie DL. Determinants of under-nutrition among children under five years of age in Ethiopia. BMC Public Health. 2020;20:1–11.

  6. Pelletier DL, Frongillo EA. Changes in child survival are strongly associated with changes in malnutrition in developing countries. J Nutr. 2003;133(1):107–19.

  7. Degarege D, Degarege A, Animut A. Undernutrition and associated risk factors among school age children in Addis Ababa, Ethiopia. BMC Public Health. 2015;15(1):1–9.

  8. Takele K, Zewotir T, Ndanguza D. Understanding correlates of child stunting in Ethiopia using generalized linear mixed models. BMC Public Health. 2019;19(1):1–8.

  9. Suriyakala V et al. Factors affecting infant mortality rate in India: an analysis of Indian states. In: The international symposium on intelligent systems technologies and applications. Springer; 2016.

  10. Habyarimana F, Zewotir T, Ramroop S. A proportional odds model with complex sampling design to identify key determinants of malnutrition of children under five years in Rwanda. Mediterr J Soc Sci. 2014;5(23):1642–1642.

  11. Nandy S, Svedberg P. The composite index of anthropometric failure (CIAF): an alternative indicator for malnutrition in young children. In: Handbook of anthropometry. Springer, pp 127–137; 2012.

  12. Rasheed W, Jeyakumar A. Magnitude and severity of anthropometric failure among children under two years using Composite Index of Anthropometric Failure (CIAF) and WHO standards. Int J Pediatr Adolesc Med. 2018;5(1):24.

  13. Shit S, et al. Assessment of nutritional status by composite index for anthropometric failure: a study among slum children in Bankura, West Bengal. Indian J Public Health. 2012;56(4):305.

  14. Mandal G, Bose K. Assessment of overall prevalence of undernutrition using composite index of anthropometric failure (CIAF) among preschool children of West Bengal, India; 2009.

  15. Sen J, Mondal N. Socio-economic and demographic factors affecting the Composite Index of Anthropometric Failure (CIAF). Ann Hum Biol. 2012;39(2):129–36.

  16. Knol MJ, et al. What do case-control studies estimate? Survey of methods and assumptions in published case-control research. Am J Epidemiol. 2008;168(9):1073–81.

  17. Gu W, et al. Use of random forest to estimate population attributable fractions from a case-control study of Salmonella enterica serotype Enteritidis infections. Epidemiol Infect. 2015;143(13):2786–94.

  18. Goldstein BA, Navar AM, Carter RE. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges. Eur Heart J. 2017;38(23):1805–14.

  19. Ambale-Venkatesh B, et al. Cardiovascular event prediction by machine learning: the multi-ethnic study of atherosclerosis. Circ Res. 2017;121(9):1092–101.

  20. Adler ED, et al. Improving risk prediction in heart failure using machine learning. Eur J Heart Fail. 2020;22(1):139–47.

  21. Deo RC. Machine learning in medicine. Circulation. 2015;132(20):1920–30.

  22. Shameer K, et al. Machine learning in cardiovascular medicine: are we there yet? Heart. 2018;104(14):1156–64.

  23. Kotsiantis SB, Zaharakis I, Pintelas P. Supervised machine learning: a review of classification techniques. Emerg Artif Intell Appl Comput Eng. 2007;160(1):3–24.

  24. Quinlan JR. Induction of decision trees. Mach Learn. 1986;1(1):81–106.

  25. James G, et al. An introduction to statistical learning: with applications in R. Berlin: Springer; 2013.

  26. Molina M, Garip F. Machine learning for sociology. Annu Rev Sociol. 2019;45:27–45.

  27. Géron A. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media; 2019.

  28. Marsland S. Machine learning: an algorithmic perspective. Boca Raton: CRC Press; 2015.

  29. Zhang H. The optimality of Naïve Bayes. FLAIRS2004 conference. 2004.

  30. Esteva A, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542:115–8.

  31. Anderson JP, et al. Reverse engineering and evaluation of prediction models for progression to type 2 diabetes: an application of machine learning using electronic health records. J Diabetes Sci Technol. 2016;10(1):6–18.

  32. Friedman CP, Wong AK, Blumenthal D. Achieving a nationwide learning health system. Sci Transl Med. 2010;2(57):57cm29.

  33. Ayer T, et al. Comparison of logistic regression and artificial neural network models in breast cancer risk estimation. Radiographics. 2010;30(1):13–22.

  34. Farran B, et al. Predictive models to assess risk of type 2 diabetes, hypertension and comorbidity: machine-learning algorithms and validation using national health data from Kuwait—a cohort study. BMJ Open. 2013;3(5):e002457.

  35. Aneja S, Lal S. Effective asthma disease prediction using naive Bayes—Neural network fusion technique. In: 2014 international conference on parallel, distributed and grid computing. 2014. IEEE.

  36. Behroozi M, Sami A. A multiple-classifier framework for Parkinson’s disease detection based on various vocal tests. Int J Telemed Appl. 2016;2016:6837498.

  37. Weiss JC, et al. Machine learning for personalized medicine: predicting primary myocardial infarction from electronic health records. AI Mag. 2012;33(4):33–33.

  38. Methun MIH, et al. A machine learning logistic classifier approach for identifying the determinants of under-5 child morbidity in Bangladesh. Clin Epidemiol Glob Health. 2021;12:100812.

  39. Bertolini M, et al. Machine learning for industrial applications: a comprehensive literature review. Expert Syst Appl. 2021:114820.

  40. Schmidt J, et al. Recent advances and applications of machine learning in solid-state materials science. NPJ Comput Mater. 2019;5(1):1–36.

  41. Wuest T, et al. Machine learning in manufacturing: advantages, challenges, and applications. Prod Manuf Res. 2016;4(1):23–45.

  42. Talukder A, Ahammed B. Machine learning algorithms for predicting malnutrition among under-five children in Bangladesh. Nutrition. 2020;78:110861.

  43. Khare S, et al. Investigation of nutritional status of children based on machine learning techniques using Indian demographic and health survey data. Procedia Comput Sci. 2017;115:338–49.

  44. Rahman SJ, et al. Investigate the risk factors of stunting, wasting, and underweight among under-five Bangladeshi children and its prediction based on machine learning approach. PLoS ONE. 2021;16(6):e0253172.

  45. Gebreyesus SH, et al. Local spatial clustering of stunting and wasting among children under the age of 5 years: implications for intervention strategies. Public Health Nutr. 2016;19(8):1417–27.

  46. GBD 2015 Risk Factors Collaborators. Global, regional, and national comparative risk assessment of 79 behavioural, environmental and occupational, and metabolic risks or clusters of risks, 1990–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet. 2016;388(10053):1659.

  47. Corsi DJ, et al. Shared environments: a multilevel analysis of community context and child nutritional status in Bangladesh. Public Health Nutr. 2011;14(6):951–9.

  48. Griffiths P, et al. A tale of two continents: a multilevel comparison of the determinants of child nutritional status from selected African and Indian regions. Health Place. 2004;10(2):183–99.

  49. Fetene N, et al. The Ethiopian health extension program and variation in health systems performance: what matters? PLoS ONE. 2016;11(5):e0156438.

  50. Croft TN, et al. Guide to DHS statistics. Rockville, Maryland, USA: ICF; 2018.

  51. Esri, ArcGIS Version 10.1. ESRI; 2010.

  52. Ibeji JU, et al. Modelling children ever born using performance evaluation metrics: a dataset. Data Brief. 2021;36:107077.

  53. Raschka S. Python machine learning. Birmingham: Packt Publishing Ltd; 2015.

  54. Seger C. An investigation of categorical variable encoding techniques in machine learning: binary versus one-hot and feature hashing; 2018.

  55. Yu H-F, Huang F-L, Lin C-J. Dual coordinate descent methods for logistic regression and maximum entropy models. Mach Learn. 2011;85(1–2):41–75.

  56. Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67.

  57. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol). 1996;58(1):267–88.

  58. Zou H, Hastie T. Addendum: regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol). 2005;67(5):768–768.

  59. Hecht-Nielsen R. Theory of the backpropagation neural network. In: Neural networks for perception. Elsevier; 1992. p. 65–93.

  60. Abdelhafiz D, et al. Deep convolutional neural networks for mammography: advances, challenges and applications. BMC Bioinform. 2019;20(11):1–20.

  61. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (KDD '16). New York: ACM; 2016.

  62. Garg A, Tai K. Comparison of statistical and machine learning methods in modelling of data with multicollinearity. Int J Model Identif Control. 2013;18(4):295–312.

  63. Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67.

  64. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol). 2005;67(2):301–20.

  65. Yuan G-X, Ho C-H, Lin C-J. An improved glmnet for l1-regularized logistic regression. J Mach Learn Res. 2012;13(1):1999–2030.

  66. Genuer R, Poggi J-M, Tuleau-Malot C. VSURF: an R package for variable selection using random forests. R J. 2015;7(2):19–33.

  67. Robin X, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 2011;12(1):1–8.

  68. Khan MRAA. ROCit: an R package for performance assessment of binary classifier with visualization; 2019.

  69. Wickham H, Chang W, Wickham MH. Package ‘ggplot2’. Create elegant data visualisations using the grammar of graphics. Version. 2016;2(1):1–189.

  70. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

  71. Genuer R, Poggi J-M, Tuleau-Malot C. Variable selection using random forests. Pattern Recogn Lett. 2010;31(14):2225–36.

  72. Janitza S, Tutz G, Boulesteix A-L. Random forest for ordinal responses: prediction and variable selection. Comput Stat Data Anal. 2016;96:57–73.

  73. Liaw A, Wiener M. Classification and regression by randomForest. R news. 2002;2(3):18–22.

  74. Liang N-Y, et al. A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans Neural Netw. 2006;17(6):1411–23.

  75. Bland JM, Altman D. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;327(8476):307–10.

  76. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36.

  77. Han H, Guo X, Yu H. Variable selection using mean decrease accuracy and mean decrease gini based on random forest. In: 2016 7th IEEE international conference on software engineering and service science (ICSESS). IEEE; 2016.

  78. Gebre A, et al. Prevalence of malnutrition and associated factors among under-five children in pastoral communities of Afar Regional State, Northeast Ethiopia: a community-based cross-sectional study. J Nutr Metab. 2019;2019.

  79. Kassie GW, Workie DL. Determinants of under-nutrition among children under five years of age in Ethiopia. BMC Public Health. 2020;20(1):1–11.

  80. Spray AL, et al. Spatial analysis of undernutrition of children in Léogâne Commune, Haiti. Food Nutr Bull. 2013;34(4):444–61.

  81. Simler KR. Nutrition mapping in Tanzania: an exploratory analysis. IFPRI Food Consumption and Nutrition Division Discussion Paper, 2006(204).

  82. Khan J, Mohanty SK. Spatial heterogeneity and correlates of child malnutrition in districts of India. BMC Public Health. 2018;18(1):1–13.

  83. Pham BT, et al. Spatial prediction of rainfall-induced landslides using aggregating one-dependence estimators classifier. J Indian Soc Remote Sens. 2018;46(9):1457–70.

  84. Verma C, Illés Z. Attitude prediction towards ICT and mobile technology for the real-time: an experimental study using machine learning. In: The international scientific conference elearning and software for education. 2019. “Carol I” National Defence University.

  85. Van Dao D, et al. A spatially explicit deep learning neural network model for the prediction of landslide susceptibility. CATENA. 2020;188:104451.

  86. Nguyen PT, et al. Soft computing ensemble models based on logistic regression for groundwater potential mapping. Appl Sci. 2020;10(7):2469.

  87. Bitew FH, et al. Machine learning approach for predicting under-five mortality determinants in Ethiopia: evidence from the 2016 Ethiopian Demographic and Health Survey. Genus. 2020;76(1):1–16.



The datasets used in this study were obtained from the DHS Program, which authorized their download from its website.


Not applicable.

Author information

Authors and Affiliations



HMF was involved in data management, data analysis, drafting, and revising the final manuscript. TZ and EKM contributed to the conception, design, and interpretation of data, as well as to manuscript reviews and revisions. All authors have read and approved the manuscript.

Corresponding author

Correspondence to Haile Mekonnen Fenta.

Ethics declarations

Ethics approval and consent to participate

The institutional review boards of Macro International and USAID ethically approved the data used in this study. Authorization to use the data was formally requested through online registration on the MEASURE DHS website. The study protocol was submitted, and approval to use the datasets was sought on that basis.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Implementation of different Supervised Machine Learning (SML) using R statistical software.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. A copy of this licence is available from the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Fenta, H.M., Zewotir, T. & Muluneh, E.K. A machine learning classifier approach for identifying the determinants of under-five child undernutrition in Ethiopian administrative zones. BMC Med Inform Decis Mak 21, 291 (2021).
