
Table 2 Summary of implemented ML algorithms

From: Quantifying the impact of addressing data challenges in prediction of length of stay

KNN [31]

KNN is an instance-based, supervised ML algorithm used in both classification and regression problems. It uses feature similarity to determine the values of unseen samples: a prediction for a new sample is derived from the values of the most similar samples in the training set, and a missing datapoint can likewise be imputed from that feature's values in the nearest neighboring samples.
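As an illustration, here is a minimal sketch in scikit-learn covering both uses described above; the estimators are standard sklearn APIs, but the toy data and n_neighbors setting are assumptions, not the study's configuration:

```python
# KNN-based imputation followed by KNN regression (scikit-learn).
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 6.0], [4.0, 8.0]])
y = np.array([10.0, 20.0, 30.0, 40.0])

X_filled = KNNImputer(n_neighbors=2).fit_transform(X)   # fill the gap from similar rows
model = KNeighborsRegressor(n_neighbors=2).fit(X_filled, y)
print(model.predict([[2.5, 5.0]]))  # averaged from the 2 nearest training samples
```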

BR [31]

BR is linear regression formulated with probability distributions rather than point estimates. In other words, the target variable is not estimated as a single value but is drawn from a predictive probability distribution.
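A minimal sketch with scikit-learn's BayesianRidge estimator; the synthetic data is illustrative only. Note that predict can return a standard deviation alongside the mean, reflecting the distributional estimate described above:

```python
# Bayesian ridge regression: predictions come with uncertainty (scikit-learn).
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

model = BayesianRidge().fit(X, y)
mean, std = model.predict(X[:5], return_std=True)  # a distribution, not a point
print(mean, std)
```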

DT [31]

DT is a supervised learning model used in classification and regression applications that predicts the target value by learning simple decision rules from the features. Each internal node represents a test on a feature, each leaf node indicates a class label (or a predicted value in regression), and branches represent the conjunctions of feature tests that lead to those leaves.
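The node/leaf structure can be made visible with scikit-learn's export_text, which prints the learned if/then rules. This sketch uses made-up features (age, a comorbidity flag) purely for illustration:

```python
# A small regression tree whose learned rules are printed as text (scikit-learn).
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

X = np.array([[25, 0], [60, 1], [40, 0], [70, 1]])  # hypothetical: age, comorbidity flag
y = np.array([2.0, 7.0, 3.0, 9.0])                  # hypothetical: length of stay (days)

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "comorbidity"]))
```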

SVM (SVR) [31]

SVM is an instance-based, supervised ML algorithm utilized in classification and regression problems; the regression variant is called SVR. The algorithm constructs a decision boundary known as a hyperplane between classes, with the primary goal of maximizing the margin of separation between them.
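A minimal SVR sketch in scikit-learn; the RBF kernel, C, and epsilon values are common defaults chosen for illustration, not the study's settings. Feature scaling is included because SVR is sensitive to feature magnitudes:

```python
# Support vector regression with feature scaling (scikit-learn).
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
model.fit(X, y)
print(model.predict([[5.0]]))
```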

RF [32]

RF is a form of ensemble learning based on bagging that can be applied in both classification and regression applications. Ensemble learning refers to constructing a model by combining several individual models so that the result outperforms the constituent base models. In bagging, multiple subsets of the dataset are drawn at random with replacement, and one model is trained on each subset. Ensemble learning offers benefits such as preventing overfitting, mitigating the curse of dimensionality, and avoiding local optima.
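A minimal sketch with scikit-learn's RandomForestRegressor; bootstrap=True is the bagging step described above (random subsets drawn with replacement), and the data and tree count are illustrative assumptions:

```python
# Random forest regression: bagged trees whose predictions are averaged (scikit-learn).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] * 2 + X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:3]))  # each prediction averages all trees' outputs
```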

GB [32]

GB is a supervised ensemble learning algorithm based on boosting that can be applied in both classification and regression applications. In this approach, base models are trained sequentially, each one fitted to correct the errors of the models before it, so the output of each model passes to the next.
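A minimal sketch with scikit-learn's GradientBoostingRegressor; the number of estimators, learning rate, and depth are illustrative values, not the study's hyperparameters:

```python
# Gradient boosting: each new tree fits the ensemble's remaining error (scikit-learn).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] - 3 * X[:, 2] + rng.normal(scale=0.1, size=300)

gb = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, max_depth=3)
gb.fit(X, y)
print(gb.predict(X[:3]))
```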

XGB [32]

XGB is an enhanced implementation of GB; both techniques follow the gradient boosting principle. XGB differs in details of the modeling process, including a more regularized model formalization that curbs overfitting and generally yields better performance.
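A minimal sketch using the xgboost package (a separate library from scikit-learn, assumed installed); reg_lambda and reg_alpha expose the additional L2/L1 regularization terms that distinguish XGB from plain GB. All settings here are illustrative:

```python
# Extreme gradient boosting with explicit regularization terms (xgboost package).
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] - 3 * X[:, 2] + rng.normal(scale=0.1, size=300)

xgb = XGBRegressor(n_estimators=200, learning_rate=0.05,
                   reg_lambda=1.0, reg_alpha=0.1)  # L2 and L1 penalties on the trees
xgb.fit(X, y)
print(xgb.predict(X[:3]))
```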

GP [31]

GP is a stochastic process, i.e., a collection of random variables, in which every finite subset of those variables follows a multivariate normal distribution. GP is a supervised learning technique that can be used for both classification and regression problems.
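A minimal sketch with scikit-learn's GaussianProcessRegressor; the RBF kernel and its length scale are illustrative choices. Because the posterior at any point is normal, predictions come with a standard deviation:

```python
# Gaussian process regression: posterior mean and uncertainty (scikit-learn).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.array([[1.0], [3.0], [5.0], [6.0]])
y = np.sin(X).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6).fit(X, y)
mean, std = gp.predict([[4.0]], return_std=True)  # posterior mean and std dev
print(mean, std)
```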

NN [31]

NN is a supervised ML algorithm with one or more layers, called hidden layers, between the input and output layers that propagate data from input to output. NN can be applied in classification and regression problems. In this study, a feedforward architecture was utilized.
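A minimal sketch of a feedforward network using scikit-learn's MLPRegressor; the hidden layer sizes and iteration budget are illustrative assumptions, and scaling is included because gradient-trained networks benefit from standardized inputs:

```python
# Feedforward neural network regression with two hidden layers (scikit-learn).
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=500)

nn = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
nn.fit(X, y)
print(nn.predict(X[:3]))
```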

KNN, K-nearest neighbors; BR, Bayesian ridge; DT, decision tree; SVM, support vector machine; SVR, support vector regression; RF, random forest; GB, gradient boosting; XGB, extreme gradient boosting; GP, Gaussian process; NN, neural network