
Classification of coronary artery disease using radial artery pulse wave analysis via machine learning

Abstract

Background

Coronary artery disease (CAD) is a major global cardiovascular health threat and the leading cause of death in many countries. The disease has a significant impact in China, where it has become the leading cause of death. There is an urgent need to develop non-invasive, rapid, cost-effective, and reliable techniques for the early detection of CAD using machine learning (ML).

Methods

Six hundred eight participants were divided into three groups: healthy, hypertensive, and CAD. Raw pulse wave data were collected from all participants. The data were de-noised, normalized, and analyzed using several applications. Seven ML classifiers were used to model the processed data: Decision Tree (DT), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Extra Trees (ET), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting (LightGBM), and Unbiased Boosting with Categorical Features (CatBoost).

Results

The Extra Trees classifier demonstrated the best classification performance. After tuning, its performance on the test set was: 0.8579 accuracy, 0.9361 AUC, 0.8561 recall, 0.8581 precision, 0.8571 F1 score, 0.7859 kappa coefficient, and 0.7867 MCC. The 10 most important features of the ET model were w/t1, t3/tmax, tmax, t3/t1, As, hf/3, tf/3/tmax, tf/5, w and tf/3/t1.

Conclusion

Radial artery pulse wave can be used to identify healthy, hypertensive and CAD participants by using Extra Trees Classifier. This method provides a potential pathway to recognize CAD patients by using a simple, non-invasive, and cost-effective technique.


Introduction

Coronary artery disease (CAD) is a cardiovascular disease that affects populations in both developed and developing countries [1] and is the main cause of death around the world [2]. It is the most prevalent form of cardiovascular disease, characterized by the accumulation of lipids and immune cells [3] in the subendothelial space of the coronary arteries, leading to atherosclerosis. An inflammatory response in the vascular endothelium is elicited in the process [4, 5]. Genome-wide association studies (GWAS) have identified several genetic variants that are robustly associated with CAD [6]. It is important to emphasize cardiovascular health at an early stage and adopt strategies that focus on the prevention of hypertension, dyslipidemia, diabetes, obesity and smoking [7,8,9,10,11,12,13].

The radial artery pulse represents a complex interplay of circulatory functions. The condition of the pulse varies with the status of the circulatory system. The pulse can indicate whether the aortic valve is functioning properly, whether the heart beats rhythmically, and how elastic the arteries are. Furthermore, since the circulatory system is closely linked to the body’s internal organs, any changes in tissue metabolism can significantly impact blood circulation, while notable alterations in bodily diseases can affect the function of the circulatory system to varying degrees. Therefore, the pulse reflects not only changes in the circulatory system but also alterations in other organs and systems, which is also consistent with the basic theory of Traditional Chinese Medicine (TCM). TCM practitioners sense the radial artery pulse waveforms in patients’ wrists to make diagnoses based on their subjective personal experience. Research on pulse acquisition platforms and computerized analysis methods can facilitate objective studies in pulse diagnosis. Many studies have demonstrated specific applications in the diagnosis of diseases and the assessment of disease progression and prognosis [14,15,16,17,18], thereby enabling TCM to align with advancements in modern medicine [19].

Over the past decades, artificial intelligence (AI), which includes many machine learning (ML) algorithms, has been used for a wide range of tasks, such as painting, driving, conversation and healthcare [20]. Traditional methods for diagnosing and managing CAD encompass the evaluation of medical history, physical examinations, and various imaging techniques, including angiography. However, these methods often have limitations in terms of accuracy, invasiveness, and cost. In recent years, ML has emerged as a powerful tool that can enhance the diagnosis, prediction, and treatment of CAD, bringing both benefits and potential challenges. Several ML techniques are commonly employed in the context of CAD, each offering unique strengths and capabilities: supervised learning, which is particularly useful for classification and regression tasks in CAD; classification algorithms, which can be used to classify patients into different risk categories based on clinical data; clustering techniques, which can be useful for tailoring treatment plans; deep learning, which is capable of learning complex patterns in large datasets; convolutional neural networks (CNNs), which are particularly useful in image analysis, such as interpreting medical imaging for CAD diagnosis; and recurrent neural networks (RNNs), which are suitable for analyzing time-series data, such as monitoring heart rate and other vital signs over time. ML techniques have been applied in various aspects of CAD management, including diagnosis, risk prediction, treatment planning and management. The future of ML in CAD looks promising, with ongoing research and advancements expected to address current limitations and unlock new possibilities.

Methods

The study protocol was approved by the IRB of Shanghai University of Traditional Chinese Medicine (Approval Number:2023–3-10–08-08) and the study complied with the Declaration of Helsinki.

Study subjects

This case–control study involved three groups. The CAD group included 226 patients with CAD, the healthy group included 196 normal participants and the hypertension group included 186 patients with hypertension only. All participants were recruited from the Shanghai Municipal Hospital of Traditional Chinese Medicine, Shuguang Hospital, Yueyang Hospital, and Longhua Hospital, all affiliated with the Shanghai University of Traditional Chinese Medicine, between September 2019 and December 2021.

Inclusion criteria

The inclusion criteria were as follows:

  • CAD patients should fit the "Nomenclature and criteria for diagnosis of ischemic heart disease" [21].

  • Patients in the hypertension group should be hypertensive and without any organic lesions of the heart.

  • Participants in the healthy group were required to be free from any cardiovascular disease or hypertension.

  • All participants were required to complete the pulse wave collection and sign the informed consent form.

Exclusion criteria

The exclusion criteria were as follows:

  • Participants with arrhythmia, valvular heart disease, or severe heart failure.

  • Participants with severe endocrine, blood, metabolic system diseases, severe gastrointestinal disease or kidney diseases.

  • Participants with malignant tumors.

  • Participants who refused to participate.

  • Participants with severely incomplete clinical data.

Sensors of pulse wave collection

Pulse wave collection primarily uses sensors to capture the human body’s pulse, which is then converted into electrical signals to display the pulse graph on a computer. Therefore, the selection of the sensor is a crucial aspect of pulse wave collection.

Piezoelectric pressure transducer

A piezoelectric sensor converts pulse pressure signals into electrical signals using piezoelectric materials with specific piezo characteristics. When pressure is applied to the piezoelectric material, a charge proportional to the pressure is generated on its surface [22,23,24,25]. By measuring this charge, the pulse wave information can be collected.

Piezoresistive pressure transducer

A piezoresistive pressure transducer is made of single-crystal silicon exhibiting piezoresistive effects. Its principle involves using a single-crystal silicon wafer as an elastic element and employing circuit technology to diffuse a set of equivalent resistors in specific directions on the silicon diaphragm. These resistors are combined into a bridge circuit, enabling the silicon to be used in standard pressure transducer applications. As external pressure changes, the silicon wafer also deforms, causing the strain resistors on the diaphragm to vary accordingly. This variation is proportional to the measured pressure, allowing the bridge circuit to produce a corresponding voltage output signal [26,27,28,29].

Photoelectric sensors

Photoelectric sensors are among the most common types of sensors. Their working principle is based on the photoelectric effect, which involves converting light signals into electrical signals. The photoelectric effect refers to the phenomenon where electrons in certain materials absorb the energy of photons when exposed to light, resulting in a corresponding electrical effect. The working principle of photoelectric sensors in measuring pulse wave signals is that the heart's pulsation causes blood to flow in a fluctuating manner, leading to corresponding changes in blood volume within the blood vessels. The amount of light absorbed by the blood is related to the blood volume within the vessels. Therefore, when light of a constant wavelength illuminates the skin tissue, the energy of the light penetrating the tissue changes correspondingly with the fluctuations in blood volume, thus enabling the measurement of the body's pulse wave signals [30,31,32,33,34].

Doppler ultrasound

The working principle of Doppler ultrasound primarily involves three-dimensional reconstruction and imaging. Using these techniques, it tracks and analyzes blood flow velocity, vascular lumen volume, and the three-dimensional movement of blood vessels, thereby depicting pulse pulsation to obtain pulse signals. The application of this technology is beneficial for studying the mechanism of pulse formation. Consequently, in recent years, Doppler ultrasound has gained widespread application and attention [35,36,37,38].

Collection of pulse wave

The pulse signals were collected from the pulsation of the radial artery on the palmar side of the left wrist by using the SmartTCM-1 pulse wave digital acquisition analyzer (Shanghai Asia & Pacific Computer Information System Co., Ltd. Shanghai, China). The sensor consists of a piezoelectric pressure transducer. Each collection cycle lasted 60 s, and additional cycles were performed if the initial collection was unsatisfactory. The optimal pulse graph was selected for extracting and analyzing pulse graph parameters. Participants were required to keep calm and fast at least 30 min before the collection. Throughout the collection process, participants had to breathe calmly, sit upright, and keep their left arm relaxed and naturally stretched forward, with the wrist placed on a pulse cushion facing up and fingers slightly bent. All participants were also required to avoid violent mood swings during the whole test.

PulseSystem software (jointly developed by our research group and East China University of Science and Technology (Shanghai)) was used to de-noise the pulse signals and extract the raw pulse wave data. PulseAnalyseGraphic v1.1 software was then used to process and calibrate the raw data and to export the data for analysis.

To ensure the prevention of bias, a consistent pulse wave digital acquisition analyzer was utilized for data collection. The data collection process was conducted by the same set of thoroughly trained personnel. Data entry was performed using a double-entry and verification method, with re-entry conducted by the same trained personnel.

Time-domain parameters of the radial pulse wave signal

Figure 1 shows the classical time-domain parameters used in pulse wave analysis, including the amplitude, duration, area under the curve, and their ratio of each feature point. Table 1 shows the definition and meaning of classical time-domain characteristics of pulse wave.

Fig. 1 The classical time-domain parameters

Table 1 The definition and meaning of classical time-domain characteristics of pulse wave

Newly introduced pulse time domain parameters

The triple-peaked wave is the typical morphology of a normal cycle of pulse, and it is on the basis of the triple-peaked wave that the traditional time-domain parameters of the pulse are defined. However, in CAD patients, due to changes in vascular elasticity and cardiac function, the typical triple-peaked wave morphology is uncommon, and bimodal and unimodal waves are predominant (Fig. 2), which makes the localization of the repetitive pre-pulsation wave more difficult. For this kind of pulse wave, the rules based on the classification of pulse wave morphology are mostly used for localization (Fig. 3). However, on the one hand, such localization methods are sensitive to the pulse wave morphology, and the probability of localization error is higher when the pulse wave morphology differs greatly from the typical morphology; on the other hand, the physiological significance of the characteristic points obtained by such methods has not been adequately studied, and it is not possible to determine whether they have the same physiological significance as the peaks and valleys of the three-peak wave. Therefore, it is difficult to adequately characterize pulse wave morphology in patients with coronary artery disease by conventional pulse time-domain parameters alone.

Fig. 2 Examples of bimodal wave (left) and unimodal wave (right)

Fig. 3 Schematic diagram of bimodal wave (left) and unimodal wave (right) feature point localization methods

The sampling frequency of pulse wave collection is defined as follows:

$$\begin{array}{c}{f}_{s}=\frac{1}{{T}_{s}}\end{array}$$
(1)

\({f}_{s}\) is the sampling frequency in hertz, \({T}_{s}\) is the sampling period in seconds.

Preliminary research by our group [39] suggests that aortic systolic pressure can be accurately calculated from the radial artery waveform using the moving average method: a moving window is set according to the sampling frequency (represented by f), the moving average of the radial pulse wave is calculated, and the maximum value of the moving average is taken as the estimate of aortic systolic pressure. There is no consensus on how to select the width of the moving window, but it is generally believed that the optimal width is between 1/3 and 1/6 of the sampling frequency [40,41,42]. Therefore, in this study, the moving average was calculated with moving window widths of 1/3, 1/4, 1/5, and 1/6 of the sampling frequency, with the maximum values represented by hf/3, hf/4, hf/5, and hf/6, respectively. The times at which the pulse wave pressure first reaches this maximum value after the main wave peak are represented by tf/3, tf/4, tf/5, and tf/6 (Fig. 4) [43].

Fig. 4 Moving average of the radial pulse wave
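The moving-average features described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the window is a plain unweighted average over round(fs / divisor) samples, and the time value is taken as the first sample after the main peak at which the raw wave has fallen to the moving-average maximum.

```python
def moving_average(wave, width):
    """Unweighted moving average over `width` consecutive samples."""
    width = min(width, len(wave))
    window_sum = sum(wave[:width])
    averages = [window_sum / width]
    for i in range(width, len(wave)):
        # Slide the window by one sample: add the new value, drop the oldest.
        window_sum += wave[i] - wave[i - width]
        averages.append(window_sum / width)
    return averages


def moving_average_feature(wave, fs, divisor):
    """Return (h, t) for one pulse cycle.

    h: maximum of the moving average (window of fs/divisor samples).
    t: time in seconds, measured from the start of `wave`, at which the raw
       wave first drops to h after the main peak (None if it never does).
    """
    width = max(1, round(fs / divisor))
    h = max(moving_average(wave, width))
    peak_index = wave.index(max(wave))
    for i in range(peak_index, len(wave)):
        if wave[i] <= h:
            return h, i / fs
    return h, None


# Toy triangular "pulse" sampled at 6 Hz, window width fs/3 = 2 samples.
toy_wave = [0, 1, 2, 3, 4, 3, 2, 1, 0]
h_f3, t_f3 = moving_average_feature(toy_wave, fs=6, divisor=3)
```

Calling the function with divisors 3, 4, 5 and 6 would give values playing the role of hf/3 to hf/6 and tf/3 to tf/6.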

This study also included the time at which the maximum value of the waveform appeared (represented by tmax), to observe its difference from the t1 and t3 extracted by traditional methods in patients with coronary heart disease. It further included the times at which the waveform rose to 80% and 90% of the maximum value (represented by t0.8 and t0.9, respectively; Fig. 5), in order to reflect the duration of the rapid ejection period of the heart more accurately and to provide additional information when the main wave and the pre-beat wave are fused.

Fig. 5 Schematic diagram of the new main wave correlation time value indicator
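A minimal sketch of how tmax, t0.8 and t0.9 could be read from a single pulse cycle is shown below. It assumes the cycle starts at the pulse onset with a baseline of zero, so that 80% and 90% are taken relative to the raw maximum; the function name and the sample data are hypothetical.

```python
def rise_times(wave, fs, fractions=(0.8, 0.9)):
    """Return (tmax, {fraction: time}) for one pulse cycle.

    tmax is the time of the waveform maximum; each entry of the dict is the
    first time at which the wave reaches that fraction of the maximum.
    Times are in seconds from the start of `wave`, sampled at `fs` Hz.
    """
    peak_value = max(wave)
    tmax = wave.index(peak_value) / fs
    times = {}
    for fraction in fractions:
        threshold = fraction * peak_value
        first_index = next(i for i, v in enumerate(wave) if v >= threshold)
        times[fraction] = first_index / fs
    return tmax, times


# Hypothetical samples at 10 Hz; real recordings would be far denser.
tmax, times = rise_times([0, 2, 4, 6, 8.5, 10, 9, 7], fs=10)
```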

According to previous research by our group, the Euclidean distance (De) between the ascending branch data of the two main waves in the average pulse chart can effectively reflect the variability of the participant's heart rate, and this indicator performs well in identifying atrial fibrillation. However, De is related not only to heart rate variability but also to the duration of the main wave. Therefore, we divided De by the number of data points in the rising branch of the main wave (represented by n1, n1 = t1·f) to offset the influence of the main wave duration. The metric used in this study to measure heart rate variability was therefore De/n1.

Machine learning classification

ML draws on knowledge and technologies such as probability theory, statistics, and complex algorithms [44,45,46]. The discipline uses computers as its main tool, aiming to simulate human learning in real time and to improve learning efficiency by organizing existing content into knowledge structures [47]. This enables computers to continuously improve the performance of specific algorithms through learning from experience, gradually approaching (part of) human intelligence [48]. In the era of big data, with the great improvement of software, hardware and data storage capabilities, using ML to analyze complex, diverse and massive data in depth and to use information more efficiently has become an important research direction in the field [49]. ML is gradually developing towards intelligent data analysis and will become an important foundation for that technology.

ML classification is used to predict categories, which in this study are the CAD group, the hypertensive group and the healthy group. Figure 6 shows the flowchart of the ML classification process.

Fig. 6 Flowchart of the collection and analysis of pulse wave and the process of ML

This study used common ML libraries and frameworks [50], with the following classifiers: Decision Tree Classifier, Random Forest Classifier, Gradient Boosting Classifier, Extra Trees Classifier, Extreme Gradient Boosting, Light Gradient Boosting Machine, and CatBoost Classifier. After the identification models were built, the best-performing learner under the default hyperparameters was selected for a randomized search (RandomizedSearchCV), which tunes the hyperparameters of the model automatically while requiring less search time than an exhaustive grid search. The entire procedure used ten-fold cross-validation, and the samples were split into a training set and a test set in a ratio of 7:3. The final evaluation was based on the results of the test set.
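The workflow in the previous paragraph (7:3 split, ten-fold cross-validation, randomized hyperparameter search) might look roughly like the scikit-learn sketch below. The data are synthetic stand-ins with the same group sizes as the study, and the search space, random seeds, number of search iterations and the stratified split are assumptions made for illustration; the paper does not report them.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import (RandomizedSearchCV, StratifiedKFold,
                                     train_test_split)

# Synthetic stand-in for the pulse-wave features (NOT the study data):
# 196 healthy (0), 186 hypertensive (1) and 226 CAD (2) samples, 8 features.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], [196, 186, 226])
X = rng.normal(size=(len(y), 8)) + 0.8 * y[:, None]

# 7:3 train/test split; stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Hypothetical search space for the Extra Trees classifier.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    ExtraTreesClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,  # number of sampled configurations (assumed)
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="accuracy",
    random_state=0,
)
search.fit(X_train, y_train)

# The held-out test set is used only for the final evaluation.
test_accuracy = search.score(X_test, y_test)
```

The stratified folds keep the class proportions of the three groups in every fold, which matters when the groups are of unequal size.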

The Decision Tree (DT) [51] algorithm uses a tree structure and layers of reasoning to reach the final classification. A decision tree consists of a root node containing the full set of samples, internal nodes corresponding to the feature attributes tested, and leaf nodes representing the results of the decisions. It is a supervised learning algorithm based on if–then-else rules: at each internal node a test is made on a certain attribute value, and the result decides which branch to follow, until a leaf node is reached and the classification result is obtained. DT is among the simplest machine learning algorithms; it is easy to implement, interpretable, and consistent with intuitive human thinking. Therefore, we used the DT algorithm to classify CAD patients.

Although simple and powerful, DT may suffer from excessive variance: even a small difference in the input data can change the prediction results significantly, so a tree may fit a particular training set well without generalizing well. Therefore, Random Forest (RF) [52], which is more robust, is included in this study. RF belongs to the Bootstrap Aggregation (bagging) family of ensemble learning methods and consists of multiple decision trees that are built independently of one another. When performing a classification task, given a weak learning algorithm and a training set, the algorithm is trained several times, and the prediction is decided by majority vote over the resulting sequence of prediction functions. Such an approach improves performance by training multiple models and combining their predictions, usually achieving better results than a single model. We chose RF because it can handle very high-dimensional data without dimensionality reduction or feature selection and is not easily overfitted.

Gradient Boosting Decision Tree (GBDT) is an iterative decision tree algorithm that accumulates the conclusions of multiple decision trees to obtain the final result, and it has strong generalization ability. GBDT mainly involves the concepts of the regression decision tree, gradient boosting (GB) and shrinkage. The regression tree produces a prediction at each node (not necessarily a leaf node). Branching exhausts each threshold of each feature to find the best split, measured by minimizing the mean squared error. The core of GBDT is that the final result is the sum of the results of all the regression trees, where each tree learns the residuals of the sum of the conclusions of all the previous trees. We chose GBDT for its good interpretability and robustness and because it can automatically discover higher-order relationships between features.

Extreme Gradient Boosting (XGBoost) is a model initially proposed by Chen T and Guestrin C in 2011 and continuously optimized and improved through the efforts of many scientists [53]. It is a learning framework based on GBDT with strong extensibility, and is currently widely used in data mining tasks with satisfactory results. The optimizations of XGBoost are mainly: (1) a second-order Taylor expansion of the loss function to improve computational accuracy; (2) regularization terms to simplify the model and avoid overfitting; (3) a Blocks storage structure, which allows parallel computation; (4) automatic handling of missing values. GBDT only uses the first-order Taylor expansion, while XGBoost performs a second-order Taylor expansion of the loss function. The introduction of the second derivative in XGBoost increases accuracy and also allows customization of the loss function, as the second-order Taylor expansion can approximate a wide range of loss functions. We compared GBDT and XGBoost to test which algorithm is more suitable for identifying CAD.

Light Gradient Boosting Machine (LightGBM) also uses weak classifiers (DT) trained iteratively to obtain the optimal model. XGBoost traverses the entire dataset at each iteration, which results in long training time and high memory consumption; it also traverses all split points when calculating information gain, which lowers efficiency; and it is not well suited to cache optimization, which causes many cache misses. To address these problems, at the end of 2016 Guolin Ke, Qi Meng, Thomas Finley et al. proposed LightGBM [54], which is about three times faster than XGBoost in terms of processing speed alone. The optimizations of LightGBM are mainly: (1) a histogram-based decision tree algorithm; (2) Gradient-based One-Side Sampling (GOSS); (3) Exclusive Feature Bundling; (4) a leaf-wise leaf growth strategy with depth constraints; (5) direct support for categorical features; (6) support for efficient parallelism; (7) cache hit rate optimization. GOSS excludes a large share of the data instances with small gradients, so that mainly the remaining data with large gradients are used when calculating the information gain, saving considerable time and space compared with XGBoost traversing all the feature values. Regarding (5), LightGBM supports categorical features by allowing them to be input directly without additional 0/1 expansion, and adds decision rules for categorical features to the decision tree algorithm. These are the reasons we chose it for building models.

The extremely randomized trees (ET) algorithm [55] strongly randomizes both the attribute and the cut-point choice while splitting a tree node. The Extra-Trees algorithm builds an ensemble of unpruned decision or regression trees according to the classical top-down procedure. Its two main differences from other tree-based ensemble methods are that it splits nodes by choosing cut-points fully at random and that it uses the whole learning sample (rather than a bootstrap replica) to grow the trees. The strength of the randomization can be tuned to problem specifics by the appropriate choice of a parameter. Besides accuracy, the main strength of the resulting algorithm is computational efficiency. In clinical data, extreme samples exist, especially among CAD patients; ET provides strong additional randomness, which suppresses overfitting. ET also trains faster than RF.

Unbiased Boosting with Categorical Features (CatBoost) is a machine learning library open-sourced by the Russian search company Yandex [56, 57] and belongs to the boosting family of algorithms. CatBoost is an improved implementation within the GBDT framework and is claimed to perform better than XGBoost and LightGBM in terms of accuracy and other aspects. It handles categorical features automatically in a special way. First, it computes statistics on the categorical features to obtain the frequency of a particular category. Then, it adds hyperparameters to generate new numerical features.

$$\begin{array}{c}{\widehat{x}}_{k}^{i}=\frac{\sum_{j=1}^{n} {\mathbb{I}}_{\left\{{x}_{j}^{i}={x}_{k}^{i}\right\}}\cdot {y}_{j}+ap}{\sum_{j=1}^{n} {\mathbb{I}}_{\left\{{x}_{j}^{i}={x}_{k}^{i}\right\}}+a}\end{array}$$
(2)

After applying Eq. 2, the processed label values can be used in place of the categorical features. Here a > 0 is the weight of the prior and p is the prior value. To overcome the biased gradient estimate, CatBoost uses an unbiased estimation of the gradient step in the first stage, and the second stage is executed using the traditional GBDT scheme. To mitigate the poorer gradient estimation and improve the generalization ability of the model, CatBoost uses ordered boosting; because this greatly increases memory consumption and time complexity, it is optimized in the tree-building stage.
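As a worked illustration of Eq. 2, the sketch below replaces each category value with its smoothed target statistic. It follows the formula literally, summing over all n samples; CatBoost itself computes such statistics in an ordered fashion, which is not reproduced here. The function name is hypothetical, and taking the prior p as the mean target when it is not supplied is an assumption.

```python
def target_statistic(categories, targets, a=1.0, p=None):
    """Encode a categorical feature with the smoothed statistic of Eq. 2.

    For sample k the encoded value is
        (sum of y_j over samples j with the same category as k + a * p)
        / (number of such samples + a).
    """
    if p is None:
        p = sum(targets) / len(targets)  # assumed prior: the mean target
    totals, counts = {}, {}
    for category, y in zip(categories, targets):
        totals[category] = totals.get(category, 0.0) + y
        counts[category] = counts.get(category, 0) + 1
    return [(totals[c] + a * p) / (counts[c] + a) for c in categories]


# Two categories with binary targets; the prior p defaults to 0.75 here.
encoded = target_statistic(["A", "A", "B", "B"], [1, 0, 1, 1], a=1.0)
```

With a = 1 the two "A" samples are encoded as (1 + 0.75) / (2 + 1) and the two "B" samples as (2 + 0.75) / (2 + 1), so rare categories are pulled toward the prior.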

Data normalization

In this study, to avoid the effect of different feature units on model training, the raw data are min–max normalized, which preserves the relative position of the data and maps the resulting values into [0, 1]. Min–max processing has two benefits here: first, for features with small fluctuations, the differences are preserved and become more pronounced; second, when the minimum of a feature is zero, raw values of zero remain zero after the transformation. Since a large amount of the data in this study is zero, min–max normalization is used.

Let \({x}_{min}\) denote the minimum value of a feature and \({x}_{max}\) denote its maximum value. The conversion function is as follows:

$$\begin{array}{c}{x}^{*}=\frac{x-{x}_{min}}{{x}_{max}-{x}_{min}}\end{array}$$
(3)
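A minimal sketch of Eq. 3 applied to a single feature column is given below. The function name is hypothetical, and returning zeros for a constant column (where the denominator would be zero) is an assumption the text does not address.

```python
def min_max_normalize(values):
    """Map the values of one feature to [0, 1] following Eq. 3."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Constant feature: Eq. 3 is undefined, so return zeros (assumption).
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


normalized = min_max_normalize([0, 5, 10])
```

In a train/test setting, the minimum and maximum would normally be taken from the training set and reused for the test set, so that no information from the test data enters the transformation.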

Classification metrics

In this study, the main metrics used for model performance evaluation are Precision, Recall, Accuracy, AUC, F1, Kappa and MCC.

Confusion matrix

The confusion matrix (shown in Table 2) is used to summarize the results of a classifier and is a standard format for accuracy evaluation.

Table 2 Confusion matrix

TP(True Positive): Positive sample predicted by the model as the positive category. The larger the TP value, the better the model.

FN(False Negative): Positive sample predicted by the model as the negative category. The smaller the FN value, the better the model.

FP(False Positive): Negative sample predicted by the model as the positive category. The smaller the FP value, the better the model.

TN(True Negative): Negative sample predicted by the model as the negative category. The larger the TN value, the better the model.

$$\begin{array}{c}Precision=\frac{TP}{TP+FP}\times 100\%\end{array}$$
(4)
$$\begin{array}{c}Recall=\frac{TP}{TP+FN}\times 100\%\end{array}$$
(5)
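$$\begin{array}{c}Accuracy=\frac{TP+TN}{P+N}\times 100\%\end{array}$$
(6)

where P = TP + FN is the number of actual positive samples and N = FP + TN is the number of actual negative samples.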
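Equations 4 to 6 can be checked numerically with a short sketch such as the following. It assumes a binary confusion matrix and returns fractions rather than percentages; the function name and the counts are made up for illustration. In Eq. 6 the denominator P + N is the total number of samples, TP + FN + FP + TN.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return (precision, recall, accuracy) as fractions in [0, 1]."""
    precision = tp / (tp + fp)                  # Eq. 4
    recall = tp / (tp + fn)                     # Eq. 5
    accuracy = (tp + tn) / (tp + fn + fp + tn)  # Eq. 6, with P + N = all samples
    return precision, recall, accuracy


# Hypothetical counts: 50 actual positives and 50 actual negatives.
precision, recall, accuracy = binary_metrics(tp=40, fn=10, fp=20, tn=30)
```

For a three-class problem such as the one in this study, these quantities would typically be computed per class (one class versus the rest) and then averaged; which averaging scheme was used is not stated in the text.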

Receiver operating characteristic curve

The receiver operating characteristic (ROC) curve is a useful tool to visualize and evaluate classification ability. The ROC graph shows the relationship between the true positive rate (TPR) and the false positive rate (FPR). The area under the ROC curve (AUC) ranges from 0 to 1.0; a value of 0.5 corresponds to random guessing, and the larger the AUC, the better the model [58, 59].

Kappa coefficient

The Kappa coefficient is an indicator used for consistency testing and can also be used to measure the effectiveness of classification, because for classification problems consistency refers to whether the predicted results of the model agree with the actual classification results. The calculation of the Kappa coefficient is based on the confusion matrix, with values ranging from -1 to 1, usually greater than 0.

The formula for calculating the kappa coefficient based on the confusion matrix is as follows:

$$\begin{array}{c}kappa=\frac{{p}_{o}-{p}_{e}}{1-{p}_{e}}\end{array}$$
(7)

wherein:

\({p}_{o}=\frac{\text{sum of diagonal elements}}{\text{sum of all elements of the matrix}}\), which is the same as accuracy.

\({p}_{e}=\frac{{\sum }_{i}\left(\text{sum of elements in the }i\text{-th row}\right)\times \left(\text{sum of elements in the }i\text{-th column}\right)}{{\left(\text{sum of all elements of the matrix}\right)}^{2}}\), that is, the sum over all categories of the product of the actual and predicted counts, divided by the square of the total number of samples.

The most common evaluation index in classification problems is accuracy, which directly reflects the proportion of correct predictions and is very simple to calculate. However, in actual classification problems, the number of samples in each category is often not balanced. On such an unbalanced dataset, a model that is not adjusted tends to favor the large categories and give up the small ones. The overall accuracy is then quite high, but some categories cannot be recalled at all. A metric that penalizes this bias toward the majority categories is therefore needed in place of accuracy. According to the kappa formula, the more unbalanced the confusion matrix, the higher \({p}_{e}\) is and the lower the kappa value is, which gives a low score to a model with a strong bias toward the majority categories.
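The effect described above can be reproduced with a small sketch of Eq. 7. It assumes the confusion matrix is given as a list of rows, with rows for actual classes and columns for predicted classes; the function name and the counts are illustrative only.

```python
def cohen_kappa(matrix):
    """Kappa coefficient (Eq. 7) from a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    # p_o: observed agreement, i.e. the accuracy.
    p_o = sum(matrix[i][i] for i in range(len(matrix))) / total
    # p_e: expected agreement from the row and column totals.
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


# Balanced example: accuracy 0.7, kappa 0.4.
balanced = cohen_kappa([[40, 10], [20, 30]])

# Unbalanced example: the model predicts the large class for every sample.
# Accuracy is 0.95, yet kappa is 0 because p_e is also 0.95.
majority_only = cohen_kappa([[95, 0], [5, 0]])
```

The same function works unchanged for the three-class confusion matrix of this study, since it only relies on row totals, column totals and the diagonal.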

MCC coefficient

When the categories are unbalanced, accuracy cannot reflect performance on the minority categories. To solve this problem, the predicted results and the real results can be seen as two 0–1 distributions, and the similarity of the two distributions can be measured by the Matthews Correlation Coefficient (MCC) [60], which is calculated by the following formula:

$$\begin{array}{c}Matthews-score\left(MCC\right)=\frac{TP*TN-FP*FN}{\sqrt{\left(TP+FP\right)\left(FN+TP\right)\left(FN+TN\right)\left(FP+TN\right)}}\end{array}$$
(8)

When FP = FN = 0, every prediction is correct and MCC = 1. When TP = TN = 0, every prediction is incorrect and MCC = -1, in which case reversing the labels is sufficient. When MCC = 0, the model is no better than random prediction.
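The limiting cases above can be verified with a direct transcription of Eq. 8. The sketch assumes a binary confusion matrix; the function name is hypothetical, and returning 0 when the denominator vanishes is a common convention rather than something stated in the text.

```python
import math


def matthews_corrcoef_binary(tp, fn, fp, tn):
    """Matthews correlation coefficient (Eq. 8) for a binary confusion matrix."""
    denominator = math.sqrt((tp + fp) * (fn + tp) * (fn + tn) * (fp + tn))
    if denominator == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


perfect = matthews_corrcoef_binary(tp=50, fn=0, fp=0, tn=50)
inverted = matthews_corrcoef_binary(tp=0, fn=50, fp=50, tn=0)
mixed = matthews_corrcoef_binary(tp=40, fn=10, fp=20, tn=30)
```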

Feature importance

Feature importance is a metric that assesses the relative significance of each input feature in a machine learning model’s prediction. It quantifies how much each feature contributes to the model’s ability to make accurate predictions. The higher the value the more important the feature.

In our study, we use Gini impurity for DT. The formula is:

$$\begin{array}{c}{\sum }_{i=1}^{C}{f}_{i}\left(1-{f}_{i}\right)\end{array}$$
(9)

where \({f}_{i}\) is the frequency of label i at a node and C is the number of unique labels.
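As a small worked illustration of Eq. (9), the sketch below computes the Gini impurity of the labels that reach a node. The function name `gini_impurity` and the example label lists are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: sum over labels of f_i * (1 - f_i)."""
    if not labels:
        raise ValueError("a node must contain at least one sample")
    n = len(labels)
    frequencies = [count / n for count in Counter(labels).values()]
    return sum(f * (1 - f) for f in frequencies)


# A node holding a single class is pure (impurity 0)
pure_node = gini_impurity(["CAD"] * 6)
# A node with three equally frequent classes has impurity 3 * (1/3) * (2/3) = 2/3
mixed_node = gini_impurity(["healthy", "hypertensive", "CAD"] * 2)
```

For C equally frequent labels the impurity is 1 - 1/C, so 2/3 is the maximum possible value for a three-class problem.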

RF constructs many individual decision trees during training, and the predictions from all trees are pooled to make the final prediction. Methods that combine a collection of results in this way to make a final decision are referred to as ensemble techniques (other examples include GBDT, LightGBM, XGBoost and ET). Feature importance is calculated as the decrease in node impurity weighted by the probability of reaching that node. The node probability is the number of samples that reach the node divided by the total number of samples. The formula is:

$$\begin{array}{c}{RFfi}_{i}=\frac{{\sum }_{j\in \text{all trees}}{normfi}_{ij}}{T}\end{array}$$
(10)

\({RFfi}_{i}\) = the importance of feature i calculated from all trees in the RF model.

\({normfi}_{ij}\) = the normalized feature importance for i in tree j.

T = total number of trees.
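The sketch below illustrates the two steps behind Eq. (10): a probability-weighted impurity decrease at one split, and the average of per-tree normalized importances over T trees. All function names and numbers are hypothetical, and the per-tree raw importances are assumed to be already summed over the nodes that split on each feature.

```python
def weighted_impurity_decrease(n_node, n_left, n_right, n_total,
                               imp_node, imp_left, imp_right):
    """Impurity decrease of one split, weighted by the probability of reaching each node."""
    return ((n_node / n_total) * imp_node
            - (n_left / n_total) * imp_left
            - (n_right / n_total) * imp_right)


def normalize(tree_importance):
    """Scale the raw importances of one tree so that they sum to 1 (normfi_ij)."""
    total = sum(tree_importance.values())
    return {feature: value / total for feature, value in tree_importance.items()}


def forest_importance(per_tree_importances):
    """Average the normalized importances over all T trees (RFfi_i)."""
    n_trees = len(per_tree_importances)
    normalized = [normalize(tree) for tree in per_tree_importances]
    features = per_tree_importances[0].keys()
    return {f: sum(tree[f] for tree in normalized) / n_trees for f in features}


# Hypothetical split: 10 of 20 samples reach a node with impurity 0.5,
# and the split produces two pure children of 5 samples each.
decrease = weighted_impurity_decrease(10, 5, 5, 20, 0.5, 0.0, 0.0)

# Hypothetical raw importances from two trees for two features
trees = [{"tmax": 3.0, "w": 1.0}, {"tmax": 2.0, "w": 2.0}]
importance = forest_importance(trees)
```

Here the two trees give normalized importances of 0.75 and 0.5 for `tmax`, so the forest-level value is 0.625, and the values over all features still sum to 1.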

Results

The seven ML algorithms described above were used to construct the diagnostic models. Table 3 shows the performance of the seven ML models with default hyperparameters. Tenfold cross-validation was applied to the training data: the data were further split into train/test sets across 10 folds, and the folds were stratified to preserve the percentage of subjects in each class. The final evaluation aimed to check the general ability of the models to predict unseen data. The Extra Trees Classifier had the best performance, with the highest accuracy, AUC, recall, precision, F1, kappa and MCC values. Taking the total time (TT) into account as well, ET was chosen as the most suitable ML classifier for further work.

Table 3 The performance comparison of different ML models (mean ± std)
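The stratified tenfold splitting described above can be sketched in plain Python as follows. This is an illustration only, not the authors' pipeline: the function names and class counts are hypothetical, and shuffling before fold assignment is omitted for brevity.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k=10):
    """Assign sample indices to k folds so that each fold keeps the class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    # Deal the indices of each class round-robin across the folds
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_validation_splits(folds):
    """Yield (train_indices, validation_indices), holding out one fold at a time."""
    for held_out, validation in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, validation


# Hypothetical label vector; the class counts are illustrative only
labels = ["healthy"] * 30 + ["hypertensive"] * 20 + ["CAD"] * 10
folds = stratified_folds(labels, k=10)
fold_class_counts = [Counter(labels[i] for i in fold) for fold in folds]
splits = list(train_validation_splits(folds))
```

Each of the 10 folds then contains 3 healthy, 2 hypertensive and 1 CAD sample, matching the 3:2:1 ratio of the full label vector, and each split trains on 54 samples and validates on 6.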

Table 4 shows the nonparametric test for the classification metrics of the 7 ML models. Although the ET model achieved the highest value for all 7 metrics, the differences among the top 3 models did not reach statistical significance.

Table 4 Nonparametric test for classification metrics of 7 ML models (Kruskal–Wallis test, M (QL, QU))
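For readers unfamiliar with the Kruskal–Wallis test, the following sketch computes its H statistic from ranks. It assumes there are no tied values, so no tie correction is applied, and it uses the fact that for exactly three groups the chi-square approximation has 2 degrees of freedom, whose upper-tail probability is exp(-H/2). The scores are hypothetical fold-wise accuracies, not values from Table 4.

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for groups of distinct values (no tie correction)."""
    pooled = sorted(value for group in groups for value in group)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch assumes no tied values")
    # Rank 1 is the smallest pooled value
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    n_total = len(pooled)
    rank_term = sum(sum(rank[v] for v in group) ** 2 / len(group) for group in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)


# Hypothetical accuracies of three models over three folds each
model_a = [0.81, 0.82, 0.83]
model_b = [0.84, 0.85, 0.86]
model_c = [0.87, 0.88, 0.89]

h_statistic = kruskal_wallis_h([model_a, model_b, model_c])
# Chi-square upper tail with 2 degrees of freedom (valid for three groups only)
p_value = math.exp(-h_statistic / 2)
```

A p-value above the chosen significance level would correspond to the situation reported above, where the best model's advantage over its closest competitors is not statistically significant.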

Figs. 7 to 20 show the ROC curves, confusion matrices, and feature importance for all 7 ML models on the test set.

Fig. 7 ROC curves and confusion matrix for ExtraTreesClassifier on test set

Fig. 8 Feature importance of ExtraTreesClassifier on test set

Fig. 9 ROC curves and confusion matrix for CatBoostClassifier on test set

Fig. 10 Feature importance of CatBoostClassifier on test set

Fig. 11 ROC curves and confusion matrix for RandomForestClassifier on test set

Fig. 12 Feature importance of RandomForestClassifier on test set

Fig. 13 ROC curves and confusion matrix for LightGBMClassifier on test set

Fig. 14 Feature importance of LightGBMClassifier on test set

Fig. 15 ROC curves and confusion matrix for GBCClassifier on test set

Fig. 16 Feature importance of GBCClassifier on test set

Fig. 17 ROC curves and confusion matrix for XGBoostClassifier on test set

Fig. 18 Feature importance of XGBoostClassifier on test set

Fig. 19 ROC curves and confusion matrix for DecisionTreeClassifier on test set

Fig. 20 Feature importance of DecisionTreeClassifier on test set

To create a better model, we tuned the hyperparameters of the ET model with GridSearchCV. The cross-validation result is shown in Table 5. The average results were 86.6% accuracy, 91.36% AUC, 86.6% recall, 87.27% precision, 86.58% F1 score, 0.7984 kappa coefficient and 0.8018 MCC.

Table 5 Tenfold cross-validation result of tuned Extra Trees Classifier
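The idea behind an exhaustive grid search can be sketched without any library: every combination of candidate hyperparameter values is scored and the best one is kept. The grid, the function names and the scoring function below are hypothetical; `toy_score` is only a stand-in for a mean cross-validated accuracy, and the hyperparameter values searched in this study are not stated here.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return the best (params, score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values for two tree-ensemble hyperparameters
param_grid = {"n_estimators": [100, 200, 400], "max_depth": [5, 10, 20]}


def toy_score(params):
    # Stand-in for a cross-validated metric; it peaks at n_estimators=200, max_depth=10
    return (1.0
            - abs(params["n_estimators"] - 200) / 1000
            - abs(params["max_depth"] - 10) / 100)


best_params, best_score = grid_search(param_grid, toy_score)
```

In a real setting the scoring function would train the classifier with the given parameters on each training fold and return the mean validation score, so the cost grows with the product of the grid sizes and the number of folds.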

The ROC curves and confusion matrix for the tuned ExtraTreesClassifier on the test set are plotted in Fig. 21.

Fig. 21 ROC curves and confusion matrix for tuned ExtraTreesClassifier on test set

The feature importance of the tuned ET model is plotted in Fig. 22. The top 10 features are w/t1, t3/tmax, tmax, t3/t1, As, hf/3, tf/3/tmax, tf/5, w and tf/3/t1.

Fig. 22 Feature importance of tuned ExtraTreesClassifier on test set

The performance evaluation results of the tuned Extra Trees Classifier on the test set are shown in Table 6.

Table 6 Performance evaluation of tuned Extra Trees Classifier on test set

Conclusion and discussion

In this study, we employed modern machine learning techniques to explore how the pulse differs among healthy, hypertensive and CAD participants. Seven computational models were developed, and Table 3 shows that the Extra Trees Classifier had the best classification performance on all 7 metrics, with 0.8519 accuracy, 0.9151 AUC, 0.8424 recall, 0.8598 precision, 0.851 F1-score, 0.777 kappa, and 0.7814 MCC. After the hyperparameters of the ET model were tuned with GridSearchCV, the tuned Extra Trees Classifier achieved 0.8579 accuracy, 0.9361 AUC, 0.8561 recall, 0.8581 precision, 0.8571 F1-score, 0.7859 kappa, and 0.7867 MCC, as Table 6 shows. Almost all performance metrics of the tuned ET model rose slightly. Considering both computational efficiency and accuracy, we believe that ET might be the preferred option for model selection.

From the feature importance analysis of the tuned ET model, we noted that the top 10 features are w/t1, t3/tmax, tmax, t3/t1, As, hf/3, tf/3/tmax, tf/5, w and tf/3/t1. These features are related, directly or indirectly, to left ventricular function and aortic pressure. This result suggests that left ventricular function and aortic pressure are the prominent factors distinguishing patients with CAD and patients with hypertension from healthy participants, which is consistent with previous studies [61,62,63,64,65,66].

Few recent studies have addressed machine learning classification of CAD based on radial artery pulse wave analysis. In 2021, Zhang et al. [67] used K-Nearest Neighbors (KNN), DT, and RF algorithms to develop classification models from a baseline dataset, time-domain features, and multiscale entropy (MSE) features. The RF-based model achieved the highest average precision, 80.98%, surpassing both KNN and DT. In 2023, Wu et al. [68] used DT and RF algorithms to develop classification models from time-domain features, MSE features, and general information. The average precision, recall and F1-score of the RF model for the BNP Level 3 group were 91.048%, 90.897% and 90.797%, outperforming the DT model. In 2023, Yan et al. [69] used RF, Support Vector Machine (SVM), KNN and DT algorithms to develop classification models from a microcirculatory characteristic parameter set. RF showed good classification performance: the identification accuracy of the models built on the microcirculatory characteristic parameter set with the RF algorithm all exceeded 88%. The highest recognition accuracy was 95.51% for coronary heart disease samples, 92.11% for healthy samples, and 88.55% for hypertensive samples. In 2004, Ma et al. used the RF algorithm to develop a classification model from wrist pressure pulse waves and fingertip photoplethysmography (FPPG) to assess the severity of coronary artery lesions. The RF model achieved an accuracy, precision, recall, and F1-score of 78.79%, 78.69%, 78.79%, and 78.70%, respectively. These studies provide valuable insights into the development of diagnostic devices based on TCM principles and their potential in managing CAD.

In comparison with previous studies, this study introduces moving-average features and the times at which the waveform rises to 80% and 90% of its maximum value. These features aim to provide a more comprehensive description of radial artery pulse wave morphology in patients with CAD, to reflect accurately the duration of the rapid ejection phase of the heart, and to capture more information when the main wave and the pre beat wave are fused. These improvements add depth to the study and increase its applicability to clinical research. While most previous studies used only DT and RF algorithms for modeling, this study also incorporates several ensemble methods based on decision trees. These approaches aim to ensure high accuracy and AUC, mitigate overfitting, and enhance robustness and operational efficiency as the data size increases. Finally, the feature importance of the model is extracted and ranked, providing a degree of interpretability that helps guide data collection and the interpretation of clinical predictions. This interpretability facilitates acceptance and understanding among researchers.

The correlation between pulse diagnosis and CAD was clarified in this study. Given the high prevalence of CAD and its serious consequences, this ML-based approach provides a systematic way to "learn" the correlation among pulse wave data, hypertension and CAD. Although this "learning" approach may not reveal the underlying biological mechanisms, it may be very helpful for the early diagnosis of CAD in clinical practice, and it could be easy, non-invasive and inexpensive for patients.

In recent decades, wearable devices that monitor physiological signals have been increasingly utilized in diagnostics and treatment, significantly contributing to the fields of medicine and health care. This study may serve as a reference for the development of wearable devices capable of detecting cardiovascular lesions. It can also be expanded to more diagnostic applications in the future.

To achieve this long-term goal, several potential obstacles must be overcome: (1) Dataset size. At present there are only about 1,500 samples of CAD and hypertensive patients, which is not enough to train deep learning models with high accuracy or to validate our current models. We will collect more patient data to enrich our database in subsequent work so that the model can be further improved. (2) De-noising of raw data. The raw data in this study may contain interference introduced during acquisition, for example from emotional fluctuations in patients caused by external influences. In follow-up work we will further optimise the collection environment to isolate external interference and apply a variety of noise reduction algorithms to obtain higher-quality data. (3) Acquisition accuracy of the device. The SmartTCM-1 pulse wave digital acquisition analyzer will be updated in the future to provide more sensitive pressure sensing and data pre-denoising.

Availability of data and materials

The datasets generated and/or analyzed during the current study are not publicly available due to ethical concerns but are available from the corresponding author on reasonable request.

Abbreviations

CAD: Coronary artery disease

TCM: Traditional Chinese Medicine

ML: Machine learning

AI: Artificial intelligence

DT: Decision Tree

RF: Random Forest

GBDT: Gradient Boosting Decision Tree

ET: Extra Trees

XGBoost: Extreme Gradient Boosting

LightGBM: Light Gradient Boosting

CatBoost: Unbiased Boosting with Categorical Features

AUC: Area under curve

ROC: Receiver Operating Characteristic curve

TPR: True positive rate

FPR: False positive rate

MCC: Matthews Correlation Coefficient

TT: Total time

KNN: K-Nearest Neighbors

MSE: Multiscale entropy

SVM: Support Vector Machine

FPPG: Fingertip photoplethysmography

References

  1. GBD 2013 Mortality and Causes of Death Collaborators. Global, regional, and national age-sex specific all-cause and cause-specific mortality for 240 causes of death, 1990–2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet (London, England). 2015;385(9963):117–71.

  2. Mastoi QU, Wah TY, Gopal Raj R, Iqbal U. Automated Diagnosis of Coronary Artery Disease: A Review and Workflow. Cardiol Res Pract. 2018;2018:2016282.

  3. Ross R. Atherosclerosis — An Inflammatory Disease. N Engl J Med. 1999;340(2):115–26.

  4. Gao W, Liu H, Yuan J, Wu C, Huang D, Ma Y, Zhu J, Ma L, Guo J, Shi H, et al. Exosomes derived from mature dendritic cells increase endothelial inflammation and atherosclerosis via membrane TNF-α mediated NF-κB pathway. J Cell Mol Med. 2016;20(12):2318–27.

  5. Herrero-Fernandez B, Gomez-Bris R, Somovilla-Crespo B, Gonzalez-Granado JM. Immunobiology of Atherosclerosis: A Complex Net of Interactions. Int J Mol Sci. 2019;20(21):5293.

  6. Lieb W, Vasan R. Genetics of Coronary Artery Disease. Circulation. 2013;128(10):1131–8.

  7. Kitazawa M, Fujihara K, Osawa T, Yamamoto M, Yamada MH, Kaneko M, Matsubayashi Y, Yamada T, Yamanaka N, Seida H, et al. Risk of coronary artery disease according to glucose abnormality status and prior coronary artery disease in Japanese men. Metabolism Clin Exp. 2019;101:153991.

  8. Oe M, Fujihara K, Harada-Yamada M, Osawa T, Kitazawa M, Matsubayashi Y, Sato T, Yaguchi Y, Iwanaga M, Seida H, et al. Impact of prior cerebrovascular disease and glucose status on incident cerebrovascular disease in Japanese. Cardiovasc Diabetol. 2021;20(1):174.

  9. Fujihara K, Matsubayashi Y, Yamamoto M, Osawa T, Ishizawa M, Kaneko M, Matsunaga S, Kato K, Seida H, Yamanaka N, et al. Impact of body mass index and metabolic phenotypes on coronary artery disease according to glucose tolerance status. Diabetes Metab. 2017;43(6):543–6.

  10. Kabootari M, Asgari S, Ghavam SM, Abdi H, Azizi F, Hadaegh F. Long term prognostic implication of newly detected abnormal glucose tolerance among patients with stable cardiovascular disease: a population-based cohort study. J Transl Med. 2021;19(1):277.

  11. Manikpurage HD, Paulin A, Girard A, Eslami A, Mathieu P, Thériault S, Arsenault BJ. Contribution of Lipoprotein(a) to Polygenic Risk Prediction of Coronary Artery Disease: A Prospective UK Biobank Analysis. Circ Genom Precision Med. 2023;16(5):470–7.

  12. Ahmadzadeh K, Roshdi Dizaji S, Kiah M, Rashid M, Miri R, Yousefifard M. The value of Coronary Artery Disease - Reporting and Data System (CAD-RADS) in Outcome Prediction of CAD Patients; a Systematic Review and Meta-analysis. Arch Acad Emerg Med. 2023;11(1):e45.

  13. Goldsborough E 3rd, Osuji N, Blaha MJ. Assessment of Cardiovascular Disease Risk: A 2022 Update. Endocrinol Metab Clin North Am. 2022;51(3):483–509.

  14. Zhang J, Liao J, Wang T, Yuan R, Zhao Y, Han Z, Tang L, Zhao L. Effects of joy and sorrow on pulse-graph parameters in healthy female college students based on emotion-evoked experiments. Explore. 2020;17(4):303–11.

  15. Wang W, Zeng W, Chen X, Tu L, Xu J, Yin X. Parameter study on characteristic pulse diagram of polycystic ovary syndrome based on logistic regression analysis. J Obstet Gynaecol. 2022;42(8):3712–9.

  16. Lim J, Li J, Feng X, Feng L, Xia Y, Xiao X, Wang Y, Xu Z. Machine learning classification of polycystic ovary syndrome based on radial pulse wave analysis. BMC Complement Med Ther. 2023;23(1):409.

  17. Wan WK, Hsu TL, Chang HC, Wan YY. Effect of acupuncture at Hsien-Ku (St-43) on the pulse spectrum and a discussion of the evidence for the frequency structure of Chinese medicine. Am J Chin Med. 2000;28(1):41–55.

  18. Lim J, Li J, Feng X, Feng L, Xiao X, Xia Y, Wang Y, Qian L, Yang H, Xu Z. Machine learning-based evaluation of application value of traditional Chinese medicine clinical index and pulse wave parameters in the diagnosis of polycystic ovary syndrome. Eur J Integr Med. 2023;64:102311.

  19. Baik Y. A study on The Characteristic of Traditional Pediatric Pulse Diagnosis. J Korean Med Classics. 2014;27(1):111–22.

  20. Alizadehsani R, Abdar M, Roshanzamir M, Khosravi A, Kebria PM, Khozeimeh F, Nahavandi S, Sarrafzadegan N, Acharya UR. Machine learning-based coronary artery disease diagnosis: A comprehensive review. Comput Biol Med. 2019;111:103346.

  21. Nomenclature and criteria for diagnosis of ischemic heart disease. Report of the Joint International Society and Federation of Cardiology/World Health Organization task force on standardization of clinical nomenclature. Circulation. 1979;59(3):607–9.

  22. Tressler J, Alkoy S, Newnham R. Piezoelectric Sensors and Sensor Materials. J Electroceram. 1998;2(4):257–72.

  23. Xin Y, Liu T, Sun HS, Xu Y, Zhu JF, Qian CH, Lin TT. Recent progress on the wearable devices based on piezoelectric sensors. Ferroelectrics. 2018;531(1):102–13.

  24. Zaszczyńska A, Gradys A, Sajkiewicz P. Progress in the Applications of Smart Piezoelectric Materials for Medical Devices. Polymers (Basel). 2020;12(11):2754.

  25. Wang YL, Yu YN, Wei XY, Narita F. Self-Powered Wearable Piezoelectric Monitoring of Human Motion and Physiological Signals for the Postpandemic Era: A Review. Adv Mater Technol. 2022;7(12):20.

  26. Xu SY, Xu ZG, Li D, Cui TR, Li X, Yang Y, Liu HF, Ren TL. Recent Advances in Flexible Piezoresistive Arrays: Materials, Design, and Applications. Polymers. 2023;15(12):29.

  27. Ding XC, Zhong WB, Jiang HQ, Li MF, Chen YL, Lu Y, Ma J, Yadav A, Yang LY, Wang D. Highly Accurate Wearable Piezoresistive Sensors without Tension Disturbance Based on Weaved Conductive Yarn. ACS Appl Mater Interfaces. 2020;12(31):35638–46.

  28. Lin XZ, Gao S, Fei T, Liu S, Zhao HR, Zhang T. Study on a paper-based piezoresistive sensor applied to monitoring human physiological signals. Sensors Actuators a-Phys. 2019;292:66–70.

  29. Xu JH, Zhang L, Lai XJ, Zeng XR, Li HQ. Wearable RGO/MXene Piezoresistive Pressure Sensors with Hierarchical Microspines for Detecting Human Motion. ACS Appl Mater Interfaces. 2022;14(23):27262–73.

  30. Wu QH, Meng Y. Flexible Photoelectric Pulse Detection Sensor and Image Processing of Detection Signal. J Nanoelectron Optoelectron. 2023;18(3):302–10.

  31. Kyungho KIM. A study for measurement of radial artery oxygen saturation system using photoelectric plenthysmography. J Korea Soc Comput Inform. 2010;15(3):11–8.

  32. Zuo WM, Wang P, Zhang D. Comparison of Three Different Types of Wrist Pulse Signals by Their Physical Meanings and Diagnosis Performance. IEEE J Biomed Health Inform. 2016;20(1):119–27.

  33. Dong KK, Chu Y, Tian XC, Fang TQ, Ye XY, Wang XH, Tang F. Wearable Photoelectric Fingertip Force Sensing System Based on Blood Volume Changes without Sensory Interference. ACS Appl Mater Interfaces. 2023;15(29):34578–87.

  34. Fan L, Zhang JC, Wang ZM, Zhang XK, Yao RL, Li Y. Application research of pulse signal physiology and pathology feature mining in the field of disease diagnosis. Comput Methods Biomech Biomed Eng. 2022;25(10):1111–24.

  35. Wang C, Wu Q, Wang ZG, Wei Y, Xu SJ. Difference analysis of Doppler ultrasound blood flow of Cunkou (radial artery) pulse, Renying (carotid artery) pulse, and Fuyang (anterior tibial artery) pulse. J Tradit Chin Med. 2023;43(1):168–74.

  36. Cohen A, Li T, Becker LB, Rolston D, Nelson M, Owens C, Gordon M. Time for a Change: Use of Doppler Ultrasound for Pulse Checks in Cardiac Arrest Patients. Circulation. 2020;142:2.

  37. Chen DN, Clarke J. Analysis of a clinical sign in traditional Chinese medicine using Doppler ultrasound. Australas Radiol. 2001;45(4):452–6.

  38. Chang KV, Wu CH, Wang TG, Hsiao MY, Yeh TS, Chen WS. Pulsed Wave Doppler Ultrasonography for the Assessment of Peripheral Vasomotor Response in an Elderly Population. J Clin Ultrasound. 2011;39(7):383–9.

  39. Ding X, Wang Y, Hao Y, Lv Y, Chen R, Yan H. A New Measure of Pulse Rate Variability and Detection of Atrial Fibrillation Based on Improved Time Synchronous Averaging. Comput Math Methods Med. 2021;2021:5597559.

  40. Nichols WW, O'Rourke MF, Vlachopoulos C, et al. McDonald’s Blood Flow in Arteries: Theoretical, Experimental and Clinical Principles. London: Hodder Arnold; 2011.

  41. Nichols WW. Clinical measurement of arterial stiffness obtained from noninvasive pressure waveforms. Am J Hypertens. 2005;18(1 Pt 2):3s–10s.

  42. Xiao H, Butlin M, Qasem A, Tan I, Li D, Avolio AP. N-Point Moving Average: A Special Generalized Transfer Function Method for Estimation of Central Aortic Blood Pressure. IEEE Trans Biomed Eng. 2018;65(6):1226–34.

  43. Quail MA, Steeden JA, Knight D, Segers P, Taylor AM, Muthurangu V. Development and validation of a novel method to derive central aortic systolic pressure from the MR aortic distension curve. J Magnetic Resonance Imaging. 2014;40(5):1064–70.

  44. Choi RY, Coyner AS, Kalpathy-Cramer J, Chiang MF, Campbell JP. Introduction to Machine Learning, Neural Networks, and Deep Learning. Transl Vis Sci Technol. 2020;9(2):14.

  45. Deo RC. Machine Learning in Medicine. Circulation. 2015;132(20):1920–30.

  46. Sun X, Yin Y, Yang Q, Huo T. Artificial intelligence in cardiovascular diseases: diagnostic and therapeutic perspectives. Eur J Med Res. 2023;28(1):242.

  47. Handelman GS, Kok HK, Chandra RV, Razavi AH, Lee MJ, Asadi H. eDoctor: machine learning and the future of medicine. J Intern Med. 2018;284(6):603–19.

  48. Dwivedi YK, Hughes L, Ismagilova E, Aarts G, Coombs C, Crick T, Duan Y, Dwivedi R, Edwards J, Eirug A, et al. Artificial Intelligence (AI): Multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice and policy. Int J Inf Manage. 2021;57:101994.

  49. Duan Y, Edwards JS, Dwivedi YK. Artificial intelligence for decision making in the era of Big Data – evolution, challenges and research agenda. Int J Inf Manage. 2019;48:63–71.

  50. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: Machine Learning in Python. arXiv.org. 2012.

  51. Yu Z, Haghighat F, Fung BCM, Yoshino H. A decision tree method for building energy demand modeling. Energy Buildings. 2010;42(10):1637–46.

  52. Breiman L. Random Forests. Mach Learn. 2001;45(1):5–32.

  53. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: 2016. New York: ACM; 2016.

  54. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In: Neural Information Processing Systems: 2017. 2017.

  55. Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Mach Learn. 2006;63(1):3–42.

  56. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal: Curran Associates Inc.; 2018. p. 6639–49.

  57. Dorogush AV, Ershov V, Gulin A. CatBoost: gradient boosting with categorical features support. arXiv.org. 2018.

  58. James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning: With Applications in R. New York: Springer; 2013.

  59. Hastie T, Tibshirani R, Friedman JH. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer; 2009.

  60. Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics. 2020;21(1):13.

  61. Yildiz M, Oktay AA, Stewart MH, Milani RV, Ventura HO, Lavie CJ. Left ventricular hypertrophy and hypertension. Prog Cardiovasc Dis. 2020;63(1):10–21.

  62. Nagueh SF. Left Ventricular Diastolic Function: Understanding Pathophysiology, Diagnosis, and Prognosis With Echocardiography. JACC Cardiovasc Imaging. 2020;13(1 Pt 2):228–44.

  63. Battistoni A, Michielon A, Marino G, Savoia C. Vascular Aging and Central Aortic Blood Pressure: From Pathophysiology to Treatment. High Blood Press Cardiovasc Prev. 2020;27(4):299–308.

  64. Zuo J, Chang G, Tan I, Butlin M, Chu S-L, Avolio A. Central aortic pressure improves prediction of cardiovascular events compared to peripheral blood pressure in short-term follow-up of a hypertensive cohort. Clin Exp Hypertens. 2020;42(1):16–23.

  65. Sathe S, Inamdar M, Sathe AS, Tiwaskar M. Noninvasive Measurement of Aortic Pressure and Evaluation of Arterial Stiffness in Patients with Hypertension: An Observational Study. JAPI. 2022;70(7):11–2.

  66. Cardoso C, Leite N, Salles G. Relative prognostic importance of aortic and brachial blood pressures for cardiovascular and mortality outcomes in patients with resistant hypertension and diabetes: a two cohorts prospective study. J Hypertens. 2023;41(4):648–57.

  67. Zhang C-K, Liu L, Wu W-J, Wang Y, Yan H-X, Guo R, Yan J. Identifying Coronary Artery Lesions by Feature Analysis of Radial Pulse Wave: A Case-Control Study. Biomed Res Int. 2021;2021:1–8.

  68. Wu W-J, Chen R, Guo R, Yan J, Zhang C-K, Wang Y-Q, Yan H-X, Zhang Y-Q. A novel method for assessing cardiac function in patients with coronary heart disease based on wrist pulse analysis. Ir J Med Sci. 2023;192(6):2697–706.

  69. Yan J, Cai S, Cai X, Zhu G, Zhou W, Guo R, Yan H, Wang Y. Uncertainty quantification of microcirculatory characteristic parameters for recognition of cardiovascular diseases. Comput Methods Programs Biomed. 2023;240:107674.

Acknowledgements

Not applicable.

Funding

This work is supported by the National Natural Science Foundation of China (NSFC, Grant No.81673880), Shanghai Science and Technology Innovation Action Plan Technical Standards Program, China (Grant No.21DZ2203100), Shanghai Key Laboratory of Health Identification and Assessment Project, China (Grant No.21DZ2271000).

Author information

Contributions

Y. L and H.-M. W provided the research conceptualization. H.-X. Y, R. G, Y.-J. X, and R. L provided the research methods. R. C, W.-Y. H and J. H acquired the data. Y. L analyzed the data, performed the machine learning, and was a major contributor in writing the manuscript. All authors edited and revised the manuscript and approved the final version. Y. W and J. X were in charge of funding acquisition.

Corresponding author

Correspondence to Jin Xu.

Ethics declarations

Ethics approval and consent to participate

The study protocol was approved by the IRB of Shanghai University of Traditional Chinese Medicine (Approval Number: 2023–3-10–08-08), and the study complied with the Declaration of Helsinki. All participants received and signed an informed consent form.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Lyu, Y., Wu, HM., Yan, HX. et al. Classification of coronary artery disease using radial artery pulse wave analysis via machine learning. BMC Med Inform Decis Mak 24, 256 (2024). https://doi.org/10.1186/s12911-024-02666-1

Keywords