Machine learning prediction models for different stages of non-small cell lung cancer based on tongue and tumor marker: a pilot study

Objective To analyze the tongue features of NSCLC at different stages and the correlation between tongue features and tumor markers, and to investigate the feasibility of building prediction models for different stages of NSCLC based on tongue features and tumor markers. Methods Tongue images were collected from patients with non-advanced NSCLC (n = 109) and advanced NSCLC (n = 110) and analyzed to obtain tongue features, and the correlation between tongue features and tumor markers was analyzed at each stage of NSCLC. On this basis, six classifiers (decision tree, logistic regression, SVM, random forest, naive Bayes, and neural network) were used to build prediction models for different stages of NSCLC based on tongue features and tumor markers. Results Tongue features differed significantly between the non-advanced and advanced NSCLC groups. In the advanced NSCLC group, the number of tongue feature/tumor marker pairs with statistically significant correlations was significantly higher than in the non-advanced NSCLC group, and the correlations were stronger. Among the machine learning methods, support vector machine (SVM), decision tree, and logistic regression performed poorly in classifying the stages of NSCLC. Neural network, random forest, and naive Bayes classified the combined tongue feature, tumor marker, and baseline data set better, with accuracies of 0.767 ± 0.081, 0.718 ± 0.062, and 0.688 ± 0.070 and AUCs of 0.793 ± 0.086, 0.779 ± 0.075, and 0.771 ± 0.072, respectively. Conclusions Tongue features differed significantly between stages of NSCLC, and the tongue features of advanced NSCLC were more closely correlated with tumor markers. 
Because of the limited information they carry, single data sources (baseline, tongue features, or tumor markers alone) could not identify the different stages of NSCLC in this pilot study. With the exception of logistic regression, the machine learning methods built on the tumor marker and baseline data sets could effectively improve the differential diagnosis of NSCLC stages when tongue image data were added; this requires further verification in large-sample studies in the future. Supplementary Information The online version contains supplementary material available at 10.1186/s12911-023-02266-5.


Introduction
The International Agency for Research on Cancer (IARC) released its most recent global cancer data [1] in 2020, showing that lung cancer is the most common cancer in men, the second most common cancer in women after breast cancer, and the leading cause of cancer death. Non-small cell lung cancer (NSCLC) is the most common histological type of lung cancer, accounting for 80-85% of all lung cancer cases, with high morbidity and mortality [2]. The 5-year survival rate of lung cancer patients is 10-20%, and the prevention, screening, and treatment of lung cancer, together with reducing the economic burden of its treatment, have become urgent problems [3]. Early detection, diagnosis, and treatment of NSCLC are critical for improving patient prognosis and survival. Patients at different clinical stages of NSCLC receive different treatments and have different prognoses. Surgery is an effective treatment for early lung cancer. In patients with locally advanced lung cancer, surgery can also reduce the tumor burden and, combined with postoperative radiotherapy and chemotherapy, can effectively prolong survival [4]. Treatment options for patients at advanced stages are limited because of tumor metastasis. Traditional Chinese Medicine (TCM) has specific characteristics and benefits in the treatment of advanced lung cancer: it can effectively relieve symptoms, stabilize tumors, and improve patients' quality of life [5]. Effective methods for evaluating the clinical stage of NSCLC patients are therefore of great significance. At present, clinical staging of NSCLC relies mainly on imaging and histological methods, with histological examination serving as the gold standard. However, histological examination is invasive, complicated, and costly; it can harm patients and may even lead to tumor proliferation, which limits its use. A non-invasive, safe, reliable, and simple approach to staging NSCLC is therefore critical.
Tongue diagnosis is an important and distinctive part of TCM diagnosis. Studies have shown that the appearance of the tongue can reflect physiological and pathological changes in the body to some extent and is closely related to a person's overall health. For example, the tongue characteristics of patients with chronic kidney disease (CKD) are correlated with the disease itself, and evaluating the tongue image features of CKD patients with an automated tongue diagnosis system can give clinicians valuable information that facilitates early detection and diagnosis of CKD [6]. The color, shape, and thickness of the tongue coating, as well as the color of the tongue body, are correlated with the development of diabetes, and Li Jun et al. showed that tongue image features can significantly improve the prediction accuracy of diabetes risk models [7]. Tongue diagnosis also has clinical potential for predicting the risk and severity of gastroesophageal reflux disease (GERD); it is expected to serve as an initial screening indicator for upper gastrointestinal diseases and to help doctors make a non-invasive early diagnosis of GERD [8]. In addition, in patients with primary dysmenorrhea (PD), the color value and thickness of the tongue coating during menstruation are significantly lower than in the control group, and tongue image features obtained with a computerized tongue image analysis system can serve as an auxiliary method for syndrome differentiation, evaluation of therapeutic effects, and prediction of prognosis in PD [9]. With recent advances in TCM diagnostic information technology, the modernization of TCM faces new opportunities and challenges. In clinical practice, a variety of tongue diagnostic instruments are widely used, and technology for objective data acquisition and analysis based on standardized tongue diagnosis has gradually matured. The key technologies of tongue diagnosis include separation of the tongue body from the tongue coating and feature extraction. In modern tongue diagnosis research, digital image processing is widely used to extract color and texture features, and various machine learning methods have been applied to analyze them with good results [10][11][12][13]. Wang X et al. [14] built a diagnostic model for tooth-marked tongue based on a deep convolutional neural network; the model showed good validity and generalization and provides an objective, convenient, computer-assisted tongue diagnosis method for tracking disease progression and evaluating efficacy from an informatics perspective. Xu Q et al. [15] segmented tongue images with a deep neural network and built a multi-task joint learning model. Li J et al. [7] built a tongue image-based diabetes risk warning model using a stacking model and a ResNet50 model, and the results showed that combining tongue image data with machine learning gave high classification efficiency. Digital tongue diagnosis has become one of the focuses of modern TCM research. With the rapid development of artificial intelligence, machine learning methods such as logistic regression [16], support vector machine (SVM) [17], and neural network [12], as well as other data mining methods, have been widely used in medical research. Quantitative diagnosis through various mathematical models has promoted the development of information-based intelligent diagnosis in TCM.
Serum tumor marker testing has great clinical value in the early diagnosis, efficacy evaluation, and prognosis of lung cancer. It is widely used in clinical research and plays an important role in monitoring recurrence and metastasis. The clinical value of carcinoembryonic antigen (CEA), carbohydrate antigen 125 (CA-125), carbohydrate antigen 199 (CA-199), alpha-fetoprotein (AFP), neuron-specific enolase (NSE), cytokeratin 19 fragment (CYFRA21-1), and carbohydrate antigen 15-3 (CA15-3) in lung cancer has attracted wide attention [18]. Studies have shown that serum ferritin (SF), squamous cell carcinoma-associated antigen (SCC), NSE, CEA, and CYFRA21-1 are highly expressed in NSCLC and have important clinical value in evaluating clinicopathological features, and that combined detection of these five tumor markers improves the diagnostic value for NSCLC [19]. Zhang H et al. [20] built a model predicting EGFR mutation in NSCLC from tumor markers and CT features, and the combined model was more accurate than models using tumor markers or CT features alone.
Building on this work, this pilot study analyzed the tongue features of NSCLC at different stages and the correlation between tongue features and tumor markers, and attempted to build prediction models for different stages of NSCLC from tongue features and tumor markers using different machine learning methods. The aim was to explore a new, non-invasive, and efficient method for diagnosing NSCLC at different stages, to promote the early detection, diagnosis, and treatment of NSCLC, and to improve the survival and prognosis of patients with NSCLC. As an exploratory pilot study, it focused mainly on the feasibility of the methodology, emphasizing accurate and reliable data collection, description, and analysis, and on providing data and references for subsequent in-depth studies.

Study design and subjects
From July 2020 to March 2022, data on 324 lung cancer patients in the Department of Oncology of Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine were collected, and their case information (medical record number, name, sex, medical history, diagnosis, and so on) was recorded separately. Ethical approval was obtained from the Ethics Committee of Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine (registration number 2020LCSY083). Professionally trained graduate students collected standardized tongue image and tumor marker data. A total of 219 NSCLC patients were included in this study: 109 patients with stage I, II, or III disease formed the non-advanced NSCLC group, and 110 patients with stage IV disease formed the advanced NSCLC group. All patients were informed and signed informed consent after receiving a definite pathological diagnosis. The research flow chart is shown in Fig. 1.

Diagnostic criteria
Diagnosis followed the "Clinical Practice Guidelines for Lung Cancer Screening" issued by the National Comprehensive Cancer Network (NCCN) [21], and histological classification of lung cancer followed the fourth edition of the World Health Organization (WHO) "Classification of Lung Tumors" [22,23]. Exclusion criteria were: (1) not meeting the inclusion criteria; (2) pregnancy or breastfeeding; (3) other malignant tumors; (4) systemic acute or chronic infection; and (5) mental illness, unwillingness to cooperate, or poor study compliance.

Collecting clinical data
TFDA-1 intelligent tongue diagnosis instrument
The Tongue Face Diagnosis Analysis-1 (TFDA-1) digital tongue and face diagnosis instrument, developed by the project team of the National Key Research and Development Program "TCM Intelligent Tongue Diagnosis System Research and Development" (NO: 2017YFC17033301), was used to collect the patients' tongue images, and the tongue image analysis system TDAS was used to analyze the images and obtain objective tongue features. The TFDA-1 digital tongue diagnosis instrument is shown in Fig. 2 (A) and Fig. 2 (B), and the corresponding tongue image analysis system TDAS is shown in Fig. 3.
All tongue images were collected by researchers with standardized training to ensure standardized and accurate collection. The tongue images were collected as follows: (1) set the shooting parameters and sterilize the instrument with alcohol; (2) instruct the

Introduction to features of tongue diagnosis
The tongue color indexes are derived from four color spaces: RGB, HSI, Lab, and YCrCb. R (Red), G (Green), and B (Blue) are the three primary colors, each ranging from 0 to 255. In HSI, "H" stands for hue, an angle in [0, 2π] in which red is at 0, green at 2π/3, and blue at 4π/3; "S" stands for saturation; and "I" stands for intensity. In Lab, "L" stands for lightness, ranging from 0 (pure black) to 100 (pure white); "a" is the green-red axis and "b" is the blue-yellow axis, each ranging from -128 to 127. In YCrCb, "Y" stands for luminance, ranging from 16 to 235, and "Cr" and "Cb" denote chrominance: Cr is the difference between the red component of the RGB input signal and its luminance, that is, how far the current color is offset toward red, and Cb is the difference between the blue component of the RGB input signal and its luminance, that is, how far the current color is offset toward blue; Cr and Cb range from 16 to 240. CON (Contrast), ASM (Angular Second Moment), ENT (Entropy), and MEAN are the tongue texture indexes. perAll and perPart are the tongue coating indexes: perAll is the ratio of the tongue coating area to the total tongue area, and perPart is the ratio of the coating area to the uncoated tongue area. In this study, the prefix "TB-" refers to the tongue body and "TC-" to the tongue coating. Because hue is an angle, reddish colors fall on both sides of the 0/2π boundary; to make the data continuous and reveal the real differences, TB-H and TC-H were rotated by 180° and the H value was redefined after rotation.
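The hue definition and the 180° rotation can be illustrated with a short Python sketch. It assumes the common arccos form of the HSI hue, which reproduces the anchors stated above (red at 0, green at 2π/3, blue at 4π/3), and interprets "rotating by 180°" as adding π and wrapping modulo 2π. The text does not give the exact formula used by TDAS, and the names `hsi_hue` and `rotate_hue` are hypothetical.

```python
import math

TWO_PI = 2 * math.pi


def hsi_hue(r: float, g: float, b: float) -> float:
    """Hue in radians on [0, 2*pi) using the common HSI arccos formula.

    Red maps to 0, green to 2*pi/3, blue to 4*pi/3. Achromatic pixels
    (r == g == b) have no defined hue; 0.0 is returned by convention.
    """
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        return 0.0
    # Clamp to guard against floating-point ratios slightly outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, num / den)))
    return theta if b <= g else TWO_PI - theta


def rotate_hue(h: float, angle: float = math.pi) -> float:
    """Shift a hue by `angle` (default 180 degrees) and wrap it back to [0, 2*pi)."""
    return (h + angle) % TWO_PI


# Two reddish hues on either side of the 0 / 2*pi seam.
h1, h2 = 0.1, TWO_PI - 0.1
print(abs(h1 - h2))                          # ~6.08: far apart before rotation
print(abs(rotate_hue(h1) - rotate_hue(h2)))  # ~0.2: adjacent after rotation
```

After rotation, a reddish hue just above 0 and one just below 2π both land near π, so group means and variances of TB-H and TC-H are no longer distorted by the wrap-around.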
The tongue features were extracted automatically by batch computer processing, which gave good stability. Data preprocessing in this study mainly addressed outliers, which were identified with the box-plot method. The interquartile range (IQR) is the difference between the third (upper) and first (lower) quartiles (IQR = Q3 − Q1). The upper and lower boundaries, also called outlier cutoff points, were Q3 + 1.5 IQR and Q1 − 1.5 IQR, respectively.
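The box-plot rule above can be sketched in a few lines of Python. Quartile conventions differ between software packages (SPSS, R, and Python each offer several), so the fences here, computed by linear interpolation with `statistics.quantiles(method="inclusive")`, are an assumption and may differ slightly from those used in the study. The text does not say how flagged values were handled (removed, capped, or set to missing), so the sketch only flags them; the function names are hypothetical.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (lower, upper) box-plot fences: Q1 - k*IQR and Q3 + k*IQR."""
    # 'inclusive' linear interpolation is only one of several quartile
    # conventions; other software may place Q1 and Q3 slightly differently.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Values strictly outside the fences (points exactly on a fence are kept)."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical feature column with one extreme reading.
column = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_fences(column))     # (-3.5, 14.5)
print(find_outliers(column))  # [100]
```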

Statistical analysis
SPSS 25.0 was used for statistical analysis. Count data were expressed as N (%) and compared between groups with Pearson's χ2 test or Fisher's exact test. Measurement data that followed a normal distribution were expressed as mean ± SD, and those that did not were expressed as median (P25, P75). Groups that met normality and homogeneity of variance were compared with the t-test, and the others with the independent-samples Mann-Whitney U test. Correlation heat maps were drawn with GraphPad Prism 8.0. All tests were two-tailed, and P < 0.05 was considered statistically significant.
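For two independent groups, the rank-based nonparametric comparison is the Mann-Whitney U (Wilcoxon rank-sum) test, to which the Kruskal-Wallis test reduces when there are only two groups. The Python sketch below computes U with average ranks for ties and a two-sided p-value from the normal approximation. It applies no tie or continuity correction, so it will not exactly reproduce SPSS output, which may also report exact p-values for small samples; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U statistic for sample `a` and a two-sided normal-approximation p-value.

    No tie or continuity correction is applied, so results can differ
    slightly from statistical packages, especially for small or tied samples.
    """
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    r1 = sum(ranks[:n1])              # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2                  # mean of U under the null hypothesis
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, p


u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u)  # 0.0
```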

Modeling with machine learning methods
In this experiment, six machine learning classification algorithms were used to build differential diagnosis models for different stages of NSCLC: decision tree, support vector machine (SVM), random forest, neural network, naive Bayes, and logistic regression. Classification models were built on six data sets from patients with different clinical stages of NSCLC, including "baseline", "tumor marker", "tongue feature", "tongue feature and tumor marker", and "tongue feature and tumor marker and baseline", and a two-class prediction was made for each; the baseline data mainly comprised age and sex. All machine learning was performed with R packages. For every method except random forest, the data were scaled by Z-score normalization, given in Eq. (6):
$$Z = \frac{X - \mu}{\sigma} \quad (6)$$
where X denotes an element of a data vector, µ the mean, and σ the standard deviation. Ten-fold cross-validation was used to select and confirm the best parameters for each model; the optimal parameters are listed in Supplementary material 1. After the optimal parameters were confirmed, they were fixed and the data were resampled 10 times, each time with 30% of the samples as the test set and 70% as the training set, so that the evaluation results would not depend on a single, possibly unrepresentative test set. Modeling was thus repeated 10 times for each data set, and the "Mean (Standard Deviations)" of the 10 classification results was used to describe each model's classification performance.
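The Z-score normalization (Z = (X − µ)/σ) and the ten repeated 70/30 splits can be sketched as follows. This Python sketch is illustrative only (the study itself used R), its function names are hypothetical, and several details are assumptions not stated in the text: Z-score parameters are estimated on the training portion of each split and then applied to the test portion, the standard deviation is the sample SD (n − 1), the splits are unstratified, and the seed is arbitrary. The ten-fold cross-validation used for parameter tuning is not shown.

```python
import random
import statistics


def fit_zscore(train_rows):
    """Per-column mean and sample SD (n - 1), estimated on training rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return means, sds


def apply_zscore(rows, means, sds):
    """Z = (X - mu) / sigma per column; a zero-SD column is mapped to 0.0."""
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in rows
    ]


def repeated_holdout(n_samples, n_repeats=10, test_frac=0.3, seed=0):
    """Yield (train_idx, test_idx) for independent random 70/30 splits."""
    rng = random.Random(seed)
    n_test = round(test_frac * n_samples)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield sorted(idx[n_test:]), sorted(idx[:n_test])


def summarize(scores):
    """Mean and sample SD of one metric across the repeats."""
    return statistics.mean(scores), statistics.stdev(scores)


# Usage with the study's sample size (219 patients, 10 repeats).
splits = list(repeated_holdout(219))
train_idx, test_idx = splits[0]
print(len(train_idx), len(test_idx))  # 153 66
```

Fitting the scaling parameters inside each split keeps test-set information out of preprocessing; random forest, being insensitive to monotone feature scaling, would skip the Z-score step, as the text describes.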
The evaluation indexes were Accuracy, Precision, F1-score, Sensitivity, Specificity, and AUC. AUC is the area under the ROC curve; it ranges from 0 to 1, with 0.5 corresponding to chance-level classification, and the higher the value, the better the classification. Sensitivity, also known as the true positive rate, measures how well a diagnostic method detects the disease; the higher the sensitivity, the lower the likelihood of a missed diagnosis. Specificity, also known as the true negative rate, measures how well the method rules out the disease; the higher the specificity, the lower the likelihood of a misdiagnosis. Accuracy is the proportion of correctly classified test instances among all test instances. Precision is the proportion of instances classified as positive that are truly positive. F1-score is the harmonic mean of Precision and Recall (Sensitivity) and evaluates the two jointly. The evaluation indexes are given by the following formulas:
$$\text{Precision} = \frac{TP}{TP + FP} \times 100\% \quad (2)$$
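The evaluation indexes described above can be computed directly from confusion-matrix counts. This is a minimal Python sketch with hypothetical names: it returns fractions rather than percentages, treats advanced NSCLC (label 1) as the positive class (an assumption), and computes AUC through its pairwise-ranking interpretation, which equals the area under the empirical ROC curve, instead of integrating a plotted curve.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts; the positive class is assumed to be advanced NSCLC (1)."""
    tp: int
    tn: int
    fp: int
    fn: int

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def sensitivity(self) -> float:  # recall, true positive rate
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:  # true negative rate
        return self.tn / (self.tn + self.fp)

    def f1(self) -> float:
        # Harmonic mean of precision and sensitivity.
        p, r = self.precision(), self.sensitivity()
        return 2 * p * r / (p + r)


def auc(pos_scores, neg_scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


c = Confusion(tp=40, tn=35, fp=15, fn=10)  # hypothetical counts
print(c.accuracy(), c.sensitivity(), c.specificity())  # 0.75 0.8 0.7
```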

Baseline data
The baseline data of NSCLC of the two groups with different stages were shown in Table 1.
In the non-advanced NSCLC group, males outnumbered females, whereas in the advanced NSCLC group females outnumbered males; the difference in sex distribution between the two groups was statistically significant. The advanced NSCLC group was also older than the non-advanced NSCLC group, and this age difference was statistically significant.

Statistical analysis of tongue feature
The statistical analysis results of tongue features at different clinical stages of NSCLC are shown in Table 2. To show the distribution of the tongue features with statistically significant differences, violin plots were drawn with GraphPad Prism 8.0, as shown in Fig. 4.
According to the statistical results, tongue features differed significantly between the non-advanced and advanced NSCLC groups in TB-B, TB-H, TC-H, TB-L, TB-a, TC-b, TB-Y, and TC-Cb. The texture and tongue coating indexes did not differ significantly between the two groups.

Correlation analysis of tongue feature and tumor marker
To further understand the correlation between TCM and Western medicine indexes in patients with NSCLC, and whether this correlation differs between stages, the study analyzed the correlation between tongue features and tumor markers in the non-advanced and advanced NSCLC groups. A total of 107 patients (66 in the non-advanced and 41 in the advanced NSCLC group) had complete tongue feature and tumor marker data. Index pairs with a correlation coefficient ≥ 0.3 were used to draw correlation heat maps. The statistical results and heat map for the non-advanced NSCLC group are shown in Table 3 and Fig. 5, respectively, and those for the advanced NSCLC group in Table 4 and Fig. 6.
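The selection of index pairs for the heat maps can be sketched as a threshold on the correlation coefficient. The text does not state whether Pearson or Spearman correlation was used or whether the 0.3 cutoff applies to the absolute value, so this Python sketch assumes Pearson's r with |r| ≥ 0.3 and omits the significance testing reported in Tables 3 and 4; the function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def strong_pairs(tongue, markers, threshold=0.3):
    """(tongue index, marker, r) for every pair with |r| >= threshold, strongest first."""
    pairs = []
    for t_name, t_vals in tongue.items():
        for m_name, m_vals in markers.items():
            r = pearson_r(t_vals, m_vals)
            if abs(r) >= threshold:
                pairs.append((t_name, m_name, r))
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)


# Hypothetical values for five patients.
tongue = {"TB-H": [1, 2, 3, 4, 5], "TC-b": [5, 3, 4, 1, 2], "TB-L": [2, 1, 2, 1, 2]}
markers = {"CA125": [2, 4, 6, 8, 10]}
print(strong_pairs(tongue, markers))
# [('TB-H', 'CA125', 1.0), ('TC-b', 'CA125', -0.8)]
```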

Classification model of NSCLC with different clinical stages
Six machine learning classifiers (decision tree, logistic regression, SVM, random forest, naive Bayes, and neural network) were used to build models classifying non-advanced versus advanced NSCLC based on tongue feature, tumor marker, and baseline data. For each classifier, each data set was resampled 10 times, and the "Mean (Standard Deviations)" of each evaluation index was used to evaluate model performance. The classification results of the models are shown in Table 5.
ROC curves of the models based on the six data sets were shown in Fig. 7.
Variable importance was ranked by Gini scores. For the random forest models, the 15 most important variables are shown in Fig. 8.
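Gini importance is built from the impurity decrease produced by individual splits. In random forest implementations such as R's randomForest (reported there as MeanDecreaseGini), these decreases, weighted by node size, are accumulated over every split on a given variable and averaged over trees. The text does not state which implementation or normalization was used, so this Python sketch with hypothetical names only illustrates the quantity being ranked and the top-15 selection.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Weighted impurity decrease produced by splitting `parent` into two children."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)


def top_k(importance, k=15):
    """Variable names sorted by importance, highest first (ties keep input order)."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


# A perfect split of a balanced node removes all impurity.
print(gini_decrease([0, 0, 1, 1], [0, 0], [1, 1]))        # 0.5
print(top_k({"TB-H": 0.8, "CEA": 1.2, "age": 0.4}, k=2))  # ['CEA', 'TB-H']
```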
The neural network model based on the "tongue feature and tumor marker and baseline" data set had the best classification performance, and its confusion matrix is shown in Table 6. The results showed that the classifiers differed in effectiveness across data sets. Among the machine learning methods tested, SVM, decision tree, and logistic regression performed poorly in classifying the stages of NSCLC. The decision tree performed best on the tumor marker and tongue feature data set, with an accuracy of 0.658 ± 0.072 and an AUC of 0.658 ± 0.104. SVM performed best on the baseline and tongue feature and tumor marker data sets, with an accuracy of 0.736 ± 0.074 and an AUC of 0.655 ± 0.056. Logistic regression performed best on the baseline data set, with an accuracy of 0.627 ± 0.054 and an AUC of 0.667 ± 0.065. Neural network, random forest, and naive Bayes classified the tongue feature, tumor marker, and baseline data set better, with accuracies of 0.767 ± 0.081, 0.718 ± 0.062, and 0.688 ± 0.070 and AUCs of 0.793 ± 0.086, 0.779 ± 0.075, and 0.771 ± 0.072, respectively.

Analysis of tongue features in different stages of NSCLC
Tongue features differed significantly between the non-advanced and advanced NSCLC groups. The differing indexes were mainly color space indexes: TB-B, TB-H, TC-H, TB-L, TB-a, TC-b, TB-Y, and TC-Cb. The differences between the two groups lay mainly in the intensity and hue of the tongue body, the hue of the tongue coating, and the color of the tongue body and coating. TB-B, TB-H, TB-L, TB-Y, TC-H, and TC-b were higher in the non-advanced NSCLC group than in the advanced NSCLC group, indicating that the tongue body of the non-advanced group was brighter and its tongue coating more yellow. TB-a and TC-Cb were higher in the advanced NSCLC group than in the non-advanced NSCLC group, indicating that the tongue body of the advanced group was more reddish-purple or cyanotic. The texture and tongue coating indexes did not differ significantly between the two groups, indicating that these indexes could not distinguish the stages of NSCLC.

Correlation analysis of tongue feature and tumor marker in different stages of NSCLC
Tongue features are TCM data, whereas tumor markers are Western medicine data. Because TCM and Western medicine differ in concepts and methods, the relationship between them has not been systematically established; at its core, this relationship concerns the mechanisms of disease and syndrome. The correlation analysis of tongue features and tumor markers in this study may help build a bridge between TCM and Western medicine, deepen understanding of the internal mechanisms of disease and syndrome, and improve the accuracy of disease and syndrome diagnosis.
According to these findings, TCM and Western medicine indexes were linked in patients at different stages of NSCLC. In the advanced NSCLC group, the number of index pairs with statistically significant correlations between tongue features and tumor markers was significantly higher than in the non-advanced NSCLC group, and the correlations were stronger.
Some studies have linked CA125, CA15-3, CA19-9, CA72-4, and CYFRA21-1 to metastasis of lung adenocarcinoma [24]. However, no study has confirmed that CA125 can serve as a prognostic marker, and only a few studies have discussed its prognostic value in advanced cancer [25], whereas other studies have shown that NSE is an important prognostic factor in advanced, locally metastatic NSCLC [18,26]. In this study, CA125 was significantly correlated with TB-H, TC-H, TC-b, and TC-Cb in the non-advanced NSCLC group, CA72-4 was significantly correlated with TB-H, TC-H, and TC-b in the advanced NSCLC group, and CA72-4 was significantly correlated with TB-H, TC-H, TC-b, and TC-Cb in the advanced NSCLC group. CA125 was also significantly correlated with TB-G and TB-Y, indicating that CA125 and CA72-4 were related to tongue brightness and yellow tongue coating in both groups of NSCLC patients. The difference was that CA125 was correlated with TC-Cb in the non-advanced NSCLC group, whereas CA72-4 was correlated with TC-Cb in the advanced NSCLC group; TC-Cb is a typical characteristic index of the purple tongue. Furthermore, in the advanced NSCLC group, NSE was significantly correlated with TB-a, TB-L, and TB-Y, indicating that NSE was associated with tongue brightness and redness in patients with advanced NSCLC.

Analysis of modeling in different stages of NSCLC
Modernization research on diagnostic technology and artificial intelligence has greatly promoted objective, standardized, and intelligent research in TCM. By learning continuously from large data sets, machine learning and data mining methods can provide better models for clinical diagnosis, efficacy evaluation, and prediction, as well as new methodological support for disease and syndrome research. Integrating TCM disease and syndrome research with artificial intelligence can effectively promote the development of intelligent TCM clinical decision-making and efficacy evaluation models, with significant practical implications and promising application prospects [27]. This study used six classifiers (decision tree, logistic regression, SVM, random forest, naive Bayes, and neural network) with the tongue feature, tumor marker, tongue feature and tumor marker, and tongue feature and tumor marker and baseline data sets to build classification models for different stages of NSCLC. Classifiers based on tongue features alone or tumor markers alone gave poor classification results, with high rates of missed diagnosis. Performance improved for most models when tongue feature, tumor marker, and baseline data were combined, implying that tongue features or tumor markers alone may be insufficient to classify NSCLC stages, or may be limited by the sample size; multidimensional data should therefore be combined and analyzed comprehensively to obtain better classification when diagnosing NSCLC at different stages. The neural network classifier based on tongue feature, tumor marker, and baseline data performed best in predicting NSCLC stage, suggesting that neural network models may be given priority in differential diagnosis, pending validation in larger samples.

Conclusions
There were statistically significant differences in tongue features between NSCLC patients at different clinical stages, and in advanced NSCLC patients the correlation between tongue features and tumor markers was stronger. Because of the limited information they carry, single data sources (baseline, tongue features, or tumor markers alone) could not identify the different stages of NSCLC. With the exception of logistic regression, the machine learning methods built on the tumor marker and baseline data sets could effectively improve the differential diagnosis of NSCLC stages when tongue image data were added. However, further verification in future studies with larger samples is still needed.
Note: "0" represents non-advanced NSCLC, and "1" represents advanced NSCLC.

$$F_1 = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \quad (5)$$
TP (True Positive) is a positive sample predicted as positive by the model; TN (True Negative) is a negative sample predicted as negative; FP (False Positive) is a negative sample predicted as positive; FN (False Negative) is a positive sample predicted as negative.

Fig. 4
Fig. 5 Heat map of correlation between tongue feature and tumor marker in the non-advanced NSCLC group

Fig. 6 Heat map of correlation between tongue feature and tumor marker in the advanced NSCLC group

Fig. 7
Fig. 8 Variable importance based on Random Forest

Table 1
Baseline data table

Table 3
Correlation analysis between tongue feature and tumor marker in the non-advanced NSCLC group *P < 0.05, **P < 0.01

Table 4
Correlation analysis between tongue feature and tumor marker in the advanced NSCLC group *P < 0.05, **P < 0.01

Table 5
Classification results of each model based on different data sets [Mean (Standard Deviations)]

Table 6
Confusion matrix of the neural network model