A systematic review of the applications of Expert Systems (ES) and machine learning (ML) in clinical urology

Salem, Hesham; Soria, Daniele; Lund, Jonathan N.; Awwad, Amir

doi:10.1186/s12911-021-01585-9

BMC Medical Informatics and Decision Making

Table 9 Qualitative assessment of urological Expert Systems


Article	Model	Validation methods	Credibility	Evaluation	Validation	Verification	Strength and bias
[27]	RBR	Patients' evaluation	No	Yes	Yes	No	Only qualitative evaluation
[18]	RBR	Blinded comparison against 4 experts with independent expert rating, and a 3-centre pilot RCT	Yes	Yes	Yes	No	Consideration of system evaluation with real-time testing, but small number
[21]	FRB	Improve practitioner accuracy	No	No	No	No	Insufficient info on development and validation
[15]	RBR	RCT; reliability and validity by experts’ reviews	Yes	Yes	Yes	No	Small number in the study and short duration of follow-up
[95]	ANN	ROC, Sp, Se	No	No	Yes	No	Small number for validation
[63]	FSS	ROC, Sp, Se	No	No	Yes	No	2 methods for validation, compared to experts and data
[143]	ANN	Compare to histology results	No	No	Yes	No	No comparison to human to demonstrate usability, no p value or CI
[103]	FNM	ROC, LR, RMS	No	No	Yes	No	p value calculated to compare all models
[103]	ANN	ROC, LR, RMS	No	No	Yes	No	p value calculated to compare all models, the effect of combining HK p53 with other variables
[102]	ANN	ROC, Sp, Se	No	No	Yes	No	No p value
[76]	ANN	Correlation coefficient	No	No	Yes	No	Correlation coefficient between expert and system; kappa would be more accurate
[40]	FRB	Not published	No	No	No	No	Not validated
[68]	ANN	AUC ROC	No	No	Yes	No	p value calculated vs LR
[19]	RBR	Feedback from patients with no control group	No	Yes	No	No	No validation, but user (patient) evaluation
[29]	FRB	Comparison to experts and non-experts	No	No	Yes	No	Expert as gold standard
[25]	RBR	PPV 62%, NPV 100%, Se 100%, Sp 33%	No	No	Yes	No	Small number, low specificity
[55]	ANN	ROC AUC, then comparison with LR, kappa statistics	No	No	Yes	No	Multimodal validation
[99]	ANN	ROC, Sp, Se	No	No	Yes	No	No long-term follow-up
[43]	ANN	ROC (0.74 and 0.86)	No	No	Yes	No	TRUS finding from expert panel, human as gold standard
[105]	FNM	ROC, LR	No	No	Yes	No	p value calculated to compare all models
[105]	ANN	Kaplan-Meier for survival	No	No	Yes	No	p value calculated for comparison of ANN and FNM
[145]	kNN	Comparison to other classifiers and ROC	No	Yes	Yes	No	Evaluated the usability of the product and was found to have less than significant effect
[129]	ANN	ROC Se, Sp	No	No	Yes	No	Sensitivity analysis of input variables
[22]	ANN	ROC 0.7, accuracy 79%	No	No	Yes	No	Compare to experts without accounting for human error
[85]	FRB	ROC Se, Sp	No	No	Yes	No	No user evaluation
[24]	FRB	Ac 0.76, Se 0.79, Sp 0.75	No	No	Yes	No	Expert as gold standard
[109]	ANN	ROC Compare to LR	No	No	Yes	No	CI calculated
[12]	FRB	Ac 0.93, Se 0.97, Sp 0.99	No	No	Yes	No	Expert as gold standard
[110]	ANN	Prediction error percent	No	No	Yes	No	Experimental results
[48]	SVM	ROC AUC	No	No	Yes	No	P value calculated to compare all models
[146]	ANN	Overlap measure (segmented by experts)	No	No	Yes	No	Expert as gold standard
[23]	ANN	Ac 0.84, Se 0.93, Sp 0.33	No	No	Yes	No	Experts verified data; no account taken of human error
[30]	FNM	Accuracy 86.8%	No	No	Yes	No	Guidelines as gold standard
[20]	RBR	Evaluation by experts, 95 retrospective	No	No	Yes	No	Expert as gold standard, qualitative evaluation
[26]	HYB FUZZY ONT	Kappa vs experts, k = 0.89	No	No	Yes	No	Kappa limitation prospective, randomisation,
[16]	RBR	Se 0.95, Sp 0.72, Bayesian analysis S&S, usability of system by Likert scale (Cronbach’s alpha 0.9)	Yes	Yes	Yes	No	Full system evaluation but nurse as gold standard, no attempts to eliminate error
[91]	ANN	ROC AUC compare with Partin nomogram and LR	No	No	Yes	No	No correlation with user
[17]	FNM	Kappa vs experts, Se 0.95, Sp 0.92	No	No	Yes	No	Human expert as gold standard and no qualitative evaluation (weight of error)
[60]	ANN	Ac 60% (testing) 75% (training)	No	No	Yes	No	Compare to gold standard, Urodynamic
[117]	ANN	PPV 100%	No	No	Yes	No	No calculation of NPV and overall accuracy
[32]	FNM	Correlation coefficient = 0.99	No	No	Yes	No	Small number of cases for validation
[150]	FCM	OR 86.3%	No	No	Yes	No	Comparison with experts as gold standard rather than mapping to histology
[141]	ANN	ROC, Se 64.2%, Sp 59.6%, PPV 61.6%, NPV 62.2%, AUC 0.6852	No	No	Yes	No	Similar to urodynamic as research tool
[54]	FRB	None	No	No	Yes	No	No validation

The development of every system was qualitatively assessed against the common industrial steps in the development pathway described by O'Keefe and Benbasat. With the exception of system validation, the remaining steps of the cycle were deficient, with no explanation given. Validation varied in strength; for data-driven systems, the most common approach was the receiver operating characteristic (ROC) curve, used to estimate the area under the curve (AUC).
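The validation measures that recur in the table (Se, Sp, PPV, NPV, Ac, ROC AUC and kappa) can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed systems: the function names and the toy data are hypothetical, labels are assumed to be coded 0/1, and the AUC is computed with the pairwise (Mann-Whitney) formulation, which is only one of several equivalent ways to estimate it.

```python
from typing import Dict, Optional, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Optional[float]]:
    """Se, Sp, PPV, NPV and accuracy; None where the denominator is zero."""
    def ratio(num: int, den: int) -> Optional[float]:
        return num / den if den else None

    return {
        "Se": ratio(tp, tp + fn),    # sensitivity
        "Sp": ratio(tn, tn + fp),    # specificity
        "PPV": ratio(tp, tp + fp),   # positive predictive value
        "NPV": ratio(tn, tn + fn),   # negative predictive value
        "Ac": ratio(tp + tn, tp + fp + tn + fn),  # overall accuracy
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a
    random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Cohen's kappa: agreement between two raters corrected for chance."""
    a, b = list(rater_a), list(rater_b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in set(a) | set(b))
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data (not from the review): expert labels, system
# decisions, and the system's continuous output scores.
expert = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
system = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.5]

metrics = diagnostic_metrics(*confusion_counts(expert, system))
auc = roc_auc(expert, scores)
kappa = cohen_kappa(expert, system)
print(metrics, auc, kappa)
```

Reporting PPV alongside NPV and overall accuracy, and kappa rather than a raw correlation coefficient when the reference is a human expert, addresses several of the gaps noted in the "Strength and bias" column.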


ISSN: 1472-6947
