Table 2. Estimates of the raters’ average accuracy obtained with two different methods: comparison with the MRNet reference (which can be thought of as the raters’ “actual” accuracy) and comparison with the majority of the other raters. The difference between the two methods was not statistically significant at α = 0.05 (p = 0.68) according to a χ² test for proportions.

From: As if sand were stone. New concepts and metrics to probe the ground on which to build trustable AI

Estimation method                          Raters’ average accuracy (95% CI)
Actual accuracy (vs. MRNet reference)      0.81 [0.80, 0.82]
Accuracy with respect to the majority      0.87 [0.86, 0.88]