Bayesian updating and sequential testing: overcoming inferential limitations of screening tests

Background
Bayes' theorem confers inherent limitations on the accuracy of screening tests as a function of disease prevalence. Herein, we establish a mathematical model to determine whether sequential testing with a single test overcomes the aforementioned Bayesian limitations and thus improves the reliability of screening tests.

Methods
We use Bayes' theorem to derive the positive predictive value equation, and apply the Bayesian updating method to obtain the equation for the positive predictive value (PPV) following repeated testing. We likewise derive the equation which determines the number of iterations of a positive test needed to obtain a desired positive predictive value, represented graphically by the tablecloth function.

Results
For a given PPV (ρ) approaching k, the number of positive test iterations needed given a prevalence of disease (ϕ) is:

$$n_i = \lim_{\rho \to k} \left\lceil \frac{\ln\left[\frac{\rho(\phi-1)}{\phi(\rho-1)}\right]}{\ln\left[\frac{a}{1-b}\right]} \right\rceil \qquad (1)$$

where $n_i$ = number of testing iterations necessary to achieve ρ, the desired positive predictive value, ln = the natural logarithm, a = sensitivity, b = specificity, ϕ = disease prevalence/pre-test probability and k = constant.

Conclusions
Based on the aforementioned derivation, we provide reference tables for the number of test iterations needed to obtain a ρ(ϕ) of 50, 75, 95 and 99% as a function of various levels of sensitivity, specificity and disease prevalence/pre-test probability. Clinical validation of these concepts needs to be obtained prior to their widespread application.


Bayes' theorem
Bayes' theorem describes the probability of an event, based on prior knowledge of conditions that are related to the event [1]. As a principle, it follows directly from the axioms of conditional probability [2]. Mathematically, it expresses the conditional probability of an event A given the presence of an event or state B: this probability is equal to the probability of event B given event A, multiplied by the ratio of the marginal probabilities of event A to event B [2]. Simply stated, the equation is written as follows:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \qquad (2)$$

where A, B = events, P(A) and P(B) are the marginal (unconditional) probabilities of A and B, with P(B) ≠ 0, P(A|B) = probability of A given B is true and P(B|A) = probability of B given A is true.
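As a quick sanity check of Eq. (2), the sketch below uses a small, entirely hypothetical joint probability table for two binary events and confirms that the direct definition P(A|B) = P(A and B)/P(B) and the Bayes form P(B|A)P(A)/P(B) give the same answer. Exact fractions are used so the comparison is not blurred by rounding; the table values are illustrative only.

```python
from fractions import Fraction

# Hypothetical joint probabilities for two binary events (A, B); they sum to 1.
joint = {
    (True, True): Fraction(3, 100),    # P(A and B)
    (True, False): Fraction(7, 100),   # P(A and not B)
    (False, True): Fraction(12, 100),  # P(not A and B)
    (False, False): Fraction(78, 100), # P(not A and not B)
}

def marginal(event_index: int, value: bool) -> Fraction:
    """Sum the joint table over the other event to obtain P(A) or P(B)."""
    return sum(p for key, p in joint.items() if key[event_index] == value)

p_a = marginal(0, True)                  # P(A)
p_b = marginal(1, True)                  # P(B)
p_b_given_a = joint[(True, True)] / p_a  # P(B|A)

# Definition of conditional probability: P(A|B) = P(A and B) / P(B)
direct = joint[(True, True)] / p_b
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
bayes = p_b_given_a * p_a / p_b

print(direct, bayes)  # 1/5 1/5
```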
Let T+ and T− denote a positive and a negative test, respectively, and D+ and D− the presence and absence of disease. We can then use Bayes' theorem to calculate the positive predictive value (PPV) of a screening test by asking the following: given a positive screening test result, what is the probability that the individual does in fact have the disease in question? In other words, what is the probability that a positive test is a true positive [5]?

$$P(D^+ \mid T^+) = \frac{P(T^+ \mid D^+)\,P(D^+)}{P(T^+)} \qquad (6)$$
Since the denominator in Eq. (6) represents the overall probability of a positive test, regardless of disease status, it follows that it equals the sum of the probabilities of a true positive and a false positive.
Otherwise stated:

$$P(T^+) = P(T^+ \mid D^+)\,P(D^+) + P(T^+ \mid D^-)\,P(D^-) \qquad (7)$$

Furthermore, given that (1) the probability of a positive test in an individual with the disease is the test's sensitivity, (2) the probability of being disease-free is the complement of the prevalence, and (3) the false positive rate is the complement of the specificity (true negative rate), Bayes' theorem provides a formal way to obtain the PPV, ρ(φ), as a function of the prevalence φ, as follows [6]:

$$\rho(\phi) = \frac{a\phi}{a\phi + (1-b)(1-\phi)} \qquad (8)$$

where ρ(φ) = PPV, a = sensitivity, b = specificity and φ = prevalence.
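The PPV formula ρ(φ) = aφ / [aφ + (1 − b)(1 − φ)] translates directly into code. The following is a minimal Python sketch; the function name `ppv` and the example parameter values are illustrative choices, not part of the original work.

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value rho(phi) obtained from Bayes' theorem.

    true positives  = a * phi
    false positives = (1 - b) * (1 - phi)
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Example: a test with 90% sensitivity and 90% specificity at a prevalence of 1%
print(round(ppv(0.01, 0.90, 0.90), 4))  # 0.0833
```

Even a test that is right 90% of the time in both groups yields a PPV of only about 8% at this prevalence, which is the limitation the following sections address.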

Sequential testing
Based on the aforementioned considerations, a problem arises. Since the vast majority of medical conditions and disorders amenable to screening have a low prevalence in the general population, we deduce that a significant proportion of positive screening tests conducted in modern practice are false positives, which can bring about significant adverse administrative, social, health and psychological consequences [7]. This inherent feature of screening raises the question: can anything be done to reduce the number of incorrect diagnoses that arise from this limitation [8]? Intuitively, as per Eq. (8), the development of novel screening tests with better parameters would reduce the influence of prevalence in the equation [9]. But such an endeavour is costly and most often unattainable in the short term. Given human error, variations in patient status/characteristics, sampling error and technological limitations, the most intuitive method to ensure that a correct diagnosis is made is sequential, or repeated, testing [10]. Revising the probability of disease as each new result becomes available is known as Bayesian updating [11]. While this general term applies whenever new information is added to a system that was previously analysed, it also applies when the same test is run serially to improve its detection rate [12].
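To make the claim about false positives concrete, the sketch below computes the share of all positive results that are false positives, which is simply 1 − PPV, for a hypothetical test with 95% sensitivity and 95% specificity across a range of prevalences. The parameter values are assumptions chosen for illustration.

```python
def false_positive_share(prevalence, sensitivity, specificity):
    """Fraction of all positive results that are false positives, i.e. 1 - PPV."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return false_pos / (true_pos + false_pos)

# Hypothetical test with 95% sensitivity and 95% specificity
for prevalence in (0.001, 0.01, 0.05, 0.20):
    share = false_positive_share(prevalence, 0.95, 0.95)
    print(f"prevalence {prevalence:>5.1%}: {share:.1%} of positives are false")
```

At a prevalence of 5% exactly half of the positives are false with these parameters, and at lower prevalences false positives dominate.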

Conditional probabilities
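Conditional probabilities express the likelihood of an occurrence given that another related event has already taken place [13]. The probability assigned to the condition before this information is taken into account is termed the prior probability or, in diagnostic settings, the pre-test probability. When we account for those prior probabilities and analyse a screening test in that context, we obtain posterior probabilities. In general, with sequential Bayesian estimation, one can use the previous posterior as the current prior probability [14]. As such, in the case of sequential testing, where D represents the presence of disease, ¬D its absence, T represents one initial positive test and TT represents two consecutive positive tests, Bayes' theorem takes on the form:

$$P(D \mid TT) = \frac{P(T \mid D)\,P(D \mid T)}{P(T \mid D)\,P(D \mid T) + P(T \mid \neg D)\,P(\neg D \mid T)}$$

where the posterior probability after the first positive test, P(D|T), serves as the prior for the second test, and the two tests are assumed to be conditionally independent given disease status.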
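The "previous posterior becomes the current prior" rule can be sketched as a simple loop. This is a minimal illustration assuming conditionally independent tests; the function names and the example values (pre-test probability 1%, sensitivity 90%, specificity 95%) are hypothetical.

```python
def update(prior, sensitivity, specificity):
    """One Bayesian update after a positive result: returns the posterior P(D | T)."""
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)

def sequential_posteriors(prior, sensitivity, specificity, n_positive):
    """Posterior after each of n consecutive positive results.

    Each posterior is fed back in as the prior for the next test.
    """
    history = []
    p = prior
    for _ in range(n_positive):
        p = update(p, sensitivity, specificity)
        history.append(p)
    return history

# Illustrative values: pre-test probability 1%, sensitivity 90%, specificity 95%
for i, p in enumerate(sequential_posteriors(0.01, 0.90, 0.95, 3), start=1):
    print(f"after {i} positive test(s): P(D) = {p:.3f}")
```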

General derivation
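The expression of Eq. (2) in generalized terms is the following:

$$P(D \mid T) = \frac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid \neg D)\,P(\neg D)}$$

where:
• P(D) is the prior probability, or the initial degree of belief in D;
• P(¬D) is the corresponding initial degree of belief in 'not-D', where P(¬D) = 1 − P(D);
• P(T|D) is the conditional probability, or likelihood, of T given that proposition D is true;
• P(T|¬D) is the likelihood of T given that proposition D is false.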
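Applying this update repeatedly, with each posterior serving as the prior for the next positive result, and assuming the tests are conditionally independent given disease status, the probability of disease after n consecutive positive tests (denoted $T^n$) is:

$$P(D \mid T^n) = \frac{P(T \mid D)^n\,P(D)}{P(T \mid D)^n\,P(D) + P(T \mid \neg D)^n\,P(\neg D)}$$

Because P(T|D) > P(T|¬D) for an informative test, this probability increases towards 1 as n → ∞. In terms of screening parameters, with P(T|D) = a, P(T|¬D) = 1 − b and P(D) = φ, the above equation therefore becomes:

$$\rho(\phi) = \frac{a^n\phi}{a^n\phi + (1-b)^n(1-\phi)} \qquad (14)$$

where n is the number of test iterations.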
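A short numerical check that repeating the single-test update n times gives the same result as the closed form ρ = aⁿφ / [aⁿφ + (1 − b)ⁿ(1 − φ)]. This is a sketch assuming conditionally independent tests; the parameter values (prevalence 2%, sensitivity 85%, specificity 90%) are hypothetical.

```python
import math

def ppv_after_n(prevalence, sensitivity, specificity, n):
    """Closed form after n consecutive positives: a^n*phi / (a^n*phi + (1-b)^n*(1-phi))."""
    numerator = sensitivity ** n * prevalence
    return numerator / (numerator + (1.0 - specificity) ** n * (1.0 - prevalence))

def ppv_iterative(prevalence, sensitivity, specificity, n):
    """Apply the single-test Bayes update n times, feeding each posterior back in as the prior."""
    p = prevalence
    for _ in range(n):
        true_pos = sensitivity * p
        p = true_pos / (true_pos + (1.0 - specificity) * (1.0 - p))
    return p

# Illustrative parameters: prevalence 2%, sensitivity 85%, specificity 90%
for n in range(1, 6):
    closed = ppv_after_n(0.02, 0.85, 0.90, n)
    iterative = ppv_iterative(0.02, 0.85, 0.90, n)
    assert math.isclose(closed, iterative)  # the two routes agree up to rounding
    print(n, round(closed, 4))
```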
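To determine the number of tests needed to obtain a desired predictive value, we first isolate n by rearranging Eq. (14) as follows:

$$\rho(\phi)\left[a^n\phi + (1-b)^n(1-\phi)\right] = a^n\phi$$

Re-arranging the terms:

$$\rho(\phi)(1-b)^n(1-\phi) = a^n\phi - \rho(\phi)\,a^n\phi$$

Factoring out $a^n$, the sensitivity raised to the n-th power:

$$\rho(\phi)(1-b)^n(1-\phi) = a^n\phi\left[1-\rho(\phi)\right]$$

By the fraction rule of exponents:

$$\left(\frac{a}{1-b}\right)^n = \frac{\rho(\phi)(1-\phi)}{\phi\left[1-\rho(\phi)\right]}$$

Applying the natural logarithm (ln) to both sides:

$$\ln\left[\left(\frac{a}{1-b}\right)^n\right] = \ln\left[\frac{\rho(\phi)(1-\phi)}{\phi\left[1-\rho(\phi)\right]}\right]$$

Via the power rule, we obtain:

$$n\,\ln\left[\frac{a}{1-b}\right] = \ln\left[\frac{\rho(\phi)(1-\phi)}{\phi\left[1-\rho(\phi)\right]}\right]$$

From the above relationship, we can isolate n:

$$n = \frac{\ln\left[\frac{\rho(\phi)(1-\phi)}{\phi\left[1-\rho(\phi)\right]}\right]}{\ln\left[\frac{a}{1-b}\right]}$$

Finally, multiplying the numerator and denominator of the inner fraction by −1:

$$n = \frac{\ln\left[\frac{\rho(\phi)(\phi-1)}{\phi\left[\rho(\phi)-1\right]}\right]}{\ln\left[\frac{a}{1-b}\right]}$$

From this expression we can calculate the limit as ρ(φ) goes to 1, the ultimate predictive value:

$$\lim_{\rho(\phi)\to 1} n = \lim_{\rho(\phi)\to 1} \frac{\ln\left[\frac{\rho(\phi)(\phi-1)}{\phi\left[\rho(\phi)-1\right]}\right]}{\ln\left[\frac{a}{1-b}\right]}$$

However, this limit does not exist as a finite number: as ρ(φ) → 1, the term φ[ρ(φ) − 1] tends to 0, so the argument of the logarithm, which approaches (φ − 1)/0, grows without bound and n diverges. In clinical terms, this means that, except in the special case where the disease prevalence φ is 1, no finite number of repeated tests with a specificity below 100% can yield a perfect positive predictive value.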
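The derivation can be checked numerically: solving for n and substituting it back into the forward equation should recover the target PPV, and pushing the target towards 1 should make n grow without settling. The sketch below assumes illustrative values (prevalence 1%, sensitivity 90%, specificity 95%) and treats n as a real number, before any rounding.

```python
import math

def iterations_needed(rho, phi, a, b):
    """Real-valued n from n = ln[rho(phi-1) / (phi(rho-1))] / ln[a/(1-b)]."""
    return math.log(rho * (phi - 1) / (phi * (rho - 1))) / math.log(a / (1 - b))

def ppv_after_n(phi, a, b, n):
    """Forward equation: PPV after n consecutive positive results (n may be non-integer here)."""
    num = a ** n * phi
    return num / (num + (1 - b) ** n * (1 - phi))

phi, a, b = 0.01, 0.90, 0.95  # illustrative prevalence, sensitivity, specificity

# Back-substitution: plugging the solved n into the forward equation recovers rho = 0.95.
n = iterations_needed(0.95, phi, a, b)
print(round(n, 3), round(ppv_after_n(phi, a, b, n), 6))

# As rho approaches 1, the required n keeps increasing instead of converging.
for rho in (0.99, 0.999, 0.9999, 0.99999):
    print(rho, round(iterations_needed(rho, phi, a, b), 2))
```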
To overcome this limitation, we render the generalized form of the above equation, and we denote ρ(φ) as ρ to obtain:

$$n_i = \lim_{\rho \to k} \frac{\ln\left[\frac{\rho(\phi-1)}{\phi(\rho-1)}\right]}{\ln\left[\frac{a}{1-b}\right]} \qquad (24)$$

where ρ = desired positive predictive value to achieve, $n_i$ = number of testing iterations necessary, a = sensitivity, b = specificity, φ = disease prevalence and k = constant (k < 1).
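For any target k strictly below 1, the expression for n is continuous in ρ on 0 < ρ < 1, so the limit as ρ → k is simply the value at k. That is why replacing 1 with a constant k < 1 yields a finite number of iterations. The sketch below shows this numerically; the values (prevalence 5%, sensitivity 80%, specificity 90%, k = 0.95) are hypothetical.

```python
import math

def n_real(rho, phi, a, b):
    """Real-valued n for a target PPV rho, prevalence phi, sensitivity a, specificity b."""
    return math.log(rho * (phi - 1) / (phi * (rho - 1))) / math.log(a / (1 - b))

phi, a, b, k = 0.05, 0.80, 0.90, 0.95  # illustrative values; k < 1
at_k = n_real(k, phi, a, b)

# Approach k from below: the values settle on n_real(k, ...), because the
# expression is continuous in rho for 0 < rho < 1.
approach = [n_real(k - 10.0 ** -j, phi, a, b) for j in range(2, 8)]
print(round(at_k, 4), [round(x, 4) for x in approach])
```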

Positive likelihood ratio: LR+
From Eq. (24) we observe that the number of serial tests n needed to attain a given PPV is inversely proportional to $\ln\left[\frac{a}{1-b}\right]$. The fraction in brackets, a/(1−b), is known as the positive likelihood ratio (LR+) [15]. A likelihood ratio (LR) for a dichotomous test is defined as the likelihood of a test result in patients with the disease divided by the likelihood of the same result in patients without the disease. Otherwise stated, the LR+ is the factor by which the odds of having the disease change after a positive test [16]. For example, an LR+ close to 1 means that the test result does not appreciably change the likelihood of the disease or outcome of interest. The further the LR+ exceeds 1, the more likely the disease or outcome [15]. It thus follows that the greater the LR+ of a test, the fewer sequential tests are needed to achieve a particular PPV.

Properties of sequential testing
Since the natural logarithm is continuous and increasing throughout its domain (0, ∞), it follows that as $\ln\left[\frac{a}{1-b}\right]$ increases (for tests with LR+ > 1), the number of test iterations n needed to achieve a desired positive predictive value decreases, as per Eq. (24). Tables 1, 2, 3 and 4 provide reference values of n as a function of the prevalence φ and the sensitivity and specificity for a ρ of 99, 95, 75 and 50%, respectively. Figure 2 provides a graphic representation of $n_i$, which, given its geometric shape, we define as the tablecloth function. This relationship holds provided that each of the identical sequential tests is positive, up to the $n_i$-th iteration at which the desired positive predictive value is reached.

For conditions whose potential consequences are severe but whose treatment is relatively innocuous, a lower threshold to initiate treatment might be acceptable. Conversely, a condition whose consequences are less severe but whose treatment may lead to significant morbidity might benefit from a higher degree of diagnostic certainty prior to initiating therapy or proceeding to an invasive diagnostic test.

Given the extremes of the domains of each predictive function as per Eqs. (8) and (11), and the fact that most conditions have a prevalence well below 20%, it follows that if a negative test result is obtained before the desired positive predictive value is reached, the individual is more likely to be disease-free, since the negative predictive value (NPV), σ(φ), greatly exceeds ρ(φ) at a low prevalence of disease (Fig. 1). In other words, the intersection between the NPV and the PPV, given by the following equation, hovers around 40-60% prevalence for tests whose sensitivity and specificity are both greater than 50% (clinically useful ones) and similar to each other; when sensitivity equals specificity, it lies at exactly 50%:

$$\frac{a\phi}{a\phi + (1-b)(1-\phi)} = \frac{b(1-\phi)}{b(1-\phi) + (1-a)\phi}$$

Below this point, NPV > PPV.
It is critical to bear in mind that testing might be done in a representative sample of a population to estimate the rate of asymptomatic carriage; in this case the prevalence is meaningful. But testing is generally done in subjects in whom a condition is suspected, either because they have a known exposure or because they have various levels of symptomatology. In such cases the population prevalence is irrelevant, and it would be more appropriate to refer to prior or pre-test probability instead.

Clinical implications of $n_i$
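From the formula in (24), we learn that the number of iterations is inversely proportional to the natural logarithm of the ratio of the sensitivity to the complement of the specificity, which is the LR+ [15].
However, because the denominator of this equation is the natural logarithm of a fraction, its sign depends on that fraction, and for certain values of sensitivity a and specificity b the ratio a/(1−b) is less than 1. Since the natural logarithm has the following range properties:

$$\ln x \begin{cases} < 0, & 0 < x < 1 \\ = 0, & x = 1 \\ > 0, & x > 1 \end{cases}$$

we deduce that for values of a and b such that

$$\frac{a}{1-b} < 1 \iff a + b < 1,$$

the denominator of the $n_i$ function will be negative and, provided the desired PPV ρ exceeds the prevalence φ (so that the numerator is positive), so will $n_i$. When a + b = 1, the denominator is zero and $n_i$ is undefined, since such a test does not change the odds of disease at all.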
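The sign behaviour can be demonstrated directly. In the sketch below, the two parameter pairs are hypothetical: a test with a + b < 1 (LR+ = 0.8) produces a negative n for a target PPV above the prevalence, and a test with a + b = 1 (LR+ = 1) makes the denominator ln(1) = 0, so Python raises `ZeroDivisionError`.

```python
import math

def n_real(rho, phi, a, b):
    """Real-valued n = ln[rho(1-phi) / (phi(1-rho))] / ln[a/(1-b)]."""
    return math.log(rho * (1 - phi) / (phi * (1 - rho))) / math.log(a / (1 - b))

# Hypothetical misleading test: a + b < 1, so LR+ = 0.4 / 0.5 = 0.8 < 1
a, b = 0.40, 0.50
print(a + b < 1, n_real(0.95, 0.10, a, b))  # True, followed by a negative number

# Uninformative test: a + b = 1, so LR+ = 1 and ln(LR+) = 0
try:
    n_real(0.95, 0.10, 0.5, 0.5)
except ZeroDivisionError:
    print("n is undefined when a + b = 1")
```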
Though it is unlikely that a test whose sensitivity and specificity add up to less than one would often be used clinically [17], this observation does lead to a fundamental understanding of the $n_i$ equation. What does it mean to need a negative number of tests to achieve a given PPV? Clinically it bears no meaning, since one would, by definition, need at least a single test to have a positive result. It thus follows that, for the above equation to be of clinical use, we need to take its ceiling function [18], where ⌈x⌉ is the unique integer satisfying ⌈x⌉ − 1 < x ≤ ⌈x⌉:

$$n_i = \lim_{\rho \to k} \left\lceil \frac{\ln\left[\frac{\rho(\phi-1)}{\phi(\rho-1)}\right]}{\ln\left[\frac{a}{1-b}\right]} \right\rceil$$

In practical terms, the ceiling function rounds a number up to the smallest integer greater than or equal to it [18]. For the case of screening tests, it implies that a whole rather than a fractional number of tests ought to be performed. In other words, the ceiling function in this context indicates that when, say, 2.8 tests are needed to achieve a desired PPV, one is better off doing 3 tests, given the discrete nature of testing. Doing 3 tests guarantees that one is at or above the desired threshold, whereas doing 2 tests would yield a lower PPV than desired.
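The rounding argument can be verified: with ⌈n⌉ positive tests the PPV reaches the target, while with one test fewer it does not. The sketch assumes LR+ > 1 and ρ > φ, uses hypothetical parameters (prevalence 1%, sensitivity 90%, specificity 95%, target 95%), and subtracts a small tolerance before rounding up so that floating-point noise does not push an exact integer to the next one; that tolerance is an implementation choice made here, not part of the equation.

```python
import math

def ppv_after_n(phi, a, b, n):
    """PPV after n consecutive positive results: a^n*phi / (a^n*phi + (1-b)^n*(1-phi))."""
    num = a ** n * phi
    return num / (num + (1 - b) ** n * (1 - phi))

def whole_tests_needed(rho, phi, a, b):
    """n_i = ceil(ln[rho(phi-1)/(phi(rho-1))] / ln[a/(1-b)]), assuming LR+ > 1 and rho > phi."""
    n = math.log(rho * (phi - 1) / (phi * (rho - 1))) / math.log(a / (1 - b))
    return math.ceil(n - 1e-9)  # tolerance absorbs floating-point noise at exact integers

rho, phi, a, b = 0.95, 0.01, 0.90, 0.95
n_i = whole_tests_needed(rho, phi, a, b)
print(n_i, ppv_after_n(phi, a, b, n_i) >= rho, ppv_after_n(phi, a, b, n_i - 1) >= rho)
# 3 True False
```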

Independence of serial testing
From the concepts described in this work, one might be tempted to conclude that simply repeating the same screening test multiple times increases confidence that a positive result is a true positive. Setting aside administrative and feasibility concerns, while such an interpretation is theoretically correct, the reality is more nuanced, as confounding factors might make the same result recur upon serial testing of the same patient. Indeed, repeating the same test under the same conditions, within a similar time-frame, and perhaps even by the same interpreter/provider may not constitute a truly independent observation [19]. Likewise, because the biological parameters being measured fluctuate smoothly over time, subsequent tests should be separated in time. Moreover, the final results are valid only if the probability of receiving subsequent tests is independent of the test results (i.e., we would continue testing those with negative tests in addition to those with positive tests). As such, the tables and notions herein described ought primarily to be used to contextualize the screening result and broaden the clinical judgement of the provider with regard to the reliability of the screening process. A more natural and reliable method to enhance the positive predictive value would be, when available, to use a different test with different parameters altogether after an initial positive result is obtained [19].
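To illustrate why non-independent repeats undermine the model, the sketch below uses a deliberately simple hypothetical model that does not come from this work: a fraction r of disease-free individuals test positive on every repeat (for instance because of a stable cross-reacting trait), while the remaining disease-free individuals have independent false positives at a rate q, chosen so that the single-test false-positive rate is still 1 − b. Diseased individuals are assumed to test positive independently with probability a. With one test the two models agree; with repeated tests the PPV under persistent false positives falls well short of the independent-testing value.

```python
def ppv_independent(phi, a, b, n):
    """PPV after n positives when all tests are conditionally independent."""
    num = a ** n * phi
    return num / (num + (1 - b) ** n * (1 - phi))

def ppv_persistent_fp(phi, a, b, n, r):
    """Hypothetical model: a fraction r of disease-free people are positive on every
    repeat; the rest have independent false positives at rate q, chosen so that the
    single-test false-positive rate remains 1 - b."""
    fpr = 1 - b
    if not 0 <= r <= fpr < 1:
        raise ValueError("need 0 <= r <= 1 - b < 1")
    q = (fpr - r) / (1 - r)
    p_all_positive_healthy = r + (1 - r) * q ** n
    num = a ** n * phi
    return num / (num + p_all_positive_healthy * (1 - phi))

phi, a, b = 0.01, 0.90, 0.95  # illustrative prevalence, sensitivity, specificity
for n in (1, 2, 3, 4):
    print(n, round(ppv_independent(phi, a, b, n), 3),
          round(ppv_persistent_fp(phi, a, b, n, r=0.02), 3))
```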

Strengths and limitations
The work presented here is largely theoretical in nature. As such, it carries several strengths, notably (1) the complete derivation of the resulting equation and tablecloth function from first principles, (2) the use of mathematical language that translates well into clinical scenarios (use of limits to ensure attainable PPV values and use of the ceiling function to obtain a whole number of tests), (3) the development of easily accessible reference tables for clinicians to use and (4) the novelty of the work presented, as, to the best of our knowledge, the idea of sequential testing and Bayesian updating with a single screening test has not previously been explored to a great extent [20]. Nevertheless, the present work also has limitations, notably (1) the lack of clinical data to validate the results, and (2) concerns regarding its clinical application given the potential difficulty of obtaining independent testing samples. Despite these limitations, the purpose of this manuscript is to raise awareness about the poor predictive value of many screening tests given the Bayesian limitations of the screening process, and to show how the predictive value can be enhanced with a single repeated test, even in theory. Such an equation can contextualize the predictive ability of a single test and may provide additional ways to communicate risk or likelihood of disease in the clinical counselling of patients.

Conclusion
In this manuscript, we describe a mathematical model to determine whether sequential testing with a single test overcomes the Bayesian limitations of screening and thus improves the reliability of screening tests. We show that for a desired positive predictive value ρ that approaches k, the number of positive test iterations $n_i$ is inversely proportional to the natural logarithm of the positive likelihood ratio (LR+). The clinical utility of this equation would best be observed in conditions with a low pre-test probability, where single tests are insufficient to achieve clinically significant predictive values, and likewise in clinical scenarios with a high pre-test probability, where confirmation of disease status is critical. When independent observations are difficult to obtain, serial testing with a different test will likewise enhance the positive predictive value [19] (Fig. 2).