 Research
 Open Access
Bayesian updating and sequential testing: overcoming inferential limitations of screening tests
BMC Medical Informatics and Decision Making volume 22, Article number: 6 (2022)
Abstract
Background
Bayes’ theorem confers inherent limitations on the accuracy of screening tests as a function of disease prevalence. Herein, we establish a mathematical model to determine whether sequential testing with a single test overcomes the aforementioned Bayesian limitations and thus improves the reliability of screening tests.
Methods
We use Bayes’ theorem to derive the positive predictive value (PPV) equation, and apply the Bayesian updating method to obtain the equation for the PPV following repeated testing. We likewise derive the equation that determines the number of iterations of a positive test needed to obtain a desired PPV, represented graphically by the tablecloth function.
Results
For a given PPV (\(\rho\)) approaching k, the number of positive test iterations needed given a prevalence of disease (\(\phi\)) is:
\(n_i =\lim _{\rho \rightarrow k}\left\lceil \frac{\ln\left[ \frac{\rho (\phi - 1)}{\phi (\rho - 1)}\right] }{\ln\left[ \frac{a}{1-b}\right] }\right\rceil \qquad \qquad (1)\)
where \(n_i\) = number of testing iterations necessary to achieve \(\rho\), the desired positive predictive value, ln = the natural logarithm, a = sensitivity, b = specificity, \(\phi\) = disease prevalence/pretest probability and k = constant.
Conclusions
Based on the aforementioned derivation, we provide reference tables for the number of test iterations needed to obtain a \(\rho (\phi )\) of 50, 75, 95 and 99% as a function of various levels of sensitivity, specificity and disease prevalence/pretest probability. Clinical validation of these concepts needs to be obtained prior to their widespread application.
Background
Bayes’ theorem
Bayes’ theorem describes the probability of an event based on prior knowledge of conditions that are related to the event [1]. As a principle, it follows directly from the axioms of conditional probability [2]. Mathematically, it expresses the conditional probability of an event A given the presence of an event or state B. Per Bayes’ theorem, this quantity equals the probability of event B given event A, multiplied by the ratio of the marginal probabilities of A and B [2]. Simply stated, the equation is written as follows:
\[ P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)} \qquad \qquad (2) \]
where A, B = events, P(A) and P(B) are the marginal (unconditional) probabilities of A and B, P(A|B) = probability of A given B is true and P(B|A) = probability of B given A is true.
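Bayes’ theorem can be checked numerically against the definition of conditional probability. The sketch below assumes a hypothetical joint distribution of two binary events (the numbers are illustrative, not data from this article), computes P(A|B) directly from the joint table, and compares it with the value given by Eq. (2).

```python
# Hypothetical joint distribution of two binary events A and B
# (illustrative numbers only, not data from this article).
joint = {
    (True, True): 0.12,    # A and B
    (True, False): 0.08,   # A and not B
    (False, True): 0.18,   # not A and B
    (False, False): 0.62,  # neither A nor B
}

# Marginal (unconditional) probabilities, summed from the joint table
p_a = sum(p for (is_a, _), p in joint.items() if is_a)   # P(A) = 0.20
p_b = sum(p for (_, is_b), p in joint.items() if is_b)   # P(B) = 0.30
p_a_and_b = joint[(True, True)]                          # P(A ∩ B) = 0.12

# Conditional probabilities from their definitions
p_a_given_b = p_a_and_b / p_b  # P(A|B)
p_b_given_a = p_a_and_b / p_a  # P(B|A)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b_bayes = p_b_given_a * p_a / p_b

print(f"P(A|B) from the definition: {p_a_given_b:.4f}")
print(f"P(A|B) from Bayes' theorem: {p_a_given_b_bayes:.4f}")
```

Both lines print the same value (0.4 with these numbers), since Bayes’ theorem is just the two factorizations of P(A ∩ B) set equal.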
Proof of Bayes’ theorem and its relationship to \(\rho (\phi )\)
Let us denote two events, A and B. The probability of both events occurring is denoted axiomatically as \(P(A\cap B)\), and it equals the probability of A, P(A), times the probability of B given that A has occurred, P(B|A) [3]:
\[ P(A\cap B) = P(A)\,P(B\mid A) \]
Likewise, since the joint event \(A\cap B\) does not depend on the order in which the two events are considered, the roles of A and B can be switched to obtain:
\[ P(A\cap B) = P(B)\,P(A\mid B) \]
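Equating the two expressions and dividing by P(B) (assuming P(B) > 0), we obtain the formal Bayes’ theorem as follows [4]:
\[ P(A\mid B) = \frac{P(A)\,P(B\mid A)}{P(B)} \]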
If we denote by \(T^{+/-}\) a positive or negative test result, and by \(D^{+/-}\) the presence (+) or absence (−) of disease, then we can use Bayes’ theorem to calculate the positive predictive value (PPV) of a screening test by asking the following: given a positive screening test result, what is the probability that the individual does in fact have the disease in question? In other words, what is the probability that a positive test is a true positive? [5].
\[ P(D^{+}\mid T^{+}) = \frac{P(T^{+}\mid D^{+})\,P(D^{+})}{P(T^{+})} \qquad \qquad (6) \]
Since the denominator in Eq. (6) represents the overall probability of a positive test regardless of disease status, it follows from the law of total probability that this term equals the sum of the probabilities of a true positive and a false positive.
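Otherwise stated:
\[ P(D^{+}\mid T^{+}) = \frac{P(T^{+}\mid D^{+})\,P(D^{+})}{P(T^{+}\mid D^{+})\,P(D^{+}) + P(T^{+}\mid D^{-})\,P(D^{-})} \]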
Furthermore, given that (1) the probability of a positive test in an individual with the disease is the test’s sensitivity, (2) the probability of being disease-free is the complement of the prevalence, and (3) the false positive rate is the complement of the specificity (true negative rate), Bayes’ theorem provides a formal way to obtain the PPV, \(\rho (\phi )\), as a function of the prevalence \(\phi\), as follows [6]:
\[ \rho(\phi) = \frac{a\phi}{a\phi + (1-b)(1-\phi)} \qquad \qquad (8) \]
where \(\rho (\phi )\) = PPV, a = sensitivity, b = specificity and \(\phi\) = prevalence.
We have thus shown that the PPV, \(\rho (\phi )\), is a function of prevalence, \(\phi\). As the prevalence increases, \(\rho (\phi )\) also increases, and vice versa [7]. From the above equation, we obtain:
\[ \lim_{\phi\to 0}\rho(\phi) = 0, \qquad \lim_{\phi\to 1}\rho(\phi) = 1 \]
These limits give the endpoints of the function \(\rho (\phi )\) over its domain, namely the points (0, 0) and (1, 1). Conversely, using the same derivation technique, the negative predictive value, \(\sigma (\phi )\), can be written as [6]:
\[ \sigma(\phi) = \frac{b(1-\phi)}{b(1-\phi) + (1-a)\phi} \qquad \qquad (11) \]
The endpoints of this function are the points (0, 1) and (1, 0): \(\sigma(\phi)\) approaches 1 as \(\phi \to 0\) and approaches 0 as \(\phi \to 1\).
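The two predictive-value functions are easy to evaluate directly. The sketch below assumes a hypothetical test with sensitivity and specificity of 95% (values chosen for illustration only) and tabulates \(\rho(\phi)\) and \(\sigma(\phi)\) across prevalence levels; the function names `ppv` and `npv` are illustrative. At a prevalence of 1%, this test's PPV is only about 16%, while its NPV is close to 1.

```python
def ppv(phi: float, a: float, b: float) -> float:
    """rho(phi) = a*phi / (a*phi + (1 - b)*(1 - phi))."""
    true_pos = a * phi               # P(T+ and D+)
    false_pos = (1 - b) * (1 - phi)  # P(T+ and D-)
    return true_pos / (true_pos + false_pos)


def npv(phi: float, a: float, b: float) -> float:
    """sigma(phi) = b*(1 - phi) / (b*(1 - phi) + (1 - a)*phi)."""
    true_neg = b * (1 - phi)   # P(T- and D-)
    false_neg = (1 - a) * phi  # P(T- and D+)
    return true_neg / (true_neg + false_neg)


a, b = 0.95, 0.95  # hypothetical sensitivity and specificity
for phi in (0.0, 0.001, 0.01, 0.1, 0.5, 1.0):
    print(f"phi = {phi:<5}  PPV = {ppv(phi, a, b):.3f}  NPV = {npv(phi, a, b):.3f}")
```

The first and last rows reproduce the endpoints stated above: (0, 0) and (1, 1) for the PPV, (0, 1) and (1, 0) for the NPV.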
Methods
Sequential testing
Based on the aforementioned considerations, a problem arises. Since the vast majority of medical conditions and disorders amenable to screening have a low prevalence in the general population, we deduce that a significant proportion of positive screening tests conducted in modern practice are false positives, which can bring about significant adverse administrative, social, health and psychological consequences [7]. This inherent feature of screening raises the question: can anything be done to reduce the number of incorrect diagnoses that arise from this limitation? [8]. Intuitively, as per Eq. (8), the development of novel screening tests with better parameters would reduce the influence of prevalence in the equation [9]. But such an endeavour is costly and most often unattainable in the short term. Given human error, variations in patient status and characteristics, sampling error and technological limitations, the most intuitive method to ensure that a correct diagnosis is made is sequential, or repeated, testing [10]. Incorporating each new result into the existing estimate of disease probability is known as Bayesian updating [11]. While this is a general term that applies whenever new information is added to a system that was previously analysed, it also applies when the same test is run serially to improve the reliability of a positive result [12].
Conditional probabilities
Conditional probabilities relate the likelihood of an occurrence given that another related event has already taken place [13]. The initial probability is termed the prior probability or, in certain circumstances, the pretest probability. When we account for those prior probabilities and analyse a screening test in that context, we obtain posterior probabilities. In general, with sequential Bayesian estimation, one can use the previous posterior as the current prior probability [14]. As such, in the case of sequential testing where D represents the presence of disease, T represents one initial positive test and TT represents two consecutive positive tests, Bayes’ theorem takes on the form:
\[ P(D\mid TT) = \frac{P(T\mid D)\,P(D\mid T)}{P(T\mid D)\,P(D\mid T) + P(T\mid \lnot D)\,P(\lnot D\mid T)} \]
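A minimal sketch of the two-step update, assuming hypothetical values a = b = 0.9 and a prevalence of 5%: the posterior after the first positive test is reused as the prior for the second, exactly as in the equation above. The function name `update_after_positive` is illustrative.

```python
def update_after_positive(prior: float, a: float, b: float) -> float:
    """Posterior P(D | T+) from prior P(D), with P(T|D) = a and P(T|not D) = 1 - b."""
    return a * prior / (a * prior + (1 - b) * (1 - prior))


a, b, phi = 0.9, 0.9, 0.05  # hypothetical sensitivity, specificity and prevalence

p_d_given_t = update_after_positive(phi, a, b)           # P(D|T): first positive test
p_d_given_tt = update_after_positive(p_d_given_t, a, b)  # P(D|TT): previous posterior as prior

print(f"P(D|T)  = {p_d_given_t:.4f}")   # about 0.32
print(f"P(D|TT) = {p_d_given_tt:.4f}")  # about 0.81
```

With these parameters, a second positive result raises the probability of disease from roughly one in three to roughly four in five.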
Results
General derivation
The expression of Eq. (2) in generalized terms is the following:
\[ P(D\mid T) = \frac{P(T\mid D)\,P(D)}{P(T\mid D)\,P(D) + P(T\mid \lnot D)\,P(\lnot D)} \]
where,

P(D) is the prior probability, or the initial degree of belief in D

P(\(\lnot D\)) is the corresponding initial degree of belief in ‘not D’, where \(P(\lnot D) = 1 - P(D)\)

\(P(T\mid D)\) is the conditional probability or likelihood of T given that proposition D is true.
Bayesian updating formulation
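Let \(T_1,T_2,\ldots,T_n\) denote n independently conducted tests, that is, tests whose results are conditionally independent given disease status.
Then, we can find our expression for \(P(D\mid T_1,T_2,\ldots,T_n)\):
\[ P(D\mid T_1,\ldots,T_n) = \frac{P(D)\prod_{i=1}^{n} P(T_i\mid D)}{P(D)\prod_{i=1}^{n} P(T_i\mid D) + P(\lnot D)\prod_{i=1}^{n} P(T_i\mid \lnot D)} \]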
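When the same test is repeated, every factor \(P(T_i\mid D)\) equals \(P(T\mid D)\) and every factor \(P(T_i\mid \lnot D)\) equals \(P(T\mid \lnot D)\), so after n iterations the products reduce to powers:
\[ P(D\mid T_1,\ldots,T_n) = \frac{P(T\mid D)^n\,P(D)}{P(T\mid D)^n\,P(D) + P(T\mid \lnot D)^n\,P(\lnot D)} \]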
In terms of screening parameters, the above equation therefore becomes:
\[ \rho(\phi) = \frac{a^n\phi}{a^n\phi + (1-b)^n(1-\phi)} \qquad \qquad (14) \]
where n is the number of test iterations.
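As a check that Eq. (14) agrees with updating one test at a time, the sketch below compares three computations under hypothetical parameters: the closed form, n repeated single-test updates, and the odds form (posterior odds = prior odds × \((a/(1-b))^n\)). The odds form is a standard algebraic rewriting, not an equation from this article, and all function names are illustrative.

```python
import math


def ppv_closed_form(phi: float, a: float, b: float, n: int) -> float:
    """Eq. (14): a^n*phi / (a^n*phi + (1 - b)^n*(1 - phi))."""
    num = a ** n * phi
    return num / (num + (1 - b) ** n * (1 - phi))


def ppv_iterated(phi: float, a: float, b: float, n: int) -> float:
    """Apply the single-test update n times, reusing each posterior as the next prior."""
    p = phi
    for _ in range(n):
        p = a * p / (a * p + (1 - b) * (1 - p))
    return p


def ppv_odds_form(phi: float, a: float, b: float, n: int) -> float:
    """Posterior odds = prior odds * (a / (1 - b))**n, converted back to a probability."""
    odds = phi / (1 - phi) * (a / (1 - b)) ** n
    return odds / (1 + odds)


a, b, phi = 0.9, 0.9, 0.05  # hypothetical parameters
for n in range(5):
    closed = ppv_closed_form(phi, a, b, n)
    same = all(
        math.isclose(closed, f(phi, a, b, n), rel_tol=1e-12)
        for f in (ppv_iterated, ppv_odds_form)
    )
    print(n, round(closed, 4), same)
```

The odds form makes the role of \(a/(1-b)\) explicit: each additional positive result multiplies the odds of disease by the same factor.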
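To determine the number of tests needed to obtain a desired predictive value, we first isolate n by rearranging Eq. (14) as follows:
\[ \rho(\phi)\left[a^n\phi + (1-b)^n(1-\phi)\right] = a^n\phi \]
Rearranging the terms:
\[ \rho(\phi)(1-b)^n(1-\phi) = a^n\phi - \rho(\phi)\,a^n\phi \]
Factoring out the sensitivity term \(a^n\):
\[ \rho(\phi)(1-b)^n(1-\phi) = a^n\phi\left[1-\rho(\phi)\right] \]
Dividing both sides by \((1-b)^n\phi\left[1-\rho(\phi)\right]\) and applying the fraction rule of exponents:
\[ \left(\frac{a}{1-b}\right)^n = \frac{a^n}{(1-b)^n} = \frac{\rho(\phi)(1-\phi)}{\phi\left[1-\rho(\phi)\right]} \]
Applying the natural logarithm (ln) to both sides:
\[ \ln\left[\left(\frac{a}{1-b}\right)^n\right] = \ln\left[\frac{\rho(\phi)(1-\phi)}{\phi\left[1-\rho(\phi)\right]}\right] \]
Via the power rule, we obtain:
\[ n\,\ln\left[\frac{a}{1-b}\right] = \ln\left[\frac{\rho(\phi)(1-\phi)}{\phi\left[1-\rho(\phi)\right]}\right] \]
From the above relationship, we can isolate n:
\[ n = \frac{\ln\left[\frac{\rho(\phi)(1-\phi)}{\phi\left[1-\rho(\phi)\right]}\right]}{\ln\left[\frac{a}{1-b}\right]} \]
Finally, multiplying the numerator and denominator of the logarithm’s argument by −1 gives the form used hereafter:
\[ n = \frac{\ln\left[\frac{\rho(\phi)(\phi-1)}{\phi\left[\rho(\phi)-1\right]}\right]}{\ln\left[\frac{a}{1-b}\right]} \]
From this expression we can calculate the limit as \(\rho (\phi )\) goes to 1, the ultimate predictive value:
\[ \lim_{\rho(\phi)\to 1} n = \lim_{\rho(\phi)\to 1}\frac{\ln\left[\frac{\rho(\phi)(\phi-1)}{\phi\left[\rho(\phi)-1\right]}\right]}{\ln\left[\frac{a}{1-b}\right]} \]
However, \(\lim _{\rho (\phi ) \rightarrow 1} n\) does not exist as a finite number: as \(\rho(\phi) \to 1\), the argument of the logarithm approaches \((\phi - 1)/0\), so the numerator grows without bound. In clinical terms, this means that in all but the special case where disease prevalence \(\phi\) is 1, no finite number of positive results from a test with imperfect specificity can yield a perfect positive predictive value.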
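To overcome this limitation, we render the generalized form of the above equation, replacing the unattainable target of 1 with a constant target k, and we denote \(\rho (\phi )\) as \(\rho\) to obtain:
\[ n_i = \lim_{\rho\to k}\frac{\ln\left[\frac{\rho(\phi-1)}{\phi(\rho-1)}\right]}{\ln\left[\frac{a}{1-b}\right]} \qquad \qquad (24) \]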
where \(\rho\) = desired positive predictive value to achieve, \(n_i\) = number of testing iterations necessary, a = sensitivity, b = specificity, \(\phi\) = disease prevalence and k = constant.
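The unrounded number of iterations in Eq. (24) can be computed directly for a finite target. The sketch below is a minimal illustration, assuming a hypothetical test with a = b = 0.95, a prevalence of 1% and a target PPV of 95%; the function names and the input checks are our own additions. As a sanity check, it substitutes the resulting (non-integer) n back into Eq. (14).

```python
import math


def iterations_needed(rho: float, phi: float, a: float, b: float) -> float:
    """Unrounded n: ln[rho(phi - 1) / (phi(rho - 1))] / ln[a / (1 - b)].

    Assumes 0 < phi < 1, 0 < rho < 1 (the target must be below 1), a > 0 and 0 < b < 1.
    """
    if not (0 < phi < 1 and 0 < rho < 1):
        raise ValueError("phi and rho must lie strictly between 0 and 1")
    if not (a > 0 and 0 < b < 1):
        raise ValueError("need a > 0 and 0 < b < 1")
    lr_pos = a / (1 - b)
    if math.isclose(lr_pos, 1.0):
        raise ValueError("a / (1 - b) = 1: repeated positives never change the PPV")
    return math.log(rho * (phi - 1) / (phi * (rho - 1))) / math.log(lr_pos)


def ppv_after_n(phi: float, a: float, b: float, n: float) -> float:
    """Eq. (14), evaluated for a possibly non-integer n (used here only as a check)."""
    num = a ** n * phi
    return num / (num + (1 - b) ** n * (1 - phi))


n = iterations_needed(rho=0.95, phi=0.01, a=0.95, b=0.95)
print(f"n = {n:.2f}")                                    # about 2.56
print(f"check: {ppv_after_n(0.01, 0.95, 0.95, n):.4f}")  # recovers the 0.95 target
```

Because the raw value is not an integer, it still has to be rounded up to a whole number of tests before it can guide practice.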
Discussion
Positive likelihood ratio: LR+
From Eq. (24), we observe that the number of serial tests n needed to attain a given PPV is inversely proportional to \(\ln\left[ \frac{a}{1-b}\right]\). The expression in brackets is known as the positive likelihood ratio (LR+) [15]. A likelihood ratio (LR) for a dichotomous test is defined as the likelihood of a test result in patients with the disease divided by the likelihood of that test result in patients without the disease. Otherwise stated, the LR+ gives the factor by which the odds of having a diagnosis change in patients with a positive test [16]. For example, an LR+ close to 1 means that the test result does not appreciably change the likelihood of the disease or outcome of interest. The more the LR+ exceeds 1, the more likely the disease or outcome [15]. It thus follows that the greater the LR+ of a test, the fewer sequential tests are needed to achieve a particular PPV.
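To illustrate the inverse relationship, the sketch below writes Eq. (24) in odds form, n = ln(target odds / prior odds) / ln(LR+). This is algebraically the same expression, since the sign changes in \(\rho(\phi-1)/(\phi(\rho-1))\) cancel. The three tests are hypothetical, chosen so that test B’s LR+ (25) is the square of test A’s (5); squaring the LR+ doubles its logarithm and therefore halves n.

```python
import math


def lr_positive(a: float, b: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    return a / (1 - b)


def n_from_lr(rho: float, phi: float, lr: float) -> float:
    """Unrounded number of positive results: ln(target odds / prior odds) / ln(LR+)."""
    prior_odds = phi / (1 - phi)
    target_odds = rho / (1 - rho)
    return math.log(target_odds / prior_odds) / math.log(lr)


# Hypothetical tests as (sensitivity, specificity); illustrative values only.
tests = {"test A": (0.80, 0.84), "test B": (0.90, 0.964), "test C": (0.99, 0.99)}
rho, phi = 0.95, 0.01  # hypothetical target PPV and prevalence

for name, (a, b) in tests.items():
    lr = lr_positive(a, b)
    print(f"{name}: LR+ = {lr:.1f}, n = {n_from_lr(rho, phi, lr):.2f}")
```

The numerator depends only on the target and the prior, so all of the difference between tests comes from the denominator, \(\ln(\mathrm{LR}+)\).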
Properties of sequential testing
Since the natural logarithm is continuous and increasing throughout its domain \((0,\infty)\), it follows that as \(\ln\left[ \frac{a}{1-b}\right]\) increases, the number of test iterations n needed to achieve a desired positive predictive value decreases, as per Eq. (24). Tables 1, 2, 3 and 4 provide reference values of n as a function of the prevalence \(\phi\), the sensitivity and the specificity for a \(\rho\) of 99, 95, 75 and 50%, respectively. Figure 2 provides a graphic representation of \(n_i\), which, given its geometric shape, we define as the tablecloth function. This relationship holds only when every one of the identical sequential tests, up to and including iteration \(n_i\), is positive.

For conditions whose treatment is relatively innocuous but whose potential consequences are severe, a lower threshold to initiate treatment might be acceptable. Conversely, a condition whose consequences are less severe but whose treatment may lead to significant morbidity might benefit from a higher degree of diagnostic certainty prior to initiating therapy or proceeding to an invasive diagnostic test.

Given the extremes of the domains of each predictive function as per Eqs. (8) and (11), and the fact that most conditions have a prevalence well below 20%, it follows that if a negative test result is obtained before the desired positive predictive value is reached, the individual is more likely to be disease-free, since \(\sigma (\phi ) \gg \rho (\phi )\) at a low prevalence of disease (Fig. 1). In other words, the intersection between the NPV and PPV, given by the following equation, hovers around 40–60% prevalence for values of sensitivity and specificity greater than 50% (the clinically useful ones):
\[ \rho(\phi) = \sigma(\phi) \iff \frac{a\phi}{a\phi + (1-b)(1-\phi)} = \frac{b(1-\phi)}{b(1-\phi) + (1-a)\phi} \]
Below this point, NPV > PPV.
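The crossover point can be located explicitly. Cross-multiplying \(\rho(\phi) = \sigma(\phi)\) cancels the common \(ab\phi(1-\phi)\) terms and leaves \(a(1-a)\phi^2 = b(1-b)(1-\phi)^2\), so for \(0 < a, b < 1\) the crossover prevalence is \(\phi^* = \sqrt{b(1-b)}/(\sqrt{a(1-a)} + \sqrt{b(1-b)})\). This closed form is our own rearrangement rather than an equation from the article. The sketch below evaluates it for a few hypothetical tests: it sits at exactly 50% when a = b and shifts toward 40% or 60% as the two parameters diverge.

```python
import math


def ppv(phi: float, a: float, b: float) -> float:
    return a * phi / (a * phi + (1 - b) * (1 - phi))


def npv(phi: float, a: float, b: float) -> float:
    return b * (1 - phi) / (b * (1 - phi) + (1 - a) * phi)


def crossover_prevalence(a: float, b: float) -> float:
    """Prevalence at which PPV = NPV, from a(1-a)phi^2 = b(1-b)(1-phi)^2 (needs 0 < a, b < 1)."""
    sa = math.sqrt(a * (1 - a))
    sb = math.sqrt(b * (1 - b))
    return sb / (sa + sb)


for a, b in [(0.9, 0.9), (0.9, 0.7), (0.7, 0.9)]:  # hypothetical (sensitivity, specificity)
    phi_star = crossover_prevalence(a, b)
    print(f"a = {a}, b = {b}: PPV = NPV at phi = {phi_star:.3f}")
```

Below \(\phi^*\) the NPV exceeds the PPV, which is why a negative result obtained partway through a sequence of tests carries more weight at low prevalence.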
It is critical to bear in mind that testing might be done in a representative sample of a population to estimate the rate of asymptomatic carriage; in this case the prevalence is meaningful. But testing is generally done in subjects in whom a condition is suspected, either because they have a known exposure or because they have various levels of symptomatology. In such cases the population prevalence is irrelevant, and it would be more appropriate to refer to prior or pretest probability instead.
Clinical implications of \(n_i\)
From Eq. (24), we learn that the number of iterations is inversely proportional to the natural logarithm of the ratio of sensitivity to the complement of specificity, a ratio which represents the LR+ [15].
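However, the denominator of this equation is itself the natural logarithm of a fraction, and for certain values of sensitivity a and specificity b, the ratio \(\frac{a}{1-b}\) is less than 1. The natural logarithm of x has the following sign properties:
\[ \ln x < 0 \ \text{for}\ 0 < x < 1, \qquad \ln x = 0 \ \text{for}\ x = 1, \qquad \ln x > 0 \ \text{for}\ x > 1 \]
We deduce that for values of a and b such that:
\[ \frac{a}{1-b} < 1 \iff a + b < 1 \]
the denominator of the \(n_i\) function will be negative and, for any target \(\rho\) greater than the prevalence \(\phi\) (which makes the numerator positive), so will \(n_i\).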
Though it is unlikely that a test whose sensitivity and specificity add to less than one would be used clinically [17], this idea does lead to a fundamental insight about the \(n_i\) equation. What does it mean to need a negative number of tests to achieve a given \(\rho (\phi )\)? Clinically it bears no meaning, since one would, by definition, need at least a single test to have a positive result. It thus follows that for the above equation to be of clinical use, the test must satisfy a + b > 1, and, since tests are performed in whole numbers, we take its ceiling function [18], where \(\lceil x \rceil\) is the unique integer satisfying \(\lceil x \rceil - 1 < x \le \lceil x \rceil\):
\[ n_i = \lim_{\rho\to k}\left\lceil \frac{\ln\left[\frac{\rho(\phi-1)}{\phi(\rho-1)}\right]}{\ln\left[\frac{a}{1-b}\right]}\right\rceil \]
In practical terms, the ceiling function rounds a number up to the nearest integer [18]. For the case of screening tests, it implies that a whole rather than a fractional number of tests (rounded up to the next integer) ought to be performed. In other words, when, say, 2.8 tests are needed to achieve a desired PPV, one is better off doing 3 tests, given the discrete nature of testing. Doing 3 tests would, by definition, guarantee that one is at or above the desired threshold (provided all 3 are positive), whereas doing 2 tests would yield a lower PPV than desired.
Independence of serial testing
From the concepts described in this work, one might easily suggest that simply repeating the same screening test multiple times increases confidence that a positive result is a true positive. Setting aside administrative and feasibility concerns, while such an interpretation is theoretically correct, the reality is more nuanced, as there are confounding factors that might make the same result recur upon serial testing of the same patient. Indeed, repeating the same test under the same conditions, in a similar timeframe, perhaps even by the same interpreter or provider, may not constitute a truly independent observation [19]. Likewise, because the biological parameters being measured tend to fluctuate smoothly over time, subsequent tests should be separated in time. Moreover, the final results are valid only if the probability of receiving subsequent tests is independent of the results of earlier tests (i.e., we would continue testing those with negative tests in addition to those with positive tests). As such, the primary use of the tables and notions described herein ought to be to contextualize the screening result and broaden the clinical judgement of the provider with regard to the reliability of the screening process. A more natural and reliable method to enhance the positive predictive value would be, when available, to use a different test with different parameters altogether after an initial positive result is obtained [19].
Strengths and limitations
The work presented here is largely theoretical in nature. As such, it carries several strengths, notably: (1) the complete derivation of the resulting equation and tablecloth function from first principles; (2) the use of mathematical language that translates well into clinical scenarios (limits to ensure attainable PPV values and the ceiling function to obtain a whole number of tests); (3) the development of easily accessible reference tables for clinicians; and (4) the novelty of the work: to the best of our knowledge, sequential testing and Bayesian updating with a single screening test have not previously been explored to a great extent [20]. Nevertheless, the present work also has limitations, notably: (1) the lack of clinical data to validate the results, and (2) concerns regarding its clinical application, given the potential difficulty of obtaining independent testing samples. Despite these limitations, the purpose of this manuscript is to raise awareness of the poor predictive value of many screening tests given the Bayesian limitations of the screening process, and to show how the predictive value can be enhanced with a single repeated test, even if only in theory. Such an equation can contextualize the predictive ability of a single test and may provide additional ways to communicate risk or likelihood of disease in the clinical counselling of patients.
Conclusion
In this manuscript, we describe a mathematical model to determine whether sequential testing with a single test overcomes the Bayesian limitations of screening and thus improves the reliability of screening tests. We show that for a desired positive predictive value \(\rho\) that approaches k, the number of positive test iterations \(n_i\) is inversely proportional to the natural logarithm of the positive likelihood ratio (LR+). The clinical utility of this equation would be best observed in conditions with a low pretest probability, where single tests are insufficient to achieve clinically significant predictive values, and likewise in clinical scenarios with a high pretest probability, where confirmation of disease status is critical. When independent observations are difficult to obtain, serial testing with a different test will likewise enhance the positive predictive value [19] (Fig. 2).
Availability of data and materials
Not applicable.
References
 1.
Hall GH. The clinical application of Bayes’ theorem. Lancet. 1967;290(7515):555–7.
 2.
Rouder JN, Morey RD. Teaching Bayes’ theorem: Strength of evidence as predictive accuracy. Am Stat. 2018;73:186–90.
 3.
Schulman P. Bayes’ theorem—a review. Cardiol Clin. 1984;2(3):319–28.
 4.
Dezert J, Tchamova A, Han D. Total belief theorem and generalized Bayes’ theorem. In: 2018 21st International Conference on Information Fusion (FUSION); IEEE. 2018. p. 1040–7.
 5.
Lutgendorf MA, Stoll KA. Why 99% may not be as good as you think it is: limitations of screening for rare diseases; 2016.
 6.
Simon D, Boring III JR. Sensitivity, specificity, and predictive value. In: Clinical methods: the history, physical, and laboratory examinations, 3rd edn. Butterworths; 1990.
 7.
Balayla J. On the formalism of the screening paradox. PLoS ONE. 2021;16(9):e0256645.
 8.
Moons KGM, van Es GA, Deckers JW, Habbema JDF, Grobbee DE. Limitations of sensitivity, specificity, likelihood ratio, and Bayes’ theorem in assessing diagnostic probabilities: a clinical example. Epidemiology. 1997;8:12–7.
 9.
Balayla J. Prevalence threshold (ϕe) and the geometry of screening curves. PLoS ONE. 2020;15(10):e0240215.
 10.
Woloshin S, Patel N, Kesselheim AS. False negative tests for SARS-CoV-2 infection: challenges and implications. N Engl J Med. 2020;383:e38.
 11.
Etzioni RD, Kadane JB. Bayesian statistical methods in public health and medicine. Annu Rev Public Health. 1995;16(1):23–41.
 12.
Gniazdowski V, Morris CP, Wohl S, Mehoke T, Ramakrishnan S, Thielen P, Powell H, Smith B, Armstrong DT, Herrera M, et al. Repeat COVID-19 molecular testing: correlation of SARS-CoV-2 culture with molecular assays and cycle thresholds. Clin Infect Dis. 2020;27:ciaa1616.
 13.
Raymond JW, Jalaie M, Bradley MP. Conditional probability: a new fusion method for merging disparate virtual screening results. J Chem Inf Comput Sci. 2004;44(2):601–9.
 14.
McNeil BJ, Adelstein SJ. Determining the value of diagnostic and screening tests. J Nucl Med. 1976;17(6):439–48.
 15.
McGee S. Simplifying likelihood ratios. J Gen Intern Med. 2002;17(8):647–50.
 16.
Balayla J. Invariant points on the screening plane: a geometric definition of the likelihood ratio (LR+). 2020. arXiv preprint arXiv:2012.07066.
 17.
Grimes DA, Schulz KF. Uses and abuses of screening tests. Lancet. 2002;359(9309):881–4.
 18.
Weisstein EW. Ceiling function. 2002. https://mathworld.wolfram.com/.
 19.
Balayla J. Derivation of generalized equations for the predictive value of sequential screening tests. 2020. arXiv preprint arXiv:2007.13046.
 20.
Courty P, Hao L. Sequential screening. Rev Econ Stud. 2000;67(4):697–717.
Acknowledgements
Not applicable.
Funding
No funding was received for this study.
Author information
Contributions
Sole author conceived the idea, derived the equations, and wrote the manuscript. The author read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable.
Competing interests
The author serves in an advisory capacity to Bayer, Inc, in their women’s health division.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Balayla, J. Bayesian updating and sequential testing: overcoming inferential limitations of screening tests. BMC Med Inform Decis Mak 22, 6 (2022). https://doi.org/10.1186/s12911-021-01738-w
DOI: https://doi.org/10.1186/s12911-021-01738-w