Abstract
A number of philosophers of science and statisticians have attempted to justify conclusions drawn from a finite sequence of evidence by appealing to results about what happens if the length of that sequence tends to infinity. If their justifications are to be successful, they need to rely on the finite sequence being either indefinitely increasing or of a large size. These assumptions are often not met in practice. This paper analyzes a simple model of collecting evidence and finds that the practice of collecting only very small sets of evidence before taking a question to be settled is rationally justified. This shows that the appeal to long run results can be used neither to explain the success of actual scientific practice nor to give a rational reconstruction of that practice.
Acknowledgments
Thanks to Kevin Zollman, Kevin Kelly, Liam Bright, Adam Brodie, and an anonymous referee for valuable comments and discussion.
Appendix: Proofs
From proposition 1 it follows that the optimal procedure that takes at least one observation takes the form \(\delta (a,b)\), where a is a negative and b a positive integer multiple of \(\log \frac{1-\varepsilon }{\varepsilon }\). If \(\xi = 1/2\), the symmetry of the problem (the loss for a wrong decision \(\beta\) and the cost per observation c are the same whether h is true or false) implies that in the optimal solution \(a = -b\). So I can restrict attention to procedures of the form
for some positive integer k. Note also that
Applying Eq. (1) to \(\delta _{k,k}\) yields
Note that \(\rho (1/2,\delta _{0,0}) = \beta /2\) correctly gives the risk of the procedure that takes no observations. So the optimal procedure (without the caveat “among those that take at least one observation”) is of the form \(\delta _{k,k}\) for some non-negative integer k.
Next, fix a value of k and ask whether \(\delta _{k+1,k+1}\) is better than \(\delta _{k,k}\). Some algebra shows that \(\rho (1/2,\delta _{k+1,k+1}) < \rho (1/2,\delta _{k,k})\) if and only if
Note that \(g_k(\varepsilon )\) is increasing in k, so either there is a unique positive integer \(k^*\) such that
or \(\beta /c \le g_0(\varepsilon )\); in that case set \(k^* = 0\). In either case \(\delta _{k^*,k^*}\) is the optimal sequential decision procedure. This proves proposition 5.
Now consider a prior of the form \(\xi _d\) for some \(d\in \mathbb {Z}\) (where \(\xi _d\) is as defined in proposition 6). This might be called a conjugate prior for this decision problem: the posterior after conditioning on evidence \(X_1\) is \(\xi _{d-1}\) if the evidence is \(X_1 = 1\) and \(\xi _{d+1}\) if \(X_1 = 0\).
Note that \(\xi _0 = 1/2\) so the optimal sequential decision procedure for \(\xi _0\) is \(\delta _{k^*,k^*}\) by proposition 5. In light of the above this statement is equivalent to the following: it is optimal to continue taking observations as long as the posterior remains between \(\xi _{k^*-1}\) and \(\xi _{1-k^*}\), and it is optimal to stop if the posterior is \(\xi _{k^*}\) or smaller, or \(\xi _{-k^*}\) or larger.
But the latter statement does not depend on the prior one started with. So for any prior \(\xi _d\) it is optimal to take observations if and only if the posterior remains strictly between \(\xi _{k^*}\) and \(\xi _{-k^*}\). This is exactly the sequential decision procedure \(\delta _{k^*+d,k^*-d}\) (which takes no observations if either \(k^*+d \le 0\) or \(k^*-d\le 0\)). This proves proposition 6.
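The procedure \(\delta _{k^*+d,k^*-d}\) can be sketched as a walk on the index of the conjugate prior. This is an illustrative sketch only: it assumes the indexing convention from proposition 6, under which observing \(X_i = 1\) moves the posterior from \(\xi _j\) to \(\xi _{j-1}\) (toward accepting h at index \(-k^*\)) and observing \(X_i = 0\) moves it to \(\xi _{j+1}\) (toward rejecting h at index \(k^*\)); the function name is hypothetical.

```python
def delta_asymmetric(d, k_star, observations):
    """Sketch of delta_{k*+d, k*-d} starting from the prior xi_d.
    Decide h when the posterior index hits -k*, decide not-h when it
    hits +k*; take no observations if either boundary is already met."""
    if k_star + d <= 0:
        return ("accept h", 0)   # prior already at or past the h boundary
    if k_star - d <= 0:
        return ("reject h", 0)   # prior already at or past the not-h boundary
    j = d
    for n, x in enumerate(observations, start=1):
        j += -1 if x == 1 else +1  # conjugate update: xi_j -> xi_{j-1} or xi_{j+1}
        if j <= -k_star:
            return ("accept h", n)
        if j >= k_star:
            return ("reject h", n)
    return ("continue", len(observations))
```

Setting \(d = 0\) recovers the symmetric procedure \(\delta _{k^*,k^*}\), and a prior with \(|d| \ge k^*\) stops immediately, as the proof requires.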
If \(\xi _{d} < \xi < \xi _{d-1}\), then observing \(X_i = 0\) a total of \(k^*-d+1\) times forces the posterior to be less than \(\xi _{k^*}\), at which point it is optimal to stop taking observations. Observing \(X_i = 0\) fewer than \(k^*-d\) times forces the posterior to be larger than \(\xi _{k^*-1}\), so continuing to take observations is optimal.
Similarly, observing \(X_i = 1\) a total of \(k^*+d\) times forces the posterior to be greater than \(\xi _{-k^*}\), and observing \(X_i = 1\) fewer than \(k^*+d-1\) times forces the posterior to be less than \(\xi _{-k^*+1}\). Hence one of \(\delta _{k^*+d,k^*-d}\), \(\delta _{k^*+d-1,k^*-d+1}\), \(\delta _{k^*+d-1,k^*-d}\), or \(\delta _{k^*+d,k^*-d+1}\) is the optimal sequential decision procedure. This proves the corollary.
Heesen, R. How much evidence should one collect? Philos Stud 172, 2299–2313 (2015). https://doi.org/10.1007/s11098-014-0411-z