Statistical Literacy Website/Papers: Www.Statlit.Org/Pdf/2010GoodmanASA.Pdf (2010)
Abstract
In the face of continuing assumptions by many scientists and journal editors that p-values provide a gold standard for inference, counter-warnings are published periodically. But the core problem is not with p-values per se. A finding that “p-value is less than α” could merely signal that a critical value has been exceeded. The question is why, when estimating a parameter, we provide a range (a confidence interval), but when testing a hypothesis about a parameter (e.g., µ = x) we proceed as if “=” entails exact equality of the parameter with x. That standard is hard to meet, and it is not the standard expected for power calculations, where we are satisfied to reject a null hypothesis H0 if the result is merely “detectably” different from (exact) H0. This paper explores, with resampling (simulation) methods, the impacts on p-values, and alternatives, if the null hypothesis is defined as a thick or thin range of values. It also examines, empirically, the extent to which the p-value may or may not be a good predictor of the probability that H0 is true, given the distribution of the data.
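
The idea of a "thick" null hypothesis can be illustrated with a small resampling sketch. The abstract does not specify the paper's procedure, so everything here is an assumption: the function names, the shift-based bootstrap, the add-one correction, and the choice to define the interval-null p-value as the largest point-null p-value over the interval (which, for a two-sided test of a mean, is attained at the point of the interval nearest the sample mean). A half-width of zero recovers the usual "thin" null µ = x.

```python
import random
import statistics


def point_null_p_value(sample, mu0, n_resamples=2000, rng=None):
    """Two-sided bootstrap p-value for the point null H0: mean == mu0.

    Assumed method (not taken from the paper): shift the data so the null
    holds, resample with replacement, and count resampled means at least
    as far from mu0 as the observed mean.
    """
    rng = rng or random.Random(0)
    n = len(sample)
    observed_mean = statistics.fmean(sample)
    observed_gap = abs(observed_mean - mu0)
    # Shift the data so that its mean equals mu0 (the null is "true").
    shifted = [x - observed_mean + mu0 for x in sample]
    extreme = 0
    for _ in range(n_resamples):
        resample = rng.choices(shifted, k=n)
        if abs(statistics.fmean(resample) - mu0) >= observed_gap:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_resamples + 1)


def interval_null_p_value(sample, center, half_width, n_resamples=2000, seed=0):
    """p-value for the 'thick' null H0: |mean - center| <= half_width.

    Hypothetical helper: evaluates the point-null p-value at the value in
    the interval closest to the sample mean, where it is largest.
    """
    rng = random.Random(seed)
    lo, hi = center - half_width, center + half_width
    observed_mean = statistics.fmean(sample)
    nearest = min(max(observed_mean, lo), hi)  # clamp the mean into [lo, hi]
    return point_null_p_value(sample, nearest, n_resamples, rng)


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(0.3, 1.0) for _ in range(50)]  # illustrative data
    print("thin null  (mu = 0):        ", interval_null_p_value(data, 0.0, 0.0))
    print("thick null (|mu| <= 0.25):  ", interval_null_p_value(data, 0.0, 0.25))
```

Under these assumptions, widening the null interval can only raise the p-value, and a sample mean that falls inside the interval yields a p-value of 1, which is one way to see why an exact-equality null is the hardest version to retain.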