A p-value: what does it mean, really?
On several occasions I have been approached by students and researchers to “help them get a p-value for their results”. They believed that the purpose of an analysis was to dig this value out of the data, and that without it the analysis would be incomplete.
Sir Ronald Fisher, a British statistician and geneticist, introduced the p-value in 1925.
This was around the time when he was developing computational algorithms for
analyzing data from his balanced experimental designs. He wrapped up his work in his
first book, Statistical Methods for Research Workers. The book went through many editions and translations over time, and later became the standard reference work for scientists in many disciplines.
At the time he adopted 0.05 as a reference point for rejecting a null hypothesis, but not as a sharp cut-off.
Fisher’s philosophy of significance testing interpreted the p-value as a measure of evidence from a single
experiment. As a measure of evidence, the p-value was meant to be combined with other sources of information.
Thus, there was no set threshold for “significance” (Fisher, 1973).
P-values have since been widely misunderstood in many of the circles where they are reported. Goodman’s article on the misinterpretation of p-values lists some of these misconceptions. In brief, the article explains that a p-value of, say, 0.05 does not mean any of the following:
- there is only a 5% chance that the null hypothesis is true;
- there is a 5% chance of a Type I error (i.e. a false positive);
- there is a 95% chance that the results would replicate if the study were repeated;
- there is no difference between the groups;
- you have proved your experimental hypothesis.
A p-value should be interpreted as: the probability of getting the results you have observed, or more extreme results, given that the null hypothesis is true. This might still not be clear, so let’s turn to the usual coin-toss example that is frequent in introductory probability lessons.
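One compact way to write this definition (my notation, not from the original post, and assuming larger values of a test statistic T count as “more extreme”) is:

$$p\text{-value} = \Pr\left(T \geq t_{\mathrm{obs}} \mid H_0\right),$$

where t_obs is the value of the statistic actually observed and H_0 is the null hypothesis.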
Suppose we toss a fair coin 20 times and observe the number of heads that come up; we would expect to obtain 10 heads in our experiment. This is because, for a fair coin, the probability of landing heads is 0.5, and so the expected number of heads is 20*0.5 = 10.
Now let’s experiment with a coin with an unknown probability of landing heads. Our aim in the experiment is to quantify the evidence against our null hypothesis that the coin is fair. In our experiment the coin lands heads on 16 out of 20 tosses.
How do we interpret this result? Is it unusual given that we were expecting about 10 heads? Let’s calculate a p-value.
Remember that the p-value is the probability of getting the observed results (16 heads) or more extreme results (17, 18, 19, or 20 heads) if our null hypothesis, that the coin is fair, is true. Considering each toss as a Bernoulli experiment, we can easily obtain the probability of getting x heads (x = 16, ..., 20) in 20 trials from the binomial probability mass function:
$$\Pr(X = x) = \binom{20}{x} p^{x} (1 - p)^{20 - x},$$

where p is the probability of success (heads) in each trial. Under the null hypothesis p = 0.5, and the p-value is the sum of these probabilities over x = 16, ..., 20.
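As a quick check, this tail sum can be computed directly. A minimal Python sketch (my own illustration, not from the original post):

```python
from math import comb

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n Bernoulli(p) trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# One-sided p-value: probability of 16 or more heads in 20 tosses,
# assuming the null hypothesis (a fair coin, p = 0.5) is true.
p_value = sum(binom_pmf(x, n=20, p=0.5) for x in range(16, 21))
print(round(p_value, 4))  # 0.0059
```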
The p-value obtained is 0.0059. This could mean that an unlikely event occurred (a fair coin landing heads 16 times out of 20) or that the coin is not fair! However, the p-value does not tell us which of the two it is. Many people conclude that such an unlikely event suggests that the coin is not fair, rejecting the null hypothesis, but do not recognize that there is a second possibility in the circumstance. So I have heard statements like ‘the p-value was <0.05, which proves that the null hypothesis is true’ or ‘the p-value was <0.05, therefore we accept the null hypothesis’. This is where the misinterpretation comes in.
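One way to make the definition concrete is to simulate it: generate many batches of 20 fair-coin tosses and count how often 16 or more heads appear. A small sketch of such a simulation (again my own illustration, not from the original post):

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible
n_experiments = 1_000_000

# Count experiments in which a fair coin lands heads 16+ times in 20 tosses.
extreme = sum(
    sum(random.random() < 0.5 for _ in range(20)) >= 16
    for _ in range(n_experiments)
)
print(extreme / n_experiments)  # close to the exact p-value of 0.0059
```

The simulated frequency hovers around 0.0059, which is exactly what the p-value measures: how often results this extreme would arise if the null hypothesis were true.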