Warning upfront: This post could be slightly technical.
"There are lies, damned lies, and statistics" is a commonly used phrase to deplore the world of statistics. However, the problem becomes more acute when statistical tools are misused and misinterpreted even by experts.
One such tool is the p-value, which is used to show the significance of your results. Say one is testing whether a new experiment is more useful or not. You set a null hypothesis saying the new experiment does not matter. The alternative hypothesis is that the new experiment matters. Then you set a threshold of 5%. If the p-value comes in lower than 5%, we shout eureka and say the results are significant and the new experiment matters. If the p-value is more than 5%, shoulders drop and one starts to look for data issues, the addition of one more variable and so on.
The American Statistical Association has seen abuse of p-values in research after research. People frame policies based on p-values only to see them go mostly wrong. Then the blame falls on damned statistics. However, the problem is more one of misinterpreting p-values.
Frustrated by this, the ASA has released six principles on p-values:
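For readers who like to see the mechanics, here is a minimal sketch of that null-hypothesis-and-threshold routine in Python. It uses an exact permutation test on the difference in means, which is just one of many possible tests and is my choice for illustration. The function name and the numbers in `control` and `treatment` are made up.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(control, treatment):
    """Exact two-sided permutation test on the difference in group means.

    Null hypothesis: the group labels do not matter, so every way of
    splitting the pooled values into two groups is equally likely.
    """
    pooled = list(control) + list(treatment)
    n_treat = len(treatment)
    n_ctrl = len(pooled) - n_treat
    total = sum(pooled)
    observed = abs(mean(treatment) - mean(control))

    extreme = 0
    splits = 0
    for idx in combinations(range(len(pooled)), n_treat):
        treat_sum = sum(pooled[i] for i in idx)
        diff = abs(treat_sum / n_treat - (total - treat_sum) / n_ctrl)
        # Small tolerance so floating-point noise does not drop ties.
        if diff >= observed - 1e-12:
            extreme += 1
        splits += 1
    return extreme / splits


# Hypothetical measurements, invented purely for illustration.
control = [12.1, 11.4, 13.0, 12.6, 11.9]
treatment = [13.2, 12.9, 14.1, 13.5, 12.8]

p_value = permutation_p_value(control, treatment)
significant = p_value < 0.05  # the conventional 5% threshold
print(f"p-value = {p_value:.4f}, significant at 5%: {significant}")
```

Note what the code actually computes: the share of label shufflings that look at least as extreme as the observed split, assuming the labels do not matter. Nothing in it tells you how likely the null hypothesis itself is.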
After reading too many papers that either are not reproducible or contain statistical errors (or both), the American Statistical Association (ASA) has been roused to action. Today the group released six principles for the use and interpretation of p values. P-values are used to search for differences between groups or treatments, to evaluate relationships between variables of interest, and for many other purposes. But the ASA says they are widely misused. Here are the six principles from the ASA statement:
- P-values can indicate how incompatible the data are with a specified statistical model.
- P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
- Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
- Proper inference requires full reporting and transparency.
- A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
- By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
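The principle that a p-value does not measure the size of an effect can be made concrete with a rough sketch. It assumes a simple two-sample z-test with a known standard deviation, which is my simplification and not something from the ASA statement. The helper name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_p_value(mean_diff, sigma, n_per_group):
    """Two-sided p-value for a difference in means between two equal-sized
    groups, assuming a known common standard deviation `sigma`."""
    standard_error = sigma * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# A negligible effect measured on a huge sample...
tiny_effect_p = two_sided_z_p_value(mean_diff=0.01, sigma=1.0, n_per_group=1_000_000)
# ...versus a large effect measured on a handful of observations.
large_effect_p = two_sided_z_p_value(mean_diff=0.5, sigma=1.0, n_per_group=10)

print(f"tiny effect, n = 1,000,000 per group: p = {tiny_effect_p:.2e}")
print(f"large effect, n = 10 per group:       p = {large_effect_p:.3f}")
```

Under these assumptions the negligible effect clears the 5% bar comfortably while the effect fifty times larger does not. The p-value here is driven by sample size as much as by the effect.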
There is also a nice interview with Ron Wasserstein, ASA’s executive director, where he explains some of these points. This one is really important:
Retraction Watch: Some of the principles seem straightforward, but I was curious about #2 – I often hear people describe the purpose of a p value as a way to estimate the probability the data were produced by random chance alone. Why is that a false belief?
Ron Wasserstein: Let’s think about what that statement would mean for a simplistic example. Suppose a new treatment for a serious disease is alleged to work better than the current treatment. We test the claim by matching 5 pairs of similarly ill patients and randomly assigning one to the current and one to the new treatment in each pair. The null hypothesis is that the new treatment and the old each have a 50-50 chance of producing the better outcome for any pair. If that’s true, the probability the new treatment will win for all five pairs is (½)^5 = 1/32, or about 0.03. If the data show that the new treatment does produce a better outcome for all 5 pairs, the p-value is 0.03. It represents the probability of that result, under the assumption that the new and old treatments are equally likely to win. It is not the probability the new treatment and the old treatment are equally likely to win.
This is perhaps subtle, but it is not quibbling. It is a most basic logical fallacy to conclude something is true that you had to assume to be true in order to reach that conclusion. If you fall for that fallacy, then you will conclude there is only a 3% chance that the treatments are equally likely to produce the better outcome, and assign a 97% chance that the new treatment is better. You will have committed, as Vizzini says in “The Princess Bride,” a classic (and serious) blunder.
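The arithmetic in Wasserstein's five-pair example can be reproduced in a few lines. This is only a sketch of the one-sided calculation he describes, and the function name is mine.

```python
from math import comb


def sign_test_p_value(wins, pairs, p_win=0.5):
    """Probability of at least `wins` wins out of `pairs`, computed under the
    null hypothesis that each pair is won with probability `p_win`."""
    return sum(
        comb(pairs, k) * p_win**k * (1 - p_win) ** (pairs - k)
        for k in range(wins, pairs + 1)
    )


# New treatment wins all 5 of 5 matched pairs.
p_value = sign_test_p_value(wins=5, pairs=5)
print(p_value)  # 0.03125, i.e. 1/32
```

The 50-50 assumption goes in as the input `p_win=0.5`. The output is the probability of the data given that assumption, so it cannot be turned around and read as the probability that the assumption is true.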
Actually the problem is the way stats is taught. It is taught in a highly mechanical way with assumptions either ignored or barely touched upon.
What has to be done instead is to go into the philosophy and roots of this testing and ensure students understand the limitations. This has become even more acute as it has become so easy to get data, run regressions and get p-values. It wasn’t this easy earlier, so the stats profession could sit pretty as misinterpretations would be few and far between.
But not anymore.
We need more textbooks now telling you how to think through these stats matters in plain English. More than how to get p-values, the stress should be on understanding their limitations.