A recent article in Nature, "Scientists rise up against statistical significance", discusses a group of statisticians who are speaking out against the way we describe our negative studies:
For several generations, researchers have been warned that a statistically non-significant result does not ‘prove’ the null hypothesis (the hypothesis that there is no difference between groups or no effect of a treatment on some measured outcome)…
Let’s be clear about what must stop: we should never conclude there is ‘no difference’ or ‘no association’ just because a P value is larger than a threshold such as 0.05 or, equivalently, because a confidence interval includes zero.
They suggest renaming "confidence intervals" to "compatibility intervals", and interpreting study results as being compatible with any effect size that falls within the interval.
For example, the SPRINT MIND study (https://jamanetwork.com/journals/jama/article-abstract/2723256) reported no association between intensive blood pressure reduction and dementia (HR 0.83; 95% CI 0.67-1.04), but a significant reduction in mild cognitive impairment (HR 0.81; 95% CI 0.69-0.95). The hazard ratios and confidence intervals are virtually identical for the two outcomes; the only difference is that the mild cognitive impairment analysis was slightly better powered because of higher case numbers, and so it sneaked over the statistical significance line. Despite this, the medical press largely reported it along the lines of "Intensive BP control reduces mild cognitive impairment but not dementia" (e.g. https://www.bmj.com/content/364/bmj.l425.full).
Instead of saying "BP control doesn't prevent dementia", we should conclude that the effect on dementia could be anything from a 33% reduction to a 4% increase in risk, with a 17% reduction being the value most compatible with the data. This different interpretation could be practice-changing in some contexts, and would prevent the "flip-flopping" problem where our position suddenly has to reverse once the confidence intervals tighten with further research.
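To make this concrete, here is a minimal sketch (my own back-of-the-envelope check, not anything from the trial or the Nature article) that reconstructs approximate standard errors from the published hazard ratios and 95% CIs, and shows that the two SPRINT MIND outcomes differ mainly in precision rather than in effect size:

```python
import math

def z_and_p(hr, lo, hi):
    """Approximate z-statistic and two-sided p-value for a hazard ratio,
    reconstructing the standard error of log(HR) from the 95% CI bounds."""
    log_hr = math.log(hr)
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # CI width on the log scale
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

for name, hr, lo, hi in [("dementia", 0.83, 0.67, 1.04),
                         ("mild cognitive impairment", 0.81, 0.69, 0.95)]:
    z, p = z_and_p(hr, lo, hi)
    print(f"{name}: HR {hr}, 95% CI {lo}-{hi}, z = {z:.2f}, p ~ {p:.3f}")
    # dementia: p ~ 0.097; mild cognitive impairment: p ~ 0.010
```

Both point estimates sit at a hazard ratio of roughly 0.8; the dementia interval is just a little wider, which is enough to push its p-value above 0.05.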
A good example of this issue can be seen in the literature on statins (my favourite!). Reviews like this one from 2016 confidently state that statins do not prevent dementia, but this recommendation is based on very broad confidence intervals (e.g. 95% CI 0.61-1.65) that still include the possibility of a clinically significant reduction. A subsequent, larger meta-analysis found that statins do prevent dementia, with an RR of 0.849 (95% CI 0.787-0.916). These studies appear to say opposite things, but they are in fact completely compatible, as the RR of 0.849 falls within the confidence interval of the earlier, smaller study.
The correct interpretation of the first study should not have been "statins don't prevent dementia", but rather "statins could reduce dementia by 39%, increase it by 65%, or anything in between", in which case we wouldn't have been surprised when a later meta-analysis found that they reduce dementia by about 15%.
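As a rough illustration (the numbers come from the two publications above; the check itself is mine), the later estimate falls comfortably inside the earlier interval, so the two results are compatible rather than contradictory:

```python
# Rough compatibility check between the 2016 review and the later meta-analysis.
earlier_ci = (0.61, 1.65)    # 95% CI reported by the 2016 review
later_rr = 0.849             # point estimate from the later meta-analysis
later_ci = (0.787, 0.916)    # its 95% CI

inside = earlier_ci[0] <= later_rr <= earlier_ci[1]
overlap = max(earlier_ci[0], later_ci[0]) <= min(earlier_ci[1], later_ci[1])
print(f"Later RR {later_rr} inside earlier 95% CI {earlier_ci}: {inside}")  # True
print(f"The two 95% CIs overlap: {overlap}")                                # True
```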
These are just some of the examples I've noticed since reading the Nature article, and I feel that paying attention to the confidence interval, rather than to "significant vs non-significant", is a very helpful approach when reading the literature.
What do you guys think?