In medical science, as in all walks of life, we are impressed by dramatic effects. If a new treatment initially seems much better than an old one, there is often impatience to get on and use it, and people question why one would want to conduct formal trials.
Doctors who feel this enthusiasm for what they see as a breakthrough often argue that it's not ethical to do a randomised trial of an exciting new treatment, because the benefits seem so obvious, and randomisation means that half the patients are deprived of them. Of course breakthroughs sometimes turn out to be false dawns, but the idea that something might be so obviously better than what we have now that it doesn't need a randomised trial is pretty widespread in medicine.
We decided to look at this by trying to find all the published randomised trials where the new treatment was reported as being five times better than the previous treatment (or a control group). We thought this 'five times better' idea might be a useful rule for medical science. If a hazard ratio of five (i.e. the new treatment is five times better) nearly always correctly predicted that subsequent trials would report significant benefits, then we could use this as a signpost for the point where no further evidence is needed. Unfortunately, this turned out not to be true.
We studied all of the trials in the Cochrane Collaboration Database (more than 80,000) and found very few instances where there had been both a trial with a dramatic effect like this and a subsequent trial. We looked at the ones we found and, unfortunately, the 'five times better' rule was wrong in more than a third of the cases. In other words, even though an earlier trial showed the new treatment as five times better, a subsequent trial found it was not significantly better at all. We tried to find a rule that worked by increasing the hazard ratio or the significance of the results. We found that we had to increase the hazard ratio to 20 (i.e. the new treatment is 20 times better than the old treatment) before the rule became 100% reliable. Out of the whole Cochrane database there were only four trials that fitted this rule.
So why doesn't this rule work? The main problem is an effect known as 'regression to the mean'. Most of the trials that show dramatic effects are small trials, and we know that, through pure chance, a small trial has a better chance of producing a freak result than a large trial. Smaller randomised trials also tend to be of lower quality than larger ones and are therefore open to greater degrees of bias.
The implications, particularly for surgery, are quite interesting. It's well known that it's much harder to perform a large randomised trial in surgery than it is when studying a drug. However, our work adds to the literature showing that small randomised trials are pretty unreliable. Given that randomised trials in surgery are also very expensive and difficult to do, our results raise the question of whether surgeons might do better to run another type of study in situations where they know that they won't be able to do a large enough randomised trial to avoid the effects we are talking about. There will always be exceptions to this rule, particularly for rare diseases, but our findings could be used to support the idea that in surgery it may sometimes be useful to do a large non-randomised prospective study before committing to the major undertaking of developing a large, high-quality randomised trial.
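The role of pure chance in small trials can be illustrated with a short simulation. The sketch below is not the analysis used in the study. It assumes a made-up treatment with a modest true benefit (a bad outcome in 30% of control patients and 20% of treated patients, a true odds ratio of about 1.7), uses an odds ratio with a 0.5 continuity correction as a stand-in for 'times better', and compares how often trials with 20 or 500 patients per arm report an apparent effect of five or more. All function names and numbers are hypothetical.

```python
import random


def simulate_trial(n_per_arm, p_control, p_treated, rng):
    """Return the number of bad outcomes in the control and treated arms."""
    control = sum(rng.random() < p_control for _ in range(n_per_arm))
    treated = sum(rng.random() < p_treated for _ in range(n_per_arm))
    return control, treated


def odds_ratio(control_events, treated_events, n_per_arm):
    """Odds of a bad outcome in control relative to treated.

    Values above 1 favour the new treatment. Adding 0.5 to every cell
    avoids division by zero when an arm has no events.
    """
    a = control_events + 0.5
    b = n_per_arm - control_events + 0.5
    c = treated_events + 0.5
    d = n_per_arm - treated_events + 0.5
    return (a / b) / (c / d)


def share_of_dramatic_trials(n_per_arm, n_trials=2000, threshold=5.0, seed=1):
    """Fraction of simulated trials whose observed odds ratio is >= threshold."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    dramatic = 0
    for _ in range(n_trials):
        # Assumed true event rates: 30% in control, 20% with treatment.
        control, treated = simulate_trial(n_per_arm, 0.30, 0.20, rng)
        if odds_ratio(control, treated, n_per_arm) >= threshold:
            dramatic += 1
    return dramatic / n_trials


small = share_of_dramatic_trials(n_per_arm=20)
large = share_of_dramatic_trials(n_per_arm=500)
print(f"Trials with 20 per arm reporting a 5x effect:  {small:.1%}")
print(f"Trials with 500 per arm reporting a 5x effect: {large:.1%}")
```

Under these assumptions a noticeable minority of the small trials report a 'five times better' result even though the true effect is far smaller, while the large trials essentially never do. A later, larger trial of the same treatment would then be expected to show a much more modest effect, which is the regression to the mean described above.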
Myura Nagendran et al. Very large treatment effects in randomised trials as an empirical marker to indicate whether subsequent trials are necessary: meta-epidemiological assessment, BMJ (2016). DOI: 10.1136/bmj.i5432