By Peter C Gøtzsche
Two weeks ago, US epidemiologist Harvey Risch published a paper that sharply criticises evidence-based medicine (EBM) and its reliance on randomised clinical trials (RCTs). At the same time, he praised observational studies on grounds that are untenable.
After Gordon Guyatt coined the term EBM in the early 1990s, some prominent epidemiologists criticised it in much the same terms as Risch does now. It is therefore worthwhile to discuss Risch's key arguments.
When we discuss the merits of various research designs – indeed when we discuss anything – we need first and foremost to get the facts right. As I shall show below, there are important errors in Risch’s way of arguing.
Risch: As a university epidemiologist in 1991, I was insulted by the hubris and ignorance in the use of this term, EBM, as if medical evidence were somehow “unscientific” until proclaimed a new discipline with new rules for evidence.
The main idea of EBM was to tell clinicians that if they relied on their clinical experience while ignoring science that went against it, or not knowing about the most relevant science, they would not produce the best outcomes for their patients.
Psychiatry illustrates why EBM is important. Many psychiatrists still rely more on their biased clinical experience than on the science. The randomised trials show very clearly that increasing the dose of a psychiatric drug, e.g. a drug against depression or psychosis, does not improve the effect, only the harms. Yet psychiatrists routinely increase the dose because patients tend to get better with time, which they misinterpret as an effect of the higher dose.
Even worse, the randomised trials show, also very clearly, that depression pills double the risk of suicide, not only in children but also in adults, yet leading psychiatrists constantly refer to flawed observational studies that tell them the opposite.
This is what I call the UFO trick in my Critical Psychiatry Textbook. It is very common in science to mislead your readers this way; it is all about preserving power and prestige and not being forced to admit that you were wrong. If you use a fuzzy photo to “prove” you have seen a UFO when a photo taken with a strong telephoto lens has clearly shown that the object is an airplane or a bird, you are a cheat.
Risch: In 1996, responding to criticisms of EBM, David Sackett et al. (1996) attempted to explain its overall principles. Sackett asserted that EBM followed from “Good doctors use both individual clinical expertise and the best available external evidence.” This is an anodyne plausibility implication, but both components are basically wrong or at least misleading. By phrasing this definition in terms of what individual doctors should do, Sackett was implying that individual practitioners should use their own clinical observations and experience. However, the general evidential representativeness of one individual’s clinical experience is likely to be weak. Just like other forms of evidence, clinical evidence needs to be systematically collected, reviewed, and analyzed, to form a synthesis of clinical reasoning, which would then provide the clinical component of scientific medical evidence.
I co-founded the Cochrane Collaboration in 1993 and Sackett was the first chair of our Steering Group, which I was also a member of. We never disputed that EBM is also about summarising other types of research than RCTs. In the 1990s, my statistician and I established the international Cochrane Non-Randomised Methods group at the Nordic Cochrane Centre, which I directed. Some of the members had a strong background in epidemiology; others were experts in randomised trials, methodology and statistics. We agreed on four key issues:
1) Observational studies should, with rare exceptions, not be used to prove that an intervention is beneficial because effect sizes in healthcare are generally small. Measured effects could therefore easily be caused by confounding rather than being true effects.
2) Observational studies are needed when we cannot perform a randomised trial for ethical or practical reasons, e.g. we cannot randomise pregnant women to alcohol or no alcohol to study if alcohol intake during pregnancy is detrimental to the foetus.
3) Observational studies can be highly useful to elucidate what the harms are of our interventions, as they are often poorly reported in trials, if reported at all, or are so rare that they won’t be picked up in trials.
4) Observational studies are useful to summarise what we know about an intervention in order to provide information about how a randomised trial should best be planned and carried out.
Risch: A bigger failure of evidential reasoning is Sackett’s statement that one should use “the best available external evidence” rather than all valid external evidence.
This is a strawman argument. What is hidden in industry archives, or in the file drawers of academics who do not dare upset their colleagues by publishing their data, is not available. That was Sackett’s point. And by available, he meant easily available: he had a habit of looking up the scientific evidence during his rounds at the McMaster hospital whenever he was in doubt about what to do.
Risch: Sir Austin Bradford Hill (1965) did not include an aspect of what would constitute “best” evidence, nor did he suggest that studies should be measured or categorized for “quality of study” nor even that some types of study designs might be intrinsically better than others.
This is also a strawman argument. What Hill wrote in 1965 cannot be decisive for what we ought to do today. In EBM, we always need to consider the risk of bias in the studies we review. Many RCTs are heavily manipulated, with data tortured until they confess. It is also true that some research designs are more reliable than others.
Risch: That EBM is premised on subjectively cherry-picking “best” evidence is a plausible method but not a scientific one.
This is a totally wrong argument. EBM is the opposite of cherry-picking. A systematic review is based on a protocol that describes which types of studies to collect and how to appraise them. This is done to reduce subjectivity in the process as much as possible.
Risch: Over time, the EBM approach to selectively considering “best” evidence seems to have been “dumbed down,” first by placing randomized controlled trials (RCTs) at the top of a pyramid of all study designs as the supposed “gold standard” design, and later, as the asserted only type of study that can be trusted to obtain unbiased estimates of effects.
I am co-author of the STROBE guidelines for good reporting of observational studies. If epidemiologists have a good knowledge of statistics, they will agree that randomisation is the only method that guarantees that allocation to the comparison groups is unbiased in respect of prognosis and responsiveness to treatment. This is what makes statistical hypothesis testing valid. Strictly speaking, p-values in observational research are not valid, which statisticians have recognised for decades. Randomisation is the only means of allocation that controls for unknown and unmeasured differences as well as those that are known and measured.
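Why randomisation, and nothing else, handles unknown confounders can be illustrated with a small simulation. This is my own hypothetical sketch, not taken from any real trial: a drug with zero true effect, and an unmeasured prognostic factor ("frailty") that influences both the outcome and, in the non-randomised case, who gets treated.

```python
import math
import random

random.seed(1)

def simulate(randomise, n=20000):
    """Estimate the effect of a drug whose true effect is zero,
    when an unmeasured prognostic factor affects the outcome and
    (if allocation is not randomised) the chance of being treated."""
    treated, control = [], []
    for _ in range(n):
        frailty = random.gauss(0, 1)               # unknown, unmeasured confounder
        if randomise:
            gets_drug = random.random() < 0.5      # coin-flip allocation
        else:
            # sicker patients are more likely to receive the drug
            gets_drug = random.random() < 1 / (1 + math.exp(-frailty))
        outcome = frailty + random.gauss(0, 1)     # true drug effect = 0
        (treated if gets_drug else control).append(outcome)
    return sum(treated) / len(treated) - sum(control) / len(control)

print("randomised estimate:     %.3f" % simulate(True))   # close to the truth, 0
print("non-randomised estimate: %.3f" % simulate(False))  # biased away from 0
```

No adjustment can rescue the non-randomised comparison here, because the confounder was never measured; the coin flip needs no knowledge of it at all.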
Risch: Most doctors only consider RCT evidence and dismiss all other forms of empirical evidence.
Such dogmatism is not only unacceptable; it is also dangerous. It means that doctors may dismiss even serious harms of their interventions because they were only found in observational studies.
Risch: So, what is the flaw of randomization …?
There is no flaw in randomisation. Its virtue is that allocation to the comparison groups is unbiased.
Risch: The purpose of randomization, of balancing everything between the treatment and control groups, is to remove potential confounding. Is there any other way to remove potential confounding? Yes: measure the factors in question and adjust or control for them in statistical analyses.
There are two main reasons why this is incorrect. First, we cannot control for unknown confounders. Second, controlling for confounders may take us further from the truth than if we do not control for confounders. We explain this in our recent systematic review of serious adverse events of the COVID-19 vaccines:
For observational studies, the main problem is confounding. In a little-known but ingenious study, a statistician used raw data from two randomised multicentre trials as the basis for observational studies that could have been carried out. He showed that the more variables that are included in a logistic regression, the further we are likely to get from the truth. He also found that comparisons may sometimes be more biased when the groups appear to be comparable than when they do not; that adjustment methods rarely adjust adequately for differences in case mix; and that all adjustment methods can on occasion increase systematic bias. He warned that no empirical studies have ever shown that adjustment, on average, reduces bias.
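One mechanism by which adjustment can increase bias is easy to demonstrate in a toy simulation. This is my own hypothetical sketch, not the statistician's study: a drug with zero true effect, an unmeasured prognostic factor, and a measured variable that is influenced by both. The crude comparison is unbiased, but "controlling" for the measured variable manufactures a spurious effect.

```python
import random

random.seed(2)

n = 50000
U = [random.gauss(0, 1) for _ in range(n)]         # unmeasured prognostic factor
T = [random.random() < 0.5 for _ in range(n)]      # treatment; true effect on Y = 0
Y = [u + random.gauss(0, 1) for u in U]            # outcome, driven by U only
C = [t + u for t, u in zip(T, U)]                  # measured variable caused by both T and U

def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

Tf = [float(t) for t in T]

# Unadjusted effect of T on Y: unbiased, close to the true value of 0
crude = slope(Tf, Y)

# "Adjusted" effect: residualise T on C (Frisch-Waugh), then regress Y on the residuals,
# which equals the coefficient of T in the multiple regression Y ~ T + C
b_tc = slope(C, Tf)
resid = [t - b_tc * c for t, c in zip(Tf, C)]
adjusted = slope(resid, Y)

print("unadjusted estimate: %.3f" % crude)     # near 0 (the truth)
print("adjusted estimate:   %.3f" % adjusted)  # strongly biased away from 0
```

The adjustment opens a path between treatment and the unmeasured factor, so the "controlled" analysis is further from the truth than the naive one.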
Risch: Thus, randomization, in theory, removes potential confounding by unmeasured factors as an explanation for an observed association. That is the plausibility argument.
It is not a plausibility argument. It is a fact that randomisation is the only method that guarantees that any differences between the groups are random.
Risch: In order for randomization to work, there needs to be sizable numbers of outcome events in both the treatment and placebo groups, say 50 or more in each group
This is not correct. Randomisation works if done properly. The number of outcome events has nothing to do with this; few events mean wide confidence intervals, i.e. low precision, not biased allocation.
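A small simulation makes the point. This is a hypothetical sketch, not based on any real trial: each randomised trial yields only a handful of outcome events, yet the estimated risk differences are unbiased; they scatter around the true value instead of systematically missing it.

```python
import random

random.seed(3)

# Repeatedly run a small randomised trial with a rare outcome
# (true risks: 1% on placebo, 0.2% on treatment), so each trial
# sees roughly 20 placebo events and 4 treatment events.
def one_trial(n_per_arm=2000):
    placebo_events = sum(random.random() < 0.010 for _ in range(n_per_arm))
    treated_events = sum(random.random() < 0.002 for _ in range(n_per_arm))
    return treated_events / n_per_arm - placebo_events / n_per_arm  # risk difference

estimates = [one_trial() for _ in range(2000)]
mean_rd = sum(estimates) / len(estimates)

print("true risk difference:    -0.00800")
print("mean of trial estimates: %.5f" % mean_rd)  # close to -0.008
```

Individual trials are imprecise, which is exactly why properly randomised trials can be combined in a meta-analysis; there is no bias for low event counts to "uncover".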
Risch: An important example of this issue can be seen in the first published efficacy RCT result for the Pfizer BNT162b2 mRNA Covid-19 vaccine (Polack et al., 2020). This study was considered large enough (43,548 randomized participants) and important enough (Covid-19) that because of its assumed RCT plausibility it secured publication in the “prestigious” New England Journal of Medicine. The primary outcome of the study was the occurrence of Covid-19 with onset at least seven days after the second dose of the vaccine or placebo injection. However, while it observed 162 cases among the placebo subjects, enough for good randomization, it found only eight cases among the vaccine subjects, nowhere nearly enough for randomization to have done anything to control confounding.
This argument is totally wrong. By the same logic, all randomised trials that report a huge effect should be disbelieved, even a trial of 100,000 patients that found zero deaths in the actively treated group and 100 deaths in the placebo group because, according to Risch, this would not be enough “for randomization to have done anything to control confounding”. This is absurd.
Risch: In nonrandomized trials, the investigators know that many factors may, as possible confounders, influence the occurrence of the outcome, so they measure everything they think relevant, in order to then adjust and control for those factors in the statistical analyses.
This theoretical argument has been rejected in an empirical study, see the ingenious study just above. Furthermore, to base statistical adjustments on what researchers “think relevant” is a highly bias-prone method. One could try to adjust for fewer or more confounders until the data confessed.
Risch: In RCTs, investigators routinely think that the randomization has been successful and thus carry out unadjusted statistical analyses, providing potentially confounded results.
It is the other way around. In RCTs, adjusted analyses, based on the investigators’ arbitrary inclusion of one or more confounders, should be seen as exploratory. In meta-analyses, this is even more important. To avoid bias, meta-analyses should always be based on unadjusted results in the individual trials.
Risch: Trials with small numbers of primary outcome events are useless and should not be published, let alone relied upon for public health or policy considerations.
This is totally wrong. All trial results must be known. First, we have an ethical obligation towards the patients who contribute to science, often at a personal risk, to make all results known. Second, we have a scientific obligation to make all results known. Third, if the trials were done properly, we can increase the number of events by meta-analysis. Particularly in the past, selective publication of trial results was an important reason why meta-analyses of RCTs were often too positive.