This post will look at adaptively randomized trial designs. In particular, we want to focus on multi-arm trials, i.e. trials of more than two treatments. The aim is to drop the less effective treatments quickly so the trial can focus on determining which of the better treatments is best.
We’ll briefly review our approach to adaptive randomization but not go into much detail. For a more thorough introduction, see, for example, this report.
Why adaptive randomization
Adaptive randomization designs allow the randomization probabilities to change in response to accumulated outcome data so that more subjects are assigned to (what appear to be) more effective treatments. They also allow for continuous monitoring, so one can stop a trial early if we’re sufficiently confident that we’ve found the best treatment.
Adapting randomization probabilities
Of course we don’t know what treatments are more effective or else we wouldn’t be running a clinical trial. We have an idea based on the data seen so far at any point in the trial, and that assessment may change as more data become available.
We could simply put each patient on what appears to be the best arm (the “play-the-winner” strategy), but this would forgo the benefits of randomization. Instead we compromise, continuing to randomize, but increasing the randomization probability for what appears to be the best treatment at the time a subject enters the trial.
Continuous monitoring
By monitoring the performance of each treatment arm, we can drop a poorly performing arm and assign subjects to the better treatment arms. This is particularly important for multi-arm trials. We want to weed out the poor treatments quickly so we can focus on the more promising treatments.
Continuous monitoring opens the possibility of stopping trials early if there is a clear winner. If all treatments perform similarly, more patients will be used. The maximum number of patients will be enrolled only if necessary.
Multi-arm trials
Randomizing an equal number of patients to each of several treatment arms would require a lot of subjects. A multi-arm adaptive trail turns into a two-arm trial once the other arms are dropped. We’ll present simulation results below that demonstrate this.
Running a big trial with several treatment arms could be more cost effective than running several smaller trials because there is a certain fixed cost associated with running any trial, no matter how small: protocol review, IRB approval, etc.
There has been some skepticism whether two-arm adaptively randomized trials live up to their hype. Trial design is a multi-objective optimization problem and it’s easy to claim victory by doing better by one criteria while doing worse by another. In my opinion, adaptive randomization is more promising for multi-arm trials than two-arm trials.
In my experience, multi-arm trials benefit more from early stopping than from adapting randomization probabilities. That is, one may treat more patients effectively by randomizing equally but dropping poorly performing treatments. Instead of reducing the probability of assigning patients to a poor treatment arm, continue to randomize equally so you can more quickly gather enough evidence to drop the arm.
I initially thought that gradually decreasing the randomization probability of a poorly performing arm would be better than keeping the randomization probability equal until it suddenly drops to zero. But experience suggests this intuition was wrong.
Simulation study
I designed a 4-arm trial with a uniform prior on the probability of response on each arm. The maximum accrual was set to 400 patients.
An arm is suspended if the posterior probability of it being best drops below 0.05. (Note: A suspended arm is not necessarily closed. It may become active again in response to more data.)
Subjects are randomized equally to all available arms. If only one arm is available, the trial stops. Each trial was simulated 1,000 times.
In the first scenario, I assume the true probabilities of successful response on the treatment arms are 0.3, 0.4, 0.5, and 0.6 respectively. The treatment arm with 30% response was dropped early in 99.5% of the simulations, and on average only 12.8 patients were assigned to this treatment.
|-----+----------+----------------+----------| | Arm | Response | Pr(early stop) | Patients | |-----+----------+----------------+----------| | 1 | 0.3 | 0.995 | 12.8 | | 2 | 0.4 | 0.968 | 25.7 | | 3 | 0.5 | 0.754 | 60.8 | | 4 | 0.6 | 0.086 | 93.9 | |-----+----------+----------------+----------|
An average of 193.2 patients were used out of the maximum accrual of 400. Note that 80% of the subjects were allocated to the two best treatments.
Here are the results for the second scenario. Note that in this scenario there are two bad treatments and two good treatments. As we’d hope, the two bad treatments are dropped early and the trial concentrates on deciding the better of the two good treatment.
|-----+----------+----------------+----------| | Arm | Response | Pr(early stop) | Patients | |-----+----------+----------------+----------| | 1 | 0.35 | 0.999 | 11.7 | | 2 | 0.45 | 0.975 | 22.5 | | 3 | 0.60 | 0.502 | 85.6 | | 4 | 0.65 | 0.142 | 111.0 | |-----+----------+----------------+----------|
An average of 230.8 patients were used out of the maximum accrual of 400. Now 85% of patients were assigned to the two best treatments. More patients were used in this scenario because the two best treatments were harder to tell apart.