Name:
William J. Meurer, MD, MS, discusses cluster randomized trials and evaluating treatments applied to groups.
Description:
William J. Meurer, MD, MS, discusses cluster randomized trials and evaluating treatments applied to groups.
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/15625957-65b7-4404-ab19-69fa1e45c5d2/thumbnails/15625957-65b7-4404-ab19-69fa1e45c5d2.jpg?sv=2019-02-02&sr=c&sig=3jpZXKs9%2FhqMsT60jLPxZ6BfB0H2Y%2FckCu%2FxASdl55c%3D&st=2025-01-02T23%3A48%3A40Z&se=2025-01-03T03%3A53%3A40Z&sp=r
Duration:
T00H23M39S
Embed URL:
https://stream.cadmore.media/player/15625957-65b7-4404-ab19-69fa1e45c5d2
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/15625957-65b7-4404-ab19-69fa1e45c5d2/18471695.mp3?sv=2019-02-02&sr=c&sig=DDwXpfbkv89dudsjgT8jSMOJmaAugA0hHexEPaJhuNQ%3D&st=2025-01-02T23%3A48%3A40Z&se=2025-01-03T01%3A53%3A40Z&sp=r
Upload Date:
2022-02-28T00:00:00.0000000
Transcript:
Language: EN.
Segment: 0.
[ Music ] >> Hello and welcome to this episode of JAMAevidence. I'm Ed Livingston, Deputy Editor for Clinical Reviews in Education at JAMA. Today I'm joined by Dr. Will Meurer, Associate Professor of Emergency Medicine and Neurology at the University of Michigan. You've written a chapter in the JAMA Guide to Statistics and Methods on cluster randomized trials. So could we start by having you tell us why someone would do a cluster randomized trial? >> If you're looking at interventions that cross healthcare systems, like health services interventions, it can be very hard to allocate those things differently from patient to patient.
And when you're looking at these types of interventions, something that changes physician practice, like perhaps getting emergency physicians to change how they take care of patients with dizziness, you need to give the intervention to groups. And it's hard for those groups to unsee what they saw. By using a cluster design, you give that intervention to some groups, and you find some similar groups to which you don't give that intervention. And you look at what change in outcome you might have induced in that clinical trial design.
>> So you analyze groups together and compare one group to another, and that violates one of the more important assumptions in regression analysis, which is independence of the study subjects. So could you first explain to us why independence matters in regression analysis, and then how you get around that when you do cluster randomized trial analyses? >> I would be glad to. One of the hallmarks of statistics, as you pointed out, is that we think that each observation is independent of the others.
But if we do any sort of study where the data are clustered, whether it's a randomized trial or even an analysis of mortality across hospitals, it is likely that all of the patients within that hospital or that group are more alike, and therefore that fundamental assumption is violated. At one extreme, maybe there isn't really that much correlation. For example, if you're studying something like different flu vaccinations, the similarities between patients in terms of their resistance to the flu could be quite small.
Whereas you could think of a practice like how seizures are treated in ambulances, where all of the paramedics are following a very specific protocol for how to treat seizures, and as such almost every observation from each ambulance agency is exactly the same in terms of what treatment is being used. So it's a spectrum between that assumption of independence, where each observation is uncorrelated, and the other end of that spectrum, where everything in that group is being done the same way.
And in practice, most of the time it's somewhere in the middle of those two things. The closer it is to independence, the less of an impact on your overall sample size; the closer it is to absolutely uniform behavior within a group, the more your effective sample size is just the number of groups, and the number of individuals in each group is somewhat less important. That's a key concept in any type of statistical analysis: if the correlation across the subjects in a group is high, it is going to reduce your effective sample size and decrease the amount of precision you'll have in those results.
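To make that concrete, here is a minimal Python sketch (all numbers invented for illustration, not taken from any trial) of how a naive standard error that pretends clustered observations are independent understates the real uncertainty:

```python
import numpy as np

# 20 groups of 100 subjects each; members of a group share a common
# group-level effect, so observations within a group are correlated.
rng = np.random.default_rng(0)
n_groups, n_per_group = 20, 100
group_effects = rng.normal(0, 0.5, n_groups)            # shared within each group
y = np.concatenate([g + rng.normal(0, 1.0, n_per_group) for g in group_effects])

# Naive SE treats all 2,000 observations as independent.
naive_se = y.std(ddof=1) / np.sqrt(y.size)

# Cluster-aware SE treats each group mean as one observation.
cluster_means = y.reshape(n_groups, n_per_group).mean(axis=1)
cluster_se = cluster_means.std(ddof=1) / np.sqrt(n_groups)

print(f"naive SE: {naive_se:.4f}  cluster-aware SE: {cluster_se:.4f}")
# The cluster-aware SE comes out several times larger: correlated
# subjects carry less independent information, so the effective
# sample size is smaller than the 2,000 individual observations.
```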
>> Now, the amount of correlation within these groups is quantified by an entity known as the intracluster correlation coefficient, or the ICC. Could you explain that to us? >> So for cluster randomized trials, in contrast to a regular trial, where you may calculate your sample size from some hypothesized difference across groups, or a difference of means and their standard deviations, it's important to also think about adding, we'll call it, an inflation factor for the degree of correlation among subjects within a group.
And sometimes you have preliminary data that could suggest how much correlation there will be within groups. Sometimes you don't, and you can look to the literature to find an amount that you would deem reasonable. The ICC is basically the proportion of the variability that you're seeing that can be explained by that within-group correlation. So if you were taking blood pressure measurements and there was a specific practice that had really good control of blood pressure, you would notice that overall that group had lower blood pressure, and those readings were more highly correlated because they are within a tighter window, and that would give a higher intracluster correlation coefficient.
Whereas if you took a bunch of blood pressures from a group practice where patients maybe have less access to blood pressure medications, their blood pressures may vary quite widely, and as such you may have a different ICC if that's what your clusters look like. And obviously there may be differences across clusters, but you make a study-wide assumption as to how much within-group correlation there is going to be across the trial.
And if some clusters are a little bit more correlated and some are a little bit less correlated, that typically isn't a huge deal as long as you have the magnitude right. And oftentimes this magnitude is not extremely large; it may be that somewhere between 1% and 10% of what we would think of as the overall variability, the spread of the data, could be attributed to the within-group correlation, the ICC. >> Let me ask the question in a slightly different way. Let's say you were doing a study of how you clean ICU beds, using different techniques to reduce infection rates in ICUs.
Obviously, that's a setup for a cluster study, because it's not very practical to tell your custodial staff, "clean one bed this way and another bed the other way"; they're just going to do what they think is best. So you want to take the custodial staff for an entire ICU and say, "you do it this way," and then maybe at some other ICU or some other institution, "you do it the other way." And then that's your cluster. So let's say you have 20 of these and you put 10 in each group, and each hospital contributes 100 patients per ICU.
So you have 2,000 patients and 20 ICUs. The power is based on what? The number of patients or the number of ICUs? >> So a really good way of thinking of it -- it's somewhere in between. If the ICC was one, then you would have power based on 20 observations, which would mean that the infection rate for each individual patient could be determined by knowing which ICU they are in. If your ICC was near zero, you would have an effective sample size of nearly 2,000.
And most of the time the impact is much closer to the 2,000. So you could expect that perhaps your effective sample size would be somewhere in the range of 1,800. You're taking a small hit to your effective sample size, you're increasing the variability in your outcome measurement, and your confidence interval for the proportion infected will get a little wider, but the effective sample size will be much closer to the number of patients in the vast majority of cases.
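Under the standard design-effect formula, the effective sample size is n_eff = N / (1 + (m - 1) * ICC), where m is the cluster size. A short Python sketch of the hypothetical 20-ICU example, sweeping some illustrative ICC values:

```python
# 20 ICUs, 100 patients each, as in the hypothetical example above.
n_clusters, m = 20, 100
N = n_clusters * m

for icc in (0.0, 0.001, 0.01, 0.05, 1.0):
    design_effect = 1 + (m - 1) * icc
    n_eff = N / design_effect
    print(f"ICC = {icc:<6}  design effect = {design_effect:6.2f}  effective n = {n_eff:7.1f}")

# ICC = 0 keeps the full 2,000; ICC = 1 collapses to the 20 ICUs.
# An ICC around 0.001 gives roughly the 1,800 mentioned above, while
# even a modest ICC of 0.01 cuts the effective sample size in half.
```

>> How is the intracluster correlation coefficient calculated?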
>> When you do a type of regression model that fits multiple levels, it is a way of figuring out how much variance is coming from between-subject differences and how much is rolled up and can be explained by each individual hospital. In a study looking at a binary outcome like infection rate, you could think of each of those hospitals as having its own sort of baseline infection rate that might be a little different. Some hospitals may have different types of patients, with factors that put them at more risk for infection.
The question is how much of the infection rate seems to be attributable to the hospitals themselves and how much seems to be attributable to the patients, how much variability there is at the patient level. Is it really noisy, in that patients are all over the place unpredictably, in which case you would have a lower ICC? Or are those patients correlating quite reliably? Sometimes this is a little harder to understand for binary outcomes, like did you have the infection or not. But if you think about measuring blood pressures, we could think of a set of blood pressures for a hospital and its standard deviation.
Each hospital's mean is interesting, but the width of the standard deviation at each hospital would give you a sense of how tightly correlated those blood pressures are within that hospital. So if you saw one hospital where the blood pressures range from 120 to 130, in that sort of setting the ICC would be higher, because the hospital mean is explaining more of the variability. Whereas if the hospital had blood pressures that were really spread out, between 90 and 170, then there would be more patient-to-patient variability even within that hospital, and likely the hospital is explaining less of that variability in blood pressures.
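One common way to estimate the ICC is from the variance components of a random-intercept mixed model, as just described: the between-hospital variance divided by the total variance. A sketch in Python using statsmodels, with simulated blood pressures (every number here is made up):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate 10 hospitals, 50 patients each: every hospital has its own
# mean blood pressure, plus patient-to-patient noise within it.
rng = np.random.default_rng(1)
hospitals = np.repeat(np.arange(10), 50)
hospital_means = rng.normal(130, 8, 10)                 # between-hospital spread
bp = hospital_means[hospitals] + rng.normal(0, 12, hospitals.size)
df = pd.DataFrame({"hospital": hospitals, "bp": bp})

# Random-intercept model: total variance splits into a hospital
# component (cov_re) and a residual patient-level component (scale).
result = smf.mixedlm("bp ~ 1", df, groups=df["hospital"]).fit()
var_hospital = float(result.cov_re.iloc[0, 0])
var_patient = result.scale
icc = var_hospital / (var_hospital + var_patient)
print(f"estimated ICC: {icc:.3f}")   # simulated truth: 64 / (64 + 144) = 0.31
```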
>> So when one of these studies is designed and you're doing the power calculation to determine the sample size, you need to have an estimate of the expected outcome, its variability, and also some guess at what the intracluster correlation coefficient is going to be? Is that how that's done? >> Exactly. As a general high-level overview, if it's a continuous outcome, you have the mean and its standard deviation. If it's a binary outcome, you just have the proportion, but the amount of variability around that proportion varies based on where you are on the probability scale.
If you're looking for a difference between 2% and 4% mortality, that has a different general variability than if you're looking for a difference between 50% and 52% mortality. It's much harder to find a difference from 50 to 52 than from 2 to 4. That's something that sometimes people get a little confused about. When you learn in statistics how to come up with a sample size, it's important to have the mean and standard deviation. But in medicine we often use binary outcomes, like alive or dead, and the standard deviation seems to go away.
But it hasn't exactly gone away; it just depends on where you are on the probability scale. It gets harder and harder to find differences when your outcome rate is around 50%, and easier and easier to find differences as you get toward the boundaries.
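This follows from the binomial variance p(1 - p), which peaks at p = 0.5. A quick Python sketch of the two hypothetical comparisons above, using statsmodels for a conventional two-group sample size (a cluster design effect would inflate these numbers further):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Per-group n for 80% power at alpha = 0.05, individual randomization.
power = NormalIndPower()
for p1, p2 in [(0.02, 0.04), (0.50, 0.52)]:
    h = abs(proportion_effectsize(p1, p2))   # Cohen's h effect size
    n = power.solve_power(effect_size=h, alpha=0.05, power=0.8)
    print(f"{p1:.0%} vs {p2:.0%}: about {n:,.0f} patients per group")
# The same 2-point difference needs far more patients near 50%
# than near the boundary of the probability scale.
```

>> In your article, you talk about the various caveats when looking at cluster randomized trials. And one of them is contamination. Can you explain contamination of a trial to us and how that affects cluster randomized trials?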
>> Definitely. If your intervention leaks out -- going back to the example of ways to clean ICU rooms, perhaps there's a lot of turnover in the environmental services staff at your hospital, and they go over to another hospital. It's hard for those people who have gone from one hospital to the other to unlearn what they have learned. So if they have seen a better technique for cleaning rooms, it would be hard for them to come to that new hospital and say, "You know, gosh, I want to just go back to what I was doing before," once they've learned something that has been sold to them as better.
One form of contamination would be if the people you have intervened on move to a control hospital. We did a cluster randomized trial of emergency physician behavior in treating stroke patients with TPA, and we did observe and quantify how often physicians migrated from one hospital to another. It wasn't zero, but it wasn't a ton. But the more of that happens, the more your two groups are going to look similar.
And it's harder and harder to find a difference if your groups end up being similar. So if the control group has more people who have been intervened on leaking into that cluster, then there's less ability to detect a difference. It's not dissimilar from, say, a trial of a device versus medical management for stroke prevention where there's a lot of crossover to the surgical arm, because providers outside the study feel that they need to do something additional for their patients afterwards.
In that same way, it gets harder and harder to show a difference between the surgical arm and the medical arm if there is a lot of crossover. So it's a very similar concept to crossover. >> If you have a cluster randomized trial and there's crossover, does this affect an intention-to-treat analysis in any way? >> It is something that you definitely want to try to measure and account for. Significant crossover should not necessarily affect the analysis itself, but if what was crossing over was effective, it should attenuate the observed effects under your intention-to-treat analysis, in that the groups will be looking more similar.
And if there was separation across those groups, there should be less separation after the crossover in the intention-to-treat analysis. >> Anything else you think we should cover? >> I think one of the other situations where it can be helpful to use cluster randomized trials is where it would be somewhat impractical to consent patients. If you're looking at, say, different ways for physicians to use the electronic health record that aren't really impacting patients directly, it might be hard to do individual patient-level randomization on that.
You'd have to say, "Oh, on my next patient you might have to wait longer while I use this other charting method." It could be hard to get at certain answers that way. But in certain cases, if outcomes are being measured within the patients after the intervention or across these hospitals, individual patient-level consent for following those outcomes may still be needed. And in the Restore trial that was in JAMA, which was referenced in the cluster randomized trial chapter, that was the case: they were looking at kids in pediatric ICUs and sedation protocols intended to limit the number of days on ventilators.
The interesting thing about that was that it added a quirk to the analysis. In the hospitals that were simply following their routine care, it was simpler to get consent from patients, because the study was just seeing how long they were on the ventilator. Whereas in the Restore hospitals that had been in the intervention group, there was a more detailed consent process, because they were using this new standardized protocol for sedation and ventilator liberation.
And as such, they actually had to include some extra hospitals in their intervention group, because they knew it would be harder to recruit patients in those hospitals; there was a bigger ask. In the end, they found some interesting differences on some of their secondary outcomes, but they didn't find a difference in the main outcome in that trial. And those sorts of things can have real practical implications when you're designing a cluster randomized trial, because every single thing that you do differently between the treatment and control hospitals can have an impact.
And in this case, having slightly different consent processes somewhat changed the willingness and number of patients who were able to be in that trial. Another thing to consider is that it often can be logistically easier in certain respects to do a cluster randomized trial, but there may be logistical challenges that you're not anticipating. >> That was a great point. Anything else you think we should cover? >> I think one thing that is not well addressed by simple cluster designs is the possibility of an underlying secular trend.
In our cluster randomized trial, Instinct, for which Philip Scott was the PI, looking at TPA treatment in hospitals across Michigan, we did a straight-up cluster randomized trial: the intervention hospitals all crossed over at basically the same time, or very similar times, and some hospitals were not intervened upon. The hospitals that we intervened on tended to get better at using TPA, but the control hospitals got better too, because there was an external secular trend: Medicare started paying hospitals more to treat patients with TPA under DRG-559.
Because both sets of hospitals had this external pressure, TPA treatment was going up in both of our groups, and we weren't anticipating that when we designed the trial. One design that is a derivative of cluster randomized trials and can help mitigate this somewhat is the stepped wedge design, where there are different crossover periods at each hospital. Some hospitals may be in the control group for almost all of the time and contribute more observations in the pre-intervention period, and some hospitals may be intervened on very close to the beginning.
And while this does not completely eliminate the problems that you might see with a secular trend, it helps you better estimate it, because you have some hospitals that were in the control condition for a longer period of time. That can help, because there are a lot of good reasons in healthcare why we may be getting better in certain areas. And if those external things are going on, it limits our ability to learn from the cluster randomized trial whether our intervention is what's likely causing the change.
There tend to be a few more of those questions than when you use individual patient-level randomization. One way to address the secular trend can be the use of a stepped wedge design. >> So we have a whole other chapter on the stepped wedge design. But could you tell us briefly how you analyze such a trial? >> Sure. In many respects it's quite similar to the cluster design, in that you have to account for the within-cluster variability. In addition, each site is going to have a before and an after period.
It's a standard regression-type analysis, and there needs to be an indicator variable for before and after the intervention was given. And sometimes you build in a period where you don't count the observations, a peri-intervention period, while the hospital is in the process of crossing over. And depending on whether it's a binary outcome or a continuous outcome, a variety of different techniques can be used to account for the underlying secular trend.
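A minimal Python sketch of that kind of model, with a toy stepped wedge (hospitals crossing over at staggered periods, all numbers invented): a random intercept per hospital handles the within-cluster correlation, a treatment indicator marks before versus after crossover, and a period term absorbs a shared secular trend.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
rows = []
for hosp in range(6):                     # 6 hospitals
    switch = hosp + 1                     # staggered crossover period
    for period in range(8):               # 8 time periods
        treated = int(period >= switch)   # before/after indicator
        for _ in range(30):               # 30 patients per period
            y = (0.2 * period             # shared secular trend
                 + 1.0 * treated          # true intervention effect
                 + 0.3 * hosp             # hospital-level difference
                 + rng.normal(0, 1))      # patient-level noise
            rows.append({"hospital": hosp, "period": period,
                         "treated": treated, "y": y})
df = pd.DataFrame(rows)

# Random intercept per hospital; fixed effects separate the secular
# trend (period) from the jump at crossover (treated).
fit = smf.mixedlm("y ~ treated + period", df, groups=df["hospital"]).fit()
print(fit.params[["treated", "period"]])  # recovers roughly 1.0 and 0.2
```

In a real analysis the period term is often modeled as categorical rather than linear; this is only a sketch of the general idea.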
How many clusters you have in the stepped wedge, and how many observations there are within each cluster, gives you some sense of how much of a secular trend you might be able to detect. But knowing that you are expecting a difference between the pre- and post-periods, the idea in accounting for the secular trend is to see whether the slope of the line was already rising before the switch. If you think of what a graphical analysis might look like, imagine a straight upward line before and after.
That may show a positive pre/post difference, but what you would really expect to see if your intervention was effective, and there wasn't a secular trend, would be a relatively flat line, a steep jump to a new baseline, and then another flat line in the new state, where the post-intervention sites have changed to the better, or new, practice, as opposed to a smooth, straight upward line, which might represent more of a secular trend. The insides and innards of these statistical models can be complicated and obviously ought to be worked out in conjunction with experts in that methodology.
But in general, that's what you're trying to address with the stepped wedge: if there was a constant upslope, can you detect that, and can you differentiate how much that upslope was accelerated by what you did at those sites? >> That's a good explanation. >> I would say one other thing. If you're looking at a cluster randomized trial, or thinking about planning one, usually you don't have the resources to have hundreds of clusters. So one thing that can be important and often is good practice, and this is what they did in both the Instinct trial I referenced before and the Restore trial, is to group your hospitals into similar types.
In Instinct, we grouped our hospitals into small and big hospitals, and hospitals in rural areas and hospitals in urban areas. And when we were randomizing to intervention versus control, we picked within these matched sets of hospitals that had similar characteristics. That way we didn't get an imbalance; if the half of the hospitals in our intervention group had included a lot more urban hospitals, that would have introduced bias into our study.
Similarly, in the Restore trial they divided the hospitals into three groups of small, medium, and large pediatric ICUs, and then within each of those groups they basically randomized half of them to each arm. This is not dissimilar to the idea of doing stratified randomization at the patient level in an individual-level randomized controlled trial, like, say, stratifying by gender. This way you get more balance between your clusters, which can help take away the possibility that inherent differences across the hospitals were an explanation for your finding.
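A minimal Python sketch of that kind of stratified (matched) cluster randomization, with invented hospital names and size strata:

```python
import random

# Strata of similar hospitals; within each stratum, assign half of the
# hospitals to intervention and half to control.
strata = {
    "small":  ["A", "B", "C", "D"],
    "medium": ["E", "F", "G", "H"],
    "large":  ["I", "J", "K", "L"],
}

rng = random.Random(42)                   # seeded for reproducibility
assignment = {}
for stratum, hospitals in strata.items():
    shuffled = hospitals[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    for h in shuffled[:half]:
        assignment[h] = "intervention"
    for h in shuffled[half:]:
        assignment[h] = "control"

print(assignment)  # arms are balanced within every size stratum
```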
>> [Music beginning in background] That's a good point also. Dr. Meurer, thank you so much for talking with us today. More information about the JAMA Guide to Statistics and Methods is available on our website, jamaevidence.com. There you'll find a complete array of materials that will help you understand the medical literature. There's also a series of educational guides for all the content found on jamaevidence.com. Once again, I'm Ed Livingston, Deputy Editor for Clinical Reviews in Education at JAMA and coauthor of the book, the JAMA Guide to Statistics and Methods.
I'll be back with you soon for another episode of JAMAevidence. For more podcasts, you can visit us at jamanetworkaudio.com and you can subscribe to our podcast on Stitcher, Apple Podcasts, or wherever you get your podcasts. Thanks for listening.