Name:
Gordon Guyatt, MD, discusses how to use a subgroup analysis.
Description:
Gordon Guyatt, MD, discusses how to use a subgroup analysis.
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/c7456b19-b760-40bf-867d-b24a7b75d1f9/thumbnails/c7456b19-b760-40bf-867d-b24a7b75d1f9.jpg?sv=2019-02-02&sr=c&sig=pXOdGj9seaAjvtdQxMoIC40%2B69e%2FproZGl0zZSstBdI%3D&st=2022-12-04T13%3A06%3A03Z&se=2022-12-04T17%3A11%3A03Z&sp=r
Duration:
T00H09M57S
Embed URL:
https://stream.cadmore.media/player/c7456b19-b760-40bf-867d-b24a7b75d1f9
Content URL:
https://asa1cadmoremedia.blob.core.windows.net/asset-a75fad75-e4d7-48ae-9637-f4738a89e74f/17563805.mp3
Upload Date:
2022-02-23T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
[ Music ] >> Hello and welcome to this episode of JAMAevidence Users' Guides to the Medical Literature. I'm Ed Livingston, Deputy Editor of JAMA. And today I'll be talking with Dr. Gordon Guyatt about Chapter 25 of the Users' Guides, How to Use a Subgroup Analysis. Dr. Guyatt, can you give me a brief overview of what this is all about? >> Certainly, so clinicians and patients are naturally concerned about the fact that treatments may not work the same on everyone.
So, a trial can conceivably be positive, but there are people who don't benefit. Or a trial could fail overall to show a benefit, and yet there may be a subgroup who benefits. And that's the nature of the concern. Can we look for possible subgroup effects, where only a subgroup of people benefit from the treatment? And if we find them, clearly the people who benefit should receive the treatment and the people who don't benefit should not.
So that's the fundamental nature. So, it's natural and appropriate to look for subgroup effects. But as it turns out, in relative terms, subgroup effects are unusual. And I should distinguish between relative and absolute. So, let's say a treatment reduces bad events by 50%. That could be from 2% to 1%, or 10% to 5%, or 40% to 20%.
So the relative effect would be a 50% relative risk reduction, but the absolute effect would differ across populations: in the lower-risk population, 2% to 1%, a 1% benefit; 10% to 5%, a 5% benefit; or 40% to 20%, a 20% absolute benefit. Well, as it turns out, the way the world works is that these relative effects, as in the example I've just stated of a 50% relative risk reduction, tend to be constant across subgroups.
And it works just as I suggested. The two goes to one, the ten goes to five and the 40 goes to 20. So, relative effects tend to be constant across subgroups, while absolute effects, because people are at different levels of risk, tend to differ across subgroups. So, when we talk about subgroup effects, the interesting subgroup effects are differences in relative effect, given that relative effects tend to be constant across subgroups. Now, as it turns out, relative subgroup effects of any importance are unusual.
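The arithmetic in this passage can be written out as a short sketch. The function name below is hypothetical and used only for illustration; the baseline risks (2%, 10%, 40%) and the 50% relative risk reduction are the figures from the example above.

```python
def absolute_risk_reduction(baseline_risk, relative_risk_reduction):
    """Absolute benefit = baseline risk x relative risk reduction."""
    return baseline_risk * relative_risk_reduction


# The same 50% relative risk reduction applied at three baseline risks.
for baseline in (0.02, 0.10, 0.40):
    arr = absolute_risk_reduction(baseline, 0.50)
    treated = baseline - arr
    print(f"{baseline:.0%} -> {treated:.0%}: absolute benefit {arr:.0%}")
```

The relative effect is identical in every row, while the absolute benefit grows with baseline risk, which is why absolute subgroup differences are everywhere even when the relative effect is constant.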
Relative effects tend to be constant across subgroups, so true relative subgroup effects are unusual. So the first fundamental points are distinguishing, as clinicians need to do, between relative and absolute effects, and the fact that relative subgroup effects are unusual. Absolute subgroup effects, because people are at different baseline risk, are ubiquitous. But sometimes relative subgroup effects may be real. It's unusual but they may be real.
And so we need some criteria to identify the relatively unusual situations in which relative subgroup effects are the real thing. And we've suggested some criteria to do that. And those criteria are, first, the difference between the subgroups. Say we speculate that the subgroup effect is that it works in men but not in women, or a lot less in women. We might then ask, can the difference in the effect between men and women be explained by chance?
There's something called the Test for Interaction that does that. We test the difference between men and women. If we get a low p-value, less than 0.05, or ideally less than 0.01, or even less, that raises our suspicion that there's a true subgroup effect. A p-value comparing men and women of 0.1 or 0.2 makes us think this is not a real subgroup effect. We're in a stronger position if the investigators have said before they looked at the data, "We suspect there is a difference between men and women," a so-called a priori hypothesis.
You make the suggestion before you look. And then it's even stronger if you say, "We suspect the effect may differ between men and women, and we suspect the effect will be bigger in men than women, or will exist in men and not women." And that's the way it turns out. It's also stronger if they haven't made 15 subgroup hypotheses of this sort. So, if men versus women was the only subgroup hypothesis they suggested beforehand, we're in a lot stronger position than if they've made 15 such hypotheses.
And we're also in a stronger position if they can offer a biologic rationale for why the effect is going to be different in men and women. So, bottom lines thus far, we need to distinguish between relative and absolute. Relative subgroup effects are unusual. Absolute are ubiquitous. And we need to have criteria for when we actually might believe there's a true relative subgroup effect. And I suggested some of the key criteria.
>> Can you explain a little bit more about interaction and how it's necessary when doing a subgroup analysis? You covered that fairly quickly, but I think that's a key issue that I don't think people understand very well. >> Yeah, it's a key issue. Well, first of all it might be worth mentioning the synonyms for subgroup analysis. So, let's take the men and women again. So, if the effect is that men really benefit and women don't, we call it a subgroup effect between men and women.
Another term that is used is effect modification. Your sex modifies the effect. If you're a man it works. If you're a woman it doesn't work. So your sex modifies the effect. And the third term we use is interaction. Interaction means things are different across the subgroups. That your sex interacts with the magnitude of the effect so that the effect occurs in men but not in women.
So, let's say that we see a 50% relative risk reduction in men and we don't see any effect in women. Well, just looking at those data we say my goodness, big effect in men and no effect in women. However, what if there are only 20 men and 20 women in the study, a very small sample size? When that happens, that difference between the 50% relative risk reduction in men and nothing in women may well be a chance phenomenon with a very small sample size.
And so whenever you generate a p-value, it implies a null hypothesis. The null hypothesis when we talk about a test for interaction is that there's actually no difference between the men and the women. In other words, if the null hypothesis were true, the apparent difference between that 50% relative risk reduction and no risk reduction at all is a chance phenomenon. And we generate a p-value that addresses that. So let's say we do have that 50% big apparent relative risk reduction in men and none at all in women.
And we test, could this be due to chance? Well, as I said, if there are only 20 men and 20 women, that p-value may be 0.3. In other words, if there was no true difference, a difference at least as large as the one between the 50% relative risk reduction in the men and none in the women would occur simply by chance in 30 out of 100 repetitions of the study. In other words, chance easily explains that difference.
Let's say on the other hand there are 5,000 men and 5,000 women, and we have a 50% relative risk reduction in the men and zero in the women. With a large sample size, that's very unusual to happen simply by chance. We may have a p-value of 1 in 1,000. So that test asks, could the difference between the subgroup of men and the subgroup of women happen by chance? That's called a Test for Interaction.
If there are a lot of men and a lot of women, a lot of people in either subgroup, and a big difference in the relative effect, we might have a very low p-value. If the difference is much smaller, say a 10% relative risk reduction in men and none in women, and the sample size is small, the p-value from our test for interaction will be much higher. A low p-value in the test for interaction makes the subgroup hypothesis much more credible. A higher p-value makes us think, no, this is not the real thing.
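A minimal sketch of how sample size drives the interaction p-value. The discussion does not name a specific statistical method, so this uses one common approach as an assumption: a z-test on the difference between the log relative risks of the two subgroups. The event counts are hypothetical, chosen to give a 50% relative risk reduction in men and none in women, and the function names are illustrative only.

```python
import math


def log_rr_and_se(events_treated, n_treated, events_control, n_control):
    """Log relative risk and its standard error for one subgroup."""
    rr = (events_treated / n_treated) / (events_control / n_control)
    se = math.sqrt(1 / events_treated - 1 / n_treated
                   + 1 / events_control - 1 / n_control)
    return math.log(rr), se


def interaction_p_value(men, women):
    """Two-sided p-value for the difference in log relative risk.

    Null hypothesis: the relative effect is the same in both subgroups.
    Each argument is (events_treated, n_treated, events_control, n_control).
    """
    log_rr_men, se_men = log_rr_and_se(*men)
    log_rr_women, se_women = log_rr_and_se(*women)
    z = (log_rr_men - log_rr_women) / math.sqrt(se_men**2 + se_women**2)
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical small trial: 20 men and 20 women, 10 per arm.
p_small = interaction_p_value(men=(2, 10, 4, 10), women=(4, 10, 4, 10))

# Hypothetical large trial: 5,000 men and 5,000 women, 2,500 per arm.
p_large = interaction_p_value(men=(500, 2500, 1000, 2500),
                              women=(1000, 2500, 1000, 2500))

print(f"small trial interaction p-value: {p_small:.2f}")
print(f"large trial interaction p-value: {p_large:.1e}")
```

With the same apparent effects in both scenarios, the small trial gives a p-value that chance easily explains, while the large trial gives one far below 0.001.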
>> I'm JAMA Deputy Editor Ed Livingston and I've been talking with Dr. Gordon Guyatt about how to use a subgroup analysis. Thanks for listening to this edition of the JAMAevidence podcast. For more podcasts visit us at jamanetworkaudio.com. You can subscribe to our podcast on Stitcher and Apple Podcasts. [ Music ]