Name:
Confidence Intervals: Gordon Guyatt, MD, discusses confidence intervals.
Description:
Confidence Intervals: Gordon Guyatt, MD, discusses confidence intervals.
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/b1fc46fd-6d5c-47d7-91d2-b5b20a333247/thumbnails/b1fc46fd-6d5c-47d7-91d2-b5b20a333247.jpg?sv=2019-02-02&sr=c&sig=qkfonm5tNUiRl%2FfbUW2kca31%2B7Mcq7YXXOESewDWeG4%3D&st=2022-05-27T18%3A47%3A30Z&se=2022-05-27T22%3A52%3A30Z&sp=r
Duration:
T00H15M56S
Embed URL:
https://stream.cadmore.media/player/b1fc46fd-6d5c-47d7-91d2-b5b20a333247
Content URL:
https://asa1cadmoremedia.blob.core.windows.net/asset-eb9ddc39-339b-4907-ab15-5a7ad3e5a9fa/6830496.mp3
Upload Date:
2022-02-23T00:00:00.0000000
Transcript:
Language: EN.
Segment: 0.
>> I'm Joan Stephenson, Editor of JAMA's Medical News and Perspectives Section. Today, I have the pleasure of speaking with Dr. Gordon Guyatt about confidence intervals, a concept that is important in interpreting the results of clinical trials and a topic that he and his co-authors cover in Chapter 8 of Users' Guides to the Medical Literature. Dr. Guyatt, why don't you introduce yourself to our listeners? >> I'm Gordon Guyatt. I'm a distinguished Professor of Medicine at McMaster University, a Clinical Epidemiologist, and clinically I practice hospital internal medicine.
>> Dr. Guyatt, what is a confidence interval? >> Well, in understanding a confidence interval it's good to think about the two major reasons that studies may mislead us. And one is called bias or systematic error. And if the study is not well designed, it may get a result that's systematically different from the truth. The other reason even a very well-designed study may mislead us is chance. When sample sizes are small, things may go quite differently from the truth simply by the play of chance.
And we can understand that in our day-to-day lives by thinking of flipping a coin. So, if I had a coin and I was unsure whether it was a biased or unbiased coin, I could flip it 10 times and if it ended up five heads and five tails would you be certain it was unbiased? The answer is probably no. You would say five and five is not enough. And if I flipped it 10 times and it was eight heads and two tails, would you be sure it is biased?
And the answer is probably no, because simply, if it were unbiased and in the long run you had half heads and half tails on 10 coin flips, it could be eight and two and it's still quite possible that it's unbiased. By the time we flipped it 1,000 times and you had 800 heads and 200 tails, you would be pretty sure that this is a biased coin. So, the first thing to understand is the concept of chance.
Let me give you another example to try and make that concrete again. I have a medical school class of 200 people and I wonder how many men and how many women are in the class. I randomly pick 10 people and they're five men and five women. Are you confident that there are 100 men and 100 women in the class? No, you aren't, because your sample size is too small. So, the first thing when thinking about confidence intervals is that confidence intervals have to do with being concerned about how chance may mislead us with a particular sample chosen.
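The coin-flip intuition can be checked numerically. The following sketch is not part of the interview; it is a minimal illustration, using only Python's standard library, of the exact probability that a fair coin shows results at least as lopsided as the ones Dr. Guyatt describes:

```python
from math import comb

def binom_tail(n, k, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p): the chance of k or more heads."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 8 or more heads in 10 flips of a fair coin happens about 5.5% of the time,
# so 8 heads and 2 tails is only weak evidence of bias.
print(round(binom_tail(10, 8), 4))

# 800 or more heads in 1,000 flips is vanishingly unlikely for a fair coin,
# which is why the larger sample makes us "pretty sure" the coin is biased.
print(binom_tail(1000, 800) < 1e-50)
```

The same calculation applies to the classroom example: a 5-and-5 split in a sample of 10 is entirely compatible with many different class compositions.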
So, a confidence interval usually is around a treatment effect and it has to do with what we call the range of plausible truth. So, let us say I do a study and in relative terms, the treatment reduces the risk by half. So, the control group has a 10% event rate and the treatment group has a 5% event rate, a 50% reduction in relative risk.
With a small sample size, that relative risk reduction may be as small as no relative risk reduction at all or even the possibility of harm or a very large relative risk reduction, say 90%. So, the confidence interval then represents the range of plausible truth. Given the result we've observed, how small might the treatment effect still plausibly be or how large might the treatment effect plausibly be?
With a large sample size, we may still see a 50% relative risk reduction, but now with a very narrow confidence interval that tells us, if the study is unbiased, the truth is somewhere between a 40% and 60% relative risk reduction. So, the bottom line is that the confidence interval deals with our concerns about being misled by random error, or chance, and it represents the range in which the truth plausibly lies.
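The numbers Dr. Guyatt describes can be reproduced with the standard log-scale normal approximation for a relative risk confidence interval; this is a textbook method, not something specified in the interview, and the per-arm sample sizes below are made up for illustration:

```python
from math import exp, log, sqrt

def rr_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Relative risk and its approximate 95% CI (log-scale normal method)."""
    rr = (events_t / n_t) / (events_c / n_c)
    se = sqrt(1/events_t - 1/n_t + 1/events_c - 1/n_c)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)

# Same 10% vs 5% event rates (a 50% relative risk reduction) at two sample sizes.
for n in (200, 20000):  # patients per arm (illustrative numbers)
    rr, lo, hi = rr_ci(round(0.05 * n), n, round(0.10 * n), n)
    print(f"n={n} per arm: RR {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

With 200 patients per arm the interval crosses 1.0, so no effect at all, or even harm, remains plausible; with 20,000 per arm it runs from roughly 0.46 to 0.54, that is, a relative risk reduction between about 46% and 54%.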
>> What role do confidence intervals play in hypothesis testing and how do they help clinicians address the limitations of hypothesis testing? >> So, traditionally, clinical trials often focused on a hypothesis-testing approach, which simply asked the question, can chance explain the results? And in what is actually quite a simplistic approach, we said that if the P value is less than 0.05 the treatment works, and if it is greater than 0.05 the treatment does not work.
Well, there are a number of problems with that but one is that a treatment could work and the effect could be very small or the effect could be very large. And our treatments invariably have undesirable consequences. The undesirable consequences may be side effects, the undesirable consequences may be burden, the undesirable consequences can be cost. And the P-value, significant or non-significant, tells us nothing about the magnitude of the effect and it tells us very little about our confidence in the estimate of the effect and the range in which the truth might plausibly lie.
So, a P value of less than 0.05 might reflect a tiny treatment effect that is not worth it, or it might reflect a very large treatment effect. And it might be one in which we are very confident of the particular estimate, or not at all confident. And moving to a different way of looking at things than hypothesis testing, the way of looking at things where we use confidence intervals is called estimation.
So, we're not trying to say, does it work, does it not work? We're trying to ask what we believe is the much more important question: what is our best estimate of the treatment effect, and what is the range in which the truth plausibly lies? So, a P value of less than 0.05 could mean a 50% relative risk reduction with a confidence interval from 40% to 60%, or a 50% relative risk reduction with a confidence interval whose lower boundary is only a 5% relative risk reduction.
We'd be much more confident about the first than the second. A point estimate of only a 5% relative risk reduction could also give a P value of less than 0.05. So, the estimation approach, where we have the best estimate of the treatment effect and then the range in which the treatment effect may still plausibly lie, which is the confidence interval, is much more informative than the hypothesis-testing mode of thinking.
>> You mentioned sample size earlier. What is the relationship between confidence intervals and sample size? >> Well, the larger the sample size, the narrower the confidence interval. The larger the sample size, the more precise the estimate, the more you can be confident that the truth lies close to the point estimate. And in situations in which we're talking about what we call binary outcomes, yes or no, patients either die or don't die, they have heart attacks or they don't have heart attacks, they either have strokes or don't have strokes, typically, we will need hundreds, or even more typically, thousands of patients before the sample size is sufficiently large that we get a confidence interval that is narrow enough that we can really be comfortable about the magnitude of the effect.
So, the messages are the bigger the sample size, the narrower the confidence interval, the more precise our estimate, the more we are confident that the truth lies close to our point estimate and typically we need with binary outcomes quite large sample sizes either in individual studies or nowadays more likely in systematic reviews and meta-analyses where you pool across studies to get the sample sizes you need for a narrow confidence interval, that is the relationship between sample size and the confidence interval.
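The "bigger sample, narrower interval" message reflects an inverse-square-root relationship, which can be sketched with the usual normal-approximation interval for a single proportion (the event rate and sample sizes below are illustrative, not from the interview):

```python
from math import sqrt

def ci_halfwidth(p, n, z=1.96):
    """Half-width of the normal-approximation 95% CI for a proportion p
    observed in n patients."""
    return z * sqrt(p * (1 - p) / n)

# The half-width shrinks as 1/sqrt(n): quadrupling n halves the interval.
for n in (100, 400, 1600):
    print(f"n={n}: 10% event rate, CI half-width {ci_halfwidth(0.10, n):.4f}")
```

This is why binary outcomes typically demand hundreds or thousands of patients, pooled across studies if necessary, before the interval is narrow enough to act on.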
>> Dr. Guyatt, how can clinicians use confidence intervals to interpret the results of clinical trials? >> The way we suggest using confidence intervals is to think of the upper and lower boundaries of the confidence interval and see what clinical action you would take depending on the boundaries of the confidence interval. Let me use the example of anticoagulant therapy in patients with atrial fibrillation.
Let us say that you have an appreciable risk of bleeding as a result of use of anticoagulants and you are concerned whether the reduction in risk of stroke is sufficiently large. And let's say in the course of a year you have a set of trials and the pooled estimate is a 3% reduction in stroke. So, in 100 people, three people who would have had a stroke will not have a stroke because of your anticoagulation. Well, even with appreciable bleeding risk, people are sufficiently stroke averse that they would say that at 3% it is probably worth using the anticoagulation.
And then, you would ask, would you still do it if it was a 2% reduction in stroke? Well, maybe yes. What about a 1% reduction in stroke? Well, maybe not. So, say your point estimate was a 3% reduction in stroke; you need to look at the confidence interval. The confidence interval is from 2% to 4%. At 2%, would the patient still be interested in the anticoagulant therapy? The answer is yes; your confidence interval is narrow enough.
Let's say the confidence interval instead extends down to only a 0.5% reduction in risk. In that case, of 200 people that you treat, you would prevent only one stroke. If that is the case, and the patient under those circumstances would say no thank you, then the confidence interval is too wide. So, what you need to do with an effective treatment is look at the lower boundary of that confidence interval and say, if that lower boundary were true, would I still be interested, and would my patient still be interested, in using the treatment given its undesirable consequences?
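This boundary check can be restated in terms of the number needed to treat, the reciprocal of the absolute risk reduction. A minimal sketch, using the 3%, 2%, and 0.5% figures from the anticoagulation example:

```python
def nnt(arr):
    """Number needed to treat: patients treated to prevent one event,
    given an absolute risk reduction arr (e.g., 0.03 for 3%)."""
    return 1 / arr

# What each boundary of the confidence interval would imply in practice.
for label, arr in [("point estimate", 0.03),
                   ("lower boundary, narrow CI", 0.02),
                   ("lower boundary, wide CI", 0.005)]:
    print(f"{label}: ARR {arr:.1%} -> treat {nnt(arr):.0f} patients "
          f"to prevent one stroke")
```

At a 0.5% absolute reduction, 200 patients must be anticoagulated to prevent one stroke, which is exactly why a patient facing an appreciable bleeding risk might decline.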
>> How does the use of confidence intervals vary when interpreting a positive trial versus a negative trial? >> The example that I just gave you was the positive trial. So, in the positive trial, you have the best estimate of the effect, and let's assume that it is big enough that you would use the treatment if that were the truth where the patient would be interested in the treatment given its undesirable consequences if that were the true benefit. You then look at the boundary of the confidence interval closest to no effect and you say, if that were the truth, that lower boundary of the confidence interval, the range of plausible truth, would I still be willing to use the treatment?
Would my patient still be interested in the treatment? And if the answer is yes, then the confidence interval was narrow enough. If the answer is no, then one is less certain and that may influence the patient's decision making. What about a negative study that failed to show a difference between treatment and control? Well, let's assume that the point estimate is no effect at all and the question then is can we conclude that this treatment is certainly useless and we needn't study it anymore.
Now, the boundary of the confidence interval that is important is the boundary that represents the largest plausible effect. With a very large study, that boundary of the confidence interval that represents the largest plausible effect may be a 1% reduction in the bad events you're trying to prevent and that you might consider as trivial in which case the study has excluded important benefit and we needn't do any more trials.
If, on the other hand, the boundary of the confidence interval consistent with the biggest effect that might still be true is 5%, and that was important to people, then the study has not excluded an important treatment effect and we need further studies before we exclude it as a possibly beneficial treatment. >> Is there anything else you would like to tell our listeners about confidence intervals? >> Only to emphasize that when you are thinking of chance and statistical power, people very easily, and understandably, get confused by sample size calculations and things like alpha and beta error.
Fortunately, as clinicians, we don't have to worry about any of those things. All we need to do is to understand the concepts that I've tried to lay out here about confidence intervals. Remember that they represent the boundaries, the range of plausible truth, and think, if the point estimate were true, what would the best management of the patient be? If one end of the confidence interval were true, what might the best management be?
If the other end of the confidence interval were true, then what would be the best management? And if both boundaries of the confidence interval and the point estimate all lead to the same management, one can be confident that the trial enrolled a sufficient number of patients. If clinical action would differ at the boundaries of the confidence interval, we then are considerably less certain and that may influence what the best course of action is for the patient before us.
>> Thank you, Dr. Guyatt, for this overview of confidence intervals. For additional information about this topic, JAMAevidence subscribers can consult Chapter 8 of Users' Guides to the Medical Literature. This has been Joan Stephenson of JAMA talking with Dr. Gordon Guyatt for JAMAevidence.