Name:
Lynne Stokes, PhD, discusses sample size calculation for a hypothesis test.
Description:
Lynne Stokes, PhD, discusses sample size calculation for a hypothesis test.
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/2566dd63-2675-4d7a-af26-1fd69e7273b4/thumbnails/2566dd63-2675-4d7a-af26-1fd69e7273b4.jpg?sv=2019-02-02&sr=c&sig=o3t7fO6unaw46HQJtHyjTyB2oTVR5CaZL5AfB8SoaBk%3D&st=2022-05-27T17%3A43%3A09Z&se=2022-05-27T21%3A48%3A09Z&sp=r
Duration:
T00H12M13S
Embed URL:
https://stream.cadmore.media/player/2566dd63-2675-4d7a-af26-1fd69e7273b4
Content URL:
https://asa1cadmoremedia.blob.core.windows.net/asset-a802590a-2d12-4c63-aa8b-9af426fdaba0/18572398.mp3
Upload Date:
2022-02-23T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
>> One of the more common reasons papers get rejected from JAMA is because of an inadequate approach to determining the necessary sample size for a study. Sample size determination relies on the minimal clinically important difference, which is the smallest possible difference between groups that's observed as a result of some treatment that one would consider clinically important. Unfortunately, some investigators do a poor job of crafting the minimal clinically important difference, making interpretation of their study results problematic.
Sample size determination is dependent on the minimal clinically important difference, and the topic of sample size determination was covered in JAMAevidence by Dr. Lynne Stokes, Professor of Statistical Science at Southern Methodist University, and starting next year, Dr. Stokes will be the Director of the Data Science Institute at SMU. [ Music ] You wrote an article in 2014 for JAMA entitled, Sample Size Calculation for a Hypothesis Test. So this is all about powering studies to determine a sample size, so could you tell us why it's important to perform a power analysis before conducting a study?
>> Yes. One of the most common questions I get from people who are requesting help about statistical issues is sample size because it always seems a puzzle to people. The power can help you decide how big a sample you need in order to have a good chance of seeing a result, if it exists. And the main reason you want to do that is you don't want to waste your resources in conducting an experiment where you don't have a chance of seeing a difference or an improvement in your intervention.
And likewise, you don't want to waste your resources by using a larger sample size than you need to reach a conclusion. So you can make mistakes in both directions, having too large a sample size or too small a sample size. And a power analysis helps you eliminate or at least it helps you to reduce the chance of making either one of those mistakes.
>> How do you go about doing a power analysis?
>> You need to know a fair amount about the expected results.
And so that's always a little bit tricky because the reason you're doing an experiment is because you don't know all the details. But you need to have some knowledge about how large a difference you expect your intervention to make, or maybe not expect it to make, but how large a difference your intervention would make for it to be important. So that's one of the harder things for a statistician to do by themselves because they don't know the science, and that's where the scientist has to provide insight.
So the smaller the size of the difference you're trying to detect, the larger the sample size you need to detect it. And so you will need to have some guess as to how large a difference is an important difference. So that's referred to as a minimum detectable difference. And whenever you start to make decisions about sample sizes, you will need to have some idea, either based on science or based on practice, how big a difference would be an important enough difference to be detected.
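[Editorial illustration, not part of the recording.] The inverse relationship described here, where a smaller difference calls for a larger sample, can be sketched with the usual normal-approximation formula for comparing two means. The function name `n_per_group`, the standard deviation of 10, and the candidate differences are assumptions chosen for illustration; the two-sided significance level of 0.05 and the power of 0.80 are common defaults, not values given by the speaker.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate patients per group for a two-sided, two-sample comparison of means.

    delta: minimum detectable difference between group means (assumed)
    sigma: between-patient standard deviation, taken as equal in both groups (assumed)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Halving the difference to be detected roughly quadruples the required sample.
for delta in (10, 5, 2.5):
    print(delta, n_per_group(delta, sigma=10))
```

Because the difference enters the formula squared, each halving of the minimum detectable difference multiplies the required sample size by about four. This is a large-sample approximation; dedicated software based on the t distribution will give slightly larger numbers.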
>> What else do you need to know in order to perform a power analysis?
>> You need to know a power analysis is connected to a specific question. So if you have a study that has several different endpoints, you'll have to decide which one is the primary endpoint and what kind of characteristic you're trying to detect or measure. So if you want to estimate, for example, the proportion that respond to a particular study, then it's one kind of power analysis.
If you want to estimate the average survival rate, for example, or an average, that's another kind of power analysis. So you'll need to know exactly what it is -- what the hypothesis is that you're trying to test. You'll need to know the minimum detectable difference. And if you're trying to measure an average, a difference in means from a new intervention, for example, you will also need to have some information about how much variability between patients you expect to see in your population.
You're also -- you may be asked -- so a power is about a probability of being able to detect a difference. So you will also be asked, what is a probability that's acceptable to you? So power is the probability of detecting a difference if a difference of a certain size, the minimum detectable difference, exists, and so you'll select the sample size to give you a specified power.
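[Editorial illustration, not part of the recording.] For an endpoint that is a proportion, the same ingredients (the hypothesis, the minimum detectable difference, and the desired power) feed a different formula, because the variability is determined by the proportions themselves. The sketch below uses a simple normal approximation for comparing two independent proportions; the function name `n_per_group_proportions` and the response rates of 20% and 30% are hypothetical values, not figures from the interview.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients per group for a two-sided comparison of two proportions.

    p1, p2: response proportions assumed for the control and intervention groups;
            their gap is the minimum detectable difference.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


# Asking for higher power with the same difference requires more patients.
print(n_per_group_proportions(0.20, 0.30, power=0.80))
print(n_per_group_proportions(0.20, 0.30, power=0.90))
```

Other variants of this formula exist (pooled variance, continuity correction), and they give somewhat different answers, so the numbers here should be read as rough planning values only.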
>> Probably one of the most important problems we face when processing manuscripts at JAMA is an inappropriately designed or poorly defined minimal clinically important difference or MCID. That's the same idea as what you're calling the minimal detectable difference. But instead of just finding the smallest difference one can detect in clinical medicine, we're concerned about the smallest difference between groups that's clinically important. How should one go about determining the minimal clinically important difference?
>> Well, it's really not a statistical issue.
It's more about the science. So it could be either something that is clinically important, meaning that, well, in the article there was discussion of a paper about smoking cessation. And the point was made that if you decreased the success rate for a smoking cessation intervention by one-tenth of 1%, no one would care much because it wouldn't have much effect on your clinical practice.
So that's one thing you can use is that you need to -- your minimum detectable difference should be something that would matter in practice if, in fact, the intervention was successful, an amount that would make a real difference in practice. [inaudible] trying to determine a reasonable minimum detectable difference is one that's based on the science. So if there's some other intervention that, let's say, go back to the smoking cessation, that increased the success rate for -- of the intervention for smoking cessation by 16%, let's say, then that -- you might like to have a new intervention.
You might like to find one that reduces it by somewhat less than that. So maybe you would try 14% or 12%, something that would be an improvement over what exists already. So it can either be based on the science or it can be based on practice. Now, if there's no competitive intervention, then, of course, the science will be harder to come by, so you'll have to use judgment. But that is the most difficult thing in a power analysis, trying to figure out what would I consider to be success of my intervention.
What is the smallest amount that can be -- that I would be happy to call a success, either for practical reasons or scientific reasons?
>> Another common thing we see at JAMA is studies being underpowered. Can you explain what underpowered means?
>> Well, the term underpowered is usually used simply to mean the sample size was so small that you're not able to reach statistical significance. And often, what happens is that you might get a point estimate, an estimate itself, of let's say, the difference in the proportion of patients responding that seems like the estimated difference is, let's say, a positive result, and yet, it does not reach statistical significance.
And what that means is that your sample size is small enough that that difference in the estimate could have simply happened by chance even if the intervention made no difference at all. So that will happen when the sample size is small, that the variability of the estimate is so large that it could have happened by chance. So the way to solve that is to have a larger sample size. That also happens sometimes when the results are more variable than you predicted them to be.
So if patients -- if you're measuring something that is a continuous response like survival time or something like that -- if patients vary more from patient to patient, if their survival times vary more than you had anticipated that they would, that also makes it harder to reach statistical significance. And so those are -- that is a problem that could happen if you underestimated how much variability you expected your sample to have.
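[Editorial illustration, not part of the recording.] The effect of underestimating variability can be sketched by computing the power of a study that was sized under one assumed standard deviation but run in a population with a larger one. The function name `power_two_means` and all numbers (63 patients per group, a difference of 5, standard deviations of 10 and 15) are hypothetical; the calculation uses a normal approximation for a two-sided, two-sample comparison of means.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(n_per_group, delta, sigma, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    delta: true difference between group means (assumed)
    sigma: between-patient standard deviation, equal in both groups (assumed)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # True difference expressed in units of the standard error of the estimate.
    shift = delta / (sigma * sqrt(2 / n_per_group))
    # Probability of landing in either rejection region of the two-sided test.
    return z.cdf(shift - z_alpha) + z.cdf(-shift - z_alpha)


planned = power_two_means(63, delta=5, sigma=10)  # variability assumed at design time
actual = power_two_means(63, delta=5, sigma=15)   # variability turns out to be larger
print(round(planned, 2), round(actual, 2))
```

Under these assumptions the study was planned with roughly 80% power, but with the larger standard deviation its chance of reaching statistical significance drops below one half even though the true difference is unchanged.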
>> I've seen people do power analysis after they completed their study to determine if they had a big enough sample size. Do you think that's appropriate?
>> Usually -- well, that doesn't usually make any sense. Because if you did a study and you got statistical significance, you don't need to calculate what's the probability that you would get statistical significance because obviously, you did. And so that's a moot point. If you did not, the problem there is if, you know, if you did a study and you were very confident that this new intervention would work but then you did your study and you didn't get -- you didn't reach statistical significance, sometimes people want to do a power analysis afterward.
Well, what you can do is say that if the -- you can't take the estimated difference and use that as your minimum detectable difference. What you could do is say that if the true difference was a given amount, then I would have been unlikely to have reached statistical significance with the sample size I had. But, you know, that's really just kind of speculation at the end. You should have done that calculation at the beginning on what the minimum detectable difference would be based on science, not based on the actual result that you got.
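[Editorial illustration, not part of the recording.] One way to make the statement "if the true difference was a given amount" concrete is to ask how large a true difference a completed study could have detected with a given power. The sketch below uses only the sample size and an assumed standard deviation, never the observed difference, in keeping with the caution above. The function name `detectable_difference` and the numbers (20 patients per group, a standard deviation of 10) are hypothetical, and the formula is a normal approximation for a two-sided comparison of two means.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(n_per_group, sigma, alpha=0.05, power=0.80):
    """Smallest true difference in means detectable with the stated power.

    n_per_group: patients per group in the study as actually run
    sigma: between-patient standard deviation, equal in both groups (assumed)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return (z_alpha + z_beta) * sigma * sqrt(2 / n_per_group)


# With 20 patients per group, only a fairly large true difference is detectable.
print(round(detectable_difference(20, sigma=10), 1))
```

If the scientifically important difference is smaller than the value this returns, the study was unlikely to reach statistical significance with the sample size it had, which is the planning calculation that belonged at the start.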
I will say that the most challenging thing for most consultants is to drag out of the client what is the minimum detectable difference. Sometimes people don't think of the science that way, and they think of it as a statistical issue, and it's really not a statistical issue. And a statistician can't help you with it because you're the expert on the science.
>> That wraps up this episode of JAMAevidence and the JAMA Guide to Statistics and Methods.
I'd like to thank our guest, Dr. Lynne Stokes for joining us today. You can find all of our JAMAevidence audio at jamaevidence.com and you can find all of the JAMA podcasts at jamanetworkaudio.com. You can subscribe and listen wherever you get your podcasts. Today's episode was produced by Daniel Morrow. Our audio team here at JAMA includes Jesse McQuarters, Shelly Stephens, Lisa Hardin, and Mike Berkwits, our deputy editor for electronic media here at the JAMA Network. Once again, I'm Ed Livingston, deputy editor for clinical reviews and education for JAMA.
Thanks for listening. [ Music ]