Name:
JAMAevidence - Ana Carolina Alba, MD, PhD, discusses the Users’ Guide on discrimination and calibration of clinical prediction models.
Description:
JAMAevidence - Ana Carolina Alba, MD, PhD, discusses the Users’ Guide on discrimination and calibration of clinical prediction models.
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/ba8fcc74-d285-4b84-b114-d138ae2d1de6/thumbnails/ba8fcc74-d285-4b84-b114-d138ae2d1de6.jpeg?sv=2019-02-02&sr=c&sig=DP6cu9Szaglw7Xwqr%2F3VzVd4CWMmsNZDJRvx%2F01EiBo%3D&st=2025-01-15T07%3A31%3A09Z&se=2025-01-15T11%3A36%3A09Z&sp=r
Duration:
T00H16M45S
Embed URL:
https://stream.cadmore.media/player/ba8fcc74-d285-4b84-b114-d138ae2d1de6
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/ba8fcc74-d285-4b84-b114-d138ae2d1de6/alba_cut.mp3?sv=2019-02-02&sr=c&sig=JndReKOdbGd4xUx7t5hQz83X%2ByJfYrmTWVgy7h967qM%3D&st=2025-01-15T07%3A31%3A09Z&se=2025-01-15T09%3A36%3A09Z&sp=r
Upload Date:
2022-10-03T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
[ Music ] >> Hello. I would like to welcome you to the JAMAevidence Users' Guide to the Medical Literature Podcast. I'm your host, Gordon Guyatt. I am a Distinguished Professor of Health Research Methods at McMaster University in Hamilton, Canada. And today I'm being joined by Dr. Carolina Alba. She's an Associate Professor in the Heart Failure and Transplant Program at the Ted Rogers Centre for Heart Research at the Toronto General Hospital, part of the University Health Network in Toronto.
Dr. Alba was the lead author who guided us in looking at prognosis and outcomes in the Users' Guide, and that's what we're going to be talking about today. Dr. Alba, welcome. >> Thank you, Gordon and JAMA, for providing us the opportunity to share the results and explanation of this work. >> So let's jump right in now. What's this article about? >> So this Users' Guide helps clinicians understand the metrics used to assess a model's performance in terms of discrimination and calibration, and to compare the performance of different prediction models.
Informing patients about their prognosis is part of the daily dialog between physicians and patients. However, assessing prognosis is very complex because multiple factors interplay in determining future events. Luckily for physicians and patients, predictive models can assist them in this activity. There is a wide variety of models that can be applied to estimate risk in different diseases, and the accuracy of these models is very diverse.
So the ultimate goal of this guide is to help clinicians to make optimum use of existing predictive models. >> So you're going to tell us more about how to make the best use of predictive models. But why should clinicians bother in the first place to use models? As a matter of fact, why should they be so interested in prognosis? >> So accurate prognostic information is vitally important for patients and physicians to make optimum health-related and life decisions.
For example, if a patient has a very low risk of future events, the absolute benefit conferred by an effective treatment may be very small in relation to the potential harm, burden, or cost of that treatment. Among higher-risk patients the opposite could be true; for example, the same treatment may offer substantial benefit. So providing accurate prognostic information can support shared decision making, helping to prevent testing or the use of costly and risky therapies in low-risk patients, and to avoid delays in treatment or in the use of effective therapies in patients who are at very high risk of events and would benefit from them.
>> Okay. You've done a great job of telling us why clinicians should be interested in prognosis. And in the introduction, you told us that we're going to be focusing on prognostic models. But prognostic models aren't the only ways to estimate a patient's prognosis. Can you take us through what the options are as to how clinicians can go about estimating a patient's prognosis? >> So there are very different ways to estimate prognosis.
And each is associated with advantages and disadvantages. One may be to use just the physician's judgment or the patient's intuitive estimate of their own risk. This has proven to be very limited; for example, from the physician's point of view, in general we tend to overestimate risk substantially. A second way could be to take the estimate, or the average risk, from observational studies. For example, there may exist a registry describing prognosis, or the risk of future events, in patients with a specific disease.
We may take the risks from these registries. However, these registries many times fail to report risk across different patient characteristics. So a physician could apply the effects of different factors known to be associated with the risk of future events to this average risk, to try to estimate how particular patient characteristics, for example age, may change that average risk. But a better way is to use predictive models that combine the effects of these multiple risk factors into a single estimate; by applying the mathematical formula behind the model, these risk models can provide the absolute risk of future events for a patient with a particular combination of predictive factors.
So we call these mathematical equations prediction or prognostic rules, scores, or models. >> You have told us that physician intuition is one way to estimate prognosis. But physician intuition is often misguided. You've also told us that there are observational studies. But observational studies have considerable limitations. And you've implied, and I think you are right, that the best way for clinicians to estimate prognosis is to have a model available.
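To make the "mathematical formula behind the model" concrete, here is a minimal Python sketch of how a logistic prediction model combines several risk factors into a single absolute risk estimate. The intercept, coefficients, and predictors (age, systolic blood pressure, diabetes) are invented for illustration and are not taken from any particular published model.

```python
# Minimal sketch (hypothetical coefficients) of how a prediction model
# combines several risk factors into one absolute risk estimate via a
# logistic formula: risk = 1 / (1 + exp(-(intercept + sum(beta_i * x_i)))).
import math

def predicted_risk(age, systolic_bp, diabetes):
    """Return a predicted probability of the outcome for one patient.

    The intercept and coefficients below are invented for illustration;
    a real model's coefficients come from its derivation study.
    """
    intercept = -7.0
    linear_predictor = (
        intercept
        + 0.05 * age            # per year of age
        + 0.01 * systolic_bp    # per mmHg
        + 0.60 * diabetes       # 1 if diabetic, 0 otherwise
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Two patients with different combinations of the same predictors
print(round(predicted_risk(age=55, systolic_bp=130, diabetes=0), 3))  # ~0.05
print(round(predicted_risk(age=75, systolic_bp=150, diabetes=1), 3))  # ~0.24
```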
But some models may be preferable. Some may be very satisfactory. And some may be unsatisfactory. How can clinicians differentiate a good model from a model that they should stay away from? >> The ideal model correctly distinguishes every single patient who is going to have an event from every single patient who is not going to have an event, without misclassifying any patients; unfortunately, however, such a model does not exist.
The extent to which a model comes close to achieving this goal can be characterized by two main properties. One is discrimination and the other is calibration. Discrimination refers to how well the model differentiates high-risk from low-risk patients. Discrimination depends on the distribution of patient characteristics, so it has two main limitations. One is that a model can discriminate very well in a very heterogeneous population with widely different values of the predictors included in the model.
For example, if a model relies only on age, the model will differentiate patients very well if the age range of that population is very wide; for example, from 20 to 90 years of age. However, the model will not be able to perform very well if the age range is very narrow; for example, only including patients between 50 and 60 years. The other problem with discrimination is that a model may have very good discrimination and tell us that one patient is at higher risk of having an event than another patient who is at lower risk, yet that does not tell us anything about the absolute risk.
So a model could predict that one patient's risk is one percent and that the other, higher-risk patient's risk is two percent, and that can show very good discrimination; but in fact, when we follow the patients for some period of time, we may observe that the true risks were 10% and 20%. So the model's absolute risk prediction was very poor, very limited, and it does not help us to make decisions. This brings us to the second key characteristic, which is calibration.
Calibration tells us how similar the risk predicted by a model is to the true risk in groups of patients classified into different risk strata. To the extent that the estimates are accurate, we say that the model is well calibrated. >> You have identified the two key characteristics: discrimination, which tells you whether my risk is greater than your risk, and calibration, which tells you, when the model says not only that my risk is greater than yours but that my risk is two percent and yours is one percent, whether those are really the right numbers, or whether it's actually 20% and 10%, where mine would still be twice as much as yours, good discrimination, but the calibration would be far off because the risk would actually be tenfold higher than the model tells you. So we've identified those two key characteristics, discrimination and calibration. Now the clinician is looking at the model. What tests will they find to assess discrimination, and how should they use those tests in deciding whether or not to use the model, and how to use it?
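Before turning to the metrics themselves, a tiny numeric sketch of the point above, using the same invented numbers: a model can rank two patients correctly (good discrimination) while its absolute predictions are tenfold too low (poor calibration).

```python
# Toy illustration: ranking is right, absolute risks are badly wrong.
predicted = {"patient_A": 0.01, "patient_B": 0.02}   # model's predicted risks
observed  = {"patient_A": 0.10, "patient_B": 0.20}   # risks actually observed

# Discrimination: the model correctly says B is at higher risk than A.
print(predicted["patient_B"] > predicted["patient_A"])  # True

# Calibration: each prediction is tenfold lower than the observed risk.
for patient in predicted:
    print(patient, "predicted", predicted[patient], "observed", observed[patient])
```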
>> Discrimination can be assessed in different ways. For a binary outcome, the commonly reported metric is the C statistic, or the area under the receiver operating characteristic (ROC) curve. This metric is obtained by taking all possible pairs of patients and comparing the predicted probabilities across them. If the model cannot discriminate at all between patients who had events and those who did not, the C statistic, or area under the ROC curve, will be close to 0.5, which means that the model is no better than chance.
If a model always produces a higher predicted probability for patients having events than for those not having events, the C statistic will be one, which means perfect discrimination. Usually the C statistic falls between 0.5 and one; it is very rarely close to one. There is a generally accepted approach suggesting that if the C statistic, the area under the receiver operating characteristic curve, or ROC curve, is less than 0.6, the model reflects very poor discrimination, may not add to prediction, and may not have any clinical utility.
However, if the C statistic is between 0.6 and 0.75, it is possible the model may help to guide care if used in clinical practice or to inform patients about their prognosis. If the C statistic is higher than 0.75, there is a general understanding that the model has good discrimination and can provide useful clinical information. Such thresholds are somewhat arbitrary, though, and the C statistic is hard to apply clinically because it does not take into account the consequences of misclassification.
For example, a model may misclassify patients who have events as lower risk, or it may more frequently misclassify patients who did not have an event as higher risk. The consequences of misclassifying patients with events and without events are different, and the C statistic does not take that into account, although it should be taken into account. There are different metrics that do, but they are still being developed and are still hard to apply in a clinical setting.
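A minimal Python sketch, with made-up predictions and outcomes, of the pairwise interpretation of the C statistic described above: among all pairs of one patient with an event and one without, the proportion of pairs in which the event patient received the higher predicted risk, counting ties as one half.

```python
# C statistic (area under the ROC curve) computed directly from its
# pairwise definition, using invented data for illustration.
def c_statistic(predicted_risks, had_event):
    events     = [r for r, e in zip(predicted_risks, had_event) if e]
    non_events = [r for r, e in zip(predicted_risks, had_event) if not e]
    concordant = 0.0
    for r_event in events:
        for r_non in non_events:
            if r_event > r_non:
                concordant += 1.0
            elif r_event == r_non:
                concordant += 0.5   # ties count as half
    return concordant / (len(events) * len(non_events))

# Illustrative predictions: higher risks mostly go to the patients with events.
risks  = [0.05, 0.10, 0.20, 0.30, 0.60, 0.80]
events = [0,    0,    1,    0,    1,    1   ]
print(c_statistic(risks, events))  # ~0.89, above the 0.75 threshold mentioned above
```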
But those are general rules that I think physicians can follow to help understand whether a model provides good discrimination. >> So you've told us how to identify good discrimination: the C statistic, or area under the ROC curve, and if we're over 0.75, we're in good shape. Except we may not be in such good shape, because you've also told us that discrimination can be good while calibration may not be. So how can physicians identify well-calibrated models?
>> So calibration is the most important property of a model, and unfortunately it's usually underreported. Assessing calibration can be done in two different ways. One is for the whole population, which we refer to as average or mean calibration. The other, which is probably the most accurate and important, is to report calibration at different risk strata. A model could have very good average calibration but could be miscalibrated at the extremes; for example, in low- or high-risk patients.
And if we don't report calibration across different risk strata, we will miss that important information. Calibration could be excellent for some patients; for example, it could be very good for patients who are at low risk of an event, say lower than 10%, while the model significantly overestimates risk in those whose risk is higher than 10%. We may want to know that information depending on the risk threshold that we are going to use to make a clinical decision. So calibration across different risk strata is the most important property of the model and should always be reported and looked for while we are assessing the performance of the model.
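A minimal sketch, using made-up data and an assumed 10% cutoff, of checking calibration across risk strata as described above: group patients by predicted risk and compare the mean predicted risk in each stratum with the observed event rate.

```python
# Compare mean predicted risk with observed event rate within risk strata.
def calibration_by_strata(predicted_risks, had_event, cutoffs=(0.10,)):
    strata = {}
    bounds = [0.0, *cutoffs, 1.0]
    for lo, hi in zip(bounds, bounds[1:]):
        group = [(r, e) for r, e in zip(predicted_risks, had_event) if lo <= r < hi]
        if group:
            mean_pred = sum(r for r, _ in group) / len(group)
            obs_rate  = sum(e for _, e in group) / len(group)
            strata[f"{lo:.2f}-{hi:.2f}"] = (round(mean_pred, 3), round(obs_rate, 3))
    return strata

# Toy data: in this tiny sample the model underestimates risk in the low
# stratum (7% predicted vs 25% observed) and overestimates it in the high
# stratum (37.5% predicted vs 25% observed).
risks  = [0.05, 0.06, 0.08, 0.09, 0.30, 0.35, 0.40, 0.45]
events = [0,    0,    1,    0,    0,    0,    1,    0   ]
for stratum, (pred, obs) in calibration_by_strata(risks, events).items():
    print(stratum, "mean predicted:", pred, "observed:", obs)
```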
>> Okay. Great. Now you've shown us how to identify a model that we might consider using. The discrimination looks pretty good, area under the ROC curve of 0.75, and the calibration looks pretty good. But now we have two models that look pretty good. How are we going to choose between the two of them? >> Oh, that's a very difficult question to answer. But in general terms, a physician could qualitatively compare the discrimination and calibration of the two models being compared.
So if one model shows better discrimination and better calibration than its comparator, it's an easy pick. We know that the new model, or the second model, is better than the one that shows worse discrimination and calibration. However, sometimes these two metrics are very close, or one model shows better discrimination but worse calibration, or vice versa, which makes it difficult for physicians to choose. But there are different statistical techniques that can be used to compare the performance of two models.
One is called risk reclassification analysis, and reclassification analyses can be summarized with different metrics. Some of them even consider the weight of misclassifying patients with or without events, which makes the selection between two models much easier based on the possible consequences of one misclassification versus the other. So there are different metrics that can be used, and depending on the discrimination and calibration of our models, one of them may provide more useful information when applying a model in a clinical setting.
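A minimal sketch, using made-up risks and a single assumed 10% decision threshold, of one common reclassification summary, the net reclassification improvement (NRI): it credits a new model for moving patients with events into a higher risk category and patients without events into a lower one.

```python
# Net reclassification improvement (NRI) for two models at one risk threshold.
def nri(old_risks, new_risks, had_event, threshold=0.10):
    def category(risk):
        return 1 if risk >= threshold else 0  # above vs below the threshold

    up_events = down_events = up_non = down_non = 0
    n_events = sum(had_event)
    n_non_events = len(had_event) - n_events
    for old, new, event in zip(old_risks, new_risks, had_event):
        moved = category(new) - category(old)
        if event:
            up_events += moved > 0
            down_events += moved < 0
        else:
            up_non += moved > 0
            down_non += moved < 0
    event_nri = (up_events - down_events) / n_events
    non_event_nri = (down_non - up_non) / n_non_events
    return event_nri + non_event_nri

# Invented predictions from an "old" and a "new" model for six patients.
old = [0.05, 0.08, 0.12, 0.15, 0.20, 0.09]
new = [0.04, 0.12, 0.18, 0.08, 0.25, 0.06]
ev  = [0,    1,    1,    0,    1,    0   ]
print(nri(old, new, ev))  # positive values favor the new model (~0.67 here)
```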
>> So let me see if I've understood your key points. Number one, when we are considering whether to use a predictive model, the first thing is: is it worth the effort? Is the patient's prognosis something we need to know, something the patient is interested in, information we need to manage the patient well? If we're in that situation, we may do it by our intuition, but we're liable to be misguided. We may pick a single observational study, but that probably is not the best way.
If there is a good model with adequate discrimination and calibration, that's probably where we should go. Have I got it right? >> Great summary, Gordon. I think one point to highlight is that, in deciding whether a model has good accuracy and whether or not to use it, the most important metric is calibration. >> Excellent additional point to make. Thank you very much for joining us. This is really a crucial aspect of medical care, the assessment of prognosis, to which we perhaps don't pay as much attention as we should.
And there are a lot of models now coming out to help us focus on prediction, and you've shown us how to make the best use of them. So thanks very much for joining us. >> Thank you, Gordon, and thank you, too, JAMA, for the opportunity. >> This episode was produced by Shelly Steffens at the JAMA Network. The audio team here also includes Jesse McQuarters, Daniel Morrow, Lisa Hardin, Audrey Forman, and Mary Lynn Ferkaluk. Dr. Robert Golub is the JAMA Executive Deputy Editor.
To follow this and other JAMA Network podcasts, please visit us online at jamanetworkaudio.com. Thanks very much for listening. [ Music ]