Name:
Front Row - AI In Drug Discovery Series II - Part 3
Description:
Front Row - AI In Drug Discovery Series II - Part 3
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/ec429c48-9de6-4bb7-b8f2-878e1711e4a5/videoscrubberimages/Scrubber_5.jpg?sv=2019-02-02&sr=c&sig=vPoGAZXzfV9fMzWJEGcKEg43wGCWSfxbby7IuOoRrFA%3D&st=2024-12-21T16%3A36%3A02Z&se=2024-12-21T20%3A41%3A02Z&sp=r
Duration:
T00H24M54S
Embed URL:
https://stream.cadmore.media/player/ec429c48-9de6-4bb7-b8f2-878e1711e4a5
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/ec429c48-9de6-4bb7-b8f2-878e1711e4a5/AI In Drug Discovery Series II - Part 3.mp4?sv=2019-02-02&sr=c&sig=yHXrdklucgrH3rneSAce31e6B3RnJmzBeftSOMtHjOk%3D&st=2024-12-21T16%3A36%3A03Z&se=2024-12-21T18%3A41%3A03Z&sp=r
Upload Date:
2023-10-19T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
[MUSIC PLAYING]
MALORYE BRANCA: Daphne, thank you so much for joining us today. There's been a lot of excitement about AI. And why now? Why do you think that-- do you think it's coming to fruition? Do you think that things are happening that are more than hype? Or how do you feel about it?
DAPHNE KOLLER: Boy, there's a lot to unpack in that question, so let me start with the why now. I think that machine learning has made a tremendous amount of progress in the last decade, way more than I would have anticipated, and across multiple different domains. So I think we're finally in a world where machine learning has demonstrated the promise that it has held for decades.
DAPHNE KOLLER: Now, as to machine learning and drug discovery, I think these are much earlier days. And partly, that's because data for training machine learning models is much more abundant in areas like natural language processing or image recognition, whereas biological and chemical data are hard to create and hard to come by.
DAPHNE KOLLER: And so I think that there is certainly a lot of potential. And we're starting to see some pretty large data sets out there, maybe not at the scale of images on the web, that enable machine learning to be done appropriately. Now, as to the question about hype, the answer is "yes, and." I think there's a tremendous amount of hype out there that is often quite hyperbolic and misleading to people in ways that I think are counterproductive.
DAPHNE KOLLER: I think that there is a lot of good, solid work and progress happening. But if you oversell that work with hyperbolic promises that are not likely to come true in the coming years, like, "oh, we're going to have 1,000 drugs in the clinic in the next three years," well, you're not. Drug discovery is really hard.
DAPHNE KOLLER: And so I think it's important to stay balanced-- conveying the promise while also conveying the challenges of what is fundamentally a really hard problem for us to solve, AI notwithstanding.
MALORYE BRANCA: You mentioned the data. So what has changed with the data?
DAPHNE KOLLER: So I think I see change on two fronts. First is on the clinical side. We are seeing more and more high-quality, high-content clinical data that are acquired from people. The UK Biobank is a wonderful example of that, and has unlocked so much value in terms of discovery. We're now seeing similar biobanks even in the US. For example, the All of Us project just recently released some of the earliest portions of its data set.
DAPHNE KOLLER: We're starting to see the availability of electronic health records. Certainly in the UK, via the connection to the National Health Service, but even here in the US. So I think that the amount of clinical data is growing quite dramatically. And we're only at the beginning of that inflection curve. And oftentimes, that's aligned with genetics, which really unlocks the capabilities for drug discovery.
DAPHNE KOLLER: The other form of data that is becoming more readily available is in vitro lab data. When I started working on machine learning and biomedical data sets-- this was back in the late '90s, early 2000s-- a large data set was one that had 200 samples. And now we have data sets where people are doing single cell RNA-Seq, and your data sets have hundreds of millions of cells that you're sequencing or imaging or whatever.
DAPHNE KOLLER: And in many cases, those data are also much more relevant to human biology. We're no longer doing experiments in yeast cells or even in cancer cell lines. So that's the other side of data: lab data that is much more abundant and much more relevant to human biology.
MALORYE BRANCA: Well, I know that you have a huge deal with the UK. But where else are you getting your data from?
DAPHNE KOLLER: So some of our data is happening via partnerships. We were very excited about the deal that we had with Gilead in NASH, which gave us access to some of their clinical trial data. The trials, as it happened, were not successful, but the data quality was incredible. And so in that analysis we were able to extract a lot of insights on the progression of NASH and the genetics associated with NASH progression.
DAPHNE KOLLER: Even though those were not huge data sets by machine learning standards, they were still quite valuable because of the quality and the density of the data, like histopathology images from patients at the beginning and the end of the trial. So that's one place where we're getting data. Fortunately, there are also other public or nonprofit organizations that collect data, with the promise of unlocking value for patients in PD, in AD, and in many other indications as well.
DAPHNE KOLLER: So that's one place. The other place is that we're making our own data. One of the big aspects of the build at Insitro is that we built a considerable wet lab infrastructure with automation, induced pluripotent stem cells, imaging via microscopy, and transcriptomics. And we're generating data at significant scale that are specifically relevant to unlocking our understanding of the biology of the diseases that we're studying.
MALORYE BRANCA: So what would you consider a big enough data set?
DAPHNE KOLLER: You know, people always ask that of machine learning people. And there's no single answer to that, because it depends on how subtle and complex the problem you're looking to solve is. So if the thing that separates, say, your positives from your negatives, or predicts your quantitative trait, can be read relatively straightforwardly from the data that you're collecting, you can make do with a few hundred data points.
DAPHNE KOLLER: But often it's a really subtle, complex signature in a very convoluted space. That's true, for example, in chemistry, where the space of chemical compounds is, people say, 10 to the 80 or something like that. And if what you're trying to predict is what makes a tiny little molecule with a variable conformation bind to a similarly mobile pocket of a protein, you may need more data than that.
DAPHNE KOLLER: And that's why we actually created a chemistry infrastructure at scale using DNA-encoded libraries, whose primary purpose is to create data to train machine learning models on binding affinity.
MALORYE BRANCA: So what types of data-- what different types of data are you focused on right now?
DAPHNE KOLLER: Yeah. That's a great question. So I think I briefly alluded to most of the data types that we care about. In our own wet lab environment, we have efforts in both biology and chemistry. In chemistry, as I mentioned, we create data using DNA-encoded libraries, which let us measure at incredible scale which compounds bind to a particular protein target.
DAPHNE KOLLER: On the biology side, which is where most of our efforts have gone, we create cellular models of disease based on what are called induced pluripotent stem cells, which carry the genetics of different people, with and without disease. And we phenotype those cells using a multitude of high-content modalities, including microscopy, both fixed microscopy with stains and live-cell microscopy.
DAPHNE KOLLER: We do single-cell transcriptomics. We do multiple other readouts of those cells in order to gain an understanding of how disease genetics might manifest in cellular phenotypes. So that's all great. But ultimately, disease models are only as good as their ability to predict disease in humans. And so the other form of data, as I mentioned at the beginning, is high-content data from human clinical outcomes.
DAPHNE KOLLER: So the part that we really care about a lot, which ties into the deal that we have with the UK, is high-content data from humans. So not just the relatively limited and often subjective ascertainment of disease/no disease, but really something that is measured objectively and with a lot of information about the underlying biology. So histopathology data, which is obtained from biopsy samples, is one incredibly rich source of data.
DAPHNE KOLLER: We found that there is a lot of information in brain MRIs that is getting lost when people summarize the MRI output to one or two summary statistics. There is increasing collection of things like serum proteomics and transcriptomics, which measure molecular data from blood. So all of those are data modalities that we think shed light on underlying biological processes, which we can then align to what we see in our cellular data, so that experiments in cells become translatable to what is likely to happen in humans.
MALORYE BRANCA: But aren't the algorithms sort of the-- where the buck stops? And how have they advanced?
DAPHNE KOLLER: So any machine learning practitioner who's telling you the truth will tell you that in machine learning, 80% of the value is from having better data, and 20% is from having a better algorithm. And a great algorithm on a crappy data set can only go so far. So we invested a lot of effort in data creation and data collection so that we can have really good data sets. And once you have those, better algorithms can, in fact, unlock value.
DAPHNE KOLLER: And so to your point, once we have those better data sets, we've made a very significant investment in better machine learning models. And so, for example, our live cell microscopy, which is a highlight of our company's technology stack, is a really sophisticated microscope that kind of shines light into cells at different angles on a very quick rotation. Because it turns out that light refracts in different ways, depending on what exactly in the cell it hits.
DAPHNE KOLLER: And while a person can't make sense of that blur, a machine can, using machine learning, create a much higher-resolution and higher-content readout of what's happening in the cell, on top of which we can then impute things like cellular compartments: these are lipids, these are cell membranes, and all sorts of things that are really not perceivable by the human eye. So machine learning comes in at all sorts of different places for us.
DAPHNE KOLLER: It comes in the raw interpretation of data, as in this example. And it also comes up in looking at cells, for example, or high-content data that comes from, say, patients versus healthy individuals and asking, what is it that makes them different? Do we see a signature of disease that really is capturing the underlying pathogenic processes?
DAPHNE KOLLER: And with that, can we then search, via some of our wet lab tools, for something that seems to revert that disease signature closer to a healthy state? So that's another place where machine learning comes in: creating disease models that are unbiased and rich in terms of capturing a patient's biological state.
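The idea of a disease signature that a perturbation might "revert closer to a healthy state" can be sketched numerically. This is a minimal, hypothetical illustration, not insitro's method: it assumes each cell population is summarized as a vector of phenotypic features, defines the signature as the difference between the diseased and healthy centroids, and scores a perturbation by the fraction of that gap it closes along the signature axis. The function names, compound labels, and feature values are all invented for illustration.

```python
"""Hypothetical sketch: scoring perturbations by how far they revert a disease signature."""


def centroid(profiles: list[list[float]]) -> list[float]:
    """Mean feature vector of a set of cellular profiles."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def reversion_score(healthy: list[list[float]],
                    diseased: list[list[float]],
                    treated: list[list[float]]) -> float:
    """Fraction of the disease-vs-healthy gap closed by a treatment, along the signature axis.

    1.0 means treated cells project onto the healthy centroid, 0.0 means no movement
    along the signature, and negative values mean movement further toward disease.
    """
    mu_h, mu_d, mu_t = centroid(healthy), centroid(diseased), centroid(treated)
    signature = [d - h for d, h in zip(mu_d, mu_h)]  # disease minus healthy
    gap = sum(s * s for s in signature)              # squared length of the signature
    shift = [t - d for t, d in zip(mu_t, mu_d)]      # what the treatment changed
    return -sum(s * v for s, v in zip(signature, shift)) / gap


# Invented two-feature profiles (e.g., two image-derived measurements per well).
healthy = [[0.0, 0.0], [0.5, -0.5]]
diseased = [[2.0, 1.0], [2.5, 0.5]]
treated = {
    "compound_A": [[1.25, 0.25]],  # moves halfway back toward healthy
    "compound_B": [[3.25, 1.25]],  # pushes further toward disease
}

for name, profiles in treated.items():
    print(f"{name}: {reversion_score(healthy, diseased, profiles):+.2f}")
# compound_A: +0.50
# compound_B: -0.50
```

Projecting onto the signature axis is only one possible choice; it ignores changes orthogonal to the signature, which in a real screen might reflect off-target effects worth tracking separately.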
MALORYE BRANCA: But how do you know that?
DAPHNE KOLLER: How do we know that this is really capturing [INAUDIBLE]?
MALORYE BRANCA: Unbiased.
DAPHNE KOLLER: Well, I mean, it's unbiased because-- well, nothing is ever entirely unbiased. You decide what to measure and what not, and that introduces a certain bias into the analysis. The good news is that with machine learning kind of thinking, you can ask the question of whether what you're measuring in the cellular system is truly predictive of human clinical outcome. And so you can say I've learned a separation. I've learned to characterize a signature, if you will, of healthy versus disease in one subset of patients.
DAPHNE KOLLER: To what extent does that actually predict disease state in a different subset of patients? That's a question you can ask that allows the machine to prove, in some sense, that it has learned something meaningful, rather than something that is just imposed by human intuition.
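The hold-out check described here (learn a healthy-versus-disease signature in one subset of patients, then test whether it predicts disease state in another) can be sketched as follows. This is a hypothetical illustration on synthetic data: the nearest-centroid classifier is a deliberately simple stand-in, and the data generator, feature count, and effect size are assumptions rather than anything described in the interview. The final loop also reruns the procedure with no real disease effect, where held-out accuracy should fall to roughly chance.

```python
import random
from statistics import fmean


def make_patients(n, n_features, shift, rng):
    """Synthetic patients: diseased ones are shifted by `shift` on the first 3 features."""
    patients = []
    for i in range(n):
        diseased = i % 2 == 1
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        if diseased:
            for j in range(3):
                x[j] += shift
        patients.append((x, diseased))
    return patients


def learn_signature(train):
    """Signature = direction from the healthy centroid to the disease centroid, plus their midpoint."""
    mu_h = [fmean(col) for col in zip(*(x for x, d in train if not d))]
    mu_d = [fmean(col) for col in zip(*(x for x, d in train if d))]
    direction = [d - h for d, h in zip(mu_d, mu_h)]
    midpoint = [(d + h) / 2 for d, h in zip(mu_d, mu_h)]
    return direction, midpoint


def predicts_disease(x, signature):
    """Nearest-centroid rule: which side of the midpoint does x fall on?"""
    direction, midpoint = signature
    return sum(w * (xi - m) for w, xi, m in zip(direction, x, midpoint)) > 0


def accuracy(patients, signature):
    correct = sum(predicts_disease(x, signature) == d for x, d in patients)
    return correct / len(patients)


def evaluate(shift, seed=0, n=400, n_features=20):
    """Learn the signature on one half of the patients and score it on the other half."""
    rng = random.Random(seed)
    patients = make_patients(n, n_features, shift, rng)
    rng.shuffle(patients)
    train, test = patients[: n // 2], patients[n // 2:]
    signature = learn_signature(train)
    return accuracy(train, signature), accuracy(test, signature)


for shift in (1.5, 0.0):
    train_acc, test_acc = evaluate(shift)
    print(f"shift={shift}: train accuracy {train_acc:.2f}, held-out accuracy {test_acc:.2f}")
```

Any gap between training and held-out accuracy indicates how much of the apparent signal fails to generalize to new patients, which is the question the held-out subset is there to answer.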
MALORYE BRANCA: And what brought you to this field?
DAPHNE KOLLER: So I've been working in this field for a little over 20 years; I got into it in '99, 2000. Historically, I was actually a fairly traditional, if you will, machine learning person, to the extent that traditional machine learning people were around in the early to mid-'90s. I was one of the first people into the field. But I wasn't interested in biology when I started. I was mostly working on more standard applications like computer vision and robotics.
DAPHNE KOLLER: But the data sets that were available to machine learning people at the time were not nearly as interesting as what we have today. They were very small, and frankly, unaspirational. Like, how excited can you get about classifying spam versus nonspam? And so I initially became interested in biology because it was just, first, more technologically interesting, and also more aspirational than some of those other applications.
DAPHNE KOLLER: And then over time, I became interested in the field in its own right, despite not having any training in biology at the start, and I taught myself biology over the last 20-some years. What was funny is that, as I got more and more into biology, my lab at Stanford had a bifurcated existence. Half my lab did core machine learning and published in computer science venues.
DAPHNE KOLLER: The other half did biology, published in biology journals. My computer science friends didn't even realize I did biology. My biology friends didn't imagine that I was in the computer science department. So it was kind of an interesting entry point into the field.
MALORYE BRANCA: What do you see as the major hurdles for you?
DAPHNE KOLLER: I think that data remains a challenge. Data acquisition, with the right-- I mean, biology is really challenging. You're dealing with live systems. Everything influences them. Someone breathes a different way, it changes the experiment. And so creating high-quality data that is not confounded and that is sufficient in scale remains a challenge. It's certainly something we've spent a lot of effort on and made a lot of progress on, but there's a long way to go.
DAPHNE KOLLER: I think another important challenge in this space is the lack of availability of talent. I mean, that's an issue for machine learning in general. I mean, the war for talent in this space is just unbelievably hard. We need a unique subset of those individuals who are either knowledgeable in, or want at least to become knowledgeable in, biology and chemistry. And so we are drawing from a much smaller pool of talent.
DAPHNE KOLLER: I think that this is a place where academic institutions could be doing a much better job of creating a talent pool of what I call bilingual people: people who speak computing and who also speak biology or chemistry. Those people are really hard to come by. And having a lot more of them would, I think, unlock a tremendous amount of value in the space of what I'm calling digital biology, which is the ability to take a very data-driven lens to understanding biology and human disease.
MALORYE BRANCA: Well, looking at digital biology, have you seen substantial progress? And, if so, I mean, can you point to anything that would say that?
DAPHNE KOLLER: So I think certainly we've seen a lot of progress in the last few years. I mean, a lot of-- if you come back to some of the examples that I gave around the UK Biobank, and the many, many published papers that have emerged from that, it's all enabled by very considerable computational methods that understand the connection between genetics and a whole array of very diverse, and sometimes quite complex, phenotypes.
DAPHNE KOLLER: If we look at the work that's been happening around understanding cell biology, and measuring things at the single-cell level, like the Human Cell Atlas, and understanding the notion of cellular state and how cells move from one state to another, all of that requires extensive computational methods. So I think there's been a huge amount of progress in this field, broadly construed.
DAPHNE KOLLER: And I would say there have been some early successes, even on the drug discovery side. Although, as we know, in drug discovery the proof is really when you put the drug in a person and it works, and that takes years. And so going from a new insight to an approved drug is going to take, I think, a while. But honestly, if you think about digital biology in the broader sense, think about the work that happened during COVID by companies like Moderna and Pfizer-BioNTech on the one side.
DAPHNE KOLLER: And on the other side, antibody design by companies such as Vir and AbCellera. That was really digital design of therapeutic matter. It wasn't that they just took something as it was in nature; there was a lot of fine-tuning of the compounds in ways that really treated them as digital objects. So in that respect, even in drug discovery, while it's not the full promise of machine-learning-discovered drugs, a lot of data science went into the design of those specific compounds.
MALORYE BRANCA: What about for you-- Insitro? What are your goals, and how are you going to achieve them?
DAPHNE KOLLER: So we are really looking to use some of these high-content data sets on human biologic state to inform our understanding of human biologic state and how that might manifest in disease. So right now, I believe that our taxonomy of human disease is incredibly, I would say, obsolete. It's derived from clinical symptoms that are not reflective in many ways of the underlying biology.
DAPHNE KOLLER: They're also very coarse-grained. They're filtered through the subjective lens of the patient and, oftentimes, the clinician. So there's a lot of subjectivity in how you interpret what's actually happening to the patient's body. And that basically means that we're often taking things that are quite distinct biologies and calling them by the same name.
DAPHNE KOLLER: We've seen in oncology how much power we get by understanding that breast cancer is not one thing. It is multiple different things, and each of those is best treated by a completely different therapeutic. And chemotherapy, which is the lowest common denominator, is really not very effective compared to these modern-day treatments. We've not done that for human germline diseases.
DAPHNE KOLLER: We've not understood the subtypes. We've not understood the intervention nodes for each of those subtypes. So what we're doing is we're taking a lot of these high-content data sets from both humans and cells, and really uncovering what are the underlying biological processes? And for each of those, what is the right intervention node?
DAPHNE KOLLER: So one of the places where we've done a lot of work specifically in that way is neuroscience, where we've worked (and this is publicly disclosed) on a disease called tuberous sclerosis complex. This is by and large a monogenic disease, caused by a mutation in one of two genes. And using some of our high-content phenotypes, we've been able to identify a new intervention node that we think is potentially a modulator of the disease.
DAPHNE KOLLER: And that's currently something that we've put into drug discovery using our DNA-encoded library platform. We also have a very exciting partnership with Bristol Myers Squibb in the area of ALS that uses an extended version, if you will, of what we did with tuberous sclerosis complex, in which we identify the high-penetrance variants that are known familial drivers of ALS.
DAPHNE KOLLER: Those provide us a clear signal of subpopulations that are actually quite different across the different driver variants, and of potential treatments that might help those subsets of patients. And then, with the ability to interrogate the phenotypic landscape using our cell-based systems, we might be able to say (and that's the purpose of this project) which subsets of patients among the sporadic cases are more similar to this familial variant versus that familial variant, so that we can figure out how to expand the set of patients treated by each of those interventions.
DAPHNE KOLLER: And so that's the goal of the project. And we've made a tremendous amount of progress towards creating the infrastructure to do that, both in terms of phenotyping and in terms of creating what I think is arguably one of the largest banks of ALS-relevant human cell lines that has been generated. That includes what we call isogenic lines: a wild-type line paired with one that is exactly the same except with the familial mutation introduced, so you can have a true one-to-one comparison without a lot of variability thrown in.
DAPHNE KOLLER: We have well over 100 ALS lines that we have onboarded. And so we are now pursuing our understanding of what each of those familial variants does to the cells, and how we can revert those cells (in this case, motor neurons) back to a healthier state.
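Why isogenic pairs reduce variability can be illustrated with a small simulation. In this hypothetical sketch (synthetic data and invented parameter values, not insitro's analysis), each donor contributes a large genetic-background effect to a single phenotypic readout, and the mutant line differs from its wild-type partner only by a fixed mutation effect plus measurement noise. Comparing each mutant with its own wild-type partner (a paired t statistic) cancels the background, whereas comparing the mutant and wild-type groups as a whole (Welch's unpaired t statistic) leaves the background variability in the denominator.

```python
import random
from math import sqrt
from statistics import fmean, stdev


def simulate_isogenic_pairs(n_pairs, effect, background_sd, noise_sd, rng):
    """Each pair shares a donor background; the mutant adds `effect` on top of it."""
    pairs = []
    for _ in range(n_pairs):
        background = rng.gauss(0.0, background_sd)  # donor-specific genetic background
        wild_type = background + rng.gauss(0.0, noise_sd)
        mutant = background + effect + rng.gauss(0.0, noise_sd)
        pairs.append((wild_type, mutant))
    return pairs


def paired_t(pairs):
    """t statistic on within-pair differences (the shared background cancels out)."""
    diffs = [m - w for w, m in pairs]
    return fmean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def welch_t(pairs):
    """Unpaired Welch t statistic that ignores which lines belong together."""
    wt = [w for w, _ in pairs]
    mut = [m for _, m in pairs]
    se = sqrt(stdev(wt) ** 2 / len(wt) + stdev(mut) ** 2 / len(mut))
    return (fmean(mut) - fmean(wt)) / se


rng = random.Random(1)
pairs = simulate_isogenic_pairs(n_pairs=20, effect=1.0, background_sd=3.0, noise_sd=0.3, rng=rng)
print(f"paired t = {paired_t(pairs):.1f}")
print(f"unpaired (Welch) t = {welch_t(pairs):.1f}")
```

Both statistics estimate the same mean difference; the paired one is much larger only because its standard error no longer contains the donor-to-donor background variance.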
MALORYE BRANCA: You've explained it beautifully. But can we try to boil it down to how does this change drug discovery?
DAPHNE KOLLER: Most drugs currently fail, typically in phase two, or sometimes in phase three, usually when people squint at the data and push into phase three something that should not have been advanced. And they fail because of lack of efficacy. The reason they fail is that our understanding of disease biology is very limited. We use either intuitive cartoon pathways drawn on the board or, sometimes, animal models that frankly don't get the disease in question.
DAPHNE KOLLER: Animals do not get Alzheimer's disease or ALS. We introduce a phenotypic copy of that into the animal, and then pretend that what we're curing is the disease. But really, what we're potentially curing is some kind of variant that is not translatable to a human. What we hope to do is to use humans as a model for humans, and really focus on human biology as the basis for our target selection. And hopefully, by doing so, reduce the probability of failure, which is currently 95%.
DAPHNE KOLLER: That is the probability of failure. So honestly, if you can reduce the probability of failure from 95% to 90%, you've doubled productivity. There's a lot of headroom there in terms of how much one can improve. So how high can we go? I don't know. But God, it's worth trying.
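The arithmetic behind "95% to 90% doubles productivity" is that output scales with the success rate, not the failure rate: going from 5% to 10% success doubles the number of approved drugs per program started. The sketch below assumes, as a simplification not stated in the interview, that productivity means approvals per program and that cost and duration per program stay the same.

```python
def productivity_gain(p_fail_before: float, p_fail_after: float) -> float:
    """Ratio of success rates after vs. before (approvals per program started)."""
    return (1.0 - p_fail_after) / (1.0 - p_fail_before)


for p_fail_after in (0.90, 0.85, 0.80):
    gain = productivity_gain(0.95, p_fail_after)
    print(f"95% -> {p_fail_after:.0%} failure: {gain:.1f}x productivity")
# 95% -> 90% failure: 2.0x productivity
# 95% -> 85% failure: 3.0x productivity
# 95% -> 80% failure: 4.0x productivity
```

Because the baseline success rate is so small, each additional five points shaved off the failure rate adds another full multiple of the original output, which is the headroom the answer points to.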
MALORYE BRANCA: It certainly is. And thank you so much for your time. It's been a very interesting discussion. And we appreciate your time and your insight. And we look forward to seeing where you guys go.
DAPHNE KOLLER: So do we. Thank you so much, Malorye. This was a fascinating conversation, and thank you for inviting me.
MALORYE BRANCA: It's my pleasure. Take care.
DAPHNE KOLLER: Thank you. You, too. Bye.
MALORYE BRANCA: Bye. [MUSIC PLAYING]