Name: Keynote: David Baker on AI in Protein Formation

Description: Keynote: David Baker on AI in Protein Formation

Thumbnail URL: https://cadmoremediastorage.blob.core.windows.net/cdaab88e-d0c5-436e-9458-01df7d74d3ae/videoscrubberimages/Scrubber_2.jpg?sv=2019-02-02&sr=c&sig=MxZbKB2lVBAbhPddYZRrAvfatviMyFsHSqeWdgxpqJ4%3D&st=2026-01-08T21%3A22%3A47Z&se=2026-01-09T01%3A27%3A47Z&sp=r

Duration: T00H38M00S

Embed URL: https://stream.cadmore.media/player/cdaab88e-d0c5-436e-9458-01df7d74d3ae

Content URL: https://cadmoreoriginalmedia.blob.core.windows.net/cdaab88e-d0c5-436e-9458-01df7d74d3ae/6_SOB_AIProteinFormation_ONDEMAND.mp4?sv=2019-02-02&sr=c&sig=ti6fXBYr6MpElBe7bJ%2FF3m9jXHk0bkGKpzX90toq2xM%3D&st=2026-01-08T21%3A22%3A49Z&se=2026-01-08T23%3A27%3A49Z&sp=r

Upload Date: 2023-11-21T00:00:00.0000000

Transcript: Language: EN.
Segment:0 .
[MUSIC PLAYING]
JULIANNA LEMIEUX: We're thrilled to welcome David Baker to the Summit for his keynote presentation. David is a Professor of Biochemistry at the University of Washington, an HHMI Investigator, and the Director of the Institute for Protein Design. He and his lab have pioneered methods to predict and design the three-dimensional structures of proteins that have the potential to not only touch, but transform many, if not all, areas of biology.
JULIANNA LEMIEUX: But it's not just proteins that the Baker Lab has set their sights on. It's also the design of molecular switches, vehicles for targeted intracellular delivery of biologics, and much more, maybe even something about an artificial nose we'll hear more about today. David has won multiple awards himself for de novo protein design.
JULIANNA LEMIEUX: And in 2021, AI-powered Predictions of Proteins won Science's 2021 Breakthrough of the Year Award. We're thrilled to hear more about all of this work. So now, David, I'll turn it over to you.
DAVID BAKER: Well, thank you very much for that introduction. And I'm very happy to be here. So today, I'm going to tell you about-- give you an overview of where we are on de novo protein design. And we've been working on this for a number of years, for most of that time, using physically-based models based on the idea that proteins fold to their lowest energy structures. We developed the Rosetta software with our colleagues to basically design proteins whose sequences are predicted to fold to the designed structure.
DAVID BAKER: And I'll give you examples of that today. More recently, as you all know, deep learning has really transformed protein structure prediction. And we have been working hard to apply deep-learning methods to the inverse problem, which is protein design. And we've made quite a bit of progress. And I'll tell you about where we are now on deep learning based protein design. So it's an exciting time for protein design.
DAVID BAKER: We have our first de novo designed medicine, which is actually in use in people. That's a COVID vaccine designed by my colleague, Neil King, and his colleagues. It's a self-assembling nanoparticle with the spike protein RBD on it. And we have a number of other medicines that are now in human clinical trials. So hopefully, we'll see more of those in the clinic soon.
DAVID BAKER: I'm quite optimistic about what this is going to look like five or 10 years from now. So we're really kind of at a tipping point. So the first problem I'm going to tell you about is the design of protein binding. I'm going to first tell you about design of protein binding using the traditional physically-based approach. And then, towards the end of my talk, I'm going to show you how we can design binding using deep learning.
DAVID BAKER: So we now have designed proteins to bind to quite a large number of different protein targets of therapeutic relevance, over 50 now. And in the Rosetta approach, we start with the structure of the target, and we de novo generate hundreds of thousands of scaffolds, we dock those against the target, we find those that are shape-complementary, and then we design a binding surface so that the designs will be chemically complementary to the target.
DAVID BAKER: And here, you see comparisons of crystal structures to design models. And you can see that they're very close. There's quite a range of targets here that are listed. And they range from receptors involved in-- TrkA is an NGF receptor. Here we have a mini protein that we designed against COVID. Here's a transferrin receptor binder that is showing promise for blood-brain barrier traversal, and a number of others as you see here.
DAVID BAKER: So we have quite a long list. We actually have binders now to most cytokine receptors and growth factor receptors. And what we're doing with these is using them to bring together receptor subunits in new combinations, orientations, and valences. We're getting very interesting signaling readouts, which I don't really have time to go into today. But I did want to give you one example of a higher order assembly made by holding these little binding domains in a rigid orientation.
DAVID BAKER: And that's shown here. This is that COVID minibinder protein I showed you before. Here's the spike protein of the virus. And here is a designed trimer that holds these three domains. These are the binding domains here in exactly the right orientation to simultaneously engage the three RBDs on the spike protein. And this is really a neat protein. Unlike any of the antibodies which have been examined so far, this protein completely neutralizes all strains of the coronavirus it's been tested on.
DAVID BAKER: That's because each of those domains binds with picomolar affinity. And there are three of them lined up. So we think this is a general strategy for antivirals moving forward. And as I'll show you, we can design these binding domains very, very quickly now. So this could be very good for very rapid pandemic response. OK, so that's design of proteins that bind to a folded protein.
DAVID BAKER: We've made a lot of progress recently in the design of proteins which bind to disordered peptides and disordered proteins. So a classic example of this are peptides that form amyloid. And this is work from Danny Sahtoe and Hannah Han. And so what they reasoned is that peptides that had some propensity to form [INAUDIBLE] strand structure could be bound within an empty slot in a beta sheet, as shown here.
DAVID BAKER: And this has worked really well. This is designed to bind to beta amyloid. And this is data from Tuomas Knowles's lab showing that, whereas this A-beta peptide will aggregate very, very rapidly in solution, once you add the protein in, it completely shuts down aggregation. And we've seen this with a number of different amyloid peptides, for example, tau.
DAVID BAKER: So we think this is a general method for suppressing amyloid formation with some obvious therapeutic potential. For biotechnology, it'd be really great to be able to make sequence specific peptide binding proteins. And for that, Kejia Wu has developed a series of proteins which bind to extended peptides, where each residue, or every other residue on the peptide, binds to a pocket within the protein.
DAVID BAKER: And you see that illustrated here, with different types of peptides in different conformations. What this then allows us to do, or Kejia to do, is to change what's in the pocket and get specificity for different peptides. And so she's been able to design very high affinity binders for quite a variety of different peptide sequences. In this case, these are scaffolds which bind to proline-rich peptides.
DAVID BAKER: But she can change the pockets to get specificities for different residues beyond the proline. And as you can see here, the binding constants are quite tight. Now, by changing the pocket suitably, Kejia can design binders that are specific for cellular proteins. This is a collaboration with [INAUDIBLE] lab, where she basically picked up-- he had a protein of interest, which was disordered.
DAVID BAKER: She designed a binder to a part of the disordered region, and we can use that to specifically pull down that complex from cells. So this is a very general approach to making binders to disordered proteins, which have been pretty hard to bind up till now. We've done this now for quite a number of different disordered proteins. So that's designed proteins and designed peptides.
DAVID BAKER: Cameron Glasscock, and Robert, and Ryan have made really great progress in designing DNA binding. What they've been doing is using a similar approach to what I described for proteins-- basically take a DNA target site and then design from scratch a protein which binds to that site. And here are examples of two such binders, and they bind with quite high sequence specificity.
DAVID BAKER: This is experimental data here in which each residue is mutated one at a time. And the color tells you what the loss in binding is upon mutation. So these little designed proteins have very high sequence specificity. And because we're close to being able to design them for any arbitrary DNA sequence, they could have really nice properties for genome engineering.
DAVID BAKER: They're much smaller than Cas9, obviously. And so we're really quite excited about what can be done with these. They work in cells, so this is now transcriptional repression in E. coli. Here, we have two different binders and their corresponding binding sites. So this protein here represses transcription at its target site, but not at a different target site, the second protein's target site.
DAVID BAKER: And the inverse is true for the binder that binds to the second target site. So we can get sequence-specific repression. And we also have sequence-specific activation in mammalian cells. So we're very excited about the applications of these moving forward. Now, another class of proteins we've been very interested in are transmembrane nanopores.
DAVID BAKER: And so this is work from Samuel Lemma, Sagerdip Majumder in [INAUDIBLE] lab in collaboration with Anastasia Vorobieva and Carolin Berner in Belgium. So we've been able to design proteins-- these are beta barrel proteins, unlike most of the proteins I've shown you so far. And these are transmembrane proteins. We solubilize them in detergent, and here's a crystal structure compared to the design model.
DAVID BAKER: You can see it's very close. And we can design proteins with pores of increasing sizes. So this is a 10-stranded beta barrel, a 12-stranded beta barrel, and a 14-stranded beta barrel. And you can see the pore getting bigger and bigger. And the conductances, we can measure by incorporating these into planar bilayers. And you can see that they are increasing systematically as we increase the size of the pore.
DAVID BAKER: Now, there are all sorts of really exciting applications here. We can put binding domains on the surface of the pore, as shown here. And we get very strong-- this is the binding domain I told you about before for the coronavirus. And we get very strong gating of the current in the presence of spike protein.
DAVID BAKER: We can also do this to sense small molecules in this specific design. Now, one of the things I'm particularly excited about now, since we can make really large numbers of different pores, is we can take different small molecules, and what we find is with different pores they give different signatures. This is the modulation of the conductance. And so there's been this idea of artificial nose for a long time, but there hasn't been a really good way of making many, many different nanopores.
DAVID BAKER: Now we can make an essentially unlimited number of those, changing the size, the shape of the pore, and the chemical composition. So I think this will really become possible. And once you have, you can imagine, training on data on many different compounds, many different pores, and then using that to train your neural net so that, given a mixture, you can apply it to each pore and then deconvolute what you have at the end.
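The deconvolution step described above, reading one mixture through many pores and working back to its composition, can be illustrated with a small sketch. This is not the method from the talk, which proposes training a neural net on measured data: it substitutes plain linear least squares and assumes that each pore's conductance change adds up linearly across compounds. The `signatures` matrix, the `unmix` function, and all numbers are hypothetical.

```python
import numpy as np

def unmix(signatures: np.ndarray, readout: np.ndarray) -> np.ndarray:
    """Estimate compound concentrations from per-pore readouts.

    signatures[i, j] is the (hypothetical) conductance change of pore i
    per unit concentration of compound j; readout[i] is what pore i
    measured for the unknown mixture.
    """
    # Ordinary least squares; assumes pore responses add linearly.
    conc, *_ = np.linalg.lstsq(signatures, readout, rcond=None)
    # Concentrations cannot be negative, so clip small negative estimates.
    return np.clip(conc, 0.0, None)

# Made-up calibration data: 4 pores x 3 compounds.
signatures = np.array([
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.1],
    [0.4, 0.3, 0.7],
    [0.1, 0.5, 0.6],
])
true_conc = np.array([2.0, 0.0, 1.5])
readout = signatures @ true_conc      # noise-free synthetic mixture
estimate = unmix(signatures, readout)
```

With more pores than compounds the system is overdetermined, which is why having many distinct pores helps: each additional pore adds another independent equation about the same mixture.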
DAVID BAKER: So in terms of materials, we have been making progress here as well. So this is work from Tim Huddy. And here, most proteins are very irregular, which makes it hard to build with them. But Tim, and Yang, and Ryan have been making these very regular proteins shown here, which makes it easier to construct things out of them. Here are some of the building blocks on this slide.
DAVID BAKER: So we have these curved building blocks, which we can assemble to make these rings of different sizes. And we have the straight ones with right angle specifics. These are experimental data here. We can make hexagonal proteins, square proteins, circular ones. And one of the neat things about these structures is since everything is straight, it's like making things out of lumber. You can simply use longer boards.
DAVID BAKER: So here, we have small squares and big squares. This is the Yang data. And we can likewise build three-dimensional structures. These are cages built from these very straight parts. And as we increase the length of the edges, the cages get bigger and bigger. It's really easy to see with this cube right here. So now we have really modular expandable protein nanomaterials.
DAVID BAKER: And this works for unbounded structures too. Here's basically a protein train track made out of these two blocks. And by changing the sizes-- so here, yellow is inserted regions-- we can change the spacing of the block. This is the experimental data. Here, they're close together. Here, they're far apart.
DAVID BAKER: Here, the two ties are longer. And here, the two rails are further apart, and the ties are further apart also. So we now have a control over build up of assembly that we didn't have before. OK, so that's what we can do with traditional or physically based modeling using Rosetta. Now, deep learning has really transformed all of this, and we can now do even more complex things.
DAVID BAKER: So here's a first example of catalysis. So this is a luciferase reaction on a synthetic substrate. And we try to design proteins that would specifically stabilize this high-energy anionic species. And we did it by generating scaffolds with the right shape and size to hold this substrate and then designing a pocket around it that would selectively stabilize that species.
DAVID BAKER: And this protein catalyzes the reaction quite proficiently with [INAUDIBLE] of 10 to the 6 per molar per second, which is quite respectable. And one of the neat things is that while the naturally occurring luciferases, which have these big, open pockets, are not very selective among substrates, this designed one, which has this very close-fitting pocket, is very selective, so it only operates on its cognate substrate.
DAVID BAKER: And then Yeh, who did this work, has moved on now to designing specific catalysts for these other substrates to enable multiplex luciferase imaging. So just after DeepMind announced AlphaFold, we were inspired by their work. And it wasn't clear at that point whether AlphaFold would become available or what was really going to happen. So we developed RoseTTAFold, which turned out-- when the details of AlphaFold were revealed, it turned out to differ from AlphaFold in a key way in which we had a third track in which information is successively transformed-- the three-dimensional coordinates.
DAVID BAKER: And so RoseTTAFold2, which is the version which we're just releasing, is now-- the original version of RoseTTAFold is not as good as AlphaFold. RoseTTAFold2 is about as good as AlphaFold on monomers and AlphaFold-Multimer on multimers. So it was interesting doing this. We could see which features of AlphaFold are really critical and which ones-- we obviously when we were improving RoseTTAFold or RoseTTAFold2, we, at that point, knew what was in AlphaFold because the paper had come out.
DAVID BAKER: So it turned out, some of the things that looked to be really important weren't actually as important. But some of the things like the simple concept of the loss function used turned out to be really critical for getting that improvement. OK, so since then, what we've gone on to do is generalize RoseTTAFold so it doesn't just model protein, it models protein plus anything else that's around-- carbohydrate, nucleic acid, any small molecule, covalent modifications.
DAVID BAKER: So it's very general now. And so we can use RoseTTAFold-- and this is work from Frank DiMaio and his group-- we can use RoseTTAFold to build models of protein DNA complexes, protein RNA complexes, to model covalent modifications, and bound small molecules. I should say, the limit on predicting small-molecule binding is really the size of the training set. So we're not as accurate at that as we are predicting protein structure.
DAVID BAKER: OK, so now I'm going to talk about design using RoseTTAFold. And so the first thing we did was to generalize RoseTTAFold, which the initial version took in sequence and predicted structure, we generalized it so we could delete parts of sequence and structure and train a network to put the sequence and structure back. We call this procedure inpainting. And this worked quite well.
DAVID BAKER: Here is an example. So here is taking three epitopes from a virus, the RSV F protein, and embedding them, building a small protein that displays all of these. So basically what we're doing is we're giving it these three epitopes, like there are three phrases in a sentence, and having this inpainting protocol build out the rest of the structure. So these proteins are well folded, and they bind the corresponding antibodies.
DAVID BAKER: But there were two problems with this inpainting approach. One is it was deterministic. So if you ran it multiple times, you always got the same answer. And the second problem was that it couldn't really-- you needed to have a certain minimal amount of information. It's like if you're trying for sentence completion, you need to have enough of the sentence, so you can figure out context and extend it. So we developed RF diffusion to get around these problems.
DAVID BAKER: And it really works well, and I'll tell you about this for the rest of my talk. So here, what we do is we start with completely random noise, as shown here. And then, in the training stage, we basically take native protein structures, noise them to different extents, and then we fine-tune RoseTTAFold to predict what that original structure was.
DAVID BAKER: And then, at inference, when we're actually using this to design proteins, we start off with random noise, and we successively apply this denoising network to remove the noise. And what happens when we do this is we can start with absolutely no information and generate folded structures at the end. And if we run it multiple times, each time starting from a different random noise distribution, we get different results.
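The noise-then-denoise loop just described can be sketched on toy data. This is an illustration under heavy assumptions, not RF diffusion itself: structures are reduced to a handful of 3D points, and the trained denoising network is replaced by a placeholder (`toy_denoiser`) that returns the nearest of two stored "native" shapes. `NATIVES`, `add_noise`, `sample`, and the noise schedule are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "native structures": two tiny 4-point coordinate sets (made-up data).
NATIVES = np.array([
    [[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]],   # a straight segment
    [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]],   # a square
], dtype=float)

def add_noise(x0, sigma):
    """Corrupt a structure with Gaussian noise of scale sigma."""
    return x0 + sigma * rng.standard_normal(x0.shape)

def toy_denoiser(x_t):
    """Placeholder for the trained network: predict the clean structure.

    Here it simply returns the closest stored native; in the talk this
    role is played by a fine-tuned RoseTTAFold.
    """
    dists = ((NATIVES - x_t) ** 2).sum(axis=(1, 2))
    return NATIVES[np.argmin(dists)]

def sample(n_steps=20, sigma_max=5.0):
    """Inference: start from pure noise and repeatedly denoise."""
    sigmas = np.linspace(sigma_max, 0.0, n_steps + 1)
    x = sigmas[0] * rng.standard_normal(NATIVES[0].shape)
    for sigma_next in sigmas[1:]:
        x0_hat = toy_denoiser(x)            # predict the clean structure
        x = add_noise(x0_hat, sigma_next)   # re-noise to the next, lower level
    return x

# Each call starts from a different random draw.
designs = [sample() for _ in range(5)]
```

Because the final noise level is zero, every sample ends on a clean structure. A training set for a real denoiser would be built the same way `add_noise` is used here: pairs of a noised structure and the clean structure it came from, at many noise levels.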
DAVID BAKER: And so one of the first things we did was to use this to design symmetric oligomers. So we basically generate the noise, then we copy it over, in this case, three times, and then we successively denoise. And the nice thing about these is that we can very quickly determine whether the structures generated are correct by electron microscopy.
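The "generate the noise, then copy it over three times" step can be made concrete with a small sketch. It assumes the copies are related by rotation about a single axis (C3 symmetry about z) and treats a subunit as a bare point cloud; `rotation_z` and `symmetrize` are hypothetical helpers, not functions from the actual software.

```python
import numpy as np

def rotation_z(angle):
    """3x3 rotation matrix for a rotation by `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def symmetrize(subunit, n_copies=3):
    """Return n_copies rotated copies of one subunit, shape (n_copies, n_points, 3)."""
    rots = [rotation_z(2 * np.pi * k / n_copies) for k in range(n_copies)]
    return np.stack([subunit @ r.T for r in rots])

rng = np.random.default_rng(1)
# Random noise for one subunit, shifted off the symmetry axis.
noise = rng.standard_normal((10, 3)) + np.array([5.0, 0.0, 0.0])
trimer = symmetrize(noise, n_copies=3)

# Rotating the whole assembly by 120 degrees maps each copy onto the next one.
rotated = trimer @ rotation_z(2 * np.pi / 3).T
```

If the same symmetrization is reapplied after every denoising step, the assembly stays symmetric all the way from noise to final structure; whether the real pipeline does exactly this is an assumption here.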
DAVID BAKER: And so here are different types of structures that are generated this way, starting with completely random noise. And it's interesting. They look vaguely-- this one looks vaguely like a TIM barrel with beta strands on the inside and the helices on the outside. But it's a TIM barrel on steroids because the TIM barrels in nature have eight beta strands and eight helices.
DAVID BAKER: And this one, I think, has 12 beta strands and 12 helices. And this one has 16 and 16. So you can see the structures are coming back almost identical to the design models. And so we can really make an almost unlimited number of new structures this way. And we can go all the way up to icosahedra, specify icosahedral symmetry, and the structures that we get are very close, again, to the design models.
DAVID BAKER: So other things we can do with this is we don't have to start with completely random noise. We can start with a particular motif, like, this is p53 peptide bound to MDM2, and we can start with the random noise around the peptide. And when we do this, we get things that, in this case, are binding to the target MDM2 with 1,000-fold greater affinity than the starting peptide. We can take metal binding sites-- this is work by Nikita Hanikel-- and basically fix residues, say, histidines chelating a metal, and diffuse around that in a symmetric fashion to make these symmetric oligomers.
DAVID BAKER: And again, these match very closely to the design models, and the proteins bind very tightly to metal, actually more tightly than we can measure. Now, coming back to the binder design problem I started with-- here, we start with a target, and we basically put the noise on top of the target and then successively denoise, remove the noise in the presence of the target. And this results in proteins being generated that are shape-complementary to the target.
DAVID BAKER: And we've made quite a few binders to different targets this way. I just wanted to highlight a few examples. This is the TNF receptor, which is, of course, the target for many inflammatory diseases. It has this very extended binding site, which had been hard for us to target in the past. And here are two different binders. You can see their beautiful shape complementarity to the target.
DAVID BAKER: And these proteins bind straight out of the computer with nanomolar binding. And we can basically add noise and then remove the noise. So we can jitter these and make them even better and get down into the picomolar range for binding. And it's not too surprising we get really tight binding here because the interfaces are so extensive. So this is a really powerful way of making binding proteins to any arbitrary target.
DAVID BAKER: So we have some Cryo-EM data. Here's a Cryo-EM structure of a protein diffused against the influenza virus. And you can see it's very close to-- the actual structure is very close to the design model. We can also use diffusion for the peptide binding protein. Here, we have the peptide, and we assemble the protein around it. And here, you can see the-- here's actually-- we don't have the crystal.
DAVID BAKER: We've solved the crystal structure, and it's nearly identical. And what's neat about these proteins is these bind with subnanomolar affinity straight out of the computer, which is really the first time we've seen that. So we simply put in the peptide, and we actually only just need the sequence of the peptide. And we can design proteins which bind with picomolar affinity. So, Susana Torres has been using this to systematically design proteins which bind to the components of snake venom as a direct application of these binders to health.
DAVID BAKER: So there's three or four toxins that contribute a lot of the toxicity of snake venom. And she has been able to design binders to these that protect cells from toxicity, both for the-- there's two examples for the short neurotoxin and for the long-chain neurotoxin. And these are the changes in current that-- so here's the toxin alone, and then here's the rescue by the different designs.
DAVID BAKER: And you can see there's quite a bit of rescue. The toxins block the current, and then the binders block the toxin from blocking the current. We can now, using RoseTTAFold All-Atom, diffuse binders to small molecules. And we now have experimentally verified binders to a number of small molecules. And this is just neat because we're taking the small molecule and basically building the protein around it.
DAVID BAKER: And then, finally, we've now extended-- Joe Watson and Nate Bennett have extended this binder design by diffusion to nanobodies and antibodies. So here, you can see the random noise. It's been biased in this case to form an antibody. We don't specify anything about the geometry of the loops or the rigid body. And you can see that we're specifying that we want a binder in this region.
DAVID BAKER: That's why the noise starts out there. And we've gotten a number of binders. We just got a Cryo-EM structure of one of these-- again, bound to flu. And you can see that the design is binding nearly identically to the way that the diffusion added. So now making antibodies is an arduous thing. We have to go through big selection campaigns. Likewise for antibodies, you might have to immunize an animal.
DAVID BAKER: We're currently testing scFv and antibody designs, but I'm quite optimistic that we can really change the way these kinds of reagents are made. OK, so I've had really amazing colleagues who did this work and many others who aren't listed here. Longxing and Brian developed this general Rosetta based binding protocol. Kejia developed the peptide binders, and the repeat protein-- the disordered peptide binders, and also designed that high-affinity COVID trimer binder.
DAVID BAKER: Wei and Inna were involved in many of the binder designs that I showed you on the first few slides. Danny and Hannah designed the amyloid peptide binders. Cameron, Robert, and Ryan designed the DNA binding proteins. Samuel, Sagerdip, in collaboration with Carolin and Anastasia designed those beta barrels, and Sagerdip did the work shown towards the artificial nose. Tim, and Yang, and Ryan, and Xinwei did the work on the modular regular building block expandable materials.
DAVID BAKER: A lot of the design-- I didn't have time to describe the deep learning sequence design method that Justas Dauparas developed, which was an important part of much of what I talked about. The luciferase design was done by Andy and Chris. Minkyung and Frank developed RoseTTAFold. And Jue and Rohith have been developing this All-Atom version. And RF diffusion has been a collaboration between many people, really led by Joe, and David, and Nate.
DAVID BAKER: And Susana did the peptide binder design and also the snake bite-- the snake venom binders. Preetham designed the-- Nate designed the small-molecule binders I showed you together with [INAUDIBLE]. And then I want to thank many wonderful colleagues at the Institute for Protein Design. And I'll be happy to take any questions afterwards.
JULIANNA LEMIEUX: Welcome to our live Q&A session with David Baker. David, thanks so much for joining us again at the State of the Summit.
DAVID BAKER: Thank you. It's great to be here.
JULIANNA LEMIEUX: Yeah, what a great talk. Oh, my goodness. So my first question is, you talked a lot about applications that were health-related, related to people-- COVID, Alzheimer's, the nose. And I'm wondering how what you're doing might be applied to other non-health-related applications like maybe climate change, or food insecurity, agriculture, things like that.
DAVID BAKER: Yeah, we're very interested in applying protein design outside of medicine. So one general problem, for example, is increasing the thermal stability of plants, crop plants in particular-- so increasing the stability of proteins. We're working on more efficient photosynthetic systems making artificial light harvesting systems. We're working on plastic degradation and basically degradation of toxic compounds in the environment.
DAVID BAKER: And we have some exciting results on carbon capture. So yeah, we're really committed to try and make the world a better place in these really important areas outside of energy. We're also working on trying to put together biology with electronics so to involve better detection of what's going on in the world around us.
JULIANNA LEMIEUX: Yeah. I have a follow-up question now to that, which is, is that work any different because of the resources that are available as far as the human genome and the human proteome and things like that versus what's known in, say, agriculture? Or does it not really matter for the work that you're doing?
DAVID BAKER: Well, I think it does matter, and the more information that is available and the more that's known about a problem, the better we can solve it. So for example, for increasing the thermotolerance of plants as the planet heats up, understanding exactly which proteins in plants most need to be stabilized-- if we had better information on that, that would help. So more information is always better when you're trying to solve a complex problem.
DAVID BAKER: Just knowing more about what the actual proximate mediators of that problem are is really helpful.
JULIANNA LEMIEUX: Yeah. OK, great. Thank you. OK, I have another question, which is, you can do so many amazing things with these technologies designing a protein from scratch that binds DNA, transmembrane beta barrel pores. My question is, what is the Achilles heel here? Have you bumped into something that you just cannot design or is really hard?
DAVID BAKER: Right. Well, there are a number of really hard problems that we're working on because they're hard problems. And so I'll just give you two of them. So actin, and myosin, and kinesin are motors that can move and translate chemical energy into work. And so we're working hard to design motors completely from scratch. And so we've designed things that are architecturally like motors, like with rotors going around axles, but now coupling that to a source of chemical energy.
DAVID BAKER: That's an ongoing research project. Another one that's related is designing really high-activity enzyme catalysts for any arbitrary chemical reaction. So I talked about plastic degradation. But there are many different chemical reactions where, if we could really speed them up enormously, it would be incredibly beneficial. And so we're working on that problem as well.
DAVID BAKER: Both of those problems, they're current research problems because we haven't solved them yet. They remain outstanding problems, but I'm optimistic that we'll make progress.
JULIANNA LEMIEUX: Yeah. Yeah, actually, a question came in from an audience member, what are the difficulties of using AI for enzyme activity? So it sounds like-- I mean, is there anything specific to add to that?
DAVID BAKER: Well, there are-- I think for the binding and the self-assembly, you just need to design something that's really chemically and shape complementary to your target. So if you want to bind a small molecule, you need to design something that has a binding site that is shape and chemically complementary. And we're using the diffusion methods I described to do that. But if you want to catalyze a chemical reaction, there are multiple transformations of that small molecule.
DAVID BAKER: So you have to bind the substrate. Then, you have to selectively stabilize the transition state. But there may be actually intermediates and other transition states. So you have to compromise between those different states. And the end design may have to adjust to accommodate those different states. And that's why it's somewhat of a harder problem.
JULIANNA LEMIEUX: Yeah, yeah.
DAVID BAKER: And then on the machine case, it's a similar problem. You have to do the chemistry to hydrolyze the fuel-- say ATP or something or some completely synthetic molecule. But then, that has to be coupled to mechanical work. So that's also complicated.
JULIANNA LEMIEUX: OK. This is all work going on in your lab though?
DAVID BAKER: These are some of the major areas of focus in my lab now.
JULIANNA LEMIEUX: OK. All right, cool. OK, a question from an audience member is, how do you deal with the immune response? Can you train AI to bias the protein design, avoiding immunogenicity?
DAVID BAKER: Yes. Well, there's a two-part answer to that question. First of all, we have looked at the immunogenicity of many proteins that we've designed. And you might have thought that it would be very high if these are completely foreign proteins. But it turns out, their immunogenicity is relatively low. And so why might that be? Well, they're small proteins, so there aren't many peptides in them that can get presented.
DAVID BAKER: They're very, very stable, so they're hard for dendritic cells to break down. And they are very soluble, so they probably don't get taken up very well by dendritic cells. So empirically, we find that they aren't very immunogenic. And that holds for mice up to humans, so the designed proteins in humans have not elicited really strong immune responses. But where it is a problem-- and it probably will be in some cases-- we can explicitly design out sequences which are strongly likely to be presented on MHC.
DAVID BAKER: So we can extend the AI algorithms to reduce immunogenicity likely.
JULIANNA LEMIEUX: OK. Yeah, and then somebody actually was-- I'm sticking with the immune response question. Another person has asked, what are your thoughts on using AI for antibody discovery against antigens given that it's being integrated into life science research?
DAVID BAKER: Yeah. Well, I described that in my talk, that we have some very promising results now. Basically, you have an antigen, and then you design against it. I showed that we can design nanobodies from scratch that bind very tightly to a target, and we're currently extending those methods to full antibodies. So I think that will become increasingly possible.
JULIANNA LEMIEUX: OK. Another question-- do you think the AI-based models will completely replace the physical models? Or is there still a space for both types?
DAVID BAKER: Yeah. Well, I think we still use the physical models quite a bit for making sure the final designs are physically realistic and for modeling systems where there isn't a lot of data. So for these AI methods, you need a lot of training data. Where you don't have the training data, then the physical models are still very useful. And ultimately, I think it should be possible to incorporate the physical information into the AI models, and that's a work in progress.
JULIANNA LEMIEUX: OK. And I think we just have time for maybe one or two more questions. Somebody asked, for the AI programs used, are they set? Or are the programs evolving and changing as more research on proteins comes out?
DAVID BAKER: Yeah, so we are constantly retraining the models. We're developing new and improved models. We're extending into new areas, like I talked about designing full molecular systems or predicting the structures of full biomolecular systems with RNA, DNA, and small molecules. So we're continually extending the models. And then as more data comes in, we can update them by retraining.
DAVID BAKER: So yes, they're continually evolving.
JULIANNA LEMIEUX: OK, perfect. And our last question in our last minute-- it's not really just a one-minute answer, so sorry about that, David. But I want to just-- and somebody actually asked about this too. I want to ask you just briefly about the companies, that you've founded a bunch of companies based on this work. There's too many to get into individually. But can you maybe tell us a little bit about how the work going on in this space in industry is different from what's going on, say, in your lab at the Institute for Protein Design?
DAVID BAKER: Yeah. Well, that's a really good question. The work that we're doing at the Institute for Protein Design is really discovery. We're trying to develop. We're doing brand-new science and stuff and trying to break open new areas. And then, in that process, we design proteins that could make the world a better place. And so what will frequently happen is a graduate student or postdoc who's working on that will decide they want to start a company to really push it out into the real world.
DAVID BAKER: So we have a translational program where they can get support for a year to do additional experiments to get to the point where it's been sufficiently de-risked that an investor or investors would want to support a spin-out company, and then the companies spin out. And then, those companies really stay more focused on what it is that was designed or on a particular very specific problem and really try and advance that.
DAVID BAKER: If it's a medicine, they try to get it into the clinic. If it's something for sustainability or diagnostics, they try and get it out in the real world. So that's very hard to do from within the University of Washington Institute for Protein Design. So there's a division of labor in that way.
JULIANNA LEMIEUX: OK, awesome. Well, thank you so much. Thanks for the talk. And also, thanks for coming back to answer our questions. And it's all so fascinating. I can't wait to cover it more at GEN in the future. So thanks again, David.
DAVID BAKER: OK, great. All right, yeah. Goodbye.
JULIANNA LEMIEUX: OK, bye.
DAVID BAKER: Bye. [MUSIC PLAYING]
