Episode 3 Part 2 - Genomics Futures: AI and Synthetic Biology

Show notes

Speakers:
  • Kondwani Jambo, Professor of Immunology and Infection at the Liverpool School of Tropical Medicine;
  • Alexandra Canet, science communicator and producer of the Genomics Futures podcasts;
  • Ben Lehner, Head of the Generative Genomics Programme at the Wellcome Sanger Institute;
  • Patrick Boyle, a specialist in automation and digital for biology.
Episode description:

This third episode is split into two parts. Both parts look at two workshops, titled Understanding and engineering cells and genomes for the future and Innovative technologies for measuring and engineering life. We merged the participants, themes and reflections into a single episode, as most of the themes overlapped and the reflections were quite similar.

In part two, we opened the podcast with Professor Kondwani Jambo, with whom we explored the concept of personalised medicine and the democratisation of genomics technologies. We then spoke to automation and digital data for biology expert Dr Patrick Boyle about automation in the lab and the possibilities it may offer us in twenty-five years’ time. Lastly, we spoke to Professor Ben Lehner about the possibilities of engineering biology in the future.

Mentioned in the episode:
  • Vision 2063 – a vision aiming to transform Malawi into a wealthy and self-reliant industrialised ‘upper-middle-income country’ by the year 2063.
  • AlphaFold – an artificial intelligence programme that predicts protein structures.
  • Protein Data Bank – a data bank that collects, organises and disseminates data on biological macromolecular structures.

Transcript

Kondwani Jambo 00:00
So my vision, really, of genomics in 25 years, and I’ve been thinking about this idea for a while now, is precision global health, which is really a mixture of what we know as global health now and the future, which is precision medicine. So genomics really could facilitate this process.

Alexandra Canet 00:23
What will the future of genomics look like in 2050?

Ben Lehner 00:29
So I would hope in 2050 labs will be different. So we will have very good predictive models for lots of the interesting questions and problems and things we want to engineer in biology. And so we will be very much more designing things.

Patrick Boyle 00:43
We have this tremendous technology to sequence genomes, and have been using that for 30 years now. I think the bigger question is, how do we use the next generation of computational tools to help us better determine what those genes do, how they work together, how cells function?

Alexandra Canet 01:00
Welcome to the third episode of the Genomics Futures podcasts. This is the second part of a two part episode in which we’ll continue to explore two of the Genomics Futures workshops, one that looked at The Future of Understanding and Engineering Genomes and Cells, and another that was titled Innovative Technologies for Measuring and Engineering Life. I’m Alexandra Canet, Science Communicator and Producer of this podcast, and we’ll be taking you through our conversations with attendees throughout this episode. Today, in part two, we will hear from workshop organiser Ben Lehner from the Wellcome Sanger Institute and the Centre for Genomic Regulation in Barcelona, Kondwani Jambo from the Liverpool School of Tropical Medicine in Malawi, and Patrick Boyle, a specialist in automation and digital for biology. In this first instance, we spoke with Professor Kondwani Jambo about his vision for the future of genomics.

Kondwani Jambo 01:59
Yeah, so my vision, really, of genomics in 25 years, and I’ve been thinking about this idea for a while now, is precision global health, which is really a mixture of what we know as global health now and the future, which is precision medicine. So genomics really could facilitate this process. And this involves looking at populations that are at risk of certain diseases, not necessarily at an individual level but at a population level, because that allows us to have targeted interventions at populations, not just at an individual level, especially in resource-limited settings, which might not have the luxury of precision medicine. So to me, I feel like genomics in the next 25 years, if it follows the trajectory that I would love to see, would be something that actually reduces the cost at which we support global health, because at the moment what we do is a one-size-fits-all approach most of the time. So we have a vaccine that’s been tested in population A, and we apply it in population B. Hopefully it works, but we know from the current data we have from multiple vaccines that vaccines don’t work the same in different populations and settings.

If we could get to the point of regional vaccines, like we already do with flu because of its nature, where we have a Northern Hemisphere and a Southern Hemisphere flu formulation. But I’m thinking beyond that: if you could segregate regions depending on their similarities, genetic similarities would be important. Africa will be the most complex one, considering it is the most genetically diverse, and genomics can really be used both at the human level and at the pathogen level, because there are all kinds of pathogens that are endemic to one region and not another. But a lot of what happens is, for example, with the pneumococcal vaccine, the most widely licensed and available one is pneumococcal conjugate vaccine 13, which has 13 serotypes. The pneumococcus has over 100 serotypes, and those serotypes were chosen because they were causing a lot of disease, mostly in high-income countries. So the vaccine was licensed, then we use the vaccine across the world, and you’ll find that in some cases the serotypes that are causing disease in the high-income countries are not the same ones that are causing disease in the low-income countries. So in that scenario, if you could predict that earlier on when developing vaccines, then you could have regional vaccines, instead of the approach we have right now, which is one size fits all.

Alexandra Canet 04:31
So to achieve this vision that you’ve just talked to us about, Kondwani, what main steps do we need to take? How can we democratise this technology?

Kondwani Jambo 04:42
I think I look at this from two perspectives. First is at the level of the skills, the people; you have to democratise at that stage first. Training the people in the use of AI, making it something that is available to every Jim and Jack, like we have a mobile phone everywhere in this world. You get to every village, you find a mobile phone. Yeah, that technology was democratised over time, and we can democratise these kinds of technologies in a similar way.

The second point really is technology in terms of infrastructure. So one of the key things is really having the infrastructure to process data. Most of the stuff that is being done is that the data is processed off site, elsewhere, and in some cases people have no control over how that data is processed. Yes, they contributed to the data that was generated, but how it’s processed is a different ballgame. So to me, in low-resource settings, there has to be some recognition that infrastructure also needs to be invested in those areas, because it benefits all humankind at the end of the day.

Alexandra Canet 05:51
So if we could ask you to paint a picture of these two things in 2050, let’s start with skills and people. What would that scenario in 25 years’ time look like?

Kondwani Jambo 06:06
I would see things more to do with strategic thinking being promoted, rather than routine tasks, because routine tasks at that point would be done by AI, by other pieces of technology that will have been developed by then. A lot of it, our role, will be how to use these things strategically. That’s what I think about in 25 years’ time; that’s the kind of thing I would want. And for myself, in my research team, this is the kind of stuff I teach them and try to give them opportunities for: to really look at this AI and all the technologies coming as tools. They’re not going to tell us everything. They’re tools. It’s actually a human brain that needs to design what you want the AI to actually do, what you want to use it for, because their applications are absolutely enormous, and it’s up to us to use them to our benefit as humankind.

In 25 years’ time there will be different kinds of jobs, different kinds of skill sets. And thinking about our role in a low-resource setting, we need to start to build data centres now for that, because, like I said, one of the key components in our setting is the generation of data. In Europe or in the US, generating data is not much of a problem. It’s very difficult here to generate routine data from the systems that we currently have. So there has to be a way, a thought process, to allow for easy ways of generating data that we could use in large language models, in machine learning algorithms, to predict things in this setting. Because, for sure, especially if you want to do research using AI technology, AI is only as good as the data it’s been fed. That’s it. At the same time, the same bias that we have as human beings is also included in the AI process, because that’s how it runs. Meaning, if we want to allow for more diversity, we have to allow investment in the generation of data in regions where the data does not exist.

Alexandra Canet 08:12
Kondwani, you attended the first workshop that we ran here in Cambridge, and I’d like to ask you: what was missing from the conversation? What should we take into account when thinking of the future of genomics in 25 years’ time?

Kondwani Jambo 08:29
I really found the whole thing very interesting and fascinating. It was probably the first time I went to a place where people said, let’s think about 25 years from now. To me, I’ve never had that kind of thing. If anything, there are probably government initiatives that think about it; in Malawi, we have what we call Vision 2063, but it’s just a broad-brush thing without much detail. The detail was supposed to be made over time. That’s the kind of stuff that I’ve seen people do. But what we actually did was look at the current technology and ask what kind of thing we would want to see in 25 years. I think that’s amazing; to be honest, it was absolutely amazing. And I would love to be involved in those kinds of discussions more and more. I also saw that the things I take for granted, others don’t know about, just because of the circumstances that they find themselves in. I look at myself as somebody who’s privileged. Why? Because I have been trained in the UK, I’ve been trained in the US, and I do my work in Malawi, in Africa. So I have three perspectives of different populations, and I can clearly see differences, even just between the US and the UK. You can clearly see the key differences. How we are thinking about the future is all different. We all have different aspirations. But at the end of the day, what I saw is that everybody wants good. What I think was missing from that is diversity. We could have had more people to give other views, probably similar to mine or different, plus also other regions of the world, not just Africa: Asia and others that could have had a different perspective on how these things will be used. Because, remember, there’s also culture involved in this, in how people pick up technology. So we need to take all that into consideration.

And also, I kind of feel like one thing was missing, which is finding out what the needs are, you know? We probably went at it from the point of saying: this is what we think we will need in the future, so how do you plan for that? But maybe now that we have had that foundation, going backwards and looking at what people actually need across the world, what their view of the world is in 10 years’, 25 years’ time, I think that exercise could actually be useful, beyond talking to scientists, beyond talking to experts, because we have our own way of thinking. Part of this is biased towards our own careers, while the people on the ground, the people in the communities, have different aspirations for their life in 20 to 25 years’ time. Whatever we’re going to do, let’s address real solutions that will target real people.

Alexandra Canet 11:08
Artificial Intelligence featured prominently in most of the Genomics Futures workshops, as Dr Jambo has referred to multiple times. Another theme that was also prominent, especially in the workshops we are covering in this episode, was automation and its role in the future of biology. We spoke about this with Dr Patrick Boyle.

Patrick Boyle 11:27
I think there are a lot of powerful tools being developed now in genomics, leveraging AI, but as a bioengineer I’m always interested in trying to determine what genomes do and how we can better control and design functions in those genomes. To borrow an older bioinformatics word, I think what really matters for the future of genomics, and AI will play a part in this, is annotation. In other words, we have this tremendous technology to sequence genomes, and we have been using it for 30 years now. I think the bigger question is, how do we use the next generation of computational tools to help us better determine what those genes do, how they work together, how cells function? And I think that’s really where AI is going to be most helpful.

Alexandra Canet 12:12
Patrick, you brought an industry perspective to the workshop. What can we in academia learn from what’s happening in your field?

Patrick Boyle 12:20
Yeah, I think in industry we have the advantage of being able to build large teams to accomplish things. Obviously, the Sanger Institute is one of the places that can do this in academia as well. But for me, over the last 20 years, what I’ve really seen is the ability to really tailor automation for new applications in biology. So why is automation important? Ultimately, in biology, progress is sometimes just limited by our ability to even use the tools that we have at our disposal. Take a DNA sequencer, for example. Oftentimes the bottleneck in leveraging a sequencer is making sure you can actually get enough samples into that sequencer. So robots and automation help us there. In the future, I’m interested in how we can apply automation to improve AI tools for understanding biology. In other words, if you can accelerate the number of experiments that you can accomplish per day, your learning rate goes up, right? And whether you’re using AI to understand that data, or people, that can be beneficial. So in industry, we’ve really had the opportunity to bring together advancements in automation to accelerate the amount of data that we can collect. AI is a very data-hungry technology. So the hope is that AI and automation really work well together. AI helps you collect the right data, and automation helps you collect more of that data, so that you can then, in turn, create better models. Ideally that creates a positive feedback loop where the AI and the automation are working together to help you learn faster.

Alexandra Canet 13:57
And in terms of automating biology, what challenges, or topics, or themes came out within this field?

Patrick Boyle 14:05
I think one of the challenges for automating biology is that if you go to a biology lab, a lot of the work is really moving tiny amounts of liquid around, right? Which doesn’t sound very interesting, but you can actually do a lot of amazing things in biology by designing different experiments in that way. Ultimately, the robots that help us do that work at larger scale basically just have the kind of tool that we use by hand, which is called a pipette. There are a few different ways that robots can help with this. Traditional liquid-handling robots are just robots, like you said, with robot arms with pipette tips. Usually they have more than one pipette so they can move more samples around at a time; the typical number is somewhere between 12, or sometimes 96, or even 384. But it’s really just doing more of what you would do by hand. There are some newer instruments that are really fun that actually get rid of the pipette tip altogether. They’re machines that use, for example, sound waves to move droplets from one place to another, but ultimately they’re all just fancy ways of mixing together very specific, precise amounts of liquid. And as you can imagine, that’s a different type of automation than you might see, for example, in manufacturing, where it’s more about manipulating pieces of equipment that move things together. For us in biology, it’s really about mixing the right liquids together in the right combinations very precisely, microlitres or nanolitres at a time, and making sure that can be done consistently, which is actually surprisingly hard to do with automation. People are self-correcting and can become very good at this, and calibrating and maintaining robots to do that at high precision is in itself an engineering feat.

I think the trend, if we’re moving in the right direction with automation, is that ideally we’re freeing up time for scientists to spend developing new hypotheses or analysing data. Of course, these are all areas where AI can help as well. But even in 2050, I feel like there’s going to be tremendous value from human creativity in this process, right? So that was a topic that came up in the workshop: to what extent are these tools augmenting what we can do versus replacing what we can do? And ultimately, at least for me, I’d like to augment the creative part of the process and replace the rote part of the process. If you look at the history of lab automation, a lot of the automation that we have today was originally developed during the early 2000s, when the first human genome sequence was assembled. Interestingly, in my opinion, not a lot of that technology has advanced sufficiently since then. My hope is that increased interest in AI for bio, which will demand more automation, will actually spur another generation of upgrades to robotic hardware. Because if you want to do lots and lots of sequences of plasmids, that’s relatively straightforward to automate. If you want to manipulate mammalian genomes to create perturbed cell libraries to train an AI model, that ends up being a lot harder.

Alexandra Canet 17:10
And what did you think of the whole horizon scanning process, or what was missing from the conversation?

Patrick Boyle 17:17
I guess a question for you: is there anything that’s come up in your conversations that you think is controversial? Because one of the things that came up for me, that I thought was really interesting, and that I’ve actually been thinking about since the workshop, is: what is the contribution of people to academic progress? When we went from punch cards for programming computers to teletypes, there wasn’t a lot of controversy about moving beyond punch cards, right? It was seen as something necessary, because writing and programming software that way was very labour intensive. I think as biologists there is an anxiety in our field: ultimately, what if we get replaced by robots? And one of the things that I’ve seen in my career so far is that if you’re doing it right, you’re not getting replaced by a robot. That robot is freeing up your time to be more creative. One thing that came up in the workshop is this idea of, okay, will we not see lab benches anymore in 2050? I think we’ll see lab benches; lab benches are really great because they allow you to be, ultimately, as flexible as you want in terms of designing experiments and thinking about what you’re doing. I just hope that there is an automated lab chugging away in the background doing all the stuff that we don’t want to do, so we can spend more time at that lab bench, or analysing data, or whatever it is that you feel is your contribution. But we have to let go of the things that are equivalent to punch cards, the things that are basically just standing in the way of us asking the right questions.

Alexandra Canet 18:52
To end the second part of episode three, we spoke with Professor Ben Lehner about his views as organiser of one of the workshops and the challenges of understanding and engineering genomes for the future. We asked him: in 2050, what would the work in his lab look like?

Ben Lehner 19:09
So, well, I think actually, at the moment, we sort of trick ourselves into thinking that we understand biology, molecular biology, quite well, and I think actually we don’t. And I think the evidence that we don’t is that it’s quite hard to make any accurate predictions in biology: to quantitatively predict how a biological system, even a simple one, will respond to a perturbation, or what a mutation or genetic perturbation will do.

And also engineering biology, so changing biology or making new things, is really hard and still very much, in many aspects, a trial-and-error process, very labour intensive.

So, I would hope in 2050 labs will be different. We will have very good predictive models for lots of the interesting questions and problems and things we want to engineer in biology, and so we will be very much more designing things. Experiments will be very much more automated and parallelised, using massively advanced technologies compared with what we have now. So if you can think of an experiment, you can just do it. We’re very constrained experimentally at the moment, but if it becomes much easier to do experiments, then we can test many, many ideas. We can test many designs, and everything will go faster and more easily.

Alexandra Canet 20:35
You said something really interesting there, Ben, about experiments being difficult now and that they will become easier. And you have hinted a little bit at automation, robotics, AI, but what do you think fundamentally needs to change for those experiments to be easier in 2050?

Ben Lehner 20:54
If we compare the experiments that we can do now: in my lab, the thing that’s really changed is that DNA sequencing is very cheap and super quantitative, and DNA synthesis has become much better and cheaper. So now one PhD student in my lab can do as many perturbation experiments on a protein or a bit of regulatory DNA as the whole world could 10 years ago. We can do half a million experiments in a single flask and have very quantitative measurements for some bits of biology. So if we want to change the sequence of a protein and measure how it changes the activity of that protein, for many activities, we can now do that on a massive scale, which was not possible 10 years ago. And I think the challenge is how to get that kind of throughput, those kinds of quantitative perturbation experiments done in a parallel way, for lots of other things in biology, for the higher-order things we’re interested in, going away from single molecules. So that rather than doing an experiment, looking at the data, thinking, doing another experiment, and so on, you can do half a million experiments all in one go, and then it won’t be us looking at the data, it will be some AI looking at the data and coming up with the next set of experiments as the models refine.

Alexandra Canet 22:13
I asked you at the beginning, Ben, about the changes you’d see in the lab, but what about for all of us who are outside the lab?

Ben Lehner 22:19
So we now know the genetic causes of many different human diseases, particularly rare diseases, more than 5,000 at least. And yet we don’t have effective therapies for most of those, the vast majority of them, and it takes a very long time from when a disease gene is identified to when the first therapies are developed. A good example: the cystic fibrosis gene was identified as one of the very early, relatively common, rare disease genes, and the first effective therapeutics, which are now on the market and are actually very good, took decades to be developed. And that’s for a more common genetic disease, and there are many, many very rare genetic diseases. So we have to work out how to do this for the others. Basically, one of the problems, I think, is actually not the genomics; a lot of the problem is actually the chemistry, and drug development is not particularly effective at the moment. We know lots of things that we want to inhibit or activate, we know what the correct targets are in many diseases, and we just don’t have a mechanism to do that using drugs. There is a potential to generate hugely larger data sets about how small molecules and so on are interacting with proteins, and to do that in a public way, so that the information is all shared and people can train models on it and learn how to predict this stuff. This is not something that we’re actively working on, but it is something that definitely should be happening more in the world. And the problem is that the closer you get to chemistry and the pharmaceutical industry, the less data is shared; everyone just hoards their own data because it’s of value. And this really slows things down. If there’s one thing we’ve learned from genomics, it’s that pre-competitive data sharing is a great thing for science and also a great thing for the economy.

Alexandra Canet 24:09
I recall AI being a protagonist of these workshops. What were the challenges that were discussed?

Ben Lehner 24:15
I guess there are two answers to that. One of the challenges of AI is having the right kind of data to train the model, so that’s kind of deflecting the question back towards data generation. Fundamentally, in molecular biology, I think the reason why we can’t predict a lot of things is not because we don’t have the correct neural network architectures or machine learning approaches. It’s because we don’t have the right data to train them. So I think fundamentally, biology is very data scarce, very data limited. We don’t have enough data. We don’t have diverse enough data. We don’t have data of the right type. We don’t have well-calibrated data. Take the example of protein structure prediction, so AlphaFold and all the methods that preceded it. The reason that worked is, yes, very good machine learning and engineering, but fundamentally it’s because there was this wonderful data set, the PDB, the Protein Data Bank: the set of all the protein structures collected by the scientific community over decades at very high cost. That data is super quantitative and represented in a very standard way, so it’s a wonderful training data set. And we have to think, for all the other questions we’re interested in in biology, all the other things we want to predict and understand, how do we generate the PDB for those problems? How do we efficiently, cost-effectively and quickly generate these large, standardised, quantitative, diverse data sets? That’s the data side. So I really think the data is actually the most limiting thing in biology.

On the AI side, there are many super interesting questions. One that’s very important for biology is how to go from black-box models that we don’t understand to interpretable models, or at least explainable models, so that we can actually learn something. The science bit is understanding how things work, not just being able to predict them. But it’s also very important in biology because, for making clinical decisions for example, you want to have a model where you can explain and understand how the model has made a decision, not just have the output from some black-box model.

Alexandra Canet 26:24
And continuing with the challenges, another one that was identified during the workshop was how to advance in vivo measurement technologies.

Ben Lehner 26:31
Yes, exactly. So this comes back to a point I was trying to make earlier on. In my lab, we work on individual proteins and bits of DNA, because I think these are tractable systems at the moment, as we can make enough measurements to constrain models, to train models to predict what’s going on. But when you go to higher levels of organisation in biology, so cellular phenotypes, or phenotypes in tissues, organs or whole organisms, there is a very big gap between the perturbation you’re doing, which might be making a mutation, and what you’re measuring. There’s a huge number of possible mechanisms which could act between the perturbation, the change in the sequence, and the measurement you’re observing. And that makes it very difficult to constrain meaningful models, because there’s an almost infinite number of possible mechanisms. So we have to work out ways to make very large numbers of measurements at the same time as we’re making very large numbers of perturbations, and to make these measurements across the hierarchy of scales in biology. And we also have to think a bit about what kind of measurements we want to make, and what level of model we want to build in order to explain something. If we want to understand how a tissue or an organ develops, for example, it probably doesn’t make sense to come up with a modelling framework that involves considering all the positions of all the atoms and all the proteins inside the cell. You probably just want to understand it at some more conceptual, cellular level, where you have a slightly phenomenological model which is still generative and predictive and can be constrained by data. But the question of how we make large-scale measurements in vivo, in biology, I think, is a key question that people need to think deeply about.

Alexandra Canet 28:22
Thank you for listening to the third episode of the Genomics Futures podcasts. As mentioned in the introduction, this is part two of episode three. If you haven’t listened to part one, or to the first two episodes of the series yet, you can find them wherever you get your podcasts, under Genomics Futures.

The next podcast will look at the topics of rewilding, bioprospecting, climate change, and what genomics can do to solve challenges in farming, biodiversity and conservation. It will cover the Sustaining Life on Earth workshop, in which we talk with experts in marine biology, climate change, conservation and the philosophy of science about what this will all look like in 2050.

If you want to get in touch, please do. You might agree, disagree or have your own thoughts about the topics and themes discussed in these conversations. We’d love to hear them. You can get in touch with us at genomicsfutures@sanger.ac.uk

Genomics Futures Workshops

Wellcome and the Wellcome Sanger Institute invited scientists from around the world to look towards the future and imagine the new opportunities presented by genomics research.