January 29, 2020
The Algorithm for Precision Medicine
Hugh Kaul Precision Medicine Institute
Precision medicine promises to deliver ultra-personalized care by casting medicine as an optimization problem: identifying the best possible treatment with respect to all available data.
A slew of recent advances in biology, starting with the ability to sequence the human genome, have caused an explosion in the amount of data one can collect on a single patient and a similar explosion in the complexity of reasoning about this data in order to solve this optimization problem. Computational support for the practicing physician is no longer optional.
This talk covers precision medicine from the ground up for computer scientists — through a personal journey from programming languages research into academic medicine. It will demonstrate progress to date, including the now-routine use of relational programming in miniKanren to identify personalized treatments for patients with some of the rarest and most challenging diseases in the world.
Matt Might has been the director of the Hugh Kaul Precision Medicine Institute at the University of Alabama at Birmingham (UAB) since 2017. At UAB, Matt is the Hugh Kaul Endowed Chair of Personalized Medicine, a professor of internal medicine, and a professor of computer science.
Thank you. It’s a pleasure to be here and to present to you all.
I know you’re deeply interested in medical topics. As it turns out, I used to be one of you. I spent most of my life as a computer scientist, more specifically as a functional programmer and an academic in functional programming languages. I’ve got to give you my standard disclaimer on this sort of stuff. It’s slightly modified for this audience, and you might even consider it more of a confession in that sense.
This is actually a medical talk. But what I really mostly do, or at least what I know how to do, is programming languages and functional programming. I even tried to give a sense of what my work has been like over the years. If you look at the split between object-oriented and functional, it’s almost all functional in terms of the kinds of research I used to do. If you look at the kind of stuff I published up until quite recently, it was all PL. Nothing in medicine.
All that is to say that I really don’t have any medical training, and if I give you medical advice in this talk, don’t take it. Yeah, this is what happens if you try to remove ants from your backyard with gas and forget about what happens when you get an air-fuel mixture. I guess the ants are gone. Relocated or something like that.
I also want to make an observation. This is really a talk about the fusion of CS and biology. This fusion is not even remotely new; it’s been going on since the very beginning. Some of you might be familiar with the famous biologist Alan Turing. This is true: he was actually a biologist. He just didn’t realize it. In fact, to prove it, go to Google Scholar and look at his publications. His top publication is in biology, with more citations than anything else.
[inaudible 00:02:11] it’s his one paper in biology, but it’s a foundational paper in biology. He’s using computational modeling, essentially reaction-diffusion systems, to show how complex organisms develop from single cells. How do you get the interesting patterns you see in nature, like, say, the stripes on a zebra or the shape of a leaf?
Even before we knew the structure of DNA, he got it right. He found the model that actually predicts how this works in nature, and he’s still widely regarded within mathematical biology as one of the founders of the field.
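For readers who want the math: the reaction-diffusion systems Turing studied can be written in the general two-substance form below, with u and v the concentrations of two interacting chemicals (Turing called them morphogens), f and g their local reaction rates, and D_u and D_v their diffusion coefficients. Stripes and spots can emerge when a uniform state that would be stable without diffusion is destabilized by it, which requires the two diffusion rates to differ sufficiently; the specific f and g vary from model to model.

```latex
\frac{\partial u}{\partial t} = f(u, v) + D_u \nabla^2 u,
\qquad
\frac{\partial v}{\partial t} = g(u, v) + D_v \nabla^2 v
```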
So I’m going to make some claims today. I will claim that data is the greatest drug of the 21st century. I really do think that when we look back from the end of the century, we will believe that this was the best thing we ever used to improve human health. I’ll claim that this emerging field of precision medicine, which I now work in, delivers data as a drug. That’s really what we do. That’s because precision medicine is a process, a step-by-step process, and I’ll present an algorithm for doing precision medicine.
And I’m going to make another claim, for those of you who have spent enough time drinking the FP Kool-Aid to play with proofs, inference rules, and things like that: for precision medicine to work in practice, you have to generate proofs. By that I mean that every time you make a recommendation to a physician, you have to tell them why you think it’s a good idea. You can’t just say, “My algorithm said so.” It turns out they will not take your advice. So there’s a role for a lot of FP technology in precision medicine.
What is precision medicine?
I think it really took off back in 2015, when Barack Obama launched the Precision Medicine Initiative. This was designed to lay the scientific foundation for the field as a whole, and he defined it very simply back then: precision medicine is all about delivering the right drug to the right patient at the right time. That’s a good working definition even today.
Now, after that, there was an explosion of terms that tried to claim the mantle of precision medicine. Everyone said, “I do precision medicine. This is what it is to me.” I’ve spent a lot of my time since then trying to unify this under a common definition, and I do think that this definition, the use of data, particularly big data, to optimize health, is what precision medicine is really all about.
For me, precision medicine is an optimization process. Some physicians disagree with this: “But I always try to get my patient the best medication. That’s what I’ve been doing this entire time.” What I tell them is, “Yes, but you didn’t have as much data then as you do now, and you didn’t have so much data that you had to use a computer to find the optimum.”
Precision medicine is a natural consequence of huge amounts of data becoming available on a per-patient basis, and of the subsequent computational demands of finding the right thing to do with that data. It is a process that goes from a patient ultimately to recommendations for them, either a pill or a procedure of some kind. But it starts with their data.
The piece of data that really kicked off precision medicine is the human genome. It became clinically accessible a few years ago, and people started to grapple with the consequences of having the blueprint for a human being when you treat them for a medical condition. But all forms of data ultimately count as inputs to this optimization process. Even the data that was already lying around at the time, like the electronic health record, was being poorly utilized, and honestly, it is still poorly utilized in day-to-day health care right now.
But everything counts: Fitbit data, Apple Watch data, social media data. It turns out you can make reasonably accurate inferences about mental health, in some cases, from people’s social media data.
I think all of that counts toward constructing this constrained optimization problem, where you get to modulate things like treatment and lifestyle, and maybe even environment in some cases, to try to find whatever brings a person to their optimal state of health.
A lot of precision medicine is trying to modulate these two factors, treatment and lifestyle, to find what that person actually needs. I’ll tell you right now that a lot of the time in practice, we don’t jump all the way to an answer. In many cases, what we recommend is not a pill but some kind of scientific procedure where we say, “You have to go run an experiment because we don’t have enough data, but we are pretty sure if you did this, you would have enough data or you’d be closer to enough data to answer this question in a meaningful way.”
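A minimal Python sketch of that framing, under loudly stated assumptions: the drug and diet names, the Estimate scores and confidences, and the 0.7 confidence threshold are all hypothetical placeholders rather than a real clinical model. It enumerates treatment and lifestyle combinations, drops any that violate a constraint, picks the best predicted outcome, and falls back to recommending an experiment when that best option rests on too little evidence.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Estimate:
    score: float       # hypothetical predicted health score, higher is better
    confidence: float  # hypothetical confidence in that score, from 0 to 1


def recommend(treatments, lifestyles, estimate, forbidden=frozenset(), min_confidence=0.7):
    """Pick the best feasible (treatment, lifestyle) pair.

    Returns ("prescribe", pair) when the best prediction is trusted, or
    ("experiment", pair) when the data are too thin to act on it yet.
    """
    feasible = [pair for pair in product(treatments, lifestyles) if pair not in forbidden]
    if not feasible:
        raise ValueError("every combination violates a constraint")
    best = max(feasible, key=lambda pair: estimate(*pair).score)
    if estimate(*best).confidence < min_confidence:
        return ("experiment", best)   # gather more data before treating
    return ("prescribe", best)


# Hypothetical model output for two drugs x two lifestyle options.
TABLE = {
    ("drug_a", "diet_x"): Estimate(0.6, 0.9),
    ("drug_a", "diet_y"): Estimate(0.7, 0.8),
    ("drug_b", "diet_x"): Estimate(0.9, 0.4),
    ("drug_b", "diet_y"): Estimate(0.8, 0.9),
}
```

With no constraints, the top-scoring pair, drug_b with diet_x, has low confidence, so the sketch returns ("experiment", ("drug_b", "diet_x")); forbidding that pair yields ("prescribe", ("drug_b", "diet_y")). Brute-force enumeration is only viable for toy option sets; a realistic search space would need a proper constrained optimizer.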
I’ve also been claiming more recently that you have to give a proof along with whatever recommendation you make. This comes from direct practice with real patients: if we tell their physician, “You should give them this pill,” and don’t tell them why, they go, “No, I’m not going to do that, because I’m legally liable for that. I have to know why I should do this for a patient.”
I think one of the pitfalls of a lot of machine learning systems, like Watson, when they try to make recommendations for patients is that the recommendations don’t come with a proof. It really is just, “This is what the neural net says,” and physicians don’t like that.
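The proof requirement maps naturally onto inference rules. Here is a minimal sketch in Python, assuming propositional Horn clauses and entirely hypothetical atoms (GENE_A, metabolite_M, compound_C); it is not the system the speaker’s team uses. A backward-chaining prover returns a derivation tree along with its answer, so every recommendation arrives with the facts and rules that justify it. A practical version would add variables and unification, as in logic programming languages such as Prolog or miniKanren.

```python
def prove(goal, facts, rules, _active=frozenset()):
    """Backward chaining over propositional Horn clauses.

    facts: dict mapping an atom to its source (e.g. "sequencing report").
    rules: list of (rule_name, head_atom, [body_atoms]).
    Returns a proof tree (atom, justification, [subproofs]) or None.
    """
    if goal in facts:
        return (goal, facts[goal], [])
    if goal in _active:                      # cyclic rules: give up on this branch
        return None
    for name, head, body in rules:
        if head != goal:
            continue
        subproofs = []
        for atom in body:
            sub = prove(atom, facts, rules, _active | {goal})
            if sub is None:
                break                        # this rule fails; try the next one
            subproofs.append(sub)
        else:                                # every premise was proved
            return (goal, name, subproofs)
    return None


def explain(proof, depth=0):
    """Render a proof tree as indented lines a clinician can read."""
    goal, why, subproofs = proof
    lines = ["    " * depth + f"{goal}   <- {why}"]
    for sub in subproofs:
        lines.extend(explain(sub, depth + 1))
    return lines


# Hypothetical knowledge base: placeholder gene, metabolite and compound names.
FACTS = {
    "loss_of_function(GENE_A)": "sequencing report",
    "produces(GENE_A, metabolite_M)": "literature",
    "supplies(compound_C, metabolite_M)": "literature",
}
RULES = [
    ("R1: losing a producer depletes its product",
     "deficient(metabolite_M)",
     ["loss_of_function(GENE_A)", "produces(GENE_A, metabolite_M)"]),
    ("R2: supplementing a depleted metabolite may compensate",
     "candidate(compound_C)",
     ["deficient(metabolite_M)", "supplies(compound_C, metabolite_M)"]),
]

print("\n".join(explain(prove("candidate(compound_C)", FACTS, RULES))))
```

The printed tree has five indented lines tracing the recommendation back through R2 and R1 to the sequencing report and the literature; a goal with no supporting rule, such as candidate(compound_D), returns None rather than an unexplained answer.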
I’m going to tell you how this works in practice. I’ll give you my personal journey through precision medicine (why is it that a computer scientist is giving you this talk anyway?), and I’ll use that to illustrate large parts of this algorithm. Then I’ll generalize it and show you what it looks like for other patients, too.
This really begins for me with my son, Bertrand. This is him back when he was a year and a half old. He’s 12 years old today. But life was rough for him when he started out, because things were wrong and we didn’t know what was going on. We were trapped in a place that a lot of patients call “undiagnosed island,” where you’ve got a child or a family member suffering from some condition and you just can’t figure it out.
In Bertrand’s case, we suspected it was a genetic disorder, and it manifested for him as seizures (hundreds per day, in his case), extreme developmental delay, a movement disorder, and, for some reason, an inability to cry tears. He could cry, with all the emotions of crying, but he would never make liquid tears.
On a day-to-day basis, honestly, that was the worst thing to deal with. If you have chronically dry eyes, if you make no tears, your eyelids act like sandpaper on your cornea. They were scraping off his cornea, and he was going functionally blind. He had eye surgeries to correct the infections in his eyes. It actually reached the point, shortly before his diagnosis, where they said, “Look, we want to sew his eyes shut just to protect what remaining vision he has.” On a day-to-day basis, that’s what was really causing problems for us.
That was our life for four years, 48 months trapped on that undiagnosed island. We knew this whole time, and we’d speculated based on everything we saw, that there had to be some kind of bug in Bertrand’s code. There had to be a genetic mutation driving this disease. And the question was, where is it? And how do you find it?
What I’ll do now is give you a computer scientist’s introduction to the relevant parts of biology. For Bertrand, we had to get the source code. We had to look into his genome and try to figure out exactly what was happening. You can think of the human genome as a very long string. It literally is a string, about 6.5 billion characters long, and with a genetic disease, you’re looking for one or two mutations in that string. The challenge is to find that typo in that big-ass string.
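At its simplest, finding the typo means comparing the patient’s string against a reference. A minimal sketch, assuming both sequences are already aligned and the same length, so only single-letter substitutions are detected; real pipelines first align millions of short sequencing reads to the reference and also call insertions and deletions.

```python
def find_substitutions(reference, patient):
    """Return (position, reference_base, patient_base) for every mismatch.

    Assumes the two sequences are aligned and of equal length, so this only
    finds single-letter substitutions, not insertions or deletions.
    """
    if len(reference) != len(patient):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, ref, alt)
            for i, (ref, alt) in enumerate(zip(reference, patient))
            if ref != alt]


print(find_substitutions("ATGGCCTGA", "ATGGACTGA"))  # [(4, 'C', 'A')]
```

The scan is linear in the length of the string; in practice the hard part is not the comparison but deciding which of the many differences every person carries actually matters.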
Yes, DNA is a string, and there are only four letters in it. This is the alphabet: A, T, C, and G. These are encoded as nucleotides within DNA. Subunits of DNA called genes get transcribed into RNA and then translated into proteins, three nucleotides at a time: every three nucleotides specify one amino acid. You get a chain of amino acids formed from a chain of nucleotides. This chain folds, according to its biochemical properties, into the resulting protein, and then there might be some modifications that happen after the protein is translated. These are called post-translational modifications, like glycosylation or phosphorylation. But more or less, that’s how you get a protein.
That’s a lot of biology right there. That’s the central dogma of molecular biology. The key thing to remember is that the genome has a syntax, and more than that, DNA is really just an instruction set. Each three-letter sequence within a gene is an instruction. If you have this little micro-gene right here, you can break it down, you can parse it, and you can make sense of what it’s telling you to do.
You always begin your program with ATG: insert a methionine. This is the start of building a protein. The next instruction is GCC, and we know what GCC does: it compiles C. But in biology, GCC really means insert an alanine. The last instruction is TGA. There are a few codons that do this; they mean stop. Stop construction of this protein.
So this is what this little program actually means: begin construction, insert a methionine, insert an alanine, done. You’ll get this little two-amino-acid peptide, and it won’t really fold much, because there are only two amino acids. But they’ll orient themselves in space, and if you have more amino acids, they start folding according to the properties of their side chains. That’s how you get proteins.
Mutations are modifications in these strings. You can change GCC to GAC and this changes the program. Instead of inserting an alanine, now you’re inserting an aspartic acid.
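The instruction-set view can be made literal. A minimal sketch, assuming a coding-strand DNA string that starts at the first codon (it skips transcription to RNA, splicing, and everything else a cell actually does), using the standard genetic code with one-letter amino acid codes: M is methionine, A is alanine, D is aspartic acid, and * marks a stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order ("*" = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding-strand DNA string codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":        # stop: end of the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTGA"))  # MA: methionine, alanine, stop
print(translate("ATGGACTGA"))  # MD: GCC -> GAC swaps alanine for aspartic acid
```

The compact table enumerates all 64 codons in TCAG order, a common way to write the standard code; the second call shows the single-letter change from GCC to GAC turning the alanine into an aspartic acid.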
A lot of mutations are actually benign. We all have tons of mutations that make us different from the standard reference genome. Some will destroy function, some will actually increase function, which may be good or might be bad, and some will change function entirely. They can make it toxic, they might actually wipe it out. It just depends on the particular mutation.
This is the team that ultimately helped us look into Bertrand’s genome. A new technology had come out that made this practical almost overnight, and it was done on a research basis. It’s called exome sequencing. With exome sequencing, you look at the 2% of the human genome that actually encodes proteins.
It turns out only 2% of the genome encodes all the proteins you have, and almost all the changes that drive human disease occur in that region, at least for major human diseases. If you want to change one bit and get the most change to a human being, that’s the region you change it in. We tend to find genetic disorders clustered in this 2% of the genome called the exome, and it suddenly became possible to sequence only the exome. You got a massive cost reduction in genomic sequencing overnight.
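Exome sequencing restricts the data at capture time, before sequencing; the same restriction can also be expressed in software as a filter that keeps only differences falling inside protein-coding regions. A minimal sketch, assuming exon coordinates are sorted, non-overlapping, half-open intervals on a single chromosome; the coordinates below are made up.

```python
from bisect import bisect_right


def exonic_variants(variant_positions, exons):
    """Keep only variant positions that fall inside an exon.

    exons: sorted, non-overlapping half-open intervals (start, end).
    Each lookup is a binary search, so this scales to large variant lists.
    """
    starts = [start for start, _ in exons]
    kept = []
    for pos in variant_positions:
        i = bisect_right(starts, pos) - 1      # last exon starting at or before pos
        if i >= 0 and pos < exons[i][1]:       # ...and pos lies before its end
            kept.append(pos)
    return kept


EXONS = [(100, 200), (500, 650), (900, 1000)]  # hypothetical coordinates
print(exonic_variants([50, 150, 200, 499, 500, 700, 999], EXONS))  # [150, 500, 999]
```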
When we looked inside Bertrand’s exome, we found that he had two mutations in a gene called NGLY1 that destroyed the function of this gene entirely. These are basically just syntax errors in the genome. In one, a stop instruction got inserted in the middle of the gene: you get half the protein out, and then it just says eject. You get a broken protein. The other one deleted a character, which shifted all the instructions after it off by one character. Everything after the shift was just garbage.
It still had a meaning in terms of amino acids, but they were nothing like the original amino acids that were there. These really are just syntax errors, and you can spot them like syntax errors. They are the easiest damaging mutations to find in a human genome. This thing just stood out in bright red: something bad probably happened here, because almost certainly this protein is gone. And it was.
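Both of Bertrand’s kinds of mutation, the inserted stop and the deleted character, can be told apart from an ordinary amino acid swap mechanically. A minimal sketch on a hypothetical toy sequence rather than the real NGLY1 gene; it builds the standard genetic code table so it stands alone, and the classifier is deliberately coarse (for example, it lumps in-frame insertions and deletions in with missense changes).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate codon by codon from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def classify(ref_dna, alt_dna):
    """Coarse effect of a mutation on the encoded protein."""
    if (len(alt_dna) - len(ref_dna)) % 3 != 0:
        return "frameshift"        # reading frame shifted: downstream codons are garbage
    ref, alt = translate(ref_dna), translate(alt_dna)
    if alt == ref:
        return "synonymous"
    if len(alt) < len(ref) and ref.startswith(alt):
        return "nonsense"          # premature stop: truncated protein
    return "missense"


REF = "ATGGCCTGGCACGAATAA"                     # toy gene: MAWHE, then stop
print(translate(REF))                          # MAWHE
print(classify(REF, "ATGGCCTGACACGAATAA"))     # nonsense (TGG -> TGA)
print(classify(REF, "ATGGCTGGCACGAATAA"))      # frameshift (one C deleted)
print(translate("ATGGCTGGCACGAATAA"))          # MAGTN
```

After the one-character deletion, every downstream codon is read out of frame, so the tail changes from WHE to GTN and the original stop codon is never reached.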
And then they said, “Oh, by the way, we think we found the cause and if this is it, he’s the first patient to ever have this disease. No one’s ever seen this disease before. He’s the only one we know of in the world. He really is an N of one.” That was about seven years ago now sitting in a lab at Duke getting that answer.
Of course, I asked, “How do you know that these mutations drive the disease?” And this became a process, which we now repeat for many patients, called variant interpretation, where you say, “This mutation has happened. How do you know that it really drives what we’re seeing in the patient?” We took essentially all the aggregated medical records we had for Bertrand, all these data points, and we tried to map them back onto the function of this missing gene to see whether it explained what we were seeing in Bertrand.
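One simple way to ask whether a gene’s function explains what you see in a patient is to score the overlap between the patient’s features and the features associated with each candidate gene. A minimal sketch using Jaccard similarity, with the patient features taken from this talk and entirely hypothetical gene names and feature sets; practical tools typically use a structured vocabulary such as the Human Phenotype Ontology and ontology-aware similarity measures rather than raw set overlap.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(patient, gene_features):
    """Order candidate genes by feature overlap with the patient, best first."""
    return sorted(gene_features,
                  key=lambda gene: jaccard(patient, gene_features[gene]),
                  reverse=True)


PATIENT = {"seizures", "developmental delay", "movement disorder", "no tears"}
GENE_FEATURES = {  # hypothetical genes and feature sets
    "GENE_X": {"seizures", "developmental delay", "movement disorder",
               "no tears", "liver dysfunction"},
    "GENE_Y": {"seizures", "hearing loss"},
    "GENE_Z": {"kidney cysts"},
}
print(rank_candidates(PATIENT, GENE_FEATURES))  # ['GENE_X', 'GENE_Y', 'GENE_Z']
```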
What this particular gene, NGLY1, does (and this won’t mean anything to you; I guarantee it didn’t mean anything to me all those years ago) is encode an enzyme that is responsible for deglycosylating misfolded N-linked glycoproteins after they have been retrotranslocated from the endoplasmic reticulum into the cytosol. Yeah, that’s the sentence I heard seven years ago, and my functional-programming brain went, “What? What do any of those words mean?” I understand it now. It’s become part of my life’s work to understand it.
But in a nutshell, Bertrand is missing a recycling enzyme. There’s a certain kind of cellular waste called misfolded glycoproteins, and the enzyme that recycles it was suddenly gone. If that enzyme goes away, the trash piles up. It’s really as simple as that.
That’s really what it was. They said, “Yeah, we’re pretty sure this is the cause of the disease. Everything seems to line up.” And of course, they said, “But there’s nothing you can do about this. He’s the first patient. There’s no drug companies working on this. We can’t even give you a prognosis. No one knows what this disease will do to him at this point.”
But I just disagreed. Maybe it’s my nature as a scientist to refuse to believe in nonactionability, but I said, “Everything fundamentally is actionable. It may not be clinically actionable. But there’s always something you can do.” That something is always going to be science. You can always run experiments and find out more and maybe that will lead you to some actionable insights for the patient.
In precision medicine, for a lot of the cases we deal with, we don’t have a drug right away but we can always prescribe an experiment. We do focus on that pretty aggressively.
That’s four years into the odyssey. We finally get an answer: he has NGLY1 deficiency. I’ll start to abbreviate some of the stuff that happened next.
I started turning a lot of my [inaudible 00:16:03]. I learned a lot of genetics at that point, and I dove into glycobiology to try to understand what was really happening inside Bertrand. I also knew that there was no way one family was going to take on an entire disease; we had to find other patients who had it. So I wrote a blog post. This is the blog post I wrote, and it was really designed to do two things: it had to go viral, and it had to rank very highly in Google search results. I did some SEO on it.
What I wanted to do with it was bring other patients to us. I wanted them to find it when they typed in things like “lack of tears.” I wanted them to land on this page and figure out that they had the same disease. The title, the picture of Liam Neeson: that’s all the clickbaity stuff you need to make it go viral, because the internet is just so sadly predictable in certain ways. And it actually worked. It really did go viral. I sometimes wonder whether a cat picture right there would have done just as well. It probably would have. Honestly, that’s just how these things work.
That post spawned a number of things, including an article in The New Yorker, and it led to me finding patients all over the world. We’ve found about 70 patients with this disease across the world over the last seven years. You can make principled estimates of how many there are in total based on the frequency of the pathogenic alleles in population databases. I estimate there are about 500 patients with this disease somewhere on the planet.
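The principled estimate works roughly like this for a recessive disorder like this one, where a patient needs two broken copies of the gene: under Hardy-Weinberg assumptions (random mating, no selection), a person is affected with probability q squared, where q is the combined frequency of all pathogenic alleles of the gene. Summing the alleles counts patients like Bertrand, who carry two different mutations, as well as those with two copies of the same one. A minimal sketch; the allele frequencies you would plug in come from population databases, and the inputs below are hypothetical.

```python
import math


def expected_patients(pathogenic_allele_freqs, population):
    """Hardy-Weinberg estimate of affected people for a recessive disorder.

    q is the combined frequency of all pathogenic alleles of the gene; an
    individual is affected with probability q**2 (homozygous or compound
    heterozygous), assuming random mating and full penetrance.
    """
    q = sum(pathogenic_allele_freqs)
    return population * q ** 2


def implied_allele_frequency(patients, population):
    """Invert the estimate: the combined q that would yield this many patients."""
    return math.sqrt(patients / population)


# Hypothetical inputs: two alleles at 1 in 10,000 each, one billion people.
print(round(expected_patients([1e-4, 1e-4], 1_000_000_000)))  # 40
```

Because the count scales with q squared, halving the combined allele frequency cuts the expected number of patients by a factor of four, which is why rare recessive disorders can have so few patients worldwide.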
Things started snowballing from here. The New Yorker article led to my involvement in something called the Undiagnosed Diseases Network, a brand-new national NIH project whose goal is to scale up what we had just done for Bertrand, but for literally the entire country. It’s a project that is still going, and I’m now one of the co-PIs on it. Finding all the patients also led to interest from NIH, who said, “We want to do a study on all these patients.” So all the patients with this disease go to NIH once a year, for one week at a time, for a really in-depth study of their biology.
Now, if you start to combine everything that we learned from the patients and from the biology, we’re able to start making predictions. At the same time, my engagement with the Undiagnosed Diseases Network led to a faculty position at the Harvard Medical School. I still haven’t told them I have no formal medical training beyond a C in sixth grade biology. But for some reason, that doesn’t bother them.
Now, the therapeutic predictions are where things started to get really interesting. This was two years after diagnosis. I was able to make a principled calculation that something might be beneficial for Bertrand, and the way I did it was to say, “In the absence of this enzyme, some things are going to pile up. We know what those are: these misfolded glycoproteins. But is there anything Bertrand is missing as a consequence of not being able to recycle these things?” There were some answers there, too.
There’s even a computational way to do this, but you can do it with pencil and paper, too, which is what I did at the time. I said, “It looks like the sugars attached to these glycoproteins aren’t going to get properly recycled. What if this recycling process isn’t just about getting rid of trash? What if it’s actually an essential source of these sugars in this particular part of the cell?” It turns out it is.
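The pencil-and-paper argument is a reachability question over a metabolic network: remove an enzyme, then ask which metabolites can no longer be produced and which substrates have nowhere to go. A minimal sketch over a deliberately tiny, simplified network; the reaction list and the ENZYME_B step are hypothetical placeholders, not real glycobiology, and boolean reachability only says whether something can be made at all, not how much.

```python
def reachable(seeds, reactions, missing=frozenset()):
    """Fixpoint: every metabolite producible from seeds without the missing enzymes."""
    have = set(seeds)
    changed = True
    while changed:
        changed = False
        for enzyme, substrates, products in reactions:
            if enzyme in missing or not set(substrates) <= have:
                continue
            new = set(products) - have
            if new:
                have |= new
                changed = True
    return have


def knockout_effects(seeds, reactions, enzyme):
    """Return (depleted, piled_up) when `enzyme` is lost."""
    normal = reachable(seeds, reactions)
    mutant = reachable(seeds, reactions, {enzyme})
    depleted = normal - mutant                        # no longer producible
    piled_up = {s for e, subs, _ in reactions if e == enzyme
                for s in subs if s in mutant}         # substrates left unprocessed
    return depleted, piled_up


# Hypothetical, highly simplified network: (enzyme, substrates, products).
REACTIONS = [
    ("NGLY1", ["misfolded_glycoprotein"], ["free_glycan", "deglycosylated_protein"]),
    ("ENZYME_B", ["free_glycan"], ["N-acetylglucosamine"]),
]
SEEDS = {"misfolded_glycoprotein"}
print(knockout_effects(SEEDS, REACTIONS, "NGLY1"))
```

With NGLY1 removed, the sketch reports the misfolded glycoprotein as piling up and N-acetylglucosamine among the depleted metabolites, the two halves of the argument in the text.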
It turns out you are deficient in something called N-acetylglucosamine if you don’t have NGLY1. Your cells have about 90% less of it than they otherwise would. This is actually a pretty important metabolite, with a lot of functions inside the cell, and Bertrand was missing it. I did what my computer scientist self knew how to do at the time, which was to use Google. The name meant nothing to me, you have to understand. It was like a symbol, just a random symbol, a variable name. It was like “foo.”
I typed it in and realized you can buy it on Amazon. The logic was: Bertrand is missing this, and you can buy it on Amazon. So I bought it on Amazon. It showed up two days later, thanks to Prime. And then there was a Star Trek-level ethical problem: do I give it to him?
The first thing I had to do was just see whether it’s somewhat safe. I ate an entire bag of it in one sitting. I woke up the next morning alive and said, “FDA phase 1 safety testing is complete.”
But I was still thinking, “Wow, do I give it to Bertrand?” I didn’t know. I did give it to him eventually, because of his occasional severe hospitalizations. I remember a moment sitting by his hospital bedside thinking, “My God, if this thing was sitting on my shelf and I never found out what it could have done for him, and this is the end for him in the hospital, I don’t know how I could forgive myself as a parent.” So once he was out and stable and healthy again, we did a little N-of-1 trial for Bertrand.
I didn’t know what to expect, because there are a lot of components to this disease. It’s hard to predict which downstream mechanism this might actually affect. What it did for Bertrand: after about three days on this compound, I was making him breakfast, went up to feed him, looked over into his bed, and he was crying. It wasn’t just emotional crying. He was actually making tears for the very first time in his life. I was just shocked. I had never seen this before. And remember, his eyes were in terrible shape from this chronic lack of tears.
Suddenly, there were tears rolling down his cheeks, and I went, “Oh my God. This is different.” The only thing I had changed recently was that supplement. I thought, “Maybe that’s what did it.” It turns out it was. But at that point, I did what I think any normal parent would do when they see their child cry for the first time: I collected his tears, packed them on dry ice, and shipped them to a lab in California for analysis. I really did that.
In fact, I collected his first 80 tears, I think, and sent them all away.
Those first tears ended up swelling into an ocean of science for the disorder as a whole. We learned so much about the biology of this disease from making him cry; it was really astonishing. You’d think that’s a good thing. But when you give this talk, or something like it, to biologists, they go, “Oh, that’s great. It’s nice you can help people. But we’re biologists and we have different concerns. We want to know if it works in flies.” So let’s find out.
We genetically engineered flies to have the same disease. It turns out it’s pretty bad for flies: about 80% of them die without this gene. Then we fed them this sugar, because that’s what it ultimately is, a weird sugar. When they’re raised and maintained on the sugar, their survival rate shoots up to about 80 to 90%. It makes a big difference for flies. I don’t know whether flies cry or have tears. But they’re alive. That’s a big difference for them.
I was inspired at that point: “I’ve been able to help Bertrand somewhat. Can I start to help others, and how much further can I go?” Around that time, I co-founded a company that does drug screens specifically for ion-channel-driven epilepsies, where we try to find an existing drug that works for them. That ended up being so successful that after two and a half years, we were acquired by Q-State Biosciences. That was last year. And then I somehow ended up getting a grant to work on novel therapeutics development for NGLY1 deficiency. These were my collaborators at the time, back at the University of Utah.
I’ll short-circuit what we did here. We played around with planarian worms. A lot of them died in the course of this research, I’m sorry to say. But we learned a lot about NGLY1 biology yet again. In particular, we found that if you take a second gene out of worms that are already missing NGLY1, they survive, and you can sort of rescue the major features of the disease in these worms. That gave us a drug target.
By realizing that it was better to have two genes missing than one, we could say, “Well, okay, how do we chemically disable that second gene in Bertrand?” That meant we had to find an inhibitor. Even if you don’t know anything about drugs, you’ll see on the labels of pill bottles that things are inhibitors of this or that. A lot of drugs are inhibitors: they inhibit some mechanism inside of you. We found a mechanism that we wanted to inhibit inside of Bertrand, and it’s a gene called ENGase.
We turned to computational screening. We did what are called docking simulations. The shortest explanation is that it’s exactly what video game programmers do with physics. It turns out that molecular dynamics, if you want to precisely simulate a protein physically interacting with some other molecule in three dimensions, is incredibly expensive, in the same way that simulating physics in a video game is incredibly expensive. Video game programmers make all sorts of shortcuts and approximations and cheats and hacks. That’s what drug designers do as well.
Let’s assume the protein is a rigid structure, first of all. Now, things start to get a lot easier. You just ask, “Is there any small molecule out there that is inverse in shape and charge to the catalytic domain of this protein?” That’s a much more tractable problem to solve. In fact, we were able to screen 200,000 different structures this way over the course of a few days. We found 70 compounds that looked like they were inverse in shape and charge to that catalytic domain, and fourteen of them happened to already be FDA approved.
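The rigid-receptor shortcut plus the screen-then-filter funnel can be illustrated on a toy 2D lattice. A minimal sketch, assuming made-up pockets, compounds, charges, and thresholds; it also ignores rotations and flexibility. Real docking programs work with 3D structures, sample orientations and conformations, and use far more elaborate scoring functions. Here the score rewards each atom buried in the cavity and each pairing of opposite charges, any pose that clashes with the rigid protein is rejected, and the last step keeps only hits that are already approved drugs.

```python
from itertools import product


def best_pose(pocket, ligand, search=range(-3, 4)):
    """Best score over rigid translations of ligand into pocket, or None if it never fits.

    pocket: cavity cell (x, y) -> local charge; any other cell is protein wall.
    ligand: atom offset (x, y) -> partial charge. Rotations are ignored.
    """
    best = None
    for dx, dy in product(search, repeat=2):
        placed = {(x + dx, y + dy): q for (x, y), q in ligand.items()}
        if not placed.keys() <= pocket.keys():
            continue                                   # clash with the rigid protein
        shape = len(placed)                            # atoms buried in the cavity
        charge = sum(-pocket[cell] * q for cell, q in placed.items())  # opposites attract
        score = shape + charge
        if best is None or score > best:
            best = score
    return best


def screen(pocket, library, approved, threshold):
    """Dock every compound, keep hits at or above threshold, then flag approved drugs."""
    hits = {}
    for name, ligand in library.items():
        score = best_pose(pocket, ligand)
        if score is not None and score >= threshold:
            hits[name] = score
    return hits, sorted(name for name in hits if name in approved)


# Hypothetical L-shaped pocket and compound library.
POCKET = {(0, 0): 1, (1, 0): -1, (0, 1): 0}
LIBRARY = {
    "cmp_fit": {(0, 0): -1, (1, 0): 1},          # inverse in shape and charge
    "cmp_same_charge": {(0, 0): 1, (1, 0): -1},  # right shape, wrong charges
    "cmp_too_big": {(0, 0): 0, (1, 0): 0, (2, 0): 0},
    "cmp_small": {(0, 0): 0},
}
APPROVED = {"cmp_fit", "cmp_small"}
print(screen(POCKET, LIBRARY, APPROVED, threshold=3))  # ({'cmp_fit': 4}, ['cmp_fit'])
```

The correctly shaped compound with the wrong charges scores zero and the oversized one never fits, which is the rigid-body version of screening by inverse shape and charge; the final intersection with the approved set is the repurposing step.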
This gets to a major feature of precision medicine which is trying to leverage existing approved drugs and really the side effects of those drugs to do something you want.
When we took them to the lab, we tested them using mass spectrometry, and one of them actually worked: Prevacid. Prevacid is for acid reflux; I’m sure some people in this room take it. It’s a widely used drug. Its designation is a proton pump inhibitor, and it is a proton pump inhibitor; that’s what it does. However, this entire time, it had a hidden side effect that no one knew about: it’s also an ENGase inhibitor.
What Bertrand needs is to inhibit ENGase. Prevacid looks like a promising therapy for him. He’s been on that for a few years. We’ve seen significant gains in that time period for him in terms of his development and communication capacity.
And then things got sort of exponentially weirder for me. I was giving a lecture at Stanford’s med school on a Monday or Tuesday, I think it was, and I got a random email saying, “Can you come to The White House by Friday?” I thought, “Yeah, I guess. Sure. Why not?” I showed up at The White House on Friday. As I was landing the night before, they sent me these really cryptic instructions: “Just let yourself in and wander around. Do whatever you want,” and then, “Just go to the portrait of JFK at 10:30 and someone will meet you there.” All right, fine. So I did that.
I showed up at The White House and went through security. They said, “Put your phones away,” so I hid mine in my coat pocket and kept taking pictures. That’s the basement. That’s the Great Hall. I just gave myself a tour. That’s the basement again; it looked like there was a library down there. Literally, I just had an hour to do whatever I wanted. It was weird.
That’s the china. There, I found it: the portrait of JFK. I was waiting there at 10:30, and sure enough, somebody opened another door and said, “Come right this way.” I was in the suite of what I now know are the staterooms. I just wandered around there for a while, again giving myself a tour and taking pictures.
And then at some point, one of the guards said, “You can use the furniture. You don’t have to stand this entire time.” If I can use the furniture, I will use the furniture. I’m going to sit on every piece of furniture in this place. So I did. Sit on all the things.
Don’t let computer scientists in The White House. We’re very weird people. We will test the limits of all the rules you give us.
Suddenly, they’re like, “Give us your cell phone.”
Honestly, it wasn’t a problem, because then this guy [Barack Obama] walked in. He said, “I read your story in The New Yorker, and I was curious whether you’d be willing to help out on an initiative I want to set up, called the Precision Medicine Initiative. What I’d really like to do is scale up what you’ve done for Bertrand, but for everybody.” And I said, “Okay, I’ll do it.” What else do you say? Honestly.
And that, for me, was the start of a three-year engagement with The White House. One of the very first things I did was put together a white paper describing the architecture of this forthcoming initiative to build a scientific foundation for all of precision medicine. Today, that program is known as the All of Us initiative, and it’s collecting genomes and medical records for a million Americans so that we can build, essentially, a correlation between mutations and their impact on health.
That’s really what we’re trying to do, because it’s really hard to figure out what mutations mean if you start at the molecular level and try to extrapolate what that means, on the other side, for a human being.
You can think of this as building the Rosetta Stone of the human genome, where on one side we have genomes and mutations, and on the other side we have medical records. Again, we can draw correlations between the two. I spent a lot of time with the Million Veteran Program, which had already collected half a million biospecimens at that point and had half a million genotypes, if not full genomes. There was a lot of data there to play with.
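Each entry in that Rosetta Stone is, at its simplest, an association between a variant and a health outcome. A minimal sketch of one such statistic, an odds ratio with a 95% confidence interval (Woolf’s log method) from a 2x2 table of carriers versus non-carriers, with and without a condition; the counts are hypothetical, and a real biobank analysis would also adjust for ancestry, age, sex, and the enormous number of tests being run.

```python
import math


def odds_ratio(carriers_with, carriers_without, noncarriers_with, noncarriers_without):
    """Odds ratio and 95% CI (Woolf log method) for a 2x2 association table.

    If any cell is zero, 0.5 is added to every cell (Haldane-Anscombe correction).
    """
    cells = [carriers_with, carriers_without, noncarriers_with, noncarriers_without]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)      # standard error of log(OR)
    low, high = (math.exp(math.log(ratio) + z * se) for z in (-1.96, 1.96))
    return ratio, low, high


# Hypothetical counts: 30 of 100 carriers vs 100 of 1,000 non-carriers have the condition.
ratio, low, high = odds_ratio(30, 70, 100, 900)
print(f"OR = {ratio:.2f}, 95% CI {low:.2f} to {high:.2f}")  # OR is about 3.86
```

An interval that excludes 1 suggests carriers have different odds of the condition than non-carriers; with a million participants and many variants, thousands of such tests are run, so significance thresholds must be corrected accordingly.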
I ended up launching a small program within the Precision Medicine Initiative called PEPMA, the Patient-Empowered Precision Medicine Alliance, and its whole goal was to repeat what we’d done for Bertrand several times over for other disorders. We just wanted to quickly identify treatments, and we did it for about five.
The whole point of this was we wanted to have some things we could point to say, “It’s not just Bertrand that’s done this. We’ve been able to do it again.” And so we did it for five different ion channel driven epilepsies plus one ubiquitylation disorder. We did find treatments in less than 12 months.
When I went back for one of my final briefings with Obama, I say, “Look at these, we did it again. I think that your vision for precision medicine is quite real. I think we’re going to pull this off.”
You may recall. There was an election and I actually stayed at The White House. I got an invitation from the transition team led by Pence to stay on because they didn’t have enough scientists. So I did.
It was interesting. I did that for about a year. And that was a very interesting year. I’ll put it that way. After a year, I quit. I guess it got weird enough.
Legally, if you look at the documents I signed on the way out, I think all I can tell you about that time period, with normal blood alcohol levels, is that it was interesting on a day-to-day basis. I infer from news reports that it’s more or less the same these days.
Then I got an unbelievable offer from UAB. They said, “We want to build a whole institute around precision medicine. Would you be willing to come down here and build it from scratch and achieve…” The White House had been trying to lay this foundation at a national level. Here, suddenly, was the opportunity to use that foundation and take it all the way into the clinic in a scalable way. That was the goal: to scale up everything, but on the ground, in the health system at UAB, for those patients.
How do you do this in practice? How do you do precision medicine over and over again?
A lot of it is drug repurposing. Developing a drug from scratch for a single patient is not even close to tractable; it costs $2.6 billion on average to make a drug. But it’s very easy to see whether something that already exists actually works, and that’s a lot of what you do here. You take the patient and extract from them a model of their disease, which could even be their own cells, and you find a way to do what’s called phenotyping: you do scientific experiments, or assays, to observe the disease in that model.
Once you have a signature, a phenotype for this disease, you can start testing drugs on it to see if anything out there rescues it, so that the model goes back to what looks like a healthy state. The hope is that whatever worked in the model also works in a human being, and sometimes it does. Sometimes it’s not potent enough or safe enough, and you have to do medicinal chemistry. But there are lots of ways to deal with going from what works in a model to what works in a human being.
We’ve done this for other things, like SCN8A epilepsy, where we screened 2,000 drugs against a cellular model of this epilepsy. I’m not going to go into the biology here, but if you want to know more, you can read about it; we’ve since published what we did.
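One generic way to decide whether a drug “rescues” a phenotype in a screen like this is to ask what fraction of the gap between the disease model and healthy controls it closes. The sketch below assumes each well is summarized by a single number (say, a firing rate) and uses a hypothetical hit threshold of 0.5; it is not the analysis used in the SCN8A screen, and the drug names and values are made up.

```python
"""Hypothetical sketch: scoring a drug screen by phenotypic rescue.

Assumption: each well is summarised by one scalar phenotype (for example,
a firing rate). This is a generic "fraction of the gap closed" score, not
the specific analysis used in any published screen.
"""
from typing import Dict, List


def rescue_fraction(disease: float, healthy: float, treated: float) -> float:
    """0.0 = no change from the disease model, 1.0 = fully back to healthy."""
    gap = healthy - disease
    if gap == 0:
        raise ValueError("disease and healthy controls are indistinguishable")
    return (treated - disease) / gap


def screen_hits(disease: float, healthy: float,
                treated_by_drug: Dict[str, float],
                threshold: float = 0.5) -> List[str]:
    """Drugs whose rescue fraction meets the (hypothetical) threshold, best first."""
    scores = {drug: rescue_fraction(disease, healthy, value)
              for drug, value in treated_by_drug.items()}
    hits = [drug for drug, score in scores.items() if score >= threshold]
    return sorted(hits, key=lambda drug: scores[drug], reverse=True)


if __name__ == "__main__":
    # Made-up numbers: the disease model reads 10.0, healthy cells read 4.0.
    wells = {"drug_1": 9.5, "drug_2": 5.0, "drug_3": 4.2}
    print(screen_hits(disease=10.0, healthy=4.0, treated_by_drug=wells))
    # prints ['drug_3', 'drug_2']
```

Normalizing by the disease-to-healthy gap makes scores comparable across assays whose raw units differ, and it works whether the disease phenotype sits above or below the healthy value.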
One thing I will show you, though, is the cutting edge of this kind of screening. This is the company that bought my company and they do just like the most ridiculous stuff ever.
They modify neurons by putting two genes into them. One of them is sensitive to blue light and excites the neuron. The other one emits red light when the neuron is excited. If you raise a population of neurons from a patient, which you can derive from their skin, and put these genes in, you can grow these neural networks up, and then you can poke this externalized brain for that patient and watch what happens. What you’re watching now is what happens when you manually excite an individual neuron of your choosing. You can watch the signal propagate to neighboring neurons with just extraordinary resolution.
This is something we can now do on a patient-by-patient basis. It’s not cheap yet, but we can do it. We can use it to build really high-resolution models of almost any neurological disorder that has a genetic component. So far, this works very well on ALS, Alzheimer’s, and a lot of epilepsies. And then we can start dropping drugs on it and see if anything rescues the phenotype that we observe.
Another way to do it, and this is the way that gets back to functional programming, is to use logic programming. Some of you probably know Will Byrd. He’s the developer of mediKanren, and he’s now on the faculty at UAB with me. We’ve been using logic programming to do drug repurposing.
And so really, the way this happens is that we’re just applying good old-fashioned logic to all the biomedical knowledge we can get our hands on. The shorthand way I describe it is: take old-school logic, combine it with high-speed automated reasoning to achieve superhuman-like deduction, and add clinical insight. That’s what we’re trying to achieve with mediKanren. You can ask very low-level questions, like “What’s an inhibitor for this particular mechanism?” Or you can ask very high-level questions, like “What might be a treatment for this disease?”
For example, if you ask it about, say, overactive bladder, it can come up with 145 different suggestions for what to try. If you ask it about a disease that has no known treatments at all, like Fanconi’s anemia, it actually generated 10 of them, the top of which was xylitol, the kind of stuff you find in chewing gum.
The reason it said xylitol looked like a compelling treatment for this disease is that there’s a logical explanation. It says xylitol inhibits alcohol dehydrogenase. Alcohol dehydrogenase produces reactive aldehydes. Reactive aldehydes cause double-stranded breaks in DNA. Fanconi’s anemia patients cannot repair double-stranded breaks. There’s a chain all the way through that says, “This is why xylitol might work.”
I’m not even kidding. I’m talking to people at Wrigley’s research about trials of chewing gum to prevent head and neck cancer in Fanconi’s anemia.
How do we do this? How do we build this tool?
We did a lot of reading. Obviously not us. It’s all natural language processing. One of the first data sets to come in was all the abstracts published in medicine, which is about 30 million of them. We synthesized this into a knowledge graph so that the nodes are things like genes and drugs and symptoms and diseases and then you have relationships between them.
For example, there’s an edge in there now that says Prevacid inhibits ENGase, something we learned as a consequence of my research. Now you just have a big old knowledge graph, and you can query it using logically meaningful relationships, so you can ask, “What is the biological significance of this concept relative to that concept?” You can prune paths that don’t have any real biological meaning.
The core relations inside of mediKanren are things like “this drug inhibits this enzyme,” “this disease manifests with this symptom,” and “this gene or this drug increases this other gene.” I think there were about 16 of them that we initially started off with; I think we now have 80 different relationships between concepts.
From this, you can form an artificial relation. May_treat says, “This drug may treat this disease if this gene causes this disease, and this disease is known to increase the expression of this gene, and this drug decreases that gene.” This is a very, very crude example of a synthetic predicate.
Of course, you can represent this as an inference rule. A lot of what we do is expressed as inference rules, the kind you might play around with in PL.
And then, in general, we have strategies for finding novel relationships. For example, if blue increases red and red decreases green, you can infer that by increasing a decreaser, blue will decrease green. Even if there’s no known way to go after green directly, if we can pass through another gene on the way, sometimes we can find an indirect way to hit that target. This turns out to be tremendously useful for finding new uses of existing drugs. And again, we can represent that as inference rules as well.
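Written in the inference-rule notation common in PL papers, the May_treat relation described above might look roughly like this. The variables x (a drug), g (a gene), and d (a disease) and the relation names are illustrative labels for the spoken wording, not mediKanren’s actual predicate names.

```latex
\frac{\mathit{causes}(g, d) \qquad \mathit{increases}(d, g) \qquad \mathit{decreases}(x, g)}
     {\mathit{may\_treat}(x, d)}
```

Read top to bottom: if every premise above the line holds in the knowledge graph, the conclusion below the line may be derived.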
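The “increasing a decreaser” strategy amounts to composing signed edges: the sign of a chain is the product of the signs along it. The sketch below assumes a toy triple store of (subject, predicate, object) edges with only two signed predicates; the edge list and names are hypothetical and this is not mediKanren’s schema or query engine.

```python
"""Hypothetical sketch: inferring indirect effects in a signed knowledge graph.

The edges and predicate names are illustrative; they are not mediKanren's
schema. Each edge is (subject, predicate, object) and each predicate has a
sign, so the sign of a two-step chain is the product of its two signs.
"""
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]
SIGN = {"increases": +1, "decreases": -1}


def direct_effects(edges: List[Edge]) -> Dict[Tuple[str, str], int]:
    """Map (subject, object) to the sign of the known direct edge."""
    return {(s, o): SIGN[p] for s, p, o in edges}


def two_hop_effects(edges: List[Edge]) -> Dict[Tuple[str, str, str], int]:
    """Infer (source, target, via) effects through one intermediate node.

    Increasing a decreaser yields a decrease: (+1) * (-1) = -1.
    Pairs that already have a direct edge are skipped, since the point is
    to find new relationships.
    """
    direct = direct_effects(edges)
    inferred: Dict[Tuple[str, str, str], int] = {}
    for (a, b), first in direct.items():
        for (c, d), second in direct.items():
            if b == c and a != d and (a, d) not in direct:
                inferred[(a, d, b)] = first * second
    return inferred


def describe(sign: int) -> str:
    return "increases" if sign > 0 else "decreases"


if __name__ == "__main__":
    edges = [("blue", "increases", "red"), ("red", "decreases", "green")]
    for (src, dst, via), sign in two_hop_effects(edges).items():
        print(f"{src} {describe(sign)} {dst} (via {via})")
    # prints: blue decreases green (via red)
```

Recording the intermediate node in each result keeps the reasoning step available, which is what later lets a chain be explained rather than just asserted.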
Why do proofs really matter? They’re helpful for actually finding these things, but it turns out proofs themselves matter because physicians don’t believe you unless you give them one. Now, they’re not looking for a proof tree. They want something rendered in English text that they can comprehend, but what they really want is still a proof.
We’ve embodied this at UAB in the form of a research consultation service. Our goal at the service is simply to always find the next step for any patient who reaches out during their diagnostic or therapeutic odyssey. We follow a step-by-step process, which you can think of as an algorithm.
Just to show you the algorithm at high speed and in flowchart form, I have it right here, and then I’ll zoom in on the core of this algorithm in just a second. It’s really broken into two parts. There’s a diagnostic part, where you’re trying to zoom in on the core mechanism, and there’s a therapeutic part, where you’re asking, “As a consequence of this mutation, what has happened to some molecular mechanism in this patient, and how do we compensate for that?” The goal is to ultimately end up doing an N-of-1 clinical trial, where you measure the impact of that drug on the patient.
This is the heart of the algorithm right here, and it really explains almost everything that we do. We find the core mechanism and then we ask what’s happened to it, and this applies beyond genetic disease; almost every disease involves mechanisms in some way. We ask, “Is it overactive?” in which case, inhibit it. “Is it underactive?” in which case, activate it. “Is it absent?” in which case, compensate for it. “Or is it toxic?” in which case, eliminate it. We’ve got big game plans behind each of these, depending on which direction we go with any given patient.
We have seen lots of patients. We started this program about a year and a half ago. Patients can just reach out and say, “I have an ultra-rare disease,” or “I don’t know what to do at this point. Can you help me?” People do reach out.
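Turning a proof into something a physician can read can be as simple as walking the chain of edges and emitting one sentence per step. This sketch reuses the xylitol chain from the talk; the tuple format, the connectivity check, and the function names are hypothetical, not mediKanren’s output format.

```python
"""Hypothetical sketch: rendering a chain of knowledge-graph edges as English.

The steps reproduce the xylitol explanation from the talk; the data
structure and checks are illustrative, not mediKanren's output format.
"""
from typing import List, Tuple

Step = Tuple[str, str, str]  # (subject, relation phrase, object)


def shares_concept(prev: Step, nxt: Step) -> bool:
    """Consecutive steps must mention a common concept to form a chain."""
    return bool({prev[0], prev[2]} & {nxt[0], nxt[2]})


def render_proof(claim: str, steps: List[Step]) -> str:
    for prev, nxt in zip(steps, steps[1:]):
        if not shares_concept(prev, nxt):
            raise ValueError(f"broken chain between {prev} and {nxt}")
    lines = [claim]
    for i, (subj, rel, obj) in enumerate(steps, start=1):
        sentence = f"{subj} {rel} {obj}."
        lines.append(f"  {i}. {sentence[0].upper()}{sentence[1:]}")
    return "\n".join(lines)


XYLITOL = [
    ("xylitol", "inhibits", "alcohol dehydrogenase"),
    ("alcohol dehydrogenase", "produces", "reactive aldehydes"),
    ("reactive aldehydes", "cause", "double-stranded breaks in DNA"),
    ("Fanconi's anemia patients", "cannot repair", "double-stranded breaks in DNA"),
]

if __name__ == "__main__":
    print(render_proof("Why xylitol may treat Fanconi's anemia:", XYLITOL))
```

The connectivity check is a cheap guard against presenting a list of unrelated facts as if it were an argument.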
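The core of the therapeutic step is a four-way case split. A minimal sketch follows: the four states and responses come straight from the talk, while the Enum, the function name, and the classification of the two example genes (one plausible reading of cases described later in the talk) are assumptions.

```python
"""Hypothetical sketch of the core therapeutic decision described in the talk.

The four states and their responses come from the talk; the Enum, function
name, and example classifications are illustrative assumptions.
"""
from enum import Enum


class MechanismState(Enum):
    OVERACTIVE = "overactive"
    UNDERACTIVE = "underactive"
    ABSENT = "absent"
    TOXIC = "toxic"


STRATEGY = {
    MechanismState.OVERACTIVE: "inhibit it",
    MechanismState.UNDERACTIVE: "activate it",
    MechanismState.ABSENT: "compensate for it",
    MechanismState.TOXIC: "eliminate it",
}


def next_step(mechanism: str, state: MechanismState) -> str:
    """Return the game plan for a mechanism in the given state."""
    return f"{mechanism} is {state.value}: {STRATEGY[state]}"


if __name__ == "__main__":
    # One plausible reading of two cases described later in the talk.
    print(next_step("RHOBTB2", MechanismState.OVERACTIVE))    # gain of function
    print(next_step("MAPK8IP3", MechanismState.UNDERACTIVE))  # partial loss of function
```

Using an Enum keyed dictionary makes it easy to check that every state has a plan; the hard part in practice is deciding which state a mechanism is in, which this sketch takes as given.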
The question I’ve been asking over and over again is how do we scale this up. Clearly, automation is a big part of this. There’s no way we can do this with manual intervention the entire way. But it turns out, it’s not all automation. Actually, we do have a secret ingredient that really makes this process go and that actually turns out to be people. In particular, undergraduates.
We have a good team of undergraduates. We use undergrads specifically for a couple of reasons. One is that they’re really cheap; to be honest, they are very cheap. The other is that they exist everywhere. If we’re trying to build a model that other health systems can replicate, and you’re using undergrads as the substrate, you could put this almost anywhere. What I’ve been trying to show for the past couple of years, and I think I’ve proven it beyond any reasonable doubt, is that yes, you can get undergraduates to do this. You can teach them this process. You can have them use the tools we built to find treatments for patients.
These undergrads end up as the router between the scientific literature, the scientists, the physicians, the patients, and the AI tools that we and others have built for finding treatments. They do it all. We’ve even gone so far as to build our own EHR that allows us to very rapidly review all the cases we have under consideration at any one point in time. I’ve also built up a wonderful faculty and staff at the institute for doing this as well, and here are a few of the folks we have right here, so that we can run more or less any kind of experiment that a patient might need run on their behalf.
What I’ll do now is just give you some examples of the kinds of things we have done with this research consultation service. You can see what precision medicine looks like in practice on a day to day basis.
One of those undergrads was dealing with a patient who had a mutation in a gene called TMLHE. She figured out that this was a loss of function in this gene; only one of the two copies is still functional for this particular patient. This led to deficiencies in carnitine biosynthesis, so the patient was short on carnitine. It seemed logical to recommend supplemental carnitine as a therapy for the patient.
This undergrad wrote this into a research report and sent it to a physician, and the physician believed her, because she said, “Here’s the logic. Here’s why you should do this. Here’s why this is safe. Go ahead and try it.” And they did. This patient, who was suffering from lots of seizures, now suffers from almost none by aggressively supplementing with carnitine.
And then, Jillian had a patient with a mutation in RHOBTB2. She inferred it was a gain of function in this case, because the mutations occurred in a domain responsible for degradation of the protein. If you break the ability to degrade the protein, or to tag it for degradation, you end up with an abundance of this protein. She could not find a direct downregulator of RHOBTB2, but she did find another gene that regulates RHOBTB2, called E2F1, and she found a downregulator for that gene. That happened to be celecoxib.
That report, which she generated, has been accepted by a physician in Brazil, who has since put the patient on it, and the parents are now reporting back after a couple of months that… I’ve met patients with this disease, and they’re effectively vegetative. It’s such an awful disease. These patients apparently are alert and looking and focusing and paying attention to things really for the first time in their lives. That’s big. We’ve got to do some science to ask, “Is this real? How do we measure the endpoints?” Because there are other patients out there with this disease that we know of, and we’d like to capture what’s happening with them based on the experience in Brazil.
And then, Lindsay was working on a patient with a mutation in MAPK8IP3. She reasoned it was a partial loss of function in this gene, and when she looked it up in mediKanren, she saw right away that vitamin A, at just-subtoxic quantities, will actually potently upregulate this target. You go from half the amount of the gene to closer to normal expression of the gene. The patients with this disease don’t walk. They can’t even stand. There aren’t many of them, but of the ones we know of, none could walk or stand.
But after about six months on very high-dose vitamin A (and don’t try this at home, because vitamin A can actually be very dangerous if you’re not careful; if you eat polar bear liver, it will kill you because it has too much vitamin A), this patient actually started standing, and not just standing but also walking. The mom started sending us these videos saying, “My kid is actually starting to not just stand but take steps.”
He’s been taking ever more steps since, and she now feels a moral obligation to report this finding to the patient community: “This is clearly working for my kid. I have to tell the other parents what’s going on.” We’ve worked out a strategy for how she can responsibly disclose this to other patients and how we can help them find the right dose of vitamin A.
I think it’ll be helpful to show you one example of a non-genetic disease, or at least one where there’s probably a genetic component but we didn’t find a gene. That’s intractable cyclic vomiting, which some people just get. It’s awful. In this case, it was a 19-year-old woman. She had multiple episodes of vomiting every single day. She’s five foot four, but she weighed 78 pounds by the time her parents reached out to us, and they were just desperate. They said, “We’ve tried everything to try to help her. We don’t know what to do at this point.” She was permanently hospitalized. No one knew what to do.
But we said, “You can’t have tried everything.” Everything is a big thing. There are 30 million papers out there; 64,000 of them are on nausea, 4,000 are on treating nausea, and there are 374 distinct treatment modalities for nausea reported somewhere in the literature. I said, “You haven’t tried 374 things. We guarantee you there’s stuff left to try.”
We just ranked it. We did a histogram saying, “What is the most popular treatment for nausea?” And then all the way down to stuff that has only one paper in support of it.
At the top is Zofran, the most popular antinausea drug there is. But if you keep going down, you eventually find the region of this curve where she’d never tried anything. Then we could take that region, rank it by safety, and ask, “What’s the safest thing in this unknown-known region, where it’s published but people just don’t know about it because it hasn’t been widely cited?”
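The ranking described here can be sketched in two steps: sort treatments by how often they are cited, cut off everything down to the last one the patient has already tried, and re-rank what remains by safety. The treatment names, citation counts, and safety scores below are all made up, and the talk does not say how safety was scored, so the numeric score is an assumption.

```python
"""Hypothetical sketch of the long-tail search for untried treatments.

Treatment names, citation counts, and safety scores are made up; the talk
does not describe how safety was scored.
"""
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Candidate:
    name: str
    citations: int   # papers reporting it as a treatment
    safety: float    # hypothetical score in [0, 1], higher is safer


def untried_tail(candidates: Iterable[Candidate], tried: Set[str]) -> List[Candidate]:
    """Everything ranked below the last treatment already tried."""
    ranked = sorted(candidates, key=lambda c: c.citations, reverse=True)
    last_tried = max((i for i, c in enumerate(ranked) if c.name in tried), default=-1)
    return ranked[last_tried + 1:]


def safest_untried(candidates: Iterable[Candidate], tried: Set[str], k: int = 3) -> List[Candidate]:
    """Rank the untried tail by safety, safest first."""
    return sorted(untried_tail(candidates, tried), key=lambda c: c.safety, reverse=True)[:k]


if __name__ == "__main__":
    candidates = [
        Candidate("treatment_A", 900, 0.7),
        Candidate("treatment_B", 400, 0.6),
        Candidate("treatment_C", 50, 0.4),
        Candidate("treatment_D", 12, 0.5),
        Candidate("treatment_E", 3, 0.9),
    ]
    tried = {"treatment_A", "treatment_B", "treatment_C"}
    print([c.name for c in safest_untried(candidates, tried)])
    # prints ['treatment_E', 'treatment_D']
```

Cutting at the last tried treatment, rather than just removing tried items, matches the idea of a contiguous “region of the curve” that nobody has explored yet.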
In that region, we found all the way out at three citations, nasally inhaled isopropyl alcohol. Rubbing alcohol on a cotton pad and you sniff it when you feel nauseated. We sent that back to her team. They recommended it and literally in the same day, she came back and said, “Oh, it worked. I just aborted an episode of nausea that would have led to vomiting.” And it kept working.
When she next checked in, it had been eight months without vomiting, after vomiting multiple times every day for years. She was up to 125 pounds from 78. More than that, the reason they checked in was that she got married. She got out of the hospital, married her high school sweetheart, and they moved to San Diego. That never would have seemed possible for her, and what blows me away is that the solution was sitting in her bathroom cabinet the entire time. It was right there. She’s got her life back as a consequence. There were no genetics and no molecular biology involved; it was just ranking publications after applying NLP to the literature. That’s all we had to do.
When people reach out with more chronic conditions, we can run these queries too. We can give you a histogram of all the standard treatments and very quickly a long tail of very weird treatments. Sometimes, you get very lucky in that long tail. If you’re interested in observing this process, you’re certainly welcome to come down to UAB on a Monday. We hold case reviews every Monday and just review every active case to see if there’s an update to process. We are moving into the UAB health system.
If you’re wondering whether mediKanren has made predictions for Bertrand, it absolutely has. If you run NGLY1 through the system now, it will tell you that he should take broccoli; more specifically, sulforaphane, a compound in broccoli. Luckily, you can get this in concentrated pill form, because otherwise it would be 60 pounds of broccoli a day, which is just an obnoxious quantity of broccoli.
It builds a little proof tree that says, “Sulforaphane may treat NGLY1 deficiency because when you have NGLY1 deficiency, you can’t activate Nrf1. Nrf2 can compensate for inactive Nrf1, and sulforaphane activates Nrf2.” There’s a very simple logic for why you’d want to do this in NGLY1 deficiency.
Yet another way is actually through static analysis. There’s a really interesting language, which I guess is a functional programming language if you think about it: it’s a graph rewriting language. It’s called Kappa, and you can model enzyme reaction systems using it.
And then you can simulate the effect of removing an enzyme. You can simulate the effect of removing a gene. We’ve done some of these simulations. What happens in the absence of this gene? What goes up? What goes down? What disappears? What appears? You can use static analysis to avoid the simulation step and just predict what’s going to appear or disappear as a consequence of taking out an enzyme.
People have actually done this work too. People in our field who worked on static analysis of programs have done static analysis of biological systems encoded in Kappa.
If you’re wondering how Bertrand is doing these days, he’s actually doing pretty well. He’s a happy little kid. I can’t really complain. He’s got two younger siblings who he loves to play with. He’s more or less happy all day every day. His vision is perfect. He has no seizures anymore, down from hundreds per day, and I’m optimistic about what the future holds. I think we have a lot left to do for him as we chip away at it one mechanism at a time.
In fact, more recently, he’s begun using an eye-gaze communication device. I guess he’s been doing this for about a year and a half now, and he’s getting better and better with his vocabulary as we engage him. He even asked for a pet fish, specifically a whale shark. But we did the best we could.
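To give a flavor of predicting what appears or disappears without simulating, here is a much cruder cousin of the Kappa analyses: a fixpoint over reaction rules that ignores rates and reactant consumption, so it over-approximates which species can ever be present. Comparing the result with and without one enzyme gives the species that can no longer appear. This is not Kappa and not any published tool; the toy pathway S to I to P and every name in it are hypothetical.

```python
"""Hypothetical sketch of a knockout analysis over a reaction network.

This is not Kappa. It is a much cruder fixpoint analysis in the same spirit:
ignoring rates and reactant consumption, compute every species that *can*
appear, then compare the network with and without one enzyme. The toy
pathway S -> I -> P is made up for illustration.
"""
from typing import FrozenSet, List, Optional, Set, Tuple

# (reactants, catalysing enzyme or None, products)
Reaction = Tuple[FrozenSet[str], Optional[str], FrozenSet[str]]


def reachable(initial: Set[str], reactions: List[Reaction],
              knocked_out: FrozenSet[str] = frozenset()) -> Set[str]:
    """Over-approximate the species that can ever be present."""
    present = set(initial) - knocked_out
    changed = True
    while changed:
        changed = False
        for reactants, enzyme, products in reactions:
            if enzyme is not None and enzyme not in present:
                continue  # reaction needs an enzyme that is missing
            new = (products - present) - knocked_out
            if reactants <= present and new:
                present |= new
                changed = True
    return present


def knockout_losses(initial: Set[str], reactions: List[Reaction], enzyme: str) -> Set[str]:
    """Species that can appear normally but not once `enzyme` is removed."""
    baseline = reachable(initial, reactions)
    mutant = reachable(initial, reactions, frozenset({enzyme}))
    return baseline - mutant - {enzyme}


if __name__ == "__main__":
    pathway = [
        (frozenset({"S"}), "E1", frozenset({"I"})),
        (frozenset({"I"}), "E2", frozenset({"P"})),
    ]
    print(knockout_losses({"S", "E1", "E2"}, pathway, "E2"))  # prints {'P'}
    print(knockout_losses({"S", "E1", "E2"}, pathway, "E1"))
```

Because the set of present species only grows and the universe of species is finite, the loop always terminates, which is the property that lets this kind of analysis stand in for running a simulation.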
I’ll do one last story. I’ll tell you that we do the same thing for cancer too. We have a whole clinical trial pipeline where we genotype tumors, we look for the mutations driving the cancer, and then we look for medications that counteract the effect of those mutations but that’s just cancer. That’s boring.
We’ve had success stories. There was this guy who had intractable prostate cancer, and it turned out that, genetically speaking, the tumors were ovarian cancer. It can happen; it’s a weird thing, but it can happen. We actually had to treat him with an ovarian cancer drug.
Final example. This is back to Bertrand. Last May, on Star Wars Day, I took him to the Science Center in Birmingham. I took him down to see the fish, and then five hours later, around dinnertime that night, he was almost dead. He had septic shock. I go, “Okay, so that’s weird.” I didn’t even know what septic shock meant at the time. Again, I’m not a physician. For us, this was the launch of a second diagnostic odyssey.
While he’s going into the ER, I’m asking mediKanren three questions: “What causes septic shock? What are the consequences of septic shock? And what are the treatments for septic shock?” And it spits back albumin as one of the things way down on the long tail of treatments for septic shock.
That’s important because we know from the NIH studies that Bertrand has chronically low albumin. If you have low albumin, you’re going to die with septic shock. You will die. I told the medical team, “Look at this paper. Look at this paper. Give him albumin.” And they did, they actually listened to me. He got out of bed and he survived the first night.
Septic shock by itself has 50% mortality and if you have these aggravating factors, it goes up real fast.
That began week one in the ICU. All the cultures were negative, but he was kind of stabilizing, and by the end of the first week, he was on high-flow oxygen. You can see that right here. Into week two, slow improvement continued, very slow improvement. He got onto very low-flow oxygen, and they felt comfortable discharging him on Thursday with home oxygen support and a mild fever. But unfortunately, 24 hours later, he was back. He was in extreme pain, and when we took him to the ER, we realized a lot of other stuff had started going wrong, too.
He had a bone infection. He had swelling in his abdomen. He had fluid outside the lung building up so fast that it collapsed his lung, and yet his cultures looking for the organisms causing this were all negative. This began a sort of rapid program of medical intervention, where he had two surgeries to clean out the infection in his bone. He had to get a chest tube, because you have to get the fluid off the lung. And this is him shortly before he was supposed to go into surgery for this. And then unfortunately, four hours later, when he was about to go in, he coded in the ICU and came about as close as I think I’ve ever seen him to dead.
When we took this photo, we were basically just saying goodbye. I thought this was the end for Bertrand, right here, that night, in the ICU. But I pleaded, “Please operate. Please, let’s try this.” At 1:00 AM they said, “He’s stable enough now. We’re going to wait for morning to do the procedure.” Then by 8:00 AM, he was definitely not stable, because his hematocrit had plunged to 19%. They said, “Now his hematocrit is so low, we can’t even do surgery.”
He got a blood transfusion. They put in a chest tube. They put it in the wrong place, so he coded again. He got another chest tube, and this one actually worked. We were able to finally get the fluid off of his lungs, and it somewhat stabilized him. Unfortunately, the cultures were still negative. We still had no idea what was driving this disease, and this brought us, three weeks in, to Memorial Day weekend.
I’m sitting there in the hospital thinking, “Let’s try this. Let’s try this.” It’s just one after another. I’m just aggressive like that. The resident said, “Whoa, hold on a second. This is Memorial Day weekend. Nothing happens on weekends in hospitals.” This is actually true. Nothing happens on weekends. Don’t even bother going to the hospital on the weekend because nothing happens.
I said, “Nothing’s going to happen. I’m going to write some code. I got to figure out what this thing is.”
I created a diagnostic extension to mediKanren. It’s pretty simple, actually. It’s got this knowledge graph, and it knows which diseases cause which symptoms, so why not just run it in reverse? If some disease explains all four major symptoms, put it at the top. You literally just get a ranked list of all known diseases that could cause the constellation of symptoms we saw in Bertrand. If you look at the ones that explained everything, there weren’t many; there was just a handful of diseases that explained all the symptoms we saw in Bertrand.
We started aggressive testing for these, and in parallel with that, we did metagenomics. Your metagenome is all your DNA plus all the other DNA floating around inside of you. We all have pathogens or weird bacteria at all times, and some of those shed their DNA, so you can see them that way. If you take a sample from somebody and look inside it, you will see the organisms, but you also see their DNA. It’s just there. You can use this as a way to find what’s actually inside somebody. And I wanted to know: could we use this for Bertrand to find the bug inside him?
We checked his blood and pleural fluid out of the hospital from pathology and drove them up to Huntsville, Alabama, where there’s a sequencing facility. I did a handoff at a Jack in the Box to my number two at the institute, Andy Crouse. He got them to Shawn Levy, who runs the sequencing lab, and he did 40X shotgun sequencing four times over in three days on these samples.
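Running a “disease manifests with symptom” relation in reverse amounts to scoring every disease by how many of the observed symptoms it explains. In a relational language like miniKanren the same relation can be queried in either direction; this sketch simply inverts a Python dict instead. The disease and symptom names are placeholders, not the actual candidates from Bertrand’s case.

```python
"""Hypothetical sketch of running a disease -> symptom relation "in reverse".

In a relational language like miniKanren the same relation can be queried in
either direction; here we simply invert a Python dict. Disease and symptom
names are placeholders, not the actual candidates from Bertrand's case.
"""
from typing import Dict, List, Set, Tuple


def rank_diseases(manifests: Dict[str, Set[str]], observed: Set[str]) -> List[Tuple[str, int]]:
    """Diseases ranked by how many observed symptoms they explain."""
    scored = [(disease, len(symptoms & observed)) for disease, symptoms in manifests.items()]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: (-s[1], s[0]))


def explains_all(manifests: Dict[str, Set[str]], observed: Set[str]) -> List[str]:
    """Only the diseases that account for every observed symptom."""
    return [d for d, n in rank_diseases(manifests, observed) if n == len(observed)]


if __name__ == "__main__":
    manifests = {
        "disease_W": {"symptom_6"},
        "disease_X": {"symptom_1", "symptom_2", "symptom_3", "symptom_4"},
        "disease_Y": {"symptom_1", "symptom_2"},
        "disease_Z": {"symptom_3", "symptom_5"},
    }
    observed = {"symptom_1", "symptom_2", "symptom_3", "symptom_4"}
    print(rank_diseases(manifests, observed))
    # prints [('disease_X', 4), ('disease_Y', 2), ('disease_Z', 1)]
    print(explains_all(manifests, observed))  # prints ['disease_X']
```

Sorting by count first and name second keeps the ranking deterministic when several diseases explain the same number of symptoms.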
So now, we had enough data to figure out what this thing really was. If you look at a distribution of all the organisms inside Bertrand, oh, God, there’s a lot. There’s always a lot in all of us all the time as it turns out. This was not terribly useful in and of itself. But if you intersect it with the list that came out of the AI, there was only one thing left and that was Pseudomonas.
You wouldn’t typically expect to see Pseudomonas in healthy people, but it was plausible that it had made its way inside Bertrand. The question was, can we prove that this is what it was? It turns out we had enough data to do that too, because the way we ran the experiments allowed us to localize the pathogen to his lungs. We burst all the cells in the lung sample to see whether that changed the distribution of its genetic content before and after bursting.
The logic here was that if there’s not a lot of an organism’s DNA in a sample, you can’t see it, but if you burst the cells, whatever is in there, you’re going to see it now through its genetic fingerprint. So we burst the cells. This allowed organisms that were in the sample itself to suddenly be overrepresented: they got more of their DNA represented in the sample by bursting it.
We did this, and one organism had a massive shift only in the lungs, and that was Pseudomonas. Pseudomonas went from 2% of the genetic content of the sample to 13% of the genetic content of the sample. We said, “That’s clearly what this is.”
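The two filters described above, intersecting the organisms detected by sequencing with the candidates implied by the symptom-based ranking and then looking for an organism whose share of the reads jumps after the cells are burst, can be sketched as follows. The read counts are made up so that Pseudomonas goes from 2% to 13%, matching the talk; every other organism name and number, including the 5-point threshold, is an illustrative assumption.

```python
"""Hypothetical sketch of the two metagenomic filters described above.

Step 1 intersects detected organisms with the candidates implied by the
symptom-based ranking. Step 2 looks for organisms whose share of the reads
jumps after the cells are burst. Read counts are made up so that Pseudomonas
goes from 2% to 13%, matching the talk; every other name and number,
including the 5-point threshold, is an illustrative assumption.
"""
from typing import Dict, Set


def shares(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of reads assigned to each organism."""
    total = sum(read_counts.values())
    return {org: n / total for org, n in read_counts.items()}


def enriched_after_lysis(before: Dict[str, int], after: Dict[str, int],
                         min_shift: float = 0.05) -> Set[str]:
    """Organisms whose read share rises by at least `min_shift` after bursting."""
    b, a = shares(before), shares(after)
    return {org for org in a if a[org] - b.get(org, 0.0) >= min_shift}


if __name__ == "__main__":
    ai_candidates = {"Pseudomonas", "organism_D"}
    before = {"Pseudomonas": 2, "organism_B": 60, "organism_C": 38}
    after = {"Pseudomonas": 13, "organism_B": 52, "organism_C": 35}

    detected = set(before) | set(after)
    print(detected & ai_candidates)                             # prints {'Pseudomonas'}
    print(enriched_after_lysis(before, after) & ai_candidates)  # prints {'Pseudomonas'}
```

Comparing read shares rather than raw counts matters because the total amount of sequenced DNA changes when the cells are burst.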
The other big hint was that if you looked at the viral DNA floating around inside Bertrand, there was a lot of it from something called Pseudomonas phage, which is a virus that eats Pseudomonas. If something is eating Pseudomonas, it’s got to have something to eat, so we figured Pseudomonas was there. There was also evidence of natural amounts of E. coli. The drug recommended to kill both of these simultaneously was meropenem. It’s a very powerful antibiotic, sometimes called gorilla-cillin because it does a lot of stuff.
After four weeks in the ICU, he was out within 48 hours of starting it and onto a regular pulmonary floor. Within a week, his effusions were receding, his abdomen was no longer swollen, and he was back on room air. When we did another round of metagenomics at the very end, just looking for Pseudomonas and E. coli, both of them were gone. There was no trace of these organisms in his system at that point.
This is him right before he went home and this was him reunited with his fish. This is all back in May. This is June by now, actually. He’s very happy to go home and see his fish. And then one week later, I kept my promise to him that I made while he was in the ICU.
This is Father’s Day. We drove him down to the beach in the Florida Panhandle and took him swimming with dolphins. A week out of the hospital, he was literally swimming with dolphins.
During that time, I got an NIH grant funded for a clinical trial for NGLY1 deficiency and that trial now starts in two weeks. Thankfully, he survived. Thankfully, we’ll get to try this new experimental treatment at the Mayo Clinic starting in two weeks.
I don’t want to poach anybody. But if you’ve been inspired and you’re thinking about spending some spare time on biomedical endeavors, I do recommend this one book. This is my one recommendation. This book, which you can read in an afternoon, tells you pretty much everything you need to know as computer scientists to at least understand the vocabulary of the field. We do have clear, immediate needs for NLP. I want to reprocess the full text of everything with better, more modern NLP to build better knowledge graphs, really, ultimately, in the service of building proofs for patients as we make recommendations for them.
The take-home message here is that I really don’t believe there’s anything that is not actionable, whether in medicine or otherwise, because it’s always been the case, no matter what anybody told you, that in the event of the non-actionable, in the event that there’s nothing you can do, you can always do science.
Thank you. If you have any questions, I’m more than happy to take them.