Hamid: AI research in healthcare, and maybe beyond that, has four major issues right now. Four major challenges. And as you pointed out, when we look at some of these challenges, some of us look at cases, go into them, and want to publish the weak points.
There’s not much willingness in the publication world to publish negative results or validation reports. And it’s very difficult when platforms publish results that on the surface appear to be breakthroughs. Then all of us get excited. When foundation models get published in histopathology, we work through weekends to see what we can get out of them. It’s very serious business. [00:15:00]
We are at an institution where the maxim is: the needs of the patient come first. So if there is something that can help us do a faster, better diagnosis, or a better treatment, we are all very serious about it; we wanna do it. We take it seriously because it’s published in very high-impact, very reputable places.
And then we test it. Then it falls apart. And then we think others may be interested in how we test it. Because, we have to confess, we test in a really tough way. We don’t give the algorithms any way to escape. We take the toughest cases and say: okay, if I really bring it into a clinical setting and I have to use it this way, what happens?
In the last one we did, on foundation models for histopathology, the average accuracy on TCGA was 44%. We did zero-shot testing, which means we did not touch them; we did not fine-tune them. And I’m sure everybody who publishes a foundation model would say: no, you have to fine-tune it. [00:16:00]
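As an aside, here is a minimal sketch of what this kind of zero-shot testing can look like in code: a frozen, pretrained encoder whose weights are never touched, a k-nearest-neighbor probe on its embeddings, and balanced accuracy because the subtype classes are highly imbalanced. The encoder and data loaders are hypothetical placeholders, not the actual pipeline used in the study.

```python
# Zero-shot evaluation sketch: the foundation model is used as a frozen
# feature extractor; no fine-tuning happens anywhere.
import numpy as np
import torch
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import balanced_accuracy_score

@torch.no_grad()
def embed(encoder: torch.nn.Module, loader):
    """Extract frozen embeddings; encoder weights are never updated."""
    encoder.eval()
    feats, labels = [], []
    for patches, y in loader:  # (B, 3, H, W) tissue tiles with subtype labels
        feats.append(encoder(patches).cpu().numpy())
        labels.append(y.numpy())
    return np.concatenate(feats), np.concatenate(labels)

# encoder = <any pretrained foundation model, left untouched>
# X_train, y_train = embed(encoder, train_loader)
# X_test, y_test = embed(encoder, test_loader)
# probe = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
# Balanced accuracy, since detailed subtypes are highly imbalanced:
# print(balanced_accuracy_score(y_test, probe.predict(X_test)))
```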
Aleks: But then, that’s a foundation model, right?
Hamid: Yeah. And if I wanna fine-tune, then I have many other possibilities. Who says the fine-tuned foundation model would be the best? I would also have to go and fine-tune many other smaller, leaner models that are more manageable. And we tested with detailed subtyping, not tumor-type subtyping, which is much more difficult.
We tested with highly imbalanced data, which is the practical, clinical reality. So, 44% on average across foundation models. And that’s an interesting observation: it didn’t matter which foundation model we tested. On average, we came to the same number, 44%, which is consistent with what we call in computer science the no-free-lunch theorem.
Aleks: Tell me about that.
Hamid: There is no free lunch. So if you wanna be better than others as an algorithm, you have to specialize. If you don’t specialize, on average you [00:17:00] are at the same level as everybody else. You will not be the best model; you will not be the best algorithm. Which means the no-free-lunch theorem, in combination with Occam’s razor, keep it small, keep it simple, would negate the concept of a foundation model.
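For reference, the result Hamid is invoking is Wolpert and Macready’s no-free-lunch theorem. A rough paraphrase of its core statement: summed over all possible objective functions, any two algorithms show identical performance, so an algorithm can only win by specializing on a subset of problems.

$$\sum_{f} P\left(d_m^{y} \mid f, m, A_1\right) = \sum_{f} P\left(d_m^{y} \mid f, m, A_2\right)$$

Here $f$ ranges over all possible objective functions, $d_m^{y}$ is any sequence of $m$ observed performance values, and $A_1$, $A_2$ are any two algorithms.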
Aleks: Foundation model.
Hamid: Really large ones. If they are really large and they cannot deliver… maybe if at some point we have multimodal, heterogeneous, longitudinal data for millions of patients, then maybe we break through that.
At the moment, that’s not the case. So if you look at this and say, okay, we cannot get there at the moment, because on average we are all the same, because we are working on a general-purpose histopathology foundation model… what does that mean? That means you are…
Aleks: That is my question. I could not find it. What I was looking for in the publications [00:18:00] I read was: what does this model actually do?
We are used to the specialized ones. Okay, it shows malignancy, it segments cells, it does that. I was looking for a list, a table: what are all the things that this model does? I could not find it in any of the publications.
Hamid: If a model does not take any action or make any decision, that’s fine.
That’s a different issue, because that means it is trying to understand the data. And then you can capitalize on that understanding for diagnostic purposes, for treatment purposes, for anything you want. That’s not the problem, because they are usually trained in a self-supervised manner. We don’t have annotated data.
We come up with some made-up, fake task for the network to understand the data, detached from any real task, including diagnosis, treatment planning, survival prediction, anything like that.
Aleks: But then I don’t understand what they [00:19:00] do. They go and understand data. What is the output of the model?
Hamid: Okay, a very simple thing.
I take a tissue patch, a tile, not a whole-slide image, of course. I take a tissue patch and give it to the network, then rotate it 90 degrees, give it to the network again, and say: look, this is the same. Of course, if you rotate a tissue image, for the pathologist it’s the same; we realize nothing changes. For the computer,
it’ll generate different numbers. So now we force the network to generate the same numbers if I have a tissue and I rotate it. When we do this with millions and millions of tissue patches, the network will understand the texture, the pattern of the tissue, without knowing that this is papillary carcinoma, or this is adenosis, or this is adenocarcinoma.
It doesn’t know what that is. It just says: I see a pattern, and then you [00:20:00] rotate it, and I figure out, oh, that’s the same pattern.
That’s all it figures out. But in order to figure that out, it has to understand the complexity of that pattern. That is the magnificent idea of self-supervision, one of the major things the AI community has put forward.
Amazing: we now don’t need annotated data. Without self-supervision, there would be no large language models, no foundation models, nothing, regardless of their actual applicability and usability.
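A minimal sketch of the rotation-consistency pretext task Hamid describes, assuming a PyTorch setup; the encoder, embedding size, and hyperparameters are hypothetical placeholders, not any published model’s recipe. Note that on its own an invariance loss like this can collapse to a constant embedding; real self-supervised methods (SimCLR, DINO, and similar) add extra machinery to prevent that.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

# Hypothetical stand-in for a histopathology patch encoder.
encoder = resnet18(num_classes=128)  # maps a tile to a 128-d embedding
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)

def rotation_consistency_loss(patches: torch.Tensor) -> torch.Tensor:
    """Push a patch and its 90-degree rotation toward the same embedding."""
    z = F.normalize(encoder(patches), dim=1)                          # (B, 128)
    z_rot = F.normalize(encoder(torch.rot90(patches, 1, (2, 3))), dim=1)
    return (1 - F.cosine_similarity(z, z_rot)).mean()

# Training loop over millions of unlabeled tiles (loader not shown):
# for patches in patch_loader:            # (B, 3, H, W), no labels anywhere
#     loss = rotation_consistency_loss(patches)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```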
Aleks: That is my next question. Applicability and usability.
Hamid: Yeah. Why usability? We get to the point that, okay, now you say it’s general purpose. That means it’s for the entirety of histopathology, so it understands brain and prostate and kidney and breast and everything.
That, to begin with, goes against what we are doing in pathology. Because in pathology, everybody is specialized. No…
Aleks: In the US. [00:21:00] Actually, the US and Canada are among the only countries where you have such a super-specific, narrow specialization of pathology. And in all the other countries…
It’s gen… you’re a pathologist.
Hamid: Look, here, the last number I have: we get 50 to 55,000 consult cases.
Coming to the department from across the US and internationally.
Aleks: And I’m not saying that this is not how it should be, because the nuances of the disciplines are basically what cause the high specialization in the US, and why there’s the second consult, and why you consult a specialist. I just wanted to note that, as you know, it’s not the case in other countries.
Hamid: Let’s take breast cancer. So 80% of cases are lobular or ductal carcinoma. Most pathologists, with a little bit of training, can look at them and say: that’s ductal, that’s lobular.
If you look at the remaining 20%, [00:22:00] those are highly rare, complex cases. For that 20%, you need a highly specialized breast pathologist, and that’s why we have consults. The 80% can relatively easily be diagnosed in most hospitals across the planet, but for the 20% you need specialists.
Aleks: So where do you use AI as leverage? Where? Where does it make sense?
Hamid: General purpose means you wanna do those easy cases, so you wanna do only ductal and lobular, and that’s why it falls apart. When we test with detailed subtyping of TCGA data, where you go beyond the tumor type into detailed subtyping, which is much more valuable and useful for treatment planning, then it falls apart, because of course it cannot do it; it is general purpose.
Then you have to do something. You have to fine-tune it. You have to train another classifier. Oh, okay, I thought the [00:23:00] foundation model would solve all my problems. No. So it’s a philosophy now, when we talk about the major issues with AI. I’m departing a little bit, deviating a little bit from the big picture.
But fundamentally, one thing is that you cannot build on general-purpose AI for medicine. That’s, at the moment, based on everything I have seen, my way of thinking: if you wanna bring AI to the bedside of patients, we have to specialize. Okay? That means you have to have a model for breast, one for brain, one for prostate. Researchers won’t like it, because if I have a general-purpose model, 5,000 people will download my model.
If I work on prostate, probably only 200 will download it.
Aleks: So Hamid, when you say, oh, the general model will do the easy cases for me… when you say that, I’m like: okay, then it’s not a help for a pathologist, [00:24:00] because pathologists can do it without clicking a mouse.
Hamid: It could be, Aleks, because imagine, in developing countries, if you talk about countries that have no pathologists…
Aleks: Yes, that’s what I’m saying. It’s not gonna be a help for a pathologist. It’s gonna be a help for a healthcare professional who then needs to figure out how to get that to a pathologist, and then the next pathologist is gonna have to figure out: okay, do I need a specialist, or can I deal with it? So that would be the triage application of this for non-pathologists.
For healthcare systems, or healthcare…
Hamid: But this looks confusing. I’ve been trying to figure it out for myself as an individual researcher.
Aleks: Me too.
Hamid: So we throw all that data at a gigantic model and train it with a big carbon footprint, and then we wanna solve the easy problems.
I thought AI was supposed to help us solve [00:25:00] difficult problems. So that’s one issue. Going back to the big picture: the major issue with AI at the moment, taking any model, especially foundation models and large language models, is that they are not addressing those five conditions: accurate, consistent, fast, lean, robust.
They are not doing that. You can list many ways in which they deviate. They barely try to address one of them, or maybe one and a half, or maybe two, but never five. And as long as we don’t have those five, we cannot get to the bedside of the patient. That’s one problem.
Foundation models, large AI models are failing.
Nobody likes to hear that. I don’t like to hear that. I have made my…
Aleks: No, I was hoping they’re gonna succeed and they’re gonna solve our lives.
Hamid: I wanna stay enthusiastic, and we will stay enthusiastic, but that doesn’t mean we have to be blind.
Aleks: Yeah. You know what, it’s a tough balance, [00:26:00] I have to say, because…
Hamid: I know. I know.
Aleks: I am always super enthusiastic about all those new technologies, and then, like you said: oh, where are the validation reports? I’m like: that’s what’s needed to take it anywhere close to the patient, because that’s the first thing you need to do in any lab, under any type of regulatory constraints.
Otherwise it’s just talking about technology, which is nice. But the question is: when is it gonna be applicable?
Hamid: So the first problem is that we are not addressing those five conditions. The second problem is that the data is not available yet. And it sounds very strange, because most hospitals that have large amounts of data are migrating and combining them in the same place, most of the time the cloud.
That creates a lot of delay. So we have to solve that problem, and that problem is outside the AI community: the data [00:27:00] is not available. AI researchers can complain about it; I have been complaining about it. The data is not available. If you give me better data, more data, we’ll do a better job, we’ll do a fantastic job.
We will see how it plays out. Most likely in the next two years we will close those gaps, and the data will be unified somewhere you can operate on it. Then we will see what we can do with it. The third problem, the third major issue with AI, is this out-of-control enthusiasm for large language models.
Why am I saying that? I use large language models all the time.
Aleks: I use them all the time too.
Hamid: For editing, for summarization, I do that all the time. But I’m talking about medicine. Language is not evidence.
We said evidence is imaging, is lab data, and maybe historic data. Language is not evidence.
Language [00:28:00] is subject to variability and subjectivity. We cannot build the future of medicine based on language. That’s a fundamental problem.
Aleks: So, a question here. Let’s take an example: language used in medicine would be a pathology report. Are you saying this is not objective, because the evaluation of the data that is subjective is objective…
Sorry, the other way around. The data is objective. The image is the image; it’s always gonna be like that. But then you have an interpretation by a pathologist, and we all know that if we get a 0.7 concordance we think we’re fantastic, which means 30% of the time, at least, we’re not that fantastic. Then it gets translated into subjective language, which is the report.
Is that what you’re referring to? Or am I misunderstanding?
Hamid: No, just go at it from the other side. Look at it.
Aleks: Okay.
Hamid: There would be no pathology report or radiology report without the tissue, without the [00:29:00] patient. So the pathologist is looking at something, using his or her brain, with the entire medical knowledge in there, summarizing it in some words, which is subject to variability.
So if you really wanna capitalize on the information and knowledge in that report, you have to bridge it back to the evidence: the image, the RNA sequencing, the X-ray image, the blood data, whatever the data was that went into it. I’ll give you an example from dermatology.
We were looking at skin cancer. We wanted to look at well-differentiated and poorly differentiated types of cutaneous squamous cell carcinoma. And after a while we realized: oh, the tissue image is not the only source of information that the dermatologist gets. So what else are they getting? They look, among other things, at the medical photo that is taken of the lesion on the skin
prior to the biopsy. This [00:30:00] is a lot of information. You are not giving me that information. How do you expect me to…
Aleks: It’s the same for, like, gross images from…
Hamid: Whatever comes in that report is the result of several other modalities and the general medical knowledge in the head of the pathologist. So you cannot detach that report and then work with that data alone. You can get some statistics, run some experiments.
But based on that, I cannot come up with next-generation diagnostics, treatment, survival prediction, that type of thing.
Unless, again, the bridge to the evidence is restored, such that we know: this is the variability, this is the evidence, and so on.
Aleks: Question. Because that’s something you were talking a lot about in the talk.
I’m gonna link to the talk from the API. It was about retrieval-augmented generation. How does that play into explainability, into incorporating text [00:31:00] into multimodality? Because when I learned about it, I was like: where is it? Why can’t we use it? It points you to where the information comes from, and I thought that was gonna be the next super hype.
It never happened. Didn’t happen in 2024. Yet.
Hamid: You just read my mind, because…
Aleks: I just listened to your talk…
Hamid: No… of the two problems that we talked about, let’s skip “the data is not available.” All of that is outside the AI researcher: the life of hospitals, lack of money, heterogeneity of archives, the landscape of corporations, many reasons.
So the first problem was: we are not accurate, we are not fast, we are not lean, we are not robust. The second problem was language.
Aleks: We’re so bad.
Hamid: Yeah, we’re so bad. Language is not evidence. What is evidence? Imaging, lab data. Which means you have to go multimodal. At the moment, there is not a single, truly multimodal system that [00:32:00] has been trained with high-quality clinical data and has been validated at multiple sites.
There is no such thing. Most likely we’ll do it, and when I say “we” I mean the research community at large. Most likely we will do it in the next two, three years. Okay. Yes, when that data is available, I can go and get 50,000 cases, and then get a little bit of the X-ray, a little bit of the reports, a little bit of the demographics, and do something.
This is all sandbox play, experimentation, which is necessary. You are warming up.
Hamid: But I’m talking about serious research that leads to products that can really be used in practice. There is nothing like that yet, so we need…
Aleks: I’m gonna quote this one: serious research that leads to products. Because life is like a funnel.
You’re gonna have so many publications, and then only… it’s like drug development. I work in drug development. You have so many candidates; maybe five make it into clinical trials, [00:33:00] and one actually makes it to the shelves of the pharmacy. So: serious research that leads to products. Let’s continue…
Hamid: So multimodality, which is evidence, brings in the necessary information and knowledge to compensate for the lack of accuracy, consistency, reliability. Speed and being lean, though, have to be managed through engineering; that’s a design question, together with robustness. So the first two, accuracy and consistency, are a matter of evidence.
So we have to bring in multimodality. If you look at, let’s say, breast cancer: you need the X-ray images, you need the MRI images, you probably have ultrasound, you have the tissue sample, you have the patient demographics, you have some genetic information. All of that goes into a system. Then you connect it to a large language model.
And now we get to the magical word that you mentioned.
RAG.
So even if you do multimodal large language models, or la… language-vision [00:34:00] models, still, you cannot deploy them to the bedside.
Because one key word is explainability, and another word for that is source attribution: backing it up.
Aleks: Yes.
Hamid: So how do you back it up? Again, you can give 10 million patients to a large language model that does multimodal, understands X-ray, understands tissue, all that. But then I ask: how do you know that? You have to back it up. You have to tell me why you are saying…
Aleks: Because I saw this kind of image in radiology, then a hypereosinophilic part of the tumor, in whatever.
Okay. So basically, as if a pathologist would tell you: why do you diagnose this as a squamous cell carcinoma? Because I see this, and these are the attributes of a squamous cell carcinoma.
Hamid: But we trust a human being. We still have nothing but the human expert [00:35:00] to judge the performance of the AI.
We don’t have anything better. It is the human expert, the pathologist in our case, who tells us right, wrong, correct, incorrect. So retrieval-augmented generation… what we are seeing now, emerging very slowly, is a renaissance of information retrieval.
Information retrieval in medicine was very limited and is still limited to just searching text.
You type text or you select some boxes and you search in an archive. We cannot search for x-ray images, we cannot search for tissue images. We cannot search for RNA sequencing. We cannot search for social determinants in combination with this and so on. So we do not have a multimodal information retrieval system.
Now we are in a pickle, in a very tough spot, because we know large language models, with all their capabilities, cannot be used [00:36:00] unless we go multimodal, and then we have to back it up with evidence. And the evidence has to be outside of the AI, which is very interesting: AI can be backed up and explained with information outside of the AI, not the information that the AI has digested.
Because the information that is digested goes into the black box: we don’t have access to it, we don’t understand it, we cannot justify it. So we need an outside archive of knowledge. Reli… reliable knowledge, which is high-quality clinical data, hopefully heterogeneous, hopefully free from bias, though bias will be there.
We have to deal with it. So now, if we wanna do that, do we have a multimodal information retrieval system? No, we don’t.
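To make that concrete, here is a minimal sketch of what a multimodal evidence index could look like: one vector index per modality, searched with embeddings, returning archived case IDs that can serve as the source attribution discussed above. The encoders, fields, and class names are hypothetical; FAISS is used only as a generic vector-search library, and nothing here refers to an existing system.

```python
# Sketch of multimodal retrieval for evidence-backed (RAG-style) answers.
import faiss
import numpy as np

class MultimodalEvidenceIndex:
    """One vector index per modality (tissue, x-ray, report text, ...)."""

    def __init__(self, dim: int, modalities: list[str]):
        self.indexes = {m: faiss.IndexFlatIP(dim) for m in modalities}
        self.case_ids = {m: [] for m in modalities}

    def add(self, modality: str, case_id: str, embedding: np.ndarray):
        # Normalize so inner product behaves like cosine similarity.
        v = (embedding / np.linalg.norm(embedding)).astype("float32")
        self.indexes[modality].add(v[None, :])
        self.case_ids[modality].append(case_id)

    def search(self, modality: str, query: np.ndarray, k: int = 5):
        v = (query / np.linalg.norm(query)).astype("float32")
        scores, idx = self.indexes[modality].search(v[None, :], k)
        return [(self.case_ids[modality][i], float(s))
                for i, s in zip(idx[0], scores[0]) if i != -1]

# Usage sketch: retrieve similar archived cases per modality for a new
# patient, then hand the hits (with case IDs as source attribution) to an
# LLM prompt. embed_tissue and the index contents are hypothetical.
# hits = index.search("tissue", embed_tissue(patch), k=5)
```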
Aleks: No. But now it makes sense to me why you might need a general foundation model. Because if it understands the stuff in the image, then that’s gonna be the deliverable from the model: [00:37:00] showing me where whichever information is, without telling me what to do with that information.
Hamid: Absolutely.
Aleks: So now I start understanding.
Hamid: At the moment… so, one and a half years ago, we had the possibility to start training a foundation model. And I realized that without retrieval, we cannot really check it. So we stopped. We stopped, and we worked on developing a platform for information retrieval.
We are still working on it, and now we are truly multimodal for breast: you have six, seven different modalities, for example. And you realize: oh, we cannot be general purpose. You have to specialize. If you really wanna solve the tough problems, you have to specialize. It’s a very painful decision to specialize.
Because again, if I have a general purpose model, 10,000 people will download it.
If I have one for prostate, probably only 200 will download it. It’ll affect [00:38:00] my life as a researcher. So that’s a problem, a limitation that we have to deal with. But again, the needs of the patient come first. It’s not about me. It is about how we can bring AI to the bedside of the patient in a reliable manner.
Now, this emerging renaissance of information retrieval is everywhere. Not just in large language models; it’s also in agentic AI, AI based on agents. So…
Aleks: Yes. The agent. Yes.
Hamid: Which is a field I’m not working on. But the core of that is also not possible without retrieval.
Aleks: Without retrieval, because basically you have the orchestration of data retrieval from different places, which I’m of course super excited about. And I’m like waiting for this to be available. And I’m like, where is it? How long will I have to wait?
Hamid: So not only did we find out that we don’t have the multimodal information [00:39:00] retrieval we need to be able to deploy large language models and foundation models, we also found out that AI cannot be used in a reliable way unless it is connected to good old-fashioned information retrieval. Why is that? That should make us think: what does it say about the state of artificial intelligence in general, that AI needs outside help?
So now this brings us to the biggest headache I have right now, just as a researcher thinking about things.