
Swarm Learning: The Future of AI Collaboration in Digital Pathology w/ Oliver Saldanha

Swarm Learning in Digital Pathology: Revolutionizing Cancer Histopathology

In this episode of the Digital Pathology Podcast, we explore the groundbreaking concept
of swarm learning and its potential to revolutionize AI collaboration in the field. Our
special guest, Oliver Saldanha, shares insights from his recent Nature Medicine paper
that showcases the power of decentralized AI in predicting mutations from H&E-stained
whole-slide images.


What is Swarm Learning?

Swarm learning is a decentralized machine learning approach that enables multiple institutions to collaborate on training AI models without sharing raw data. By allowing each participating institution to train a local model using their own data and share only the learning progress (weights and biases) through a secure, blockchain-based platform, swarm learning ensures patient privacy while iteratively improving the global model.
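To make the mechanics above concrete, here is a minimal, hypothetical sketch of one swarm-style training round in plain NumPy. It illustrates the weight-averaging idea only; it is not the HPE Swarm Learning platform or the code used in the paper.

```python
# Hypothetical sketch of one swarm-style training round: each site trains on its
# own data and only model parameters (weights/biases) are exchanged and averaged.
# This is a toy illustration, not the HPE Swarm Learning platform used in the paper.
import numpy as np

rng = np.random.default_rng(0)

def local_sgd_step(weights, X, y, lr=0.1):
    """One gradient step of logistic regression on a site's private data."""
    preds = 1 / (1 + np.exp(-X @ weights))          # sigmoid predictions
    grad = X.T @ (preds - y) / len(y)               # logistic-loss gradient
    return weights - lr * grad

def merge(parameter_list):
    """Parameter averaging; raw data never leaves a site."""
    return np.mean(parameter_list, axis=0)

# Three hypothetical sites with private (features, label) data of the same shape.
sites = [(rng.normal(size=(100, 16)), rng.integers(0, 2, size=100)) for _ in range(3)]
global_weights = np.zeros(16)

for sync_round in range(20):                        # each round = local step + merge
    local_weights = [local_sgd_step(global_weights, X, y) for X, y in sites]
    global_weights = merge(local_weights)           # only parameters are shared

print("trained weights shape:", global_weights.shape)
```

In a real deployment the merge step would be performed by whichever site is elected leader for that sync interval, and communication would go through the secured swarm network rather than a local list.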

Highlights:

– Understanding the mechanics of swarm learning and its role in privacy-preserving AI collaboration
– Exploring the case study of predicting colorectal cancer mutations using swarm learning
– Comparing the performance of swarm learning models to locally trained and centralized models
– Discussing the benefits and challenges of implementing swarm learning in digital pathology

Swarm Learning in Digital Pathology: A Case Study

In the featured Nature Medicine paper, Oliver Saldanha and his team demonstrate the effectiveness of swarm learning in predicting mutations in colorectal cancer from H&E-stained whole-slide images. The study involved multiple institutions collaborating to train a deep learning model without sharing patient data, and the results showed that the swarm learning model achieved performance comparable to a centralized model trained on pooled data.

Benefits and Challenges of Swarm Learning in Digital Pathology

Swarm learning offers numerous benefits for AI collaboration in digital pathology,
including:

  • Privacy-preserving collaboration
  • Increased access to diverse datasets
  • Reduced data sharing barriers
  • Potential for global scalability
  • More robust and generalizable AI models

However, challenges remain, such as ensuring data preprocessing compatibility across institutions and establishing standardized protocols for model validation and deployment.

Episode Timeline:

[00:00] – Introduction to swarm learning and its applications in digital pathology
[05:30] – How swarm learning enables decentralized AI collaboration while maintaining patient privacy
[12:45] – Case study: Predicting colorectal cancer mutations using swarm learning
[20:00] – Comparing swarm learning performance to local and centralized models

[25:30] – Benefits of swarm learning in digital pathology
[32:00] – Challenges and future directions for swarm learning in the field
[40:00] – The potential impact of swarm learning on diagnostic accuracy, efficiency, and patient care

Conclusion

Swarm learning represents a paradigm shift in AI collaboration for digital pathology, offering a privacy-preserving approach to develop powerful AI tools that can improve diagnostic accuracy, efficiency, and ultimately, patient care. As the technology matures, engaging pathologists, researchers, and regulatory bodies will be crucial to establish best practices and guidelines for swarm learning implementation in the field.

Don’t miss this engaging conversation on the cutting edge of AI in digital pathology! Subscribe to the Digital Pathology Podcast for more insights on the latest trends and innovations in the field.

Keywords: swarm learning, digital pathology, deep learning, AI in pathology, decentralized AI, collaborative AI, privacy-preserving AI




Transcript

Oliver: [00:00:00] They both were the founders of swarm learning as a technique. So they used this in just predicting a lot of diseases. But one of the drawbacks of the study that they did is they didn’t use it on any image-based analysis. So it was all text or tabular information for prediction of COVID-19, tuberculosis, or any other cases or diseases.

As I already told, my supervisor, in his first Nature paper in 2016, showed that deep learning could directly predict mutations. So directly from the whole slide images, you could predict mutations in the Nature paper. So the idea we had was to combine both these studies: to use swarm learning as a technique and show that deep learning can predict mutations.

Aleksandra: Welcome, my digital pathology trailblazers. Today my guest is Oliver Saldanha, and Oliver is the first author of this Nature [00:01:00] Medicine paper published in 2022. And he’s the author of many other papers. This particular paper, swarm learning for decentralized artificial intelligence in cancer histopathology, is a novel way of doing deep learning for histopathology. And we interacted on LinkedIn, and Oliver told me about this research. And I was so interested that I did a YouTube video about this paper, Oliver. So I’m kind of talking about this paper. But first of all, if you’re watching on YouTube, you’re going to see me in a very fancy jacket, and I was motivated to put this red fancy jacket on because Oliver was so fancy when he came to the podcast. So I had to take off my hoodie and match that very professional look of my guest. Welcome to the podcast, Oliver. How are you today?

Oliver: I am great. Thank you. Thank you so much. I love your energy, the way you interact, and the way you introduced me. It’s wonderful. [00:02:00] Thank you so much for this wonderful opportunity. So, telling a little bit about myself: I am Oliver Saldanha. I come from Mangalore. This is a coastal town in India. So I did my electrical engineering in Mangalore. And then I came for my master’s in smart systems here in Germany. As soon as I finished my master’s, I always wanted to do my PhD.

For my PhD, I enrolled at RWTH, and there I was introduced to histopathology. So my supervisor was Professor Dr. Jakob Nikolas Kather. So he introduced me to this wonderful field. So since then, in 2020, I have been working with histopathology, and then I started my journey with decentralized AI.

Aleksandra: So Professor Kather, I never met him personally, but since I entered the digital pathology space, he was already active there. And that was in 2016 when I entered. So he has been doing this for a long time. So yeah, Oliver, please brag about [00:03:00] your group. Brag about your papers because I’m not doing you justice. By just saying, you know, Oliver, he’s the author of this and that paper. If people haven’t read this paper, it’s a major paper.

So that is a pretty high impact factor. And your most important paper, as we were interacting, is the swarm learning for decentralized artificial intelligence in cancer histopathology. And before we dive into what swarm exactly means:

Can you tell me about the concept of decentralized AI and why it matters?

Oliver: Yes. I would like to give a small introduction to decentralized learning. So, before we go into decentralized learning, it is important to know what is local learning or centralized learning, right? Because then I can give a better, yeah.

Aleksandra: Exactly. All the, you know, different concepts that we need to, know before we can appreciate [00:04:00] how novel your work is actually in the histopathology space.

Oliver: Yes. So, as I already told you, local learning is something where you have one center with maybe a hundred patients. So they want to train a model locally. So they have a physical system in which they put all the data, and then they train a model. So this is called local training, or somehow training in a specific center.

So if you have three centers, then how would you do training? So beforehand or previously, this training has been done by putting all the data from three different centers into a central location. This is called centralized learning, where you have one single center that has all the infrastructure necessary, the compute power as well as the storage for the whole slide images. This center trains one single model, and this is called centralized learning. So somehow the reason and drawback in this is that you have to [00:05:00] transfer data from one center to another, so there’s a lot of duplicity in data as well, and there will be a lot of issues when you have to share patient data, right?

So that’s why there is something new in the field, which is also called distributed learning, where the data never leaves the center; every center has its data, but only the learning parameters are exchanged or shared between different centers to train a single model. So you’re also training a single model, but actually by not sharing the data.

So now comes the very interesting point, which is decentralized learning. So in the technique which I already explained, all the learning parameters go to a central coordinator, that is, always to one single center or one single point. But in decentralized learning, what happens is you don’t know which is the central coordinator.

That’s the whole beauty of decentralized learning: by actually sharing the [00:06:00] learning, training a model by not knowing who is the central coordinator. So this is basically what decentralized learning means: training one single model without actually sharing the data, only sharing the insights, but still not to a central coordinator.

So the central coordinator keeps rotating. So it can be one center on each run, it can be multiple centers on different runs, but this is what decentralized learning is.
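For readers who want to see this rotating-coordinator idea in code, here is a small, hypothetical sketch. The site names, the random election, and the plain parameter averaging below are illustrative stand-ins for what the actual blockchain-based platform does.

```python
# Illustrative sketch of decentralized coordination: at every sync interval a
# different site is picked as the temporary "leader" that merges parameters.
# In the real system this election happens via the blockchain layer; here it is
# simulated with a random choice among the sites that are currently online.
import random

def elect_leader(online_sites):
    """Any online site can become the merging node for this sync; no fixed server."""
    return random.choice(sorted(online_sites))

def average(param_list):
    """Element-wise mean of each site's parameter vector."""
    return [sum(vals) / len(vals) for vals in zip(*param_list)]

def sync(parameters_by_site, online_sites):
    leader = elect_leader(online_sites)
    merged = average([parameters_by_site[s] for s in online_sites])  # leader merges
    return leader, merged                                            # broadcast back

params = {"site_A": [0.2, 0.4], "site_B": [0.6, 0.0], "site_C": [0.1, 0.5]}
leader, merged = sync(params, online_sites={"site_A", "site_C"})     # site_B dropped out
print(leader, merged)   # training continues even though one site is offline
```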

Aleksandra: Okay. Let’s take a step back just for me to understand. So, centralized is: you put all the data together and you have one model that has different learning parameters, and those learning parameters would be weights, biases, and all the other things that are in this model, that make it act in a certain way on one batch of images. And the images were put together, so like they were put in one central library. Okay, that’s one thing. And this [00:07:00] is the thing that is still very prevalent; this, you know, is part of different initiatives, and we’re going to touch on those initiatives.

One of the pretty famous ones in Europe is BigPicture. Well, okay, this is the way people have been doing this. Then you say there is a different way where you have one central model and do you have one other model that goes between centers or does each center have their model and then they report to the central model?

Oliver: Yes. So the model architecture remains standard or constant between all the centers. Only what is passed forward and back are the weights, or the learning, of the model. So the model architecture is constant.

Aleksandra: Let’s say we have five centers or no, let’s, let’s make it easy. We have three. Okay. We have one model in one center, another model in another center, and another model in another center.

And the center is, let’s say, a clinic, a hospital, [00:08:00] whatever, right? And then the information from those models that was learned on those local data, the weights and biases, goes to the central model, and the central model... but you say this is not good enough. We don’t like that.

Oliver: Yes, exactly.

Aleksandra: So you don’t want to have a central model that knows everything. You want them to like an exchange. How does that work?

Oliver: Yes, that’s where the beauty of swarm learning comes in. So here, what happens is: in the previous technique we were speaking about, there’s always one leader, that is, one center which is always aggregating the weights and sending them back.

So weights are somehow numbers, which are the learning parameters from each center. So there is always one single center which aggregates them and sends them back after certain batches or after certain updates. So in swarm learning, what happens is this central coordinator is not fixed. So as you said in the example of three [00:09:00] centers.

So in the first run, maybe one of the three is elected as a leader using blockchain. And somehow, since it is… 

Aleksandra:  We have to talk about that, as well

Oliver: Yes, since it is using blockchain, it is completely decentralized and no one has the control or authority to elect this leader. So this is completely random and who elects the leader is somehow not hardcoded and it is not fixed.

That’s why it is using a technique where, in each different sync, the leader keeps changing; that’s why there’s not a central coordinator. Why? Because one of the centers might stop, one of the centers may not have a network, or one of the centers might drop out of training. So the training should not stop.

That is one of the very important reasons to not have one single coordinator; everyone who’s contributing to the model is a coordinator.

Aleksandra: So you remove the bottleneck of one instance controlling this. And okay, so let’s take it at scale [00:10:00]: instead of three, we have a hundred, and across the globe, there is no way that everybody’s going to be online at the same time.

And because we have so much data, we cannot lose time for training. So you want this to continue. And if it wasn’t complex enough, it includes blockchain. Let’s talk about that. So blockchain: my understanding of blockchain is, you know, a layman’s understanding of blockchain in financial transactions, where, similar to what you’re saying, there’s no one central bank; it’s the information embedded in this blockchain.

But, please, give me a better perspective from a blockchain user for histopathology, which I think is also a super novel concept. I don’t think it’s the only, it’s not the only, application in healthcare, but very novel in histopathology.

Oliver: Yes. It’s not the only application, [00:11:00] but it’s, it’s very new.

So I’ll give a small instance, like generally when you train a model, you have something called an API. API is like PyTorch, Keras, or like just the framework in which you train a model and then you have the data. So there is somehow a gap between the API and the data and in between this you have something called a control layer.

So where the blockchain comes into the picture is in this layer. So this layer somehow has to have information about the parameters of the epoch about all the necessary information that it needs to contain So somehow what blockchain does do here? It takes all this information and provides it to the API and then the data.

So somehow it is like an intermediate layer, which is like a smart contract. So somehow if you have 100 centers, you need to know which centers need to have the common parameters like the epochs, [00:12:00] the stopping criterion, the starting criterion, and so forth. So somehow all the centers need to know this common ground.

So this is where the blockchain comes into the picture. It is not having the weights and biases that are shared. It is just having smart contracts, that is, this basic minimum information necessary for training. So what blockchain does is transmit this information to all the partners, establish a secure connection, and elect a leader who does the training for the particular batch.

So in the next batch, there’s a new leader who is elected, and then the training continues till the stopping criterion.
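A hedged sketch of what such a "smart contract" might carry: only the minimal shared training metadata every site must agree on, never weights and never images. All field names below are illustrative assumptions, not the platform's actual schema.

```python
# Hypothetical sketch of the "smart contract" idea: the control layer only carries
# the minimal shared training metadata every site must agree on -- never the
# weights, never the images. Field names here are illustrative.
from dataclasses import dataclass, asdict
import hashlib, json

@dataclass(frozen=True)
class SwarmContract:
    model_architecture: str = "attention-MIL"   # must be identical at every site
    min_peers: int = 2                           # minimum sites needed to start a sync
    sync_interval_batches: int = 64              # merge parameters every N batches
    max_epochs: int = 50                         # stopping criterion
    metric_to_monitor: str = "val_auroc"

contract = SwarmContract()
payload = json.dumps(asdict(contract), sort_keys=True).encode()
print("contract hash all peers can verify:", hashlib.sha256(payload).hexdigest()[:16])
```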

Aleksandra: So, a question: if those leaders are randomly assigned, is it that at the end, all of these models will have the same information?

Oliver: Exactly. So what happens is this leader will aggregate all the weights [00:13:00] and then transmit it back.

So at the end of the training, what we expect is all the 100 centers or all the 200 centers to have the same weights and biases, that is, the model parameters. And then this model has to remain constant over the whole consortium or the whole training platform. So somehow when you validate this model, it should perform equally well on a new data set which you want to validate it on.

Aleksandra: And how is this arranged, given that, okay, you will not have the same type of data in all centers? Does it, like, take all the information, how the slides are made, the shade of hematoxylin and eosin, and all the other, you know, parameters that we worry about when we want to have a robust, generalizable model?

So it, like, takes these pieces of information from every partner and then adds it [00:14:00] in the leader, and then the leader brings it to wherever... I hope I’m not confusing my listeners even more. I’m coming with the attitude that by the end of this episode, they will understand as well, but well, yeah, I hope the question is okay.

Oliver: Yes, yes, the question is good. So, so I, I will explain it from a little bit more broader perspective.

Aleksandra: Yes

Oliver: Somehow, there are two general steps in training a model. So, when we want to predict something like a mutation, there are two general aspects. One is pre-processing the whole slide images.

The second part goes to the training. So somehow what swarm learning is involved in is only the training aspect of it. 

Aleksandra: Okay. 

Oliver: So it doesn’t do anything on the pre-processing side of the model. So.. 

Aleksandra: What would,

Oliver: somehow… 

Aleksandra: What would be the steps on the pre-processing side?

Oliver: Yes, on the pre-processing, as you know, histopathology slides are very big.

So we need to [00:15:00] do the tessellation and the normalization so that all the histology tiles and the slides look similar. And then maybe there could be some feature extraction with the new SSL methods, like transformer-based feature extractors. So this is all coming in the pre-processing aspect. So you have to go from the whole slide image to the features, everything locally. So this is never a part of swarm learning. Going from features to prediction is the swarm learning aspect. So giving the features to the model and getting a prediction out of it is where swarm learning comes into the picture, training the model and making it better. So somehow generalizability, different whole slide images, what the staining is, all this is handled during the pre-processing side.

Aleksandra: But locally. 

Oliver: Locally.
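As a rough illustration of the local preprocessing Oliver describes (tessellation, normalization, feature extraction, all of which stay at the site), here is a hedged sketch. The load_tiles and FeatureExtractor names are hypothetical placeholders for whatever tiling tool and pretrained encoder a site actually uses.

```python
# Hedged sketch of the local (never shared) preprocessing step: tile the whole-slide
# image, normalize, and extract a feature vector per tile. Only these features feed
# the swarm-trained prediction model; `load_tiles` and `FeatureExtractor` are
# placeholders, not real library calls.
import numpy as np

TILE_SIZE = 224  # pixels at the chosen magnification

def normalize_tile(tile: np.ndarray) -> np.ndarray:
    """Simple per-channel standardization; real pipelines often use stain normalization."""
    tile = tile.astype(np.float32) / 255.0
    return (tile - tile.mean(axis=(0, 1))) / (tile.std(axis=(0, 1)) + 1e-6)

def extract_features(tiles, encoder):
    """Run each normalized tile through a (pretrained, frozen) encoder -> feature matrix."""
    return np.stack([encoder(normalize_tile(t)) for t in tiles])

# Hypothetical usage at one site:
# tiles = load_tiles("slide_001.svs", tile_size=TILE_SIZE)          # tessellation
# features = extract_features(tiles, encoder=FeatureExtractor())    # stays local
# np.save("slide_001_features.npy", features)                       # input to swarm training
```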

Aleksandra: So we only have a certain spectrum that is being normalized, right? And then in the second center there is a different spectrum [00:16:00] that is being normalized. Let’s say the first center has kind of darker H&E and it’s normalized to that. Then the other one has a lighter, the other one has thicker slides. So we still have some kind of batch effects that can be present in the pre-processing, right?

Oliver: Yes, definitely. 

Aleksandra: Does it matter?

Oliver: So there are two ways, yeah, there are two ways to approach this. One is you can normalize it to a standard image, that is an image constant over all the centers.

And then everything is normalized according to this image. But there is another approach by just having this batch effect and giving it to a single model. Somehow it is good for the model to have this variability. Not having the same kind of images, but having, why? Because you never know what is going to be thrown into the model, right?

So for a prediction, it can be from any center that we are not aware of. So somehow giving a little bit of variability to the deep learning model is always good [00:17:00] for it to assess and to analyze different kinds of staining or different kinds of brighter shades or lower shades. So somehow this is very good for the deep learning model, to have a little bit of variation. But I wouldn’t say it would perform very, very well; somehow the model would be robust and not biased to one center, because if it was a local model, then it would be biased just to that kind of features.

But since it has three centers with three kinds of, yeah, color shades. So I think the model is somehow less biased to one particular center.
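The first option Oliver mentions, normalizing every site's tiles toward one shared reference image, could look roughly like the Reinhard-style sketch below. This is an illustrative recipe assuming scikit-image is available, not the preprocessing used in the paper.

```python
# A minimal sketch of normalizing tiles toward one shared reference image
# (Reinhard-style mean/std matching in LAB space). Illustrative only.
import numpy as np
from skimage.color import rgb2lab, lab2rgb
from skimage.util import img_as_float

def reinhard_normalize(tile_rgb: np.ndarray, reference_rgb: np.ndarray) -> np.ndarray:
    """Match the LAB mean/std of a tile to a reference image chosen by the consortium."""
    tile_lab = rgb2lab(img_as_float(tile_rgb))
    ref_lab = rgb2lab(img_as_float(reference_rgb))
    # Shift and scale each LAB channel of the tile toward the reference statistics.
    norm_lab = (tile_lab - tile_lab.mean(axis=(0, 1))) / (tile_lab.std(axis=(0, 1)) + 1e-6)
    norm_lab = norm_lab * ref_lab.std(axis=(0, 1)) + ref_lab.mean(axis=(0, 1))
    return np.clip(lab2rgb(norm_lab), 0.0, 1.0)
```

The second option he describes, deliberately keeping the batch effects, needs no extra code: the raw per-site variability is simply passed to the model as-is.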

Aleksandra: So this is kind of in line with when we have an aggregated way of training, to just have different appearances, different types of slides, right?

Oliver: Yes. 

Aleksandra: Okay. So here, non-inferiority. But taking one step back, what is swarm learning? Like in the general picture, who invented it? It’s not the first application [00:18:00] in healthcare. Where was it used first, and tell us a little bit about this, method as such, and then like, why did you, how, how did you come up with the idea to use it for images?

Oliver: So this is, this is a very interesting, very interesting question, and then I have a nice story for this. So…

Aleksandra: Oh yes. 

Oliver: So it was not invented, Swarm Learning was not invented by our group. It, it was by Warnat-Herresthal. And then the team from HP. So this was a small group in Bonn. 

Aleksandra: From Hewlett Packard, the company Hewlett Packard.

Oliver: Yes. The company Hewlett Packard and, and a small research group in Bonn. So they both were the founders of swarm learning as a technique. So they use this in just predicting a lot of diseases, so somehow COVID-19, tuberculosis, leukemia, but one of the drawbacks of the [00:19:00] study that they did is they didn’t use it on any image-based analysis.

So it was all text or tabular information for prediction of COVID-19, tuberculosis, or any other cases or diseases. But somehow one drawback of this whole study was they didn’t use any particular images or histopathology. As I already told, my supervisor, in his first Nature paper in 2016, showed that deep learning could directly predict mutations.

So directly from the whole slide images, you could predict mutations in the Nature paper. So the idea we had was to combine both of these studies: to use swarm learning as a technique and show that deep learning can predict mutations. And somehow what we wanted to show in our hypothesis is that, by having three different cohorts on three physical systems, the performance of [00:20:00] this on an external validation cohort should not drop compared to when you have a centralized model.

The swarm learning model should somehow perform on par with this centralized model. So this was the hypothesis and this is what we tried to achieve in this whole, whole study that we had.

Aleksandra: Okay. So the main, the main advantage that they see here, okay, performance-wise, it’s supposed to be the same as centralized, right?

But because we’re dealing with healthcare data, and there’s a lot of regulations and it’s very sensitive, prob… So in the healthcare setup, we don’t want to give away private information, of course. It’s sensitive, it’s health-related, but there are also regulations. So, in the U.S. it’s going to be the HIPAA regulation; in Europe, it’s going to be [00:21:00] GDPR, the General Data Protection Regulation. And how does swarm learning align with that? Why is it better than centralized? What are the, like, advantages regarding treating patient information when we use swarm learning versus centralized learning?

Oliver: Yes. So, I will name two specific quotations or two specific parts in the GDPR articles, that is:

Article 5(1)(c), which states the principle of data minimization. So somehow when you use the data, the availability of data that is exposed has to be as minimal as possible. So somehow swarm learning kind of complies with this GDPR principle, in that we don’t expose the patient data and the complete data is not exposed to different centers.

So we don’t move data, [00:22:00] we don’t expose any patient IDs or patient information in the data. So somehow this particular principle holds good. And there’s another very important part of the article, which is 5(1)(b), which states the principle of purpose limitation. So somehow this is also very important, to facilitate that we comply with this purpose limitation, to somehow use the data only where it is useful.

So somehow it is very important to use this data by just not exposing the whole image, but just by extracting information from it. So you cannot go back directly to the same image that is used, but only to the features or only the weights and biases. So somehow we tried to comply with this principle also, but that said, we don’t have a white paper or a proof document which says that it does.

So there is also a study by Stephanie Rosello, which [00:23:00] also shows some of the benefits of using Federated learning, which is also a kind of distributed learning approach. So in this study, these two articles are mentioned, which say that the principle of data minimization and the purpose of limitation is fulfilled by federated learning.

But there needs to be more research which shows concrete proof that this abides by all the GDPR requirements for data sharing.

Aleksandra: So that’s like a tricky thing to do, because in general, like deep learning and especially the weakly supervised approach where you have the mutation status and the image, the hype or the hope also was: oh, we don’t know which data is useful.

Let’s use all the pixels and everything. And here GDPR is telling me to only use the useful stuff. Don’t use everything that you don’t need, [00:24:00] but we don’t know what we need. So I guess this is one way of dealing with this

Oliver: But I would like to make a small correction. GDPR says to only use what you need of the data which reveals the identity of the patient.

Aleksandra: Okay. That’s

Oliver: So, it should not be somehow revealing, say, date of birth, for example, or name of the patient. So this reveals a lot of information on patient identity. But this is the information that should not be exposed to attacks, or exposed as information that anyone can get access to.

Aleksandra: Okay. So tell me a little bit.

So we already mentioned that since 2016, the group has been working on predicting mutations and mutational status from the image. And your paper also talks about that. Let’s talk a little bit about what your setup was and how [00:25:00] many centers did you have? How did you show that swarm learning is on par with centralized learning for what you were doing?

Oliver: Yes. So, interestingly, what we wanted to do was a prototype study, but somehow when we were using medical data, it was very important to use real cohorts. So we decided to use cohorts from different countries, which we had for analysis, but somehow we didn’t do a real-time swarm learning study.

So the cohorts from the US were not based in the US. They were in Germany in the lab, but on different physical systems. So this was somehow like a prototype study, but we put the data on physically separated systems. So there was no virtual machine. So in the previous study, the swarm learning approach study, where the group from Bonn showed that swarm learning works for COVID-19 prediction,

[00:26:00] It was all, it was, simulated on a single system. They had different virtual environments and then they tried to predict some analysis on different, different kinds of, diseases. But what we wanted to do is show that this works on physically separate systems. So we put the data on three different systems for three different cohorts and try to analyze or predict mutations on this.

So what we got out of that is three separate models, which are the local models. Then we put all three data sets together on a separate system and then predicted the mutation. So this is the centralized model.

So after all this analysis, we obtained a swarm learning model and then compared this swarm learning model to the local models as well as the centralized model.

Since we had all the data, we could compare it with the centralized model. But for the present studies that we are doing in a lot of other consortiums, we don’t have access to all the data. That’s why [00:27:00] we cannot compare it to the centralized model; we can only compare it to the local models and say that it improves the performance compared to the local model.
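The comparison Oliver describes boils down to scoring every model (three local, one centralized, one swarm) on the same external validation cohort. Here is a hypothetical sketch, where the model objects and the predict_proba interface are assumptions rather than the study's actual code:

```python
# Hypothetical evaluation sketch: every model (local, centralized, swarm) is scored
# on the same held-out external validation cohort, e.g. with AUROC.
# `models` and `predict_proba` are placeholders for whatever a study actually uses.
from sklearn.metrics import roc_auc_score

def compare_on_external_cohort(models: dict, X_external, y_external):
    scores = {}
    for name, model in models.items():
        probs = model.predict_proba(X_external)[:, 1]   # probability of mutation
        scores[name] = roc_auc_score(y_external, probs)
    return scores

# Pattern reported in the paper: the swarm model performs on par with the
# centralized model and better than the single-site local models.
# compare_on_external_cohort({"local_A": m1, "local_B": m2, "local_C": m3,
#                             "centralized": m4, "swarm": m5}, X_ext, y_ext)
```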

Aleksandra: So you basically did a proof of concept, okay, our method is not worse than what everybody else is doing and now let’s do it our way.

Oliver: Yes, exactly.

Aleksandra: Okay.

You said that the data for swarm learning were in different physical locations. I remember from the paper that you used the term metalcore servers.

Oliver: Yeah. Bare, bare metal. Bare metal servers. 

Aleksandra: Bare metal servers.

Oliver: Yes. This is a good, good point that you got. But somehow, what we meant by bare metal servers is. These are physical commercial hardware. So like, like a desktop, is physical commercial hardware, which is, which is not connected to a particular general server.

So it is separated by [00:28:00] not connecting it to any cohort or group of systems; it is independent in itself. It is connected to the internet via LAN and it’s a physical system. So this is what we used for our studies: a physical system which is just connected to the World Wide Web.

Aleksandra: And this is something I learned that you don’t have to buy them, right?

This is a service like cloud computing. You can have it on virtual machines or in the cloud, or you can actually ask a service provider to have a dedicated bare metal server for you in case people would want to do this, you know, Like as a service or at scale and use providers for that.

Oliver: Yes, definitely.

So we have two approaches. Either you can have a cloud-based service, like a service with AWS or anything of that sort, or you could have a physical system, which is like a desktop in your center, which could be [00:29:00] integrated with swarm learning and then somehow run the analysis. So these are the two different approaches that we can follow,

Aleksandra: This also gives the researchers and healthcare professionals options to be compliant.

And because in different countries, the attitudes or the regulations about cloud computing are different. I know that in Europe it’s required that, okay, if there is an audit, there needs to be a physical location specified where the data is processed. And I assume the bare metal server is something that can be located physically, and that’s where the computation is happening.

So let’s go back to the concept of creating centralized slide repositories of whole slide images. And there are several different initiatives going on, multi-year initiatives. So the one that I’m very familiar with in Europe is BigPicture. This is a public-private consortium in Europe, [00:30:00] and there are also initiatives going on in the U.S. Like, for example, the Digital Diagnostics Foundation is working on aggregating images. So does it make sense to keep doing it, knowing that there are disadvantages regarding being compliant and knowing that swarm performance is on par with centralized models? Tell me what you think about that.

Oliver: Yes, definitely. I think it completely makes sense because these consortiums or these projects have come up. 

Aleksandra: So they should not stop? 

Oliver: No, definitely. 

Aleksandra: You’re okay with them? 

Oliver: Yeah, yeah, definitely. 

Aleksandra: They can keep doing it? Okay

Oliver: Yes, I, I think it is. It is pushing the field forward in a lot of ways. So there are different, applications of different approaches.

So what we showed is, is a scenario where you actually cannot share data. So in, in BigPicture, a lot of consortiums or centers have agreed to share data and they could [00:31:00] somehow put all this data in a single center. But when it comes to a lot of different consortiums, because of the data regulations, they cannot share data.

In this aspect, I think swarm learning, decentralized learning, or distributed learning is the best way to approach it. But if you have a consortium where all the partners agree to share data, and the infrastructure in one single center, say a big center which is coordinating the whole consortium, has the facility to store all the data from all the centers as well as the capacity to process and train a model, everything on a single site,

then I think big consortiums like this really add a lot of value. But if there is a center that does not have the capacity to process all the data which is available, as well as to train a single model, then definitely swarm learning plays a very important role. So as you mentioned [00:32:00] about BigPicture, I was reading a little bit about this, and they also mentioned that because of the scale and the data sets involved, the GDPR regulations are sensitive about the data.

The consortium is leveraging Owkin, Owkin is one of the, very, very important, central coordinators, who is handling the technical setup. They are leveraging federated learning techniques to… 

Aleksandra: Yes.. 

Oliver: To enable the collaborative development of AI models. So one of the main reasons they are doing this is handling such a big amount of data.

Training a single model on one single site becomes tedious when the number of sites increases, when the size of data increases, and when the compute power that is necessary increases. Why? Because in distributed learning, if you have five centers with a small amount of computing power, that is enough. But if you have all the data in a single place, [00:33:00] then you’ll need five times more powerful GPUs or more powerful hardware to process and analyze this data.

Aleksandra: So question, for example, let’s say there is an initiative going on with a lot of centralized data. And there is one partner that, would like to benefit from that, would like to take part, but, because of legal issues, they cannot share data.

Would that be a use case to use them as an additional partner in a swarm learning framework?

Oliver: Exactly. That would be the ideal case where we could use them as one of the partners and train a model which is shared with, all the centralized models as well as them. So they will be a contributing partner for the swarm learning.

And the other node will be this center with all the centralized, particularly whole slide images. So if they have some data with a specific mutation, then you can train a single model with their [00:34:00] data and the centralized data and then get a model which is specifically trained for that particular mutation.

Aleksandra: Okay. So if we have, and this is a known problem, we don’t have the necessary diversity of data because some regional data or some specific ethnic populations have, there is not enough representation in those big data repositories. So basically you could have a model or have, you know, all the solutions that are training and check, okay, is it working on that specific patient population, and if not, then let’s, can you, do you then need to consolidate to have one model that does everything, or can you somehow have submodels, how do you deal with that?

Oliver: Yes, definitely. If there are different kinds of mutations to train a swarm model, you would need each center to at least have [00:35:00] specific information about that mutation. So if one center has no information for BRAF or MSI, then you cannot train a model from that data. So you can use that data maybe to pre-train or, create a feature extractor.

But somehow you cannot use the data to train a model that can predict MSI or BRAF, because this is very important even for a swarm model to have that specific mutation information in that cohort. If that information is never present, then you cannot use that data just for training.

Aleksandra: So we basically need the same data pairs, like image plus label, whatever that is. Okay. 

Oliver: Exactly.

Aleksandra: So, going back to the specifics of your study on, the mutation prediction from H&E. So, what entities were you working with and what were you predicting, in that particular study? And did you expand the [00:36:00] spectrum of what you’re doing or?

Oliver: Yes. We were working on predicting microsatellite instability in colorectal cancer, as well as BRAF mutation in colorectal cancer.

So these were the two important mutations that we wanted to predict, which were available in the data or in the cohorts that we had. And this was all with colorectal cancer. Later we put out a follow-up study: after this paper was published, we wanted to do the same on gastric cancer.

We wanted to predict MSI on gastric cancer and also EBV on gastric cancer. So this was also a very highly cited paper, which also informed us about which other use cases we could continue to go forward with in the study. So these are some of the [00:37:00] interesting use cases that we used in histopathology.

Aleksandra: So, another super important aspect in all these deep learning efforts is explainability. And I remember from the paper that you said, hey, the model flagged patches, and that it maybe has better explainability than other models. And the reasoning for that would be that, okay, the model took the information about mutation prediction from certain patches, and then pathologists looked at those patches and said, yes, this is compatible with the histopathological features associated with the mutation. What if you work with tumors that don’t have this kind of explainable features? Because not all of them have them, right?

Oliver: Yes, definitely. And I think that is a general question or challenge in the field, which even [00:38:00] swarm learning cannot address, I believe, because somehow it is very important that these explainable features are present in the image so that somehow the pathologist could look at these specific patches and say it is relevant to that mutation.

So if there’s nothing relevant to the mutation in the image, then it is very important to see how we predict these, because there’s no information that is provided by the image to the viewer or even to the pathologist about this particular mutation.

Aleksandra: But there are, like, you do not always visually know that there is some feature.

I mean, colorectal cancer is a specific case where pathologists are describing different features and, you know, there are also other publications of, okay, let’s quantify what we see and what we describe anyway and see if it translates into, any predictive properties of this image without even going into the mutational status, right?

[00:39:00] But what if we don’t see that? So what I’m getting at is, like, how could we check this? Can we maybe take this particular, I don’t know, microdissection of a histopathological slide, of this patch that was marked as something, and then do a molecular test? Or... any ideas on how we could approach that?

Because that’s probably like… there must be something to confirm. And I recently read a paper about using commercially available software for marking EGFR mutations in lung cancer. And what they did, they basically took this piece of tissue and did a rapid PCR confirming that mutation.

So, did you guys [00:40:00] explore something like that, or do you have thoughts on that?

Oliver: Yes, definitely. If we also have a possibility to get PCR or any dissection on this specific tissue, then definitely it would be a very good proof of all the efforts and all the predictions that we made. And I think the direction the field has to go forward in is to have clear proof of all the analysis that we do.

But unfortunately, we don’t have access to this kind of data. So somehow we don’t have access to the slides on which we could somehow do maybe some other tests. So somehow this exact confirmation is missing. That’s why what we want to do is show heat maps or patches which could visually say how it impacts this particular mutation.

And then, maybe a pathologist could go through these specific tiles or patches and then give us information on that, but we don’t have access to this kind of data. That’s why [00:41:00] I think, yeah, we don’t, we cannot, I cannot, I mean…  

Aleksandra: If you only work, on images, then the tissue is not in your data set.

Okay. This is really, really insightful. So logistically is the centralized approach or the swarm approach easier? 

Oliver: Logically? 

Aleksandra: Logistically. Like if you wanted to set up a partnership. From how many partners it makes sense to use Swarm? Let’s say they don’t want to share data, we assume they don’t.

Is it more difficult than trying to figure out how to share data and centralize it?

Oliver: No. Definitely, I feel Swarm learning is much easier in this aspect because just getting a data-sharing agreement, takes months of effort. To set up swarm learning and to run an analysis, it just takes one or two [00:42:00] weeks, depending on the firewall and depending on the infrastructure that is specifically present in that center.

Swarm learning is a much easier and faster approach to training a model by actually not sharing data.

Aleksandra: So what do you need to agree on then? Just say, Hey, we would like to do a project together and our data is ours, and we don’t tell you anything about it other than, the labels for this model actually working on our data or like, how do you initiate a partnership?

Oliver: Yes, I think initiation of a partnership always has benefits for both sides. So somehow there is a center, for example, center X and center Y. Both of them have breast cancer histology slides and they want to predict a very specific mutation for which they have some labels, but both of them don’t have a big enough cohort to have a better performance or a low-bias [00:43:00] model.

So what both agree on is this is a use case. This is the model that we want to train. We use both the centers at the end, and both the centers get the model, which you can use to validate a new data set. And then somehow say this model is trained on our data as well as some other data for one specific application that you want to treat.

Aleksandra: And potentially you could actually, like if you had a method to rapidly confirm when the tissue comes in when it’s in a clinical setting, you could then do it and basically shorten the time to treatment, even if you, if you don’t have the option, don’t want to aggregate it. This is amazing.

So what is the potential for commercializing, models trained in this way?

So commercializing models themselves [00:44:00] and also commercializing the method, or can you even commercialize the method like that? What are your thoughts on this?

Oliver: Yeah. Presently we are using the software from Hewlett Packard, which is still in a community version. So they have announced that they’ll keep it, as a community software for some time now.

So, this software or this platform is available for swarm learning. The intention is basically to commercialize the whole process of swarm learning. So somehow, yeah, if you want to use this application, then you may need to pay for a license to use the software, but presently it’s still a community version.

Aleksandra: So, so let’s train a lot of models now before… Well, Hewlett Packard, thank you for making this available in the first place. And then let’s train a lot. Wow!

Oliver: Yes. That’s what we are trying to do, just not sticking to histology. We have also tried a lot of [00:45:00] projects on radiology, video analysis, single cell, and so this can be integrated with any kind of data. We work mostly on oncology, that is, cancer research.

So this can be established for any setup. And speaking about the point where you talked about commercializing the model, definitely, this is something that has to be compliant with a lot of regulation. So, if you want to use a model on a commercial basis, like on patient data or to predict specific patient information, then there is a whole new regulatory compliance that you have to…

When you train a model, everything has to be standardized. So it will still take a lot of effort to get a model standardized and commercialized, but this is the direction that we are going towards. So potentially we want to have a model that could be commercialized, and somehow any center would be able to access this by just putting [00:46:00] their data on maybe a platform, and then they get a prediction from our model, which is trained on all the data across the globe.

Aleksandra: Okay. So it doesn’t look too different from any other model commercialization, right? Because the platform, the Swarm platform or Swarm software is one thing, it’s like a framework for training, but once it’s trained, it’s a tool that you can commercialize and in order to commercialize it, it has to be compliant like any other deep learning model for assisted diagnosis or for diagnostic purposes, right?

Oliver: Exactly. Exactly.

Aleksandra: So now there is something known as the FAIR Guiding Principles for Scientific Data Management. This also comes from a Nature paper, in Scientific Data from 2016. And this FAIR acronym [00:47:00] stands for Findability, Accessibility, Interoperability, and Reusability.

And this is how the data should be for, you know, science, healthcare, so that people can reuse it and don’t have to do the same science twice. And I can imagine that, okay, in a centralized setting, you have it all there. Everything is in one place. You can find it, you can access it.

And by the way, accessibility doesn’t mean that it has to be open. You know, there are different gates, and it’s gated data, but you can access it there. It’s more or less interoperable and reusable. How can you ensure that it’s FAIR with swarm learning?

Oliver: Yes. I think I think it is very difficult to be, yeah, compliant with this guideline of FAIR because somehow it can be [00:48:00] findable.

So somehow, we mention all the centers, like how the data is split between different centers, so findable, definitely, it will be. It is possible to let the user or let the guidelines see from where this data was accessed and trained. But accessible? We ourselves don’t have accessibility to this data.

So, accessibility: even if different centers, say center X and Y, are using the data, even center Y will not have any access to the data from center X. So this is the main selling point of swarm learning. So this data won’t be accessible, and it will be reusable if both centers agree to train a model again.

So that should not be a problem for reusability, but both the centers should agree to retrain a model and then use the same prediction on an [00:49:00] external validation cohort. So interoperability, definitely: all the heat maps, all the explainable methods, so the interpretation of our results, are very, very fair. So that is possible, but making this accessible is something that is a little bit tricky.

Aleksandra: I guess it’s also like this definition of accessibility. If you can find it, then if people let you access it, you can access it, right? The thing is, okay, they might not let you access it. So I think it’s, It’s not specific to swarm learning. It’s just enhanced by swarm learning because you don’t aggregate.

But in general, like, do you want the patient data, even in the limited way it’s used for training of the models, to be accessible? And who is supposed to access it? [00:50:00] Probably not every researcher that wants to do research on that. But it’s findable and reusable, if you can.

Oliver: Yes. If you’re working on data which is open source, and somehow a lot of the consortiums have an initiative to publish or open source some section of the data, then I think all four can be very easily complied with. But somehow some consortiums cannot openly access or open source all the data.

Then I think accessibility becomes tricky.

Aleksandra: Thank you so much for explaining this. And, it’s another way of doing things and I hope it’s going to get, integrated into the ongoing initiatives.

Are you guys going global with this? What are your plans? What would you like, if you could have people do anything you want now after this episode, what… [00:51:00] would you want to ask them to do? Those who want to use swarm learning, or in general, what’s your mission with this?

Oliver: Yes, I think the mission with this, when we started using swarm learning or using it for histopathology, is to have each center, each histology center all over the world, just scan their slides and contribute to a model which is trained globally; all the centers across the world can train one single model for a particular mutation prediction, or maybe a generalized model for multi-mutation prediction.

So that’s the final aim that we are working towards, but at the moment we are taking smaller steps, because this is the whole overview. As smaller steps, I would request all the viewers to read the papers and go through the research that we are doing; we provide a large variety of data modalities that we have already integrated with swarm learning.

So if some center has some data [00:52:00] or has a project in mind, which is similar to what we are doing in all the data availability, then they can always contact us. And then we can start up a project which can really be something very novel or something very important to the field. So the contribution of each center here plays a very important role.

And how do the other centers that provide data benefit? They get the same model that we get. So there’s nothing that we take away. Everyone gets the same model, which is trained on their data as well as on the data of the other centers that provide data. So this is what I request: get in contact with us and then…

Aleksandra: And scan your slides. Please scan your slides.

Oliver: Yes, definitely. 

Aleksandra: Because if we don’t scan, we cannot do it. 

Oliver: Yes, definitely. Digitization is the way to go forward. Yeah.

 

Aleksandra: Thank you so much for joining me. I hope I didn’t confuse my digital pathology trailblazers, but I’m going to link to the paper and link to my video about the paper, where I kind of

break it down [00:53:00] at a slower pace. Thank you so much, and I hope you have a fantastic day.

Oliver: Thank you so much for inviting me. It was a wonderful introduction to the study that we have done and I would love to see some inquiries and yeah, get back to you and thank you so much for this wonderful opportunity.

Aleksandra: Okay, and you made me dress up. So whoever has not seen it on YouTube, go and check out how we’re dressed on YouTube.

Thank you so much for staying till the end. This was a pretty complex subject, so I appreciate you even more; you are a true digital pathology trailblazer. To understand exactly what we were talking about, the paper is going to be linked in the show notes. But also, please watch the video where I break down Oliver’s paper and see for yourself what amazing research they’re doing.

And I talk to you in the next episode.