
    Scaling up your digital pathology operations with Mark Zarella, Mayo Clinic

    This episode is brought to you by Hamamatsu. Thank you Hamamatsu 🙂

    So…you are already doing digital pathology in your institution but would like to scale it, take it to the next level? How do you do it, where do you start?

    In this episode my guest, Mark Zarella, PhD (previously Johns Hopkins University, currently Mayo Clinic) explains how he did exactly that at Johns Hopkins University.

    He talks about:

    • What is important when evaluating whole slide scanners and how to choose the best whole slide scanner for you
    • How he organized and managed the whole slide images at Johns Hopkins University
    • How he scaled the operations from ca. 10K slides to ca. 750K slides a year
    • How he ensured interoperability of systems
    • How he approached automated slide quality control

    AND MUCH MUCH MORE!

    If you are serious about taking your digital pathology operations to the next level, THIS IS THE EPISODE TO LISTEN TO!

    Watch on YouTube

    This episode’s resources


    Transcript

    Aleksandra:

    Welcome, Mark. How are you today?

    Mark:

    I’m well. How are you?

    Aleksandra:

    Good, thank you. Thank you for joining me. So, everyone who is starting their journey in digital pathology has surely encountered the Digital Pathology Association and its various White Papers. And Mark, who is my guest today, is the first author of many of them, and probably a non-first author of even more. Mark, let’s start with you. Tell me about your background and, most importantly, how you started this journey of digital pathology: where it started, and where it is now.

    Mark:

    Yes, for sure. Thanks for having me. It’s a pleasure. Where did I start? My training is actually in visual neuroscience. I have no formal training in pathology. About 10 years ago I jumped, I guess, from visual neuroscience to digital pathology. There was a lot of allure in this burgeoning field of creating these vast images, of course, that we know [inaudible 00:01:10].

    Aleksandra:

    I don’t know if it’s a coincidence, but I know several neuroscientists who then switched. I don’t know, maybe there is a lot of image analysis that then takes you to digital pathology by some flow of…

    Mark:

    Yeah, it’s not a coincidence at all. I think fundamentally, we as neuroscientists are interested in perception. We’re interested in vision, at least that was my area of neuroscience, and so there are a lot of parallels that you can imagine, and so it’s certainly not too much of a jump. I refer to it as a jump, but it really isn’t. It’s applying the field of neuroscience to medical image perception, but it also includes a lot of other things. And again, imaging now is so important in pathology, and that’s something that we as neuroscientists have also had an interest in. I had been doing a lot of optical imaging in the brain, and so now doing optical imaging of tissues is really not that farfetched. The importance of computation in neuroscience is really ubiquitous, and now we’re seeing computational pathology, so there are a lot of parallels and a lot of lessons to be learned from both fields that can be applied to the other.

    Aleksandra:

    Yeah, sure. So tell me, where are you now with your digital pathology situation?

    Mark:

    Yeah. I’ve been at Johns Hopkins now for two and a half years and we are certainly growing in the digital pathology area. When I joined, there was quite a bit of digital pathology work going on. A lot of folks had been engaged in, let’s say, AI studies or data acquisition studies or just using digital pathology to support their more basic science research, for example. And I think where we are now is that we’re starting to apply some of the methods in digital pathology at scale. We’re starting to really think about the clinical impact of digital pathology and devising ways where we can, at the very least, dip our toe in the water, but hopefully do more than that. It’s been a little slow going as it often is. There’s so many variables that you need to account for. But I think from a technological standpoint, we’re certainly doing better each passing year and definitely moving in a direction that we want to go.

    Aleksandra:

    Let’s do a little bit of compare and contrast. You joined two and a half years ago. Where were you then and where are you now? What was your role and how did you grow, in which direction?

    Mark:

    Yeah. Really, the growth has come from scale and from centralization. So when I joined, it was a lot of, I hate to overuse the word siloed, but it was very siloed in the sense that there was a lot of digital pathology going on, but no real centralized effort. There had been some work done towards centralizing, but I like to think that I’ve pushed that forward quite a bit, especially on the slide scanning front, where we’ve seen technology advance pretty rapidly, I think, in terms of getting things to go faster, higher volume, more automation, better integration, things like that. And I think a couple years ago at Hopkins, we were still very manual in how we were doing things, very deliberate, very, “I’m going to pull these slides here. I’m going to scan them, probably manually, probably circling tissue areas and checking everything for artifacts,” and this is a very labor intensive process. One of the things that I brought in first was the ability to do all of this quickly, in an automated fashion, with the idea that we really need to scale up to do anything truly meaningful in the clinical space with digital pathology.

    Aleksandra:

    So the scaling up went from how many slides before you joined to how many slides right now, more or less, ballpark?

    Mark:

    Yeah, I don’t have the numbers in front of me, but I would say, again, things were going on in a very disparate manner, so I never truly had access to all of those numbers. But it was probably on the order of a few thousand slides a year, maybe 10,000 or 15,000, something like that, to closer to half a million or three quarters of a million this current year, so quite a bit of scaling up.

    Aleksandra:

    So 50 times more?

    Mark:

    Yeah, something like that. I think that’s close. Yeah.

    Aleksandra:

    I just finished reading your paper about high throughput scanning. Obviously, the scanners play a crucial role in scaling. How did you approach that? Did you have to buy scanners? How many did you have at the beginning? How many do you have now? And what was the scanner selection journey? In this paper, and I’m going to link to that paper, there is a comparison of four scanners; I assume these were the four that you were using, and there is a clear winner. We can talk about it.

    Mark:

    Yes, that is true. And in fact, one of the scanners that we use perhaps the most here wasn’t one of the four in the paper, so it certainly isn’t all of them. But as far as scanner selection goes, it really comes down to three factors. The three main considerations here are the speed of the scanner, so how quickly the scanner can scan a single slide; the capacity of the scanner, because it becomes a little unrealistic to scan massive volumes if your scanner can only do a few slides at a time, which requires continuous loading and can certainly be a burden; and the third part is automation. We were certainly looking for scanners that enabled us to do a lot of things in an automated fashion, which means you don’t necessarily have to have scanning techs do everything. The ideal here, of course, is to load a slide, hit go, and walk away, which may be slightly overstating it, but not really; a lot of the modern scanners can come very close to that sort of workflow, at least for the majority of slides. So when we were looking for scanners, there wasn’t necessarily a one-size-fits-all option, but we found a few scanners out there that do much of what we needed to get done, fit our particular workflows, fit our footprint, and a lot of the other requirements we had.
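
    To make those three factors concrete, here is a back-of-envelope throughput sketch. All the numbers are purely illustrative; they are not figures from the episode or from Hopkins:

    ```python
    # Back-of-envelope scanner throughput model. All numbers are illustrative.

    minutes_per_slide = 1.5   # per-slide scan time (speed)
    rack_capacity = 300       # slides loaded per unattended run (capacity)
    hours_per_day = 20        # achievable only with automation and walk-away runs
    working_days = 250

    slides_per_day = hours_per_day * 60 / minutes_per_slide
    slides_per_year = slides_per_day * working_days
    unattended_hours = rack_capacity * minutes_per_slide / 60

    print(f"~{slides_per_day:.0f} slides/day, ~{slides_per_year:,.0f} slides/year per scanner")
    print(f"a full rack runs ~{unattended_hours:.1f} h before a reload is needed")
    ```

    At numbers like these, a high six-figure annual volume implies a small fleet of scanners rather than a single instrument, which is why capacity and automation matter as much as raw speed.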

    Aleksandra:

    Which ones were the ones that you evaluated in the paper, and which ones were the ones that you were actually using?

    Mark:

    So the paper had, as you alluded to, an example of one that I would consider high throughput, and then three others that were probably not in that category for one reason or another. The Hamamatsu S360 was one of the scanners in that paper, and I think it meets all the criteria for how I defined high throughput in the paper.

    Aleksandra:

    That was the winner for me when I looked through the paper, in terms of high throughput. Like you say, one size doesn’t fit all, and if high throughput is not your need, that’s not going to be your scanner. But for what you were evaluating in the paper, that was the one.

    Mark:

    Yeah, exactly. It met all of the requirements for that. There were other great scanners on that list, frankly. There was a Ventana scanner that did very well and was very fast, but it didn’t quite have the capacity that you would probably want in a high volume operation. There was a ZEISS, which is a very powerful scanner, very configurable, does a lot of things well, but a little bit on the slow side. There are good things and bad things about all of them, but from a strictly high throughput perspective, where you want to get through as many slides as possible at high quality, the S360, of the four that we tested, was certainly the one that checked all those boxes.

    Aleksandra:

    And was that the one that you were later using as well, or was there a separate scanner set that you were evaluating?

    Mark:

    We’ve tried a lot of scanners out there. We do use the Hamamatsu pretty heavily. We also have other scanners in the department that I would also consider high throughput. We have some 3D Histech P1000s, which are very good scanners; they weren’t part of this paper. The Aperio GT450, of course, is very popular, especially for high throughput projects; a lot of organizations are interested in that scanner. And there are others as well that I’m not listing, for sure, that are very high capacity, fast, or automated, but those were the ones that we decided really suited our needs best at Hopkins.

    Aleksandra:

    I want to ask a follow-up question to this. You say you have many scanners. How does this work with the interoperability of the systems of those scanners? Because each of the scanners comes with its own software, and some even come with a separate computer; I believe Hamamatsu comes with a computer. How did you manage to integrate all this in the lab, and what was the greatest hurdle? What was the greatest hurdle in scaling?

    Mark:

    Yeah. That’s a really great question, I think, and really central to a department considering all the different uses for slide scanning. They may decide this scanner is better than that one for a particular reason, and then they may still want them all to at least be talking to the same systems, whether it’s their LIS or whatever the case may be. Again, we adopted a very centralized approach. Dealing with things like different file formats, for example, can be tricky, but we basically push all of our images to a single location. We ingest these images into a single image management system, and then the idea is for that image management system to be file type agnostic enough-

    Aleksandra:

    Which is, what image management system are you using?

    Mark:

    We’re using Corista’s image management system, for clinical work.

    Aleksandra:

    And this is compatible with all the scanners that you guys are using?

    Mark:

    Yes. All of the scanners that we’re using for clinical work are able to generate images that can be ingested into that. And so the end user doesn’t necessarily know which scanner the images came off of. They can guess, certainly, but that’s less important to them. They just want to make sure that the slide is represented faithfully, is easy to navigate, and loads pretty quickly. All of our scanners are able to do that, so I think it’s been a success.

    Aleksandra:

    What was the worst thing to deal with? What was the hurdle?

    Mark:

    The biggest thing is that the image management system needs to be able to support the file format. If you have a scanner that it just doesn’t support, then that’s a tricky hurdle right there. You would need the image management system to either accommodate that, or you need to do some sort of file type conversion, which can always be a little bit messy, but that’s always the first prerequisite. Provided that is solved, the next issue is figuring out how you’re going to represent your data and where you’re going to put it. We happen to have a single file store. We’ve been fortunate enough to be flexible there, so we manage our own storage and we push the images up there. We keep them segregated by scanner when they hit the storage, and then we ingest them from these separate buckets into that singular image management system. There are multiple ways to skin a cat, as they say, and that’s just the way we chose to do it.
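
    A minimal sketch of that layout might look like the following. The paths, file patterns, and the ingest() placeholder are hypothetical stand-ins for whatever storage and image management system an institution actually uses:

    ```python
    # Sketch: images land in per-scanner buckets on shared storage, then get
    # ingested into one image management system. Paths and ingest() are
    # hypothetical placeholders, not a real vendor API.
    from pathlib import Path

    STORE = Path("/mnt/wsi-store")
    SCANNER_BUCKETS = {
        "hamamatsu": "*.ndpi",   # each scanner family keeps its native format
        "3dhistech": "*.mrxs",
        "aperio": "*.svs",
    }

    def ingest(path: Path, scanner: str) -> None:
        """Placeholder for the image management system's ingest call."""
        print(f"ingesting {path.name} from {scanner}")

    def sweep_buckets() -> None:
        # Keeping buckets separate per scanner preserves file-system access and
        # per-scanner permissions, while everything converges in one system.
        for scanner, pattern in SCANNER_BUCKETS.items():
            for slide in (STORE / scanner).glob(pattern):
                ingest(slide, scanner)

    if __name__ == "__main__":
        sweep_buckets()
    ```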

    Aleksandra:

    What’s your reason for keeping the images from different scanners separately? Is it for troubleshooting with a particular scanner when you encounter issues or what’s the reason for skinning your cat that way?

    Mark:

    That’s a great question. Yeah. I don’t know that we have a really compelling reason. We did that initially to keep everything straight. We did that also because we wanted to have some degree of file system access. So for example, if we had a technician only working on this Hamamatsu scanner over here, or a set of Hamamatsu scanners, and they never really work with the 3D Histech or the Aperio scanners, they could always go into the file system and know where their files are. If there’s some sort of trouble going on, they can make sure things are okay there. It also helps a little bit with our permissions for scanners. We certainly don’t want scanners to be wreaking havoc on what other scanners are doing, so keeping them separate is useful in that case. And then another thing that we do, which I touched on in that paper but haven’t elaborated on yet because it’s still a little bit in progress, is Auto QC.

    Aleksandra:

    Yes. Tell me about that, because this is something many parties are working on right now, and there are different aspects to QC. The most common thing, for line scanners, is those scanning lines that you get, but there are various other issues. Which issues did you work on? And yeah, tell me about it. I’m totally interested in this one.

    Mark:

    Auto QC was a problem that we identified, and certainly others have identified it as well, as being still a very manual step in the whole process. So we can automate scanners; they can collect images, they can throw the images up into the system, they can get ingested. All of these are automated steps, but if the image quality is bad, then the rest of that is pointless, so you still need someone to visually assess the image and make sure it’s good before they release it into the world, so to speak. Auto QC is really that last frontier of automation. If we can use tools like image analysis or AI to identify potential artifacts, potential problems with these images, potential reasons to not allow them into production, essentially, then you’ve really introduced this amazing additional automation to the whole procedure, let alone the quality improvements that come with it, the standardization, the consistency, because different technicians looking at the images will do a better or worse job of identifying which ones are good and which ones are bad.

    Auto QC really standardizes all of that and does it very efficiently. There are three things that we’re mostly looking at with Auto QC. We’re looking for focus artifacts, which usually manifest as blurry regions in the image. The entire image could be blurry, or you could have little pockets of the image that are blurry, so we’re looking at that. We’re looking at tiling artifacts, as you alluded to, often the result of stitching errors. When you look at an image, a lot of times you’ll see these horizontal lines or vertical lines or patches, and you don’t want that. That’s another thing we’re looking at. And the third, and maybe even the most important, is missing tissue, right?
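
    As a generic illustration of the second category (this is not the method described in the episode), stitch-line artifacts can sometimes be surfaced by looking for rows or columns whose brightness jumps anomalously relative to their neighbors:

    ```python
    # Sketch: flag candidate stitching artifacts by finding image rows/columns
    # whose mean brightness steps sharply against adjacent ones. Generic
    # illustration only; the z-score threshold needs tuning per scanner.
    import numpy as np

    def stitch_line_candidates(gray: np.ndarray, z_thresh: float = 6.0) -> dict:
        """gray: 2-D uint8 tile from a low-magnification level of the WSI."""
        rows = gray.mean(axis=1).astype(float)
        cols = gray.mean(axis=0).astype(float)
        hits = {}
        for name, profile in (("row", rows), ("col", cols)):
            jump = np.abs(np.diff(profile))          # brightness step between neighbors
            z = (jump - jump.mean()) / (jump.std() + 1e-9)
            hits[name] = np.where(z > z_thresh)[0]   # indices of suspicious seams
        return hits
    ```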

    Aleksandra:

    Yes.

    Mark:

    So you may not necessarily know that there’s tissue missing, but technicians will often see, “Hey, it looks cut off here at the edge.” That is a real problem, because you can imagine how catastrophic it would be if a pathologist is reviewing and thinks they have the whole thing, but there’s something that just never got captured in the process. And the trick there is to actually examine the slide overview image, the macro image, the picture or snapshot that the scanner takes of the entire slide before it scans it. If you can review that and identify tissue in it, which is actually very challenging, and then relate it back to the region that was actually scanned, well, that’s your check. It’s tricky, but very important.
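
    A toy version of that macro-versus-scan check is sketched below. The associated-image key, the threshold, and the assumption that the scanned region maps to a simple rectangle in macro coordinates are all assumptions that vary by vendor and slide, so treat this only as an outline of the idea:

    ```python
    # Sketch: compare tissue visible in the slide's macro overview against the
    # region that was actually scanned. The "macro" key, the grayscale
    # threshold, and the rectangular scan region are simplifying assumptions.
    import numpy as np
    import openslide

    def tissue_outside_scan(slide_path: str, scan_bbox_frac: tuple) -> float:
        """scan_bbox_frac: (x0, y0, x1, y1) of the scanned area as fractions
        0..1 of the macro image. Returns the fraction of tissue pixels missed."""
        slide = openslide.OpenSlide(slide_path)
        macro = np.asarray(slide.associated_images["macro"].convert("L"))
        tissue = macro < 200        # crude: tissue is darker than the glass
        h, w = tissue.shape
        x0, y0, x1, y1 = scan_bbox_frac
        scanned = np.zeros_like(tissue)
        scanned[int(y0 * h):int(y1 * h), int(x0 * w):int(x1 * w)] = True
        missed = tissue & ~scanned
        return missed.sum() / max(tissue.sum(), 1)
    ```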

    Aleksandra:

    In the paper you mentioned two open source programs that can be used. You’ve mentioned HistoQC and Path Profiler. Did you use these? Did you do your own thing? Did you do a combination? What did you use?

    Mark:

    This is very much in progress right now.

    Aleksandra:

    Okay.

    Mark:

    We initially did it in a not great way and the results were-

    Aleksandra:

    That’s super valuable information. Tell me what the not great way is, so that people who are listening don’t do it.

    Mark:

    Yeah. The not great way is to look for potentially blurry regions in your image by looking for regions that have an absence of high spatial frequency information. It turns out that there are lots of regions in your image that just intrinsically don’t have high frequency information. They don’t have those sharp edges. Usually cellular regions have that high frequency content, but you can see regions of stroma, for example, certainly white space, but there are-

    Aleksandra:

    Fluid, probably, as well, maybe.

    Mark:

    Yes. What you end up finding is that when you look just for the absence of high frequency content, it flags everything in your image. You look at it visually and say, “Wait a minute. This looks fine.” That was one of the first things we learned early on, so you need something smarter. And it turns out we’ve looked pretty extensively at Path Profiler, which was an algorithm that came out of Jens Rittscher’s lab at Oxford. And they, I believe, are using deep learning to tackle this properly.
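
    For reference, the “not great way” corresponds roughly to a classic sharpness heuristic such as variance of the Laplacian, sketched here. As Mark explains, stroma, fluid, and background legitimately lack high-frequency content, so a raw threshold over-flags:

    ```python
    # Sketch of the naive approach: score tiles by variance of the Laplacian
    # (a standard sharpness proxy) and flag low scores as "blurry". The failure
    # mode described above: stroma, fluid, and background score low even when
    # they are perfectly in focus.
    import cv2
    import numpy as np

    def laplacian_sharpness(tile_bgr: np.ndarray) -> float:
        gray = cv2.cvtColor(tile_bgr, cv2.COLOR_BGR2GRAY)
        return cv2.Laplacian(gray, cv2.CV_64F).var()

    def naive_blur_flags(tiles: list, threshold: float = 50.0) -> list:
        # threshold is arbitrary; in practice this flags far too many tiles,
        # which is why a learned model (as in Path Profiler) works better.
        return [laplacian_sharpness(t) < threshold for t in tiles]
    ```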

    Aleksandra:

    And that was my next question. So this high frequency property of the image, this is a classical computer vision approach, right? You have a threshold and if it doesn’t meet the threshold, then it flags it or not. Okay, so you say Path Profiler uses deep learning?

    Mark:

    Yeah, and they’re certainly doing things smarter, and so the regions that get flagged as potentially blurry tend to make a lot more sense. They tend to include regions that you would expect to have a lot of that high frequency content and to exclude regions that probably shouldn’t be flagged. It ends up being a much more refined method. We’ve begun incorporating this into our own framework. We’re still working on, and hopefully publishing very soon, the tissue completeness algorithm, which is certainly in the test phase right now. So altogether, you can have Auto QC based on deep learning or based on image analysis, really any number of tools, and then you put it into production. The way we’re currently doing that is we’re using these tools to identify slides that are highly likely to have a problem, and then we’re triaging those and they get manually reviewed, so you end up reviewing 10% of your slides rather than a hundred percent of your slides.
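
    The triage step can be as simple as ranking slides by a suspicion score and sending only the top fraction to human review. The field names and the 10% budget below are illustrative:

    ```python
    # Sketch: route only the slides most likely to have a problem to human
    # review. Score field and review budget are illustrative.

    def triage(slides: list[dict], review_budget: float = 0.10) -> list[dict]:
        """slides: [{'id': ..., 'qc_score': float}] where higher = more
        suspicious. Returns the worst-scoring fraction for manual review."""
        ranked = sorted(slides, key=lambda s: s["qc_score"], reverse=True)
        n_review = max(1, int(len(ranked) * review_budget))
        return ranked[:n_review]
    ```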

    Aleksandra:

    That basically reduces the amount of visual human QC.

    Mark:

    That’s right. That’s right.

    Aleksandra:

    By flagging the suspicious ones.

    Mark:

    Yeah. A very important part is to also include a few that you know are good. This is one of the points I made in this recent paper as well, because you need to make sure that the algorithm is doing what you think it’s doing, and you don’t want the ones that are supposedly good to actually be bad. That would indicate that the algorithm is undercalling, and then you have to go back to the drawing board, or at least figure out what’s going on.
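
    One way to implement that undercall check is to salt the manual review queue with a random sample of slides the algorithm passed, along the lines of this sketch (function and field names are hypothetical):

    ```python
    # Sketch: add a random audit sample of algorithm-passed slides to the
    # manual review queue. If reviewers find real problems among them, the QC
    # algorithm is undercalling and needs to be revisited.
    import random

    def build_review_queue(flagged: list, passed: list, audit_n: int = 10):
        audit = random.sample(passed, min(audit_n, len(passed)))
        queue = flagged + audit
        random.shuffle(queue)  # reviewers shouldn't know which slides are audits
        return queue, audit    # keep the audit list for later comparison
    ```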

    Aleksandra:

    So logistically, how are you integrating this into the whole fleet of scanners that you have and into the workflow? How is it integrated?

    Mark:

    Only on a couple of projects so far. We have some projects where we’re simply not using it yet, and there we’re doing a hundred percent visual review. We have a couple of other projects where we know we will be using it soon; we’re just capturing stuff now and we’ll go back and rescan if needed, but for now we’re just capturing and not reviewing. And then we have a few projects where we are starting to use this. The way it works, like I said, is the triage: we then visually look at the flagged ones. We’re using it inside an open source framework called OMERO, which is very much like an image management platform. Users are basically reviewing the ones that have been triaged and giving each a star rating, a five-star rating, which is one of the built-in components of OMERO. Then we go back in, look at how they’ve rated things, and do data analysis on that. Hopefully we get a paper on this very soon, when we finish this analysis, but things are looking rosy for those projects that are using it.
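
    Reading those star ratings back out for analysis might look like the following omero-py sketch. The connection details are placeholders, and the namespace shown is the one OMERO’s own clients use for ratings; this is an illustration, not the exact pipeline described in the episode:

    ```python
    # Sketch: pull the 5-star ratings reviewers left in OMERO for analysis.
    # Host, credentials, and dataset ID are placeholders.
    from omero.gateway import BlitzGateway

    RATING_NS = "openmicroscopy.org/omero/insight/rating"

    def fetch_ratings(host, user, password, dataset_id):
        conn = BlitzGateway(user, password, host=host, port=4064)
        conn.connect()
        try:
            ratings = {}
            dataset = conn.getObject("Dataset", dataset_id)
            for image in dataset.listChildren():
                for ann in image.listAnnotations(ns=RATING_NS):
                    ratings[image.getName()] = ann.getValue()  # 1..5 stars
            return ratings
        finally:
            conn.close()
    ```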

    Aleksandra:

    Interesting. Super cool. I am very much looking forward to reading your ultimate QC paper, because I know there is work going on on different fronts and people are approaching it from different angles, so I’m looking forward to reading this. So if somebody was in your shoes, came to an institution, and was tasked with scaling the operations, what would be the number one piece of advice for this person, and the number one thing not to do, if you have some?

    Mark:

    Gosh. Yeah, that’s a great question.

    Aleksandra:

    Or a piece of advice that, if you had known it at the beginning, would have made your life easier or saved you a couple of weeks; something like a nugget for a person who is in this situation right now, moving somewhere, being tasked with this, for whom this information would be valuable.

    Mark:

    I think early on, you want to focus on the data acquisition, and you really want to focus on the slide scanner. Make sure you don’t just buy the most popular one. Don’t just buy the one that the salespeople tell you is the best, but really try them out. So if you’re going to start a big scanning operation, I would advise against going out, buying 10 scanners, and just assuming they’re going to do what’s advertised. There’s so much nuance between different scanners. I think it’s worthwhile to bring in a couple, if you can bring in some demos or something, and really spend time doing that with experienced technicians running them. A new technician doesn’t necessarily know any better and will work on that one scanner and say, “Oh, it’s fine,” simply because they haven’t used the other scanner and don’t understand that maybe another one would be better. I think that’s one element to it.

    A second element is really related to number one here, but make sure your barcoding does what you think it’s going to do. You need to be able to read these barcodes reliably and in the regions of the slides where you expect them to be. If, for example, you’re doing consult scanning, the barcodes can be anywhere; your own internal barcodes sit in one place, whereas external barcodes can be somewhere else entirely. You need to have a plan for that, and make sure that the scanner aligns with your plan. These are a lot of the considerations you have to think about when putting this together. Don’t jump right in, I guess, is my advice. It’s really to test everything at the earliest stages and then build from there.

    Aleksandra:

    I guess it’s also for yourself, as the person responsible: you’re going to be responsible for your decisions and you have to be able to justify them, so checking everything thoroughly is the best piece of advice that I can give. And even before bringing scanners in, there was a podcast episode with Hamamatsu as well, and we wrote a checklist, a list of 12 questions you have to ask yourself to choose your scanner. That’s the step before. So an example would be, “Okay. Do you actually need high throughput or not? Is a hundred-slide scanner enough for you, or do you need one that has a thousand-slide capacity?” Do you have a next-step checklist, or how did you evaluate the ones that you had already shortlisted, if you can say?

    Mark:

    Yeah. That’s another good question. I will say we did something maybe a little against the grain: we did not necessarily follow vendor suggestions. What we did is we turned to some very experienced technicians. In fact, I brought one of my former technicians back in when we started one of the big projects here. We brought her back as a consultant, knowing that she knows a lot about really boots-on-the-ground slide scanning. She ended up actually training all of our new technicians. A lot of times folks will have a vendor train them on their scanner, but what we really wanted was someone who’s been there and done that before to do the training. And we want people to be trained not just on one scanner, but on all of your scanners. So it made more sense, rather than relying on the vendor to do the training, to have someone who’s very competent and very knowledgeable and who’s been there before do that training.

    That was a big part of our evaluation as well: having that person come in, try maybe three scanners from different vendors all sitting next to each other, finding the trouble slides, finding the ones where you think, “Oh, I don’t know. It might have difficulty scanning this one,” and then really getting a sense of which scanners did better. You’re not going to be able to answer these questions on 10 test slides. I think you really have to go in and scan thousands of slides on these, and then you really start to know these scanners. You’ll start to get annoyed by the scanners, often, and maybe you just end up buying the least annoying one.

    Aleksandra:

    Yeah, I think that is such a valuable sentence, because every time somebody’s offering something to you, they want to present it in the best light, and one of the claims that never holds is that everything is seamless in this digital pathology world. Nothing is seamless and, like you say, the integrations are complicated. You have different file formats, different scanners, and then you end up choosing the least annoying one, which then, basically, is your best choice. But yeah, that’s very true what you just said.

    Mark:

    Yeah, I’ve never found a perfect scanner. There’s always something that annoys you or something that just isn’t going to work with your plan. It’s best that you discover that early on and choose a scanner that will work with your plan rather than already being stuck with a bunch of scanners and having to adapt your plan around that scanner’s shortcomings.

    Aleksandra:

    Okay. Thank you so much for all these pieces of information. I think it’s going to be super valuable to somebody who’s tasked with this, wherever they are, whether building up, scaling, or starting from scratch. And I very much appreciate hearing about all the things that did not work, because then people can come to this episode, listen to it, and they don’t have to make the same mistakes again. So thanks so much.

    Mark:

    Thank you. It’s been a pleasure.

    Aleksandra:

    And I’m going to be linking to the paper, and whenever you have the new paper, you let me know and I’ll link it under this episode as well.

    Mark:

    All right. Sounds good.

    Aleksandra:

    Thank you. Have a great day.

    Mark:

    You, too. Bye now.
