[00:01:22] Aleksandra Zuraw: Welcome to the Digital Pathology Podcast. Today my guest is Pete Bankhead. He is the author of QuPath, an open source image analysis software. Welcome, Pete.
[00:01:35] Pete Bankhead: Thank you.
[00:01:36] Aleksandra: How are you today?
[00:01:40] Pete: Good, thanks. How are you?
[00:01:41] Aleksandra: Good, good. Thank you so much for joining the podcast, for being my guest today, and let’s start with the introduction. Tell us about your background: how did you end up in the digital pathology field?
[00:01:53] Pete: Yeah, very much by accident. So I find a lot of people in academia mention feeling imposter syndrome, but I kind of do feel it is valid with my background. If I go back to the beginning when I was at school, I did the minimum science that I could. So physics, chemistry and biology squeezed into one subject. Not because I didn’t like them, but just I wanted to do other stuff. So a bit of art, computing and maths, but then for my degree I went in a completely different direction and my undergraduate degree is in theology, which was-
[00:02:27] Aleksandra: Okay, that is-
[00:02:28] Pete: Yeah.
[00:02:29] Aleksandra: Different.
[00:02:29] Pete: It turned out to be quite useful though, because it really challenges you to think deeply and critically and evaluate arguments and change your opinion and things like that. So at the end of it, I liked it to such an extent that I started a PhD in it, but I gave it up pretty quickly, and then needed something else to do. At that point I started to think maybe medicine would have been good to go into, but it seemed too late. Computer science, again, seemed a bit more possible with my background.
[00:03:04] So there was a conversion course at the university in Belfast for people with perhaps less than practical degrees to gain decent skills in computer science. So I did that, it was sort of like an undergraduate degree squeezed into a year. So it was called a Masters. It ended up, I was doing a project in Munich as an Erasmus student, and that’s where I first was working on open source software for post-processing video lectures back in 2005 whenever the internet connections were maybe not so good or so fast.
[00:03:33] Aleksandra: What did you use for that? I used to work in Munich as well.
[00:03:37] Pete: Yeah. So I was writing it in Java. My supervisor was Peter Ziewer and he had developed this TeleTeachingTool software, and then we needed something to post-process the lectures, so that was my project. So at the end of that then, I returned to Belfast. It turned out that my Masters hadn’t quite qualified me for a job in the company I applied to; I didn’t get into it.
[00:04:00] So I applied for a PhD, and I got accepted for that, and so, I ended up in a biomedical sciences research group seemingly by chance, in the physiology group looking at calcium signals in retinal arterioles. So my supervisors were all from biomedical sciences, and they optimistically thought that they could teach me enough biology and lab skills and I would bring the computer science, and I could work on the image analysis problems.
[00:04:25] We decided pretty quick that, that was probably not going to be too good to have me in the lab, and so, I spent the whole of the PhD focusing purely on image analysis.
[00:04:35] Aleksandra: So digital pathology was actually by chance.
[00:04:39] Pete: Yeah. Yes.
[00:04:41] Aleksandra: Or did you have another choice?
[00:04:44] Pete: I didn’t know what it was when I started it. So after I finished the PhD, I went off to Heidelberg to work as somewhere between a postdoc and a staff scientist for three years there in a Core Microscopy Facility, where my job was to come up with ways of analyzing images for the users of the microscopes. So then that’s a load of different people with a load of different projects and scientific questions, and if I approached it with a PhD mentality of "three years later you might get an algorithm", that was not really going to please anyone.
[00:05:17] So that brought me more into the world of using open source tools like ImageJ and Fiji. Adapting them, scripting them, writing plugins and so on, and starting to teach image analysis a lot more, but it was a three year contract. Whenever that ended I applied for a couple of things, and I thought I’d see which, if any, happens first, and that happened to be digital pathology. So it brought me back to Belfast at the end of 2012, and all I really knew is that it involved images in some way, and so, it seemed like it was going to be a continuity, but I’ve ended up staying there longer than in any other field.
[00:05:55] Aleksandra: So how did it lead you to QuPath? You didn’t know the digital pathology problems, you only knew that they had images and you did Fiji for other bio image analysis, right?
[00:06:10] Pete: Mm-hmm [affirmative].
[00:06:12] Aleksandra: So how QuPath, how did it even happen? How did it originate, this project?
[00:06:17] Pete: So because I found in Heidelberg that using the open source tools was a really effective way to work, and then I could adapt it and I could still write code and develop my own algorithms, but I didn’t have to write an entirely new user interface or anything like that, I thought I would apply that in pathology, and then I found that it didn’t work. I couldn’t even open a whole slide image in ImageJ, and even today it’s not really possible to do very much with a whole slide image there.
[00:06:47] And so, I spent the first couple of years, I wrote some horrible software in Python. I wrote some ImageJ plugins that sort of did little somewhat useful things, but I just didn’t have the tools to bring it all together. I didn’t really set out to create them, but one day in my memory, I think it was just before the summer holidays, I had a little bit of a gap where I thought okay, it’s too late to start something big, but also I have a little bit of space here. So I’ll try it, I’ll see what would happen if I start writing a whole slide viewer.
[00:07:24] It’s not going to work, but at the end of the day it started to look like it might work, and then that kind of took over, well actually the next several years as I sort of started to piece the things together and to realize that if I started to build the tools myself that I wanted, then I had a lot more freedom to think about how to solve the problems in the way that I thought they could be solved specifically for the challenge of pathology. And creating the whole slide viewer was the first part of that, but it was inspired by these open source tools that I knew, but which didn’t fit for pathology, trying to rethink them and to recreate something new that would actually be designed for that from the beginning.
[00:08:05] Aleksandra: So it was just a gap project because you didn’t want to do something big, right?
[00:08:09] Pete: Yeah. I just thought hey, it’s okay if I try something and it doesn’t work, as long as I know it doesn’t work by the end of today. Then I think it ruined the holiday, from my memory of it, but.
[00:08:22] Aleksandra: And we’re going to talk in a second about what QuPath does, but let’s start with what was the goal of your software? Okay, you couldn’t open images in ImageJ, there was not really anything, but why?
[00:08:41] Pete: So as a postdoc there, I was dependent on whatever my project happened to be and in this case it was for IHC analysis, primarily with tissue microarrays, but I kind of knew that it was going to turn into other images as well. And so, when I was developing it I had a slight idea of what it would become, but because I didn’t really know the field I was always working one step ahead. So I thought I’ve got tissue microarrays, I need to work with them. It was an option for me to use proprietary software, and we had access to a few, but I found that they didn’t let me do what I wanted to be able to do and I didn’t really want to be restricted so that if I developed something I could only share the algorithm with other people who happened to have those licenses.
[00:09:26] And because I knew how to write code and knew how to develop my own algorithms I wanted to do it that way, and so, when I got the tissue microarrays, okay, I need to dearray them. So I wrote an algorithm for that, in order to identify the tissue spots, and then I think it was Ki67 we were looking at first, and so I needed cell detection, and cell detection back in 2014/2015 was still a big problem. It’s still even now a bit of a problem, and so, I created my own, in that case, ImageJ plugin to try and do cell detection in a way that I thought would minimize the assumptions. So that it should hopefully work robustly, and at that point I thought it would be useful until the pathologist pointed out to me that we really only want to score Ki67 in the tumor cells, and so, I needed to get into machine learning.
[00:10:17] So I started to build on something for machine learning, and then I started to learn about biomarkers where the staining is not nuclear and started to [inaudible] and even PD-L1 and things like that. And I realized that if I just go one step ahead each time, this is never going to end, and every project is going to take forever and I need to start trying to think a bit more broadly as to what are the individual tools that are needed and what are the common themes? And if I could try and spend time solving each one of these then hopefully they could unify into something that can be adapted to lots of different problems.
[00:10:55] Initially I didn’t do that because firstly, I didn’t think to do it, and secondly, because it seemed like it was a massive undertaking that would just never work, but eventually I spent so long trying not to do that I realized it was much less efficient to avoid creating the tools than it would be to actually set about the big project.
[00:11:18] Aleksandra: Mm-hmm [affirmative], so I use QuPath mostly as a slide viewer and annotation tool, with a minimal amount of image analysis. Obviously, this is mainly an image analysis tool, although you said you started with the concept of a viewer. Who is this tool for now and who is currently using it?
[00:11:42] Pete: So it’s really for anyone who wants it. So whenever I was [crosstalk] –
[00:11:47] Aleksandra: Who might want it?
[00:11:50] Pete: So I suppose the most obvious thing about it is that it’s used for pathology. Having path in the name kind of makes that evident, and so, pathologists might want to use it, but I also created it because I needed the tools myself to develop algorithms because that’s why lots of people use ImageJ because you don’t need to write it from scratch, you can write a plugin, and so, with QuPath you can write an extension here, you can write a script.
[00:12:14] So if I want to create an algorithm and share it with somebody, particularly with a pathologist, in a user friendly way so they don’t have to get into the code, QuPath gives me the tools to do that, but because my background was more in fluorescence across different types of images, I was still always thinking of those problems as well and the stuff that biologists were finding difficult in their images. And I thought maybe I could just, without telling anyone, try and solve some of those problems as well a little bit, and so, I started building some of the tools thinking ahead for, say, fluorescence images, which with multiplexing turns out to be quite important, but it’s not even really restricted to that.
[00:12:53] So over, at the minute in these strange times, over Christmas I had the idea that I wanted to digitize my family’s old photo albums so that everybody in the family could have them. I tried downloading some software. I took photos of the photo albums and I was planning to try and crop out the pictures in them. I know there’s some software for that, but it was all really slow and then I realized that I wrote some software, and if I actually draw around the photos in QuPath, I know how to get them reshaped and make them all fit together and I actually know how to manage those annotations and projects and everything. So in the end I used QuPath in order to digitize my family photo album so I could give it to everybody in the family at Christmas. So it’s pretty versatile, and you can apply it to lots of different things.
[00:13:43] Aleksandra: I think I will use QuPath for that. I wanted to do that also.
[00:13:48] Pete: I should probably document the process, it’s not obvious.
[00:13:51] Aleksandra: Yeah, and post on YouTube. Cool. So do you know how many people are using it?
[00:13:59] Pete: No, not really. I don’t track users. I can’t track users and I don’t even know entirely what’s involved in that. I do know that it’s been downloaded over 130,000 times.
[00:14:09] Aleksandra: 130,000 okay.
[00:14:10] Pete: That’s over all of the versions, and for the latest version it’s over 20,000 times, and it grows by about 4,000 to 5,000 downloads per month.
[00:14:19] Aleksandra: Okay.
[00:14:20] Pete: I started off trying to keep a list of papers because I didn’t want to overestimate how much it was used. So if I saw it cited in a paper, I would look it up and find out did they actually use it or did they just mention it and reference it?
[00:14:33] And so, there’s about 470 to 500 papers at the minute that actually used QuPath, not all of them citing it, but more than half of those were in 2020. There’s an extension from Google, for the Google Cloud Healthcare API, and there’s a sort of companion project, I think from Bayer, where it’s like a Python library that links up with QuPath. So I know it’s also used in industry as well.
[00:14:59] Aleksandra: I know, I can confirm that.
[00:15:02] Pete: Thank you.
[00:15:03] Aleksandra: Wow, congratulations. So how did you know how to build this software? You used other stuff first, and you said commercial was a little bit restrictive, but you didn’t know? You didn’t have any experience, how does one start doing this, and also you said that you know how to code, but software development is a different story than just knowing how to code? I want to know this story, how did you know how to do it?
[00:15:30] Pete: I didn’t know how to do it and that’s why I spent two years not doing it. If I had known how to do it I would have started it sooner, although also, if I had known how to do it, I would have known how much time it would take and I might have stopped sooner as well.
[00:15:41] So I’m not sure, it could have gone either way, but with the commercial software, I really didn’t actually use it very much because I knew that I didn’t want to use it because I would be locked into that, and I also didn’t really want to be influenced by it.
[00:15:57] So I’d rather be influenced by the problems and the people that I spoke to and the problems they wanted to solve as opposed to anything else that I’d seen, but the one software that absolutely did influence me hugely is ImageJ, and so-
[00:16:10] Aleksandra: Which is open source as well.
[00:16:12] Pete: Yeah, and so, ImageJ was initially written by Wayne Rasband from NIH, I think it was released around 1997, and it looks little now. It’s only a couple of megabytes in size and just I think last week or last month, Nature included it in their list of 10 computer codes that transformed science because it’s everywhere. The impact is massive, for such a seemingly small thing it’s amazing what’s been achieved with it, and there was another project called Fiji, which is ImageJ plus a curated set of plugins that are particularly useful.
[00:16:50] And whenever I was in Heidelberg and I was using Fiji, I wrote a handbook to try and teach image analysis for biologists because I thought [crosstalk] lots of individuals, if I can write a handbook then it becomes useful. That took me a lot deeper into the code to figure out how it works, and there was a developer list at the time that I would read.
[00:17:11] I remember Johannes Schindelin and Curtis Rueden were very active on there, and I hadn’t met them at that point. I didn’t participate in the discussions. I didn’t think I would ever do any open source software myself, but I just read them out of interest and I got to see how they answered questions, how they thought about designing things, and it’s only years later I realized how much that influenced me.
[00:17:37] Aleksandra: I didn’t know that in the programming world you also have gurus and people that influence you.
[00:17:44] Pete: Well, I don’t think they tried to, they were just having those discussions on what they could do, but yeah [crosstalk] –
[00:17:52] Aleksandra: I guess in every discipline there is the stereotype of a programmer that’s not really too outgoing, which the stereotype has been broken for me by many programmers already, but going back to QuPath, how are you addressing your improvements?
[00:18:15] So you said at the beginning you were just one step ahead, you didn’t know the whole picture. Now more or less you know the picture, and there are still new releases and you’re still improving, how do you decide what to do next?
[00:18:28] Pete: Yeah, so I’m motivated really by whatever I think is going to be the most useful thing to do, and because I created QuPath not because I had a project to create new software, but because I had IHC analysis problems to solve, I was also a user of it. So I knew the things that I needed.
[00:18:49] You mentioned the annotation tools, so a lot of those were there because I had to make tens of thousands of annotations myself. And so, there’s lots of tricks and things that you can use and shortcuts to make it a lot easier [inaudible]. Actually they are best documented on Twitter at the moment because I created a tutorial about them, but all of those-
[00:19:13] Aleksandra: I know those from YouTube.
[00:19:15] Pete: Yes, [crosstalk], but I probably added a bit more since then, but I added them because if something was annoying me in the software then I would change it. If it’s annoying another user, then I might change it, but if I experience the annoyance myself it’s a bigger motivating factor, and so, a lot of the things that are planned for development in it are to do with projects that I’m working on.
[00:19:41] So if you want something in it, maybe it’s a good idea to make sure I suffer from the fact that it isn’t there, but as well because I have been supporting it, I have had literally thousands of conversations with users, and I get to see what comes up again and again. So I have an idea what people need to do and what projects might be interesting, and I try and choose projects where I think if I solve this project then I can think of 20 more projects that are going to benefit from that.
[00:20:09] And so, I prioritize according to those things, but the one other thing that influences my priorities is, I think if there’s a really good business case for something, and if there are millions of dollars to be made in this application, I’m a lot less interested in it because I know other people will work on that, and there’s not really a need for me to do it. So I’m much more interested in the niche projects or the very specific questions that people are going to ask, and I try to work really broadly across a wide range of things.
[00:20:40] And I think that means you come up with new ideas and you have a broader range of tools, so when a new problem comes along you can rely upon those, as opposed to being very focused on say, tumor recognition in H&E or something that is incredibly common, because that leads to work on a massive scale and there’s people who do that better. And they have the resources and the time and the skills to do it, and that’s not really as interesting for me. I’m interested in a lot of things, but not as interested in that.
[00:21:08] Aleksandra: So I think you said something super-important, when it bothers you, you start changing it, and I think this is the experience, well I don’t have the experience to change the software, but because I work in an environment where I would give feedback to software developers on this kind of software, I have this organic reaction when something sucks.
[00:21:31] Pete: Yeah.
[00:21:31] Aleksandra: And I give very strong feedback, for the people that know where I come from, it’s natural, but for others I might maybe want to be a little bit more diplomatic, but the thing that you said is you are using it, and many software developers of digital image analysis software are not really using this tool.
[00:21:54] So you give feedback as a user and you don’t see understanding in their eyes because they don’t use it, which is natural. So that’s why I am trying to establish this bridge between computer scientists and pathologists or life scientists who are using it.
[00:22:15] Pete: It’s interesting that.
[00:22:15] Aleksandra: But you incorporate this, and I interviewed Andrew, and he wrote his HistoQC for himself because he had to do QC [inaudible], like you did with QuPath, you wanted to use it. So this is fantastic.
[00:22:31] Pete: I think that’s really important and it helps a lot, and it’s one of the things that I really don’t want to get away from because I think I would get totally out of touch and I would end up solving what I think is important, but which really isn’t. And whenever I was at Queen’s in Belfast, there was a pathologist that I worked with, Maurice Loughrey, he had a very charming way of telling me what stuff sucked. And so, I still speak with him a lot and meet with him a lot, but it helps to have somebody who is positive enough about the project that they care about it and they support it, but they’re not going to pretend that something is good if it’s not.
[00:23:10] Aleksandra: Yeah, I think a person like this should be in any digital image pathology software company.
[00:23:18] Pete: Yeah, I think you need to build a bit of a relationship with them first though because if you get hit [crosstalk] it might be harder.
[00:23:26] Aleksandra: Yeah, you should be nice to people, right?
[00:23:31] Pete: Yeah. [crosstalk].
[00:23:32] Aleksandra: Exactly. So once on Twitter I saw that you had to fight for keeping QuPath open source, and this brings us to that open source question. First, why is QuPath open source and what is the philosophy behind it? You said a little bit about it, that first you didn’t want to be restricted or locked into something, and the other thing, you wanted to share it. Why open source, and how did you learn open source also?
[00:24:00] Pete: Well, there was the Masters project in Munich where I was developing open source, but to be honest it just felt like a project that I needed to do. It was really interesting, but I didn’t think very much more broadly about the fact that it was open source or not because I didn’t really know enough. So whenever I was developing QuPath, I developed it in a project that wasn’t to create open source software, and it looked like there were probably more direct ways to do IHC analysis than to build something as new and big as this. And so, it wasn’t necessarily the obvious way to approach it, but in the end I thought it was the right way to do it. It was the only way in which I could do it.
[00:24:46] That slightly complicated things, and then we had to face the question as to whether it should be open source or not. I always wanted it to be open source because when I wrote this handbook for biologists, I made that openly available. I would rather maximize the usefulness of the software that I do, and my impression was that if we didn’t make it open source it would be a bad commercial alternative because there is commercial software out there, and there’s no reason why anyone would need me to make another commercial platform, there are better ones out there, but I thought that open source was where there was really a gap and there was a need for it.
[00:25:31] I think back to the two years I spent not having the tools to do the work that I wanted to do, and being really inefficient and thinking, okay I could publish a paper on ER and another paper on Ki67, but these are problems where there are so many papers on them that don’t come with code. They don’t come with software that you can actually use in general unless it is a commercial platform, and you can’t really build any standardization around that, and there are so many people who are doing really hard work. It might be fantastic, but it’s kind of reinventing the wheel, and there are so many postdocs and PhD students in the position that I was in, working quite inefficiently because the tools don’t exist in an open way.
[00:26:13] I thought if I could make these available then that would save them a lot of time, they’d be able to work more effectively, and they would also be able to share their algorithms in a way that becomes meaningful and other people can test them, because if I read a paper and I see this method claims to be really good, I don’t know, if it lacks code. But if I can actually run it then I can see whether it will work on my images or it will not work on my images, and I think that is a big important thing for advancing the field. The ability to share and standardize, and as well to improve communication between image analysts and pathologists, because I know that initially, whenever I would develop an analysis method, I would run it.
[00:26:54] I had the input image, I would put up a markup image of what’s detected, then a pathologist would look at it and say, “Well, it’s got all of those cells wrong. So you need to fix that.” And then if I can only show them a month later what my new results are after all the changes I have made, that’s not a good way to work, whereas if I can show them in the software we can annotate around it. We can even retrain a new machine learning classifier in seconds, then we can really work efficiently and effectively together, and putting those tools in everybody’s hands kind of raises the baseline for what everybody should be able to do because everyone has these tools, and if you’re a researcher developing something, you build on top of it and you do something better.
[00:27:32] Aleksandra: So who did you have to fight, and what did you have to fight?
[00:27:37] Pete: Yeah, I need to be careful what I post on Twitter because I’m sure we all have different views on how that experience was, but I suppose what I can say is that whenever you develop software at a university, at least universities that I know, if you’re a postdoc you don’t own it. So it’s not your decision what happens with it. And so, if it hasn’t been built into the grant that it’s going to be open source there’s potentially a lot of people you need to convince, and that can be PIs, commercialization departments, funders and so on. And so, each has their own targets, responsibilities, values, goals, and for perfectly good reasons they might not think that open source is the right thing to do.
[00:28:21] In my case I thought that it was, but even though I wanted it open source from the beginning, I ended up spending a couple of years developing it and it still hadn’t been released and I wasn’t sure that was ever going to change, and so, ultimately the one thing that I thought I could do is I could leave. And so, I handed in my notice thinking that maybe I had lost the last few years and it would never become anything, but whenever I did that then it was released open source. And so, from my point of view it was worth it because otherwise, yeah, I didn’t put in all of the work of developing it for it just to become a commercial thing or else just to be an in-house tool.
[00:29:01] I wanted it to be open and I thought leaving was the only thing I could do, and so, that was the way in which I tried to get it to happen, and fortunately it did. The addition to that is that because I couldn’t publish it and make it open source, I also couldn’t have much of a future in academia because no one had seen what I’d done, and so I ended up in a company then, and after I joined that I was told that actually I couldn’t do any open source. I couldn’t continue it and support it when I was there.
[00:29:28] So I lost the next year when I couldn’t update QuPath at all, and so, it was released, and then I could just look at the use increasing, but I couldn’t actually update it. And so, I left that job as well to be unemployed for the next eight months or so, and travel around, and I did some talks and workshops to figure out what to do next, looking for the right atmosphere to be able to do the kind of open science that I wanted to do, and that’s what eventually led me to Edinburgh.
[00:29:55] I had lots of conversations with them first just to make sure I could do the open science there, and so I started there with a new PI position. My first PI position, in September 2018, and that’s when there has really been a chance to revive QuPath, but it’s very different I find now as opposed to before, when I was developing it fairly closed with a very small number of users in the first few weeks when I was still at university. And then I go back and suddenly there’s quite a few thousand users and I’m trying to update it and know that any change that I make is potentially quite disruptive for these users. It’s a very different experience and it’s quite challenging.
[00:30:35] Aleksandra: So now you are able to do QuPath, this is part of your job now, again?
[00:30:40] Pete: Yeah, and I make sure that when I apply for funding I put it in that it’s going to be open source just to make sure that everyone knows from the start that it is the key part of the reason for doing it, and then that helps just to make sure that we don’t have to figure that out later.
[00:30:56] Aleksandra: So is there still a threat that someday when you stop working on it somebody is going to block this being open source?
[00:31:05] Pete: I don’t think so. I don’t think that’s really possible, because one thing is I had to learn a lot about open source licenses, much more than I ever really wanted to, but the fact that it is open source now means that it can’t really be taken back. So even if I was to stop, the code is out there, somebody else can take it and they can build on that. So, yeah.
[00:31:26] Aleksandra: So the way it’s licensed it kind of protects it.
[00:31:28] Pete: Yeah, and it helps ensure that anything built on it is also going to be open source, and it’s one of the important things that there’s a big difference between freeware and open source, and it always frustrates me a little bit when I see people describe QuPath as freeware or open source software as freeware because the fact that you don’t have to pay for it is only one small part of it.
[00:31:53] With open source itself, the real benefit is the terms under which it’s shared, where you can see all of the source code, you can change it, even if you might not want to or you might not be somebody who develops software yourself.
[00:32:06] In principle, you have access to all of that, and it also makes clear the terms under which you can share it with others. And so, that really is so much more than the software being free because the software could be free, but you don’t get the source code and you don’t have any of those rights, but with open source you have much more than with merely free software.
[00:32:25] Aleksandra: All right, so basically if I was working with some developer and didn’t like something I could ask, please change.
[00:32:31] Pete: Yeah.
[00:32:31] Aleksandra: And they could change, this is fantastic. So you mentioned machine learning.
[00:32:37] Pete: Mm-hmm [affirmative].
[00:32:38] Aleksandra: So in what capacity is machine learning and artificial intelligence incorporated in QuPath, and also is it just machine learning or do you also have deep learning?
[00:32:49] Pete: Yeah. So already back in 2015, I remember having conversations about deep learning, it was clear that was something we would need to have in QuPath and it was something that we wanted to be able to work on, but because of various other things in life getting in the way, it’s not officially there yet. But from the beginning, QuPath had conventional machine learning like random forests, artificial neural networks and so on, and basically it kind of means that you’ve got things like cell classification in QuPath. I would give it a few examples, and then QuPath would immediately train up the cell classifier to identify tumor versus non-tumor cells, and that is almost instantaneous.
[00:33:35] And so, you can apply that to 100,000 maybe even a million cells within seconds, and if you’re not happy with it and it says something else and it will reclassify all of your cells, and you can start to do that across multiple images and try and train up a classifier that’s quite powerful. And so, all of that’s conventional machine learning and all of that existed from the start. I’ve been thinking about deep learning for a long time, as it became more popular I certainly became convinced that it can do incredibly powerful things, but I also became convinced that on its own you still need ways to sort of, you are deep learning and it sounds great, but you need to convert it into something meaningful, some kind of measurement.
[00:34:19] And if you want to say quantify numbers of cells or something, deep learning alone is not going to give you that or maybe you want to measure something like distributions of cells, distance between them, if you want to look at your margins and so on, you need so much more as well as the deep learning component, and possibly in some cases, you can replace the deep learning with conventional techniques and it’ll do almost as well. And so, I thought okay, if I go straight into deep learning early and try and get that into QuPath, it’s going to look a lot more modern and fashionable and up to date and maybe now be more attractive, but it’s not going to solve nearly as many problems, as if I think of what through these thousands of conversations with users, I know what people need to do, and most projects that’s not yet deep learning.
[00:35:06] If deep learning was there it would help, but if you have deep learning and nothing else you haven’t solved the problem, and so, I’ve worked mostly on the things around it like being able to represent your million cells, classify them one way or another, handled multiplexed images and look at distance metrics and all of this kind of stuff, so all of the infrastructure. I even have a pixel classifier in there that I am looking at a few regions and based upon textures and colors [inaudible] start to generate predictions from right across your image, and all of it is designed in such a way that at some point, hopefully this year, you can just slot deep learning instead, and then suddenly you have deep learning and everything else.
[00:35:48] And so, I tried to do it in such a way that deep learning would be fun when it arrives, but when it arrives you actually have the other stuff already in place that you can rely upon. And so, yeah that was a long way of saying it. It doesn’t yet have deep learning although actually I can add to that, immediately before lockdown, I was at a conference in Bordeaux and I met Martin Yagonet, who has developed this [inaudible] , I think. A fantastic new place segmentation algorithm called Stardust, and I liked it so much that I developed a way to run it through QuPath whenever I got back. So if you want to get a deep learning via second test in QuPath, there’s instructions online for how you can do that. So that’s the first official example, the first one official example when I used deep learning in it, but they’re not publicly advertised or accessible yet.
[00:36:40] I kind of experienced a similar thing like you say deep learning, there is a lot of hype and it’s powerful, but I only see it powerful for segmentation and detection, but I always end up asking myself, okay what are going to be my endpoints? And when you decide for your endpoints it’s going to be, okay this you have to subtract even though you are already trained and detected, the cells that are at that distance have to go. So you end up setting thresholds and doing the classical sorting through your data. It’s not anymore for detection, I think deep learning at least for me, as a pathologist for detection is a revolution.
[00:37:24] Yeah.
[00:37:25] I don’t have to set any thresholds. I don’t have to balance between different characteristics. Okay, which one do I optimize for? No. I take my mouse, pen or whatever, I draw and it trains on what I draw, but downstream I still have to think, okay, what am I going to do with this data, and it comes to setting thresholds in a different way.
[00:37:49] Yeah.
[00:37:49] The number is, anyway.
[00:37:51] So this is maybe going a little bit away from pathology back to my old fluorescence days, but whenever you’ve got a fluorescence image there are certain things that you kind of know. If it’s from a microscope there’s the point spread function of the microscope, and the noise follows a Poisson distribution. You might have out-of-focus light that you need to subtract. There are all these principles to do with where the image comes from, and if you want to quantitatively analyze it, you kind of need to know them, and you can’t really take a shortcut. If you ignore all of those years of knowledge and go straight to deep learning, you potentially lose a lot, and you potentially get things pretty badly wrong.
[00:38:27] In pathology, it will be perhaps stain separation with color deconvolution. If you ignore all of the principles there is a chance that eventually it will lead somewhere quite bad. So it helps to have the combination of this fantastically powerful tool of deep learning, but also still keep in mind that the image comes from somewhere. The pixel values mean something, they’re not just these arbitrary numbers that somehow you find a pattern in. They do mean something and you need to keep that in mind and try and always relate back to it, and not just trust the results.
[00:38:59] I think this is becoming a trend in the industry as well. Many companies come from classical machine learning, classical image analysis and computer vision, and they are now adding deep learning on top of that. Others just start with deep learning, but all the ones that have been in the market a little bit longer already have the non-deep learning tools, and they’re adding this on top of that.
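As an illustration of why pixel values "mean something" in the stain separation case, here is a minimal sketch of color deconvolution for a single RGB pixel. It is not QuPath’s implementation. The stain vectors are the commonly quoted Ruifrok and Johnston values for hematoxylin and DAB, and the function names, the white level of 255 and the two-stain least-squares fit are all assumptions made for this example.

```python
import math

# Assumed stain vectors in RGB optical-density space (commonly quoted
# Ruifrok & Johnston values). Real slides usually need vectors estimated
# per image; these are for illustration only.
HEMATOXYLIN = (0.65, 0.70, 0.29)
DAB = (0.27, 0.57, 0.78)


def normalize(vector):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(v * v for v in vector))
    return tuple(v / length for v in vector)


def optical_density(rgb, white=255.0):
    """Beer-Lambert conversion: OD = -log10(I / I0), per channel."""
    # Clamp very dark values so that log10 never sees zero.
    return tuple(-math.log10(max(c, 1.0) / white) for c in rgb)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unmix(rgb, stain_a=HEMATOXYLIN, stain_b=DAB):
    """Least-squares amount of two stains that best explains one pixel."""
    a, b = normalize(stain_a), normalize(stain_b)
    od = optical_density(rgb)
    # Solve the 2x2 normal equations for od ~= ca * a + cb * b.
    aa, ab, bb = dot(a, a), dot(a, b), dot(b, b)
    ay, by = dot(a, od), dot(b, od)
    det = aa * bb - ab * ab
    ca = (ay * bb - ab * by) / det
    cb = (aa * by - ab * ay) / det
    return ca, cb


if __name__ == "__main__":
    # A brownish pixel should come out with more DAB than hematoxylin.
    print(unmix((120, 80, 50)))
```

The point of the sketch is that the unmixing only works because the numbers are first converted to optical density, which is the physical quantity that adds up linearly when stains overlap.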
[00:39:24] So if I wanted to learn QuPath, I know that there are some tutorials on YouTube. I took them, at least for annotations. I know there is a user manual on GitHub and that you were giving classes before, like live lectures. What are you doing now in the COVID times, how do you spread the word and how do you teach people QuPath?
[00:39:47] Yes, there are some tutorials online that I recorded a while ago. The documentation is no longer on GitHub, it’s on Read the Docs, and it’s completely rewritten for the latest version. But what these COVID times have taught me is that I had been saying yes to a lot of stuff. I was teaching, I was traveling a lot speaking about it, and not having the chance to do that has made me focus a little bit more on the software materials and on what’s perhaps a more effective way to do it. And so, increasingly, if I create a tutorial I try to make it as usable as possible, and I put it online.
[00:40:29] I do think there’s definitely a benefit from in-person teaching, but now that I have tried to establish a research group, and I have a couple of people working in it, I need to think how am I going to keep them? And so, yeah, I’m thinking that there may in the future be in-person training in Edinburgh. I’d be very interested to know what the interest is in that, but as opposed to going to lots of places and teaching all these free workshops, I need to start to think of how I can fit this in so that it’s as useful as possible and helps as many people as possible, and now I have to start to think about how it can be self-supporting.
[00:41:07] So yeah, I’m interested in feedback as to what people would like in that kind of regard. There is one online workshop, the entire workshop from the [inaudible] Institute where I taught just before lockdown again, and they have put the entire thing online. So that’s all on YouTube as well, but [crosstalk] –
[00:41:26] It’s available, can we share with the listeners a link to this?
[00:41:28] Yes, QuPath has a YouTube channel, and I link to all of the things there, and you can just [crosstalk] –
[00:41:34] Okay, great. I will definitely link that also in the show notes. So a couple of practical questions, you mentioned that you come from the IF world, are IF images [inaudible] images supported in QuPath?
[00:41:49] Yeah, they always were from the start more or less. That was one of those things that I had to sort of… Because I know projects that worked well, I had to sort of try and sneakily make it work without anyone really paying attention to it. So when I left Queen’s the latest version was 0.1.2 and that was the up to date one for a few years. Now we’re at 0.2.3. I was very conservative with my version numbers, which I now start to regret, but the multiplex tools have improved a lot. So, for example, I have opened a mini image with 44 different channels and I can view them all simultaneously in QuPath, and there’s even a little channel viewer that I can move my mouse over and see them all side by side, so it’s a lot easier to interpret.
[00:42:39] Again, because I find multiplexed imaging, even with three channels, is hard to interpret. There are shortcuts [inaudible] so that I can toggle a channel on and off: I just type the number of the channel and then that will immediately toggle [inaudible] and it’s a lot easier to see what’s within the image. There’s this multichannel viewer, and cell detection should work as well as it ever works, which is variable, but it will also measure all of your channels through the multiplexed image, and you can train machine learning classifiers one marker at a time and then apply them sequentially in order to get whether a cell is positive for various different markers, and then visualize all of that information. So all of that exists at the moment.
[00:43:19] Mm-hmm [affirmative]. Next question: is exporting annotations an option? I know it is because I googled it, but I was looking for something like in Aperio ImageScope, where they are exported automatically and you can send them via email. I did not find that in QuPath, is this an option?
[00:43:42] Not yet. So Melvyn was the first person ever employed. Whenever I joined Edinburgh, I got a small grant from the University of Edinburgh Wellcome Trust, and that brought Melvyn on board, and so he’s actually working on adding that as a menu item. So the trouble with exporting annotations is that it means something different for everybody. So in some cases it might be like a labeled image, where the pixel values correspond to what you are looking at, and that’s what, in machine learning or deep learning, people usually want.
[00:44:12] For some people it can be the contours, like the coordinates of the boundaries, and so there’s a different representation for that, and then it can depend: are they in pixels or micrometers, is the origin the top left of the image or elsewhere? And so, there’s so many different things in there. So what I’ve tried to do in QuPath is to adopt an open standard. As far as I know, the XML that the Aperio viewer uses is not an open standard. I would support it in QuPath if it was. Actually I tried to ask whether or not I could support it in QuPath, but I never got an answer.
[00:44:52] And so, if it’s not an open standard I’m not going to do something that’s going to annoy a company with a [crosstalk] . I’ll just stay away from it, but yeah. So QuPath didn’t let you export annotations because I was trying to figure out what would be the right format in order to promote these kinds of open standards and make it possible to exchange information. And currently the way around that is that there are scripting recipes in QuPath for any kind of export you might want. You just take the one that you want, adapt it, and you can export in a totally flexible way, but if you want a command on the menu, Melvyn will bring that to you for the next version.
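The idea of applying one single-marker classifier after another can be sketched as follows. This is only an illustration, not QuPath code: the marker names, the measurement values and the simple cutoff rules standing in for trained classifiers are all made up for the example.

```python
def classify_cell(measurements, classifiers):
    """Apply single-marker classifiers in order and combine the results.

    `classifiers` maps a marker name to a function that takes the cell's
    measurements and returns True if the cell is positive for that marker.
    """
    positives = [marker for marker, is_positive in classifiers.items()
                 if is_positive(measurements)]
    # A cell positive for several markers gets a combined class name.
    return ": ".join(positives) if positives else "Negative"


# Hypothetical stand-ins for trained classifiers: plain intensity cutoffs.
classifiers = {
    "CD3": lambda m: m["CD3 mean"] > 1.5,
    "CD8": lambda m: m["CD8 mean"] > 2.0,
    "PD-L1": lambda m: m["PD-L1 mean"] > 0.8,
}

if __name__ == "__main__":
    cell = {"CD3 mean": 2.4, "CD8 mean": 3.1, "PD-L1 mean": 0.2}
    print(classify_cell(cell, classifiers))
```

Keeping each marker’s decision separate means one classifier can be retrained without touching the others, and the combined class is just the list of markers that came out positive.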
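To make the "it means something different for everybody" point concrete, here is a small sketch of the same annotation in two of the representations mentioned: contour coordinates (in pixels or micrometers) and a labeled image. It is not one of the QuPath scripting recipes. The function names are invented for this example, and it assumes the origin is the top left of the image and that a pixel belongs to the annotation if its center falls inside the polygon.

```python
def to_micrometers(points_px, pixel_size_um):
    """Convert contour coordinates from pixels to micrometers."""
    return [(x * pixel_size_um, y * pixel_size_um) for x, y in points_px]


def contains(polygon, x, y):
    """Even-odd ray casting test: is the point (x, y) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def to_label_image(polygon, width, height, label=1):
    """Rasterize a polygon into rows of labels (0 means background)."""
    return [[label if contains(polygon, col + 0.5, row + 0.5) else 0
             for col in range(width)]
            for row in range(height)]


if __name__ == "__main__":
    square = [(1, 1), (4, 1), (4, 4), (1, 4)]  # contour in pixel units
    print(to_micrometers(square, 0.5))         # same contour in micrometers
    for row in to_label_image(square, 6, 6):
        print(row)
```

Both outputs describe the same region, but a tool that expects one will not understand the other without knowing the units, the origin and the labeling convention, which is the argument for agreeing on an open format.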
[00:45:30] Okay. So you would need to know some coding to do that.
[00:45:35] Or just a copy and paste for most of it.
[00:45:35] I can do copy and paste.
[00:45:36] And there is a forum, so if you run it and it doesn’t work, you can ask in the forum, and there’s a lot of advice and discussions on the forum already.
[00:45:48] I will link the forum as well in the show notes, and next practical question, can you turn image analysis results into annotations and use them for deep learning training? So let’s say I do the Ki67 segmentation and it looks nice and I want this to be my annotations for deep learning, how do I do that?
[00:46:09] So, that’s going to be one of those annotation export recipes. So what I try to do with QuPath is it’s really easy to add a lot of extra buttons to the user interface and then it becomes intimidating and confusing. So I try to be relatively selective about the stuff that’s there and if anything seems highly customized it’s all through a script and we’ve loads of scripts that exist.
[00:46:38] I know a lot of people who don’t write code don’t like the sound of scripts. Going back to Morris, the pathologist that I know, I described a script to him once, and he said, “Oh, it’s like a macro, macro is fine.” So he was fine with a macro, but not if I call it a script.
[00:46:53] So if it helps, think about it as a macro, and for a lot of them you can copy and paste the one that you want. But the reason that it exists is because normally it’s specialist and niche enough that adding a button to do that is, I think, probably going to confuse people more than help them, but if it turns out that everybody wants the same sort of thing then there will be a button in a future version.
[00:47:14] Mm-hmm [affirmative], great. So what was your greatest challenge? You probably had plenty, but one that you would have to choose.
[00:47:26] So there’s a really good book that I read recently from Nadia Eghbal, Working in Public: The Making and Maintenance of Open Source Software. So she had a really nice point in that about how software is viewed quite differently. So, for example, if I was making a chair and I had one customer for my chair, then that’s fine. If I have a million customers, that’s going to take a lot more work and a lot more resources to create a lot more chairs, but if I make software, and I share that with one person then that’s fine, but if I upload it and a million people download it, it doesn’t really cost any more.
[00:48:02] There’s no more, it’s pretty much negligible. So it seems like it scales really well, but she pointed out in her book that actually the finite resource becomes the time of the maintainers and the people who develop it, because suddenly if you have a lot of people using something, they might each only have a question or two, but that really starts to add up at these kinds of scales. And so, you can think that because it’s open source some of the community looks after it, but with most open source projects it really is only a couple of people who are really at the center of the storm. And so, that becomes quite a challenge to manage, but I guess it’s also worth it because without any users the entire thing is pointless, but I find that there’s different ways to engage with an open source project.
[00:48:55] You can go, you can ask your questions, you can leave, that’s fine, or you can just not ask questions at all, but it’s really nice whenever I start to see people who have used the software, they might ask some questions, but then they come back and they start answering questions for other people and helping [inaudible] and exchanging ideas, and that’s the exciting part for me. It doesn’t happen an awful lot yet, but there’s definitely some people who really do that and that’s one of the best ways to contribute and support an open source project. It’s often thought that if you want to support it you have to write code. Actually my heart sinks whenever I see someone want to change the code in QuPath, because I think, I like writing the code, and that means for the next few hours I need to review it and think about the implications and think how is this going to affect a thousand other users on different projects if I integrate this change?
[00:49:46] So writing code isn’t the most effective way to help. The most effective way to help is to engage and to protect and help other users and so on. And so, trying to create that kind of community and ways of supporting this sustainably is the biggest challenge, and I think there’s one thing specific to academia about this, which is that none of this is really valued in academia, and that’s a bit of a problem because I kind of get the sense that software isn’t viewed as being research, and actually if I develop the machine learning algorithms, the cell detection algorithms that are in QuPath at the minute, and I just publish them as papers with no software, that would be research.
[00:50:25] No one would use them, but it would be research, but if I do 10 times as much work, turn it into software, put it out there, support it and so on, then it becomes software. So you are not really a researcher, you’re a software developer and that’s a whole different funding, it’s hard to get a job and so on. So it seems to actively disincentivize people from actually sharing their stuff. Andrew Janowczyk actually mentioned this in his podcast. He described that the worst thing that can happen with open source is that nobody uses it, but I would argue that actually the worst thing is more that people use it, and they find bugs and that’s embarrassing and maybe they find bugs that undermine your results and you need to retract your paper or maybe they demand support and it takes all of your time.
[00:51:05] There are so many worse things that could happen than people not using it, and so I think that open source is scary. It makes you feel really exposed, and it’s stressful and it’s time consuming. I think the way academia is working at the minute makes it harder, and that means that it’s harder to create and to support the tools that other people in academia and in research actually want. And so, that’s the challenge: the culture of the thing, being able to cope with the project, keep it going, make it sustainable and hopefully get as useful as possible [crosstalk] .
[00:51:46] Academia values publications and citations, right?
[00:51:49] Yeah.
[00:51:49] So citations, I assume, help bring this project forward. So here is a big call for action to everybody who is listening: if you are using QuPath for any of your research, please cite it. I think there are instructions on the website on how exactly to cite or which paper to cite, and I am going to also include that in my show notes.
[00:52:09] Thank you.
[00:52:12] Pete, what are the next steps in your research and in your exciting projects? Is it new exciting things within QuPath? You are basically employed by QuPath wherever you go.
[00:52:26] Yeah. So for me QuPath is only ever a means to an end, and so, I hope it will be useful. Yeah, occasionally I hear from companies and they suggest, “Oh, you can do this and then you’ll get more users.” I think, I don’t want more users, I want it to be as useful as possible. That could be a small number of users who do fantastic things with it or that could be a large number of users who maybe don’t do fantastic things with it. I would much rather have the smaller number who use it as usefully as possible, and so, the next steps for me are to try and make it sustainable because I think there’s a lot more potential, and it’s limited at the minute by time.
[00:53:05] We’re a tiny group, there are two people, their funding runs out this year, and it’s me. So we’re a little bit stuck. I struggle with this software research tension that everything you have to do is collaborative because nobody really wants a computer scientist applying for a pathology grant on their own, because that’s not, I don’t know it. And so, it’s finding the collaborations and the research projects and how to recognize that if we work together we not only try and solve the AI analysis problem, but we also then have a [inaudible] platform that we integrate it into so that all other groups who care get the benefits of it, and I’m trying to share that message with people so that we collaborate more and we start to build those relationships that make it possible.
[00:53:53] At the minute there are a lot of projects where I’m usually the bottleneck because they’re all side projects for specific applications, but they are feeding into the next versions of the software. I don’t want to say too much about individual ones because it’s always someone else’s images, and I don’t want them to be annoyed, but yeah. There is deep learning coming, there’s lots of interesting research projects and the challenge is to get it all together and to make sure it’s as useful as possible.
[00:54:22] Mm-hmm [affirmative], it’s the sustainability. I guess it was a joke that you were employed by QuPath, but if you make it sustainable you can leave this job, and your mission is going to be fulfilled, at least.
[00:54:35] Yeah, I would be quite happy if I could stay in academia, and maybe if somebody requests something that [inaudible] then I’ll do something else other than QuPath. It doesn’t have to be QuPath, but at the minute I think there is something that it can probably offer.
[00:54:49] There’s a lot more that it can offer and I would like to keep it going, and there’s a chance that, yeah, [inaudible] and they all become better at doing it than I am now, and I become useless and I can do something else instead, but at the minute I am still quite in the middle of it.
[00:55:07] Thank you so much. Thank you so much for coming and talking about it, and thank you so much for QuPath. I mean it’s fantastic, I love it. I only use it in a limited capacity, but you know what, I’m going to learn to use it more.
[00:55:22] Thank you. Thanks very much.
[00:55:25] Have a great day.
[00:55:27] Thanks, you too.
[00:55:28] And take care.