    Top 5 Mistakes You Must AVOID When Using Machine Learning for Pathology w/ Heather Couture, Pixel Scientia Labs

    The field of pathology has been revolutionized by the introduction of machine learning techniques, which enable more efficient and accurate diagnoses and have the potential to someday even reduce or eliminate the need for expensive molecular tests. However, model development is a complex process, and there are certain mistakes that must be avoided when using machine learning for pathology.

    In this informative discussion, Heather Couture, an expert in machine learning for pathology, highlights the top 5 MISTAKES THAT YOU MUST AVOID to ensure the best possible machine learning and deep learning project outcomes.

    Through her insights, you will learn to recognize and avoid the following 5 most common mistakes:

    1. Not understanding your data and its challenges.
    2. Diving in without researching prior work (academic research and open-source code) that is similar to what you’re trying to model.
    3. Starting with too complex a model.
    4. Not thinking ahead towards validation.
    5. Not fully understanding how the technology will ultimately be used.

    By avoiding these common mistakes, you can maximize the benefits of machine learning for pathology and ensure accurate and timely project results and product launches. Whether you are new to machine learning or an experienced practitioner, this discussion is a valuable resource for anyone interested in using machine learning (including deep learning) for pathology.

    And if we are not connected already, let’s connect on LinkedIn!

    Watch on YouTube

    This Episode’s Resources

    Digital Pathology Place Resources

    Transcript

    Aleksandra: [00:00:00] If you wanna learn about the top five mistakes that are being made in machine learning projects for pathology, this is for you. Be sure to stay till the end, because the most impactful mistake is actually at the end of the episode.

    Welcome, my digital pathology trailblazers. Today, for the third time, and it’s not common that we have the same person on the podcast for the third time, I’m here with Heather Couture, the founder of Pixel Scientia Labs. Hi Heather. How are you today?

    Heather: Good, thanks. Great to be here, Aleks. Thanks for having me again.

    Aleksandra: Thank you for joining me for the third time. Before I let you introduce yourself, I’m gonna tell a couple of things about you. First of all, I refer to your newsletter, and if you guys are not subscribed yet and you’re in the computational pathology space, this is the only newsletter I’m regularly reading. It’s a fantastic recap of scientific literature. So, this is one resource. Another thing Heather has to market her business is the podcast. She’s a fellow [00:01:00] podcaster. The last time we were talking, you didn’t have a podcast yet, but now she is releasing an episode every week about different aspects of AI and healthcare, and I’m gonna link to that as well.

    So Heather, I have you for the third time as a lady, as a woman very active in the digital and computational pathology space, because March is the month of International Women’s Day, and I decided to feature a few active female members of this community. So, give us a couple of words about you, your background, and about your business. And of course, I’m gonna link to all the episodes we already had before.

    Heather: I’m Heather Couture. My background is in computer science, so I’m definitely not from the medical and pathology side, but I have grown to enjoy the medical aspect of it. My background is in computer vision, machine learning, and deep learning once it came on the scene.

    So, I’ve been working in those areas for almost 20 years now. I have an undergrad, a master’s, and a PhD in that area, and my PhD is where I came across pathology. So I [00:02:00] was trying to use computer vision and machine learning to do interesting things, and I discovered H&E whole slide images, initially skin cancer and then breast cancer, and trying to predict different properties of them from those images.

    So that’s what got me into the pathology space. Since I finished my PhD, I’ve been a full-time solo consultant, so I work with startups using machine learning and help them reduce the trial and error of their machine learning projects. These projects to analyze images for pathology are very complex and require a lot of experimentation to figure out what types of models you should use and what you shouldn’t.

    So, I work with startups to help them understand the latest research and best practices in order to get their products to market faster.

    Aleksandra: And I know you’re a very busy person, so there is a lot of demand for this type of service, for this type of assistance, because image analysis, machine learning, and now deep learning took pathology by storm, and now everybody wants to benefit from these methods.

    [00:03:00] Everybody wants to use these methods because they are very powerful, whether they are using coding and from-scratch approaches or tools that are already built and ready to use. I wanted to ask you in this episode about the top five mistakes that you see when working on deep learning and machine learning projects for digital pathology, for pathology.

    What are the top five mistakes and how to avoid them?

    Heather: That’s a great and very interesting topic, Aleks. Let’s start at the top of the list. The ones that I came up with aren’t in any particular order, but we’ll hit on each of them. Yeah. The first one is just not understanding your data and its challenges.

    A lot of engineers coming out of school having done some machine learning are used to clean benchmark academic data sets, ones that have had a lot of work done to clean them up, to make sure the labels are right, to make sure the classes are balanced. And that, unfortunately, is not the real world.

    Images are complex. Data is complex, so any real-world data set you work with will have its own unique [00:04:00] challenges, and you need to understand those in order to choose the most appropriate model and to get your data ready to go into that model. For all the different purposes of machine learning, you need to understand how big your data set is.

    How it was annotated. Did the annotators even agree on the annotations? Maybe there’s some disagreement that you need to handle there. Are there artifacts, in particular with pathology images, things like bubbles and blurriness and tissue folds, that you might need to deal with? It’s best to exclude those if you can, cleaning them out of both your training and your inference data sets.

    It’s important to do the same process for both, because otherwise these things aren’t what your machine learning model is trained to predict, and so they can very easily confuse it.
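
    To make that concrete, below is a minimal sketch of one way to screen patches for blur and near-empty glass before training or inference. It assumes OpenCV and NumPy; the thresholds are illustrative and would need tuning per stain and scanner, and dedicated open-source QC tools for whole slide images go much further than this.

        import cv2
        import numpy as np

        def is_blurry(patch_bgr: np.ndarray, threshold: float = 100.0) -> bool:
            # Low variance of the Laplacian is a common heuristic for blur.
            gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
            return cv2.Laplacian(gray, cv2.CV_64F).var() < threshold

        def keep_patch(patch_bgr: np.ndarray, min_tissue: float = 0.5) -> bool:
            # Crude tissue check: bubbles and empty glass are mostly near-white.
            gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
            tissue_fraction = float(np.mean(gray < 220))
            # Apply the SAME filter to both training and inference data.
            return tissue_fraction >= min_tissue and not is_blurry(patch_bgr)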

    Aleksandra: I still see it, though not that often anymore, where it’s regarded a little bit as magic: scientists are hoping to feed a couple of examples and get a fantastic, robust, generalizable model on [00:05:00] data that was not shown to the model in the training set.

    So, I guess that’s part of this data understanding problem as well. My question here is, how do you understand this data? Do you have a checklist? Do you have an understanding already of what always has to be checked, or is it a case-by-case basis? How do you approach that? How do you go from clean data sets, with all the metadata fields filled in and in the right place, to working with real-life images and getting results?

    Heather: So, there’s a few different things there. One is to understand what the common challenges are in the particular type of data you’re dealing with. So, if it’s H&E whole slide images, there are certain common challenges like artifacts, as I mentioned, like variations in staining from different labs, different scanners, changes over time.

    Those are just common things that you see with this type of imagery. So, you’re looking for those, but it’s also a case of talking to those who collected the data, those who annotated it, and those who are much more familiar with the patient [00:06:00] population and the thing that you’re trying to use the data for than you are as a machine learning expert. I have a limited knowledge of pathology.

    I have an increasing knowledge every time I talk to a pathologist, but I’m not the expert there. It’s pathologists and the clinicians and anybody else who is much more familiar with the data that can tell me more about the patients it was collected on and how it was collected.

    Are there different batch effects, with some data collected from some medical centers and some from others? Was there a different process followed for annotation for some of the data? Did something in the annotation process change partway through, so that the other portion of the data set has a different process that was followed?

    Anything like that can bring context around what the data is composed of and what the upcoming challenges are.

    Aleksandra: Okay, so let’s say we know that already, we understand our data. What’s another problem that we can encounter?

    Heather: The second one that I see is just diving in without understanding what’s been done before.

    And by that, I mean what’s been done in the academic research, what’s been [00:07:00] done by other organizations, what code has already been produced for similar data sets. That can tell you a whole lot about what your challenges are and where you might start in tackling the problem on your data. So rather than reinventing the wheel, you get to understand some of that upfront, and it makes things a whole lot easier.

    Aleksandra: I’m smiling, and I’m gonna tell you why I’m smiling. I recently had a webinar series with Andrew Janowczyk, who’s an author of a series of open-source tools for, as you mentioned, identifying batch effects, basically for quality control of histology slides. And that was our recurring theme: stop reinventing the wheel, check what’s already out there, check what code is there.

    And my question was, can the vendors capitalize on that? Can they use it, or how can they use it? And you’re partially answering the question, because my maybe naive approach was like, oh, it’s open source, you take it, you plug it into your software, and it works. I don’t think it works that way, but definitely, as you mentioned, you can see what problems were already solved with it and what challenges [00:08:00] people encountered that are described in the literature or, I don’t know, on those forums.

    And then this is where I think your newsletter is an invaluable resource. I’m gonna keep emphasizing this because I like it a lot. It’s a super cool recap of the literature and what has already been done. So when you work on those projects, do you see people doing something that was already done, reinventing the wheel? How do you approach that? How can you get yourself out of this trap?

    Heather: It definitely happens some of the time, because it’s hard to keep track of what code is out there and what code is not just out there but is usable. Not everything is well documented. Not everything is easy to use out of the box on a new data set. So, it’s not just whether something exists, it’s also whether it’s usable. But it’s also understanding what’s been done in the academic literature, what’s been published on similar data sets, to understand what things your model might fail on. So, things like staining variations: if you’re not very familiar with that, learning about it helps you understand.

    Aleksandra: And we have a podcast episode on that, on adjusting the models to the [00:09:00] domain shift.

    Heather: Yeah.

    Aleksandra: I’m gonna link to that as well.

    Heather: Good. But it’s also things like understanding how much data you might need in order to train a sufficiently accurate model. If somebody’s done something similar before, that can give you a lot of information on how much data you might need to collect, what’s the best way to annotate it, what challenges have come up in that process, and how they handled them. It won’t give you all the answers, but it’ll give you a giant head start.

    If somebody’s done something similar, it doesn’t even need to be the same type of cancer, doesn’t even need to be the exact same type of imagery. Just anything that’s related in some way can give you information on where to start, what tools to use, and what data sets and code and models are out there that you might be able to start from, if they’re usable and if they have an appropriate license.

    Aleksandra: So, Heather, how much time do you dedicate per week to studying literature and for the business in general?

    Heather: I think it varies. I’m always reading and I’m always writing about what I read, so I’m always sharing a couple of publications every week. I’m probably spending a few hours a week [00:10:00] minimum. If I’m helping a client with a particular challenge, maybe I’m spending quite a bit more in a particular week on reading and trying to understand their challenge. It will vary from week to week, but it’s a minimum of a few hours reading papers every week.

    Aleksandra: That’s a significant amount of time, and this is where this knowledge comes from, and I guess this part alone is worth the consulting. I would say, if I were in a situation to start a project that has, you know, financial implications or the potential to change something in healthcare, having this as a consultation would be of huge value to me. Okay. So, let’s say we more or less know what has been done, we understand our data.

    Are there any other traps that we can fall into?

    Heather: The next one that I would say is starting with too complex a model, and that means trying to implement the latest thing from the latest computer vision paper as your first try, instead of starting with a very simple baseline to get an idea of how your model performs and what the challenges are, before [00:11:00] putting all the extra effort into training and implementing something complex.

    I think that simple baseline is very important and can tell you down the line whether you’re improving upon it. You know, something that you can keep rerunning if you change your data set, if you add additional data for some classes or get additional annotations, something like that. You can always rerun your baseline and see whether the more complex model that you develop later on can improve upon it.

    But without knowing how the simple model does, you don’t know what you don’t know.

    Aleksandra: What would be a simple model in contrast to a complex model? Do you have an example or something that you worked on?

    Heather: It’s really gonna depend on what it is you’re trying to do.

    Aleksandra: These are words of wisdom: it depends on what you’re trying to do with it.

    Heather: That simple baseline will depend on what it is you’re trying to model.

    But let’s come up with a simple scenario. Let’s say you’re trying to classify patches of an H&E image by tissue type: tumor, non-tumor, stroma, whatever your classes are. So, you’re just trying to classify image patches.

    [00:12:00] So a simple baseline in this scenario, maybe that is grabbing a pre-trained model. It doesn’t even have to be pre-trained on histology. That means it’s been trained for some other purpose on some other type of imagery. It’s not tuned to histology necessarily, but you can still apply it to extract features.

    And then you’ll get some features from each image patch, and you have your annotations for each image patch, and maybe you apply a simple linear model. Logistic regression, for anybody who’s familiar with classifiers, is about the simplest there. So, from those features, you try and predict the class and you see how it does. You are using deep learning, but you’re not learning all the features in that very complex and heavyweight deep learning model.

    You’re just seeing how it does out of the box. That’s one of the simplest baselines that you could try for a classification scenario like that.
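
    To sketch what that baseline might look like in code, here is a minimal, illustrative version using an ImageNet-pretrained ResNet-18 from torchvision as a frozen feature extractor and scikit-learn’s logistic regression as the linear classifier; train_patches and train_labels are hypothetical placeholders for your annotated patch data.

        import torch
        from torchvision import models, transforms
        from sklearn.linear_model import LogisticRegression

        # Pre-trained backbone (not tuned to histology), used only to extract features.
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        backbone.fc = torch.nn.Identity()  # drop the classifier head; outputs 512-d features
        backbone.eval()

        preprocess = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ])

        @torch.no_grad()
        def extract_features(patches):
            # patches: list of HxWx3 uint8 RGB arrays (assumed input format)
            batch = torch.stack([preprocess(p) for p in patches])
            return backbone(batch).numpy()

        # Simple linear probe: logistic regression on frozen features.
        features = extract_features(train_patches)  # train_patches: placeholder data
        clf = LogisticRegression(max_iter=1000).fit(features, train_labels)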

    Aleksandra: So, I have an example. It’s not deep learning, but it’s an example of the same principle. When I used to work in the classical computer vision space for a tissue image analysis company, I was on the other side of where you are now.

    I was the pathologist working with computer scientists and image analysis scientists, [00:13:00] and there was this push for always having to segment cells for quantifying IHC, where many times it would deliver the same information.
    Usually some percentage of area stained, if you would just analyze pixels. But somehow that was regarded as too basic, and then you would run into all those problems in cell segmentation, where the segmented cells didn’t match what the cells looked like on the tissue.

    And those would be my comments in QC: no, these are not the cells, you’re over-segmenting, how about we just quantify the stain? And there was a big pushback, because this was regarded as too simplistic, even though you could do the same correlations. If it was analyzed in the same area, you would get basically the same amount of information.

    So, I guess the principle doesn’t change, even if the methods change.
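
    For reference, quantifying percent area stained without segmenting cells can be as simple as the following sketch, which assumes scikit-image’s color deconvolution for a DAB-stained IHC image; the threshold is illustrative and would depend on staining and calibration.

        import numpy as np
        from skimage.color import rgb2hed

        def percent_dab_positive(rgb_image: np.ndarray, dab_threshold: float = 0.02) -> float:
            # Unmix the RGB image into Hematoxylin, Eosin, and DAB channels.
            hed = rgb2hed(rgb_image)
            dab = hed[:, :, 2]
            # Fraction of pixels above an (illustrative) DAB intensity threshold.
            return float(np.mean(dab > dab_threshold))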

    Heather: Yeah. Start simple and from there, you know how it performs and chances are you will need to develop something more complex to get a good model. But you at least have some direction on where you might go and what your [00:14:00] challenges are.

    Are you dealing with class imbalances, where a small portion of it is tumor and most of it is non-tumor? And how are you gonna handle that in your next iteration? It just starts to bring up some of the challenges that you can think about for the next step.

    Aleksandra: A question on this part of the model development: how do you track your progress? And a question to you as well: do you have tools, do you recommend tools, or do you use tools for tracking progress in machine/deep learning model development for pathology? Is it just a simple spreadsheet, or do you need some specific tools? What’s your approach to that?

    Heather: Any of the above can work. It depends on the complexity of what you’re getting at.

    If you’re working with a team, and this is a long, ongoing project and you’re gonna be training many models, then you probably don’t wanna be putting it in a spreadsheet. Maybe you do that for your first simple baseline, and then you find a better tool as things get more complex. There’s a number of tools out there.

    There’s Weights & Biases, there’s Comet, there’s a number of other ones. [00:15:00] I don’t necessarily have a favorite, other than: find something that works for you and your team, depending on what your needs are on the rest of the ML operations side and how you integrate the tools there. Something is better than nothing, and it keeps track of all your metrics over many different iterations and maybe even links them to a particular version of your data set, a particular version of your code.

    Everything is tied together and could be rerun if needed.
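
    As one example of what that looks like in practice, here is a minimal sketch of logging runs with the Weights & Biases Python client; train_one_epoch and evaluate are hypothetical stand-ins for your own training and validation steps, and the project name and config values are made up.

        import wandb

        # One tracked run per training job; config records the hyperparameters used.
        run = wandb.init(
            project="patch-classifier",  # hypothetical project name
            config={"lr": 1e-4, "batch_size": 32, "model": "resnet18-linear-probe"},
        )

        for epoch in range(10):
            # model, train_loader, and val_loader are placeholders for your own setup.
            train_loss = train_one_epoch(model, train_loader)  # placeholder training step
            val_auc = evaluate(model, val_loader)              # placeholder validation step
            run.log({"epoch": epoch, "train_loss": train_loss, "val_auc": val_auc})

        run.finish()  # metrics and config stay tied to this run and can be revisited later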

    Aleksandra: So, have you helped with such a big project as well, and I assume you have used those tools?

    Heather: Some of my clients definitely use them. It depends on what level of engagement I have with them and whether I’m involved in that side. For even a moderate-size project, it’s definitely easier to be using a tool to keep track of things.

    Aleksandra: Yeah, I think this type of software is, let’s say, relatively new. Maybe it’s not new, because machine learning has been around for a longer time, but it has not yet been brought to the attention of computational pathology and digital pathology scientists, especially on the non-computer science side.

    [00:16:00] I just recently learned about those tools and I’m like, wow, those tools have been around. I compare it to image management systems. Now nobody questions the necessity of an image management system, but for a long time it was putting those images in folders manually and naming them manually, and nobody even does that anymore. I think we’re gonna get to using the machine learning/deep learning project management tools more often in the computational pathology space as well.

    Heather: Yeah, and those tools for machine learning, at least the ones to manage models and results, are much more recent, coming with the development of deep learning, because prior to that, there was a limited number of parameters to tweak.

    So, I’m talking about hyperparameters, like the size of your model and other configuration like that, which you might need to do a search over to find what the best parameters are for your particular model. And so, when you’re searching over many different hyperparameters like that, that’s where using the tools makes a lot more [00:17:00] sense than a spreadsheet.

    Aleksandra: So what’s the number of possible parameters? What’s the ballpark in a complex model in a long-term project? How many of these would we be trying to monitor?

    Heather: There’s parameters and there’s hyperparameters. Parameters are the weights in the model that the deep learning model would need to learn, and that’s in the millions, tens of millions, or hundreds of millions.

    Aleksandra: Okay.

    Heather: And that is not something we’re learning by hand. That’s something where you give it data in order to learn those. Hyperparameters are things like the size of your model, you know, how wide the layers are, how deep they are, what size the individual filters in it are.

    So, like a receptive field: what size of window to look at as it’s capturing different patterns in your image. It’s also things like what optimizer to use, how much image augmentation, so how many random transformations to apply to your image and which transformations to apply. All of those can be regarded as hyperparameters.

    You can take a guess at them, and if you’re experienced, you might guess in the right ballpark, but generally you need to tweak them at least somewhat to get [00:18:00] to an optimal model. And you need to be looking at the results from the previous model you trained to figure out how to tweak them, if you’re doing it by hand, or to throw a hyperparameter optimizer at it in order to learn over a large range what those should be. And it can do it randomly: pick a set, see how it does, randomly pick another set, see if it’s better. Or it can be a more systematic way to learn them based on the results of the previous hyperparameters. But it depends on the model, which ones are relevant and which ones you even want to spend the time and the computation to learn.
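
    To illustrate the random variant, here is a minimal sketch of a random hyperparameter search; train_and_validate is a hypothetical function that trains a model with the given configuration and returns a validation metric, and the search ranges are purely illustrative.

        import random

        # Illustrative search space; each entry samples one hyperparameter value.
        search_space = {
            "learning_rate": lambda: 10 ** random.uniform(-5, -2),
            "batch_size": lambda: random.choice([16, 32, 64]),
            "augmentation_strength": lambda: random.uniform(0.0, 1.0),
        }

        best_score, best_config = float("-inf"), None
        for trial in range(20):
            config = {name: sample() for name, sample in search_space.items()}
            score = train_and_validate(config)  # placeholder: returns a validation metric
            if score > best_score:
                best_score, best_config = score, config

        print(best_config, best_score)

    The more systematic alternative, choosing the next configuration based on the results of previous ones, is what dedicated optimizers such as Optuna, or the sweep features of the tracking tools mentioned above, implement.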

    Aleksandra: Yeah. I think this is something that is already out of the scope of a non-computational scientist. To me, I’m overwhelmed with this. For classical computer vision, you are on par with the model developer and you’re tweaking this together.

    And then with deep learning, the moment I realized how many different variables there are to be changed... And now you just [00:19:00] mentioned that you can actually automate it and do it in a random way. That’s when I think: I need an expert. Is there an expert to help me with that?

    And this is basically where this bridge between computer science and pathology needs to be built. So okay, we know those biggest pitfalls. What else? Are we good to go, or is there anything else?

    Heather: The next one that comes to mind is not thinking ahead towards validation. Sure. It’s great if you understand your data.

    You’ve trained some model, but how are you gonna know whether it’s a good model? The first thing in the machine learning world is that you need to be sure about your data: even your training set, you wanna split into different subsets. We typically call them the training, the validation, and the test set. The training set is what you train your model on; the validation set is where you evaluate the hyperparameters that you set.

    So, that goes with the optimization we just talked about. And then the test set you set aside, and you don’t look at it until later. That’s the set that hasn’t been touched, but at the end you can evaluate how your model’s doing on there. So, you wanna be sure you have split out those sets before you start training, and [00:20:00] that you have made those divisions very carefully, in that if you’ve got multiple images from the same patient, they’re not spread across the sets; they’re all in one of them.

    And maybe even if you’ve got data from different medical centers, maybe those all should be in only one of those sets instead of spreading across ’em as well.

    Because that will allow you to evaluate how well your model does on a medical center that is different than the one you trained on. So those are very key in order to set up model training in order to evaluate how well you’re doing. But you also wanna think a step further, which is how are you going to evaluate this on maybe an external cohort of data, maybe images from a different scanner, different patient populations, if those are things that your model will encounter once it’s deployed, you need to be validating on those.

    And so, you need to be thinking ahead about how to collect that data, how to annotate it, and have it ready for when you need it, cuz you don’t wanna be doing that at the last minute. And you wanna be able to understand whether your model can accommodate the potential domain [00:21:00] shifts, which, hopefully after understanding your data, you know what they are and wanna validate against.
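
    As a small sketch of the patient-level splitting described above, assuming scikit-learn and NumPy arrays, where groups holds a patient (or medical center) ID for each image:

        import numpy as np
        from sklearn.model_selection import GroupShuffleSplit

        # X: image paths or features, y: labels, groups: patient ID per image (NumPy arrays).
        outer = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
        dev_idx, test_idx = next(outer.split(X, y, groups))  # test set: untouched until the end

        # Split the remaining patients into training and validation sets.
        inner = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
        rel_train, rel_val = next(inner.split(X[dev_idx], y[dev_idx], groups[dev_idx]))
        train_idx, val_idx = dev_idx[rel_train], dev_idx[rel_val]

        # No patient appears in more than one of the three sets.
        assert set(groups[train_idx]).isdisjoint(groups[val_idx])
        assert set(groups[train_idx]).isdisjoint(groups[test_idx])
        assert set(groups[val_idx]).isdisjoint(groups[test_idx])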

    Aleksandra: So, I remember, I think it was 2018 maybe, I was at a conference with my computer scientist friends. It was a pathology conference where pathologists were presenting their data. And I remember this friend of mine coming to me and saying, tell your colleagues that you cannot present stuff that was only run on your training set; where’s your validation set? So there was no awareness that you cannot just fantastically tweak your image analysis solution, whether classical computer vision or whatever, to a data set where it has seen all the samples, and then present that as results. But because this was not a conference for computer scientists, and I only had this friend of mine there because we worked together, I didn’t catch it. Nobody caught it, only the people who had a background in this. Now I think it’s more mainstream knowledge.

    Everybody who was [00:22:00] doing this has heard at least once about the training, test, and validation set, in whichever combination it is used. But I can imagine there are also similar little traps like this, like different ways of validation, things where you should ask an expert how to do it so that it’s not only scientifically sound but really performs.

    Heather: The stratifying by medical center, or whatever that subgroup is that you’re concerned about whether your model can generalize to, that part’s not obvious to an outsider. It’s not obvious to somebody who hasn’t touched medical data before.

    Aleksandra: See, I was actually thinking, oh no, you should not do that. When you mentioned one patient in one data set, I was like, why? How is it supposed to perform? Why don’t you spread it? What’s the reason? It’s not obvious. It wasn’t obvious to me. Maybe you can comment on that.

    Heather: It comes down to what’s called overfitting. So, if your model sees a set of patients, it can learn properties of the images of those particular [00:23:00] patients. And so, if you give it another image from one of those patients, it will probably do better on that image than on images from patients it hasn’t seen.

    And the same would be true of different scanners, if the colors have changed, or there might be something unique to a particular medical center; maybe the pattern of artifacts in your images is different from one medical center to another. And so, the model might pick up on that and learn something about the medical center instead of about what’s in the patient’s image.

    Aleksandra: See, I didn’t know that. I thought you should mix them all together, spread them equally, and there you go, it’s gonna be generalizable. I totally didn’t think that by keeping a potential batch together in one set, you expose whether a model trained on this data performs poorly on data that doesn’t have this batch effect. Thank you for this insight.

    Heather: You’re trying to measure the realistic performance. That’s the way to think about it.

    If you’re in the real world, your model is gonna see images from different patients or different medical centers or different characteristics that I can’t even come up with. [00:24:00] That’s what you need to be validating against to see how it performs in that scenario. This has become that much more important with deep learning, with models, with millions of parameters.

    It’s that much easier to overfit. In the more traditional machine learning days, where maybe you were looking at the size of nuclei and shape and intensity, different things like that, it would be harder to overfit to a particular patient or to a particular medical center. But you throw a deep learning model with millions of parameters at it, and I guarantee you it’s capable of overfitting to anything that’s in there. If there are pen markings circling the tumors, it’ll learn that.

    Aleksandra: Yeah. I guess often the batch-effect-causing features are pretty strong and they get rewarded, because you can easily distinguish your images by that very feature. So we have the four biggest mistakes. What’s your fifth one?

    Heather: And the last one I have is definitely not the last one in importance, perhaps the first one in importance, but it’s not fully understanding how the technology will ultimately be [00:25:00] used. Maybe this should actually come first. This is: how will the model be integrated into the clinical workflow, or whatever its use case is? Understanding who will need to work with it and how they will use it. Is it the AI going first and then the clinician screening the results, or vice versa?

    Is there a clinician even looking at the results, or is the model expected to be fully automated? All of that needs to be understood so that you can come up with a proper way to validate it, so that you understand what challenges might be present in that real-world scenario, and so that you are actually solving a relevant problem.

    So, as a machine learning engineer, I could probably train a model on an image to do something, but that doesn’t mean it’s going to be useful. I cannot determine what’s useful; I’m outside the medical world. But it needs to solve a relevant problem for those who need to use the technology and who will be working with it on a day-to-day basis sometime in the future, hopefully, and it needs to be integrated in a way that will help their workflow. [00:26:00]

    Aleksandra: Definitely super important: the intended use defines how you’re gonna deal with validation and everything else. It used to be simpler. Now all the aspects have to be taken into account, like integrating into existing systems. Some of those models can be created in a software, but how do you integrate one that was not created in a software into a software?

    How do you deploy it in the workflow? The more powerful the technology, the more complex it gets, so the more domain experts or subject matter experts you need to involve. And that’s fascinating about digital pathology: even though it’s called pathology, the pathologist is just one member of this team; the machine learning scientist is another one, but there are a few other ones that have to be consulted and work together. So Heather, thank you so much for giving us those five biggest mistakes. I hope this is gonna make the job easier for those who have listened to this [00:27:00] episode, that they will be aware of them.

    Because it’s March, and because I’m interviewing female leaders, I wanted to ask you about your experience as a female machine learning and computational pathology scientist. When you go to conferences, be it a pathology conference or, I assume even more so, a machine learning conference, you will see more men than women. Just tell me about your experience in this field.

    Were you ever made conscious of being a female in this field, other than just looking around and seeing mostly guys around?

    Heather: I’ve always seen mostly guys around in my field from undergrad on. I went to an all-girls high school, so I went from all female to mostly male starting in undergrad. I was in a computer science undergrad within the math department, so I don’t know what the percentage of females was, but it was pretty low, and it depended on the department within that.

    But going from undergrad to master’s to working in this field, it’s always been mostly men. Maybe it’s because I came from an all-girls high school, but I was just used to [00:28:00] doing what I do. It never occurred to me that girls shouldn’t do math or computers or anything like that. I just did what I wanted to do.

    I did what I’m interested in, and I continue to do that. I think within computer science, and even within subfields of machine learning and computer vision, the percentage of females varies depending on what you’re working on. There are some applications that are much more abstract, like in the math department. Things like pure math are very heavily male. But as you get into more medical applications, or things where you get closer to impact, and this is just my observation, I don’t have any numbers to back it up, there tends to be a somewhat higher percentage of females in some of these application areas, closer to helping patients or closer to whatever the impact of your work is.

    Aleksandra: Sounds great. This is the best experience one can have as a female researcher: to never be made conscious or aware of, or feel different because of, your gender.

    Heather: Yeah.

    Aleksandra: This is great.

    Heather: I’m aware of it, but I don’t think I do anything different [00:29:00] because of it, necessarily. Some of my courses in grad school, I was the only female in, but it just depended on the class size, and it didn’t stop me from taking the course that I wanted to take. I know what I wanna do in my career and I go for that.

    Aleksandra: Heather, where can people find you?

    Heather: Two places. One is my website, pixelscientia.com. So, P-I-X-E-L-S-C-I-E-N-T-I-A dot com. On there you can find the newsletter that Aleks mentioned, as well as the podcast, which is Impact AI. And you can also look me up on LinkedIn, Heather D. Couture. You can connect with me there. I post regularly and I’m happy to chat with anybody on LinkedIn.

    Aleksandra: And who are your perfect customers? Who can you help most in their digital and computational pathology or product development journey? If your phone should start ringing after this episode, who would you wanna have phone calls from?

    Heather: My ideal clients are startups who are applying machine learning to some type of pathology images, and they might just be getting started, or they might have a team established, but they’re hitting on certain [00:30:00] challenges.

    So, in those teams, they often bring on machine learning engineers who haven’t touched pathology images before, just because it is so hard to hire in this space. So, bringing in the context of somebody who has worked with this type of imagery can be very beneficial. Or if you’re just getting started and don’t know what you don’t know, that’s also a great time to start talking and see how we can work together.

    Aleksandra: So, what is the framework that you usually use? You mentioned that there are different levels of involvement. How can you get involved with those businesses?

    Heather: So mostly I work in an advisory capacity. It will depend on exactly what the client needs. Do they have a great team and just need help on a certain issue?

    Then we set up a one-time project. Or do they wanna talk once a month at a high level, just to make sure they’re going in the right direction and keeping on top of the latest research and best practices?

    Or is their team much more junior and they need more regular meetings to review results, to make suggestions on what they should do next, to guide them in their [00:31:00] literature search, even at the beginning, to ask questions about their data, so that their engineers know what to look for and know how to analyze it, and different things like that.

    Aleksandra: Thank you so much. I’m gonna link to all the resources that we mentioned, your website, our previous episodes in the show notes of this episode. And for those who are not subscribed yet, Heather has a newsletter, and she has a podcast as well.

    So go ahead and check those resources. Thank you so much, Heather. Have a great day.

    Heather: Great. Thank you for having me, Aleks.

    Thank you for staying till the end. You rock and you’re a real digital pathology trailblazer. So, no matter if you’re ready for Heather’s services and consulting, or you are doing this on your own, now you know the five mistakes.

    So, let’s avoid them and let’s advance digital and computational pathology in the healthcare system, and I’ll talk to you in the next episode.
