Have you ever done a bunch of annotations for your deep learning model, only to realize that your model was confused, that you had to redo all of them, and that you were not sure why?
I know I have! There was one project I was advising on where an external annotator was hired to do the annotations. He was a pathologist with a lot of work experience, and the annotations of pathological structures he made were scientifically correct from the pathology point of view, but not necessarily from the computer vision point of view. He annotated loosely circled structures of interest. “Loosely circled” in this context means that the object of interest was within the annotation, but so was a bunch of pixels surrounding it that did not belong to the class he was annotating. These annotations had to be redone so they would not confuse the model. This brings us to the:
1st rule of annotations for deep learning model development:
Keep your classes clean
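To make this concrete, here is a minimal sketch in Python (a toy illustration of my own, not code from the project above) that quantifies how “clean” an annotation is: it measures the fraction of annotated pixels that actually fall on the structure of interest, comparing a tight annotation with a loosely circled one.

```python
import numpy as np

# Minimal sketch (hypothetical data, not from the project described above):
# compare a tight annotation mask with a "loosely circled" one and measure
# how many annotated pixels actually belong to the class of interest.

def annotation_purity(annotation_mask: np.ndarray, true_object_mask: np.ndarray) -> float:
    """Fraction of annotated pixels that fall on the object of interest."""
    annotated = annotation_mask.astype(bool)
    true_obj = true_object_mask.astype(bool)
    if annotated.sum() == 0:
        return 0.0
    return float((annotated & true_obj).sum() / annotated.sum())

# Toy example: a 100x100 tile with a 20x20 structure of interest.
true_object = np.zeros((100, 100), dtype=bool)
true_object[40:60, 40:60] = True

tight = true_object.copy()        # annotation hugs the structure
loose = np.zeros_like(true_object)
loose[25:75, 25:75] = True        # "loosely circled" annotation

print(f"tight annotation purity: {annotation_purity(tight, true_object):.2f}")  # 1.00
print(f"loose annotation purity: {annotation_purity(loose, true_object):.2f}")  # 0.16
```

The loose annotation here contains mostly off-class pixels, which is exactly the kind of confusion the rule is meant to prevent.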
Are you ready to see a real-life demonstration of how to keep your classes clean, along with some more annotation tips and tricks?
Register to view the on-demand webinar here
Maybe you have had the situation where you did annotations for deep learning model development and they were very detailed. Your classes were clean and no confusing pixels were included within your annotations, but your model still didn’t perform?
Something similar happened to me in one of my first deep learning projects, where we were working on a colitis model. The first set of annotations we did was really detailed, but the annotations were too small. They didn’t show the overall heterogeneity of the colon tissue, nor did they capture the transitions from one appearance of the tissue to another. This brings us to the:
2nd rule of annotations for deep learning model development:
Size matters
The size of the annotations will be partially determined by the heterogeneity of the tissue you are working with.
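If it helps to see this as a quick check, here is a hypothetical sketch (my own assumption about how you might quantify it, not something from the webinar): it counts how many training tiles of the network’s input size fit inside an annotated region, since an annotation that yields only a tile or two can hardly capture tissue heterogeneity or the transitions between appearances.

```python
import numpy as np

# Hypothetical helper (an assumption, not from the webinar): estimate how many
# training tiles of the network's input size fit inside an annotated region.

def tiles_in_annotation(annotation_mask: np.ndarray, tile_size: int = 256) -> int:
    """Count tile positions (on a non-overlapping grid) fully covered by the annotation."""
    h, w = annotation_mask.shape
    count = 0
    for y in range(0, h - tile_size + 1, tile_size):
        for x in range(0, w - tile_size + 1, tile_size):
            if annotation_mask[y:y + tile_size, x:x + tile_size].all():
                count += 1
    return count

# Toy example: a small 300x300 annotation vs. a larger 1500x1500 one.
small = np.ones((300, 300), dtype=bool)
large = np.ones((1500, 1500), dtype=bool)
print(tiles_in_annotation(small))   # 1  -> probably too small to show heterogeneity
print(tiles_in_annotation(large))   # 25 -> more room for varied tissue appearance
```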
If you are ready to see a real-life demonstration of how the size of your annotations matters, along with some more annotation tips and tricks
Register to view the on-demand webinar here
You now know the first (keep your classes clean) and the second (size matters) rules of annotating for deep learning model development. But is there a rule for how many annotations you have to make to develop a robust deep learning AI model for tissue image analysis? How many annotations are enough? Will a few annotated regions be sufficient, or do we need to annotate a few thousand? And is more always better?
Before I learned the computer vision principles relevant to tissue image analysis, I thought that annotating proportionally to what I saw in the tissue would be good enough, so I annotated many healthy structures and the few diseased structures I could find. I was convinced that if the number of my annotations correlated with the number of structures of interest in the tissue, I would have an adequate amount of data annotated. Unfortunately for deep learning model development, the rules are slightly different. The scarcity of diseased structures in medical (including pathology) data sets is known as class imbalance, and it is a problem for deep learning model development. We have to correct for this imbalance with the number of annotations we make. This brings us to the:
3rd rule of annotations for deep learning model development:
Look out for class imbalance
Class imbalance can be accounted for during annotation by over-annotating the underrepresented class or under-annotating the overrepresented class.
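As a minimal sketch (toy labels and numbers of my own, not data from a real project), here is one way to quantify the imbalance in your annotations so you know which class to over- or under-annotate:

```python
import numpy as np

# Minimal sketch (hypothetical labels): quantify class imbalance in the annotated
# data. Here the label mask uses 0 = unannotated, 1 = healthy, 2 = diseased.

def annotated_class_counts(label_mask: np.ndarray) -> dict[int, int]:
    """Pixel count per annotated class (label 0 is treated as unannotated)."""
    labels, counts = np.unique(label_mask, return_counts=True)
    return {int(l): int(c) for l, c in zip(labels, counts) if l != 0}

# Toy example: healthy tissue is heavily over-represented in the annotations.
label_mask = np.zeros((1000, 1000), dtype=np.uint8)
label_mask[:, :900] = 1      # healthy annotations
label_mask[:100, 900:] = 2   # a few diseased annotations

counts = annotated_class_counts(label_mask)
print(counts)                                  # {1: 900000, 2: 10000}
ratio = max(counts.values()) / min(counts.values())
print(f"imbalance ratio: {ratio:.0f}:1")       # 90:1 -> over-annotate the diseased class
                                               # (or under-annotate the healthy one)
```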
If you are ready to see a real-life demonstration of how to take care of class imbalance with annotations, along with some more annotation tips and tricks
Register to view the on-demand webinar here
If you stuck to the first three rules but still drifted with your ground truth generation, I feel you. I’ve been there. At the beginning of a project, I would have a plan: I knew exactly what I wanted to annotate and how I wanted to do it. I wanted the annotation plan to be fixed before the project began and followed throughout. However, in deep learning tissue image analysis projects, I had to change my approach in the midst of model development, and that annoyed me.
Then I realized that the “fixed annotation plan” approach worked well for classical computer vision projects, but not necessarily for deep learning AI-based projects. In deep learning model development, you end up adjusting your ground truth as you go, and that’s OK. This brings us to the:
4th rule of annotations for deep learning model development:
Being wrong is OK, but staying wrong is not
Adjusting your ground truth during model development is a normal part of the development itself. With our annotations we give the model examples, we seek feedback from the model by evaluating its performance, and then we adjust the ground truth again. However, at the end of the project, after completing all the necessary annotation cycles, we need to go back and make everything consistent with our final version of how we want the ground truth to be annotated. Even though there will be drift and shift during the process, we need to go back and clean up before we finalize the model training and move on to the testing phase.
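Here is one hypothetical way to check how much cleanup is left (a sketch of my own, not the workflow from the webinar): compare an early annotation round with the final labeling convention and report the fraction of annotated pixels that still disagree.

```python
import numpy as np

# Hypothetical cleanup check (an assumption, not the author's workflow): compare an
# early annotation round with the final labeling convention and report how much of
# the annotated area still disagrees. Label 0 = unannotated.

def annotation_drift(early_labels: np.ndarray, final_labels: np.ndarray) -> float:
    """Fraction of annotated pixels whose class changed between rounds."""
    annotated = (early_labels != 0) | (final_labels != 0)
    if annotated.sum() == 0:
        return 0.0
    changed = annotated & (early_labels != final_labels)
    return float(changed.sum() / annotated.sum())

# Toy example: a region that was relabeled from class 1 to class 2 mid-project.
early = np.zeros((100, 100), dtype=np.uint8)
final = np.zeros((100, 100), dtype=np.uint8)
early[:50, :] = 1
final[:50, :] = 1
final[:10, :] = 2   # final convention reassigns part of the region

print(f"{annotation_drift(early, final):.0%} of annotated pixels still need review")  # 20%
```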
If you are ready to see a real-life demonstration of how to approach ground truth generation drift and keep your final annotations consistent
Register to view the on-demand webinar here