
    Digital Pathology 101 Chapter 3 | Image Analysis, Artificial Intelligence, and Machine Learning in Pathology


    Image analysis has supported pathology since whole slide scanners were first introduced to the market, and when deep learning entered the computer vision scene, tissue image analysis gained superpowers.

    Regulatory-compliant, AI-based image analysis tools are now available for pathology practice around the globe.

    So what should you do, just embrace them and start using them?

    I would learn a bit about image analysis and AI first, to be able to make an informed decision.

    Good news: you can get all the information needed for this informed decision from this very chapter of the “Digital Pathology 101” book that I have published for you.

    From Chapter 3 you will learn the fundamentals of tissue image analysis and how it helps extract meaningful data from digital pathology images.

    We break it down into basic concepts like

    • regions and objects of interest,
    • matching computer vision techniques to pathology tasks, and
    • the differences between classical machine learning and AI-based deep learning approaches.

    Understanding these foundations sets the stage for appreciating how image analysis is applied in regulated clinical settings versus exploratory research environments. You will learn the importance of quality control, because flawed data inputs inevitably lead to faulty outputs, regardless of the analysis method used.

    Moving on, you will familiarize yourself with the key terminology from the world of artificial intelligence and machine learning.

    The chapter clarifies the meaning of concepts like

    • supervised learning,
    • GPUs,
    • data augmentation, and
    • heat maps.

    It emphasizes how techniques like

    • patching and
    • data augmentation

    enable the training of machine learning algorithms on large datasets.

    Ultimately, by comprehending this terminology and the basics of tissue image analysis, you’ll gain clarity on how these tools can provide decision support to pathologists through computer-aided diagnosis. Rather than seeing AI as a black box, you’ll have insight into how it arrives at its outputs.

    With this balanced understanding, you’ll be equipped to make discerning choices about embracing AI tools in your pathology practice, leveraging their benefits while being aware of current limitations.

    Stay tuned as we continue unpacking the transformative potential of digital pathology!
    Talk to you in chapter 4!



    Transcript

    CHAPTER 3: IMAGE ANALYSIS, ARTIFICIAL INTELLIGENCE, AND MACHINE LEARNING IN PATHOLOGY

    I. Introduction

    As we continue our exploration of the transformative potential of digital
    pathology, we are met with two complementary fields: image analysis and artificial
    intelligence. Together, these disciplines have created a fertile ground for the application of machine learning in the field of pathology. This chapter explores the essence of these fields, the ways they interlink, and how they are being utilized in modern pathology.

    Digital pathology, enabled by the high-resolution imaging of pathology slides, has
    made pathology data accessible for computational analysis. Traditional
    histopathological evaluation, while robust and time-tested, is not without its limitations, which include subjectivity, variability, and time consumption. With the advent of powerful computing systems and algorithms, a new approach evolved – one that uses the principles of image analysis, machine learning, and AI to augment traditional diagnostic methods.

    This chapter offers an overview of these concepts, delving into the intricacies of
    tissue image analysis, and how classical and AI-based approaches are being adopted. It
    also touches upon the importance of quality control in an environment inundated by
    data. Furthermore, it details how these technologies are used in both regulated and
    non-regulated environments, as well as the challenges and opportunities that lie therein.

    The aim of this chapter is to form a connection between these high-level
    computational domains and everyday pathology practice, shedding light on their integral part in the continual digital evolution of this field. Let’s commence our exploration of the dynamic interface where pathology, image analysis, and artificial intelligence converge.

    II. Tissue Image Analysis

    A. Introduction to Tissue Image Analysis

    Tissue image analysis is an integral component of digital pathology, serving as the interpreter of the vast amounts of data contained within digital images of tissue samples. This process involves applying computer vision algorithms to translate visual data into quantifiable measurements. By doing so, we can extract, organize, and analyze a wealth of intricate details that could otherwise be overlooked or inconsistently interpreted by the human eye. This, in turn, enhances the precision and reliability of diagnostic and research outputs. Tissue image analysis doesn’t just speed up the manual evaluation; it significantly augments our ability to perceive and understand what’s on the image. By enabling the extraction of intricate patterns and details that might otherwise go unnoticed in traditional visual evaluation, it unravels new insights, propelling our comprehension and handling of diseases to a new level.

    Despite the significant capabilities of tissue image analysis, it’s crucial to remember that this powerful tool is designed to resolve specified, narrow tasks. The process demands upfront investment in model or algorithm development, requiring the defined task of the tissue image analysis solution to be well-outlined at the project’s inception. A digital pathology scientist or a pathologist performs a wide range of duties – a spectrum of tasks far beyond the scope of current image analysis capabilities. As such, it is unrealistic to expect AI-powered image analysis to replace these professionals. However, AI can undoubtedly assist in performing well-defined, routine tasks like detection, classification, counting, and measurement of structures and areas within the tissue.

    For instance, tissue image analysis can aid in quantifying positive IHC-stained cells or detecting different types of special stains. However, it’s essential to be aware that it isn’t a one-size-fits-all solution. When addressing a problem that is not simply binary (for example, a question that extends beyond distinguishing a marker positive cell from a marker negative cell), the development of the model requires considerable planning and effort. You might need to build several image analysis algorithms that answer different binary questions in a sequence. Consequently, while tissue image analysis is an extraordinarily powerful tool in pathology, it comes with its set of limitations that users must understand and consider in their implementation plans.

    B. Computer Vision to Pathology Vision Translation

    Our vision system effortlessly identifies and makes sense of the shapes, structures, and objects around us. The automaticity of this process often leaves us oblivious to the complexity of these operations. The field of computer vision aims to emulate this sophisticated human process of vision in machines, translating the holistic perception of the world into a structured, stepwise analysis that computers can execute.

    Consider the example of identifying giraffes in a photograph. As humans, we intuitively perceive all the giraffes concurrently and discern each one from the other. Translating this capability into computer vision terms, the most advanced approach would be ‘instance segmentation,’ where each giraffe is detected, segmented, and considered a separate entity, allowing us to capture nuanced differences among them.

    A slightly less intricate approach would be ‘semantic segmentation,’ where all giraffes are classified as a single category, losing individual distinction but simplifying the process. The most rudimentary method would involve simply drawing a ‘bounding box’ around each giraffe, labeling them, thus giving a general sense of location and identity, but little else.

    These diverse methods represent different levels of sophistication in computer vision, mirroring the complexity of human visual perception to varying degrees. This translation from human vision to computer vision—a concept we refer to as ‘Computer Vision to Pathology Vision Translation’—enables us to communicate the rich and intricate details of our visual perception in a language that machines understand and process. The ultimate goal is to use this powerful tool to facilitate more accurate, efficient, and insightful analyses in fields like digital pathology.

    C. Pathology Vision Problems: Structures of Interest and Tasks of Interest

    As we apply image analysis to the realm of tissue pathology, we’re tasked with clarifying the problems we aim to solve, informed by how a pathologist or a scientist perceives the image. This intricate perception, referred to as “Pathology vision,” involves the identification of structures and tasks of interest.

    Structures of interest, elements central to the work of pathologists, are broadly categorized into regions and objects. Regions are distinct areas within a tissue slide that merit focused examination due to their unique characteristics. They may embody different types of tissue, each of which possesses its own set of traits. Conversely, objects refer to discrete structures or components within the tissue, such as cells or nuclei, that require specific identification and analysis.

    These structures often inform the tasks of interest, the analysis endpoints we aim to achieve. These tasks may range from identifying disease markers within a tissue sample, to evaluating the effectiveness of a particular treatment. Tissue samples themselves can be divided into different compartments based on their characteristics. A tumor region, for instance, can be further broken down into tumor stromal and tumor epithelial regions.

    In pathology, the basic unit of analysis is often the cell. However, interest may also extend to non-cell objects, like Alzheimer’s plaques, or to cell assemblies treated as a single unit, like the kidney glomerulus. By discerning and analyzing these distinct tissue compartments and objects of interest, pathologists can extract information out of images in a structured and logical way. Choosing the appropriate method for analysis hinges on the specific task at hand.

    For instance, if our task is to determine the area of Alzheimer’s plaques, we would need to identify the exact boundaries of the plaques. Alternatively, if we aim for a comprehensive analysis involving the identification, localization, delineation, and counting of plaques, the computer needs to be instructed to use a different, more suitable approach.

    Tasks refer to the specific analysis endpoints we want to design our analysis for. For example, we may want to identify and count certain objects or regions of interest within the sample.

    Various computer vision techniques can be applied to pathology, allowing us to detect, localize, and quantify objects and regions of interest. Depending on our goal, we might utilize:

    ● object detection,
    ● semantic segmentation,
    ● instance segmentation, or
    ● panoptic segmentation.

    Each comes with its strengths, limitations, and computational costs.

    The selection of the most efficient and suitable technique is a critical step. As computational power remains a finite and costly resource, it’s essential to align the choice of the approach with the specific needs of our pathology image analysis. This will result in an accurate, fit-for-purpose image analysis solution that will deliver the appropriate quantitative data from pathology images.

    Therefore, understanding the various tasks involved in pathology analysis is essential for leveraging the power of tissue image analysis.

    In the case of counting Alzheimer’s plaques, the task is to accurately determine the number of plaques present in the sample. Unlike other tasks that may require more detailed information about the individual objects, such as delineating their boundaries, the goal here is simply to obtain an accurate count. This can be achieved by using bounding boxes to identify the location of each plaque within the sample.

    This approach is called object detection.
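    To make the counting task concrete, here is a minimal sketch (not from the book) of how bounding-box output is turned into a count. The box coordinates and the 0.5 confidence threshold are invented example values; real detectors expose similar per-detection confidences.

```python
# Illustrative sketch: counting detected plaques from bounding-box output.
# A detector typically returns (x, y, width, height, confidence) per
# detection; the 0.5 threshold here is an arbitrary example value.

def count_detections(boxes, min_confidence=0.5):
    """Count detections whose confidence passes the threshold."""
    return sum(1 for (x, y, w, h, conf) in boxes if conf >= min_confidence)

detections = [
    (120, 80, 30, 28, 0.92),   # plaque, high confidence
    (400, 210, 25, 31, 0.77),  # plaque, high confidence
    (55, 300, 18, 20, 0.31),   # likely a false positive
]
print(count_detections(detections))  # 2
```

    Note that only the count survives: the boxes give location but no boundaries, which is exactly the trade-off described above.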

    If we want to identify and quantify the exact region where the plaques are present, we may take a different approach. In this case, we are interested in calculating the area of the plaques.

    To achieve this, we need to obtain the exact boundaries of the plaques, as this information is necessary for accurately calculating their area. While it may be helpful to know whether multiple plaques are clustered together, for the purposes of calculating the area, it is not necessary to separate them. Instead, we can focus on obtaining the correct boundaries and area measurements to achieve our goal.

    This approach is called semantic segmentation.
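    The area calculation from a semantic-segmentation mask can be sketched as follows (an illustrative toy example, not from the book; the mask values and the 0.25 micron-per-pixel resolution are assumptions):

```python
# Illustrative sketch: computing plaque area from a semantic-segmentation
# mask. 1 marks "plaque" pixels; touching plaques merge into one region,
# which is fine here because only the total area matters.

mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

MICRONS_PER_PIXEL = 0.25           # example scan resolution (assumed)
pixel_area = MICRONS_PER_PIXEL ** 2

plaque_pixels = sum(sum(row) for row in mask)
area_um2 = plaque_pixels * pixel_area
print(plaque_pixels, area_um2)     # 8 pixels -> 0.5 square microns
```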

    In some cases, we may require a comprehensive analysis that involves identifying objects, localizing, quantifying, and counting them. For instance, we may need to count the number of plaques and calculate the area of each plaque. What do we need to achieve this task? We require all the information, including the correct boundaries of each individual plaque. Unlike the previous example where multiple plaques were considered as one region, in this case, we want each plaque to be counted separately to obtain accurate results. Therefore, depending on our analysis goals, we need to instruct the computer to use an appropriate approach.

    This approach is called instance segmentation.
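    A toy sketch of the per-instance idea (not from the book): separating a binary mask into individual plaques with connected-component labeling via flood fill. A real instance-segmentation model is far more robust, but its per-instance output, a label plus a pixel count, has the same shape.

```python
# Illustrative sketch: counting and measuring each plaque separately by
# labeling 4-connected components in a binary mask (flood fill).

def label_instances(mask):
    """Return a list of pixel counts, one per 4-connected component."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                stack, size = [(r, c)], 0      # flood-fill one instance
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                sizes.append(size)
    return sizes

mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
print(label_instances(mask))  # two separate plaques: [3, 3]
```

    Here the count (length of the list) and each plaque's area come out together, which is exactly what the comprehensive analysis above asks for.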

    Computer vision has revolutionized the field of pathology by allowing us to detect, localize, and quantify objects and regions of interest in pathology images. As described above, there are several computer vision tasks that can help us solve pathology problems. One of the tasks is object detection, which involves identifying and counting specific objects, such as Alzheimer’s plaques. Another task is semantic segmentation, which focuses on detecting regions of interest, such as tumor regions or tissue compartments. Instance segmentation takes this a step further by not only detecting but also delineating and counting each instance of an object or region.

    Choosing the appropriate computer vision approach depends on the specific problem at hand. For example, panoptic segmentation is a newer approach that classifies and segments each pixel, regardless of the chosen class. However, this approach can be computationally expensive (this should not be underestimated, as processing can take up to several hours per slide) and may not always be necessary. By selecting the most efficient and appropriate approach, we can save both time and computational power.

    As computational power is still a finite resource and expensive, it is crucial to choose the correct approach to manage costs and ensure timely analysis of pathology images. Understanding the different computer vision approaches and how they match our pathology image analysis needs can help us make informed decisions in selecting the most effective and efficient approach.

    III. Classical and AI-Based Approaches to Tissue Image Analysis

    In the discipline of tissue image analysis, two primary methodologies prevail: the classical machine learning method, also known as the hand-crafted-feature approach, and the deep learning-based method (generally called AI-based). The approach one chooses hinges upon the specific requirements of the project, the available resources, and the degree of control desired over the analytical process.

    The classical (aka traditional) approach utilizes hand-crafted features, where the designer consciously instructs the algorithm on the specific properties it should quantify in the image, such as the length, size, shape, and color intensity of the cells. In this scenario, the algorithm operates within a predefined framework, which yields predictable, reproducible outcomes.
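    A minimal sketch of this hand-crafted approach (an invented toy example, not from the book): the designer explicitly decides that stained pixels are those below an intensity threshold, then quantifies explicitly chosen properties. The pixel values and the threshold of 100 are arbitrary example numbers.

```python
# Illustrative sketch of a classical, hand-crafted-feature pipeline:
# threshold a grayscale patch to find stained pixels, then quantify
# explicitly chosen properties (count and mean intensity).

patch = [  # toy 4x4 grayscale patch; low values = dark (stained)
    [240, 235, 80, 75],
    [238, 90, 85, 230],
    [60, 70, 225, 240],
    [250, 245, 242, 236],
]

THRESHOLD = 100  # hand-picked by the algorithm designer (example value)
stained = [v for row in patch for v in row if v < THRESHOLD]

count = len(stained)
mean_intensity = sum(stained) / count
print(count, round(mean_intensity, 1))  # 6 stained pixels, mean ~76.7
```

    Because every rule is explicit, the same patch always yields the same numbers, which is the predictability and reproducibility the paragraph above describes.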

    Deep learning, a subset of AI-based approaches, involves training a model through the provision of examples and annotations. Instead of explicitly defining the features, this method feeds the model a diverse array of examples that embody the desired features, and the model learns to discern intricate patterns independently. Feature extraction and classification are combined; the model learns the relevant features based on the given examples, thereby eliminating the need for manual feature definition.

    In essence, the classical method offers a deterministic, hands-on approach, with direct control over the elements that the algorithm focuses on. In contrast, AI-based approaches, particularly deep learning, promise adaptability and learning capacity, enabling models to discern intricate patterns that may escape the human eye or remain unconsidered by human operators. However, deep learning requires more computational resources and a larger dataset for training compared to its classical counterpart. Both approaches, when appropriately utilized, can yield valuable insights from tissue images and contribute significantly to advancements in digital pathology.

    These techniques can also be used in conjunction, augmenting their individual strengths. For instance, we can initially employ a deep learning approach to detect and categorize elements within an image – such as the glomeruli in kidneys – based on annotated examples. Once these elements are detected, we can then use classical machine learning methods to further classify them based on visible, hand-crafted features such as size or color.
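    The combined workflow can be sketched as below (illustrative only): assume a deep learning detector has already returned glomeruli with measured areas (the numbers are invented), then apply a simple hand-crafted rule to classify them by size. The 5000 µm² cutoff is an arbitrary example, not a biological reference value.

```python
# Illustrative sketch: deep learning detects; a classical hand-crafted
# rule then classifies the detected objects by a visible feature (size).

detected_glomeruli = [          # invented output of a detection model
    {"id": 1, "area_um2": 7200.0},
    {"id": 2, "area_um2": 3100.0},
    {"id": 3, "area_um2": 5600.0},
]

def classify_by_size(glomeruli, cutoff_um2=5000.0):
    """Label each detected object 'large' or 'small' by an explicit rule."""
    return {g["id"]: ("large" if g["area_um2"] >= cutoff_um2 else "small")
            for g in glomeruli}

print(classify_by_size(detected_glomeruli))
# {1: 'large', 2: 'small', 3: 'large'}
```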

    A. Application in Regulated and Non-Regulated Environments

    The domain of tissue image analysis sees its applications spread across both regulated and non-regulated environments. To understand this, it’s essential to define these two environments:

    Regulated environments refer to those that are under the purview of strict governmental or regulatory bodies, such as the FDA in the United States. These environments are subject to adherence to rigorous guidelines and rules, often necessitating detailed processes for approvals and certifications.

    For example, in the U.S., laboratories that perform health-related testing on human specimens are required to adhere to the Clinical Laboratory Improvement Amendments (CLIA) regulations. This regulatory framework is designed to ensure the accuracy, reliability, and timeliness of patient test results, regardless of where the test was performed.

    Moreover, for preclinical trials, labs must comply with Good Laboratory Practice (GLP) regulations, a set of principles intended to ensure the quality and integrity of non-clinical laboratory studies that support research or marketing permits for products regulated by government agencies.

    On a more generalized level, labs involved in pharmaceutical, medical device, and biotechnology product development also abide by Good Practices (GxP) regulations. GxP encompasses a broad range of compliance-related activities such as Good Clinical Practices (GCP) and Good Manufacturing Practices (GMP), ensuring that these products are safe, meet their intended use, and adhere to quality processes during their development lifecycle.

    In the specific context of tissue image analysis, these regulated environments become particularly pertinent when the analysis informs therapeutic decisions or when used in clinical diagnostics. Thus, regulated environments ensure a high level of compliance, quality, and safety, thereby guaranteeing reliable results that can guide accurate diagnosis and effective therapeutic decisions.

    Non-regulated environments, on the other hand, typically encompass research or academic settings, where the constraints are not as stringent. The primary objective in these environments is to advance scientific understanding, contributing to the pool of knowledge which may eventually lead to clinical applications.

    Most of the software tools available for tissue image analysis today are catered towards non-regulated environments, like research laboratories and academic institutions. These tools provide researchers with the freedom to experiment and innovate without the necessity for regulatory clearance. For instance, researchers might employ these tools to study the characteristics of different cell types, or to understand the growth patterns of a particular cancer type. This freedom allows them to push boundaries and make significant strides in understanding disease pathologies.

    However, there are several image analysis algorithms that have received approval or clearance from regulatory bodies like the U.S. FDA or the European Medicines Agency. These algorithms are applied in regulated environments such as hospitals and clinics in both the United States and Europe. Specifically designed to assist in medical decision-making processes, these algorithms undergo rigorous testing and verification procedures under the strict guidelines of these regulatory bodies to ensure their accuracy and reliability.

    One key application of such algorithms is in the realm of biomarker quantification. Biomarkers, which are measurable indicators of a biological state or condition, play an essential role in determining disease progression and therapeutic responses. For instance, Her2/neu is a protein that is overexpressed in some types of breast cancer. FDA-cleared algorithms can accurately quantify the level of Her2/neu expression in tissue samples, providing valuable insights that guide the course of treatment.

    Furthermore, these FDA-approved or cleared algorithms are often paired with what is known as a companion diagnostic laboratory test. A companion diagnostic is a medical device that provides information essential for the safe and effective use of a corresponding therapeutic product.

    For instance, in the case of cancer patients undergoing targeted therapy, the expression levels of a specific protein in tissue might determine the efficacy of a certain drug. A companion diagnostic test is then used to ascertain the presence and quantify the levels of this protein in patients. One notable example of this is the PD-L1 immunohistochemistry test that is used alongside certain immunotherapy drugs. The test measures the level of PD-L1 protein expression in cancer cells, and this result assists doctors in deciding whether or not the patient is likely to benefit from the specific immunotherapy drug.

    In summary, the applications of tissue image analysis span a broad spectrum, from exploration and discovery in research settings to the precision and regulatory adherence required in clinical diagnostics. With advancements in technology and a deeper understanding of disease mechanisms, the potential for new applications is vast and constantly growing, and the tools are becoming more widely available, better integrated into the pathology workflow, and less costly with every development iteration.

    B. Importance of Quality Control: Garbage In, Garbage Out

    The quality control of tissue image analysis is absolutely crucial. This is a truth that extends across all fields of scientific endeavor, encapsulated by the well-known maxim, “garbage in, garbage out”. This adage is particularly poignant in the realm of tissue image analysis. If the algorithms are irresponsibly designed or based on inferior or inadequate images (for instance, images with an excess of artifacts), or if inconsistent and poor-quality annotations are provided, the result will be fundamentally flawed, rendering it essentially unreliable.

    Developing algorithms for image analysis is not simply a matter of plugging in data and letting it run. Rather, it involves meticulously curating the data used, ensuring that each image is of high quality, devoid of distortive artifacts, and correctly annotated. Concurrently, performing thorough control checks on the quality of the output is not an optional step, but an essential part of the process.

    Skipping these stages is equivalent to shirking our responsibility as scientists. It not only compromises the integrity of the research but also has potential ramifications for patients whose diagnoses and treatments may depend on the accuracy of these analyses.

    Our responsibility goes beyond just creating and using image analysis algorithms. We also need to maintain high standards in our scientific methods and work to improve patient results. By strictly controlling the quality of our image analysis, we promote solid scientific practices and contribute to improving patient care in the future.
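    One simple input-side quality check of the kind discussed above can be sketched as follows (an illustrative example, not from the book): rejecting patches that are mostly empty background before they ever reach an algorithm. The white cutoff of 230 and the 0.9 fraction are arbitrary example values.

```python
# Illustrative QC sketch: flag a grayscale patch as unusable when too
# much of it is near-white background, so it never pollutes training
# or analysis ("garbage in, garbage out").

def mostly_background(patch, white_cutoff=230, max_fraction=0.9):
    """Return True if the fraction of near-white pixels exceeds the limit."""
    pixels = [v for row in patch for v in row]
    background = sum(1 for v in pixels if v >= white_cutoff)
    return background / len(pixels) > max_fraction

blank = [[250] * 10 for _ in range(10)]    # 100% background -> rejected
tissue = [[120] * 10 for _ in range(10)]   # fully stained -> kept
print(mostly_background(blank), mostly_background(tissue))  # True False
```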

    IV. Introduction to Artificial Intelligence and Machine Learning

    We have already referred to AI and machine learning several times and explained some of their aspects, but in this part we will structure and systematize this knowledge and delve into the key terminology that underpins this field.

    This area of study involves a significant amount of technical language. By understanding this terminology, we can develop a strong foundation for the rest of our exploration into this exciting and rapidly evolving field.

    It is worth noting that computer scientists who wish to be active partners in this discussion will also need to acquire a basic understanding of pathology, as this is a highly interdisciplinary area. As such, we must work towards bridging the gap between life scientists, pathologists, and computer scientists to facilitate collaboration and drive innovation.

    All the pathology information necessary for computer scientists and non-pathologists to thrive in the digital pathology space can be found in the “Pathology 101 for Tissue Image analysis” course or in the Digital Pathology Club.

    So, let us begin our journey by embracing the challenge of learning this technical language and laying the groundwork for a deeper understanding of artificial intelligence.

    V. What is AI?

    Artificial intelligence is a fascinating and rapidly growing field within computer science that involves creating machines that appear to have intelligence.

    That’s it. That’s the definition. As you can see it is very broad.

    AI has a wide range of applications, including problem-solving and task completion, where it can suggest the next sentence or complete a task. Additionally, AI is used in areas such as visual perception, speech recognition, decision-making, and language translation. For example, Google Translate is a widely used tool that utilizes AI to translate text from one language to another.

    Over the years, AI has made significant improvements in accuracy and efficiency. For instance, Google Translate has greatly improved from its early days, when it would often produce clumsy sentences that needed to be corrected. As a result of these improvements, more and more people are relying on AI to help them in many everyday tasks, such as navigation when traveling, text to speech and speech to text transformation, task completion and many others.

    As we dive deeper into this field, it is important to understand the different approaches to AI, such as machine learning and deep learning, as well as the ethical and societal considerations associated with the use of AI. By learning more about AI, we can appreciate its potential to transform various industries and pave the way for new discoveries and innovations.

    There are different ways to categorize AI, and one common approach is to divide it based on capability. We can classify AI into three categories: narrow AI, general AI, and strong AI.

    Narrow AI, as the name suggests, is capable of performing a single task, such as image recognition, speech recognition, or language translation. For example, when you use Google Photos, the algorithm recognizes faces and sorts images into albums based on the people depicted in them.

    General AI, on the other hand, would have human-level intelligence and be able to perform a wide range of tasks.

    Strong AI would surpass human intelligence and be capable of solving complex problems beyond human capabilities.

    Currently, we are still in the narrow AI stage, and while there are concerns about AI taking over jobs, the reality is that it is mainly used to assist humans in completing tasks. For example, AI can assist in counting brown cells stained with IHC, highlighting the neoplastic cells in tissue samples, or performing other repetitive tasks on pathology images, but it is not yet advanced enough to replace human expertise in more complex diagnostic processes. And even when AI predicts something from the image that is not visible to the human eye, as is the case when predicting molecular properties of tissue, it merely replaces a different test (a genetic test), so its scope is still pretty narrow.

    AI is a rapidly developing field that offers great potential in various industries. By understanding the different types of AI and their capabilities, we can appreciate its strengths and limitations and make informed decisions about how best to leverage it to improve our lives and the practice of pathology.

    In pathology, AI has found significant applications, especially in the area of image analysis. AI-automated image analysis algorithms can assist pathologists with detecting and classifying abnormalities in tissue samples. These AI algorithms can handle a massive volume of images, performing tasks like segmenting specific structures, quantifying biomarkers, or even predicting disease progression based on patterns in the tissue.

    However, the use of AI in pathology is not limited to image analysis. With the development of sophisticated large language models like ChatGPT by OpenAI, AI is beginning to transform other aspects of pathology as well. One such area is pathology reporting. Traditionally, pathologists generate text-based reports outlining their findings, which can be a complex and time-consuming task. AI models can assist in streamlining this process by helping to generate preliminary reports based on the pathologist’s notes, voice recordings or potentially even directly from the analyzed images.

    Moreover, AI models can assist in navigating text-based patient data, a task that can be overwhelming given the sheer volume of data in electronic health records. AI can be used to quickly search and extract relevant information from these records, improving the efficiency of data retrieval and potentially uncovering vital information that could assist in diagnosis and treatment planning. As such, the integration of AI in pathology is transforming the field, offering more efficient processes, greater accuracy, and ultimately, improved patient care.

    VI. Pathology Informatics Terminology

    Pathology informatics is a multidisciplinary field that combines pathology, computer science, and information technology. It focuses on the application and integration of these fields to process, analyze, manage, and interpret biological, pathological, and medical information.

    In the context of a pathology laboratory, informatics may involve numerous applications. For example, it could encompass the management of laboratory information systems (LIS), digital pathology and image analysis, electronic health records (EHR), and decision support systems.

    In this chapter we will focus on explaining the terms and vocabulary related to digital pathology and image analysis. While the list included here is not exhaustive and each of the terms would probably deserve a separate chapter, it is a great starting point for those who are just entering the field of digital pathology and a good recap and teaching material for those who are already involved in the field.

    So let’s dive into it!

    Machine learning: Machine learning is an integral part of artificial intelligence that focuses on the development of algorithms and statistical models that enable computers to perform tasks without explicit programming. The learning occurs as the machine identifies patterns and makes decisions from data. This process is somewhat akin to how humans learn from their experiences. There are primarily three types of machine learning: supervised learning, unsupervised learning, and reinforcement learning, with a fourth type known as semi-supervised learning.

    1.  Supervised Learning: This is the most common type of machine learning. In supervised learning, the machine is trained using labeled data. This means that each piece of training data comes with a corresponding output value, often referred to as a label. For example, in a task to categorize images of cats and dogs, each image would be labeled either as ‘cat’ or ‘dog’. The machine learning model will use these examples to learn the features that distinguish a cat from a dog, and then apply this learning to unlabeled images in the future.
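    To make this concrete, here is a deliberately tiny sketch of supervised learning (the feature values and names are hypothetical, purely for illustration): a 1-nearest-neighbor classifier that "trains" by memorizing labeled examples and predicts by returning the label of the closest one.

```python
def nearest_neighbor(train, query):
    """1-nearest-neighbor: return the label of the training example
    whose feature vector is closest (squared distance) to the query."""
    best = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], query)))
    return best[1]

# toy labeled data: (feature_vector, label); the features are made up,
# e.g. (ear_pointiness, snout_length)
train = [((0.9, 0.2), "cat"), ((0.8, 0.3), "cat"),
         ((0.2, 0.9), "dog"), ((0.3, 0.8), "dog")]

label = nearest_neighbor(train, (0.85, 0.25))  # closest examples are cats
```

    Real supervised models learn more compact decision rules than memorized examples, but the ingredients are the same: labeled training data and a prediction function applied to unlabeled inputs.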

    2. Unsupervised Learning: Unlike supervised learning, unsupervised learning involves training the machine using unlabeled data. The goal is for the machine to find underlying patterns or structure within the data itself. One common use of unsupervised learning is in clustering algorithms, where the machine groups similar data together. For example, given a large dataset of patient data, an unsupervised learning algorithm could potentially cluster patients based on common characteristics, even though no predefined categories were given.
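    The clustering idea can be sketched in a few lines. Below is a minimal two-cluster k-means on one-dimensional values (the patient ages are hypothetical): no labels are given, yet the algorithm separates the data into groups on its own.

```python
def kmeans_2(values, iters=10):
    """Minimal two-cluster k-means for 1-D values (a toy clustering sketch)."""
    centers = [min(values), max(values)]  # initialize centers at the extremes
    groups = [[], []]
    for _ in range(iters):
        groups = [[], []]
        for v in values:
            # assign each value to its nearest center
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # move each center to the mean of its group
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# hypothetical patient ages; no predefined categories are provided
ages = [23, 25, 27, 61, 64, 66]
centers, groups = kmeans_2(ages)  # splits into a younger and an older group
```

    Production clustering would use a library implementation with better initialization and a data-driven choice of cluster count, but the loop above captures the core assign-then-update cycle.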

    3. Reinforcement Learning: Reinforcement learning is a unique type of machine learning where the model learns to make decisions by performing actions and receiving rewards or penalties in return. The model’s aim is to make the choices that yield the most reward over time. For example, a model used for medical image analysis could learn how to best find issues in scans: its actions could be moving around the scan, and its reward would be correctly finding a problem.

    Reinforcement learning can also play a role in providing annotations for an image analysis model. In this scenario, the model, acting as the learning agent, interacts with the data, making annotation decisions based on the given images. For every correct annotation, the model receives a positive reward, reinforcing the behavior that led to the correct identification. Conversely, when it provides an incorrect annotation, it receives a penalty, encouraging the model to avoid the steps that led to the error.

    An example could be a model being trained to recognize cancerous cells in histopathological images. The model scans the image and decides whether to label each pixel, or group of pixels, as cancerous or non-cancerous. If it correctly identifies and labels a cancerous region, it gets a positive reward; if it mislabels a region, it gets a penalty. Over time, by learning from its actions and the corresponding rewards or penalties, the model improves its ability to provide accurate annotations, ultimately leading to more precise image analysis. This approach can be especially useful for large datasets that would be too time-consuming and expensive to annotate manually.

    4.  Semi-Supervised Learning: Semi-supervised learning is an approach that combines aspects of both supervised and unsupervised learning. In this scenario, the machine learning model is trained using a small amount of labeled data supplemented by a large amount of unlabeled data. The idea is that the machine can use the unlabeled data to better understand the underlying structure of the data and make better predictions on the labeled data. This approach can be beneficial when labeled data is scarce or expensive to obtain, as it can improve model performance with less need for manual labeling effort.

    Each of these machine learning types has its strengths and specific applications, and the choice of which to use depends on the problem at hand and the data available.

    5. Deep learning is a branch of machine learning that uses artificial neural networks, computational models loosely inspired by biological neuronal networks. Convolutional neural networks (CNNs) are a popular type of neural network used in tissue image analysis and are particularly effective for tasks such as semantic segmentation and object detection. Another type of network used in digital pathology is the generative adversarial network (GAN), used for tasks such as stain normalization and virtual staining.

    6.  Explainable AI (XAI) is an emerging aspect in the field of artificial intelligence that aims to address the opaqueness of certain AI models, often described as ‘black boxes.’ The core idea is to make AI decisions transparent, understandable, and interpretable to human users, hence allowing them to gain insights into the logic that drives the AI’s decision-making process.

    The need for XAI becomes crucial in areas such as healthcare, and more specifically, in pathology where AI models are increasingly being used to analyze tissue images and inform patient care decisions. When a deep learning model, for instance, identifies malignant cells within a tissue slide, it’s important for the pathologist to understand why and how the model came to that conclusion. Was it the shape of the cells, their clustering pattern, color variation, or some other feature that led the model to its conclusion? With XAI, these questions can be addressed, offering pathologists a more comprehensive understanding of the algorithm’s diagnostic process.

    Additionally, XAI helps build trust in AI systems. It’s only natural for clinicians to be skeptical of a system that gives results without explaining its reasoning, especially when patient health is at stake. By presenting understandable reasons behind its decisions, XAI can reassure clinicians of the AI’s reliability, helping them feel more comfortable incorporating AI technology into their clinical decision-making processes.

    Another crucial aspect of XAI is its contribution to the improvement and debugging of AI models. If a model makes a mistake, understanding the reasoning behind that mistake can guide data scientists and developers in refining the model, improving its accuracy, and ensuring it makes more reliable predictions in the future.

    Overall, XAI is about bridging the gap between human understanding and machine intelligence, ensuring that AI technology is not only high-performing but also understandable and trustworthy, which is particularly essential in sensitive domains like healthcare.

    7. Data augmentation: Data augmentation plays a pivotal role in artificial intelligence and machine learning, serving as a strategy to increase the quantity of available data without the exhaustive process of manually collecting and labeling new examples. By introducing minor variations to the original data, the model gains exposure to a wider array of scenarios, promoting its generalization capabilities and mitigating overfitting.

    Consider a tissue sample image in pathology with four mitotic figures. By applying data augmentation, we could generate an array of new training examples from this single image. Simple transformations such as rotations and flipping can easily multiply our dataset. For instance, rotating the image by 90 degrees three times turns one data point into four, and adding horizontal flips doubles these to eight distinct variants (vertical flips of the rotated images coincide with these combinations, so they add no new ones).
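    This counting can be verified directly. Treating an image as a small grid of pixel values, the sketch below enumerates the rotated and flipped variants; an asymmetric 2x2 toy grid stands in for a real tissue image.

```python
def rotate90(img):
    """Rotate a 2-D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def hflip(img):
    """Mirror a 2-D grid horizontally."""
    return [row[::-1] for row in img]

def augment(img):
    """Return the four rotations of img plus a horizontal flip of each."""
    variants, current = [], img
    for _ in range(4):
        variants.append(current)
        variants.append(hflip(current))
        current = rotate90(current)
    return variants

tile = [[1, 2], [3, 4]]  # asymmetric toy "image"
variants = augment(tile)
# deduplicate: for an asymmetric image exactly 8 variants are distinct,
# since vertical flips repeat rotation-plus-horizontal-flip combinations
distinct = {tuple(map(tuple, v)) for v in variants}
```

    The same two functions applied to real image arrays (with a library such as NumPy doing the heavy lifting) are the backbone of geometric augmentation pipelines.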

    Beyond rotations and flips, there are many other data augmentation techniques that we can use, such as scaling (increasing or decreasing the size of the image), translation (shifting the image left, right, up, or down), and noise injection (adding random noise to the image) among others.

    Zooming in and out on the image can help the model better identify mitotic figures of varying sizes. Similarly, translation could help the model learn to recognize mitotic figures irrespective of their location within the image.

    In color pathology images, color jittering, which includes random changes to the image’s brightness, contrast, and saturation, could also be applied. This ensures that the model performs well even under varying lighting conditions or due to staining inconsistencies.

    In essence, each augmentation technique presents a unique perspective of the original image, and collectively, they help the machine learning model develop a robust understanding of the features to look for, improving its performance on unseen, real-world data.

    8. Heat map: Heat maps serve as a powerful visualization tool in the machine learning domain, particularly when interpreting the decision-making process of algorithms. By highlighting areas in an image that the algorithm deems significant for classification, heat maps offer an intuitive way to understand where the model focuses its attention.

    Consider an image of a giraffe used in an animal classification task. The model might concentrate on specific regions, such as the long neck, the specific shape of the head, the distinct pattern of spots, or the elongated legs, to recognize the animal as a giraffe. A heat map applied to this image would emphasize these key areas, enabling us to visualize which features the model utilizes for its decisions.
    Heat maps can also serve additional purposes, like generating probability heat maps. These maps provide a pixel-by-pixel probability of the image belonging to a specific class, offering a more granular view of the algorithm’s classification process.
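    As a minimal illustration (with made-up probability values), the sketch below renders a per-pixel probability grid as a text "heat map", shading higher-probability pixels with denser characters, the same idea a graphical heat map expresses with color.

```python
LEVELS = " .:*#"  # shading characters from low to high probability

def ascii_heatmap(probs):
    """Render a grid of per-pixel probabilities (0.0-1.0) as ASCII shading."""
    rows = []
    for row in probs:
        rows.append("".join(
            LEVELS[min(int(p * len(LEVELS)), len(LEVELS) - 1)] for p in row))
    return "\n".join(rows)

# hypothetical per-pixel probabilities of belonging to the target class
probs = [
    [0.05, 0.10, 0.08],
    [0.12, 0.95, 0.40],
    [0.07, 0.35, 0.10],
]
print(ascii_heatmap(probs))  # the dense '#' marks the high-probability pixel
```

    Real heat maps overlay such a probability grid on the original image with a color scale, but the mapping from per-pixel score to visual intensity is exactly this.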

    Similarly, attention heatmaps shed light on the areas where the algorithm is directing most of its focus during the analysis. This is particularly valuable in tasks that involve complex images, where understanding the model’s attention distribution can provide insights into its functioning.

    In conclusion, heat maps, whether used for probability visualization or attention tracking, are an indispensable tool in deep learning and machine learning, providing a visual interpretation of an algorithm’s decision-making process.

    9. Computer Vision: Computer vision is a branch of computer science that focuses on enabling computers to understand and process visual data. It is fundamentally about teaching machines to “see” and interpret images in the same way humans do, but with greater accuracy and consistency. In the field of pathology, computer vision techniques are applied to analyze microscopic images of tissue samples, enabling automated disease detection and diagnosis.

    10.  Graphics Processing Units (GPUs): Graphics Processing Units, more commonly known as GPUs, are specialized electronic circuits designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. These units are particularly useful in computer vision tasks as they enable swift and efficient image processing. Given their high computational capabilities, GPUs are critical when managing large images with millions of pixels, such as whole slide images.

    11.  Patching: In the realm of image analysis, especially with whole slide images, the volume of data can be vast and overwhelming for a computer program to process all at once. Therefore, these large images are often divided into smaller sections called patches or tiles. This technique, known as patching, facilitates the handling and processing of large images. Once a model is trained on these patches, its patch-level predictions can be combined to draw conclusions about objects and regions in the whole image.
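    The mechanics of patching are simple to sketch. Below, a toy 4x4 "slide" (real pipelines read gigapixel images through dedicated libraries) is divided into non-overlapping square tiles; for simplicity the sketch assumes the image dimensions are divisible by the patch size.

```python
def make_patches(image, patch_size):
    """Split a 2-D image (list of rows) into non-overlapping square patches,
    assuming the image dimensions are multiples of patch_size."""
    height = len(image)
    width = len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches

# toy 4x4 "slide" with pixel values 0..15
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = make_patches(image, 2)  # four 2x2 tiles
```

    In practice, patches often overlap slightly so that objects cut by a tile border are still seen whole in a neighboring tile, but the slicing logic is the same.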

    12.  Computer-Aided Diagnosis: Computer-aided diagnosis, sometimes referred to as decision support systems, refers to the use of artificial intelligence algorithms to assist medical professionals in detecting and diagnosing diseases. In the context of image analysis, these systems can highlight areas in a tissue image that appear suspicious or that warrant further examination. While they do not replace the pathologist, they significantly streamline the diagnostic process by making it more efficient and accurate. With reliable image analysis tools in place, pathologists can provide more timely and precise diagnoses, thereby enhancing patient care.

    VII. Summary

    AI is a vast field encompassing various sub-disciplines, including machine learning. Machine learning in turn includes deep learning, which employs artificial neural networks such as Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs).

    Patching is a strategy used to manage large image datasets by dividing images into smaller, more manageable sections. These patches then become the foundation for model training in the realm of computer vision, a subfield that applies AI and machine learning to analyze and interpret images.

    The process of analyzing these images necessitates high computational capacity, often fulfilled by Graphics Processing Units (GPUs). These advanced processors enable the swift processing of massive data volumes, essential for image analysis.

    To bolster the precision of image analysis algorithms, data augmentation is often employed. This technique expands the labeled data by creating variations of a smaller original set, for example by rotating or flipping images.

    Among the visualization techniques used in deep learning, heat maps stand out. These graphical representations highlight areas of interest or focus in an image, indicating regions with high probability of certain features. Heat maps can take the form of probability or attention heat maps, offering insights into model focus and decision-making.

    The culmination of these techniques is the development of computer-aided diagnosis or decision support systems, which are invaluable tools for pathologists and scientists.

    Although this summary provides just a brief explanation of these concepts, it is a strong starting point for future discussions surrounding digital pathology.

    Digital pathology is a young and dynamic discipline. To be part of the discussion and contribute to its development, we must understand the pathology informatics terminology, which is the language used in the digital pathology world, as well as AI and image analysis principles.

    It’s important to remember that while AI and image analysis are powerful tools, they are not magic, and we must be aware of their limitations. Also, digital pathology is a team effort that involves different disciplines, mainly life scientists and computer scientists, but also regulatory experts and other subject matter experts. Pathologists play a crucial role in the life science group, but they too need to work in this multidisciplinary team and understand the language of the other involved parties.
