For several years, pathology conferences have been hosting more and more lectures about digitalization and artificial intelligence (AI) applications in pathology. The accuracy of the presented AI models is often comparable to, and sometimes better than, that of pathologists, whose own diagnoses are subject to inter- and intra-observer variability. Terms such as machine learning, deep learning, neural networks, and other computer science terminology are increasingly used in the pathology context.
As the discussion about AI grows, so does the confusion… Are we really approaching a new era in which AI will dominate pathology, or is it still a long way off? How will the pathologist’s role change once AI is applied? Are we ready?
Pathology informatics terms every self-respecting pathologist (and life scientist) should be familiar with
To follow discussions about digital pathology image analysis and stay up to date with the latest achievements in artificial intelligence, it is crucial to understand the terminology used in pathology informatics. Below is a list of the most relevant and most frequently used terms in the field, with explanations for non-informaticians.
This infographic illustrates the dependencies between the pathology informatics terms every pathologist and life scientist should be familiar with. CV – computer vision, GPU – Graphics Processing Unit, CADx – Computer-Aided Diagnosis, ANN – Artificial Neural Networks, CNN – Convolutional Neural Networks
Artificial intelligence (AI)
Although the term sounds very modern, it was first used as early as 1956. It refers to a branch of computer science concerned with creating systems that can function intelligently and independently, for example by learning from already available data. Some of the AI-based applications we encounter every day are:
- Google’s predictive search, which suggests a list of possible queries as soon as we start typing
- Product recommendations, when Amazon suggests “you might also like…” or “customers who bought this item also bought…”
- Music recommendations by Spotify, based on the songs you already like listening to
- Google Maps calculating the fastest route based on the available traffic information
…and many others. AI is already fully integrated into many aspects of our lives, but in others there is still a long way to go.
For a quick explanation of AI, including an overview of the sub-disciplines under the AI umbrella, watch this YouTube video by Raj Ramesh.
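The “customers who bought this item also bought…” idea from the list above can be illustrated with a deliberately simple sketch: count how often other items appear in the same baskets as a given item and recommend the most frequent ones. This is a toy under stated assumptions (the baskets are hypothetical, and real recommender systems such as Amazon’s are far more sophisticated); it only shows what “learning from already available data” can mean in practice.

```python
from collections import Counter

# Hypothetical purchase histories: each list is one customer's basket.
BASKETS = [
    ["microscope", "slides", "stain"],
    ["microscope", "slides"],
    ["slides", "stain", "coverslips"],
    ["microscope", "coverslips"],
]


def also_bought(item, baskets, top_n=2):
    """Rank other items by how often they were bought together with `item`."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            counts.update(other for other in basket if other != item)
    # most_common() orders items with equal counts by first occurrence
    return [other for other, _ in counts.most_common(top_n)]


print(also_bought("microscope", BASKETS))  # → ['slides', 'stain']
```

The more baskets the system sees, the better its co-occurrence counts reflect real buying habits, which is the basic intuition behind learning from data.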
Machine learning
The term was coined in 1959, but machine learning came into widespread use only in the late 1980s and early 1990s. It is the science of getting computers to solve problems without being explicitly programmed. This branch of AI is based on the idea that computers can learn from data, as humans learn from experience, and then make decisions about new data without human intervention. There are different ways in which machines can learn:
- supervised learning, where a large amount of labeled data (e.g. annotations) is needed to train the system,
- unsupervised learning, where the machine “figures out” labels by grouping unlabeled data points with similar features, and
- reinforcement learning, where the system learns from feedback (rewards or penalties) on its decisions, for example from a user who signals whenever the system labels data incorrectly. The more feedback is given, the more accurate the system typically becomes.
For a 7-minute introduction to machine learning, watch this YouTube video from the Simplilearn channel.
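A minimal sketch of the first two learning styles, assuming each cell is described by two hypothetical numeric features (say, nucleus size and staining intensity). The supervised part uses a nearest-centroid classifier learned from labeled examples; the unsupervised part uses k-means to group the same points without any labels. Both algorithms are illustrative choices, not methods named in this article, and reinforcement learning is not shown.

```python
import math
import random

# Hypothetical (nucleus size, staining intensity) features and their labels.
POINTS = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.3, 4.8), (4.9, 5.1)]
LABELS = ["benign", "benign", "benign", "tumor", "tumor", "tumor"]


def mean(points):
    """Coordinate-wise average of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


# --- Supervised learning: labels are provided during training ---
def fit_centroids(points, labels):
    """Learn one mean point (centroid) per labeled class."""
    by_label = {}
    for point, label in zip(points, labels):
        by_label.setdefault(label, []).append(point)
    return {label: mean(members) for label, members in by_label.items()}


def predict(centroids, point):
    """Assign a new, unseen point to the class with the closest centroid."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# --- Unsupervised learning: no labels, only similarity between points ---
def k_means(points, k, iterations=10, seed=0):
    """Group points into k clusters; returns one cluster index per point."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: math.dist(p, centers[i]))].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]


centroids = fit_centroids(POINTS, LABELS)
print(predict(centroids, (4.5, 4.7)))  # → tumor
print(k_means(POINTS, k=2))  # cluster ids are arbitrary, but the two groups separate
```

Note that k-means never learns the names “benign” or “tumor”; it only finds that the points form two groups, and a human still has to interpret what each group means.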
Random forest
Random forest is one of the most popular supervised machine learning algorithms. It builds a “forest” of many decision tree classifiers, each trained on a random sample of the training data and features, and combines their votes into a single prediction. It works well for many different types of data (and is often used in the financial sector); however, in complex pathology classification problems, random forest is usually outperformed by deep learning models [1].
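A stripped-down sketch of the random forest idea in plain Python. To keep it short, each “tree” here is only a decision stump (a tree with a single split), trained on a bootstrap sample of the rows (drawn with replacement) and a random subset of the features; the forest predicts by majority vote. Real implementations, such as scikit-learn’s RandomForestClassifier, grow much deeper trees. The features and labels below are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in an iterable of labels."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, feature_ids):
    """Train a one-split 'tree': pick the feature/threshold with the fewest errors."""
    default = majority(labels)
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in rows})
        # Candidate thresholds lie halfway between neighbouring values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        for t in thresholds:
            left = [lab for row, lab in zip(rows, labels) if row[f] <= t]
            right = [lab for row, lab in zip(rows, labels) if row[f] > t]
            left_label = majority(left) if left else default
            right_label = majority(right) if right else default
            errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Each tree sees a bootstrap sample of rows and a random subset of features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample rows with replacement
        features = rng.sample(range(len(rows[0])), n_features)
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return forest


def predict(forest, row):
    """The forest's answer is the majority vote of its trees."""
    return majority(tree(row) for tree in forest)


# Hypothetical two-feature measurements for eight cells.
ROWS = [(0.9, 1.1), (1.0, 0.8), (1.2, 1.0), (0.8, 0.9),
        (5.1, 4.9), (4.8, 5.2), (5.0, 5.0), (5.2, 4.7)]
LABELS = ["benign"] * 4 + ["tumor"] * 4

forest = train_forest(ROWS, LABELS)
print(predict(forest, (1.0, 1.0)), predict(forest, (5.0, 5.0)))
```

Because every tree sees slightly different data, individual trees make different mistakes, and the vote tends to cancel them out.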
Deep learning
Deep learning is a subfield of machine learning in which the models that learn from input data are loosely inspired by the structure of the neural networks in the human brain.
Artificial neural networks (ANN)
ANNs are the models used in deep learning, also called “nets”, “neural nets” or simply “models”. In pathology they are typically used to classify objects into one of two or more categories. They are organized in layers: the first layer is the input layer, the last one is the output layer, and all layers in between are called hidden layers. Deep networks have many hidden layers.
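To make the layer structure concrete, here is a minimal forward pass through a tiny fully connected network, assuming 3 input features, one hidden layer of 4 neurons with ReLU activation, and 2 output classes whose scores are turned into probabilities with softmax. The weights are hypothetical hand-picked numbers; in a real network they are learned during training, and practical models are built with deep learning frameworks rather than plain Python.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    """Activation function: negative values become 0."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw output scores into probabilities that sum to 1."""
    shift = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights: 4 hidden neurons x 3 inputs, then 2 outputs x 4 hidden neurons.
HIDDEN_W = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4], [0.7, 0.1, -0.5], [0.2, 0.3, 0.9]]
HIDDEN_B = [0.0, 0.1, -0.1, 0.0]
OUTPUT_W = [[1.0, -0.5, 0.3, 0.2], [-0.8, 0.9, -0.2, 0.6]]
OUTPUT_B = [0.0, 0.0]


def forward(features):
    """Input layer -> hidden layer -> output layer (one probability per class)."""
    hidden = relu(dense(features, HIDDEN_W, HIDDEN_B))
    return softmax(dense(hidden, OUTPUT_W, OUTPUT_B))


probs = forward([0.2, 0.9, 0.4])
print(probs)  # two class probabilities that sum to 1
```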
Convolutional neural networks (CNN)
CNNs are a type of ANN that has dominated the deep learning space and is widely used in computer vision. The best-performing models in the CAMELYON challenge used CNNs. These networks are very deep and in practice require servers with GPUs for training. They are very powerful, but because they are usually trained with supervised learning, they require a large set of labeled data and therefore depend on its availability (no annotations, no model).
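The core operation that gives CNNs their name can be sketched in plain Python: a small kernel (filter) slides over the image and, at each position, multiplies the pixel values under it by the kernel weights and sums them, producing a feature map. The 3×4 image and the vertical-edge kernel below are hypothetical; in a CNN the kernel weights are learned, and many such filters are stacked across many layers.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution as used in CNN layers.

    Like most deep learning libraries, this computes cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj] for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu_map(feature_map):
    """Apply ReLU to every value of a feature map."""
    return [[max(0, v) for v in row] for row in feature_map]


# Tiny grayscale image: dark left half, bright right half.
IMAGE = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hypothetical vertical-edge kernel; in a CNN these weights are learned.
KERNEL = [
    [-1, 1],
    [-1, 1],
]

print(relu_map(convolve2d(IMAGE, KERNEL)))  # → [[0, 18, 0], [0, 18, 0]]
```

The filter responds (18) only where the dark and bright halves meet, i.e. it has detected a vertical edge; deeper layers combine such simple features into more complex ones.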
Computer vision (CV)
CV is a field of computer science concerned with computers’ ability to extract high-level understanding from digital images, mimicking human vision. Deep learning algorithms, such as CNNs, perform well in computer vision tasks.
Data augmentation
Data augmentation is a way to obtain more input data for training a model when no more data is available. In the context of pathology image data, augmentation means enlarging the available data set by minimally altering the structures of interest, e.g. by rotating or flipping annotated structures.
The graphic shows how data augmentation works: by rotating the images of just four mitotic figures, sixteen training data points can be obtained.
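The arithmetic in the caption (4 figures × 4 orientations = 16 training examples) can be sketched in a few lines, treating each image patch as a small grid of pixel values. The patches below are hypothetical stand-ins for annotated mitotic figures; real pipelines operate on image arrays and typically combine rotations with flips and other small alterations.

```python
def rotate90(patch):
    """Rotate a 2D patch (a list of pixel rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def flip_horizontal(patch):
    """Mirror a patch left-to-right (another common augmentation)."""
    return [row[::-1] for row in patch]


def augment_by_rotation(patches):
    """Return each patch in its 0, 90, 180 and 270 degree orientations."""
    augmented = []
    for patch in patches:
        current = patch
        for _ in range(4):
            augmented.append(current)
            current = rotate90(current)
    return augmented


# Four hypothetical 2x2 grayscale patches standing in for annotated mitotic figures.
MITOSES = [
    [[0, 1], [2, 3]],
    [[4, 5], [6, 7]],
    [[8, 9], [10, 11]],
    [[12, 13], [14, 15]],
]
training_set = augment_by_rotation(MITOSES)
print(len(training_set))  # → 16
```

Rotations work well for histology because tissue on a slide has no natural “up”: a rotated mitotic figure is still a valid example of a mitotic figure.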
Probability heat maps
Probability heat maps are a color-coded way of visualizing the classification results of a deep learning model. The probability of a pixel in the image belonging to a particular class (e.g. epithelial cells) is depicted as a color on a color scale, with one end of the scale corresponding to 0% probability and the other end to 100% probability. The colors are superimposed on the whole slide image.
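A minimal sketch of how such an overlay can be produced, assuming the model has already assigned one probability to each pixel: each probability is mapped onto a blue-to-red color scale (blue for 0%, red for 100%) and alpha-blended with the original RGB pixel of the slide. The color scale, the 40% opacity and the tiny 2×2 “slide” are arbitrary illustrative choices.

```python
def probability_to_color(p):
    """Map a probability in [0, 1] to an RGB color: blue (0%) to red (100%)."""
    p = min(max(p, 0.0), 1.0)
    return (round(255 * p), 0, round(255 * (1 - p)))


def overlay(pixel, color, alpha=0.4):
    """Blend the heat map color over the slide pixel; alpha is the heat map's opacity."""
    return tuple(round((1 - alpha) * c_img + alpha * c_map) for c_img, c_map in zip(pixel, color))


def heat_map(slide, probabilities, alpha=0.4):
    """Superimpose color-coded probabilities on an RGB image (lists of pixel rows)."""
    return [
        [overlay(px, probability_to_color(p), alpha) for px, p in zip(row_px, row_p)]
        for row_px, row_p in zip(slide, probabilities)
    ]


# Hypothetical 2x2 RGB slide region and per-pixel probabilities of "epithelial cell".
SLIDE = [[(230, 200, 220), (225, 190, 215)], [(240, 235, 240), (210, 170, 200)]]
PROBS = [[0.9, 0.8], [0.1, 0.05]]
print(heat_map(SLIDE, PROBS))
```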
Patching
Whole slide images are huge (15 GB for a typical image, which corresponds to three 3-hour-long high-quality movies on Netflix). Because an entire slide is far too large to be processed by an algorithm at once, it needs to be divided into smaller parts called patches. This process is called patching, and a patch is a small, usually rectangular piece of the image.
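A minimal sketch of patching, assuming a grid of non-overlapping square patches: it only computes where each patch lies, without loading any pixels, which is how a slide can be read and analysed piece by piece. The slide dimensions (100,000 × 80,000 pixels) and the 256-pixel patch size are hypothetical example values; in practice the pixel data for each box would be read from the slide file with a dedicated slide-reading library.

```python
import math


def patch_grid(width, height, patch_size, stride=None):
    """Yield (x, y, w, h) boxes covering a width x height image.

    By default the stride equals the patch size, giving non-overlapping
    patches; patches at the right and bottom borders are clipped to the image.
    """
    stride = stride or patch_size
    for y in range(0, height, stride):
        for x in range(0, width, stride):
            yield x, y, min(patch_size, width - x), min(patch_size, height - y)


# Hypothetical slide of 100,000 x 80,000 pixels cut into 256 x 256 patches.
n_patches = math.ceil(100_000 / 256) * math.ceil(80_000 / 256)
print(n_patches)  # → 122383
```

Even this modest patch size turns one slide into more than a hundred thousand separate inputs, which is one reason GPUs are needed for timely analysis.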
Graphics Processing Unit (GPU)
A GPU is the chip on a computer’s graphics card that is designed to rapidly manipulate graphics and process images; GPUs are typically found in powerful gaming computers. Because they perform many calculations in parallel, GPUs are required for fast analysis of pathology images, especially when applying deep learning models.
Computer-aided diagnosis (CADx)
CADx is the use of computer-defined regions of interest in an image to assist the clinician in making the final diagnosis. It is already used routinely in radiology, for example to detect breast cancer foci in mammograms [2]. An example from pathology would be pre-screening lymph node sections for cancer metastases: the computer would suggest regions with a high probability of metastasis, and the pathologist would have to confirm them.
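The lymph node pre-screening workflow can be sketched as a simple ranking step on top of a model’s per-patch output, assuming a hypothetical dictionary that maps patch identifiers to predicted metastasis probabilities and an arbitrary threshold of 0.5. A real system would need a carefully validated threshold; the point here is only that the computer proposes regions and the pathologist decides.

```python
def prescreen(patch_probabilities, threshold=0.5, max_regions=None):
    """Suggest patches whose predicted metastasis probability reaches `threshold`.

    Returns (patch_id, probability) pairs, most suspicious first. This is only a
    worklist: the pathologist still confirms or rejects every suggested region.
    """
    flagged = [(pid, p) for pid, p in patch_probabilities.items() if p >= threshold]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged[:max_regions]  # slicing with None keeps every flagged patch


# Hypothetical model output for four patches of one lymph node section.
PROBABILITIES = {"patch_0_0": 0.02, "patch_0_1": 0.91, "patch_1_0": 0.67, "patch_1_1": 0.10}
print(prescreen(PROBABILITIES))  # → [('patch_0_1', 0.91), ('patch_1_0', 0.67)]
```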
References:
- [1] Wen, Si, et al. “Comparison of Different Classifiers with Active Learning to Support Quality Control in Nucleus Segmentation in Pathology Images.” AMIA Summits on Translational Science Proceedings 2017 (2018): 227.
- [2] Shiraishi, Junji, et al. “Computer-Aided Diagnosis and Artificial Intelligence in Clinical Imaging.” Seminars in Nuclear Medicine 41.6 (2011).