Becoming a U.S. citizen and diving deep into AI-powered pathology all in one week—now that’s a story worth sharing.
This livestream episode of DigiPath Digest turned out to be one of my most personal ones yet. It started with a courtroom, not a microscope, and ended with a fascinating discussion about whether GPT-4o and other large language models can actually classify tumors more accurately than we can. Spoiler: sometimes, they do.
A Personal Milestone, Powered by AI Tools
I didn’t plan for my naturalization ceremony to coincide with our live session. And I certainly didn’t expect the judge to call me (yes, me!) to deliver a speech on behalf of all new citizens.
I had just a few days to prepare, so naturally, I did what any pathologist who’s spent too much time experimenting with AI tools would do—I asked AI for help.
Between ChatGPT, Claude, and Perplexity, I crafted, refined, and personalized my speech until it truly felt like me. AI didn’t write it for me—it worked with me, guiding my thoughts and improving my expression.
And honestly, that’s the theme of this entire episode: AI working with us, not replacing us.
From AI in Text to AI in Tissue: The Next Step in Digital Pathology
Over the past few months, I’ve reviewed dozens of studies about how artificial intelligence and deep learning are reshaping pathology. This episode condensed some of the most interesting ones—focusing on LLMs, synthetic data, annotation bootstrapping, and quantitative toxicologic pathology.
Each of these studies shows a different angle of how AI can make our workflows more efficient, reproducible, and sometimes even more objective than the “gold standard” of human evaluation.
Let’s unpack what I found most fascinating. 👇
Paper 1: GPT-4 and Friends – When LLMs Read Pathology Reports
An Austrian research team recently asked a question that made me stop and think:
Can large language models like GPT-4 classify cancer reports better than humans or traditional algorithms?
Their setup was surprisingly elegant. They took 227 synthetic oncology reports—not real ones, but carefully generated to mimic real clinical texts—and tested multiple LLMs, including GPT-4 and Llama 3, against a traditional score-based algorithm.
The task? Detect malignancy and classify tumor types.
And guess what? GPT-4 won. 🎯
It achieved higher sensitivity and specificity in both malignancy detection and tumor classification.
As someone who’s spent hours coding tumor registries manually, I could almost hear the collective sigh of relief from data managers worldwide. Text mining, once painfully slow, is becoming almost effortless.
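Under the hood, comparing an LLM against a score-based algorithm comes down to the same two numbers: sensitivity and specificity. Here’s a minimal sketch of that evaluation step, using toy labels rather than the Austrian team’s data or models:

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute sensitivity and specificity for malignancy detection.

    Labels: 1 = malignant, 0 = benign. Toy illustration only;
    the study's reports and model outputs are not reproduced here.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Toy example: six reports, the model misses one malignant case
truth = [1, 1, 1, 0, 0, 0]
preds = [1, 1, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(truth, preds)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The same two-number comparison works whether the predictions come from GPT-4, Llama 3, or a hand-tuned scoring rule, which is what makes head-to-head studies like this one possible.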
🧩 My Takeaway
What stood out most wasn’t just the accuracy, but the shift in mindset. We’re moving from “humans labeling data for machines” to “machines organizing data for humans.” It’s not replacing expertise—it’s freeing it from repetitive work.
Paper 2: Synthetic Data – When AI Generates Its Own Homework
The same Austrian group behind the LLM paper went a step further. Since access to real patient data is often restricted (and rightly so), they created their own synthetic pathology dataset.
This dataset included 227 pathology reports covering prostate, lung, and breast cancer, with balanced distributions between benign and malignant cases.
What I loved about this approach is how it flips the traditional AI research model on its head. Instead of humans painstakingly creating datasets for AI to learn from, AI now generates the data, and humans validate it.
This has two big implications:
- Privacy Protection – No patient-identifiable data means fewer legal and ethical barriers to research.
- Balanced Datasets – Synthetic generation can counteract class imbalance (for example, when there are fewer malignant than benign cases).
As someone who’s built and annotated datasets, this feels revolutionary. It’s the kind of innovation that lets smaller labs participate in AI research without waiting months for IRB approvals or data-sharing agreements.
🧠 Reflection
Synthetic data doesn’t replace real data—but it lets us prototype, test, and train models faster. It’s like using flight simulators for pilots before they take off in real skies.
Paper 3: Annotation Bootstrapping – Making AI Training Less Painful
Every pathologist knows that image annotation is one of the biggest bottlenecks in developing deep learning models. It’s slow, subjective, and expensive.
A research team from Poland (and yes, this one felt personally special to me!) tackled this issue in the context of follicular lymphoma cell identification.
They tested three strategies:
- Traditional manual annotation (the “old school” way),
- Active learning – where the model asks for help only when uncertain,
- Weak supervision – using simpler, less precise labels.
The hybrid model they built, using custom encoders and foundation models, improved the detection algorithm’s performance by 18 percentage points and achieved an F1 score of 63%.
That might not sound like “AI glory,” but in context, it’s a big deal. Active learning doubled the minority class samples, meaning it helped detect underrepresented cell types much more effectively.
💬 My Perspective
When you think about it, AI asking us questions (active learning) is the most human-like collaboration we can have with technology. It’s the digital equivalent of a resident coming to your desk with a slide and saying, “Hey, I’m not sure about this one.” And that’s exactly how AI should work in pathology.
Paper 4: Deep Learning in Toxicologic Pathology – When AI Sees What We Miss
Finally, let’s talk about thyroid glands—rat thyroids, to be exact.
Researchers from Charles River Laboratories used deep learning to analyze thyroid gland sections for subtle morphological changes caused by endocrine-disrupting chemicals.
If you’ve ever scored follicular hypertrophy, you know how subjective it can be. The “is it slightly enlarged or not?” debate could go on for hours.
Their convolutional neural network (CNN) didn’t just replicate human scoring—it quantified it. By segmenting tissue structures and measuring follicular morphology, the AI produced consistent, reproducible scores that correlated more strongly with pathologists’ assessments than the old mean epithelial area methods.
This is what I like to call the “gold standard paradox”: we compare AI against human performance, yet humans don’t always agree with each other.
AI brings reproducibility where we need it most—around diagnostic thresholds.
Regulatory Reality Check: The Frameworks Already Exist
One thing that often surprises people is how much regulatory groundwork already exists for AI in pathology.
A new European interoperability framework was just released to support secondary use of health data, which is crucial for AI projects.
And despite popular belief, regulators aren’t lagging—they’re publishing detailed guidance all the time. The challenge is not a lack of regulation but our awareness of it.
I always tell my audience: before reinventing the wheel, check what’s already there. Most of the frameworks we need to build compliant, interoperable AI systems already exist.
Connecting It All: From LLMs to Microscopes
When you step back and look at these studies together, a clear trend emerges:
AI in pathology is moving toward integration.
It’s no longer about a single algorithm that detects cancer—it’s about creating a workflow ecosystem where AI tools handle everything from report classification to annotation assistance and quantitative scoring.
We’re not talking about replacing pathologists but about enhancing our work:
- LLMs automate text-heavy tasks.
- Synthetic datasets accelerate research.
- Active learning optimizes labeling.
- Deep learning ensures consistency.
And in the middle of it all, we—the human experts—remain the interpreters, validators, and ethical compass of these systems.
How AI Helped Me Write a Speech (and Why That Matters)
You might be wondering how my citizenship speech ties into all this.
When I used AI tools to prepare for one of the most meaningful days of my life, it wasn’t about outsourcing creativity. It was about collaboration.
AI helped me refine my ideas, organize them clearly, and adjust my tone. The same way we now use AI to analyze data or write reports—it helps us see ourselves more clearly through reflection and iteration.
That’s why I find this technology exciting—not because it’s replacing what we do, but because it’s teaching us new ways to think, communicate, and even express gratitude.
Key Takeaways for Pathologists and Researchers
Here’s what I’d want every pathologist to remember from this episode:
- 🧬 GPT-4 and Llama 3 outperform traditional algorithms in tumor classification from text reports.
- 🔒 Synthetic datasets allow privacy-preserving, standardized AI benchmarking.
- 📊 Active learning and weak supervision reduce annotation fatigue and improve model accuracy.
- 🧠 Deep learning models bring reproducibility to subjective scoring tasks.
- 📑 Regulatory frameworks are evolving fast—don’t overlook them.
💬 AI is a partner, not a replacement, in our daily pathology practice.
Final Thoughts
This episode of DigiPath Digest wasn’t just about AI papers—it was about seeing technology as a partner in both work and life.
Whether it’s writing a naturalization speech or classifying a tumor, the principle is the same: we do our best work when we collaborate—with each other and with our tools.
If you’d like to hear the full discussion (and maybe my story about delivering that surprise speech 🗣️), check out the livestream replay on Digital Pathology Place. And if you’ve had your own experiences using AI—whether for pathology, presentations, or even personal milestones—I’d love to hear about it in the comments.
Because at the end of the day, the future of pathology isn’t AI versus us—it’s AI with us.
External Resource:
Learn more about synthetic data in healthcare AI from the European Commission’s AI Act Guidelines.
FAQs
- What is GPT-4o, and how does it differ from other models?
GPT-4o is a multimodal AI model by OpenAI that can understand both text and images, making it particularly suited for pathology tasks involving structured and unstructured data.
- How reliable is synthetic data in pathology AI research?
While it doesn’t replace real patient data, synthetic data is an excellent tool for prototyping and benchmarking, provided it’s validated by clinical experts.
- What is “active learning” in AI annotation?
It’s a method where the model identifies uncertain samples and asks a human expert to label them—making annotation more efficient and targeted.
- How can AI improve reproducibility in pathology?
AI models apply consistent criteria, reducing variability among human assessors—especially valuable near diagnostic thresholds.
- Are there existing regulations for AI in pathology?
Yes, multiple frameworks exist both in Europe and the U.S. covering data privacy, interoperability, and AI ethics. The challenge is awareness, not absence.
- How can pathologists start integrating AI tools into daily practice?
Begin small—experiment with LLMs for report summaries, try annotation tools for digital slides, or explore open datasets. Integration grows from experience.
Want to join the next DigiPath Digest? Subscribe to the Digital Pathology Place newsletter and never miss an episode.