
On February 3rd, 2026, Anthropic announced a legal AI plugin for Claude Cowork. It automates contract review, compliance workflows, legal briefings, and templated responses.
The market’s response was immediate and brutal:
- Thomson Reuters: Down 18% in a single day—its worst day ever
- RELX: Down 15%
- Wolters Kluwer: Down 13%
- LegalZoom: Down 19.7%
- Total damage: $285B in market value wiped from software and data analytics stocks
Jefferies dubbed it the “SaaSpocalypse”. Trading desks reported “get-me-out style selling.” The narrative flipped overnight: AI doesn’t help software companies anymore. It replaces them.
While Wall Street was having its reckoning, I was having mine. A week earlier, I’d built something similar for internal audit. And it worked.
Why the Market Panicked Over a Folder of Prompts
Here’s what caught everyone off guard: the legal plugin isn’t a proprietary model fine-tuned on case law. It’s not a special legal reasoning engine. It’s essentially a set of prompts and workflow configurations. The same foundation model everyone has access to, but with structured instructions layered on top.
The real signal, analysts say, is that Anthropic moved from selling the model to owning the workflow. When Claude was just an API, companies like Thomson Reuters could build on top of it. Thomson Reuters literally runs CoCounsel on OpenAI. But when Anthropic starts publishing ready-made vertical solutions, the platform becomes the competitor.
And if a set of prompts can wipe out $285 billion in market value, then what I built—also essentially prompts and workflow—isn’t some technical breakthrough. Anyone can do this.
Which makes it even more urgent.
Let me show you what this looks like on the ground.
What You’re About to See
You probably think I’m exaggerating about the audit orchestrator I mentioned last week. I would too if I hadn’t built it.
This is messy. Early-stage. The interface is rough. But it’s functional.
I’m going to walk you through what it does conceptually. Not the technical details—those don’t matter. What matters is understanding why a structured workflow plus AI can automate large chunks of knowledge work that used to require years of professional training.
How Knowledge Work Gets Automated
Most professional work—audit, legal, consulting, analysis—follows a similar pattern:
- Understand context: Read background, understand the domain
- Identify issues: Find risks, gaps, problems, opportunities
- Plan approach: Define scope, methodology, what to test
- Execute work: Gather evidence, perform tests, analyze data
- Synthesize findings: Identify patterns, draft observations
- Report results: Document what you found and what it means
Humans do this through experience and judgment. But here’s the uncomfortable truth: most of this work is pattern matching against established frameworks.
You compare conditions to criteria. You check if controls exist and work. You test whether reality matches policy. You document deviations.
That’s not creative work. That’s execution.
And execution can be automated.
The Orchestrator Model
What I built isn’t “AI that does audits.” It’s a system that orchestrates AI through the entire methodology, phase by phase, with validation gates to ensure quality.
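Conceptually, it’s just a loop with a gate in it. Here’s a minimal sketch, assuming a generic model API: the phase names mirror the methodology below, and `call_model` stands in for whatever you’d actually wire up. None of this is my production code.

```python
# A minimal orchestrator sketch. Illustrative only: phase names mirror the
# methodology described below; call_model is a stand-in for a real model API.
PHASES = [
    "understanding",     # synthesize background materials
    "risk_assessment",   # map risks to control frameworks
    "planning",          # scope, objectives, testing approach
    "fieldwork",         # execute tests, flag anomalies
    "reporting",         # draft findings and the management letter
]

def call_model(prompt: str) -> str:
    """Stand-in for whatever model API you use; returns a draft artifact."""
    return f"[draft output for: {prompt[:50]}...]"

def human_gate(phase: str, output: str) -> bool:
    """Validation gate: a person approves or rejects before anything proceeds."""
    print(f"--- {phase} ---\n{output}")
    return input("Approve? [y/n] ").strip().lower() == "y"

def orchestrate(context: dict[str, str]) -> dict[str, str]:
    for phase in PHASES:
        output = call_model(f"Execute {phase} given: {context}")
        # The system stops here. Nothing advances without sign-off.
        while not human_gate(phase, output):
            correction = input("What did it miss? ")
            output = call_model(f"Redo {phase} with this correction: {correction}")
        context[phase] = output  # approved output feeds the next phase
    return context
```

The gate is the whole point. The loop cannot advance until a person signs off, and rejected output goes back with a correction instead of slipping through.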
Phase 1: Understanding
The system reads background materials—policies, process maps, prior reports, organizational charts. It synthesizes this into structured knowledge: what the business does, how processes work, what systems are involved.
This used to take days of reading and interviews. Now the synthesis happens quickly.
Then it stops and waits for human review. Did it miss critical context? Does the analysis make sense? This review takes real time and expertise—you’re validating the foundation for everything that follows. Approve or correct before proceeding.
Phase 2: Risk Assessment
It analyzes historical data—prior audit findings, incidents, performance metrics—and identifies where things could go wrong. It maps these risks to control frameworks (the standards that define what “good” looks like).
This used to require deep expertise and significant analysis time. Now the initial assessment is automated.
Then it stops and waits for human approval. Are the risks accurate? Are there organizational factors the system can’t see? This isn’t a rubber stamp—you’re applying judgment the system doesn’t have. Sign off before moving forward.
Phase 3: Planning
It defines scope, sets objectives, and generates a testing approach. What will be reviewed, how it will be tested, what evidence is needed.
This used to be the senior auditor’s job. Now it’s configuration.
Then it stops and waits for scope sign-off. Does the approach make sense given organizational realities? Approve the plan before fieldwork begins.
Phase 4: Fieldwork
It executes the tests. Checks if controls exist. Validates whether they work. Analyzes data for exceptions. Flags anomalies. Documents evidence.
This used to be the bulk of the hours on an engagement. Now the execution runs fast.
Then it stops and waits for human validation. Are the findings accurate? Is there context the system missed? This validation requires expertise and takes real time—you’re checking the quality of what will go into the final report. Review results before reporting.
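To make “compare reality to criteria” concrete, here’s a toy version of a single fieldwork test. The control, the field names, and the threshold are invented for illustration; a real test would run against the actual system of record.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    txn_id: str
    amount: float
    approver: str | None   # None means no recorded approval

# Invented control for illustration: payments above 10,000 require a recorded approver.
APPROVAL_THRESHOLD = 10_000

def test_approval_control(transactions: list[Transaction]) -> list[dict]:
    """Compare reality to criteria and document every deviation."""
    exceptions = []
    for t in transactions:
        if t.amount > APPROVAL_THRESHOLD and t.approver is None:
            exceptions.append({
                "txn_id": t.txn_id,
                "condition": f"{t.amount:,.2f} paid without recorded approval",
                "criteria": f"amounts over {APPROVAL_THRESHOLD:,} require sign-off",
            })
    return exceptions  # each entry is a documented deviation

sample = [
    Transaction("T-001", 4_500.00, "jdoe"),
    Transaction("T-002", 25_000.00, None),     # should be flagged
    Transaction("T-003", 18_000.00, "asmith"),
]
print(test_approval_control(sample))  # -> one exception, T-002
```

Each exception carries the condition observed and the criteria it violated, which is exactly the evidence trail the system auto-generates.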
Phase 5: Reporting
It synthesizes findings, rates severity, drafts observations, suggests remediation. Produces a management letter with executive summary.
This used to require significant time and multiple drafts. Now the initial draft is generated quickly.
Then it stops and waits for final approval. Is the message appropriate for this audience? Are the recommendations implementable? This final review is critical—you’re shaping how the work lands with stakeholders. Review before delivery.
What It Gets Right (Disturbingly)
The system is good—disturbingly good—at:
- Consistency: It applies the same criteria to every test. No fatigue. No shortcuts. No “good enough” at 5 PM on Friday.
- Completeness: It doesn’t forget steps. It doesn’t skip tests because they’re tedious. It follows the methodology exactly.
- Pattern recognition: It instantly spots anomalies that would take humans hours of pivot tables and filters to find.
- Documentation: It auto-generates evidence trails. No more “I tested this but forgot to document it.”
- Speed: Work that used to take weeks happens in hours.
But here’s the critical qualifier: you cannot blindly rely on any of this yet.
Every output requires human review. Every input needs validation. The system has built-in gates where it stops and waits for human approval before proceeding. Not because I’m being cautious—because the outputs aren’t reliable enough to trust without oversight.
Which is exactly what you’d expect in early days.
What It Misses (And Why Human Review Is Critical)
The system struggles with:
- Context it can’t see: Organizational politics, informal power structures, the backstory that explains why something looks irregular
- Judgment under ambiguity: Distinguishing between an exception worth escalating and an isolated mistake that’s already been corrected
- Human dynamics: The follow-up conversation where you probe and push back and read body language
- Implementation reality: Recommendations that are theoretically correct but politically impossible or resource-constrained
- Intuition: The “something feels off here” sense that comes from years of experience
These aren’t edge cases. They’re core to the work. Which is why the orchestrator currently requires human checkpoints at every phase: review the context analysis, approve the risk assessment, sign off on scope, validate fieldwork results, approve the final report.
It’s not autonomous. It’s assisted.
The Uncomfortable Reality
Let’s be honest about what this means:
- Planning: Much of the mechanical work—reading documents, synthesizing context, mapping to frameworks—can be automated. But it requires substantive human review and approval.
- Fieldwork: The execution of tests, the pattern recognition, the documentation—large portions can be automated. But human validation of results is critical.
- Reporting: The drafting, structuring, and initial synthesis—significant parts can be automated. But human review before delivery is non-negotiable.
Right now, the human time shifts from doing the work to reviewing the work. That’s still a meaningful change in how the work gets done—reviewing requires different skills than executing—but it’s not elimination of the role.
Yet.
Because here’s what’s coming: better tooling, better models, better prompts, more context. The human checkpoints will move from “review every output” to “review exceptions only” to “audit the system periodically.”
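If you want to see how small that progression is in code terms, here’s a hedged sketch of an oversight policy. The tiers and the sample rate are assumptions, not anything that exists today:

```python
import random
from enum import Enum, auto

class Oversight(Enum):
    REVIEW_ALL = auto()         # today: a human sees every output
    REVIEW_EXCEPTIONS = auto()  # next: humans only see flagged items
    PERIODIC_AUDIT = auto()     # later: humans spot-check a sample

def needs_human(flagged: bool, mode: Oversight, sample_rate: float = 0.05) -> bool:
    """Decide whether a human reviews this output under the current policy."""
    if mode is Oversight.REVIEW_ALL:
        return True
    if mode is Oversight.REVIEW_EXCEPTIONS:
        return flagged
    return random.random() < sample_rate  # periodic audit: spot-check a sample
```

Moving from one tier to the next is a one-line configuration change. That’s how close “review everything” sits to “review almost nothing.”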
If substantial execution work can be automated today with human oversight, what happens when oversight requirements decrease?
If AI can run continuous monitoring with human validation, what happens when validation becomes automated too?
If the system can draft outputs that require human review today, what happens when those outputs become reliable enough to trust?
The parts it can’t do yet:
Complex judgment under ambiguity. Reading organizational dynamics. Navigating internal politics. The “something feels off” intuition. Asking questions nobody else thought to ask.
The parts it will never do:
I wanted this list to be longer. And I’m not confident it won’t get shorter.
Why This Isn’t Just About Audit
If you’re reading this and thinking “well, that’s just audit”—you’re missing the point.
The pattern applies to any profession where:
- Work follows an established methodology
- Quality is defined by adherence to standards
- Execution involves comparing reality to criteria
- Output is structured and documentable
Legal work: Compare contracts to templates, check compliance with regulations, flag deviations.
Consulting work: Analyze business context, identify gaps, recommend solutions based on frameworks.
Financial analysis: Gather data, test assumptions, model scenarios, document findings.
Compliance work: Check whether policies exist, test whether they’re followed, report exceptions.
Anthropic didn’t just release a legal tool. They demonstrated that any knowledge work with a codified methodology can be orchestrated.
That’s what the market priced in at $285 billion.
Why the Market Got It Right
Go back to Thomson Reuters, down 18% in one day.
The legal industry just learned what I learned in my living room: this isn’t theoretical disruption. It’s not “AI will change things eventually.” It’s “AI can do the work now, and the only question is adoption speed.”
SaaS companies sold software seats. What happens when AI does the work the software used to enable?
Law firms sold billable hours. What happens when AI does the hours?
Audit firms sold hours and headcount. What happens when a system can do continuous monitoring instead of annual sampling?
Consulting firms sold frameworks and analysis. What happens when AI can apply frameworks instantly?
The market didn’t panic because Anthropic released some breakthrough technology. The market panicked because investors finally understood: it’s just prompts and workflow, which means the moat just collapsed.
If Anthropic can publish a legal workflow in a few weeks, how long before someone publishes an audit workflow? A tax workflow? A compliance workflow? A due diligence workflow?
How long before professional services firms realize they can do the same work with far fewer people?
An analyst at Schroders put it plainly: AI tools allow businesses to “do more with fewer staff.”
Not in five years. Now.
What This Actually Means
Not “professionals are obsolete.”
But the job is fundamentally different now.
The old value proposition: I perform work, document findings, deliver reports.
The new value proposition: I design systems that perform work. I review outputs for quality and context. I intervene where judgment is required. I interpret what machines can’t see.
Right now, that means human-in-the-loop at every step. You’re not executing the work anymore—you’re orchestrating and validating it.
Over time, as models improve and workflows mature, the loop gets wider. From “review every output” to “review exceptions” to “audit the system.” But the core shift remains: the value moves from execution to oversight and judgment.
And that’s a very different skill set.
Most professionals I know are executors. They’re good at following methodologies, performing analysis, documenting results. That’s what we were trained to do. That’s what the certifications test for.
But if execution gets automated, professions need orchestrators. People who can design workflows, validate AI outputs, spot what the system missed, and exercise judgment on ambiguous findings.
The gap between those two skill sets is wider than most people realize.
And the window to build those skills is shorter than most people think.
What Comes Next
I’m not writing this to scare you. Or maybe I am—but only because I think the right kind of fear is useful.
The “Oh Fuck” moment I wrote about last week wasn’t paralyzing. It was clarifying.
Yes, the system I built requires human oversight at every step. Yes, you can’t blindly trust the outputs yet. Yes, it’s early days.
But early days don’t last long in AI.
Six months ago, this would have been far harder to build. Six months from now, it will be far easier, and the systems more capable. The human checkpoints will move further apart. The oversight will become less intensive. The question isn’t whether automation increases, it’s how fast.
I can’t un-see what I built. I can’t pretend this isn’t happening. And neither can you, now that the market just priced it in at $285 billion.
So the question becomes: what does the orchestrator model actually look like in practice? What skills survive when execution gets automated? How do you make yourself valuable when the work changes this fast?
That’s what I’m figuring out. And that’s what I’ll write about next week.
If you’ve built something similar—or if you’re staring at your job wondering what percentage of it just became automatable—I want to hear from you. Get in touch.
Sources and further reading:
- Globe and Mail: Anthropic legal tools prompt massive sell-off
- Canadian Lawyer: Anthropic tool jolts share prices of legal data leaders
- Investing.com: Thomson Reuters shares sink after Anthropic unveils AI legal tool
- Technology.org: Claude Cowork wipes billions off data analytics stocks
- LawSites: Anthropic’s legal plugin analysis