Operations room, 1943 — Edward Ardizzone

Field Engineering · Production AI Systems

Five Patterns That Keep AI Agents
From Burning Your Budget

Command-and-control discipline for production multi-agent systems

The agents are fine. The gaps between agents — idle loops, wrong models, unvalidated dispatches — that is where money goes.

The artwork on this page is drawn from the British War Artists programme of the Second World War — official commissions intended to document and bear witness, not to glorify. These paintings are shown in that same spirit: as records of human endurance and ingenuity under pressure. The author is aware that military imagery is not abstract for everyone, and offers this framing with genuine respect for those for whom it is not.

◆   INTRODUCTION   ◆

When I started building a production AI system to automate complex analytical work, I made the same mistake everyone makes. I trusted that the agents would behave. They did — most of the time. The other times, a validation agent hit an ambiguous case and started retrying. Not once. Not three times. It cycled through the same error for forty-five minutes, calling the API at Opus pricing on every loop. No alarm. No stop condition. Just a quiet spiral I discovered when I opened the billing console the next morning.

That was the day I stopped thinking about this as a software problem and started thinking about it as an operations problem. Wars are not lost to bad soldiers — they are lost to poor command discipline, wasted resources, and the failure to act on intelligence in time. The same is true for AI systems at scale. The patterns here are not clever hacks. They are the application of principles that have governed complex operations for a century.

These five patterns represent what I had to build to operate a multi-agent AI system with professional discipline. None of them are complicated. All of them are necessary. I am publishing them here so you do not have to discover them the same way I did.

◆ ◆ ◆
Pattern 01

The Idle Watcher

"A process that consumes without producing is an orphan. Orphans cost money and produce nothing."

The first lesson of resource management is simple: things that are not working should not be running. In any complex operation, processes go dark — not with an announcement, but with silence. An agent waiting on a response that will never arrive. A pod holding a connection to a service that has already moved on. The resource meter keeps running. The billing system does not know the difference between an agent doing excellent work and an agent staring at a wall.

The Idle Watcher solves this at the infrastructure level. Every thirty minutes — configurable — it scans the workspace directory. If nothing has changed, it starts a clock. When that clock expires, it issues a single command: scale this deployment to zero replicas. The pod terminates itself. Clean, automatic, no manual intervention required.

The elegance is in the self-sufficiency. There is no external watchdog, no cron job to maintain, no alerting system to configure. The pod watches itself. When it has been idle long enough, it disappears. This is what mature infrastructure looks like: systems that manage their own lifecycle rather than demanding constant supervision.

Diagram 01 — Lifecycle State Machine

WORKSPACE ACTIVE → NO CHANGE (30-min clock running) → IDLE → TIMEOUT EXCEEDED → SCALE TO 0 REPLICAS → TERMINATED
Eric Ravilious, "The Operations Room," 1942. Watercolor. Public domain.
Implementation · Pattern 01
IMPLEMENTATION
──────────────
Pod runs idle_watcher.py as main process.
Configure via environment variables:

  IDLE_TIMEOUT_SECONDS=1800   # 30 minutes
  POD_NAMESPACE=your-namespace
  DEPLOYMENT_NAME=your-deployment

No kubectl access needed from outside the cluster.
The pod scales itself to zero.
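A minimal sketch of what `idle_watcher.py` might look like, stdlib only. The workspace-scanning and idle-clock logic follows the description above; the `scale_to_zero` callback is a stand-in for the real scale-down action (in-cluster, a PATCH to the deployment's `scale` subresource using the pod's service-account token), and the helper names are mine, not the actual file's:

```python
import os
import time


def latest_mtime(root: str) -> float:
    """Most recent modification time of anything under the workspace."""
    newest = os.path.getmtime(root)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            try:
                newest = max(newest, os.path.getmtime(os.path.join(dirpath, name)))
            except FileNotFoundError:
                pass  # file vanished mid-scan; not a sign of activity
    return newest


def is_idle(last_change: float, timeout: float, now: float) -> bool:
    """True once the idle clock has expired."""
    return (now - last_change) >= timeout


def watch(workspace: str, timeout: float, scale_to_zero, poll: float = 60.0) -> None:
    """Poll the workspace; when nothing has changed for `timeout` seconds,
    invoke scale_to_zero() and exit. The pod terminates itself."""
    while True:
        if is_idle(latest_mtime(workspace), timeout, time.time()):
            scale_to_zero()
            return
        time.sleep(poll)
```

In the real pod, `timeout` would come from `IDLE_TIMEOUT_SECONDS` and `scale_to_zero` would patch the deployment named by `POD_NAMESPACE`/`DEPLOYMENT_NAME`.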
            
◆ ◆ ◆
Pattern 02

Wake-the-Founder

"The admiral writes his Night Orders before he sleeps. Certain conditions warrant waking the commanding officer regardless of the hour."

In naval tradition, the captain's Night Orders are written every evening before the watch changes: a precise list of the conditions under which the officer of the watch is to wake the captain immediately. Not every deviation from course. Not every minor adjustment. Only those conditions — fog, a ship not responding, a change in the weather pattern — that require command authority.

The Wake-the-Founder rule is the same principle applied to AI systems. Most decisions can and should be made autonomously. But some conditions — by their nature, by their consequence — require a human in the loop. Not because the system cannot act, but because the action it would take cannot be undone.

The list is not long: compute spend crossing five dollars on a single run. An agent stuck for more than forty-five minutes without measurable progress. The same error recurring three times in sequence. Any operation touching production data. Any deployment without explicit approval. When any of these conditions trips, execution halts and notification is immediate.

This is not distrust of the system. It is architectural honesty about where autonomous judgment ends and command authority begins. The decision to burn more money, push to production, or write to live data is not a software decision — it is a business decision. Structure it accordingly.

Diagram 02 — Command Threshold Panel

COMMAND THRESHOLD PANEL — NIGHT ORDERS
$5 BUDGET: TRIPPED · 45 MIN: ARMED · 3× ERROR: ARMED · DATA LOSS: ARMED · PRODUCTION: ARMED · MAIN BRANCH: ARMED
→ HALT · NOTIFY · AWAIT COMMAND AUTHORITY
Implementation · Pattern 02
POLICY TRIGGERS — HALT ON ANY
──────────────────────────────
  > $5.00 compute cost per run
  > 45 minutes agent runtime
  > 3 consecutive identical errors
  Any operation involving data loss
  Any production deployment
  Any push to main/master

Action: STOP. Notify. Wait for command authority.
This is not optional. It is architectural.
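One way to encode the Night Orders as data rather than scattered if-statements. The thresholds match the triggers above; the `RunState` fields and function names are illustrative, not from any particular framework:

```python
from dataclasses import dataclass, field


@dataclass
class RunState:
    cost_usd: float = 0.0
    minutes_without_progress: float = 0.0
    recent_errors: list = field(default_factory=list)  # most recent last
    touches_production_data: bool = False
    is_deployment: bool = False
    target_branch: str = ""


def tripped_triggers(state: RunState) -> list:
    """Return every Night Orders condition this run has tripped."""
    tripped = []
    if state.cost_usd > 5.00:
        tripped.append("budget: > $5.00 compute cost on this run")
    if state.minutes_without_progress > 45:
        tripped.append("stall: > 45 minutes without measurable progress")
    last3 = state.recent_errors[-3:]
    if len(last3) == 3 and len(set(last3)) == 1:
        tripped.append("loop: 3 consecutive identical errors")
    if state.touches_production_data:
        tripped.append("data: operation touches production data")
    if state.is_deployment:
        tripped.append("deploy: production deployment without approval")
    if state.target_branch in ("main", "master"):
        tripped.append("branch: push to main/master")
    return tripped


def checkpoint(state: RunState) -> None:
    """Halt and notify if any condition tripped; otherwise continue."""
    hits = tripped_triggers(state)
    if hits:
        # notify(hits)  # page a human here, then block until acknowledged
        raise RuntimeError("HALT, await command authority: " + "; ".join(hits))
```

Calling `checkpoint(state)` between agent steps makes the halt structural: the run cannot proceed past a tripped trigger without a human clearing it.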
            
◆ ◆ ◆
Pattern 03

The Model Complexity Ladder

"Force economy: the right asset for the right task. You do not deploy the carrier group to scout a harbor."

The most expensive mistake in multi-agent AI systems is not building the wrong thing. It is running the wrong model. Every task has a natural level of complexity — a ceiling of capability it actually requires. When you run a task that requires Haiku at Opus pricing, you are not getting better results. You are paying ten times more for identical output.

The Model Complexity Ladder is a structured decision framework: three tiers, two dimensions. The first dimension is task complexity — how many layers of reasoning are required, how much original judgment, how much synthesis across conflicting information. The second dimension is downstream impact — does this work form the foundation others depend on, or is it an execution step?

Simple and low-impact: Haiku. Straightforward extraction, validation against checklists, file operations, pass/fail scoring. Haiku handles this cleanly, quickly, cheaply. Standard and medium-impact: Sonnet. Content creation with guidelines, coordination, moderate analysis, synthesis. The workhorse. Complex and high-impact: Opus. Deep pattern analysis across hundreds of samples, original research, high-stakes synthesis that others will build on. Reserve maximum capability for work that genuinely requires it.

In practice, roughly sixty percent of the work in any multi-agent pipeline is procedural. It is the kind of work that Haiku was designed for. If you are running it on Opus because Opus is the model you trust, you are not being careful — you are being wasteful. Trust the matrix. Upgrade only when you have a specific reason, not a general preference.

Diagram 03 — Model Selection Pyramid

HAIKU (~60% of work): Extraction · Validation · File Ops · Pass/Fail
SONNET (~30% of work): Synthesis · Coordination · Analysis
OPUS (~10% of work): Deep Pattern · High-Stakes
Axes: complexity (simple → complex) × impact (low → high)
Edward Ardizzone, "Observation Post," 1944. Watercolour and ink. Imperial War Museum. © IWM / Crown copyright expired.
Implementation · Pattern 03
DECISION MATRIX
───────────────
Task Complexity × Downstream Impact

  Simple + Low-impact    →  Haiku   (~60% of work)
  Standard + Mid-impact  →  Sonnet  (~30% of work)
  Complex + High-impact  →  Opus    (~10% of work)

Adjustment triggers:
  Haiku failing  → upgrade to Sonnet
  Opus on routine work → downgrade
  Cost > value   → re-evaluate
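The matrix above reduces to a small lookup. A sketch, with model names as generic tier aliases and one policy assumption of mine: either dimension alone can force an upgrade, so high-impact work gets the top tier even when the task itself looks simple:

```python
def select_model(complexity: str, impact: str) -> str:
    """Map (task complexity, downstream impact) to a model tier.

    complexity: 'simple' | 'standard' | 'complex'
    impact:     'low' | 'medium' | 'high'
    """
    rank = {"simple": 0, "standard": 1, "complex": 2,
            "low": 0, "medium": 1, "high": 2}
    tier = max(rank[complexity], rank[impact])  # either dimension can upgrade
    return ("haiku", "sonnet", "opus")[tier]


def escalate(model: str) -> str:
    """Adjustment trigger: the cheaper model is failing, so step up one rung."""
    ladder = ("haiku", "sonnet", "opus")
    return ladder[min(ladder.index(model) + 1, len(ladder) - 1)]
```

Taking the max of the two dimensions is one defensible policy; a stricter reading of "upgrade only when you have a specific reason" would require both dimensions to agree before stepping up.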
            
◆ ◆ ◆
Pattern 04

The Pre-Validation Gate

"Reconnaissance before assault. Confirm the objective exists before committing the regiment."

The Battle of Kasserine Pass in 1943 was a catastrophe of premature commitment. American forces, newly deployed to North Africa, attacked before they understood the terrain, the enemy disposition, or the objective. They had the force. They had the equipment. They did not have the intelligence to use either effectively. The lesson became foundational to American military doctrine: reconnaissance before assault. Confirm the ground before you commit.

The Pre-Validation Gate applies the same principle to parallel agent dispatch. The instinct in multi-agent systems is to parallelize immediately — the sooner the agents start, the sooner the work is done. This is true when the work is correctly defined. When the work is misframed, parallelism does not save time. It multiplies the waste.

The gate is simple. Before dispatching the full team, one agent runs on a small sample — two or three representative cases. It reports what it finds. The findings are reviewed. If the problem definition holds, the team deploys with a confirmed brief. If the findings reveal a misframing — an incorrect assumption, a missing constraint, an ambiguous requirement — the cost of discovering that is the price of one agent on a small sample, not six agents on the full dataset.

The secondary benefit is often overlooked: the pre-validation findings become the brief for the full dispatch. The agents that follow know more precisely what they are looking for because one agent already traced the shape of the answer. Convergence is faster. Signal quality is higher. Total cost — including the validation pass — is lower than skipping it.

Diagram 04 — Pre-Validation Gate Protocol

FULL DATASET (N samples) → GATE: sample 2–3 cases → SINGLE AGENT → FINDINGS REVIEWED → YES: FULL TEAM DEPLOYED · NO: REFRAME (reconnaissance phase)
Edward Ardizzone, "Brigade H.Q.," 1944. Watercolour and ink. Imperial War Museum. © IWM / Crown copyright expired.
Implementation · Pattern 04
GATE PROTOCOL
─────────────
Before any parallel dispatch:

  1. Select 2–3 representative samples
  2. Dispatch ONE agent to analyze
  3. Review findings with human
  4. If confirmed: dispatch full team
     with validated problem definition
  5. If misframed: redefine and repeat

Cost of gate:     1 agent × small sample
Cost of skipping: N agents × wrong work
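The protocol as a control-flow sketch. `dispatch_one`, `review`, and `dispatch_team` are stand-ins for whatever dispatch machinery you run; `max_rounds` is my own safety valve so a misframed problem cannot loop forever:

```python
import random


def pre_validation_gate(dataset, dispatch_one, review, dispatch_team,
                        sample_size=3, max_rounds=3):
    """Run one agent on a small sample before committing the full team.

    dispatch_one(samples)        -> findings   (single-agent recon pass)
    review(findings)             -> (ok, brief)  (human confirms or reframes)
    dispatch_team(dataset, brief) -> results
    """
    for _ in range(max_rounds):
        samples = random.sample(list(dataset), min(sample_size, len(dataset)))
        findings = dispatch_one(samples)
        ok, brief = review(findings)
        if ok:
            # Confirmed: the recon findings become the team's brief.
            return dispatch_team(dataset, brief)
        # Misframed: cost so far is one agent on a small sample. Reframe, retry.
    raise RuntimeError("problem definition never converged; do not dispatch")
```

Note the return path: the recon findings are handed to `dispatch_team` as the brief, which is the secondary benefit described above.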
            
◆ ◆ ◆
Pattern 05

The Research Turn Budget

"The field commander does not wait for complete intelligence. He waits for sufficient intelligence. The difference is the battle."

At the height of the Pacific campaign, Admiral Nimitz's intelligence team at Pearl Harbor operated under an unspoken constraint that was nevertheless absolute: the briefing happened at 0600 regardless of what they knew. The analysts gathered what they had, marked what was confirmed and what was estimated, and presented it. The commander made decisions on the information available. Waiting for certainty in an uncertain environment is not caution — it is paralysis.

The Research Turn Budget encodes this discipline for AI agents. An agent conducting open-ended research will keep looking until it runs out of turns. There is always another source. There is always a deeper thread. There is always one more query that might change the picture. Left unconstrained, research does not converge — it spirals.

At turn eight of ten, the agent stops making tool calls. It assembles what it has gathered. It labels the result PARTIAL:, describes what is complete and what gaps remain, and returns. The work is usable. The decision can be made. What is missing is documented, not hidden.

The PARTIAL label is not failure. It is professional practice. It is the intelligence analyst saying: here is what we know for certain, here is what we estimate with confidence, and here is what we do not know in time for this briefing. A PARTIAL result delivered on schedule is worth more than a complete result delivered after the decision was already made by default.

Diagram 05 — Research Turn Budget Timeline

Turns 1–7 (active research): SEARCH · FETCH · SEARCH · ANALYZE · FETCH · ANALYZE · SEARCH → Turn 8: HALT TOOL CALLS, RETURN PARTIAL: → Turns 9–10: unused
John Minton, "Soldier Writing a Letter Home," c.1943. Ink. Imperial War Museum. Public domain.
Implementation · Pattern 05
TURN BUDGET PROTOCOL
─────────────────────
Instruct in every research dispatch:

  "If you reach turn 8 of 10 without a
  complete answer, stop tool calls and
  return a partial summary labeled
  PARTIAL: with what you have gathered.
  Do not wait for perfect information."

PARTIAL is professional practice.
A result on time beats a perfect result
delivered after the decision was made.
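The prompt instruction above is the primary control; a belt-and-braces version enforces the same budget in the harness. A sketch, where `plan_next_call`, `run_tool`, and `summarize` are hypothetical hooks into your agent loop:

```python
def research_loop(plan_next_call, run_tool, summarize,
                  max_turns=10, halt_at=8):
    """Run research turns, halting tool calls at turn `halt_at` of `max_turns`.

    plan_next_call(gathered) -> next tool call, or None if converged
    run_tool(call)           -> evidence to append
    summarize(gathered)      -> answer text (should name any remaining gaps)
    """
    gathered = []
    for turn in range(1, max_turns + 1):
        if turn >= halt_at:
            # Turn 8 of 10: stop tool calls, return what we have,
            # labeled PARTIAL: so the gaps are documented, not hidden.
            return "PARTIAL: " + summarize(gathered)
        call = plan_next_call(gathered)
        if call is None:
            return summarize(gathered)  # converged early: complete answer
        gathered.append(run_tool(call))
    return "PARTIAL: " + summarize(gathered)
```

An agent that converges before turn 8 returns a complete answer; one that does not gets cut off with its evidence intact rather than spiraling through the remaining turns.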
            
◆   ABOUT THE ENGINEER   ◆

Peter Simmons
AI Systems Engineer

These patterns did not emerge from theory. They emerged from building — from the specific experience of running multi-agent AI systems in production, watching them fail in expensive and instructive ways, and engineering the infrastructure to prevent each failure from recurring.

I have been building AI systems professionally since before the current wave made it fashionable. My work sits at the intersection of systems engineering and applied intelligence — designing the architecture that makes AI pipelines reliable, cost-controlled, and operable at scale. The accountability framework, the cost guardrails, the command-and-control patterns documented here: these are production artifacts from real systems, not demonstration projects.

If your organization is building multi-agent AI systems and needs someone who has already solved the problems you are about to encounter — I am available. The work speaks for itself.