Cost Control for Agent Fleets
Keep agent costs predictable with budgets, model routing, and free-tier strategies.
Why Agent Costs Spiral
Unlike single-call LLM usage, agents can iterate many times per task, spawn sub-agents, and run in parallel. A poorly bounded agent can consume thousands of tokens on what should be a 100-token task.
Effective cost control for agents rests on three mechanisms: budgets, model-routing logic, and circuit breakers.
Budget Architecture
Set budgets at multiple levels:
- Per task — maximum tokens per task invocation
- Per session — maximum cost per conversation
- Per day — daily spending limit with automatic cutoff
- Per agent — some agents cost more than others; budget them separately
In clawd, the budgets are $10/day and $50/week. Sessions over $0.50 trigger an alert; sessions over $2.00 are killed.
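A minimal sketch of multi-level budget enforcement, assuming dollar-denominated costs and the illustrative per-task, per-session, and daily limits below (the specific numbers are placeholders, not a prescription):

```python
class BudgetExceeded(Exception):
    """Raised when any budget level is exhausted."""


class BudgetTracker:
    """Tracks spend at task, session, and daily levels (illustrative limits)."""

    def __init__(self, task_limit=0.10, session_limit=2.00, daily_limit=10.00):
        self.task_limit = task_limit        # max $ per task invocation
        self.session_limit = session_limit  # hard kill threshold per session
        self.daily_limit = daily_limit      # daily cutoff
        self.session_spend = 0.0
        self.daily_spend = 0.0

    def record(self, task_cost):
        """Record a completed task's cost and enforce every budget level."""
        if task_cost > self.task_limit:
            raise BudgetExceeded(f"task cost ${task_cost:.2f} over per-task limit")
        self.session_spend += task_cost
        self.daily_spend += task_cost
        if self.session_spend > self.session_limit:
            raise BudgetExceeded("session budget exhausted; kill session")
        if self.daily_spend > self.daily_limit:
            raise BudgetExceeded("daily budget exhausted; automatic cutoff")
```

In practice the daily total would persist outside the process (e.g. in a database) so a restart cannot reset the cutoff.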
Model Routing
Not every task needs the most capable (most expensive) model. Route intelligently:
| Task Type | Model | Cost |
|-----------|-------|------|
| Bulk classification | Haiku | $ |
| Code generation | Sonnet | $$ |
| Architecture review | Opus | $$$ |
| Factual Q&A | Groq/Llama | Free |
The free fleet (Groq, Ollama, Cloudflare Workers AI) handles a surprising fraction of tasks — try free first.
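The routing table above can be expressed as a simple lookup with a mid-tier fallback; the task-type keys and model names here are assumptions mirroring the table, not a fixed taxonomy:

```python
# Route each task to the cheapest model that can handle it.
ROUTES = {
    "bulk_classification": "haiku",       # $
    "code_generation": "sonnet",          # $$
    "architecture_review": "opus",        # $$$
    "factual_qa": "groq/llama",           # free tier: try free first
}


def route(task_type, default="sonnet"):
    """Return the model for a task type, falling back to a mid-tier default."""
    return ROUTES.get(task_type, default)
```

Unknown task types fall back to the mid-tier model rather than the most expensive one, which keeps routing mistakes cheap.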
Caching Agent Outputs
Identical or near-identical agent invocations should return cached results. Cache at the task level, not just the LLM call level. A research agent querying the same topic twice within 24 hours should return the cached result.
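One way to sketch task-level caching, assuming tasks are JSON-serializable and using an in-memory dict with the 24-hour window from the text (a production version would use a shared store like Redis):

```python
import hashlib
import json
import time

CACHE = {}
TTL_SECONDS = 24 * 60 * 60  # 24-hour reuse window


def cached_run(agent_fn, task):
    """Run an agent task, returning a cached result for identical tasks within the TTL."""
    # Canonical JSON makes the key stable across dict ordering.
    key = hashlib.sha256(json.dumps(task, sort_keys=True).encode()).hexdigest()
    hit = CACHE.get(key)
    now = time.time()
    if hit is not None and now - hit[0] < TTL_SECONDS:
        return hit[1]          # cache hit: skip the agent entirely
    result = agent_fn(task)    # cache miss: pay for the full agent run
    CACHE[key] = (now, result)
    return result
```

Note this caches the whole agent invocation, not individual LLM calls, so a hit skips every iteration the agent would have run.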
Kill Switches
Every long-running agent needs a kill switch: a mechanism to stop execution gracefully. Test it before deploying. An agent you can't stop is an agent you shouldn't deploy.
Monitor cost in real time, with alerts at 50%, 80%, and 100% of budget and an automated cutoff at 110%.
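The alert thresholds above reduce to a small classifier over the spend-to-budget ratio; a sketch:

```python
def check_budget(spend, budget):
    """Classify current spend against budget: alert at 50/80/100%, cut off at 110%."""
    pct = spend / budget
    if pct >= 1.10:
        return "cutoff"      # automated hard stop
    if pct >= 1.00:
        return "alert_100"
    if pct >= 0.80:
        return "alert_80"
    if pct >= 0.50:
        return "alert_50"
    return "ok"
```

Checking thresholds from highest to lowest ensures the most severe applicable action wins.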