Keep agent costs predictable with budgets, model routing, and free-tier strategies.
Unlike single-call LLM usage, agents can iterate many times per task, spawn sub-agents, and run in parallel. A poorly bounded agent can consume thousands of tokens on what should be a 100-token task.
Cost control for agents requires budgets, model selection logic, and circuit breakers.
Set budgets at multiple levels, for example per session, per day, and per week.
In clawd, the caps are $10/day and $50/week. A session that exceeds $0.50 triggers an alert; a session that exceeds $2.00 is killed.
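The layered thresholds can be sketched as a small tracker. This is a minimal illustration, not clawd's actual implementation: the dollar figures come from the text above, while the `BudgetTracker` class, its `record` method, and the decision to treat a breached daily or weekly cap as a kill are assumptions made for the example.

```python
from dataclasses import dataclass

# Thresholds quoted in the lesson; all names below are hypothetical.
SESSION_ALERT_USD = 0.50
SESSION_KILL_USD = 2.00
DAILY_LIMIT_USD = 10.00
WEEKLY_LIMIT_USD = 50.00


@dataclass
class BudgetTracker:
    """Tracks spend at session, day, and week level (all in USD)."""

    session_spend: float = 0.0
    daily_spend: float = 0.0
    weekly_spend: float = 0.0

    def record(self, cost_usd: float) -> str:
        """Add a cost to every level and return 'ok', 'alert', or 'kill'."""
        self.session_spend += cost_usd
        self.daily_spend += cost_usd
        self.weekly_spend += cost_usd

        # Assumption: breaching any hard cap stops the session.
        if (
            self.session_spend > SESSION_KILL_USD
            or self.daily_spend > DAILY_LIMIT_USD
            or self.weekly_spend > WEEKLY_LIMIT_USD
        ):
            return "kill"
        if self.session_spend > SESSION_ALERT_USD:
            return "alert"
        return "ok"
```

A caller would invoke `record` after each model call and stop the agent loop as soon as it returns `"kill"`.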
Not every task needs the most capable (most expensive) model. Route intelligently:
| Task Type | Model | Cost |
|---|---|---|
| Bulk classification | Haiku | $ |
| Code generation | Sonnet | $$ |
| Architecture review | Opus | $$$ |
| Factual Q&A | Groq/Llama | Free |
The free fleet (Groq, Ollama, Cloudflare Workers AI) handles a surprising fraction of tasks, so try free first.
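One way to combine the routing table with "try free first" is a short escalation loop. The sketch below assumes two caller-supplied hooks, `call_model` and `is_good_enough`, which are hypothetical and stand in for a real client and a real quality check. The task-type keys and model labels are illustrative strings, not real API model identifiers.

```python
from typing import Callable

# Mirrors the table above; keys and labels are illustrative names only.
ROUTES = {
    "bulk_classification": "haiku",
    "code_generation": "sonnet",
    "architecture_review": "opus",
    "factual_qa": "groq/llama",
}
FREE_MODEL = "groq/llama"


def run_free_first(
    task_type: str,
    prompt: str,
    call_model: Callable[[str, str], str],
    is_good_enough: Callable[[str], bool],
) -> tuple[str, str]:
    """Try the free model, then escalate to the routed model if needed.

    Returns (model_used, result). Unknown task types raise KeyError.
    """
    routed = ROUTES[task_type]
    # Avoid calling the same model twice when the route is already free.
    candidates = [FREE_MODEL] if routed == FREE_MODEL else [FREE_MODEL, routed]

    result = ""
    for model in candidates:
        result = call_model(model, prompt)
        if is_good_enough(result):
            break
    return model, result
```

The escalation costs one extra free call on hard tasks, which is usually a fair trade when the free tier resolves a meaningful share of requests.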
Identical or near-identical agent invocations should return cached results. Cache at the task level, not just the LLM call level. A research agent querying the same topic twice within 24 hours should return the cached result.
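Task-level caching with a 24-hour window might look like the following. This is a sketch under stated assumptions: the `TaskCache` class is hypothetical, storage is an in-memory dict, and "near-identical" is approximated by lowercasing and collapsing whitespace, which is a much cruder match than semantic similarity.

```python
import hashlib
import time
from typing import Callable

TTL_SECONDS = 24 * 60 * 60  # the 24-hour window from the text


class TaskCache:
    """In-memory cache keyed on (agent, normalized task description)."""

    def __init__(self, ttl: float = TTL_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def _key(agent: str, task: str) -> str:
        # Assumption: case and whitespace differences count as "near-identical".
        normalized = " ".join(task.lower().split())
        return hashlib.sha256(f"{agent}:{normalized}".encode()).hexdigest()

    def get_or_run(self, agent: str, task: str,
                   run: Callable[[str], str]) -> str:
        """Return a fresh cached result, or run the agent and cache it."""
        key = self._key(agent, task)
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        result = run(task)
        self._store[key] = (now, result)
        return result
```

Because the key covers the whole task rather than a single prompt, a cache hit skips every LLM call the agent would have made, not just the first one.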
Every long-running agent needs a kill switch: a mechanism to stop execution gracefully. Test it before deploying. An agent you can't stop is an agent you shouldn't deploy.
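A cooperative kill switch can be as simple as a shared flag checked between steps. The sketch below uses the standard library's `threading.Event`. The `run_agent` function and the idea of modelling an agent as a list of step callables are assumptions made for illustration.

```python
import threading
from typing import Callable


def run_agent(steps: list[Callable[[], str]],
              stop: threading.Event) -> list[str]:
    """Run steps in order, checking the kill switch before each one.

    The step in progress always finishes, so partial results stay
    consistent; this is what makes the stop graceful.
    """
    completed: list[str] = []
    for step in steps:
        if stop.is_set():
            break
        completed.append(step())
    return completed


# Usage: any thread (an operator endpoint, a cost monitor) can call stop.set().
stop = threading.Event()
results = run_agent([lambda: "plan", lambda: "search"], stop)
```

Testing the switch before deployment means setting the event partway through a run and asserting that no further steps execute.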
Monitor cost in real time. Send alerts at 50%, 80%, and 100% of budget, and cut off automatically at 110%.
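The alert ladder can be sketched as a monitor that fires each threshold once. The percentages are the ones given above. The `CostMonitor` class and its event strings are hypothetical, and a real system would send notifications rather than return strings.

```python
ALERT_PERCENTS = (50, 80, 100)  # alert thresholds from the text
CUTOFF_PERCENT = 110            # automated cutoff from the text


class CostMonitor:
    """Emits each alert once and signals cutoff past the hard limit."""

    def __init__(self, budget_usd: float) -> None:
        self.budget = budget_usd
        self.spent = 0.0
        self._fired: set[int] = set()

    def _reached(self, percent: int) -> bool:
        # Cross-multiplied comparison avoids dividing by the budget.
        return self.spent * 100 >= percent * self.budget

    def add(self, cost_usd: float) -> list[str]:
        """Record a cost and return any newly triggered events."""
        self.spent += cost_usd
        events = []
        for percent in ALERT_PERCENTS:
            if self._reached(percent) and percent not in self._fired:
                self._fired.add(percent)
                events.append(f"alert:{percent}%")
        if self._reached(CUTOFF_PERCENT):
            events.append("cutoff")
        return events
```

The gap between the 100% alert and the 110% cutoff gives an operator a short window to intervene before the automated stop takes effect.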