Cost cap
Configurable per-agent USD budget over a rolling window. The hard wall that prevents $10K LLM invoices.
The Inference Cost Cap is Tenet's most-visible protection. You set a dollar amount and a window length; when an agent's accumulated LLM spend in that window exceeds the cap, the next tenet.execute() returns 429 cost_limit_exceeded. No invoice surprise.
The cap is always on and orthogonal to trust levels: a Trusted service can't spend past it.
Configuration
Edit via the Inference Cost Cap card at /protections:
- Cap (USD): the threshold. Default $50.
- Window: the rolling-window length. Default 24 hours. Allowed values: 1h, 6h, 12h, 24h, 7 days.
Saved values take effect on the next decision; in-flight calls aren't affected.
How spend is counted
Tenet tracks token usage from LLM-shaped service calls (Anthropic, OpenAI) by parsing the response's usage fields:
- Anthropic:
usage.input_tokens+usage.output_tokens - OpenAI:
usage.prompt_tokens+usage.completion_tokens
Each model has a price-per-1K-token coefficient (configurable, with conservative defaults for unknown models). Cost is computed at response time and accumulated against your agent's rolling window.
Non-LLM service calls (GitHub, Stripe) don't accrue against the cost cap. Those are governed by other policies.
What happens when the cap fires
Your agent receives an HTTP 429 with body:
{
"error": "cost_limit_exceeded",
"currentCostUsd": 50.42,
"thresholdUsd": 50,
"windowMs": 86400000
}
The downstream call is NOT made. No side effects on Anthropic / OpenAI / etc. Tenet refuses the call before the credential gets injected.
The cap doesn't auto-extend. Once it fires, calls keep returning 429 until enough time passes that older spend drops out of the window. At that point the rolling window's accumulated cost is below the threshold again and the next call passes.
Per-agent vs per-tenant
The cap is per-agent. If your tenant has one agent it's effectively a tenant cap. If you split into multiple agents (e.g., one for production deploys, one for QA), each agent has its own budget.
There's also a platform-level aggregate ceiling per account — a safety net we operate; it's not customer-editable.
Seeing your spend
The dashboard home page's cost cap card shows your current window spend, and every cap-triggered block lands in the audit log like any other decision.
Tradeoffs
The cap is a hard wall, not a soft limit. There's no overage tier; overage pricing is not yet available. If your agent legitimately needs to spend more than the cap, raise it at /protections ahead of time.
One honest caveat on precision: the cap is a rolling-window circuit breaker, not a transactional ledger. A burst of parallel calls can overshoot the cap by the cost of whatever was already in flight when the window tripped, plus up to a few minutes of caching lag in the spend aggregate. In practice the overshoot is bounded by your agent's concurrency multiplied by its most expensive single call. If your tolerance is strictly zero, set the cap below your true ceiling by that margin.
What's next
- The approvals guide covers the soft-gate equivalent for irreversible tools.
- The tenet.execute reference lists the
cost_limit_exceedederror.