Retry & Fallback¶
AgentLang has first-class syntax for handling transient task failures. You can declare a retry budget and a fallback policy directly in the run statement — no try/except in Python, no wrapper logic.
Syntax¶
All clauses are optional. Defaults: retries 0, on_fail abort.
Retry budget¶
retries N means the task will be attempted up to N + 1 times total.
retries N |
Total attempts |
|---|---|
retries 0 (default) |
1 |
retries 1 |
2 |
retries 2 |
3 |
Failure policies¶
on_fail abort (default)¶
If the task exhausts its retry budget, a runtime error is raised and the pipeline stops:
on_fail use <expr>¶
If the task fails after all retries, the fallback expression is evaluated and bound to x instead:
let f = run flaky_fetch
with { key: topic, failures_before_success: fail_count }
by ops
retries 2
on_fail use { data: "fallback for " + topic };
Type must match
The fallback expression type must match the task's declared return type. This is enforced statically by the type checker.
Timeout and retries¶
When a task has both retries and timeout clauses, a timeout skips all remaining retries. The runtime raises HandlerTimeoutError and does not attempt the task again. This is because the timed-out handler thread is still running in the background — retrying would overlap invocations, risking duplicated or reordered side-effects.
The on_fail use fallback still applies after a timeout. If the task times out and a fallback expression is declared, the fallback value is used instead of raising an error.
Full example¶
From examples/reliability.agent:
pipeline resilient_brief(topic: String, fail_count: Number) -> String {
let f = run flaky_fetch
with { key: topic, failures_before_success: fail_count }
by ops
retries 2
on_fail use { data: "fallback for " + topic };
if f.data == "fallback for " + topic {
let d = run draft with { notes: "Fallback path: " + f.data } by ops;
return d.article;
} else {
let d = run draft with { notes: "Fresh path: " + f.data } by ops;
return d.article;
}
}
Run it — succeeds within retry budget:
python main.py examples/reliability.agent resilient_brief \
--input '{"topic":"api-status","fail_count":1}'
Run it — exhausts retries, uses fallback:
python main.py examples/reliability.agent resilient_brief \
--input '{"topic":"api-status","fail_count":5}'
Detecting fallback use¶
After on_fail use, the bound variable holds the fallback value. You can inspect it with == to detect which path was taken:
This is a common pattern — compute a sentinel fallback value and check for it downstream.