Skip to content

Retry & Fallback

AgentLang has first-class syntax for handling transient task failures. You can declare a retry budget and a fallback policy directly in the run statement — no try/except in Python, no wrapper logic.

Syntax

let x = run task_name
  with { key: expr }
  by agent_name
  retries N
  on_fail abort;
let x = run task_name
  with { key: expr }
  by agent_name
  retries N
  on_fail use <fallback_expr>;

All clauses are optional. Defaults: retries 0, on_fail abort.

Retry budget

retries N means the task will be attempted up to N + 1 times total.

retries N Total attempts
retries 0 (default) 1
retries 1 2
retries 2 3

Failure policies

on_fail abort (default)

If the task exhausts its retry budget, a runtime error is raised and the pipeline stops:

Execution error: Task 'flaky_fetch' failed after 3 attempts.

on_fail use <expr>

If the task fails after all retries, the fallback expression is evaluated and bound to x instead:

let f = run flaky_fetch
  with { key: topic, failures_before_success: fail_count }
  by ops
  retries 2
  on_fail use { data: "fallback for " + topic };

Type must match

The fallback expression type must match the task's declared return type. This is enforced statically by the type checker.

Timeout and retries

When a task has both retries and timeout clauses, a timeout skips all remaining retries. The runtime raises HandlerTimeoutError and does not attempt the task again. This is because the timed-out handler thread is still running in the background — retrying would overlap invocations, risking duplicated or reordered side-effects.

The on_fail use fallback still applies after a timeout. If the task times out and a fallback expression is declared, the fallback value is used instead of raising an error.

Full example

From examples/reliability.agent:

pipeline resilient_brief(topic: String, fail_count: Number) -> String {
  let f = run flaky_fetch
    with { key: topic, failures_before_success: fail_count }
    by ops
    retries 2
    on_fail use { data: "fallback for " + topic };

  if f.data == "fallback for " + topic {
    let d = run draft with { notes: "Fallback path: " + f.data } by ops;
    return d.article;
  } else {
    let d = run draft with { notes: "Fresh path: " + f.data } by ops;
    return d.article;
  }
}

Run it — succeeds within retry budget:

python main.py examples/reliability.agent resilient_brief \
  --input '{"topic":"api-status","fail_count":1}'
{
  "result": "[ops] Draft article:\nFresh path: [ops] fetched payload for api-status"
}

Run it — exhausts retries, uses fallback:

python main.py examples/reliability.agent resilient_brief \
  --input '{"topic":"api-status","fail_count":5}'
{
  "result": "[ops] Draft article:\nFallback path: fallback for api-status"
}

Detecting fallback use

After on_fail use, the bound variable holds the fallback value. You can inspect it with == to detect which path was taken:

if f.data == "fallback for " + topic {
  -- fallback path
} else {
  -- success path
}

This is a common pattern — compute a sentinel fallback value and check for it downstream.

Next: Error Handling