Retry Policy
techrevati.runtime.retry_policy
Retry Policy — Failure classification and recipe lookup.
Maps failure scenarios to structured recovery steps with bounded attempts and an escalation policy. The caller decides whether and how to retry; this module provides the recipe and the bookkeeping.
classify_exception() bridges Python exceptions to failure scenarios.
FailureScenario
Bases: str, Enum
Failure types that can be automatically recovered.
RecoveryStep
Bases: str, Enum
Actions that can be taken to recover from failure.
EscalationPolicy
Bases: str, Enum
What to do when max recovery attempts are exhausted.
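Because all three enums mix in str, their members compare equal to plain strings and serialize without a custom encoder. The sketch below illustrates that pattern using only member names that appear elsewhere on this page; the string values are assumptions, and the real enums likely define more members.

```python
import json
from enum import Enum


class FailureScenario(str, Enum):
    # Member names come from this page; the string values are hypothetical.
    LLM_ERROR = "llm_error"
    PROVIDER_FAILURE = "provider_failure"


class RecoveryStep(str, Enum):
    RETRY_WITH_BACKOFF = "retry_with_backoff"
    SWITCH_PROVIDER = "switch_provider"


class EscalationPolicy(str, Enum):
    ALERT_HUMAN = "alert_human"


# The str mixin lets json encode members as their raw string values,
# so recovery events can be logged without a custom encoder.
payload = json.dumps({"scenario": FailureScenario.PROVIDER_FAILURE})
```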
RecoveryRecipe (dataclass)
Recovery plan for a failure scenario.
step_retries is an optional per-step retry budget. When a step
fails (the recovery context's _fail_at_attempt hook returns
True), the executor retries that same step up to step_retries[step]
times before declaring it a failure and moving on to the next step
(which becomes remaining_steps in partial recovery). Missing keys
default to a budget of 1 (single attempt) — preserving 0.1.0 / 0.2.0
semantics.
Example::

    from techrevati.runtime.retry_policy import EscalationPolicy, FailureScenario, RecoveryRecipe, RecoveryStep

    RecoveryRecipe(
        scenario=FailureScenario.LLM_ERROR,
        steps=(RecoveryStep.RETRY_WITH_BACKOFF, RecoveryStep.SWITCH_PROVIDER),
        max_attempts=2,
        escalation_policy=EscalationPolicy.ALERT_HUMAN,
        step_retries={RecoveryStep.RETRY_WITH_BACKOFF: 3},
    )
This recipe fires the backoff step up to three times before failing over to the provider switch.
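The per-step budget can be pictured as a nested loop. This is a minimal sketch, not the module's executor: run_steps is a hypothetical function, step_fails stands in for the recovery context's _fail_at_attempt hook, and the RecoveryStep values are made up.

```python
from enum import Enum
from typing import Callable, Mapping, Sequence


class RecoveryStep(str, Enum):
    # Hypothetical values; only the member names come from this page.
    RETRY_WITH_BACKOFF = "retry_with_backoff"
    SWITCH_PROVIDER = "switch_provider"


def run_steps(
    steps: Sequence[RecoveryStep],
    step_retries: Mapping[RecoveryStep, int],
    step_fails: Callable[[RecoveryStep, int], bool],
) -> tuple[list[RecoveryStep], list[RecoveryStep]]:
    """Hypothetical executor loop illustrating the per-step retry budget.

    step_fails(step, attempt) plays the role of _fail_at_attempt: it
    returns True when that attempt of that step fails.
    Returns (succeeded, failed) lists of steps.
    """
    succeeded: list[RecoveryStep] = []
    failed: list[RecoveryStep] = []
    for step in steps:
        budget = step_retries.get(step, 1)  # missing key: single attempt
        for attempt in range(1, budget + 1):
            if not step_fails(step, attempt):
                succeeded.append(step)
                break
        else:
            # Budget exhausted: declare the step failed and move on.
            failed.append(step)
    return succeeded, failed
```

With a step_fails that rejects the first two backoff attempts, the backoff step succeeds on its third try within its budget of 3, and in this sketch the provider switch then runs as the next step.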
RecoveryResult (dataclass)
Outcome of a recovery attempt.
RecoveryEvent (dataclass)
Structured record of a recovery action.
attempt_recovery
Attempt recovery for a failure scenario.
Returns a RecoveryResult whose outcome is one of recovered, partial_recovery, or escalation_required.
aattempt_recovery (async)
Async variant of attempt_recovery.
Behavior matches the sync version step-for-step. The sleeper
parameter is reserved for future steps that need to await a delay
(e.g. backoff). Pass asyncio.sleep in production code; pass a
no-op or a fake in tests for determinism. Today no step in
RecoveryRecipe actually sleeps, so sleeper is unused in
practice — but the contract is established now so 0.1.0 callers
can rely on it.
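Injecting the sleeper is what makes an async retry path testable without real delays. The hypothetical retry_async helper below is not part of this module; it only illustrates the contract described above: asyncio.sleep by default, any awaitable callable in tests.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# A sleeper is any callable that takes seconds and returns an awaitable.
Sleeper = Callable[[float], Awaitable[None]]


async def retry_async(
    operation: Callable[[], Awaitable[str]],
    delays: Sequence[float],
    sleeper: Sleeper = asyncio.sleep,
) -> str:
    """Hypothetical helper: call operation, retrying on ConnectionError.

    Makes one immediate attempt plus one attempt after each entry in
    delays, awaiting sleeper(delay) before every retry.
    """
    schedule = [0.0, *delays]
    for index, delay in enumerate(schedule):
        if index > 0:
            await sleeper(delay)
        try:
            return await operation()
        except ConnectionError:
            if index == len(schedule) - 1:
                raise  # out of attempts: surface the last error
    raise AssertionError("unreachable")
```

In a test, pass an async function that appends each delay to a list; assertions can then check the delay sequence without waiting on the clock.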
backoff_delay
Calculate backoff delay in seconds with selectable jitter algorithm.
Algorithms follow Marc Brooker / AWS Architecture Blog (https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/):
- "none": pure exponential, base ** attempt (capped). Below, cap_exp denotes this capped exponential value.
- "full": uniform(0, cap_exp). Maximum spread, lowest contention.
- "equal": cap_exp/2 + uniform(0, cap_exp/2). Half deterministic.
- "decorrelated" (default): uniform(base, prev_delay * 3). The fastest to complete in the blog's comparison. Callers passing 0 for prev_delay get base.
Backwards compatibility: jitter=True (bool) maps to "full" and
jitter=False maps to "none". The base ** attempt + 25% noise
formula from 0.0.0 is gone — use "equal" for similar behavior.
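A minimal sketch of the four jitter modes plus the bool mapping, assuming a signature of backoff_delay(attempt, base, cap, jitter, prev_delay, rng). Those parameter names, the defaults, and the cap applied to the decorrelated branch (as in the AWS post) are assumptions rather than the module's actual interface.

```python
import random
from typing import Optional, Union


def backoff_delay(
    attempt: int,
    base: float = 2.0,
    cap: float = 60.0,
    jitter: Union[str, bool] = "decorrelated",
    prev_delay: float = 0.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Sketch of the jitter modes; names and defaults are hypothetical."""
    rng = rng if rng is not None else random.Random()
    # Backwards-compatible bool handling: True -> "full", False -> "none".
    if isinstance(jitter, bool):
        jitter = "full" if jitter else "none"

    try:
        cap_exp = min(cap, base ** attempt)  # capped pure exponential
    except OverflowError:
        cap_exp = cap  # huge attempt counts overflow a float; clamp to cap

    if jitter == "none":
        return cap_exp
    if jitter == "full":
        return rng.uniform(0, cap_exp)
    if jitter == "equal":
        return cap_exp / 2 + rng.uniform(0, cap_exp / 2)
    if jitter == "decorrelated":
        if prev_delay <= 0:
            return min(cap, base)  # no previous delay yet: start at base
        return min(cap, rng.uniform(base, prev_delay * 3))
    raise ValueError(f"unknown jitter algorithm: {jitter!r}")
```

For "decorrelated", each returned delay is fed back in as prev_delay on the next attempt; the first call, with prev_delay=0, returns base.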
next_provider
Select the next fallback provider, skipping the current one.
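One plausible reading of "skipping the current one" is a cyclic walk over an ordered fallback list. This sketch assumes providers are plain strings in preference order and that None means no alternative exists; the real signature is not documented here.

```python
from typing import Optional, Sequence


def next_provider(current: str, providers: Sequence[str]) -> Optional[str]:
    """Hypothetical sketch: walk the fallback order cyclically, starting
    after current, and return the first provider that is not current."""
    if not providers:
        return None
    try:
        start = providers.index(current) + 1
    except ValueError:
        start = 0  # current is not in the list: begin at the head
    for offset in range(len(providers)):
        candidate = providers[(start + offset) % len(providers)]
        if candidate != current:
            return candidate
    return None  # every entry is the current provider
```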
smaller_context_budget
Calculate a reduced context budget (75% of current by default).
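The arithmetic is a single multiplication. This sketch, with a hypothetical signature, assumes the result is truncated to an integer and shows how repeated calls shrink the budget geometrically.

```python
def smaller_context_budget(current: int, factor: float = 0.75) -> int:
    """Hypothetical sketch: scale the budget by factor, truncating to an int."""
    if not 0 < factor < 1:
        raise ValueError("factor must be strictly between 0 and 1")
    return int(current * factor)


# Each retry shrinks the budget geometrically: 8000 -> 6000 -> 4500 -> 3375.
budgets = [8000]
for _ in range(3):
    budgets.append(smaller_context_budget(budgets[-1]))
```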
classify_exception
Map a Python exception to a FailureScenario for recovery.
Two-pass dispatch:
- Type-based: isinstance checks against well-known stdlib classes (TimeoutError, the ConnectionError family, JSONDecodeError). Walks the exception chain via __cause__ / __context__, so a RuntimeError wrapping a ConnectionError is still classified as PROVIDER_FAILURE.
- String match: provider SDKs that don't expose stdlib types fall through to substring matching on the rendered message.
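The two passes can be sketched as follows. Only PROVIDER_FAILURE comes from this page; TIMEOUT, MALFORMED_OUTPUT, the substring table, matching only the outermost message in pass 2, and returning None for unclassified errors are hypothetical choices made for illustration.

```python
import json
from enum import Enum
from typing import Iterator, Optional


class FailureScenario(str, Enum):
    PROVIDER_FAILURE = "provider_failure"  # named on this page
    TIMEOUT = "timeout"                    # hypothetical stand-in
    MALFORMED_OUTPUT = "malformed_output"  # hypothetical stand-in


# Pass 1 table: the first isinstance match wins.
_TYPE_MAP = (
    (TimeoutError, FailureScenario.TIMEOUT),
    (ConnectionError, FailureScenario.PROVIDER_FAILURE),
    (json.JSONDecodeError, FailureScenario.MALFORMED_OUTPUT),
)

# Pass 2 table: hypothetical substrings for SDK errors without stdlib types.
_MESSAGE_MAP = (
    ("connection reset", FailureScenario.PROVIDER_FAILURE),
    ("timed out", FailureScenario.TIMEOUT),
)


def _chain(exc: BaseException) -> Iterator[BaseException]:
    """Yield exc, then its __cause__ / __context__ ancestors, stopping on cycles."""
    seen: set[int] = set()
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        yield current
        current = current.__cause__ or current.__context__


def classify_exception(exc: BaseException) -> Optional[FailureScenario]:
    # Pass 1: type-based, over the whole exception chain.
    for link in _chain(exc):
        for exc_type, scenario in _TYPE_MAP:
            if isinstance(link, exc_type):
                return scenario
    # Pass 2: substring match on the rendered (outermost) message.
    message = str(exc).lower()
    for needle, scenario in _MESSAGE_MAP:
        if needle in message:
            return scenario
    return None
```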