Guardrails¶
techrevati.runtime.guardrails ¶
Guardrails — content-level checks around tool execution.
A Guardrail is a small object that inspects either the call site
(role + tool name, before invocation) or the result (after invocation)
and reports an outcome. The orchestrator runs all registered guardrails
automatically around run_tool / arun_tool. If any guardrail blocks, it
raises a single GuardrailViolatedError that carries every violation
collected at that stage.
This is content gating, which is orthogonal to PermissionEnforcer.
PermissionEnforcer answers "is this role allowed to use this tool at
all?". Permissions are role × tool; guardrails are value × context.
Inspired by the OpenAI Agents SDK guardrail model. Output checks are
mandatory; input/pre-call checks are optional and default to
GuardrailOutcome(allowed=True) if a guardrail does not implement
them, matching the structural Protocol pattern.
GuardrailOutcome dataclass ¶
Result of a guardrail check.
allowed=False blocks the operation. Provide reason so the
raised GuardrailViolatedError carries actionable context.
Guardrail ¶
Bases: Protocol
Structural protocol for tool-level guardrails.
Implementations should be small, deterministic, and side-effect-free. Heavy checks (e.g. calling out to a moderation model) belong behind a separate service the guardrail consults.
name lets the orchestrator label events and errors; it defaults to
the class name if you don't override it.
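As a rough illustration of the structural pattern described above, the sketch below defines a stand-in protocol and one small guardrail. The method names `check_pre` and `check_post`, their signatures, and the helper `pre_outcome` are assumptions made for this example; this page names only `allowed`, `reason`, and `name`.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass(frozen=True)
class GuardrailOutcome:
    """Stand-in for the library's outcome type (field layout assumed)."""
    allowed: bool
    reason: str = ""


@runtime_checkable
class Guardrail(Protocol):
    """Hypothetical shape: only the post-call check is required."""
    name: str

    def check_post(self, role: str, tool_name: str, result: Any) -> GuardrailOutcome: ...


class MaxLengthGuardrail:
    """Small, deterministic, side-effect-free: blocks oversized results."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.name = type(self).__name__  # use the class name as the label

    def check_post(self, role: str, tool_name: str, result: Any) -> GuardrailOutcome:
        size = len(str(result))
        if size > self.limit:
            return GuardrailOutcome(False, f"{tool_name} returned {size} chars (limit {self.limit})")
        return GuardrailOutcome(True)


def pre_outcome(guardrail: Any, role: str, tool_name: str) -> GuardrailOutcome:
    """Optional pre-call check: a missing method counts as 'allowed'."""
    check = getattr(guardrail, "check_pre", None)
    return check(role, tool_name) if callable(check) else GuardrailOutcome(True)
```

Because the protocol is structural, `MaxLengthGuardrail` does not inherit from anything; it only needs to expose the expected attributes.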
AsyncGuardrail ¶
Bases: Protocol
Async sibling of Guardrail, for checks that need I/O.
When a heavy guardrail must call out to a moderation model, a vector
store, or another service over the network, a sync Guardrail
would block the event loop. AsyncGuardrail lets the check be
awaited instead.
AsyncOrchestrationSession.arun_tool accepts a mixed list of
sync and async guardrails: it detects AsyncGuardrail instances
via isinstance and awaits them; sync Guardrail instances
run synchronously in place. Sync sessions skip AsyncGuardrail
instances and log a one-shot warning, since there is no event loop
to await on.
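The mixed dispatch described above could look roughly like the following. This is a sketch under stated assumptions: the method names `check_post` and `acheck_post`, the helper `run_mixed_post`, and both example guardrails are hypothetical, and the real `arun_tool` internals are not shown on this page.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass(frozen=True)
class GuardrailOutcome:
    allowed: bool
    reason: str = ""


@runtime_checkable
class AsyncGuardrail(Protocol):
    """Assumed shape: async guardrails expose an awaitable check."""

    async def acheck_post(self, role: str, tool_name: str, result: Any) -> GuardrailOutcome: ...


class BlockSecrets:
    """Sync guardrail: cheap in-process check."""
    name = "BlockSecrets"

    def check_post(self, role, tool_name, result) -> GuardrailOutcome:
        if "API_KEY" in str(result):
            return GuardrailOutcome(False, "result contains a secret marker")
        return GuardrailOutcome(True)


class RemoteModeration:
    """Async guardrail: stands in for a network call to a moderation service."""
    name = "RemoteModeration"

    async def acheck_post(self, role, tool_name, result) -> GuardrailOutcome:
        await asyncio.sleep(0)  # placeholder for real I/O
        if "forbidden" in str(result):
            return GuardrailOutcome(False, "flagged by moderation")
        return GuardrailOutcome(True)


async def run_mixed_post(guardrails, role, tool_name, result):
    """Await async guardrails, call sync ones inline, return all blockers."""
    blocked = []
    for g in guardrails:
        if isinstance(g, AsyncGuardrail):  # structural check on acheck_post
            outcome = await g.acheck_post(role, tool_name, result)
        else:
            outcome = g.check_post(role, tool_name, result)
        if not outcome.allowed:
            blocked.append((g.name, outcome))
    return blocked
```

Note that a `runtime_checkable` protocol only verifies that the method exists, not that it is a coroutine function, which is one reason the sketch gives the async check a distinct name.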
GuardrailViolation dataclass ¶
One violation entry in a GuardrailViolatedError.
A single tool invocation can violate multiple guardrails at the same stage; the orchestrator collects them all before raising so that audit logs (EU AI Act Article 12 record-keeping) see the full picture instead of just the first hit.
GuardrailViolatedError ¶
Bases: Exception
Raised when one or more guardrails block a tool invocation or its result.
Carries a tuple of violations (every guardrail that blocked at
the same stage). The single-violation attributes outcome,
guardrail, and stage mirror the first violation so existing
handlers that read them keep working unchanged.
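To make the backward-compatible attribute layout concrete, here is a minimal sketch of an exception that stores all violations and mirrors the first one. The constructor signature and the fields of `GuardrailViolation` are assumptions for illustration; only the attribute names `violations`, `outcome`, `guardrail`, and `stage` come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardrailOutcome:
    allowed: bool
    reason: str = ""


@dataclass(frozen=True)
class GuardrailViolation:
    guardrail: str             # name of the guardrail that blocked (assumed type)
    stage: str                 # "pre" or "post"
    outcome: GuardrailOutcome


class GuardrailViolatedError(Exception):
    def __init__(self, violations):
        if not violations:
            raise ValueError("at least one violation is required")
        self.violations = tuple(violations)
        first = self.violations[0]
        # Single-violation attributes mirror the first entry so that
        # handlers written against the older shape keep working.
        self.guardrail = first.guardrail
        self.stage = first.stage
        self.outcome = first.outcome
        names = ", ".join(v.guardrail for v in self.violations)
        super().__init__(f"{first.stage}-call guardrails blocked: {names}")


def summarize(err: GuardrailViolatedError) -> list:
    """Audit-style summary that walks every violation, not just the first."""
    return [f"{v.stage}:{v.guardrail}:{v.outcome.reason}" for v in err.violations]
```

A handler that only reads `err.outcome.reason` sees the first violation, while an audit logger can iterate `err.violations` for the full set.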
AllowAllGuardrail dataclass ¶
Reference no-op guardrail. Useful as a baseline in tests.
PatternGuardrail ¶
Regex deny-list guardrail. Sub-200ms per check for ~100 patterns.
Composes one compiled regex from the deny-list (alternation) so a
check is one regex search, not N searches. stages selects which
side of the tool call to gate; pass ("pre", "post") for both.
Used standalone for caller-defined deny-lists (e.g., "block any
tool name matching rm.*") and as the substrate for
PromptInjectionGuardrail below.
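The single-regex composition can be sketched as follows. `DenyListGuardrail`, its constructor parameters, and the `check_pre` / `check_post` method names are hypothetical stand-ins, not the library's actual `PatternGuardrail` API; only the `stages` idea and the alternation technique are taken from the description above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardrailOutcome:
    allowed: bool
    reason: str = ""


class DenyListGuardrail:
    """Illustrative regex deny-list guardrail built on one compiled pattern."""

    def __init__(self, patterns, stages=("post",), flags=re.IGNORECASE):
        patterns = list(patterns)
        if not patterns:
            # An empty alternation would match every string.
            raise ValueError("deny-list must contain at least one pattern")
        self.name = type(self).__name__
        self.stages = tuple(stages)
        # Wrap each pattern in a non-capturing group so that a "|" inside
        # one pattern cannot bleed into its neighbours.
        combined = "|".join(f"(?:{p})" for p in patterns)
        self._regex = re.compile(combined, flags)

    def _check(self, stage, text):
        if stage not in self.stages:
            return GuardrailOutcome(True)  # this stage is not gated
        match = self._regex.search(text)   # one search, regardless of list size
        if match:
            return GuardrailOutcome(False, f"deny-list match at {stage}: {match.group(0)!r}")
        return GuardrailOutcome(True)

    def check_pre(self, role, tool_name):
        return self._check("pre", tool_name)

    def check_post(self, role, tool_name, result):
        return self._check("post", str(result))
```

Because the sketch uses `search`, a pattern such as `rm.*` matches anywhere in the text; anchor it with `^` if it should match only at the start of a tool name.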
PromptInjectionGuardrail ¶
Bases: PatternGuardrail
First-line heuristic prompt-injection detector. Zero deps.
Specialization of PatternGuardrail with a built-in list of
canonical prompt-injection signatures. Documented as a first line
of defense, not a replacement for a specialized moderation model:
sophisticated attackers will defeat this. Pair with a model-backed
moderation guardrail behind the same orchestrator for layered
defense.
Default stages=("post",) catches injections in tool outputs
(the most common indirect-injection vector — malicious content
retrieved from RAG, scraped pages, etc.). Add "pre" to also
scrutinize tool names.
Mirrors EU AI Act Article 15 cybersecurity expectations for "resilience against attempts by unauthorised third parties to alter [an AI system's] use, outputs or performance".
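A heuristic scan of tool output might look like the sketch below. The three signatures are illustrative examples chosen for this page, not the library's built-in list, and `scan_tool_output` is a hypothetical helper name.

```python
import re
from typing import Optional

# Illustrative signatures only; the real built-in list is not reproduced here.
INJECTION_SIGNATURES = (
    r"ignore\s+(?:all\s+)?(?:previous|prior)\s+instructions",
    r"disregard\s+(?:the\s+)?system\s+prompt",
    r"you\s+are\s+now\s+in\s+developer\s+mode",
)
_INJECTION_RE = re.compile(
    "|".join(f"(?:{s})" for s in INJECTION_SIGNATURES),
    re.IGNORECASE,
)


def scan_tool_output(text: str) -> Optional[str]:
    """Return the matched snippet if the text looks like an injection, else None."""
    match = _INJECTION_RE.search(text)
    return match.group(0) if match else None
```

A trivially obfuscated payload such as "1gnore previous instructi0ns" passes this scan, which illustrates why the heuristic is only a first line of defense.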
run_pre_checks ¶
Run every pre-call guardrail; raise once with all violations collected.
AsyncGuardrail instances are skipped with a one-shot logger warning
(sync path has no event loop to await on).
run_post_checks ¶
Run every post-call guardrail; raise once with all violations collected.
AsyncGuardrail instances are skipped with a one-shot logger warning.
arun_pre_checks async ¶
Run every pre-call guardrail; await async ones, call sync ones inline.
arun_post_checks async ¶
Run every post-call guardrail; await async ones, call sync ones inline.