Skip to content

Guardrails

techrevati.runtime.guardrails

Guardrails — content-level checks around tool execution.

A Guardrail is a small object that inspects either the call site (role + tool name, before invocation) or the result (after invocation) and reports an outcome. The orchestrator runs all registered guardrails automatically around run_tool / arun_tool and raises GuardrailViolatedError on the first violation.

This is content gating — orthogonal to PermissionEnforcer which answers "is this role allowed to use this tool at all?". Permissions are role × tool; guardrails are value × context.

Inspired by the OpenAI Agents SDK guardrail model. Output checks are mandatory; input/pre-call checks are optional and default to GuardrailOutcome(allowed=True) if a guardrail does not implement them, matching the structural Protocol pattern.

GuardrailOutcome dataclass

GuardrailOutcome(allowed, reason=None)

Result of a guardrail check.

allowed=False blocks the operation. Provide reason so the raised GuardrailViolatedError carries actionable context.

Guardrail

Bases: Protocol

Structural protocol for tool-level guardrails.

Implementations should be small, deterministic, and side-effect-free. Heavy checks (e.g. calling out to a moderation model) belong behind a separate service the guardrail consults.

name lets the orchestrator label events and errors; default to the class name if you don't override it.

AsyncGuardrail

Bases: Protocol

Async sibling of Guardrail — for checks that need I/O.

When a heavy guardrail must call out to a moderation model, a vector store, or another service over the network, sync Guardrail would block the event loop. AsyncGuardrail lets the check be awaited.

AsyncOrchestrationSession.arun_tool accepts a mixed list of sync and async guardrails: it detects AsyncGuardrail instances via isinstance and awaits them; sync Guardrail instances run synchronously in place. Sync sessions silently skip AsyncGuardrail instances (with a one-shot logger warning) since there's no event loop to await on.

GuardrailViolation dataclass

GuardrailViolation(outcome, guardrail, stage)

One violation entry in a GuardrailViolatedError.

A single tool invocation can violate multiple guardrails at the same stage; the orchestrator collects them all before raising so that audit logs (EU AI Act Article 12 record-keeping) see the full picture instead of just the first hit.

GuardrailViolatedError

GuardrailViolatedError(violations, *, guardrail=None, role, tool, stage=None)

Bases: Exception

Raised when one or more guardrails block tool invocation or its result.

Carries a tuple of violations (every guardrail that blocked at the same stage). The single-violation attributes outcome, guardrail, and stage mirror the first violation so existing handlers that read them keep working unchanged.

AllowAllGuardrail dataclass

AllowAllGuardrail(name='allow_all')

Reference no-op guardrail. Useful as a baseline in tests.

PatternGuardrail

PatternGuardrail(deny_patterns, *, stages=('pre', 'post'), name='pattern', flags=re.IGNORECASE)

Regex deny-list guardrail. Sub-200ms per check for ~100 patterns.

Composes one compiled regex from the deny-list (alternation) so a check is one regex search, not N searches. stages selects which side of the tool call to gate; pass ("pre", "post") for both.

Used standalone for caller-defined deny-lists (e.g., "block any tool name matching rm.*") and as the substrate for PromptInjectionGuardrail below.

PromptInjectionGuardrail

PromptInjectionGuardrail(*, stages=('post',), extra_patterns=(), name='prompt_injection')

Bases: PatternGuardrail

First-line heuristic prompt-injection detector. Zero deps.

Specialization of PatternGuardrail with a built-in list of canonical prompt-injection signatures. Documented as a first line of defense, not a replacement for a specialized moderation model: sophisticated attackers will defeat this. Pair with a model-backed moderation guardrail behind the same orchestrator for layered defense.

Default stages=("post",) catches injections in tool outputs (the most common indirect-injection vector — malicious content retrieved from RAG, scraped pages, etc.). Add "pre" to also scrutinize tool names.

Mirrors EU AI Act Article 15 cybersecurity expectations for "resilience against attempts by unauthorised third parties to alter [an AI system's] use, outputs or performance".

run_pre_checks

run_pre_checks(guardrails, *, role, tool)

Run every pre-call guardrail; raise once with all violations collected.

AsyncGuardrail instances are skipped with a one-shot logger warning (sync path has no event loop to await on).

run_post_checks

run_post_checks(guardrails, value, *, role, tool)

Run every post-call guardrail; raise once with all violations collected.

AsyncGuardrail instances are skipped with a one-shot logger warning.

arun_pre_checks async

arun_pre_checks(guardrails, *, role, tool)

Run every pre-call guardrail; await async ones, call sync ones inline.

arun_post_checks async

arun_post_checks(guardrails, value, *, role, tool)

Run every post-call guardrail; await async ones, call sync ones inline.