Rate limit
techrevati.runtime.rate_limit
Rate limiting: token-bucket primitives for sync and async call paths.
Token-aware throttling fits LLM-provider rate limits because providers
meter usage in tokens and requests (TPM, RPM, daily caps). A
single TokenBucket admits or delays one resource (e.g. input
tokens-per-minute); a RateLimiter composes named buckets so that
typical provider limits (input TPM, output TPM, request RPM) can be
expressed as one object.
TokenBucket (sync, threading.Lock) and AsyncTokenBucket
(asyncio.Lock + asyncio.sleep) implement the same conceptual
algorithm. The async variant yields to the event loop while waiting
for a refill instead of blocking the thread.
Clock is injectable on both variants (Callable[[], float] returning
monotonic seconds). Tests pass a ManualClock to make timing-dependent
behavior deterministic; production code uses time.monotonic by
default.
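As an illustration of the injectable clock, here is a minimal sketch of a manual clock that satisfies the Callable[[], float] shape. The class body and the advance method are assumptions made for this example; the library's own ManualClock may differ.

```python
from dataclasses import dataclass


@dataclass
class ManualClock:
    """Hypothetical test clock: time moves only when advance() is called."""

    now: float = 0.0

    def __call__(self) -> float:
        # Calling the instance returns the current fake time in seconds,
        # which matches the Callable[[], float] contract.
        return self.now

    def advance(self, seconds: float) -> None:
        # A monotonic clock must never go backwards.
        if seconds < 0:
            raise ValueError("cannot advance a monotonic clock by a negative amount")
        self.now += seconds


clock = ManualClock()
clock.advance(2.5)  # a test "waits" 2.5 seconds without sleeping
```

Passing such an object as clock lets a test step time forward explicitly, so refill behavior can be asserted without real delays.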
Zero new runtime dependencies — stdlib only.
TokenBucket (dataclass)
Classic token-bucket limiter — sync variant.
try_acquire is non-blocking and returns True only when the
bucket has enough tokens. acquire sleeps until the bucket
refills, capped by an optional wait timeout; on timeout it raises
RateLimitExceededError rather than silently exceeding the
bound.
Parameters
name:
Human-readable identifier used in error messages.
capacity:
Maximum number of tokens the bucket holds. A burst whose total cost is
at most this many tokens passes immediately.
refill_per_second:
Steady-state admission rate, in tokens per second.
clock:
Monotonic time source. Defaults to time.monotonic.
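The behavior described above can be sketched as follows. This is not the library's implementation: the class names, the cost and timeout parameter names, the injectable sleep hook, and the fail-fast timeout policy are all assumptions chosen for illustration.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


class SketchRateLimitExceeded(Exception):
    """Stand-in for RateLimitExceededError; not the library's class."""


@dataclass
class SketchTokenBucket:
    name: str
    capacity: float
    refill_per_second: float
    clock: Callable[[], float] = time.monotonic
    sleep: Callable[[float], None] = time.sleep  # assumption: injectable for tests
    _tokens: float = field(init=False, repr=False)
    _last: float = field(init=False, repr=False)
    _lock: threading.Lock = field(init=False, repr=False, default_factory=threading.Lock)

    def __post_init__(self) -> None:
        self._tokens = self.capacity  # start full so an initial burst passes
        self._last = self.clock()

    def _refill(self) -> None:
        # Add tokens for the elapsed time, never exceeding capacity.
        now = self.clock()
        gained = (now - self._last) * self.refill_per_second
        self._tokens = min(self.capacity, self._tokens + gained)
        self._last = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        # Non-blocking: spend tokens only if enough are available right now.
        with self._lock:
            self._refill()
            if self._tokens >= cost:
                self._tokens -= cost
                return True
            return False

    def acquire(self, cost: float = 1.0, timeout: Optional[float] = None) -> None:
        if cost > self.capacity:
            raise ValueError(f"{self.name}: cost {cost} exceeds capacity {self.capacity}")
        deadline = None if timeout is None else self.clock() + timeout
        while not self.try_acquire(cost):
            with self._lock:
                self._refill()
                wait = max(0.0, (cost - self._tokens) / self.refill_per_second)
            # Fail fast if the refill cannot arrive before the deadline.
            if deadline is not None and self.clock() + wait > deadline:
                raise SketchRateLimitExceeded(
                    f"{self.name}: cost={cost} not available within {timeout}s"
                )
            self.sleep(wait)
```

With capacity=2 and refill_per_second=1.0, two calls pass immediately and a third waits about one second for a refill. Rejecting a cost larger than capacity up front avoids a wait that could never succeed.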
AsyncTokenBucket (dataclass)
Async sibling of TokenBucket.
Uses asyncio.Lock so refill bookkeeping is coroutine-safe, and
asyncio.sleep so waiting yields control to the event loop
instead of blocking the thread. State is not shared with the sync
variant, so choose one variant per downstream.
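A minimal async sketch of the same idea, assuming a cost parameter and omitting the timeout for brevity. The class name is hypothetical and this is not the library's code. The point to notice is that the lock is held only for bookkeeping, and the sleep happens outside it.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchAsyncTokenBucket:
    capacity: float
    refill_per_second: float
    clock: Callable[[], float] = time.monotonic
    _tokens: float = field(init=False, repr=False)
    _last: float = field(init=False, repr=False)
    _lock: asyncio.Lock = field(init=False, repr=False, default_factory=asyncio.Lock)

    def __post_init__(self) -> None:
        self._tokens = self.capacity
        self._last = self.clock()

    def _refill(self) -> None:
        # Same refill arithmetic as a sync bucket.
        now = self.clock()
        gained = (now - self._last) * self.refill_per_second
        self._tokens = min(self.capacity, self._tokens + gained)
        self._last = now

    async def acquire(self, cost: float = 1.0) -> None:
        if cost > self.capacity:
            raise ValueError(f"cost {cost} exceeds capacity {self.capacity}")
        while True:
            async with self._lock:
                self._refill()
                if self._tokens >= cost:
                    self._tokens -= cost
                    return
                wait = (cost - self._tokens) / self.refill_per_second
            # Sleep outside the lock so other coroutines keep running
            # while this one waits for the refill.
            await asyncio.sleep(wait)


async def demo() -> float:
    # Construct the bucket inside the running loop; on Python 3.9 and older,
    # asyncio.Lock binds to an event loop when it is created.
    bucket = SketchAsyncTokenBucket(capacity=1, refill_per_second=200.0)
    start = time.monotonic()
    await asyncio.gather(bucket.acquire(), bucket.acquire(), bucket.acquire())
    return time.monotonic() - start
```

Running asyncio.run(demo()) admits the first call at once and spaces the remaining two roughly 5 ms apart, without blocking the event loop in between.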
RateLimiter (dataclass)
Composite of named token buckets, one per dimension.
Typical LLM-provider shape: rpm for requests-per-minute,
input_tpm for input tokens-per-minute, output_tpm for
output tokens-per-minute. Each bucket is independent; an empty
buckets mapping is a valid no-op limiter.
acquire_pre_call spends RPM up front. After the call returns
and the UsageSnapshot is known, acquire_usage spends the input
and output token budgets. This split mirrors how providers actually
enforce limits.
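The two-phase flow can be sketched like this. Everything here is a hypothetical stand-in: UsageSketch and its field names replace UsageSnapshot, RecordingBucket is a test double that records spending instead of throttling, and the method bodies are one plausible reading of the description rather than the library's code.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


class Bucket(Protocol):
    def acquire(self, cost: float = 1.0) -> None: ...


@dataclass
class UsageSketch:
    """Hypothetical stand-in for UsageSnapshot; field names are assumed."""

    input_tokens: int
    output_tokens: int


@dataclass
class RecordingBucket:
    """Test double: records what was spent instead of throttling."""

    spent: float = 0.0

    def acquire(self, cost: float = 1.0) -> None:
        self.spent += cost


@dataclass
class SketchRateLimiter:
    buckets: Dict[str, Bucket] = field(default_factory=dict)

    def _spend(self, key: str, cost: float) -> None:
        # A missing dimension is simply not limited, so an empty
        # mapping makes the whole limiter a no-op.
        bucket = self.buckets.get(key)
        if bucket is not None and cost > 0:
            bucket.acquire(cost)

    def acquire_pre_call(self) -> None:
        # One request is about to be sent: charge the request budget.
        self._spend("rpm", 1)

    def acquire_usage(self, usage: UsageSketch) -> None:
        # Token counts are known only after the response arrives.
        self._spend("input_tpm", usage.input_tokens)
        self._spend("output_tpm", usage.output_tokens)


limiter = SketchRateLimiter(
    {"rpm": RecordingBucket(), "input_tpm": RecordingBucket(), "output_tpm": RecordingBucket()}
)
limiter.acquire_pre_call()                     # before the provider call
limiter.acquire_usage(UsageSketch(120, 30))    # after usage is reported
```

Charging tokens after the call means a large response can push a bucket's next caller into a wait, which is the intended back-pressure in this sketch.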
AsyncRateLimiter (dataclass)
Async sibling of RateLimiter — same semantics, async buckets.
RateLimitExceededError
Bases: Exception
Raised when an acquire call exceeds the bucket's wait budget.
Carries the bucket name and the cost the caller tried to spend so
the error message tells the caller which dimension blocked (input
TPM vs RPM) and how big the request was. classify_exception
maps this onto FailureScenario.LLM_ERROR (the rate-limit
bucket) so existing recovery recipes pick it up unchanged.
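To show what carrying the bucket name and cost might look like, here is a hedged sketch of such an exception and a caller that inspects it. The class name, the attribute names bucket_name and cost, and the message format are assumptions; the real RateLimitExceededError may expose them differently.

```python
class SketchRateLimitExceeded(Exception):
    """Hypothetical stand-in for RateLimitExceededError."""

    def __init__(self, bucket_name: str, cost: float) -> None:
        # Keep the structured fields so callers need not parse the message.
        self.bucket_name = bucket_name
        self.cost = cost
        super().__init__(
            f"rate limit bucket {bucket_name!r} could not admit cost={cost} within the wait budget"
        )


def describe(exc: SketchRateLimitExceeded) -> str:
    # A caller can branch on which dimension blocked the request.
    if exc.bucket_name == "rpm":
        return "too many requests; retry later"
    return f"token budget exhausted on {exc.bucket_name} (cost {exc.cost})"


try:
    raise SketchRateLimitExceeded("input_tpm", 4096)
except SketchRateLimitExceeded as exc:
    summary = describe(exc)
```

Exposing the fields as attributes, in addition to the message, lets recovery logic distinguish a request-rate block from a token-budget block without string matching.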