Skip to content

Checkpoint

techrevati.runtime.checkpoint

Checkpoint — Durable execution primitives for restart-resumable agent sessions.

A CheckpointSaver persists per-turn snapshots of a session so that a crashed or paused agent loop can resume from the last committed turn instead of re-running every step from the beginning. Two reference implementations ship in this module:

  • InMemorySaver — fast, process-local, lost on exit. Default when a thread_id is supplied without an explicit saver.
  • SqliteSaver — durable across process restarts, uses stdlib sqlite3 only (no new runtime dependency), WAL mode for concurrent readers + one writer.

The CheckpointSaver protocol mirrors the LangGraph get / put / list / delete shape so downstream code that already knows that contract reads naturally here.

Caveat: this is restart-resumable execution, not Temporal-style durable execution. Step-level replay (re-running a half-finished turn against a recorded history) is NOT in scope; checkpoints only fire between turns. For workflow-engine semantics, pair this with Temporal / Restate / DBOS behind a wrapping CheckpointSaver implementation. See docs/patterns/durability.md for the trade-off.

Checkpoint dataclass

Checkpoint(id, thread_id, created_at, state, parent_id=None, metadata=dict())

A point-in-time snapshot of a session's serializable state.

state and metadata must be JSON-serializable end to end. The saver is allowed to round-trip them through json.dumps and expects to get a structurally equal mapping back; non-serializable values (callables, sockets, dataclass instances that don't define to_dict) must be coerced by the caller before put.

CheckpointSaver

Bases: Protocol

Persistence contract for session checkpoints.

Implementations should be safe for concurrent reads; writes may be serialized internally (the in-memory and sqlite reference impls both serialize writes under a lock).

get

get(thread_id, checkpoint_id=None)

Return the requested checkpoint, or the latest if id is None.

put

put(thread_id, state, *, parent_id=None, metadata=None)

Persist a new checkpoint. Returns the materialized record.

list

list(thread_id, *, before=None, limit=10)

Return checkpoints for the thread, newest first.

If before is given (a checkpoint id), only checkpoints created strictly earlier are returned. limit is treated as a hard cap.

delete

delete(thread_id)

Remove every checkpoint for the given thread.

InMemorySaver

InMemorySaver()

Process-local checkpoint store. Lost on exit; thread-safe.

Useful for tests, dev loops, and any session that does not need to survive a process restart.

SqliteSaver

SqliteSaver(path)

Stdlib-sqlite3 checkpoint store. Survives process restart.

Uses WAL mode so concurrent readers don't block the writer. Writes are serialized inside a single connection per saver instance (the standard sqlite3 recommendation: one connection per writer). All state and metadata payloads are JSON-encoded.

Pass ":memory:" as the path for a fully in-memory database; that variant is reset on garbage collection (use InMemorySaver for tests that don't need the sqlite execution path under coverage).