flowchart TD
S["Shared Store (dict)"] -.->|"prep() reads"| N1["Node A"]
N1 -->|"action: default"| N2["Node B"]
N1 -->|"action: retry"| N1
N2 -->|"action: done"| N3["Node C"]
N3 -.->|"post() writes"| S
style S fill:#dbeafe,stroke:#1e40af,color:#1e40af
style N1 fill:#dcfce7,stroke:#166534,color:#166534
style N2 fill:#fef3c7,stroke:#92400e,color:#92400e
style N3 fill:#f3e8ff,stroke:#6b21a8,color:#6b21a8
7 Building Agents from Scratch — Then Shipping Them
PocketFlow’s 100-Line Framework and Burr + Pydantic’s Production Guardrails
Chapters 5 and 6 showed how to embed and build Claude’s loop — but it’s still Claude’s loop. What if you want to build agents with any LLM, any tools, any orchestration logic? PocketFlow proves this takes ~100 lines of code. Its minimalist Graph + Shared Store model strips agents to their essence: nodes that prep, execute, and post-process, connected by action edges. This chapter builds three patterns (workflow, agent, RAG) in PocketFlow, showing that the hard part was never the framework — it was the design. Then it graduates to a production-grade version of the same model: Burr + Pydantic replaces the mutable shared dict with typed state, immutable snapshots, and fork-from-state replay — the guardrails that turn a learning exercise into a deployable system. Burr is one well-chosen option, not the only one; the chapter ends with an honest comparison to LangGraph, the larger-ecosystem alternative for the same job.
7.1 Why Build from Scratch?
Chapters 2–6 explained how agents work — conceptually (Chapters 2–4), via the SDK (Chapter 5), and by building a clone from scratch (Chapter 6). But all of those are Claude-specific. What about:
- Using a different LLM (Llama, Gemini, Mistral)?
- Building a custom orchestration pattern that doesn’t fit the ReAct mold?
- Understanding what’s really happening underneath the abstractions?
PocketFlow answers all three. It’s a ~100-line Python framework with zero dependencies that implements the same patterns as LangChain/LangGraph — agents, workflows, RAG, multi-agent systems — but with the machinery fully visible. No magic, no hidden state, no vendor lock-in.
PocketFlow has 10,000+ GitHub stars and an active community. It’s not a toy — it’s a deliberately minimal framework that proves sophisticated agent patterns don’t require sophisticated infrastructure.
7.2 The Core Model: Graph + Shared Store
PocketFlow has exactly two concepts:
- Node: a unit of work with three steps, prep() → exec() → post()
- Flow: a directed graph of nodes connected by labeled edges (actions)
Nodes communicate through a shared store — a dictionary that all nodes can read and write. That’s the entire framework.
7.2.1 The Node lifecycle
Every node follows the same three-step pattern:
| Step | Purpose | Has access to |
|---|---|---|
| prep(shared) | Read data from shared store | Shared store (read) |
| exec(prep_res) | Do the work (LLM call, API call, computation) | Only prep’s output |
| post(shared, prep_res, exec_res) | Write results back, decide next action | Shared store (write) |
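The critical design constraint: exec() cannot access the shared store. This separation keeps exec() free of shared-state side effects, which makes it safe to retry and easy to test in isolation. (An LLM call is not deterministic, so exec() is not pure in the strict sense; the point is that it touches nothing but its own input.)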
from pocketflow import Node, Flow

class Summarize(Node):
    def prep(self, shared):
        return shared["document"]  # Read from store

    def exec(self, text):
        return call_llm(f"Summarize: {text}")  # Pure computation

    def post(self, shared, prep_res, exec_res):
        shared["summary"] = exec_res  # Write to store
        return "default"  # Next action (edge label)

This isn’t arbitrary: it enables automatic retries. If an LLM call fails (rate limit, network timeout), PocketFlow can re-run exec() without fear of corrupting state. Since exec() only sees prep_res (a snapshot), retrying it is safe; it’s a pure function over its input. If exec() could write to the shared store, a failed-then-retried call might write partial results twice.
This is the same principle as database transactions: separate the read (prep), the computation (exec), and the write (post) so failures at any stage are recoverable.
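The retry argument can be made concrete with a small stand-in. This is a sketch, not PocketFlow's source: `run_node`, `FlakySummarize`, and the simulated failure are illustrative names, and the runner assumes the behavior described above (prep once, retry only exec, post once).

```python
class FlakySummarize:
    """Hypothetical node whose exec() fails twice before succeeding."""

    def __init__(self):
        self.exec_calls = 0

    def prep(self, shared):
        return shared["document"]  # read once, before any attempt

    def exec(self, text):
        self.exec_calls += 1
        if self.exec_calls < 3:  # simulate a rate limit or timeout
            raise TimeoutError("simulated transient failure")
        return text.upper()  # stand-in for an LLM call

    def post(self, shared, prep_res, exec_res):
        shared["writes"] = shared.get("writes", 0) + 1  # count store writes
        shared["summary"] = exec_res
        return "default"


def run_node(node, shared, max_retries=3):
    """Assumed runner: prep once, retry exec only, post once on success."""
    prep_res = node.prep(shared)
    for attempt in range(1, max_retries + 1):
        try:
            exec_res = node.exec(prep_res)  # same input on every attempt
        except Exception:
            if attempt == max_retries:
                raise  # out of attempts: surface the last error
        else:
            return node.post(shared, prep_res, exec_res)
    raise ValueError("max_retries must be at least 1")


shared = {"document": "hello"}
node = FlakySummarize()
action = run_node(node, shared)
```

Here exec() runs three times but the store is written exactly once, because the write lives in post().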
The string returned by post() is the routing decision — it selects which outgoing edge to follow. return "default" follows the >> edge. return "error" follows the - "error" >> edge. return None also means "default". If you return a string that has no matching edge, the flow stops — that’s how terminal nodes work (no outgoing edges means “done”).
This is PocketFlow’s equivalent of Chapter 6’s stop_reason check. Instead of if response.stop_reason != "tool_use", the graph structure itself encodes when to stop.
7.2.2 Connecting nodes into flows
Nodes connect with >> (default edge) or - "action" >> (named edge):
node_a >> node_b                    # default transition
node_a - "error" >> error_handler   # named transition
node_a - "retry" >> node_a          # self-loop

A Flow starts at a designated node and follows edges until there’s nowhere to go:

flow = Flow(start=node_a)
flow.run({"document": "...", "summary": None})

The >> syntax is Python operator overloading: PocketFlow’s Node class defines __rshift__ to register edges. Similarly, node_a - "error" overloads __sub__ to create a temporary edge-builder that waits for >>. Under the hood, this just builds an adjacency list: {node_a: {"default": node_b, "error": error_handler, "retry": node_a}}. The Flow walks this adjacency list at runtime.
The self-loop (node_a - "retry" >> node_a) is how you create retry behavior without while True. The node’s post() returns "retry" when it wants another attempt, and the graph routes back to itself. After success, it returns "default" and moves forward.
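To make the operator overloading less mysterious, here is a minimal sketch of how such edge registration could work. `MiniNode`, `_PendingEdge`, and `walk` are hypothetical names for illustration; PocketFlow's real classes differ in detail, and `walk` replays a scripted list of actions instead of calling post().

```python
class MiniNode:
    """Stand-in for a graph node; only the edge bookkeeping is shown."""

    def __init__(self, name):
        self.name = name
        self.successors = {}  # action label -> next node

    def next(self, node, action="default"):
        self.successors[action] = node
        return node  # returning the target allows a >> b >> c

    def __rshift__(self, other):  # node_a >> node_b
        return self.next(other)

    def __sub__(self, action):  # node_a - "error"
        return _PendingEdge(self, action)


class _PendingEdge:
    """Temporary object that remembers (source, action) until >> arrives."""

    def __init__(self, src, action):
        self.src = src
        self.action = action

    def __rshift__(self, target):  # (node_a - "error") >> handler
        return self.src.next(target, self.action)


def walk(start, actions):
    """Follow edges using a scripted list of post() return values."""
    path, node = [start.name], start
    for action in actions:
        node = node.successors.get(action)
        if node is None:  # no matching edge: the flow stops
            break
        path.append(node.name)
    return path


node_a, node_b, handler = MiniNode("a"), MiniNode("b"), MiniNode("handler")
node_a >> node_b
node_a - "error" >> handler  # "-" binds tighter than ">>"
node_a - "retry" >> node_a

path = walk(node_a, ["retry", "retry", "default", "default"])
```

The scripted run retries twice on `a`, moves to `b`, and stops there because `b` has no outgoing edge.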
7.3 Pattern 1: Workflow (Prompt Chaining)
The simplest pattern — nodes execute sequentially, each transforming the shared state. Let’s trace how data flows through three nodes:
from pocketflow import Node, Flow

def call_llm(prompt):
    """Your LLM wrapper: any provider, any model."""
    ...  # OpenAI, Anthropic, Ollama, etc.

class GenerateOutline(Node):
    def prep(self, shared):
        return shared["topic"]  # (1)

    def exec(self, topic):  # (2)
        return call_llm(f"Create a 5-point outline for: {topic}")

    def post(self, shared, prep_res, exec_res):
        shared["outline"] = exec_res  # (3)

- (1) Reads the topic from the shared store. This is the only data the node needs.
- (2) Computes: exec receives "Why context engineering matters" (whatever prep returned), calls the LLM, and returns the outline. It has no idea about the shared store.
- (3) Writes the outline back, so shared["outline"] now exists for the next node. No explicit return means "default": follow the >> edge.
The second node turns the outline into a draft and follows the same three steps:

- (1) Reads the outline that GenerateOutline just wrote. Nodes communicate only through the shared store, never directly.
- (2) Different LLM call, different prompt. Each node is a focused task with a single responsibility.
- (3) Writes the draft for the next node to consume.
class ReviewDraft(Node):
    def prep(self, shared):
        return shared["draft"]

    def exec(self, draft):
        return call_llm(f"Review this draft. List 3 improvements:\n{draft}")

    def post(self, shared, prep_res, exec_res):
        shared["review"] = exec_res

Now wire them into a sequential flow:
- (1) >> chains nodes with default edges: after outline completes, the flow moves to write, then review.
- (2) Flow(start=...) designates where execution begins. The flow follows edges until it reaches a node with no outgoing edge.
- (3) run(shared) executes the full pipeline. All three nodes run in sequence, each reading what the previous one wrote.
- (4) Results live in the shared dict. After run() completes, shared contains "topic", "outline", "draft", and "review". The entire pipeline’s intermediate state is inspectable.
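Putting the three nodes and the wiring in one place, the pipeline might look like the sketch below. Several parts are assumptions: the `Node` and `Flow` classes are tiny stand-ins so the example runs without installing anything (with PocketFlow installed you would import them instead), `call_llm` is a placeholder that echoes its prompt, and the `WriteDraft` class name and prompt wording are a hypothetical reconstruction of the middle node.

```python
class Node:  # stand-in for pocketflow.Node (assumption)
    def __init__(self):
        self.successors = {}

    def __rshift__(self, other):
        self.successors["default"] = other
        return other

    def prep(self, shared):
        return None

    def exec(self, prep_res):
        return None

    def post(self, shared, prep_res, exec_res):
        return None


class Flow:  # stand-in for pocketflow.Flow (assumption)
    def __init__(self, start):
        self.start = start

    def run(self, shared):
        node = self.start
        while node is not None:
            prep_res = node.prep(shared)
            exec_res = node.exec(prep_res)
            action = node.post(shared, prep_res, exec_res) or "default"
            node = node.successors.get(action)  # None ends the flow


def call_llm(prompt):
    """Placeholder LLM: echoes the first line of the prompt."""
    return f"<llm output for: {prompt.splitlines()[0]}>"


class GenerateOutline(Node):
    def prep(self, shared):
        return shared["topic"]

    def exec(self, topic):
        return call_llm(f"Create a 5-point outline for: {topic}")

    def post(self, shared, prep_res, exec_res):
        shared["outline"] = exec_res


class WriteDraft(Node):  # hypothetical reconstruction; prompt is assumed
    def prep(self, shared):
        return shared["outline"]

    def exec(self, outline):
        return call_llm(f"Write a draft from this outline:\n{outline}")

    def post(self, shared, prep_res, exec_res):
        shared["draft"] = exec_res


class ReviewDraft(Node):
    def prep(self, shared):
        return shared["draft"]

    def exec(self, draft):
        return call_llm(f"Review this draft. List 3 improvements:\n{draft}")

    def post(self, shared, prep_res, exec_res):
        shared["review"] = exec_res


outline, write, review = GenerateOutline(), WriteDraft(), ReviewDraft()
outline >> write >> review  # default edges, left to right
flow = Flow(start=outline)
shared = {"topic": "Why context engineering matters"}
flow.run(shared)
```

After the run, every intermediate result is still sitting in `shared`, which is what makes the pipeline easy to inspect.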
This is prompt chaining — the same pattern from Chapter 2, but now you’re building the pipeline yourself. Each node is a focused LLM call. Data flows through the shared store rather than through the message array.
7.4 Pattern 2: Agent (ReAct Loop)
The agent pattern adds branching and looping. A decision node evaluates context and returns an action label that routes the flow — exactly the “text or tool call?” decision from Chapter 2, but explicit. Let’s build it piece by piece.
7.4.1 The decision node — “should I act or respond?”
This is the brain of the agent. It reads the current context and decides what to do next:
import yaml
from pocketflow import Node, Flow

class DecideAction(Node):
    def prep(self, shared):  # (1)
        return shared["query"], shared.get("context", "No results yet")

    def exec(self, inputs):  # (2)
        query, context = inputs
        prompt = f"""Given this question: {query}
Previous search results: {context}
Should I: (1) search the web for more info, or (2) answer with current knowledge?
Return YAML:
```yaml
action: search or answer
search_term: phrase to search (if action is search)
```"""
        resp = call_llm(prompt)  # (3)
        yaml_str = resp.split("```yaml")[1].split("```")[0].strip()
        return yaml.safe_load(yaml_str)  # (4)

    def post(self, shared, prep_res, exec_res):  # (5)
        if exec_res["action"] == "search":
            shared["search_term"] = exec_res["search_term"]
        return exec_res["action"]  # (6)

- (1) prep gathers context: it pulls the original query and any search results accumulated so far. On the first iteration, context is empty.
- (2) exec receives only what prep returned. It can’t see the shared store directly. This tuple (query, context) is all it knows.
- (3) The LLM makes the routing decision. The prompt asks it to choose between “search” and “answer.” This is the equivalent of stop_reason in Chapter 6, but the model decides explicitly rather than implicitly.
- (4) Structured output via YAML parsing. Since PocketFlow is LLM-agnostic, we can’t rely on Claude’s native tool_use. Instead, we prompt for YAML and parse it. Less reliable than native tool calls, but works with any model.
- (5) post writes side effects and routes. If the model chose “search,” we stash the search term in the store so the next node can find it.
- (6) The return value IS the routing decision. Returning "search" follows the - "search" >> edge. Returning "answer" follows the - "answer" >> edge. This one line controls the entire flow.
7.4.2 The action node — executing the tool
When the decision node returns "search", the flow routes here:
class SearchWeb(Node):
    def prep(self, shared):
        return shared["search_term"]  # (1)

    def exec(self, term):
        return search_web(term)  # (2)

    def post(self, shared, prep_res, exec_res):
        prev = shared.get("context", [])  # (3)
        shared["context"] = prev + [{"term": shared["search_term"], "result": exec_res}]
        return "decide"  # (4)

- (1) Reads the search term that DecideAction.post() just wrote to the store.
- (2) Pure execution: calls your search utility. Could be a web API, a local database, a vector store. The node doesn’t care.
- (3) Accumulates results: appends this search result to the context list. Next time DecideAction runs, it will see all previous searches in prep().
- (4) Routes back to the decision node. This creates the loop. The agent will keep searching until DecideAction decides it has enough context and returns "answer" instead.
7.4.3 The terminal node — producing the final answer
When the decision node returns "answer", the flow routes here instead:
class Answer(Node):
    def prep(self, shared):
        return shared["query"], shared.get("context", "")  # (1)

    def exec(self, inputs):
        query, context = inputs
        return call_llm(f"Based on this context:\n{context}\n\nAnswer: {query}")  # (2)

    def post(self, shared, prep_res, exec_res):
        shared["answer"] = exec_res  # (3)
        # No return: defaults to "default", but there's no outgoing edge, so flow stops

- (1) Gathers everything: the original query plus all accumulated search results.
- (2) Final LLM call: synthesizes an answer from the gathered context. This is the “text response” equivalent from Chapter 2.
- (3) Writes the answer and stops. No return value means "default", but since Answer has no outgoing edges, the flow terminates here. This is how PocketFlow encodes “done”: not with if stop_reason, but with graph topology.
7.4.4 Wiring it together
- (1) decide - "search" >> search: if decide.post() returns "search", go to SearchWeb.
- (2) decide - "answer" >> answer: if decide.post() returns "answer", go to Answer (terminal, no outgoing edges).
- (3) search - "decide" >> decide: after search.post() returns "decide", loop back to DecideAction.
Three lines define the entire control flow. The loop, the exit condition, and the branching are all encoded in the graph edges — not in if statements or while loops.
7.4.5 Map to the ReAct loop
| ReAct concept (Chapter 2) | PocketFlow equivalent |
|---|---|
| “Text or tool call?” decision | DecideAction.exec() gets the decision from the LLM; post() returns it as the action label |
| Tool execution | SearchWeb.exec() runs the tool |
| Tool result appended to context | SearchWeb.post() updates shared store |
| Loop continues | Edge search - "decide" >> decide routes back |
| Final text response | Answer node — no outgoing edges, flow ends |
The mechanism is identical. PocketFlow just makes every piece explicit — there’s no hidden framework magic deciding what happens next. The post() return value IS the routing decision.
In Chapter 6, the agent loop was a while True with an if stop_reason check. Here, the same behavior emerges from graph structure: decide → search → decide creates the loop, and decide → answer (with no outgoing edge from answer) creates the exit. The LLM doesn’t know it’s in a graph; it just sees its prompt and returns YAML. The exec() method parses that YAML, and post() returns the action string that routes the flow.
This is more powerful than while True because you can have multiple loop shapes in the same graph — a search loop, a verification loop, a refinement loop — all as different subgraphs with their own entry and exit points.
Chapter 6’s approach uses the model’s native tool_use mechanism — the API returns structured tool calls directly. PocketFlow is LLM-agnostic, so it can’t rely on any provider’s tool-calling API. Instead, it prompts the model to output structured text (YAML here) and parses it in exec(). The tradeoff: tool_use is more reliable (the API enforces the schema), but YAML/JSON output works with any model, including local ones that don’t support function calling.
In production, you’d add validation (assert result["action"] in ["search", "answer"]) and retry logic to handle malformed outputs — which PocketFlow’s built-in retry mechanism supports via max_retries.
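A minimal sketch of that validation step, in plain Python. The helper names `validate_decision` and `decide_with_retries` are hypothetical, and the retry loop only imitates what a framework-level `max_retries` would do; it assumes a raised exception is what triggers another attempt. It raises `ValueError` rather than using `assert`, since assertions are skipped when Python runs with `-O`.

```python
ALLOWED_ACTIONS = {"search", "answer"}


def validate_decision(result):
    """Check a parsed model decision; raise ValueError so a retry can fire."""
    if not isinstance(result, dict):
        raise ValueError(f"expected a mapping, got {type(result).__name__}")
    action = result.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if action == "search" and not str(result.get("search_term") or "").strip():
        raise ValueError("action 'search' requires a non-empty search_term")
    return result


def decide_with_retries(ask_model, max_retries=3):
    """Call ask_model() until it yields a valid decision or retries run out."""
    last_error = None
    for _ in range(max_retries):
        try:
            return validate_decision(ask_model())
        except ValueError as err:
            last_error = err  # malformed output: try again
    raise RuntimeError(f"no valid decision after {max_retries} tries") from last_error


# Scripted replies standing in for parsed YAML from three LLM calls.
replies = iter([
    "not a dict",
    {"action": "browse"},
    {"action": "search", "search_term": "context engineering"},
])
decision = decide_with_retries(lambda: next(replies))
```

The first two replies are rejected and the third is accepted, so the routing code downstream only ever sees one of the two known actions.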
7.5 Pattern 3: RAG (Retrieval-Augmented Generation)
RAG splits into two flows that share the same store: offline indexing (run once) and online retrieval (run per query).
Indexing is expensive — chunking, embedding, and storing thousands of documents. Querying is cheap — embed one question, search, generate. By splitting them into separate Flow objects that share the same shared dictionary, you index once and query many times. The shared store acts as the bridge: offline_flow writes shared["index"], then online_flow reads it. This is the same pattern as building a database (expensive, once) and querying it (cheap, many times).
7.5.1 Offline flow: chunking and indexing
The first node, ChunkDocs, is a plain Node:

- (1) Plain Node: chunking is a single operation on a single input (the raw text). The output is a list, but the input isn’t, so this is not the batch case.
- (2) exec() does the work. Keep I/O (shared access) in prep/post and computation in exec. The retry/error semantics PocketFlow promises only apply to what’s inside exec().
- (3) Write to shared in post(). Never write to shared from prep or exec. The next node will read shared["chunks"] (this implicit cross-node contract is what Section 7.8 calls out as PocketFlow’s main liability).

The second node, EmbedChunks, is a BatchNode:

- (1) BatchNode (not Node) signals that this node iterates exec() over a list. The input is already a list of chunks; we want one embedding call per chunk, not one call with all of them.
- (2) prep returns the list. shared["chunks"] was written by ChunkDocs.post() in the previous step. This is the implicit contract: nothing in the type system says EmbedChunks needs chunks to exist. Rename a key upstream and this breaks silently at runtime. The Burr section (Section 7.9) replaces this string-keyed handoff with declared reads/writes.
- (3) exec() receives ONE chunk at a time. PocketFlow handles the iteration. This is the “map” in map/reduce.
- (4) exec_res is the list of ALL results. After all exec() calls complete, PocketFlow collects them into a list and passes it to post(). This is the “reduce.”
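The batch semantics described above can be sketched without the framework. Everything here is illustrative: `chunk_text`, `MiniBatchNode`, the fixed-size chunking, and the toy one-number "embedding" are assumptions chosen so the example runs on its own; PocketFlow's BatchNode is what does this iteration in real code.

```python
def chunk_text(text, size=5):
    """Toy chunker: fixed-size character windows (single input, list output)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


class MiniBatchNode:
    """Stand-in illustrating batch semantics: exec() per item, post() once."""

    def prep(self, shared):
        return shared["chunks"]  # the list to iterate over

    def exec(self, chunk):
        return [float(len(chunk))]  # toy "embedding": one number per chunk

    def post(self, shared, prep_res, exec_res_list):
        shared["embeddings"] = exec_res_list  # all results, in input order

    def run(self, shared):
        items = self.prep(shared) or []
        results = [self.exec(item) for item in items]  # the "map"
        return self.post(shared, items, results)  # the "reduce"


shared = {"text": "abcdefghijkl"}
shared["chunks"] = chunk_text(shared["text"])  # what a chunking node's post() would write
MiniBatchNode().run(shared)
```

Note that `exec` never sees the list, only one chunk, while `post` sees the collected results for all of them.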
class BuildIndex(Node):  # (1)
    def prep(self, shared):
        return shared["chunks"], shared["embeddings"]

    def exec(self, inputs):
        chunks, embeddings = inputs
        return create_vector_index(chunks, embeddings)  # (2)

    def post(self, shared, prep_res, exec_res):
        shared["index"] = exec_res  # (3)

chunk = ChunkDocs()
embed = EmbedChunks()
index = BuildIndex()
chunk >> embed >> index
offline_flow = Flow(start=chunk)  # (4)

- (1) Back to regular Node: building the index is a single operation, not per-item.
- (2) Creates a searchable structure: your vector DB, FAISS index, or in-memory cosine similarity store.
- (3) The index lives in the shared store, available for the online flow to read.
- (4) This flow runs once. Indexing is expensive: you do it once, then query many times.
7.5.2 Online flow: retrieve and answer
class RetrieveContext(Node):
    def prep(self, shared):
        return shared["question"], shared["index"]  # (1)

    def exec(self, inputs):
        question, index = inputs
        q_embedding = get_embedding(question)  # (2)
        return search_index(index, q_embedding, top_k=3)  # (3)

    def post(self, shared, prep_res, exec_res):
        shared["retrieved"] = exec_res

class GenerateAnswer(Node):
    def prep(self, shared):
        return shared["question"], shared["retrieved"]  # (4)

    def exec(self, inputs):
        question, context = inputs
        return call_llm(f"Context:\n{context}\n\nAnswer: {question}")  # (5)

    def post(self, shared, prep_res, exec_res):
        shared["answer"] = exec_res

retrieve = RetrieveContext()
generate = GenerateAnswer()
retrieve >> generate
online_flow = Flow(start=retrieve)

- (1) Reads the index that the offline flow built. The shared store bridges the two flows.
- (2) Embeds the question using the same embedding function as the chunks. This ensures the vectors are in the same space.
- (3) Semantic search: finds the 3 most similar chunks to the question.
- (4) Reads retrieved context: the chunks most relevant to the question.
- (5) Grounded generation: the LLM answers using only the retrieved context, not its training data. This is what makes RAG reliable: the answer is traceable to specific source chunks.
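The helpers `create_vector_index` and `search_index` are left undefined above. One way they could look, as an in-memory cosine-similarity store, is sketched below. This is an assumption for illustration only (a real system would more likely use a vector database or FAISS), and the two-dimensional embeddings are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def create_vector_index(chunks, embeddings):
    """The 'index' is just a list of (chunk, embedding) pairs."""
    return list(zip(chunks, embeddings))


def search_index(index, q_embedding, top_k=3):
    """Return the top_k chunks ranked by similarity to the query embedding."""
    ranked = sorted(index, key=lambda pair: cosine(pair[1], q_embedding), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


index = create_vector_index(
    ["cats purr", "dogs bark", "stocks fell", "kittens meow"],
    [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.9, 0.1]],  # made-up embeddings
)
hits = search_index(index, [1.0, 0.0], top_k=2)
```

A linear scan like this is fine for a few thousand chunks; beyond that, an approximate nearest-neighbor index is the usual replacement.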
7.5.3 Running both flows
- (1) Index once. After this, shared contains "chunks", "embeddings", and "index".
- (2) Query many times. Each query adds "question", "retrieved", and "answer" to the shared store. The index is reused.
The shared dict lives in memory — if the process dies, the index is lost. In production, you’d serialize shared["index"] to disk in BuildIndex.post() and deserialize in RetrieveContext.prep(). PocketFlow also has an AsyncParallelBatchNode that runs exec_async() calls concurrently — useful when embedding 1000 chunks against a rate-limited API, but it requires an AsyncFlow and async def exec_async() overrides on the node.
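A sketch of the save/load round trip, assuming the index is an ordinary picklable Python object. The function names `save_index` and `load_index`, the file name, and the temp directory are illustrative; a library-backed index would typically use that library's own persistence format instead of pickle.

```python
import os
import pickle
import tempfile


def save_index(index, path):
    """Write atomically: dump to a temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(index, f)
    os.replace(tmp_path, path)  # readers never see a half-written file


def load_index(path):
    """Load a previously saved index. Only unpickle files you wrote yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


workdir = tempfile.mkdtemp()
index_path = os.path.join(workdir, "index.pkl")

shared = {"index": [("cats purr", [1.0, 0.0])]}
save_index(shared["index"], index_path)  # would live in BuildIndex.post()

# Later, possibly in a different process:
fresh_shared = {"index": load_index(index_path)}  # would live in RetrieveContext.prep()
```

The write-then-rename step matters if a query process might start while indexing is still finishing.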
7.6 The Design Is the Hard Part
PocketFlow’s creator makes a point that applies directly to the SE tutorial’s thesis:
“If Humans can’t specify the flow, AI Agents can’t automate it!”
Their Agentic Coding methodology maps human vs. AI responsibility:
| Step | Human | AI | Why |
|---|---|---|---|
| Requirements | ★★★ | ★☆☆ | Humans understand the problem |
| Flow design | ★★☆ | ★★☆ | Humans specify structure, AI fills details |
| Utilities | ★★☆ | ★★☆ | Humans know the APIs, AI implements |
| Implementation | ★☆☆ | ★★★ | AI writes the node code |
| Testing | ★☆☆ | ★★★ | AI generates test cases |
This is the same argument from Chapter 1: development is commoditized, engineering is not. The framework is 100 lines. The hard part is deciding what nodes to build, what edges to draw, and what data flows through the shared store. That’s design — and it requires human judgment.
7.7 Comparison: PocketFlow vs. Agent SDK
| | Agent SDK (Chapter 5) | PocketFlow |
|---|---|---|
| Model | Claude only | Any LLM |
| Loop | Built-in ReAct (you configure it) | You build the loop yourself |
| Tools | Built-in (Read, Edit, Bash, etc.) | You implement utilities |
| Context management | Automatic (compaction, sessions) | Manual (shared store) |
| Production readiness | High (Anthropic-hosted) | You own the infrastructure |
| Learning value | “How to use an agent” | “How an agent works inside” |
| When to use | Claude-powered applications | Custom agents, other LLMs, learning |
The Agent SDK is a car — you drive it. PocketFlow is a kit car — you build it, so you understand every bolt.
7.8 When PocketFlow Stops Being Enough
PocketFlow’s mutable shared dictionary is great for learning but a liability at scale. Three pressures emerge between the toy and the deployment:
- Multiple people touch the agent. A mutable shared dict is a contract anyone can break. Without declared reads/writes, any node can silently overwrite any other’s data, and the breakage shows up two iterations downstream, not where it happened.
- A production run fails at 3 a.m. and you need to debug it. “What was shared["context"] when the model went off the rails?” The answer requires either a print statement you added in advance or a state snapshot you kept. PocketFlow has neither.
- You need audit trails and reproducibility. Regulators, customers, or your own future self will ask “why did the agent do X?” Without immutable state snapshots, you can’t answer.
The mental model — state machine over actions — is the right one. What’s missing are the production guardrails: typed contracts on state, immutable snapshots, fork-from-state replay, and observability that doesn’t rely on print statements.
The rest of this chapter takes that exact model and hardens it with Burr + Pydantic — one production-grade implementation of everything PocketFlow teaches. It’s a deliberate teaching choice: Burr’s actions, typed state, and fork-from-state replay make the production concepts unusually legible, and they transfer even if you later ship on a different framework. The biggest of those alternatives is LangGraph, which does the same job with a much larger ecosystem; we compare the two head-to-head in Section 7.12. Chapter 8 covers two more production frameworks (Agno and mcp-agent) that solve adjacent problems — batteries-included multi-agent orchestration and MCP-native pattern composition. Chapter 9 covers Mastra, the TypeScript equivalent for teams working in that ecosystem.
flowchart TD
P["PocketFlow<br/>(learn the model)"] --> B["Burr + Pydantic<br/>typed FSM + replay"]
P --> A["Agno (Ch 8)<br/>batteries-included platform"]
P --> M["mcp-agent (Ch 8)<br/>MCP-native patterns"]
P --> MA["Mastra (Ch 9)<br/>TypeScript agents"]
B --> Q["Production deployment"]
A --> Q
M --> Q
MA --> Q
style P fill:#fef3c7,stroke:#92400e,color:#92400e
style B fill:#dbeafe,stroke:#1e40af,color:#1e40af
style A fill:#dcfce7,stroke:#166534,color:#166534
style M fill:#f3e8ff,stroke:#6b21a8,color:#6b21a8
style MA fill:#e0f2fe,stroke:#0369a1,color:#0369a1
style Q fill:#fce7f3,stroke:#9d174d,color:#9d174d
7.9 Burr + Pydantic: Typed FSM with Replay
Apache Burr (incubating) preserves PocketFlow’s mental model — a state machine over actions — but replaces every loose part with a guardrail. The mutable shared dict becomes an immutable State. The prep → exec → post lifecycle becomes an @action that declares its reads and writes. The hand-drawn flow diagram becomes a graph you can ask the framework to draw for you.
And — a key reason we reach for Burr as the typed upgrade — it has a first-class Pydantic integration. You can declare your state as a BaseModel, attach it to the application, and write actions that take a typed state object instead of a string-keyed dict. The IDE autocompletes the fields. Pydantic validates writes. The “any action can corrupt any key” problem becomes a typed-attribute problem.
We build on Burr because its actions-and-transitions model makes the production concepts in this section — typed state contracts, immutable snapshots, replay debugging — exceptionally clear, and because its observability and fork-from-state are strong out of the box and fully self-hosted. For many teams the right default is instead LangGraph, whose ecosystem and community are far larger. Read this section for the concepts; they carry over either way. Section 7.12 lays out exactly when to pick which.
7.9.1 The core API
The smallest possible Burr application:
Throughout this section, llm is whatever LLM client you use — anything with a .complete(prompt) -> str method. The Burr-specific piece is how dependencies like llm get injected into actions: declare the parameter, then pass it via .bind() at the builder. Unbound parameters become required runtime inputs to app.run(inputs={...}) instead.
from burr.core import ApplicationBuilder, State
from burr.core.action import action

llm = MyLLMClient(...)  # (0)

@action(reads=["query"], writes=["answer"])  # (1)
def answer_question(state: State, llm) -> State:  # (2)
    answer = llm.complete(f"Answer: {state['query']}")
    return state.update(answer=answer)  # (3)

app = (
    ApplicationBuilder()
    .with_actions(answer_question=answer_question.bind(llm=llm))  # (4)
    .with_transitions()  # (5)
    .with_state(query="What is context engineering?", answer="")
    .with_entrypoint("answer_question")
    .build()
)
action_run, result, state = app.run(halt_after=["answer_question"])  # (6)
print(state["answer"])

- (0) llm is your LLM client: Anthropic, OpenAI, whatever has a .complete() method. Shown as a placeholder here.
- (1) @action(reads=..., writes=...) is the declaration that replaces PocketFlow’s shared-dict free-for-all. The framework reads this at build time to detect missing keys, enforce write boundaries, and figure out which actions can run in parallel.
- (2) The action receives a State and any bound dependencies (here llm). The shared-store-vs-exec separation from PocketFlow collapses into one function with explicit declarations.
- (3) state.update(...) returns a new State; it does not mutate. Every transition produces a new snapshot, which is what makes replay possible (Section 7.9.4).
- (4) with_actions(name=action.bind(...)) registers actions and pre-binds their non-state dependencies. .bind() works like functools.partial. The kwarg key (answer_question=) becomes the action’s name in the graph; positional form (with_actions(answer_question)) also works and uses func.__name__.
- (5) with_transitions(...) declares the edges. For a one-action app there are none, so it is left empty here. The full syntax (label, label, Condition) is covered in Section 7.9.3.
- (6) run(halt_after=[...]) returns the (action_obj, result_dict, final_state) triple and stops after the named action completes. Burr also has step, iterate, and stream_result variants; see the next subsection.
In PocketFlow, SearchWeb.post() accidentally overwriting shared["query"] breaks the agent silently — and you don’t notice until the next decision node reads the wrong question. Burr’s @action(reads=["query"], writes=["context"]) lets the framework enforce that the action cannot write to query. The framework also uses these declarations to: (1) detect missing entries at build time, (2) figure out which actions are independent and could run in parallel, (3) attribute every state change to a specific action in the debugging UI. The principle is the same as database column-level permissions — limit the blast radius before the blast.
7.9.2 Typed state with Pydantic
The reads=["query"] declaration is still stringly typed: a typo in the key gives you a runtime error, not one a type checker can catch before you run. The Pydantic integration upgrades that to a real type.
We’re going to use one canonical QAState for the rest of the chapter. The full schema has five fields — three of them (query, context, answer) carry the question-and-answer data, two more (next_action, search_term) will be used by the routing logic in Section 7.9.3. We introduce all five now so there’s only ever one QAState to track:
from pydantic import BaseModel, Field
from burr.core import ApplicationBuilder
from burr.core.action import action
from burr.integrations.pydantic import PydanticTypingSystem

class QAState(BaseModel):  # (1)
    query: str
    context: list[dict] = Field(default_factory=list)  # (2)
    next_action: str | None = None  # (3)
    search_term: str | None = None
    answer: str | None = None

@action.pydantic(reads=["query", "context"], writes=["answer"])  # (4)
def synthesize(state: QAState, llm) -> QAState:  # (5)
    state.answer = llm.complete(  # (6)
        f"Question: {state.query}\nContext: {state.context}\nAnswer:"
    )
    return state

app = (
    ApplicationBuilder()
    .with_actions(synthesize=synthesize.bind(llm=llm))  # bind llm dep (see §7.9.1)
    .with_typing(PydanticTypingSystem(QAState))  # (7)
    .with_state(QAState(query="What is context engineering?"))
    .with_entrypoint("synthesize")
    .build()
)
_, _, state = app.run(halt_after=["synthesize"])
print(state.data.answer)  # (8)

- (1) State is a Pydantic BaseModel. Fields are typed; defaults are real Python values, not magic strings.
- (2) context: list[dict] holds search results. Each entry is {"term": ..., "result": ...}. The synthesize action below treats the whole list as opaque context for the LLM; the search action in the worked example is what appends to it.
- (3) next_action and search_term are unused by this minimal example but declared up front so the schema is stable across sections. The decide action in the worked example writes them; the FSM transitions read next_action to choose the next edge.
- (4) @action.pydantic is the typed sibling of @action. It still declares reads/writes, but the action body sees a typed object.
- (5) state: QAState lets the IDE autocomplete state.query, state.context, state.answer. A typo (state.querry) is a static-analysis error before you run.
- (6) Field mutation looks normal. Burr’s runtime captures the change and produces a new immutable snapshot under the hood; you write idiomatic Python.
- (7) with_typing(PydanticTypingSystem(QAState)) attaches the type system to the application. Burr now validates state transitions against the schema.
- (8) state.data returns the typed object after run(). state.data.answer is what you get; state["answer"] still works but you’ve left the typed path.
For streaming actions, the typed decorator is @streaming_action.pydantic with explicit state_input_type, state_output_type, and stream_type parameters. The streaming story matters when you want token-by-token output to a UI while still preserving the typed-state contract.
Pydantic AI and its sibling pydantic-graph give you a typed agent and a typed graph respectively — BaseNode[StateT, DepsT, RunEndT] with edges inferred from run() return annotations, plus first-class Logfire/OpenTelemetry observability. It’s a coherent typed-FSM story.
The reason we’re treating Burr-plus-Pydantic as the typed pick instead: pydantic-graph does not have a documented persistence or fork-from-state story. You get the typed graph, but you don’t get the replay debugger. Burr-plus-Pydantic gives you typed state and fork-from-state in the same framework. If you don’t need replay and you do want the agent-level typing (deps, output schemas), reach for Pydantic AI — it’s the cleaner choice for that specific job.
7.9.3 Transitions: the edges of the FSM
Before persistence makes sense, we need to look at the edge syntax. The first two snippets used .with_transitions() either empty or hand-waved with (...) — fine for a single-action app, misleading once the FSM has branches. The real shape:
- (1) Three condition forms ship with burr.core: when(field=value) for equality (with operator suffixes __gt, __lt, __in, __ne), expr("python expression") for arbitrary Python evaluated against state, and the sentinel default for the catch-all branch. Negate with ~when(...) or ~expr(...).
- (2) Each tuple is (from_label, to_label, Condition). Slots 1 and 2 are strings: the action names you registered in with_actions(...). Slot 3 is a Condition object, not an arbitrary callable. Conditions are evaluated in declaration order; the first one that returns true wins.
- (3) default is a sentinel imported from burr.core, not a user-defined function despite reading like one. Reach for it as the fallback branch after the typed conditions.
- (4) Length-2 shortcut: omit the condition and Burr inserts default for you. A list in slot 1 fans multiple sources into one destination.
If no condition is true and there’s no default, the run halts. That’s a feature: it surfaces missing edges instead of looping silently.
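The resolution rules above (declaration order, first true condition wins, a length-2 tuple means default, no match means halt) can be sketched in plain Python. This is an illustrative model only, not Burr's implementation: when, default, and resolve here are hypothetical stand-ins that operate on a plain dict rather than Burr's State and Condition objects.

```python
from typing import Callable, Optional

# Illustrative stand-ins, not Burr's classes: a condition is just a
# predicate over a plain dict of state.
Condition = Callable[[dict], bool]


def when(**expected) -> Condition:
    """Equality condition: every named field must equal the given value."""
    return lambda state: all(state.get(k) == v for k, v in expected.items())


def default(state: dict) -> bool:
    """Catch-all branch: always true."""
    return True


def resolve(transitions, current: str, state: dict) -> Optional[str]:
    """Return the first destination whose condition holds, else None (halt)."""
    for edge in transitions:
        source, target = edge[0], edge[1]
        condition = edge[2] if len(edge) == 3 else default  # length-2 shortcut
        sources = source if isinstance(source, list) else [source]
        if current in sources and condition(state):
            return target
    return None


transitions = [
    ("decide", "search", when(next_action="search")),
    ("decide", "synthesize", when(next_action="answer")),
    ("search", "decide"),  # no condition: treated as default
]

print(resolve(transitions, "decide", {"next_action": "search"}))
print(resolve(transitions, "search", {}))
print(resolve(transitions, "decide", {"next_action": "oops"}))  # no edge matches
```

The last call returns None because neither decide edge matches and decide has no default edge. That is the "run halts" case: a missing edge shows up as a stop rather than a silent loop.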
7.9.4 State persistence and fork-from-state
The immutable state.update(...) pattern isn’t aesthetics — it’s the foundation of Burr’s killer feature. Because every transition produces a new snapshot rather than mutating, the framework can save those snapshots and, later, fork execution from any one of them. This is what makes a production failure debuggable: you don’t have to reproduce the bug, you replay the exact state that led to it.
from burr.core import ApplicationBuilder, when, expr, default
from burr.core.persistence import SQLLitePersister  # <1>
from burr.integrations.pydantic import PydanticTypingSystem

persister = SQLLitePersister(db_path="./burr.db", table_name="qa_states")
persister.initialize()  # <2>

app = (
    ApplicationBuilder()
    .with_actions(
        decide=decide.bind(llm=llm),
        search=search.bind(web=web),
        synthesize=synthesize.bind(llm=llm),
    )
    .with_transitions(
        ("decide", "search", expr("next_action == 'search'")),
        ("decide", "synthesize", expr("next_action == 'answer'")),
        ("search", "decide", default),
    )
    .with_typing(PydanticTypingSystem(QAState))
    .with_identifiers(app_id="qa-2026-05-29-001",  # <3>
                      partition_key="user-42")  # <4>
    .with_state_persister(persister)  # <5>
    .with_entrypoint("decide")
    .build()
)

- <1> SQLLitePersister: the double L is intentional, not a typo. Postgres ships as PostgreSQLPersister; async versions live under burr.integrations.persisters (e.g. AsyncPGPersister). Custom backends extend BaseStatePersister or AsyncBaseStatePersister.
- <2> persister.initialize() creates the table if it doesn’t exist. Easy to forget: Burr will not auto-run schema setup at build time.
- <3> app_id uniquely identifies this run. Auto-generated if omitted. This is what you fork from.
- <4> partition_key groups runs (per-user, per-session, per-tenant). Snapshots within a partition are queryable together.
- <5> with_state_persister(persister) wires up automatic snapshot-after-action. Every completed action lands a row.
Once persistence is on, replay is another builder call. The forked builder needs the same actions, transitions, and typing as the original — it’s still a Burr application, just one whose initial state comes from a snapshot instead of with_state(...):
forked = (
    ApplicationBuilder()
    .with_actions(
        decide=decide.bind(llm=llm),
        search=search.bind(web=web),
        synthesize=synthesize.bind(llm=llm),
    )
    .with_transitions(
        ("decide", "search", expr("next_action == 'search'")),
        ("decide", "synthesize", expr("next_action == 'answer'")),
        ("search", "decide", default),
    )
    .with_typing(PydanticTypingSystem(QAState))
    .with_identifiers(app_id="qa-2026-05-29-001-fork-a",  # <1>
                      partition_key="user-42")
    .initialize_from(  # <2>
        persister,
        resume_at_next_action=True,
        default_state=None,
        default_entrypoint="decide",
        fork_from_app_id="qa-2026-05-29-001",  # <3>
        fork_from_partition_key="user-42",
        fork_from_sequence_id=7,  # <4>
    )
    .with_state_persister(persister)
    .build()
)

- <1> New app_id for the fork. The forked run gets its own identity so the original’s history isn’t overwritten, and you can compare branches side by side in the UI.
- <2> initialize_from(...) replaces with_state(...). It tells the builder: load initial state from this persister rather than constructing it inline.
- <3> fork_from_app_id + fork_from_partition_key locate the source run within the persister.
- <4> fork_from_sequence_id=7 picks the snapshot the source run stored under sequence ID 7. Execution resumes from that state; resume_at_next_action=True means the action that follows that snapshot runs next (set it to False to start from default_entrypoint instead). Change one thing (a prompt, a model, a tool binding) and observe whether the new branch goes right.
Production bugs are non-deterministic. A retry has different LLM output. A new vector DB query returns different chunks. The “reproduce the bug” step that’s routine in normal software is a research project in agent debugging.
Fork-from-state replaces reproduction with replay. You don’t reconstruct what the world looked like when the bug fired — Burr already saved that. You just point fork_from_sequence_id at the snapshot before the bad decision, change one thing, and observe whether the new branch goes right. It’s git checkout for agent state.
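To make the mechanics concrete, here is a minimal sketch of append-only snapshots and a fork, using only the standard library. The snapshots table, its columns, and the save/load helpers are assumptions made up for illustration; Burr’s persisters manage their own schema and API.

```python
import json
import sqlite3

# Hypothetical schema for illustration; Burr's persisters manage their own tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshots ("
    " partition_key TEXT, app_id TEXT, sequence_id INTEGER, state TEXT,"
    " PRIMARY KEY (partition_key, app_id, sequence_id))"
)


def save(partition_key: str, app_id: str, sequence_id: int, state: dict) -> None:
    """Append one snapshot; existing rows are never updated."""
    conn.execute(
        "INSERT INTO snapshots VALUES (?, ?, ?, ?)",
        (partition_key, app_id, sequence_id, json.dumps(state)),
    )


def load(partition_key: str, app_id: str, sequence_id: int) -> dict:
    """Read back exactly one stored snapshot."""
    row = conn.execute(
        "SELECT state FROM snapshots"
        " WHERE partition_key = ? AND app_id = ? AND sequence_id = ?",
        (partition_key, app_id, sequence_id),
    ).fetchone()
    return json.loads(row[0])


# The original run: each step builds a new dict instead of mutating the old one.
state = {"query": "q", "context": []}
for seq, term in enumerate(["first", "second", "third"]):
    state = {**state, "context": state["context"] + [term]}
    save("user-42", "run-1", seq, state)

# Fork: load snapshot 1 of the original run, then take a different branch
# under a new app_id so the original history stays intact.
forked = load("user-42", "run-1", 1)
forked = {**forked, "context": forked["context"] + ["alternative"]}
save("user-42", "run-1-fork-a", 2, forked)

print(load("user-42", "run-1", 2)["context"])         # original branch
print(load("user-42", "run-1-fork-a", 2)["context"])  # forked branch
```

Because rows are only ever inserted, the original run and the fork coexist, which is what makes the side-by-side comparison possible.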
7.9.5 The Burr UI
Burr ships a local dashboard that reads the persister and visualizes runs:
- Graph view: the state machine, with actions colored by frequency
- Run timeline: every action of a specific app_id, with input/output state for each
- State diff: between any two snapshots, what changed and which action changed it
- Fork button: pick a snapshot, change a prompt, kick off a forked run from the UI
Combined with the OpenTelemetry exporter (and the Traceloop integration for hosted tracing), this gives you a single picture of “what the agent did, why it did it, and what state it had when it did it.” That’s the closest thing in the open-source agent ecosystem to a real debugger.
7.9.6 Hamilton as the sibling
Burr is maintained alongside Hamilton, the same company’s DAG library for data pipelines. If you already use Hamilton, Burr is the natural agent layer on top — the same reads/writes discipline carries over, and a Burr action can wrap a Hamilton driver for the data-prep half of a RAG pipeline. The combination shows up in the official Conversational RAG example.
For agents that need RAG, this is the Burr path: Hamilton for the index pipeline, Burr for the conversational loop, both inspectable through the same UI.
Burr is currently in the Apache Software Foundation incubator. The library is stable and used in production; the governance maturation is the work-in-progress.
7.10 Worked Example: Q&A Agent in Burr + Pydantic
Earlier in this chapter we built a Q&A agent in PocketFlow: a DecideAction node returning "search" or "answer", a SearchWeb node, and an Answer node. Now let’s port it to Burr + Pydantic and see what a production-grade version of the same logic looks like.
7.10.1 The state schema
We use the same QAState introduced in Section 7.9.2 — five fields, one canonical schema for the whole chapter:
- <1> context accumulates: each search appends a dict {"term": ..., "result": ...}; the decide step reads the whole list.
- <2> Routing fields. decide writes next_action ("search" or "answer") and search_term; the transitions in Section 7.9.3 read next_action to pick the edge.
7.10.2 Three typed actions
import yaml
from burr.core.action import action

@action.pydantic(reads=["query", "context"],
                 writes=["next_action", "search_term"])
def decide(state: QAState, llm) -> QAState:
    prompt = f"""Question: {state.query}
Previous searches: {state.context or "none"}
Reply in YAML:
```yaml
action: search | answer
search_term: <if action is search>
```"""
    raw = llm.complete(prompt)
    parsed = yaml.safe_load(raw.split("```yaml")[1].split("```")[0])
    state.next_action = parsed["action"]
    state.search_term = parsed.get("search_term")
    return state

@action.pydantic(reads=["search_term", "context"], writes=["context"])
def search(state: QAState, web) -> QAState:
    result = web.search(state.search_term)
    state.context = state.context + [{"term": state.search_term,
                                      "result": result}]
    return state

@action.pydantic(reads=["query", "context"], writes=["answer"])
def answer(state: QAState, llm) -> QAState:
    state.answer = llm.complete(
        f"Question: {state.query}\nContext: {state.context}\nAnswer:"
    )
    return state

Each action declares the exact fields it touches. The framework can now check, before any LLM call, that search never writes to query.
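What a declared reads/writes contract buys you can be shown without Burr at all. The sketch below is a plain-Python illustration of the idea, not Burr’s enforcement mechanism: QAStateSketch is a frozen dataclass standing in for the chapter’s Pydantic QAState, and checked_action is a hypothetical decorator invented for this example.

```python
from dataclasses import dataclass, field, fields, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class QAStateSketch:
    """Stdlib stand-in with the chapter's five fields (the real QAState is a Pydantic model)."""
    query: str
    context: List[dict] = field(default_factory=list)
    next_action: Optional[str] = None
    search_term: Optional[str] = None
    answer: Optional[str] = None


def checked_action(reads: List[str], writes: List[str]):
    """Hypothetical decorator: unknown field names fail at definition time,
    undeclared writes fail when the action runs."""
    known = {f.name for f in fields(QAStateSketch)}
    unknown = (set(reads) | set(writes)) - known
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    def wrap(fn: Callable[[QAStateSketch], QAStateSketch]):
        def run(state: QAStateSketch) -> QAStateSketch:
            new_state = fn(state)
            changed = {f.name for f in fields(state)
                       if getattr(state, f.name) != getattr(new_state, f.name)}
            illegal = changed - set(writes)
            if illegal:
                raise ValueError(
                    f"{fn.__name__} wrote undeclared fields: {sorted(illegal)}")
            return new_state
        return run
    return wrap


@checked_action(reads=["search_term", "context"], writes=["context"])
def search(state: QAStateSketch) -> QAStateSketch:
    hit = {"term": state.search_term, "result": "stub result"}
    return replace(state, context=state.context + [hit])


@checked_action(reads=["search_term"], writes=["context"])
def bad_search(state: QAStateSketch) -> QAStateSketch:
    return replace(state, query="overwritten")  # query is not in writes


start = QAStateSketch(query="q", search_term="nobel 2024")
print(search(start).context)
try:
    bad_search(start)
except ValueError as err:
    print(err)
```

The well-behaved search passes; bad_search is rejected the moment it touches query. A typo in a field name never gets that far, because the decorator refuses to build the action.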
7.10.3 Transitions and persistence
Transitions encode the loop using next_action as the routing field. We bind the same llm from earlier and a web placeholder (anything with a .search(term) -> str method — a real search SDK, an MCP tool, or a stub for testing):
from burr.core import ApplicationBuilder, expr
from burr.integrations.pydantic import PydanticTypingSystem
from burr.core.persistence import SQLLitePersister

llm = MyLLMClient(...)
web = MyWebSearch(...)

persister = SQLLitePersister("qa.db", "runs")
persister.initialize()  # <1>

app = (
    ApplicationBuilder()
    .with_actions(
        decide=decide.bind(llm=llm),
        search=search.bind(web=web),
        answer=answer.bind(llm=llm),
    )
    .with_transitions(
        ("decide", "search", expr("next_action == 'search'")),  # <2>
        ("decide", "answer", expr("next_action == 'answer'")),
        ("search", "decide"),  # <3>
    )
    .with_typing(PydanticTypingSystem(QAState))
    .with_state(QAState(query="Who won the 2024 Physics Nobel?"))
    .with_state_persister(persister)  # <4>
    .with_identifiers(app_id="qa-001", partition_key="demo")
    .with_entrypoint("decide")
    .build()
)

_, _, final = app.run(halt_after=["answer"])
print(final.data.answer)

- <1> Create the table. persister.initialize() is required on first use; Burr does not run schema setup implicitly.
- <2> Conditional transitions. expr(...) evaluates a Python expression against the typed state. Conditions are checked in order; the first true one wins.
- <3> Length-2 tuple = implicit default. ("search", "decide") is the same as ("search", "decide", default): unconditional after search.
- <4> Persistence on. Every snapshot lands in qa.db. If a future run goes wrong, fork from the bad sequence_id (see Section 7.9.4) and try a different prompt.
7.10.4 What you gained over PocketFlow
The PocketFlow version was roughly the same number of lines, but the Burr version ships with three guarantees the PocketFlow version doesn’t have:
| Guarantee | PocketFlow | Burr + Pydantic |
|---|---|---|
| State contract | Any node can write any key — typos are silent bugs | reads/writes declared and enforced; Pydantic validates types |
| Immutable snapshots | State is a mutable dict — no history | Every action produces a new snapshot; full audit trail |
| Fork-from-state replay | Not available — reproduce bugs manually | fork_from_sequence_id lets you rewind and retry from any point |
The shape is identical — decide/search/answer with edges encoding the loop. The difference is entirely in what the framework guarantees about the state flowing through that shape.
7.11 Beyond the Worked Example: Burr’s Feature Set
The Q&A agent used Burr’s spine — actions, typed state, transitions, persistence. Production agents need more than a spine. This section tours the features that turn it into a deployable system, each grounded in Burr’s documentation. None of them change the mental model; they hang off the same actions-and-transitions machine you already have.
7.11.1 Lifecycle hooks: cross-cutting logic without touching actions
An action should do one job. But production has concerns that cut across every action — structured logging, latency and token-cost metrics, PII redaction, a “pause here for approval” gate. Threading those into each action body duplicates code and buries the business logic. Burr’s answer is lifecycle hooks: small classes whose methods fire at fixed points in the run.
from typing import Any, Optional
from burr.core import Action, ApplicationBuilder, State
from burr.lifecycle import PreRunStepHook, PostRunStepHook

class TracingHook(PreRunStepHook, PostRunStepHook):  # <1>
    def pre_run_step(self, *, state: State, action: Action,
                     **future_kwargs: Any):  # <2>
        print(f"▶ {action.name} reads={action.reads}")

    def post_run_step(self, *, state: State, action: Action,
                      result: Optional[dict], sequence_id: int,
                      exception: Optional[Exception], **future_kwargs: Any):  # <3>
        if exception:
            print(f"✗ {action.name} raised {exception!r}")
        else:
            print(f"✓ {action.name} wrote={action.writes} seq={sequence_id}")

# register at build time: hooks see every action, no action knows they exist
app = ApplicationBuilder().with_actions(...).with_hooks(TracingHook()).build()  # <4>

- <1> A hook subclasses one or more hook protocols. PreRunStepHook + PostRunStepHook are the two you reach for most; the family also includes PostApplicationCreateHook, PreApplicationExecuteCallHook / PostApplicationExecuteCallHook (around an application-level call such as run), and PostEndStreamHook for streaming.
- <2> pre_run_step fires before each action. Always accept **future_kwargs: Burr adds keyword arguments over time, and this keeps your hook forward-compatible.
- <3> post_run_step fires after each action and receives the result, the sequence_id (the snapshot number you’d fork from), and any exception (None when the action succeeded). This is exactly where a cost meter, an OpenTelemetry span, or a Datadog metric goes.
- <4> with_hooks(...) registers them. The actions stay pure; the cross-cutting concern lives in one place. This is the idiomatic way to bolt observability vendors onto Burr without editing a single action body.
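The same pre/post pairing is what a latency or cost meter hangs on. The sketch below is framework-free on purpose: LatencyMeter only mirrors the method names of the pre/post step pair and does not subclass Burr’s hook classes, and run_with_hooks is a toy driver standing in for the framework. Both are assumptions for illustration.

```python
import time
from collections import defaultdict


class LatencyMeter:
    """Hypothetical hook: mirrors the pre/post step pair without subclassing Burr's hooks."""

    def __init__(self):
        self.totals = defaultdict(float)  # action name -> accumulated seconds
        self.calls = defaultdict(int)     # action name -> invocation count
        self._started = {}

    def pre_run_step(self, *, action_name: str, **future_kwargs):
        self._started[action_name] = time.perf_counter()

    def post_run_step(self, *, action_name: str, exception=None, **future_kwargs):
        elapsed = time.perf_counter() - self._started.pop(action_name)
        self.totals[action_name] += elapsed
        self.calls[action_name] += 1


def run_with_hooks(steps, hooks):
    """Toy driver standing in for the framework: wraps each step with every hook."""
    for name, fn in steps:
        for hook in hooks:
            hook.pre_run_step(action_name=name)
        error = None
        try:
            fn()
        except Exception as exc:  # report to the hooks first, then re-raise
            error = exc
        for hook in hooks:
            hook.post_run_step(action_name=name, exception=error)
        if error:
            raise error


meter = LatencyMeter()
steps = [("decide", lambda: None), ("search", lambda: None), ("decide", lambda: None)]
run_with_hooks(steps, [meter])
print(dict(meter.calls))
```

None of the step functions know the meter exists, which is the property the hook mechanism is there to preserve.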
7.11.2 Streaming: token-by-token without losing typed state
The worked example returned the answer all at once. For a chat UI you want tokens as they arrive — but you still want the final, validated state snapshot for persistence and replay. Burr’s streaming actions give you both: yield partial chunks during generation, then a final yield carries the state update.
from typing import Generator
from burr.core import State
from burr.core.action import StreamingAction

class StreamingAnswer(StreamingAction):
    def stream_run(self, state: State, **kwargs) -> Generator[dict, None, None]:
        buffer = []
        # llm is the client from earlier; stream() is a placeholder for its streaming call
        for delta in llm.stream(state["query"]):  # <1>
            buffer.append(delta)
            yield {"response": delta}  # <2>
        yield {"response": "".join(buffer)}  # <3>

    @property
    def reads(self): return ["query"]

    @property
    def writes(self): return ["answer"]

    def update(self, result: dict, state: State) -> State:
        return state.update(answer=result["response"])  # <4>

- <1> stream_run is a generator. Each yield is a partial result pushed to the caller. With an async LLM client, the async counterpart is AsyncStreamingAction, whose stream_run is an async generator consumed through astream_result.
- <2> Intermediate chunks drive the UI: print them, send them over a websocket.
- <3> The last yield is the complete value that update will commit. Burr distinguishes “still streaming” from “done” by the generator finishing.
- <4> update runs once at the end and produces the immutable snapshot, so streaming output and the replayable state contract coexist. The typed sibling is @streaming_action.pydantic (with state_input_type, state_output_type, stream_type), which can stream partial Pydantic objects, for example via Instructor.
Consume it with the stream_result variant of run (the third sibling alongside run and iterate mentioned in Section 7.9.1):
action, container = app.stream_result(halt_after=["generate"], inputs={"query": q})
for partial in container:  # tokens as they arrive
    print(partial["response"], end="", flush=True)
result, state = container.get()  # final result and committed state once the stream closes
7.11.3 Human-in-the-loop: halt, inspect, resume
HITL in Burr is not a hook — it’s the halting API plus the persister. You tell run to stop before a side-effecting action, inspect the proposed action, get a human decision, then resume the same application past the gate. Because the persister already snapshotted the state, the human can approve an hour later from a different process.
# 1. Stop BEFORE the risky step: nothing irreversible has happened yet
action, result, state = app.run(halt_before=["execute_trade"])  # <1>

# 2. State is fully serialized by the persister, so show the human what's pending
proposed = state.data.proposed_trade
if get_human_approval(proposed):  # <2>
    # 3. Resume the SAME app, now allowed past the gate, feeding the decision in
    action, result, state = app.run(halt_after=["execute_trade"],
                                    inputs={"approved": True})  # <3>

- <1> halt_before=["execute_trade"] runs the loop right up to (but not into) the named action, and returns control. halt_after is its sibling for “stop once this completes.”
- <2> The gate is your code, not Burr’s. Burr’s job is to pause cleanly with the full state available; the approval channel (a web form, a Slack button, a CLI prompt) is yours.
- <3> Resuming is just another run with inputs supplying the human’s answer. With persistence on, this can be a brand-new process: load the app from the snapshot (initialize_from, Section 7.9.4) and continue. That’s what makes Burr HITL work for a deployed web service, not just a notebook.
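The pause-then-resume shape is easier to trust once you see it without a framework. In this sketch a JSON file plays the role of the persister, and run_until_gate / resume are hypothetical helpers written for this example; the point is only that the second phase needs nothing from the first except the stored snapshot.

```python
import json
import os
import tempfile

# A JSON file stands in for the persister's snapshot storage.
SNAPSHOT = os.path.join(tempfile.mkdtemp(), "pending_trade.json")


def run_until_gate(state: dict) -> dict:
    """Phase 1: do the reversible work, then stop before the risky step."""
    state = {**state, "proposed_trade": {"symbol": "ACME", "qty": 10}}
    with open(SNAPSHOT, "w") as fh:
        json.dump(state, fh)  # what a persister would do after each action
    return state


def resume(approved: bool) -> dict:
    """Phase 2: possibly a different process, possibly hours later."""
    with open(SNAPSHOT) as fh:
        state = json.load(fh)  # everything needed comes from the snapshot
    status = "executed" if approved else "rejected"
    return {**state, "status": status}


run_until_gate({"user": "user-42"})
final = resume(approved=True)
print(final["status"], final["proposed_trade"])
```

Nothing in resume depends on in-memory objects from run_until_gate, which is why the approval can arrive from another process entirely.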
7.11.4 Observability: a built-in UI, plus OpenTelemetry
This is Burr’s clearest advantage, and the row most worth getting right in the comparison below. Burr ships its own open-source tracking server — you do not wire up a third-party SaaS to see traces.
app = (
    ApplicationBuilder()
    .with_actions(...)
    .with_tracker("local", project="qa-agent")  # <1>
    .build()
)
# then, in a terminal: `burr` → opens the UI at http://localhost:7241

- <1> with_tracker("local", project=...) streams every run, action, and state snapshot to the local tracking store. The bundled burr CLI launches the dashboard: the same graph view, run timeline, state diff, and fork button from Section 7.9.5, now fed live.
On top of the built-in UI, Burr has two OpenTelemetry integrations: it can (1) export its own action-level traces to any OTel backend, and (2) capture OTel spans created inside an action — e.g. by an auto-instrumented LLM SDK — and nest them under that step. The result is one timeline that shows the agent’s decisions and the HTTP calls underneath them. By contrast, LangGraph’s first-party tracing UI is LangSmith, a commercial hosted service (it also speaks OTel); Burr’s equivalent is OSS and self-hosted.
7.11.5 Testing: fixtures captured from real runs
Because every action is pure — State in, State out — you can unit-test one in isolation without standing up the whole graph. Burr ships a pytest helper that parametrizes a test from a JSON fixture of expected input/output state (which you can capture straight from a real run via the tracker):
import pytest
from our_agent import decide
from burr.core import state
from burr.testing import pytest_generate_tests  # <1>

@pytest.mark.file_name("decide_search.json")  # <2>
def test_decide_routes_to_search(input_state, expected_state):
    in_state = state.State.deserialize(input_state)
    out_state = decide(in_state, llm=fake_llm)  # <3>
    # exact match, or fuzzy/LLM-graded for non-deterministic fields
    assert out_state["next_action"] == expected_state["next_action"]

- <1> Importing pytest_generate_tests activates Burr’s file-based parametrization.
- <2> @pytest.mark.file_name(...) points at a fixture holding input_state and expected_state, generated by serializing a captured snapshot, so your tests exercise real production states.
- <3> Call the action directly with a stub LLM. No builder, no transitions, no I/O: the reads/writes contract is what makes this isolation sound.
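The fixture-driven idea itself needs no framework. Below is a stdlib-only sketch under stated assumptions: the fixture layout (a list of input_state / expected_state pairs) is inlined and simplified for illustration rather than copied from Burr’s file format, and decide_stub is a deterministic stand-in for the LLM-backed action.

```python
import json

# Hypothetical fixture, inlined here; in practice it would live in a JSON file
# captured from real runs.
FIXTURE = json.loads("""
[
  {"input_state":    {"query": "Who won?", "context": []},
   "expected_state": {"next_action": "search"}},
  {"input_state":    {"query": "Who won?", "context": [{"term": "t", "result": "r"}]},
   "expected_state": {"next_action": "answer"}}
]
""")


def decide_stub(state: dict) -> dict:
    """Deterministic stand-in for the LLM-backed decide action."""
    action = "answer" if state["context"] else "search"
    return {**state, "next_action": action}


def check_fixture(action, cases) -> int:
    """Run every case; compare only the fields named in expected_state."""
    for case in cases:
        out = action(case["input_state"])
        for key, expected in case["expected_state"].items():
            assert out[key] == expected, f"{key}: {out[key]!r} != {expected!r}"
    return len(cases)


print(check_fixture(decide_stub, FIXTURE), "cases passed")
```

Comparing only the fields listed in expected_state keeps the test focused on routing, so non-deterministic fields can be left out of the fixture or graded separately.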
7.11.6 Parallelism, briefly
When one step must fan out — summarize ten documents, query five tools, evaluate three candidate answers — Burr’s parallelism module spawns sub-applications and gathers their states (map-reduce over state). It’s the recursive case of the same model: an action whose body is itself a set of Burr runs. We don’t use it for the Ralph loop (sequential by nature), but it’s the right tool for batch agents.
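As a rough picture of map-reduce over state, here is a stdlib sketch. It uses a thread pool and plain dicts and does not touch Burr’s parallelism module, whose API is not shown in this chapter; summarize and fan_out are hypothetical names for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize(doc: str) -> dict:
    """Sub-run stand-in: takes one document, returns its own small state."""
    return {"doc": doc, "words": len(doc.split())}


def fan_out(state: dict) -> dict:
    """Map: one sub-run per document. Reduce: merge results into a new parent state."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map preserves input order, so results line up with documents
        sub_states = list(pool.map(summarize, state["documents"]))
    return {**state, "word_counts": [s["words"] for s in sub_states]}


parent = {"documents": ["Burr models agents as state machines.",
                        "LangGraph models agents as graphs."]}
result = fan_out(parent)
print(result["word_counts"])
```

The parent state is never mutated: fan_out returns a new dict, so the step still yields exactly one snapshot, the same as any other action.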
7.12 Burr vs LangGraph
Burr’s closest neighbor is LangGraph — the other open-source framework that models an agent as a stateful graph you can persist, replay, and inspect. If you’re choosing between them, the differences are real but narrower than the marketing suggests: both express a state machine over an LLM loop, and both support persistence, streaming, human-in-the-loop, and replay. The split is about defaults, ergonomics, and ecosystem, not capability checkboxes.
| Dimension | Burr | LangGraph |
|---|---|---|
| Programming model | Explicit FSM: @actions with declared reads/writes; edges are (from, to, Condition) tuples | Graph: nodes are functions; edges (incl. conditional) wire them, routing returns the next node |
| State | Immutable State; one snapshot per step, automatic | Typed channels (TypedDict) with reducers you define for merge semantics |
| Observability | Built-in OSS tracking UI + OpenTelemetry, self-hosted | LangSmith (commercial SaaS) or OpenTelemetry |
| Replay / time-travel | fork_from_sequence_id: fork from any snapshot | Checkpointer + update_state time-travel (both support it) |
| Human-in-the-loop | halt_before/halt_after + inputs + persister | interrupt() / Command(resume=...) |
| Streaming | Native (StreamingAction, stream_result) | Native (stream/astream, multiple modes) |
| Testing | burr.testing pytest fixtures from captured runs | Standard Python: invoke nodes directly; no dedicated helpers |
| Learning curve | Gentle — plain functions, explicit transitions | More moving parts — channels, reducers, the LangChain model |
| Ecosystem | DAGWorks; sibling Hamilton for data pipelines; Apache incubating | LangChain ecosystem — LangSmith, LangServe, large integration surface |
| GitHub stars (early 2026) | ~2k | ~34k |
Star counts are a snapshot (Burr ~2k, LangGraph ~34k as of early 2026) and shift monthly — treat them as “small vs large community,” not a scoreboard. Two corrections to the comparison you’ll often see online: LangGraph does have replay (checkpointer time-travel), so it’s not a Burr exclusive; and Burr’s human-in-the-loop is the halt_before + persister mechanism above, not lifecycle hooks. Hooks are for cross-cutting concerns; halting is for HITL.
The honest summary: pick LangGraph if you already live in the LangChain ecosystem — the integration surface, LangSmith, and a 34k-vs-2k community gap are decisive when you need a connector that already exists or a teammate who already knows the model. Pick Burr when self-hosted observability and replay are first-order requirements — the built-in UI and fork_from_sequence_id are more polished out of the box, and the actions-and-transitions model has fewer concepts to onboard a team into. For this book’s purpose — a Ralph harness you need to debug when iteration 137 edits the wrong file — Burr’s built-in tracing and fork-from-state are exactly why we reach for it. Neither is wrong; they optimize for different first problems, and porting the core FSM between them is a day’s work because the surrounding ecosystem, not the loop, is the real lock-in.
7.13 Forward Link: From Typed State to the Ralph Loop
The next chapters introduce more frameworks and then Ralph — Geoffrey Huntley’s minimal autonomous coding loop: load spec, select task, execute, observe, repeat. Burr and the frameworks in Chapters 8–9 are different layers of the same stack. They provide the state machine; Ralph provides the loop shape that runs on it.
Two connections worth flagging now:
- Burr’s @action.pydantic is what makes a Ralph harness inspectable. When the loop runs for 200 iterations and one of them edits the wrong file, you want to know which action wrote which field. Typed reads/writes are not optional at that scale.
- The Evaluator-Optimizer pattern is the Ralph loop in disguise. Generate a candidate change; evaluate against the spec/tests; iterate. Section 10.2 will show that the “spec → execute → test → re-evaluate” loop is structurally the same pattern, just with code edits as the action space.
The book’s path is: this chapter picks the state machine, Chapter 8 covers more production options, the Ralph loop in Chapter 10 wraps the loop around it, and Chapters 12–13 scale the loop into a fleet and a career.
7.14 Key Takeaways
- An agent framework is just Graph + Shared Store — nodes connected by action edges, communicating through a shared dictionary
- The prep → exec → post lifecycle separates I/O (shared store access) from computation (LLM calls), making nodes retriable and testable
- The agent pattern is a decision node that returns different action labels, routing the flow to different nodes (including back to itself)
- PocketFlow proves that 100 lines of framework code can implement workflows, agents, RAG, and multi-agent systems
- The framework isn’t the hard part — the design is. Deciding what nodes to build, what actions to route, and what data to share requires engineering judgment that no framework eliminates
- PocketFlow’s mutable shared dict stops being enough when multiple developers share an agent, production failures need debugging, or you need audit trails
- Burr + Pydantic is one production-grade implementation of the same model, chosen here because it makes the concepts legible and ships strong self-hosted observability. @action.pydantic, PydanticTypingSystem, and immutable state.update() give compile-time-ish guarantees on state changes; fork_from_sequence_id is the production debugger
- The Burr UI provides graph visualization, run timelines, state diffs, and a fork button: the closest thing in open-source agents to a real debugger
- Burr’s production feature set hangs off the same actions-and-transitions spine: lifecycle hooks for cross-cutting concerns, streaming actions that keep typed state intact, halt_before + persister for human-in-the-loop, a built-in OSS tracking UI plus OpenTelemetry, and pytest fixtures for testing pure actions in isolation
- Burr vs LangGraph comes down to defaults and ecosystem, not capability: LangGraph wins on community size (~34k vs ~2k stars) and the LangChain integration surface; Burr wins on self-hosted observability and fork-from-state replay out of the box
- Chapter 8 covers Agno (batteries-included multi-agent platform) and mcp-agent (MCP-native pattern composition) — two more frameworks that solve adjacent production pressures
7.15 Concept Map
flowchart TD
GS["Graph + Shared Store"] --> PF["PocketFlow (100 lines)"]
PF --> W["Workflow pattern"]
PF --> AG["Agent pattern"]
PF --> RAG["RAG pattern"]
PF --> D["Design is the hard part"]
PF --> PR["Production pressures"]
PR --> T["Typed state contracts"]
PR --> R["Replayable state"]
PR --> A["Audit trails"]
T --> B["Burr + Pydantic"]
R --> B
A --> B
B --> UI["Burr UI (debugger)"]
B --> FS["Fork-from-state"]
B --> H["Hamilton (RAG pipelines)"]
B --> RL["Ralph loop (Ch 10)"]
PR --> CH8["Agno + mcp-agent (Ch 8)"]
PR --> CH9["Mastra (Ch 9)"]
style GS fill:#fef3c7,stroke:#92400e,color:#92400e
style PF fill:#fef3c7,stroke:#92400e,color:#92400e
style B fill:#dbeafe,stroke:#1e40af,color:#1e40af
style RL fill:#fce7f3,stroke:#9d174d,color:#9d174d
style CH8 fill:#dcfce7,stroke:#166534,color:#166534
style CH9 fill:#e0f2fe,stroke:#0369a1,color:#0369a1