9 Mastra: TypeScript Agents, End to End

One Framework, Every Primitive — From Tools to Deployment

Author

AI-Powered SE Tutorial

Published

June 21, 2026

Abstract

The Python chapters built a progression: PocketFlow showed the machinery naked, Burr added typed state and replay, Agno shipped a batteries-included platform, and mcp-agent treated MCP as the substrate. But the TypeScript ecosystem has its own answer — and it ships as a single framework. Mastra gives you agents, tools, workflows, memory, RAG, MCP integration, observability, and deployment in one package. This chapter is the TypeScript counterpart to Chapters 7 and 8: it covers the same concepts (agents, tools, state, memory, human-in-the-loop, multi-agent, observability) but through the lens of a framework designed for the Node.js runtime and the npm ecosystem. If you build in TypeScript, this is where the book’s agent patterns become native.

9.1 Why TypeScript for Agents?

Chapters 7 and 8 covered the agent landscape in Python — the natural home for ML pipelines and LLM client libraries. But a large class of agent deployments lives in a different world:

Full-stack web applications. Your backend is Next.js or Hono. Your API routes are TypeScript. Adding a Python agent means running a separate process, serializing data across the boundary, and maintaining two dependency trees.
Edge and serverless. Cloudflare Workers, Vercel Edge Functions, and Deno Deploy run JavaScript/TypeScript natively. Python cold-start overhead is a problem; TypeScript cold-start is measured in single-digit milliseconds.
The npm ecosystem. MCP servers, database clients, payment SDKs, CMS integrations — the JavaScript ecosystem has packages for everything your agent’s tools need to call. Using a Python agent means wrapping those libraries or duplicating them.
Team composition. Many product engineering teams are TypeScript-primary. Asking them to context-switch to Python for one subsystem increases the bus factor and slows iteration.

None of this makes Python wrong. It makes TypeScript also right — and for certain deployment shapes, it’s the shorter path.

Note

The concepts in this chapter — agents, tools, workflows, memory, human-in-the-loop — are the same ones from Chapters 7 and 8. If you’ve read those chapters, you already know what these things are. This chapter focuses on how Mastra implements them and where the TypeScript idioms diverge from the Python ones.

9.2 The Mastra Model: Everything Ships Together

The Python agent landscape asks you to compose: pick a graph library (PocketFlow, LangGraph, Burr), pick a memory backend, pick an MCP client, pick an observability layer, wire them together. Each choice is a dependency, a version matrix, and a configuration surface.

Mastra takes the opposite approach. It ships one npm package (@mastra/core) that includes agents, tools, workflows, memory, RAG, MCP client/server, structured output, observability, and a visual development studio. You npm install one thing and get the full stack.

flowchart TD
    M["@mastra/core"] --> AG["Agents"]
    M --> T["Tools"]
    M --> W["Workflows"]
    M --> ME["Memory"]
    M --> R["RAG"]
    M --> MC["MCP Client/Server"]
    M --> O["Observability"]
    M --> S["Studio"]
    M --> D["Deployment"]
    AG --> T
    AG --> ME
    AG --> MC
    W --> AG
    W --> T
    R --> ME
    O --> AG
    O --> W
    style M fill:#dbeafe,stroke:#1e40af,color:#1e40af
    style AG fill:#dcfce7,stroke:#166534,color:#166534
    style T fill:#fef3c7,stroke:#92400e,color:#92400e
    style W fill:#f3e8ff,stroke:#6b21a8,color:#6b21a8
    style ME fill:#fce7f3,stroke:#9d174d,color:#9d174d
    style R fill:#fce7f3,stroke:#9d174d,color:#9d174d
    style MC fill:#dcfce7,stroke:#166534,color:#166534
    style O fill:#fef3c7,stroke:#92400e,color:#92400e
    style S fill:#dbeafe,stroke:#1e40af,color:#1e40af
    style D fill:#f3e8ff,stroke:#6b21a8,color:#6b21a8

Mastra’s integrated architecture — one framework, every primitive

This is the same philosophical choice as Agno (Section 8.2) — “batteries included” — but in a different language ecosystem with different deployment targets. The tradeoff is the standard one: you gain integration coherence, you lose mix-and-match flexibility. Mastra’s bet is that the integration coherence wins, especially for teams that want to go from “hello world” to “production deployment” without assembling a bespoke stack.

Who uses Mastra in production?

Replit, Fireworks, Medusa, SoftBank, and others run Mastra agents in production. The framework is open-source (MIT license), actively maintained, and backed by a company (Mastra Inc.) that also offers a hosted platform. It’s not a weekend project — it’s infrastructure that companies ship on.

9.3 Agents

An agent in Mastra is an autonomous unit: an LLM, a set of tools, instructions, and optionally memory. Creating one is a single constructor call:

import { Agent } from '@mastra/core/agent'

export const researchAgent = new Agent({
1  id: 'research-agent',
  name: 'Research Agent',
  instructions: `You are a research assistant. When asked a question,
    search for relevant information and synthesize a clear answer.
2    Always cite your sources.`,
3  model: 'openai/gpt-5.4',
})

1: id is the machine-readable identifier — used in logs, traces, and the studio. Keep it kebab-case and unique within your project.
2: instructions is the system prompt. This is where you encode the agent’s personality, constraints, and task scope. The same design principles from Chapter 3 apply: be specific, set boundaries, define the output format.
3: model uses the provider/model string format. Mastra supports OpenAI, Anthropic, Google, Groq, Fireworks, and others through a unified model router.

9.3.1 Generate vs. Stream

Agents expose two execution modes:

// Complete response — waits for the full output
const result = await researchAgent.generate(
1  'What is context engineering?'
)
2console.log(result.text)

// Token-by-token streaming
const stream = await researchAgent.stream(
3  'What is context engineering?'
)
for await (const chunk of stream.textStream) {
4  process.stdout.write(chunk)
}

1: .generate() sends the prompt and waits for the complete response. Use this for backend processing where latency to first token doesn’t matter.
2: result.text is the complete response string. For structured output, use result.object (see Section 9.11).
3: .stream() returns an async iterable. Use this for chat UIs where you want to show tokens as they arrive.
4: Each chunk is a string fragment. The stream also exposes fullStream for structured events (tool calls, step completions, etc.).

Model router — switching models without code changes

Mastra’s model field accepts a string like 'anthropic/claude-sonnet-4-6' or 'google/gemini-2.5-flash'. At runtime, Mastra routes to the correct provider SDK. This means you can swap models by changing one string — no import changes, no API client rewiring. The same agent definition works across providers, which is useful for cost optimization (use a fast model for triage, a strong model for synthesis) and for A/B testing model versions.

9.4 Tools

Tools are the bridge between the LLM and the outside world. Mastra creates them with createTool(), using Zod schemas for type-safe input and output:

import { createTool } from '@mastra/core/tools'
import { z } from 'zod'

const weatherTool = createTool({
1  id: 'weather-tool',
2  description: 'Get current weather for a city',
3  inputSchema: z.object({
    city: z.string().describe('City name'),
    units: z.enum(['celsius', 'fahrenheit']).default('celsius'),
  }),
4  outputSchema: z.object({
    temperature: z.number(),
    condition: z.string(),
    humidity: z.number(),
  }),
5  execute: async ({ inputData }) => {
    const data = await fetch(`https://api.weather.com/${inputData.city}`)
    return {
      temperature: data.temp,
      condition: data.condition,
      humidity: data.humidity,
    }
  },
})

1: id uniquely identifies the tool. The LLM sees this as the function name.
2: description is what the LLM reads to decide whether to call this tool. Make it precise — a vague description leads to incorrect tool selection.
3: inputSchema defines what the LLM must provide. Zod schemas give you validation at the boundary — if the LLM produces an invalid city name, the error is caught before execute() runs.
4: outputSchema defines what the tool returns. This is optional but valuable: it documents the contract, and Mastra can validate the output before handing it back to the LLM.
5: execute is the implementation. It receives { inputData } (the validated input) and returns the output. This is where your business logic lives.

9.4.1 Attaching tools to agents

Tools become available to an agent through the tools property:

const agent = new Agent({
  id: 'weather-agent',
  name: 'Weather Agent',
  instructions: 'You help users check the weather.',
  model: 'anthropic/claude-sonnet-4-6',
1  tools: { weatherTool },
})

2const result = await agent.generate('What is the weather in Tokyo?')

1: tools: { weatherTool } — pass tools as an object. The property name becomes the tool name the LLM sees. You can attach multiple tools: tools: { weatherTool, stockTool, calendarTool }.
2: When the agent processes this prompt, it will recognize that it needs weather data, call weatherTool with { city: 'Tokyo' }, receive the result, and synthesize a natural-language response.

This is the same pattern as Chapter 5’s SDK tool use, but with Zod schemas replacing JSON Schema definitions and TypeScript type inference replacing manual type assertions.

9.4.2 Advanced tool features

Two features worth knowing about for production tools:

toModelOutput — reshapes what the LLM sees. If your API returns a large payload but the LLM only needs a subset, toModelOutput filters it before the response enters the context window:

const searchTool = createTool({
  id: 'search',
  description: 'Search the web',
  inputSchema: z.object({ query: z.string() }),
  outputSchema: z.object({ results: z.array(z.object({
    title: z.string(), url: z.string(), snippet: z.string()
  })) }),
  execute: async ({ inputData }) => {
1    return await searchAPI(inputData.query)
  },
2  toModelOutput: ({ result }) => {
    return result.results.map(r => `${r.title}: ${r.snippet}`).join('\n')
  },
})

1: execute returns the full structured result (for your code to use).
2: toModelOutput returns a condensed string (for the LLM to read). This keeps the context window lean while preserving the full data for downstream processing.

transform — modifies the payload before or after tool execution, useful for authentication injection, rate limiting, or logging.

9.5 MCP Integration

The Model Context Protocol (MCP) is the interoperability layer from Chapter 4. Mastra integrates with MCP from both sides: as a client that consumes MCP servers, and as a server that exposes Mastra primitives to other MCP clients.

9.5.1 MCPClient — consuming MCP servers

import { MCPClient } from '@mastra/mcp'
import { Agent } from '@mastra/core/agent'

const mcp = new MCPClient({
  id: 'mcp-client',
  servers: {
1    wikipedia: {
      command: 'npx',
      args: ['-y', 'wikipedia-mcp'],
    },
2    weather: {
      url: new URL('https://server.smithery.ai/weather-api/mcp'),
    },
  },
})

const agent = new Agent({
  id: 'research-agent',
  name: 'Research Agent',
  instructions: 'Use Wikipedia and weather data to answer questions.',
  model: 'anthropic/claude-sonnet-4-6',
3  tools: await mcp.listTools(),
})

1: Stdio server. The command + args pattern starts a local process. Mastra manages the process lifecycle — spawning, health checks, cleanup.
2: HTTP server. The url pattern connects to a remote MCP server over HTTP. No process management needed; Mastra handles the protocol.
3: mcp.listTools() discovers all tools from all configured servers and returns them in the format Mastra agents expect. The agent treats MCP tools identically to native createTool() tools — no code-level distinction.

Note

This is the same client-server model from Chapter 4, but Mastra handles the transport negotiation, capability exchange, and tool discovery automatically. You don’t write the MCP plumbing — you declare the servers and Mastra wires them.

9.5.2 Static vs. dynamic tool loading

By default, mcp.listTools() fetches tools once at startup (static loading). For long-running agents that need to pick up new tools without restarting:

const mcp = new MCPClient({
  id: 'dynamic-client',
  servers: {
    tools_server: {
      url: new URL('https://my-tools.example.com/mcp'),
    },
  },
})

// Dynamic: re-fetch tools on each agent invocation
const agent = new Agent({
  id: 'dynamic-agent',
1  tools: async () => await mcp.listTools(),
})

1: Passing a function instead of an object makes tool loading dynamic. Each .generate() or .stream() call re-discovers available tools. Useful when MCP servers add/remove tools at runtime.

9.5.3 Tool approval — human-in-the-loop for tool calls

Not every tool call should execute automatically. For sensitive operations (database writes, financial transactions, external API mutations), Mastra’s first-class HITL story is the workflow suspend/resume pattern — wrap the sensitive tool call as a workflow step, suspend before executing it, and resume only after a human approves. The full mechanism is covered in Section 9.7; the shape:

const sensitiveOp = createStep({
  id: 'delete-user',
  inputSchema: z.object({ userId: z.number() }),
  resumeSchema: z.object({ approved: z.boolean() }),
  outputSchema: z.object({ deleted: z.boolean() }),
  execute: async ({ inputData, resumeData, suspend }) => {
    if (!resumeData) {
      // First pass: pause and wait for human.
      await suspend({ pendingAction: `delete user ${inputData.userId}` })
      return { deleted: false }
    }
    if (!resumeData.approved) return { deleted: false }
    await db.users.delete(inputData.userId)
    return { deleted: true }
  },
})

The advantage over a per-tool approval flag is composability: the same suspend/resume primitive handles approvals, multi-step human input, and long-running external callbacks, and survives process restarts because the workflow state is checkpointed. For agent-driven flows where each tool is a candidate for review, expose the workflow as a tool and let the agent invoke it; the approval is enforced at the workflow boundary, not the agent boundary.

9.5.4 MCPServer — exposing Mastra as an MCP server

The other direction: making your Mastra agents, tools, and workflows available to any MCP client (Claude Desktop, Cursor, other agents):

import { MCPServer } from '@mastra/mcp'

const server = new MCPServer({
  id: 'my-server',
  name: 'My Mastra Server',
1  version: '1.0.0',
2  agents: { researchAgent },
3  tools: { weatherTool },
4  workflows: { approvalWorkflow },
})

5await server.startStdio()
// Or, to serve over HTTP/SSE:
// await server.startHonoSSE({ port: 4111 })

1: version is required by the MCP protocol — pick semver for your server.
2: Agents become MCP tools. Each agent is exposed as a tool that other MCP clients can invoke. The agent’s instructions and capabilities are described in the tool’s MCP metadata.
3: Native tools pass through. createTool() tools are exposed with their Zod schemas translated to JSON Schema for MCP compatibility.
4: Workflows become tools. Each workflow’s input schema becomes the tool’s input; the workflow’s output becomes the tool’s response.
5: Transport-specific start methods. There is no generic server.start() — pick startStdio() for local clients (Claude Desktop), startHonoSSE() or startHTTP() for network clients. Choose by where your consumers live.

MCP registries — discovering tools at scale

Mastra supports MCP registries — curated directories of MCP servers that your agents can discover tools from. Supported registries include Klavis, mcp.run, Composio, Smithery, Apify, and Ampersand. Instead of hardcoding server URLs, you can point your agent at a registry and let it discover relevant tools dynamically. This is the MCP equivalent of a package manager — your agent npm installs capabilities at runtime.

9.6 Workflows

Agents are autonomous — you give them a goal and they figure out the steps. Workflows are deterministic — you define the steps and the framework executes them in order. When you need guaranteed execution order, explicit error handling, or auditability, workflows are the right primitive.

9.6.1 Steps and the builder pattern

import { createWorkflow, createStep } from '@mastra/core/workflows'
import { z } from 'zod'

1const extractStep = createStep({
  id: 'extract',
  inputSchema: z.object({ url: z.string() }),
  outputSchema: z.object({ text: z.string(), title: z.string() }),
  execute: async ({ inputData }) => {
    const page = await fetch(inputData.url)
    return { text: await page.text(), title: 'Extracted' }
  },
})

const summarizeStep = createStep({
  id: 'summarize',
  inputSchema: z.object({ text: z.string(), title: z.string() }),
  outputSchema: z.object({ summary: z.string() }),
  execute: async ({ inputData }) => {
    const summary = await llm.generate(`Summarize: ${inputData.text}`)
    return { summary }
  },
})

2const pipeline = createWorkflow({
  id: 'extract-and-summarize',
  inputSchema: z.object({ url: z.string() }),
  outputSchema: z.object({ summary: z.string() }),
})
3  .then(extractStep)
4  .then(summarizeStep)
5  .commit()

1: createStep defines a single unit of work. Each step has typed input and output schemas — the framework validates data at every boundary.
2: createWorkflow starts the builder. The workflow’s inputSchema is what you pass to run.start(); the outputSchema is what comes back.
3: .then(extractStep) — sequential execution. The output of the workflow input feeds into extractStep.
4: .then(summarizeStep) — the output of extractStep feeds into summarizeStep. Type compatibility is checked at build time.
5: .commit() finalizes the workflow definition. After this, the workflow is immutable and ready to execute.

9.6.2 Control flow

Mastra workflows support rich control flow beyond sequential .then():

// Parallel execution — fan-out, fan-in
const pipeline = createWorkflow({ id: 'parallel-example', ... })
1  .parallel([fetchFromDB, fetchFromAPI, fetchFromCache])
2  .then(mergeResults)
  .commit()

// Conditional branching
const pipeline = createWorkflow({ id: 'branch-example', ... })
3  .branch([
4    [isUrgent, handleUrgent],
    [isRoutine, handleRoutine],
5    [fallback, handleDefault],
  ])
  .commit()

// Iteration with concurrency control
const pipeline = createWorkflow({ id: 'foreach-example', ... })
6  .foreach(processItem, { concurrency: 5 })
  .commit()

// Loops
const pipeline = createWorkflow({ id: 'loop-example', ... })
7  .dountil(refineStep, checkQuality)
  .commit()

1: .parallel([...]) runs multiple steps concurrently. All must complete before the workflow proceeds (fan-out/fan-in).
2: After .parallel(), the next step receives an array of results — one per parallel branch.
3: .branch([...]) evaluates conditions in order and routes to the first matching handler.
4: Each branch is a [condition, step] pair. The condition is a function that receives the current data and returns a boolean.
5: Fallback branch — the last condition can be a catch-all.
6: .foreach(step, { concurrency }) iterates over an array, running step for each element with configurable parallelism.
7: .dountil(step, condition) runs step repeatedly until condition returns true. There’s also .dowhile(step, condition) for the inverse.

9.6.3 Shared state

When steps need to communicate data that doesn’t fit the input/output chain — accumulated context, running totals, configuration — use stateSchema:

const workflow = createWorkflow({
  id: 'stateful-workflow',
  inputSchema: z.object({ query: z.string() }),
  outputSchema: z.object({ answer: z.string() }),
1  stateSchema: z.object({
    searchResults: z.array(z.string()).default([]),
    attempts: z.number().default(0),
  }),
})

const searchStep = createStep({
  id: 'search',
  inputSchema: z.object({ query: z.string() }),
  outputSchema: z.object({ results: z.array(z.string()) }),
2  execute: async ({ inputData, state, setState }) => {
    const results = await search(inputData.query)
3    setState({
      ...state,
      searchResults: [...state.searchResults, ...results],
      attempts: state.attempts + 1,
    })
    return { results }
  },
})

1: stateSchema declares the shape of shared workflow state. Zod validates it — a step can’t write a string where a number is expected.
2: state is a direct parameter, setState is the writer. Both arrive on the execute context. (Despite what an LLM might guess, there is no getState() function.)
3: State updates are validated and replace the whole object. Spread the current state and override the fields you’re changing. If you try to set attempts to "three", Zod catches it at runtime.

Mastra workflows vs. PocketFlow’s shared store

PocketFlow’s shared store is an untyped dict — any node can read or write any key, and the schema is implicit. Mastra’s stateSchema is the typed equivalent: Zod validates every read and write. The tradeoff is the same one Burr’s PydanticTypingSystem makes — you trade flexibility for safety. The difference: Mastra validates at runtime (JavaScript is not statically typed the way Pydantic is), but the Zod schemas give you IDE autocompletion and schema-based documentation.

9.6.4 Execution and results

const mastra = new Mastra({ workflows: { pipeline } })
const run = mastra.getWorkflow('extract-and-summarize')
1  .createRun()

const result = await run.start({
2  inputData: { url: 'https://example.com/article' },
})

3if (result.status === 'success') {
  console.log(result.result.summary)
} else if (result.status === 'failed') {
  console.error(result.error)
4} else if (result.status === 'suspended') {
  // Handle human-in-the-loop (see next section)
}

1: createRun() instantiates a workflow execution. Each run has its own state and lifecycle.
2: start({ inputData }) kicks off execution with validated input.
3: Results are discriminated unions. The status field tells you which shape the result has — 'success', 'failed', 'suspended', or 'tripwire'. Pattern-match on it instead of try-catch.
4: 'suspended' means the workflow hit a human-in-the-loop point and is waiting for input (see Section 9.7).

9.7 Suspension and Human-in-the-Loop

Real-world workflows often can’t run to completion without human input. An approval step, a content review, a budget sign-off — these are points where the workflow must pause, notify a human, wait for their response, and resume. Mastra makes this a first-class primitive.

9.7.1 The suspend/resume pattern

const approvalStep = createStep({
  id: 'approval',
  inputSchema: z.object({ proposal: z.string(), cost: z.number() }),
  outputSchema: z.object({ decision: z.string() }),
1  resumeSchema: z.object({
    approved: z.boolean(),
    comment: z.string().optional(),
  }),
  execute: async ({ inputData, resumeData, suspend }) => {
2    if (!resumeData) {
3      await suspend({
        reason: 'Awaiting manager approval',
        proposal: inputData.proposal,
        estimatedCost: inputData.cost,
      })
    }

4    if (resumeData.approved) {
      return { decision: `Approved: ${resumeData.comment || 'no comment'}` }
    } else {
      return { decision: `Rejected: ${resumeData.comment}` }
    }
  },
})

1: resumeSchema defines what the human provides when resuming. Zod validates the human’s input — if they forget to include approved, the framework catches it.
2: First execution — resumeData is undefined, so the step hasn’t been resumed yet.
3: suspend() pauses the workflow and stores the suspension context. The object you pass is available to whatever system notifies the human (a Slack message, an email, a dashboard).
4: After resumption — resumeData is populated with the human’s validated response. The step runs again from the top, but this time it has the data it needs to produce a result.

9.7.2 Resuming a suspended workflow

// Later — after the human has made their decision:
const result = await run.resume({
1  step: 'approval',
2  resumeData: { approved: true, comment: 'Looks good, ship it.' },
})

console.log(result.status)  // 'success' (if no more suspensions)

1: step identifies which suspended step to resume. A workflow can have multiple suspension points; you resume them individually.
2: resumeData is validated against the step’s resumeSchema. If the shape is wrong, the framework throws before the step re-executes.

9.7.3 Time-based suspension

Sometimes you don’t need human input — you need to wait for time to pass. sleep and sleepUntil are workflow-builder methods, not parameters on the step’s execute context. You chain them between steps:

const pipeline = createWorkflow({
  id: 'order-poll',
  inputSchema: z.object({ orderId: z.string() }),
  outputSchema: z.object({ status: z.string() }),
})
  .then(submitOrder)
1  .sleep(60 * 5)
2  // or: .sleepUntil(new Date('2026-06-01T09:00:00Z'))
  .then(checkStatus)
  .commit()

1: .sleep(seconds) pauses the workflow for a duration. The workflow is suspended and resumed automatically after the time elapses.
2: .sleepUntil(date) pauses until a specific timestamp. Useful for scheduled operations — “process this order when the business day starts.” Both methods can also take a function that derives the duration from the previous step’s output (e.g. (prev) => prev.retryAfterSeconds).

Why suspension matters for autonomous loops

Chapter 10 introduces the Ralph loop — an autonomous coding loop that runs until a spec is satisfied. But even autonomous loops need escape hatches: a test fails in a way that suggests a design mistake (not a code mistake), a deployment requires manual approval, a cost budget is exhausted. Mastra’s suspension primitive is the mechanism for these escape hatches. The loop runs autonomously until it can’t, suspends with context, and a human picks up where the machine left off. This is the “human-in-the-loop” pattern from Chapter 4, implemented as a workflow primitive.

9.8 Memory

Memory is what separates a stateless LLM call from a conversational agent. Mastra ships a multi-layered memory system that handles the spectrum from “remember what was said five messages ago” to “recall a fact from three weeks ago.”

9.8.1 The four memory layers

flowchart LR
    MH["Message History<br/>Recent messages"] --> OM["Observational Memory<br/>Compressed observations"]
    OM --> WM["Working Memory<br/>Structured user data"]
    WM --> SR["Semantic Recall<br/>Meaning-based retrieval"]
    style MH fill:#dbeafe,stroke:#1e40af,color:#1e40af
    style OM fill:#dcfce7,stroke:#166534,color:#166534
    style WM fill:#fef3c7,stroke:#92400e,color:#92400e
    style SR fill:#f3e8ff,stroke:#6b21a8,color:#6b21a8

Mastra’s memory layers — from short-term to semantic

Message History — the most recent N messages. Configurable via lastMessages. This is the “short-term memory” that keeps the conversation coherent.
Observational Memory — when the conversation exceeds lastMessages, older messages are automatically compressed into dense “observations” — factual summaries that preserve information while using fewer tokens. The agent sees the observations alongside recent messages, giving it a compressed view of conversation history.
Working Memory — structured data about the user or session. Unlike message history (which is conversational), working memory stores explicit facts: user preferences, account details, accumulated decisions. Think of it as the agent’s notepad.
Semantic Recall — meaning-based retrieval across all past conversations. When the agent needs to remember “that thing we discussed about the deployment architecture three weeks ago,” semantic recall uses embedding similarity to find relevant past messages regardless of how long ago they occurred.

9.8.2 Configuring memory

import { Agent } from '@mastra/core/agent'
import { Memory } from '@mastra/memory'

const agent = new Agent({
  id: 'memory-agent',
  name: 'Memory Agent',
  instructions: 'You are a helpful assistant who remembers user preferences.',
  model: 'anthropic/claude-sonnet-4-6',
  memory: new Memory({
    options: {
1      lastMessages: 20,
2      observationalMemory: true,
3      semanticRecall: {
        topK: 5,
        messageRange: { before: 2, after: 2 },
      },
    },
  }),
})

1: lastMessages: 20 — keep the 20 most recent messages in full. Older messages get compressed into observations (if enabled).
2: observationalMemory: true — enable automatic compression of older messages into observations. This is opt-in because it adds an LLM call to the memory pipeline.
3: semanticRecall — retrieve the top 5 semantically similar past messages, with 2 messages of context before and after each match. This gives the agent relevant historical context without loading the entire conversation history.

9.8.3 Multi-user threads

Agents serve multiple users. Memory must be isolated per user and per conversation:

// User A, conversation 1
await agent.generate('My favorite color is blue.', {
  memory: {
1    resource: 'user-alice',
2    thread: 'thread-001',
  },
})

// User B, conversation 1 — completely isolated
await agent.generate('My favorite color is red.', {
  memory: {
    resource: 'user-bob',
    thread: 'thread-002',
  },
})

// User A, same thread — agent remembers "blue"
const result = await agent.generate('What is my favorite color?', {
  memory: { resource: 'user-alice', thread: 'thread-001' },
})
console.log(result.text)  // "Your favorite color is blue."

1: resource identifies the user (or any entity — a team, a project, an organization).
2: thread identifies the conversation within that resource. A user can have many threads; each thread has its own memory.

9.8.4 Multi-agent memory scoping

When a supervisor agent delegates to sub-agents (Section 9.10), you need to control what memory each agent sees. Mastra supports memory isolation for supervisor-to-subagent delegation — the supervisor’s memory context doesn’t leak into the sub-agent’s context unless you explicitly share it. This prevents context confusion where a writing agent accidentally “remembers” the research agent’s internal reasoning.

9.9 RAG

Retrieval-Augmented Generation follows a pipeline in Mastra: document loading, chunking, embedding, storage, and retrieval. The framework provides primitives for each stage.

9.9.1 The RAG pipeline

flowchart LR
    D["Documents<br/>Text, PDF, HTML"] --> C["Chunking<br/>Split into segments"]
    C --> E["Embedding<br/>Vector representation"]
    E --> S["Storage<br/>Vector database"]
    S --> R["Retrieval<br/>Top-K similarity"]
    R --> A["Agent<br/>Answer with context"]
    style D fill:#dbeafe,stroke:#1e40af,color:#1e40af
    style C fill:#dcfce7,stroke:#166534,color:#166534
    style E fill:#fef3c7,stroke:#92400e,color:#92400e
    style S fill:#f3e8ff,stroke:#6b21a8,color:#6b21a8
    style R fill:#fce7f3,stroke:#9d174d,color:#9d174d
    style A fill:#dcfce7,stroke:#166534,color:#166534

Mastra’s RAG pipeline — from raw documents to relevant context

import { MDocument } from '@mastra/rag'
import { PgVector } from '@mastra/pg'
0import { embedMany, embed } from 'ai'
import { openai } from '@ai-sdk/openai'

// 1. Load a document
1const doc = MDocument.fromText(rawText)

// 2. Chunk it
2const chunks = await doc.chunk({
  strategy: 'recursive',
  size: 512,
  overlap: 50,
})

// 3. Embed the chunks
3const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-small'),
  values: chunks.map((c) => c.text),
})

// 4. Store in a vector database
const vectorStore = new PgVector({ connectionString: process.env.PG_URL })
4await vectorStore.upsert({
  indexName: 'knowledge-base',
  vectors: embeddings,
  metadata: chunks.map((c) => ({ text: c.text })),
})

// 5. Retrieve at query time
const { embedding: queryVector } = await embed({
  model: openai.embedding('text-embedding-3-small'),
  value: 'What is context engineering?',
})
5const results = await vectorStore.query({
  indexName: 'knowledge-base',
  queryVector,
  topK: 5,
})

0: Embeddings come from the ai SDK, not @mastra/core. Pair it with an @ai-sdk/* provider (@ai-sdk/openai, @ai-sdk/cohere, etc.). embedMany batches; embed does a single value.
1: MDocument.fromText() — also available: fromPDF(), fromHTML(), fromMarkdown(). The document abstraction normalizes different source formats.
2: Chunking strategies — 'recursive' splits on paragraph/sentence boundaries. 'sliding-window' creates overlapping windows. size and overlap control the tradeoff between context granularity and retrieval recall.
3: Batch embedding. embedMany is more efficient than calling embed per chunk. The values array must be raw strings, so we pull .text out of each chunk.
4: Vector storage takes a single options object. upsert and query both use { indexName, ... } — not positional (name, payload) args. Mastra ships adapters for pgvector, Pinecone, Qdrant, MongoDB Atlas, and others, all behind the same options shape.
5: Query — cosine similarity retrieval. topK controls how many chunks come back. These chunks become the context the agent uses to answer.

Note

The RAG pipeline is conceptually identical to what you’d build with LangChain or LlamaIndex in Python. Mastra’s contribution is that the same framework that defines your agents and tools also defines your RAG pipeline — no glue code between separate libraries.

9.10 Multi-Agent Systems

When a single agent can’t handle a task — because it requires different expertise, different tools, or different models — you compose multiple agents. Mastra’s primary pattern is the supervisor.

9.10.1 Supervisor pattern — agent-as-tool

const researcher = new Agent({
  id: 'researcher',
  name: 'Researcher',
  instructions: 'Search for information and return structured findings.',
  model: 'anthropic/claude-sonnet-4-6',
  tools: { searchTool, wikipediaTool },
})

const writer = new Agent({
  id: 'writer',
  name: 'Writer',
  instructions: 'Write clear, concise prose from research findings.',
  model: 'anthropic/claude-sonnet-4-6',
})

const supervisor = new Agent({
  id: 'supervisor',
  name: 'Research Supervisor',
  instructions: `You coordinate research tasks. Delegate research
    to the researcher and writing to the writer. Synthesize their
    outputs into a final deliverable.`,
  model: 'anthropic/claude-sonnet-4-6',
1  agents: { researcher, writer },
})

const result = await supervisor.generate(
  'Write a 500-word brief on the state of MCP adoption in 2026.'
)

1: agents: { researcher, writer } — sub-agents are automatically converted to tools. The supervisor sees tools named agent-researcher and agent-writer and calls them like any other tool. Mastra handles the serialization, context passing, and response collection.

The supervisor pattern is the same concept as Agno’s Team from Chapter 8, but with a different mechanism. Agno routes between team members automatically; Mastra exposes sub-agents as tools and lets the supervisor LLM decide the routing. The Mastra approach gives you more control (the supervisor’s instructions define the routing logic) at the cost of more tokens (the supervisor must reason about which agent to call).

Supervisor vs. workflow for multi-agent

When should you use a supervisor agent vs. a workflow with multiple agent steps? The rule of thumb: use a supervisor when the routing between agents is dynamic (the supervisor decides who to call based on the conversation), and use a workflow when the routing is static (research always happens before writing, always). The supervisor is more flexible; the workflow is more predictable and auditable.

9.11 Structured Output

LLMs return strings. Applications need objects. Mastra bridges the gap with schema-validated structured output.

9.11.1 Basic structured output

import { z } from 'zod'

const result = await agent.generate('Plan my day.', {
  structuredOutput: {
1    schema: z.object({
      activities: z.array(z.object({
        time: z.string().describe('HH:MM format'),
        name: z.string(),
        duration: z.number().describe('minutes'),
        priority: z.enum(['high', 'medium', 'low']),
      })),
      totalHours: z.number(),
    }),
  },
})

2console.log(result.object)
// { activities: [{ time: '09:00', name: 'Deep work', duration: 120, priority: 'high' }, ...], totalHours: 8 }

1: Zod schema defines the expected output shape. Mastra translates this to the model’s native structured output format (JSON mode, function calling, etc.) depending on the provider.
2: result.object is the typed, validated output. Not a string — a real JavaScript object with the shape you specified. TypeScript infers the type from the Zod schema, so result.object.activities[0].time has type string in your IDE.

9.11.2 Streaming structured output

For large structured outputs, you can stream the object as it’s being generated:

const stream = await agent.stream('Analyze these 50 items.', {
  structuredOutput: {
    schema: z.object({
      analyses: z.array(z.object({
        item: z.string(),
        sentiment: z.enum(['positive', 'negative', 'neutral']),
        confidence: z.number(),
      })),
    }),
  },
})

1for await (const partial of stream.objectStream) {
2  console.log('Partial result:', partial)
}

3const final = await stream.finalObject

1: stream.objectStream yields partial objects as the model generates them.
2: Partial results — the array grows as the model produces more items. You can render a progress indicator or update a UI incrementally.
3: stream.finalObject — the complete, validated object after streaming finishes.

9.11.3 Error strategies

What happens when the model produces output that doesn’t match the schema?

const result = await agent.generate('...', {
  structuredOutput: {
    schema: mySchema,
1    errorStrategy: 'strict',
2    // errorStrategy: 'warn',
3    // errorStrategy: 'fallback',
  },
})

1: 'strict' — throw an error if the output doesn’t validate. Use this when correctness is non-negotiable.
2: 'warn' — return the output with a warning. Use this when partial results are better than no results.
3: 'fallback' — try a secondary model or a simpler schema before giving up. Useful for cost-optimized pipelines where you try a cheap model first.

9.11.4 Multi-step structured output with `prepareStep`

Sometimes you need the agent to use tools first and then produce structured output. The prepareStep pattern handles this:

const result = await agent.generate('What is the weather in all G7 capitals?', {
  tools: { weatherTool },
  structuredOutput: {
    schema: z.object({
      cities: z.array(z.object({
        city: z.string(),
        temperature: z.number(),
        condition: z.string(),
      })),
    }),
1    prepareStep: true,
  },
})

1: prepareStep: true — the agent first executes tool calls (fetching weather for each city), then structures the accumulated results into the schema. Without prepareStep, the agent would try to produce structured output immediately without using tools.

9.12 Observability

An agent that can’t be debugged can’t be trusted. Mastra ships three observability signals: tracing, logging, and metrics.

9.12.1 Tracing

Every agent invocation, tool call, workflow step, and memory operation produces a span — a hierarchical record of what happened, how long it took, and what data flowed through.

flowchart TD
    AG["agent.generate()"] --> TC1["Tool call: search"]
    AG --> TC2["Tool call: weather"]
    AG --> LLM1["LLM call #1"]
    TC1 --> API["API request"]
    TC2 --> API2["API request"]
    LLM1 --> LLM2["LLM call #2<br/>(with tool results)"]
    style AG fill:#dbeafe,stroke:#1e40af,color:#1e40af
    style TC1 fill:#dcfce7,stroke:#166534,color:#166534
    style TC2 fill:#dcfce7,stroke:#166534,color:#166534
    style LLM1 fill:#fef3c7,stroke:#92400e,color:#92400e
    style LLM2 fill:#fef3c7,stroke:#92400e,color:#92400e
    style API fill:#f3e8ff,stroke:#6b21a8,color:#6b21a8
    style API2 fill:#f3e8ff,stroke:#6b21a8,color:#6b21a8

Trace hierarchy — a single agent.generate() call

Traces are hierarchical: the top-level agent.generate() span contains child spans for each tool call and LLM invocation. You can see exactly which tool call took 3 seconds, which LLM call consumed 4,000 tokens, and which step failed.

9.12.2 Logging

Structured logs are correlated to traces — every log line carries a traceId and spanId, so you can jump from a log line to its trace and vice versa. This is the difference between “the agent errored” and “the agent errored during the second search tool call in trace abc-123.”

9.12.3 Metrics

Mastra auto-extracts metrics from traces: duration, token usage, estimated cost. These are the same numbers you’d calculate manually from LLM API responses, but Mastra aggregates them across runs and agents.

9.12.4 OpenTelemetry compatibility

All three signals are OpenTelemetry-compatible. You can export traces to Langfuse, Datadog, Jaeger, or any OTel collector. Mastra also supports composite storage — DuckDB for metrics (fast local aggregation) and LibSQL for trace data (durable storage).

import { Mastra } from '@mastra/core'
import {
  Observability,
  SensitiveDataFilter,
} from '@mastra/core/observability'
import { OTLPHttpExporter } from '@mastra/otel-exporter-otlp-http'

const mastra = new Mastra({
  agents: { researchAgent },
1  observability: new Observability({
    configs: {
      default: {
        serviceName: 'my-agent-app',
2        exporters: [
          new OTLPHttpExporter({
            endpoint: 'https://otel.example.com/v1/traces',
          }),
        ],
3        spanOutputProcessors: [new SensitiveDataFilter()],
      },
    },
  }),
})

1: The field is observability, not telemetry, and it takes an Observability instance — not a plain config object. Multiple named configs (e.g. default, dev) can coexist for different environments.
2: exporters is an array of exporter instances. OTLP HTTP is the universal choice; provider-specific exporters (Langfuse, Datadog) plug in the same way.
3: spanOutputProcessors run before export. SensitiveDataFilter strips PII and secrets from span attributes; write your own processor class for custom redaction.

Observability in the Ralph loop

When an autonomous loop runs 200 iterations (Section 10.2), the observability stack is the difference between “it failed somewhere” and “iteration 47, step 3, the search tool returned stale data and the LLM made a wrong decision based on it.” Mastra’s hierarchical tracing gives you that granularity. The economics chapter (Chapter 11) covers how to use these metrics for cost tracking.

9.13 Studio and Deployment

9.13.1 Studio — visual development

Mastra Studio is a local development UI that gives you visual tools for building and debugging agents:

Graph visualization — see your workflows as interactive graphs. Each step shows its input/output schemas, current status, and execution time.
Input forms — generated automatically from Zod schemas. Test your agents and workflows without writing curl commands.
Live status — watch workflow execution in real time. See which step is running, which is suspended, which has completed.
Time-travel debugging — for workflows, step through execution history and inspect the data at each point. This is Mastra’s answer to Burr’s fork-from-state: you can see exactly what state the workflow had at any step and understand why it made the decisions it did.

9.13.2 Deployment targets

Mastra supports multiple deployment models:

Target	How	Best for
Standalone server	`mastra build` produces a Hono HTTP server	VMs, containers, PaaS
Mastra Platform	Hosted observability + studio + server	Teams wanting managed infrastructure
Vercel	`@mastra/deployer-vercel`	Serverless, edge functions
AWS Lambda	`@mastra/deployer-lambda`	Event-driven, pay-per-invocation
Cloudflare Workers	`@mastra/deployer-cloudflare`	Edge-first, global distribution
Web frameworks	Next.js, Astro, Express, SvelteKit, Hono	Embedded in existing apps

The mastra build command compiles your agents, tools, and workflows into a standalone HTTP server powered by Hono. The resulting artifact is a Node.js application you can deploy anywhere — Docker containers, Fly.io, Railway, or bare metal.

For production workflow execution, Mastra integrates with Inngest — a durable execution platform that adds memoization, automatic retries, and step-level recovery to your workflows. This is the TypeScript equivalent of mcp-agent’s Temporal integration from Chapter 8.

// src/mastra/index.ts — the entry point `mastra build` looks for
import { Mastra } from '@mastra/core'

1export const mastra = new Mastra({
  agents: { researchAgent, writerAgent },
  workflows: { pipeline },
  server: {
2    port: 3001,
    cors: { origin: 'https://myapp.com' },
  },
})

1: You export the Mastra instance, you don’t start() it. The class has no runtime start() method — instead, mastra build compiles this entry point into a deployable Hono server, and mastra dev runs it locally. The shape on disk matters: the file must be src/mastra/index.ts (or whatever your mastra.config points at) and the instance must be exported.
2: Server configuration travels with the instance. CORS, middleware, and authentication live here; the deployer wires them into the generated server.

Note

For web framework integration, Mastra provides route handlers that plug into your existing application. In Next.js, you’d export a route handler from app/api/agent/route.ts; in Express, you’d mount Mastra’s router on a path. The agent becomes an endpoint in your existing application rather than a separate service.

9.14 Guardrails: Processors

Production agents need safety boundaries. In Mastra these are all built from the same primitive: processors. Three arrays on the agent — inputProcessors, outputProcessors, errorProcessors — run at different points in the request lifecycle, and each array contains processor instances (not raw functions or rule objects). Mastra ships processor classes for the common needs; you can write your own by implementing the processor interface.

import { Agent } from '@mastra/core/agent'
import {
  PromptInjectionDetector,
  PIIDetector,
  ModerationProcessor,
  TokenLimiter,
  PrefillErrorHandler,
} from '@mastra/core/processors'

const agent = new Agent({
  name: 'safe-agent',
  instructions: 'You are a helpful assistant',
  model: 'anthropic/claude-sonnet-4-6',
1  inputProcessors: [
    new TokenLimiter(4000),
    new PromptInjectionDetector({
      model: 'openrouter/openai/gpt-oss-safeguard-20b',
      detectionTypes: ['injection', 'jailbreak', 'system-override'],
      threshold: 0.8,
2      strategy: 'rewrite',
    }),
    new PIIDetector({
      model: 'openrouter/openai/gpt-oss-safeguard-20b',
      detectionTypes: ['email', 'phone', 'credit-card', 'ssn'],
3      strategy: 'redact',
    }),
  ],
4  outputProcessors: [
    new ModerationProcessor({
      model: 'openai/gpt-4o-mini',
      categories: ['hate', 'harassment', 'violence'],
      strategy: 'block',
    }),
  ],
5  errorProcessors: [new PrefillErrorHandler()],
})

1: inputProcessors run before the LLM sees the message. They execute in declaration order. TokenLimiter truncates oversized inputs; the detectors classify the message.
2: Strategy decides what happens on a hit. 'rewrite' lets the processor neutralize the unsafe portion and continue; 'block' aborts with a 'tripwire' result; 'redact' masks matched spans (used here for PII). Each processor documents which strategies it supports.
3: PII strategies are class-specific — PIIDetector also accepts a redactionMethod ('mask', 'hash') and a preserveFormat flag so downstream tools still parse the message.
4: outputProcessors run on the LLM response. Same shape — ModerationProcessor here blocks harmful content before it reaches the caller. Output processors can also stream-edit token-by-token by implementing the processOutputStream hook.
5: errorProcessors activate when the LLM call throws. PrefillErrorHandler retries with a salvaged partial response; you can write custom handlers for provider-specific error shapes.

When a blocking processor fires, the agent result’s status is 'tripwire' — distinct from 'failed', signalling “the agent could have responded, but a safety check prevented it.” 'rewrite' and 'redact' strategies do not tripwire; they modify the message and proceed.

9.14.1 Writing a custom processor

The processor interface is small: a name, plus one or more of processInput, processOutputResult, processOutputStream, processError:

import { Processor, RequestContext } from '@mastra/core/processors'

class AddDisclaimer implements Processor {
  readonly name = 'add-disclaimer'

  async processOutputResult({ messages }: { messages: any[] }) {
    const last = messages[messages.length - 1]
    if (last?.role === 'assistant' && typeof last.content === 'string') {
      last.content += '\n\n_Not financial advice._'
    }
    return { messages }
  }
}

This is also how you handle transformations the shipped processors don’t cover — input normalization, channel-specific formatting, translation, watermarking. Pass an instance into inputProcessors or outputProcessors like any built-in.

One mechanism, three behaviors

The thing the older Mastra docs used to call “guardrails” and the thing they used to call “processors” are now the same primitive. You pick the behavior through the processor’s strategy ('block' = guardrail, 'rewrite'/'redact' = transform) and through which array you put it in (inputProcessors vs outputProcessors). If you find a code sample online with guardrails: { input: [...] } or processors: { pre, post }, it’s stale — both shapes have collapsed into the unified processor model shown here.

9.15 Comparison with Python Frameworks

How does Mastra stack up against the Python frameworks from Chapters 7 and 8? The comparison matters because many teams need to choose between ecosystems, not just within them.

Capability	Mastra (TS)	Agno (Py)	mcp-agent (Py)	Burr + Pydantic (Py)
Language	TypeScript	Python	Python	Python
Agent creation	`new Agent()`	`Agent()`	`Agent()`	`@action.pydantic`
Tool typing	Zod schemas	Python type hints	MCP schemas	`reads`/`writes` decl.
Workflows	Builder pattern (`.then/.parallel/.branch`)	`Workflow` class	Orchestrator patterns	`ApplicationBuilder` graph
MCP client	`MCPClient` (native)	`MCPTools` wrapper	Native substrate	Manual integration
MCP server	`MCPServer` (native)	Not built-in	Not built-in	Not built-in
Memory	4-layer (history, observational, working, semantic)	Session + memory stores	Conversation context	State snapshots
RAG	Built-in pipeline	`Knowledge` primitive	External	External (Hamilton)
Human-in-the-loop	Workflow suspension	Workflow steps	Temporal signals	Manual state check
Structured output	Zod/Valibot/ArkType	Pydantic models	JSON Schema	Pydantic models
Observability	OTel traces + metrics	OTel compatible	Logging	Burr UI + OTel
Fork/replay	Studio time-travel	Not built-in	Temporal replay	`fork_from_sequence_id`
Multi-agent	Supervisor (agent-as-tool)	`Team` (auto-routing)	Router/Parallel/Evaluator	Manual composition
Deployment	Hono server, Vercel, Lambda, Cloudflare, Inngest	FastAPI, Docker	Custom server	Custom server
Visual dev	Studio	Agno Platform	Not built-in	Burr UI
Guardrails	Input/output guardrails, tripwire	Custom middleware	Not built-in	Not built-in

Note

The comparison is not “which is best” — it’s “which fits your stack.” If your backend is TypeScript, Mastra is the natural choice. If your team is Python-primary and needs MCP as the substrate, mcp-agent is the fit. If you need fork-from-state debugging above all else, Burr is unmatched. If you want Python batteries-included, Agno is the pick. The matrix helps you find your row.

9.15.1 Where Mastra leads

MCP server support. Mastra is the only framework in this comparison that can expose your agents as MCP servers, not just consume MCP servers. This makes your Mastra agents composable into other agents’ tool sets — a capability that matters when you’re building a multi-agent system across teams or organizations.
Memory depth. The four-layer memory system (history, observational, working, semantic) is more nuanced than any of the Python frameworks’ built-in memory. Agno has flexible storage backends but doesn’t ship observational memory or semantic recall as primitives.
Edge deployment. TypeScript’s cold-start advantage on Cloudflare Workers, Vercel Edge, and Deno Deploy is real. Python agents on Lambda have 2-5 second cold starts; TypeScript agents on edge runtimes start in milliseconds.

9.15.2 Where Mastra trails

Fork-from-state debugging. Burr’s fork_from_sequence_id lets you replay agent execution from any state snapshot — the production debugging story is stronger than Mastra’s Studio time-travel, which is primarily a development tool.
Canonical agent patterns. mcp-agent ships Anthropic’s recommended patterns (Router, Parallel, Evaluator-Optimizer) as composable primitives. Mastra gives you the building blocks to implement these patterns, but they’re not pre-built.
ML ecosystem integration. For pipelines that include model training, data processing, or scientific computing, Python’s numpy/pandas/scikit-learn ecosystem has no TypeScript equivalent. Mastra agents that need heavy data processing will call Python services or use tools.

9.16 Forward Link: From Mastra to the Ralph Loop

Chapter 10 introduces the Ralph loop — the minimal autonomous coding loop that drives the rest of the book. Mastra’s primitives map directly onto the loop’s requirements:

The loop needs tools. Mastra’s createTool() with Zod schemas defines the tool set — read_file, write_file, run_command, search — with type-safe input validation.
The loop needs memory. Working memory stores the spec, the current task, and accumulated context. Semantic recall finds relevant past decisions across long-running sessions.
The loop needs escape hatches. Workflow suspension implements the “pause for human input” pattern when the loop encounters something it can’t handle autonomously.
The loop needs observability. When iteration 47 of 200 goes wrong, hierarchical tracing tells you exactly which tool call produced the bad result.

If you’re building a Ralph loop in TypeScript, Mastra is the framework that provides all four requirements from a single dependency.

9.17 Key Takeaways

TypeScript is a first-class agent runtime. The edge deployment story, the npm ecosystem, and team composition all make TypeScript the right choice for many agent deployments. Mastra is the framework that makes those deployments practical.
Mastra ships everything together. Agents, tools, workflows, memory, RAG, MCP, observability, and deployment in one npm install. The integration coherence tradeoff is the same as Agno’s — you gain a consistent API, you lose mix-and-match flexibility.
Tools are Zod-typed functions. createTool() with inputSchema and outputSchema gives you validation at the boundary and IDE autocompletion inside execute(). The same Zod schemas power structured output, workflow steps, and MCP tool exposure.
Workflows are deterministic; agents are autonomous. Use workflows when you need guaranteed execution order and auditability. Use agents when you need dynamic routing and tool selection. Use both together when you need structured processes with autonomous steps.
Suspension is a first-class primitive. suspend() and resume() make human-in-the-loop a workflow step, not a hack. Time-based suspension (sleep/sleepUntil) handles scheduled operations.
Memory has four layers. Message history for short-term, observational memory for compressed history, working memory for structured facts, semantic recall for meaning-based retrieval. Configure what you need; disable what you don’t.
MCP integration is bidirectional. MCPClient consumes external MCP servers; MCPServer exposes your agents to external MCP clients. This makes Mastra agents both consumers and producers in the MCP ecosystem.
Observability is built in, not bolted on. Tracing, logging, and metrics are automatic. OpenTelemetry compatibility means you can export to any backend. Sensitive data filtering is a configuration flag, not a custom middleware.
The choice between TypeScript and Python frameworks is about your stack, not about capability. The comparison matrix shows that all four frameworks covered in this book can build production agents. The differentiators are language ecosystem, deployment targets, and which specific features (fork-from-state, canonical patterns, edge deployment, memory depth) matter most to your project.

9.18 Concept Map

flowchart TD
    PF["PocketFlow (Ch 7)<br/>See the machinery"] --> BU["Burr + Pydantic (Ch 8)<br/>Typed FSM + replay"]
    PF --> AG["Agno (Ch 8)<br/>Python batteries"]
    PF --> MA["mcp-agent (Ch 8)<br/>MCP-native patterns"]
    PF --> MS["Mastra (Ch 9)<br/>TypeScript batteries"]
    MS --> TOOLS["Tools<br/>Zod schemas"]
    MS --> WF["Workflows<br/>Builder pattern"]
    MS --> MEM["Memory<br/>4-layer system"]
    MS --> MCP["MCP<br/>Client + Server"]
    MS --> OBS["Observability<br/>OTel traces"]
    MS --> STU["Studio<br/>Visual dev"]
    TOOLS --> RL["Ralph Loop (Ch 10)<br/>Autonomous coding"]
    WF --> RL
    MEM --> RL
    MCP --> RL
    OBS --> EC["Economics (Ch 11)<br/>Cost tracking"]
    MS --> GT["Gas Town (Ch 12)<br/>Multi-agent fleet"]
    BU --> RL
    AG --> GT
    MA --> RL
    style PF fill:#fef3c7,stroke:#92400e,color:#92400e
    style BU fill:#dbeafe,stroke:#1e40af,color:#1e40af
    style AG fill:#dcfce7,stroke:#166534,color:#166534
    style MA fill:#f3e8ff,stroke:#6b21a8,color:#6b21a8
    style MS fill:#dbeafe,stroke:#1e40af,color:#1e40af
    style RL fill:#fce7f3,stroke:#9d174d,color:#9d174d
    style EC fill:#fef3c7,stroke:#92400e,color:#92400e
    style GT fill:#dcfce7,stroke:#166534,color:#166534
    style TOOLS fill:#f3e8ff,stroke:#6b21a8,color:#6b21a8
    style WF fill:#f3e8ff,stroke:#6b21a8,color:#6b21a8
    style MEM fill:#f3e8ff,stroke:#6b21a8,color:#6b21a8
    style MCP fill:#f3e8ff,stroke:#6b21a8,color:#6b21a8
    style OBS fill:#f3e8ff,stroke:#6b21a8,color:#6b21a8
    style STU fill:#f3e8ff,stroke:#6b21a8,color:#6b21a8

How Mastra’s concepts connect to the book’s progression

9.1 Why TypeScript for Agents?

9.2 The Mastra Model: Everything Ships Together

9.3 Agents

9.3.1 Generate vs. Stream

9.4 Tools

9.4.1 Attaching tools to agents

9.4.2 Advanced tool features

9.5 MCP Integration

9.5.1 MCPClient — consuming MCP servers

9.5.2 Static vs. dynamic tool loading

9.5.3 Tool approval — human-in-the-loop for tool calls

9.5.4 MCPServer — exposing Mastra as an MCP server

9.6 Workflows

9.6.1 Steps and the builder pattern

9.6.2 Control flow

9.6.3 Shared state

9.6.4 Execution and results

9.7 Suspension and Human-in-the-Loop

9.7.1 The suspend/resume pattern

9.7.2 Resuming a suspended workflow

9.7.3 Time-based suspension

9.8 Memory

9.8.1 The four memory layers

9.8.2 Configuring memory

9.8.3 Multi-user threads

9.8.4 Multi-agent memory scoping

9.9 RAG

9.9.1 The RAG pipeline

9.10 Multi-Agent Systems

9.10.1 Supervisor pattern — agent-as-tool

9.11 Structured Output

9.11.1 Basic structured output

9.11.2 Streaming structured output

9.11.3 Error strategies

9.11.4 Multi-step structured output with prepareStep

9.12 Observability

9.12.1 Tracing

9.12.2 Logging

9.12.3 Metrics

9.12.4 OpenTelemetry compatibility

9.13 Studio and Deployment

9.13.1 Studio — visual development

9.13.2 Deployment targets

9.14 Guardrails: Processors

9.14.1 Writing a custom processor

9.15 Comparison with Python Frameworks

9.15.1 Where Mastra leads

9.15.2 Where Mastra trails

9.16 Forward Link: From Mastra to the Ralph Loop

9.17 Key Takeaways

9.18 Concept Map

9.11.4 Multi-step structured output with `prepareStep`