The Walker (MDP Engine)¶
The walker is the heart of aitester-bdd's runtime. It interprets the rule DAG as a Markov Decision Process: each rule is a sequence of (state, action, observation) tuples executed against a live browser.
Entry point¶
def walk_verification(verification, ctx=None):
    if ctx is None:
        ctx = WalkContext.from_env()
    # ... build registry, open browser, walk scenarios
walk_verification is called from the Then I finalize verification keyword. Everything before that keyword only builds the plan.
Walk algorithm¶
for each scenario:
    navigate to entry_url
    order = topo_sort(scenario.rules)        # Kahn's algorithm
    already_passed = set()
    for rule_name in order:
        rule = rules[rule_name]
        # 1. Parent gating
        if any parent not in already_passed:
            result = FAIL("parent_failed")
            verdict.results.append(result)   # record the result before moving on
            continue
        # 2. Guard check
        guards, body = split_at_first_action(rule.items)
        ok = check_guards(guards)            # short timeout, no waiting
        # 3. Retry-redo (if configured)
        if not ok and rule.retry_max > 0:
            for attempt in range(rule.retry_max):
                execute_body(body)           # replay actions
                ok = check_guards(guards)    # re-check
                if ok: break
        # 4. Body execution
        if ok:
            result = execute_body(body)      # actions + observations
        else:
            result = SKIP("guard")           # skipped, not failed, unless guard_policy="abort"
        # 5. Record result
        if result.passed:
            already_passed.add(rule_name)
        verdict.results.append(result)
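The parent-gating step cascades: a rule whose parent was itself gated never enters already_passed, so its own children are gated too. The following is a minimal runnable sketch of that behaviour only. The function name walk_rules, the run_rule callback, and the string outcomes are hypothetical stand-ins for the guard check, body execution, and RuleResult, not the walker's actual API.

```python
def walk_rules(order, parents_by_rule, run_rule):
    """Run rules in topological order, gating each rule on its parents.

    run_rule(name) -> bool is a hypothetical callback that stands in for
    the guard check plus body execution of a single rule.
    """
    already_passed = set()
    results = {}
    for name in order:
        # Parent gating: every declared parent must have passed.
        if any(parent not in already_passed for parent in parents_by_rule[name]):
            results[name] = "parent_failed"
            continue
        if run_rule(name):
            already_passed.add(name)
            results[name] = "passed"
        else:
            results[name] = "failed"
    return results


parents = {"login": [], "dashboard": ["login"], "export": ["dashboard"], "about": []}
outcome = walk_rules(
    ["login", "dashboard", "export", "about"],
    parents,
    run_rule=lambda name: name != "login",  # pretend only "login" fails
)
```

In this example "export" is gated even though its direct parent "dashboard" never ran, while the unrelated rule "about" still executes.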
Topo-sort¶
Rules declare parents: And I declare parents "login". The walker sorts them parents-before-children using Kahn's algorithm. If a parent fails, all its children are auto-skipped with failure_step_kind="parent_failed".
Cycles raise ValueError at sort time (not at execution time).
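As an illustration, Kahn's algorithm with cycle detection can be sketched as below. The input shape (a dict mapping each rule name to its parent names) is an assumption made for this example, and raising ValueError for an undeclared parent is likewise an assumption. Only the ValueError on cycles comes from the description above.

```python
from collections import deque


def topo_sort(parents_by_rule):
    """Order rule names parents-before-children using Kahn's algorithm.

    parents_by_rule maps each rule name to a list of its parent names
    (a hypothetical input shape chosen for this sketch).
    """
    # Count unmet parents per rule and index children by parent.
    pending = {name: len(set(parents)) for name, parents in parents_by_rule.items()}
    children = {name: [] for name in parents_by_rule}
    for name, parents in parents_by_rule.items():
        for parent in set(parents):
            if parent not in parents_by_rule:
                # Assumption: an unknown parent is treated as an error.
                raise ValueError(f"rule {name!r} declares unknown parent {parent!r}")
            children[parent].append(name)

    # Start with rules that have no parents, then release children
    # as soon as their last unmet parent has been emitted.
    ready = deque(name for name, count in pending.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for child in children[name]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    # Rules never released are on a cycle or downstream of one.
    if len(order) != len(parents_by_rule):
        stuck = sorted(set(parents_by_rule) - set(order))
        raise ValueError(f"cycle detected among rules: {stuck}")
    return order
```

Because the check runs when the order is computed, a cyclic plan is rejected before any browser action is attempted.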
Guard semantics¶
Guards are StateChecks positioned before the first Action in a rule. They answer: "is the world already in the expected state?"
- Timeout: 200ms (configurable via set rule timeout)
- On failure: rule is skipped (not failed), unless guard_policy="abort"
- Retry-redo: if set retry N delay M is declared, the walker replays the body up to N times, re-checking the guards after each replay
This handles the real-world pattern where AJAX updates haven't landed yet — replay the actions that trigger the update, then re-check.
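The retry-redo loop can be sketched as follows. This is a minimal illustration, not the walker's code: the function name guards_hold_with_retry, its parameters, the delay being expressed in seconds, and the placement of the delay between the replay and the re-check are all assumptions.

```python
import time


def guards_hold_with_retry(check_guards, replay_body, retry_max=0, delay_s=0.0,
                           sleep=time.sleep):
    """Return True once the guards hold, replaying the body between re-checks.

    check_guards() -> bool and replay_body() are hypothetical callbacks.
    """
    if check_guards():
        return True
    for _ in range(retry_max):
        replay_body()    # re-run the actions that trigger the update
        sleep(delay_s)   # assumed: wait for the page to settle before re-checking
        if check_guards():
            return True
    return False


# Usage: the guard only holds after the body has been replayed twice.
state = {"replays": 0}


def check():
    return state["replays"] >= 2


def replay():
    state["replays"] += 1


held = guards_hold_with_retry(check, replay, retry_max=3, sleep=lambda s: None)
```

The loop stops at the first successful re-check, so N is an upper bound on replays rather than a fixed count.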
Body execution¶
The body is everything from the first Action onward. For each item:
Action items:
- Dismiss interrupts (cookie banners, modals)
- Fire before_action aspects
- Execute the action against the browser
- If it raises → dismiss interrupts + retry once
- Fire after_action aspects
- Honor await=<selector> option (wait for element before continuing)
StateCheck items (observations):
- Wait with full timeout (30s default, or rule's timeout_ms)
- If it passes → fire after_state_check, continue
- If it fails → fail the rule with structured evidence
Emit items:
- Capture page state into emit.jsonl
- Never fail the rule (observation only)
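The Action sequence above can be sketched as a small function. All names here (run_action, the callback parameters) are hypothetical, the aspects are modelled as plain callables, and the await=<selector> handling is left out. It is meant only to show the ordering and the single retry.

```python
def run_action(action, dismiss_interrupts, before_action=(), after_action=()):
    """Run one Action item: dismiss, fire aspects, execute, retry once on error.

    action() performs the browser interaction; dismiss_interrupts() clears
    overlays. Both are hypothetical callbacks for this sketch.
    """
    dismiss_interrupts()
    for aspect in before_action:
        aspect(action)
    try:
        outcome = action()
    except Exception:
        # An overlay may have appeared mid-run: clear it and try exactly once more.
        dismiss_interrupts()
        outcome = action()  # a second failure propagates to the caller
    for aspect in after_action:
        aspect(action)
    return outcome


# Usage: the first click is intercepted, the retry succeeds.
calls = {"action": 0, "dismiss": 0}
log = []


def flaky_click():
    calls["action"] += 1
    if calls["action"] == 1:
        raise RuntimeError("click intercepted")
    return "clicked"


def dismiss():
    calls["dismiss"] += 1


result = run_action(
    flaky_click,
    dismiss,
    before_action=[lambda a: log.append("before")],
    after_action=[lambda a: log.append("after")],
)
```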
Interrupt dismissal¶
Ported from WISE. Before every action, the walker clicks any visible elements matching the verification's dismiss_selectors. This handles:
- Cookie consent banners
- Newsletter popups
- Chat widgets
- Any overlay that would block the click target
Per-rule scoping:
- interrupt_paused: suppress all dismissals for this rule
- interrupt_override: replace the global list with a custom one
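One way to picture the per-rule scoping is as a function that resolves the selector list used before each action. The function name and signature are hypothetical, and letting interrupt_paused take precedence over interrupt_override is an assumption of this sketch.

```python
def effective_dismiss_selectors(global_selectors, interrupt_paused=False,
                                interrupt_override=None):
    """Resolve which dismiss selectors apply to a single rule."""
    if interrupt_paused:
        # Suppress all dismissals for this rule.
        return []
    if interrupt_override is not None:
        # Replace the global list with the rule's custom one.
        return list(interrupt_override)
    return list(global_selectors)


global_list = ["#cookie-accept", ".newsletter-close"]
default = effective_dismiss_selectors(global_list)
paused = effective_dismiss_selectors(global_list, interrupt_paused=True)
custom = effective_dismiss_selectors(global_list, interrupt_override=[".chat-close"])
```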
Scope inheritance (TIER 2.5)¶
A rule can declare set child scope ".container". All of its children then automatically prefix their selectors with .container >>. For example:
I define rule "sidebar"
...set child scope ".sidebar-panel"
I define rule "sidebar_link"
And I declare parents "sidebar"
# All selectors here are automatically scoped under .sidebar-panel
Then selector exists "a.nav-link" # resolves to .sidebar-panel >> a.nav-link
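The prefixing itself amounts to string composition. A minimal sketch, assuming a hypothetical helper named scoped_selector and a single level of inheritance:

```python
def scoped_selector(selector, parent_scope=None):
    """Prefix a child rule's selector with its parent's declared scope, if any."""
    if not parent_scope:
        return selector
    return f"{parent_scope} >> {selector}"


resolved = scoped_selector("a.nav-link", ".sidebar-panel")
unscoped = scoped_selector("a.nav-link")
```

Here resolved matches the sidebar_link example above, and a rule whose parent declares no scope keeps its selector unchanged.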
Per-rule timeout¶
set rule timeout 5000 gives the rule a 5-second deadline. Both guards and observations inherit this timeout. The global run timeout (default 300s, via AITESTER_RUN_TIMEOUT) caps the entire verification.
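The defaults mentioned on this page can be combined into a small resolution sketch. The constant and function names are hypothetical, and reading AITESTER_RUN_TIMEOUT as a number of seconds is an assumption based on the stated 300s default.

```python
import os

DEFAULT_GUARD_TIMEOUT_MS = 200           # guard default stated above
DEFAULT_OBSERVATION_TIMEOUT_MS = 30_000  # observation default stated above
DEFAULT_RUN_TIMEOUT_S = 300              # global run default stated above


def step_timeout_ms(kind, rule_timeout_ms=None):
    """Timeout for a "guard" or "observation" step of one rule."""
    if rule_timeout_ms is not None:
        # set rule timeout applies to both guards and observations.
        return rule_timeout_ms
    if kind == "guard":
        return DEFAULT_GUARD_TIMEOUT_MS
    return DEFAULT_OBSERVATION_TIMEOUT_MS


def run_timeout_s(env=None):
    """Global cap for the whole verification (assumed to be in seconds)."""
    env = os.environ if env is None else env
    return float(env.get("AITESTER_RUN_TIMEOUT", DEFAULT_RUN_TIMEOUT_S))
```

For instance, step_timeout_ms("guard") falls back to 200, while step_timeout_ms("guard", 5000) and step_timeout_ms("observation", 5000) both yield 5000.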
RuleResult¶
Every rule produces a RuleResult:
from dataclasses import dataclass

@dataclass
class RuleResult:
    rule_name: str
    scenario_name: str
    passed: bool
    failure_step_kind: str   # "guard", "action", "observation_or_assertion", "parent_failed", "run_timeout"
    failure_step_repr: str   # human-readable step that failed
    failure_message: str
    expected: str            # what we wanted
    observed: str            # what we got
    screenshot: str | None   # path to failure screenshot
    ai_diagnosis: str        # LLM explanation (if diagnose aspect is wired)
    duration_ms: float
The Verdict aggregates all RuleResults and formats a human-readable failure report.
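To make the aggregation concrete, here is a minimal sketch of what such a Verdict could look like. It uses a trimmed copy of RuleResult with assumed default values so the example stays self-contained. The passed property, the report method, and the report layout are all hypothetical and are not the project's actual output format.

```python
from dataclasses import dataclass, field


@dataclass
class RuleResult:
    # Trimmed copy of the fields above; the defaults are assumptions.
    rule_name: str
    scenario_name: str
    passed: bool
    failure_step_kind: str = ""
    failure_message: str = ""
    expected: str = ""
    observed: str = ""


@dataclass
class Verdict:
    results: list = field(default_factory=list)

    @property
    def passed(self):
        """The verification passes only if every rule passed."""
        return all(r.passed for r in self.results)

    def report(self):
        """Format a short human-readable summary followed by each failure."""
        failures = [r for r in self.results if not r.passed]
        lines = [f"{len(self.results) - len(failures)}/{len(self.results)} rules passed"]
        for r in failures:
            lines.append(
                f"FAIL {r.scenario_name} / {r.rule_name} "
                f"[{r.failure_step_kind}]: {r.failure_message}"
            )
            lines.append(f"  expected: {r.expected}")
            lines.append(f"  observed: {r.observed}")
        return "\n".join(lines)


verdict = Verdict()
verdict.results.append(RuleResult("login", "smoke", True))
verdict.results.append(RuleResult(
    "dashboard", "smoke", False,
    failure_step_kind="observation_or_assertion",
    failure_message="heading not found",
    expected="h1 'Dashboard' visible",
    observed="no matching element",
))
summary = verdict.report()
```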