The Plan-Then-Execute Model

This is the core architectural insight of aitester-bdd: keywords don't execute — they build a plan. Execution happens once, all at once, in a controlled walk.

Why deferred execution?

In a naive approach, each Robot Framework keyword would immediately drive the browser:

# NAIVE (not how aitester-bdd works)
When I click ".login-btn"     ← immediately clicks
Then selector exists ".dashboard"  ← immediately checks

This seems simpler but causes problems:

- No rule ordering: you can't express "rule B depends on rule A passing"
- No retry-redo: if a guard fails, you can't replay the body and re-check
- No aspects: there's no transition point to hook timing, logging, or diagnosis into
- No scope inheritance: child rules can't inherit a parent's DOM scope
- No topo-sort: execution order is file order, not dependency order

How it actually works

Phase A: Plan (keyword execution)

Every keyword appends to an in-memory Verification model:

# What happens when RF calls "When I click locator '.btn'"
def when_click_locator(self, css):
    self._current_rule().items.append(Action("click", target=css))

Nothing touches the browser. The keyword just records what to do later.

After all keywords run, the model looks like:

Verification "login flow"
  └── Scenario "happy path" (entry_url="http://localhost:5173")
      ├── Rule "login"
      │   ├── [Guard] selector_exists ".login-form"
      │   ├── [Action] type "admin" into "#username"
      │   ├── [Action] type "secret" into "#password"
      │   ├── [Action] click "#submit"
      │   └── [Observation] url_contains "/dashboard"
      └── Rule "see_widgets" (parents: ["login"])
          ├── [Guard] url_contains "/dashboard"
          └── [Observation] count_at_least ".widget" 3
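The recording step can be sketched in a few lines. This is a minimal illustration, not the aitester-bdd implementation: PlanRecorder and its method names are hypothetical stand-ins for the keyword library, and the Action, StateCheck, and Rule classes are simplified versions of the model types. The point it shows is that each call only appends to a list; no browser is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str
    target: str
    value: str = ""


@dataclass
class StateCheck:
    kind: str
    target: str


@dataclass
class Rule:
    name: str
    items: list = field(default_factory=list)
    parents: list = field(default_factory=list)


class PlanRecorder:
    """Hypothetical stand-in for the keyword library: records, never executes."""

    def __init__(self):
        self.rules = {}
        self._current = None

    def rule(self, name, parents=()):
        # Start (or resume) a rule; later calls append to it.
        self._current = self.rules.setdefault(name, Rule(name, parents=list(parents)))

    def selector_exists(self, css):
        self._current.items.append(StateCheck("selector_exists", css))

    def url_contains(self, fragment):
        self._current.items.append(StateCheck("url_contains", fragment))

    def type_into(self, css, text):
        self._current.items.append(Action("type", css, text))

    def click(self, css):
        self._current.items.append(Action("click", css))


plan = PlanRecorder()
plan.rule("login")
plan.selector_exists(".login-form")
plan.type_into("#username", "admin")
plan.type_into("#password", "secret")
plan.click("#submit")
plan.url_contains("/dashboard")
plan.rule("see_widgets", parents=["login"])
plan.url_contains("/dashboard")

for rule in plan.rules.values():
    print(rule.name, [type(item).__name__ for item in rule.items])
```

After these calls the plan holds the same shape as the "login" rule in the tree above, and nothing has been executed yet.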

Phase B: Execute (walker)

The "Then I finalize verification" keyword calls walk_verification(verification), which proceeds as follows:

1. Build WalkContext: resolve headed mode, step delay, and timeouts from the environment
2. Wire aspects: trajectory recording, instrumentation, diagnosis, step delay
3. Open browser: a single session for all scenarios
4. For each scenario:
   - Navigate to entry_url
   - Topo-sort rules by parent dependencies
   - Walk each rule in order:
     - Check guards (fast timeout, no waiting)
     - If guards pass → execute body (actions + observations)
     - If guards fail + retry configured → replay body, re-check
     - Fire aspects at every transition
5. Collect Verdict: pass/fail per rule with structured evidence
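The topo-sort step can be illustrated with Python's standard graphlib module. This is a sketch under assumptions: order_rules is a hypothetical helper, the input is a plain mapping from rule name to parent names, and whether aitester-bdd uses graphlib internally is not stated here.

```python
from graphlib import CycleError, TopologicalSorter


def order_rules(parents_by_rule):
    """Return rule names with every parent placed before its children."""
    sorter = TopologicalSorter()
    for name, parents in parents_by_rule.items():
        # add(node, *predecessors): parents must come before the rule itself.
        sorter.add(name, *parents)
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"circular rule dependency: {exc.args[1]}") from exc


# Declared out of order on purpose: file order is not execution order.
rules = {
    "open_settings": ["see_widgets"],
    "see_widgets": ["login"],
    "login": [],
}
ordered = order_rules(rules)
print(ordered)
```

Because the example forms a single dependency chain, the resulting order is fully determined: "login" first, then "see_widgets", then "open_settings". A circular dependency surfaces as an error before any rule runs.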

The split point

Each rule's items list is split into two parts at the first Action:

[StateCheck, StateCheck, Action, StateCheck, Action, StateCheck]
 ├── guards ──────────┤├── body ─────────────────────────────┤

Guards (before first Action): checked with a short timeout (200ms). They ask "is the world already in the right state?" If not, the rule is skipped (or retried).
Body (from first Action onward): Actions execute against the browser. Inline StateChecks after actions are observations — they wait with a long timeout (30s) and fail the rule if they don't pass.

Because semantics are determined by position, the same StateCheck type behaves differently depending on where the author places it: before the first Action it is a guard, after it an observation. No explicit "assert" or "wait" keywords are needed.
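The split rule is simple enough to sketch directly. The code below is illustrative only: split_items and timeouts are hypothetical helper names, the Action and StateCheck classes are simplified, and the handling of a rule that contains no Action at all is an assumption, since that case is not described here. The two timeout constants are the 200ms and 30s values quoted above.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str
    target: str


@dataclass
class StateCheck:
    kind: str
    target: str


GUARD_TIMEOUT_S = 0.2         # the 200ms guard timeout quoted above
OBSERVATION_TIMEOUT_S = 30.0  # the 30s observation timeout quoted above


def split_items(items):
    """Return (guards, body), splitting at the first Action."""
    for index, item in enumerate(items):
        if isinstance(item, Action):
            return items[:index], items[index:]
    # Assumption: with no Action present, every item is treated as a guard.
    return list(items), []


def timeouts(items):
    """Pair each StateCheck with the timeout its position implies."""
    guards, body = split_items(items)
    plan = [(check.kind, GUARD_TIMEOUT_S) for check in guards]
    plan += [
        (item.kind, OBSERVATION_TIMEOUT_S)
        for item in body
        if isinstance(item, StateCheck)
    ]
    return plan


items = [
    StateCheck("selector_exists", ".login-form"),
    Action("type", "#username"),
    Action("click", "#submit"),
    StateCheck("url_contains", "/dashboard"),
]
guards, body = split_items(items)
print(len(guards), len(body))
print(timeouts(items))
```

The same StateCheck class appears on both sides of the split; only its index relative to the first Action decides which timeout applies.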

The Verification model

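# Postponed annotation evaluation lets a class refer to types declared later
# in the file (Scenario, Rule) or elsewhere (not shown here): InterruptConfig,
# StateSetup, Action, StateCheck, Emit.
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Verification:
    name: str
    scenarios: list[Scenario]
    interrupts: InterruptConfig      # global dismiss selectors
    state_setup: StateSetup          # suite-level auth/consent

@dataclass
class Scenario:
    name: str
    entry_url: str
    rules: dict[str, Rule]           # insertion-ordered, keyed by rule name

@dataclass
class Rule:
    name: str
    items: list[Action | StateCheck | Emit]
    parents: list[str]               # dependency names
    retry_max: int                   # guard retry count
    rule_type: str                   # "pinned" or "explore"
    # ... options, scope, interrupt overrides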

Why this matters for testing

The deferred model gives aitester-bdd properties that immediate-execution frameworks lack:

| Property | How |
|---|---|
| Dependency ordering | Topo-sort places parents before children |
| Guard-based skipping | If parent failed, child is auto-skipped |
| Retry-with-redo | Replay body → re-check guards (handles AJAX timing) |
| Scope inheritance | Child rules inherit parent's CSS scope prefix |
| Cross-cutting aspects | Every transition fires hooks without touching rule logic |
| Structured failure evidence | Walker knows exactly which step failed, with expected/observed |
| AI diagnosis | Full trajectory is available for the LLM to reason about |
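Two of these properties, auto-skipping children of a failed parent and firing hooks at each transition, can be sketched together. This is a simplified illustration rather than the real walker: walk, run_rule, and the aspect callback signature are hypothetical, and the status strings "pass", "fail", and "skip" are assumed names.

```python
def walk(ordered_rules, parents_by_rule, run_rule, aspects=()):
    """Walk rules in dependency order, skipping children of non-passing parents."""
    verdict = {}
    for name in ordered_rules:
        blocked_by = [
            parent
            for parent in parents_by_rule.get(name, [])
            if verdict.get(parent) != "pass"
        ]
        if blocked_by:
            status = "skip"  # a parent failed or was itself skipped
        else:
            status = "pass" if run_rule(name) else "fail"
        verdict[name] = status
        # Aspects observe every transition without touching rule logic.
        for aspect in aspects:
            aspect(name, status)
    return verdict


ordered = ["login", "see_widgets", "open_settings"]
parents = {"login": [], "see_widgets": ["login"], "open_settings": ["see_widgets"]}
log = []

verdict = walk(
    ordered,
    parents,
    run_rule=lambda name: name != "login",  # simulate a failing login rule
    aspects=[lambda name, status: log.append((name, status))],
)
print(verdict)
print(log)
```

With "login" failing, both dependent rules are skipped without being run, and the logging aspect still records all three transitions. This only works because the rules are walked in dependency order, so a parent's status is always known before its children are reached.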