Running Tests¶

Basic run¶

aitester run suite.robot

This: 1. Reads ${ENGINE} from the suite to pick the browser backend 2. Invokes Robot Framework with the suite 3. The keyword library builds the rule DAG, walks it, and reports pass/fail

Headed mode (watch it run)¶

aitester run suite.robot --headed

Opens a visible browser window. Combine with step delay for slow-motion:

aitester run suite.robot --headed --step-delay 500

Pauses 500ms after each action so you can follow along visually.

Output files¶

After a run, you'll find in the output directory:

File	Content
`output.xml`	Robot Framework's standard output
`log.html`	Detailed HTML log (clickable keywords)
`report.html`	Summary report
`walk_log.jsonl`	Every MDP transition (action, state check, timing)
`failures.jsonl`	Failure context + AI diagnosis per failed rule
`emit.jsonl`	Explicit emit captures (if any)
`fail_*.png`	Screenshots captured on failure

Choosing a backend¶

# Default: playwright (consistent with explore, reliable get_text, native waits)
aitester run suite.robot

# Agent-browser (zero-install, same driver as authoring)
AITESTER_BROWSER=agent-browser aitester run suite.robot

# Nodriver (bot-detection-resistant)
AITESTER_BROWSER=nodriver aitester run suite.robot

First run with playwright requires aitester init-browser (or rfbrowser init) to download browser binaries. The Browser library is auto-imported — suites don't need Library Browser.

Or declare in the suite itself:

*** Variables ***
${ENGINE}    playwright

CI integration¶

# GitHub Actions example
- name: Run E2E tests
  run: |
    pip install aitester-bdd
    npm i -g agent-browser
    aitester run tests/smoke.robot --output-dir results/
- uses: actions/upload-artifact@v4
  if: always()
  with:
    name: test-results
    path: results/

No LLM configuration needed for running authored suites. Zero tokens consumed.

Debugging failures¶

1. Check the verdict output¶

Robot Framework's console output shows which rules passed/failed:

Login Flow :: Verify login and dashboard
    Rule login: PASS (1.2s)
    Rule dashboard_widgets: FAIL (30.1s)
        observation_or_assertion: post-action state check failed
        expected: count >= 5
        observed: 2

2. Read the AI diagnosis¶

If the diagnose aspect is enabled (default), failures.jsonl contains an LLM-written explanation:

{
  "rule": "dashboard_widgets",
  "ai_diagnosis": "The page loaded but only 2 widgets rendered. The API response for /api/widgets returned a 500 error (visible in the network tab), suggesting a backend issue rather than a test problem."
}

3. Check the trajectory¶

walk_log.jsonl has the full MDP trace — every action, state check, and timing:

{"kind": "before_action", "rule": "dashboard_widgets", "action": "click", "target": ".refresh"}
{"kind": "after_action", "rule": "dashboard_widgets", "action": "click", "dt_ms": 87, "raised": false}
{"kind": "state_check", "rule": "dashboard_widgets", "check": "count_at_least", "ok": false, "expected": "count >= 5", "observed": "2", "position": "observation"}

4. Re-run headed¶

aitester run suite.robot --headed --step-delay 1000

Watch exactly what happens. The step delay gives you time to see each action's effect.

5. Disable diagnosis (faster iteration)¶

AITESTER_DISABLE_ASPECTS=diagnose aitester run suite.robot

Skips the LLM call on failure — useful when iterating quickly on a known issue.

Timeouts¶

Scope	Default	Override
Global run	300s	`AITESTER_RUN_TIMEOUT=600`
Observation (after action)	30s	`set rule timeout 60000`
Guard (before action)	10s	`set rule timeout 15000` (guards inherit)

Running with Robot Framework directly¶

Since aitester-bdd is a standard RF library, you can also run suites directly:

robot --outputdir results/ suite.robot

This works but skips the aitester run backend-selection logic. Set AITESTER_BROWSER manually if not using the default.