Smart Agent Test Runs
Merged manual — this file documents three related pages in order: the test-run list, the test-run card, and the per-step results subpage embedded in the card.
In plain terms
A test run is one execution of a test case: the app replays the saved prompts to the agent, captures the actual answers, and then grades them.
The part worth understanding: the grading is done by an AI model acting as a judge (often called LLM as judge). It compares the agent's actual response to the expected one and assigns a verdict — Pass / Partial / Fail — plus a 0–100 score and a short written rationale, per step and overall.
A test run is essentially a report you read after the fact. A conventional code test requires an exact match; here an AI evaluator judges whether the answer is close enough to the expected one, which is why each result carries a score and a rationale as well as a verdict. Each run also totals the credits, latency, and tool calls, so you can see what the test cost.
Tip: Re-evaluate re-grades a finished run without re-asking the agent — useful if you changed scoring settings. Refresh updates progress while a run is still going.
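The difference between running a test and re-evaluating it can be sketched in a few lines of Python. This is an illustration only, not the app's implementation: the `StepResult` class, the `agent` and `judge` callables, and the function names are hypothetical stand-ins for the replay and grading stages described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class StepResult:
    prompt: str
    expected: str
    actual: str = ""
    verdict: str = "Untested"
    score: float = 0.0
    rationale: str = ""

# Hypothetical signatures: the agent answers a prompt; the judge compares the
# expected and actual answers and returns (verdict, score 0-100, rationale).
Agent = Callable[[str], str]
Judge = Callable[[str, str], Tuple[str, float, str]]

def evaluate(steps: List[StepResult], judge: Judge) -> None:
    """Grade already-captured answers. The agent is never called here."""
    for step in steps:
        step.verdict, step.score, step.rationale = judge(step.expected, step.actual)

def run_test(case: List[Tuple[str, str]], agent: Agent, judge: Judge) -> List[StepResult]:
    """Replay each saved prompt, capture the actual answer, then grade."""
    steps = [StepResult(prompt=p, expected=e, actual=agent(p)) for p, e in case]
    evaluate(steps, judge)
    return steps
```

In this sketch, Re-evaluate corresponds to calling `evaluate` again on the stored steps, possibly with a differently configured judge, so the captured answers stay the same while verdicts and scores may change.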
Smart Agent Test Runs (list)
Page type: List
Source table: Smart Agent Test Run
Object: Page 72778352 "SA Test Runs QUA" (Page.72778352.SATestRuns.al)
This page lists every execution of every test case. You can open it standalone to review all historical runs, or filtered to a single test case from the View Runs action on the test case list or card. Each row shows the aggregate outcome so you can compare runs at a glance.
How to open it
- Tell Me (Alt+Q) → search "Smart Agent Test Runs".
- Choose the View Runs action on the Smart Agent Test Cases list or the Smart Agent Test Case card. The list opens pre-filtered to that test case.
Fields
List columns
| Field | Type | Description |
|---|---|---|
| Run No. | Integer | The unique number identifying this test run. Read-only — assigned automatically. |
| Test No. | Integer | The test case that was executed. |
| Test Name | Text[100] | Snapshot of the test name at run time. |
| Started At | DateTime | When this run started. |
| Completed At | DateTime | When this run reached a terminal state. |
| Status | Enum | Lifecycle status of this run. Values: Not Started, Running, Evaluating, Completed, Failed, Cancelled. |
| Verdict | Enum | Aggregate AI verdict for this run. Values: Untested, Pass, Partial, Fail, Error. Colour-coded: Pass = green, Partial = amber, Fail / Error = red. |
| Score | Decimal | Aggregate score (0–100). Blank when zero. |
| Total Latency (ms) | Integer | Sum of per-step wall-clock latencies. Blank when zero. |
| Total Tool Calls | Integer | Total tool calls observed across every step. Blank when zero. |
| Total Credits | Decimal | Sum of credits charged across every step. Blank when zero. |
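Two conventions in this table can be made concrete with a small Python sketch. It is illustrative only: the manual lists the Status values but does not say which ones count as terminal, so treating Completed, Failed, and Cancelled as terminal is an assumption, and the names `RunStatus`, `is_terminal`, and `blank_when_zero` are hypothetical.

```python
from enum import Enum

class RunStatus(Enum):
    NOT_STARTED = "Not Started"
    RUNNING = "Running"
    EVALUATING = "Evaluating"
    COMPLETED = "Completed"
    FAILED = "Failed"
    CANCELLED = "Cancelled"

# Assumption: these are the states after which Completed At is filled in.
TERMINAL_STATUSES = frozenset(
    {RunStatus.COMPLETED, RunStatus.FAILED, RunStatus.CANCELLED}
)

def is_terminal(status: RunStatus) -> bool:
    """True when the run will not change state again (under the assumption above)."""
    return status in TERMINAL_STATUSES

def blank_when_zero(value) -> str:
    """Mimic the 'Blank when zero' display rule for numeric columns."""
    return "" if value == 0 else str(value)
```

A practical consequence of the display rule: an empty Score cell can mean either a score of zero or a run that has not been graded yet, so check Status and Verdict before reading anything into a blank.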
Actions
| Action | What it does |
|---|---|
| View Steps | Show every step of this run. Opens the Smart Agent Test Run card for the selected run. |
| Re-evaluate | Run the AI evaluator again on this run's captured transcripts. Does not replay the prompts. |
Smart Agent Test Run (card)
Page type: Card
Source table: Smart Agent Test Run
Object: Page 72778353 "SA Test Run Card QUA" (Page.72778353.SATestRunCard.al)
The detail view for a single test execution. It shows the run's status and progress while it is still running, the aggregate verdict and score once evaluation is complete, and a per-step breakdown in the embedded Steps subpage. The entire card is read-only.
How to open it
- Select a row in the Smart Agent Test Runs list and press Enter (or choose View Steps).
- Choose Run Now on a test case. The card for the newly created run opens automatically so you can monitor progress.
Fields
General
| Field | Type | Description |
|---|---|---|
| Run No. | Integer | The unique number identifying this test run. Read-only. |
| Test No. | Integer | The test case that was executed. Read-only. |
| Test Name | Text[100] | Snapshot of the test name at run time. Read-only. |
| Agent No. | Integer | The agent under test. Read-only. |
| Started At | DateTime | When this run started. Read-only. |
| Completed At | DateTime | When this run reached a terminal state. Read-only. |
| Status | Enum | Lifecycle status of this run. Values: Not Started, Running, Evaluating, Completed, Failed, Cancelled. Read-only. |
| Current Step No. | Integer | The step that is currently executing in the background session. Read-only. |
| Progress | Text[250] | Live status text written by the background runner. Read-only. |
| Run Session ID | Guid | The chat session created to replay the test prompts. Read-only. |
| Started By | Code[50] | The user who started this run. Read-only. |
Verdict
| Field | Type | Description |
|---|---|---|
| Verdict | Enum | Aggregate AI verdict for this run. Values: Untested, Pass, Partial, Fail, Error. Colour-coded: Pass = green, Partial = amber, Fail / Error = red. Read-only. |
| Score | Decimal | Aggregate score (0–100) produced by the AI evaluator. Read-only. |
| Verdict Summary | Text[2048] | Short rationale extracted from the evaluator verdict. Read-only. |
| Verdict JSON | Text | The full verdict returned by the AI evaluator, as JSON. Read-only. |
| Error Text | Text[2048] | Captured error message when Status = Failed. Read-only. |
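The colour scheme for verdicts amounts to a simple lookup, sketched here in Python. The mapping comes from the descriptions above; the names `VERDICT_COLOURS` and `verdict_colour` are hypothetical, and because the manual states no colour for Untested, the sketch returns `None` for it.

```python
from typing import Optional

# Colours as stated in the field descriptions; Untested has no stated colour.
VERDICT_COLOURS = {
    "Pass": "green",
    "Partial": "amber",
    "Fail": "red",
    "Error": "red",
}

def verdict_colour(verdict: str) -> Optional[str]:
    """Return the display colour for a verdict, or None if none is documented."""
    return VERDICT_COLOURS.get(verdict)
```

Fail and Error share a colour, so the Error Text field is what tells a failed grade apart from a run that could not be graded.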
Totals
| Field | Type | Description |
|---|---|---|
| Step Count | Integer | Number of steps in this run. Read-only. |
| Total Latency (ms) | Integer | Sum of per-step wall-clock latencies. Read-only. |
| Total Tool Calls | Integer | Total tool calls observed across every step. Read-only. |
| Total Credits | Decimal | Sum of credits charged across every step. Read-only. |
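The totals are plain sums over the run's steps. The following Python sketch shows that arithmetic; the `Step` class and `run_totals` function are hypothetical names used for illustration, not objects from the app.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List

@dataclass
class Step:
    latency_ms: int      # wall-clock latency for the step
    tool_calls: int      # tool calls executed for the step
    credits: Decimal     # credits charged for the step

def run_totals(steps: List[Step]) -> Dict[str, object]:
    """Aggregate per-step figures into the run-level totals."""
    return {
        "step_count": len(steps),
        "total_latency_ms": sum(s.latency_ms for s in steps),
        "total_tool_calls": sum(s.tool_calls for s in steps),
        # Start from Decimal("0") so an empty run still yields a Decimal.
        "total_credits": sum((s.credits for s in steps), Decimal("0")),
    }
```

For example, two steps with latencies of 1200 ms and 800 ms, 2 and 0 tool calls, and 0.5 and 0.25 credits give a total latency of 2000 ms, 2 tool calls, and 0.75 credits.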
Steps
The Steps subpage is embedded directly on the card. See Test Run Steps below.
Actions
| Action | What it does |
|---|---|
| Refresh | Refresh the page to show the latest progress written by the background runner. |
| Re-evaluate | Run the AI evaluator again on this run's captured transcripts. Does not replay the prompts. |
Test Run Steps (subpage)
Page type: ListPart (embedded in the Smart Agent Test Run card)
Source table: Smart Agent Test Run Step
Object: Page 72778354 "SA Test Run Steps Sub QUA" (Page.72778354.SATestRunStepsSub.al)
Shows the per-step breakdown for a single test run. Each row corresponds to one prompt replayed during the run and captures the actual agent response, the step-level verdict and score, latency, and any error that occurred. Compare this against the expected outcomes defined in the Test Case Steps subpage to understand where a run diverged from its baseline. All rows are read-only.
How to open it
This subpage is embedded in the Steps section of the Smart Agent Test Run card. It cannot be opened standalone.
Fields
| Field | Type | Description |
|---|---|---|
| Step No. | Integer | Order of this step. Read-only. |
| Prompt Preview | Text[250] | Short preview of the prompt sent. Read-only. |
| Actual Response Preview | Text[250] | Short preview of the actual agent response. Read-only. |
| Step Verdict | Enum | Per-step verdict produced by the AI evaluator. Values: Untested, Pass, Partial, Fail, Error. Colour-coded: Pass = green, Partial = amber, Fail / Error = red. Read-only. |
| Step Score | Decimal | Per-step score (0–100). Blank when zero. Read-only. |
| Actual Tool Call Count | Integer | Tool calls executed for this step. Read-only. |
| Latency (ms) | Integer | Wall-clock latency for this step. Read-only. |
| Credits | Decimal | Credits charged for this step. Blank when zero. Read-only. |
| Step Notes | Text[2048] | Per-step rationale produced by the AI evaluator. Read-only. |
| Error Text | Text[2048] | Error captured if this step failed during execution. Read-only. |
Actions
| Action | What it does |
|---|---|
| View Actual Response | Show the full actual response captured for this step. |
| View Actual Tool Calls | Show the snapshot of tool calls captured for this step. |
Notes
- Use Refresh on the run card while a run is in progress to see updated Status, Progress, and Current Step No. values. The page does not refresh automatically.
- Re-evaluate re-runs the AI evaluator using the responses already captured — it does not send new prompts to the agent. This is useful if evaluator settings or scoring weights have changed since the run completed.
- A run with Status = Failed shows its error message in the Error Text field on the card. A step that failed during execution shows its own error in the Error Text column of the Steps subpage.
- The Verdict colour-coding follows the same scheme throughout: Pass = green, Partial = amber, Fail or Error = red.