Smart Agent Test Runs

A test run is one execution of a test case: the app replays the saved prompts to the agent, captures the actual answers, and then grades them.

Merged manual — this file documents three related pages in order: the test-run list, the test-run card, and the per-step results subpage embedded in the card.

In plain terms

The part worth understanding: the grading is done by an AI model acting as a judge (often called LLM as judge). It compares the agent's actual response to the expected one and assigns a verdict — Pass / Partial / Fail — plus a 0–100 score and a short written rationale, per step and overall.

Think of a test run as a graded report you read after the fact. A conventional code test demands an exact match; here the AI evaluator judges whether the answer is close enough to the expected one, which is why each result carries a score and a rationale as well as a verdict. Each run also totals the credits, latency, and tool calls across its steps, so you can see what the test cost.

Tip: Re-evaluate re-grades a finished run without re-asking the agent — useful if you changed scoring settings. Refresh updates progress while a run is still going.

Smart Agent Test Runs (list)

Page type: List
Source table: Smart Agent Test Run
Object: Page 72778352 "SA Test Runs QUA" (Page.72778352.SATestRuns.al)

This page lists every execution of every test case. You can open it standalone to review all historical runs, or filtered to a single test case from the View Runs action on the test case list or card. Each row shows the aggregate outcome so you can compare runs at a glance.

How to open it

Tell Me (Alt+Q) → search "Smart Agent Test Runs".
View Runs action on the Smart Agent Test Cases list or the Smart Agent Test Case card — opens pre-filtered to that test case.

Fields

List columns

Field	Type	Description
Run No.	Integer	The unique number identifying this test run. Read-only — assigned automatically.
Test No.	Integer	The test case that was executed.
Test Name	Text[100]	Snapshot of the test name at run time.
Started At	DateTime	When this run started.
Completed At	DateTime	When this run reached a terminal state.
Status	Enum	Lifecycle status of this run. Values: Not Started, Running, Evaluating, Completed, Failed, Cancelled.
Verdict	Enum	Aggregate AI verdict for this run. Values: Untested, Pass, Partial, Fail, Error. Colour-coded: Pass = green, Partial = amber, Fail / Error = red.
Score	Decimal	Aggregate score (0 - 100). Blank when zero.
Total Latency (ms)	Integer	Sum of per-step wall-clock latencies. Blank when zero.
Total Tool Calls	Integer	Total tool calls observed across every step. Blank when zero.
Total Credits	Decimal	Sum of credits charged across every step. Blank when zero.
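
Several columns above are marked "Blank when zero". As a rough illustration of that display rule, here is a minimal Python sketch. The function name blank_when_zero is hypothetical, and how the page itself implements the rule is not covered by this manual.

```python
from decimal import Decimal
from typing import Union

Number = Union[int, Decimal]


def blank_when_zero(value: Number) -> str:
    """Return the text a list cell would show under a blank-when-zero rule.

    Hypothetical helper for illustration only; not taken from the app.
    """
    # Zero is rendered as an empty cell, anything else as its plain text form.
    return "" if value == 0 else str(value)


if __name__ == "__main__":
    for cell in (0, 1500, Decimal("0.00"), Decimal("87.5")):
        print(repr(blank_when_zero(cell)))
```

One consequence of such a rule is that a run which genuinely scored 0 shows an empty Score cell, so the Verdict column is the more reliable place to tell a failed run from one that has not been graded yet.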

Actions

Action	What it does
View Steps	Show every step of this run. Opens the Smart Agent Test Run card for the selected run.
Re-evaluate	Run the AI evaluator again on this run's captured transcripts. Does not replay the prompts.

Smart Agent Test Run (card)

Page type: Card
Source table: Smart Agent Test Run
Object: Page 72778353 "SA Test Run Card QUA" (Page.72778353.SATestRunCard.al)

The detail view for a single test execution. It shows the run's status and progress while it is still running, the aggregate verdict and score once evaluation is complete, and a per-step breakdown in the embedded Steps subpage. The entire card is read-only.

How to open it

Select a row in the Smart Agent Test Runs list and press Enter (or choose View Steps).
Opened automatically when you choose Run Now on a test case — the card for the newly created run appears immediately so you can monitor progress.

Fields

General

Field	Type	Description
Run No.	Integer	The unique number identifying this test run. Read-only.
Test No.	Integer	The test case that was executed. Read-only.
Test Name	Text[100]	Snapshot of the test name at run time. Read-only.
Agent No.	Integer	The agent under test. Read-only.
Started At	DateTime	When this run started. Read-only.
Completed At	DateTime	When this run reached a terminal state. Read-only.
Status	Enum	Lifecycle status of this run. Values: Not Started, Running, Evaluating, Completed, Failed, Cancelled. Read-only.
Current Step No.	Integer	The step that is currently executing in the background session. Read-only.
Progress	Text[250]	Live status text written by the background runner. Read-only.
Run Session ID	Guid	The chat session created to replay the test prompts. Read-only.
Started By	Code[50]	The user who started this run. Read-only.
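
The Status values and the phrase "terminal state" used for Completed At can be modelled roughly as follows. This is a sketch under one stated assumption: that Completed, Failed, and Cancelled are the terminal states. The manual lists the six values but does not say which of them are terminal, and the names RunStatus, is_terminal, and should_keep_refreshing are illustrative only.

```python
from enum import Enum


class RunStatus(Enum):
    """The six lifecycle values listed for the Status field."""
    NOT_STARTED = "Not Started"
    RUNNING = "Running"
    EVALUATING = "Evaluating"
    COMPLETED = "Completed"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


# Assumption: these three are the terminal states. The manual only says that
# Completed At is set when the run "reached a terminal state".
TERMINAL_STATUSES = frozenset(
    {RunStatus.COMPLETED, RunStatus.FAILED, RunStatus.CANCELLED}
)


def is_terminal(status: RunStatus) -> bool:
    """True once the run can no longer change status (under the assumption above)."""
    return status in TERMINAL_STATUSES


def should_keep_refreshing(status: RunStatus) -> bool:
    """True while choosing Refresh could still reveal new progress."""
    return not is_terminal(status)


if __name__ == "__main__":
    for status in RunStatus:
        print(f"{status.value}: terminal={is_terminal(status)}")
```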

Verdict

Field	Type	Description
Verdict	Enum	Aggregate AI verdict for this run. Values: Untested, Pass, Partial, Fail, Error. Colour-coded: Pass = green, Partial = amber, Fail / Error = red. Read-only.
Score	Decimal	Aggregate score (0 - 100) produced by the AI evaluator. Read-only.
Verdict Summary	Text[2048]	Short rationale extracted from the evaluator verdict. Read-only.
Verdict JSON	Text	Full verdict returned by the AI evaluator, in JSON form. Read-only.
Error Text	Text[2048]	Captured error message when Status = Failed. Read-only.
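
The colour scheme stated for the Verdict field amounts to a small lookup. The Python sketch below restates it. The names VERDICT_COLOURS and verdict_colour are hypothetical, and Untested maps to no colour because the manual does not assign one.

```python
from typing import Optional

# Colour scheme as stated in the field description:
# Pass = green, Partial = amber, Fail / Error = red.
VERDICT_COLOURS = {
    "Pass": "green",
    "Partial": "amber",
    "Fail": "red",
    "Error": "red",
}


def verdict_colour(verdict: str) -> Optional[str]:
    """Return the documented colour for a verdict, or None if none is documented.

    "Untested" deliberately returns None: the manual states no colour for it.
    """
    return VERDICT_COLOURS.get(verdict)


if __name__ == "__main__":
    for verdict in ("Untested", "Pass", "Partial", "Fail", "Error"):
        print(verdict, "->", verdict_colour(verdict))
```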

Totals

Field	Type	Description
Step Count	Integer	Number of steps in this run. Read-only.
Total Latency (ms)	Integer	Sum of per-step wall-clock latencies. Read-only.
Total Tool Calls	Integer	Total tool calls observed across every step. Read-only.
Total Credits	Decimal	Sum of credits charged across every step. Read-only.
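
The totals are described as plain sums over the run's steps. A minimal Python sketch of that roll-up is shown here. The class and function names (StepMetrics, RunTotals, total_run) are illustrative and not taken from the app, and the sample values are invented.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable


@dataclass(frozen=True)
class StepMetrics:
    """Per-step cost figures, mirroring the step-level columns."""
    latency_ms: int
    tool_calls: int
    credits: Decimal


@dataclass(frozen=True)
class RunTotals:
    """Run-level figures, mirroring the Totals group on the card."""
    step_count: int
    total_latency_ms: int
    total_tool_calls: int
    total_credits: Decimal


def total_run(steps: Iterable[StepMetrics]) -> RunTotals:
    """Sum the per-step figures into run-level totals."""
    step_list = list(steps)
    return RunTotals(
        step_count=len(step_list),
        total_latency_ms=sum(s.latency_ms for s in step_list),
        total_tool_calls=sum(s.tool_calls for s in step_list),
        # Start from Decimal("0") so an empty run still yields a Decimal.
        total_credits=sum((s.credits for s in step_list), Decimal("0")),
    )


if __name__ == "__main__":
    sample = [
        StepMetrics(latency_ms=1200, tool_calls=2, credits=Decimal("0.5")),
        StepMetrics(latency_ms=800, tool_calls=0, credits=Decimal("0.25")),
    ]
    print(total_run(sample))
```

Decimal is used for credits in the sketch because the field is typed Decimal in the table, which avoids binary floating-point rounding when adding fractional amounts.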

Steps

The Steps subpage is embedded directly on the card. See Test Run Steps below.

Actions

Action	What it does
Refresh	Refresh the page to show the latest progress written by the background runner.
Re-evaluate	Run the AI evaluator again on this run's captured transcripts. Does not replay the prompts.
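
The difference between Re-evaluate and a fresh run is that only the grading is repeated: the captured responses stay as they are and the agent is not called. The Python sketch below illustrates that separation. Everything in it is hypothetical (CapturedStep, Judge, re_evaluate), and exact_match_judge is a deliberately crude stand-in for the AI evaluator, which in the real app judges closeness rather than exact equality.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A judge maps (expected, actual) to (verdict, score, rationale).
Judge = Callable[[str, str], Tuple[str, float, str]]


@dataclass
class CapturedStep:
    """One replayed prompt together with its captured and graded result."""
    prompt: str
    expected: str
    actual: str  # captured during the original run; never changed here
    verdict: str = "Untested"
    score: float = 0.0
    notes: str = ""


def re_evaluate(steps: List[CapturedStep], judge: Judge) -> None:
    """Re-grade captured responses in place without contacting the agent."""
    for step in steps:
        step.verdict, step.score, step.notes = judge(step.expected, step.actual)


def exact_match_judge(expected: str, actual: str) -> Tuple[str, float, str]:
    """Placeholder judge for the sketch; NOT how the AI evaluator grades."""
    if expected.strip().lower() == actual.strip().lower():
        return ("Pass", 100.0, "Matches the expected response.")
    return ("Fail", 0.0, "Differs from the expected response.")


if __name__ == "__main__":
    run = [CapturedStep(prompt="Say hello", expected="Hello", actual="hello ")]
    re_evaluate(run, exact_match_judge)
    print(run[0].verdict, run[0].score, run[0].notes)
```

Because re_evaluate takes no agent argument, it cannot send new prompts. Swapping in a different judge function corresponds to re-grading after evaluator settings have changed.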

Test Run Steps (subpage)

Page type: ListPart (embedded in the Smart Agent Test Run card)
Source table: Smart Agent Test Run Step
Object: Page 72778354 "SA Test Run Steps Sub QUA" (Page.72778354.SATestRunStepsSub.al)

Shows the per-step breakdown for a single test run. Each row corresponds to one prompt replayed during the run and captures the actual agent response, the step-level verdict and score, latency, and any error that occurred. Compare this against the expected outcomes defined in the Test Case Steps subpage to understand where a run diverged from its baseline. All rows are read-only.

How to open it

This subpage is embedded in the Steps section of the Smart Agent Test Run card. It cannot be opened standalone.

Fields

Field	Type	Description
Step No.	Integer	Order of this step. Read-only.
Prompt Preview	Text[250]	Short preview of the prompt sent. Read-only.
Actual Response Preview	Text[250]	Short preview of the actual agent response. Read-only.
Step Verdict	Enum	Per-step verdict produced by the AI evaluator. Values: Untested, Pass, Partial, Fail, Error. Colour-coded: Pass = green, Partial = amber, Fail / Error = red. Read-only.
Step Score	Decimal	Per-step score (0 - 100). Blank when zero. Read-only.
Actual Tool Call Count	Integer	Tool calls executed for this step. Read-only.
Latency (ms)	Integer	Wall-clock latency for this step. Read-only.
Credits	Decimal	Credits charged for this step. Blank when zero. Read-only.
Step Notes	Text[2048]	Per-step rationale produced by the AI evaluator. Read-only.
Error Text	Text[2048]	Error captured if this step failed during execution. Read-only.

Actions

Action	What it does
View Actual Response	Show the full actual response captured for this step.
View Actual Tool Calls	Show the snapshot of tool calls captured for this step.

Notes

Use Refresh on the run card while a run is in progress to see updated Status, Progress, and Current Step No. values. The page does not refresh automatically.
Re-evaluate re-runs the AI evaluator using the responses already captured — it does not send new prompts to the agent. This is useful if evaluator settings or scoring weights have changed since the run completed.
A run with Status = Failed will show the error message in the Error Text field on the card. Individual steps that failed during execution show their own error in the Error Text column of the Steps subpage.
The Verdict colour-coding follows the same scheme throughout: Pass = green, Partial = amber, Fail or Error = red.
