Smart Agent Test Runs

A test run is one execution of a test case: the app replays the saved prompts to the agent, captures the actual answers, and then grades them.

Merged manual — this file documents three related pages in order: the test-run list, the test-run card, and the per-step results subpage embedded in the card.

In plain terms

The part worth understanding: grading is done by an AI model acting as a judge (often called LLM-as-judge). It compares the agent's actual response to the expected one and assigns a verdict (Pass, Partial, or Fail), a 0–100 score, and a short written rationale, both for each step and for the run overall.

A test run is essentially a report you read after the fact. Unlike a code test, which requires an exact match, the AI evaluator judges whether the answer is "close enough", which is why you see a score and a rationale as well as a verdict. Each run also totals the credits, latency, and tool calls, so you can see what the test cost.

Tip: Re-evaluate re-grades a finished run without re-asking the agent — useful if you changed scoring settings. Refresh updates progress while a run is still going.


Smart Agent Test Runs (list)

Page type: List
Source table: Smart Agent Test Run
Object: Page 72778352 "SA Test Runs QUA" (Page.72778352.SATestRuns.al)

This page lists every execution of every test case. You can open it standalone to review all historical runs, or filtered to a single test case from the View Runs action on the test case list or card. Each row shows the aggregate outcome so you can compare runs at a glance.

How to open it

  • Open the page on its own to review all historical runs.
  • Choose View Runs on the test case list or card to open it filtered to that test case.

Fields

List columns

| Field | Type | Description |
|---|---|---|
| Run No. | Integer | The unique number identifying this test run. Read-only; assigned automatically. |
| Test No. | Integer | The test case that was executed. |
| Test Name | Text[100] | Snapshot of the test name at run time. |
| Started At | DateTime | When this run started. |
| Completed At | DateTime | When this run reached a terminal state. |
| Status | Enum | Lifecycle status of this run. Values: Not Started, Running, Evaluating, Completed, Failed, Cancelled. |
| Verdict | Enum | Aggregate AI verdict for this run. Values: Untested, Pass, Partial, Fail, Error. Colour-coded: Pass = green, Partial = amber, Fail / Error = red. |
| Score | Decimal | Aggregate score (0–100). Blank when zero. |
| Total Latency (ms) | Integer | Sum of per-step wall-clock latencies. Blank when zero. |
| Total Tool Calls | Integer | Total tool calls observed across every step. Blank when zero. |
| Total Credits | Decimal | Sum of credits charged across every step. Blank when zero. |
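
Several columns above are marked "Blank when zero". The following is a minimal sketch of that display rule in Python, assuming a hypothetical helper named `display_value` (it is not part of the app). It illustrates one consequence worth keeping in mind: a run that genuinely scored 0 renders as an empty cell, just like a run that has not been scored yet.

```python
from decimal import Decimal
from typing import Union

Number = Union[int, Decimal]


def display_value(value: Number) -> str:
    """Render a numeric column the way a 'Blank when zero' column would.

    Hypothetical helper for illustration only; not the app's own code.
    """
    # Zero is shown as an empty cell; every other value is shown as-is.
    return "" if value == 0 else str(value)


if __name__ == "__main__":
    for raw in (Decimal("0"), 87, Decimal("72.5")):
        print(repr(display_value(raw)))
```

When a Score cell is empty, check the Verdict and Status columns to tell an ungraded run from one that scored zero.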

Actions

| Action | What it does |
|---|---|
| View Steps | Show every step of this run. Opens the Smart Agent Test Run card for the selected run. |
| Re-evaluate | Run the AI evaluator again on this run's captured transcripts. Does not replay the prompts. |

Smart Agent Test Run (card)

Page type: Card
Source table: Smart Agent Test Run
Object: Page 72778353 "SA Test Run Card QUA" (Page.72778353.SATestRunCard.al)

The detail view for a single test execution. It shows the run's status and progress while it is still running, the aggregate verdict and score once evaluation is complete, and a per-step breakdown in the embedded Steps subpage. The entire card is read-only.

How to open it

  • Select a row in the Smart Agent Test Runs list and press Enter (or choose View Steps).
  • Choose Run Now on a test case. The card for the newly created run opens automatically so you can monitor progress.

Fields

General

| Field | Type | Description |
|---|---|---|
| Run No. | Integer | The unique number identifying this test run. Read-only. |
| Test No. | Integer | The test case that was executed. Read-only. |
| Test Name | Text[100] | Snapshot of the test name at run time. Read-only. |
| Agent No. | Integer | The agent under test. Read-only. |
| Started At | DateTime | When this run started. Read-only. |
| Completed At | DateTime | When this run reached a terminal state. Read-only. |
| Status | Enum | Lifecycle status of this run. Values: Not Started, Running, Evaluating, Completed, Failed, Cancelled. Read-only. |
| Current Step No. | Integer | The step that is currently executing in the background session. Read-only. |
| Progress | Text[250] | Live status text written by the background runner. Read-only. |
| Run Session ID | Guid | The chat session created to replay the test prompts. Read-only. |
| Started By | Code[50] | The user who started this run. Read-only. |
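
The Status values and the "terminal state" mentioned for Completed At can be sketched as a small Python enum. This is an illustration, not the app's code: the names `RunStatus`, `is_terminal`, and `should_refresh` are hypothetical, and the set of terminal states (Completed, Failed, Cancelled) is an assumption inferred from the value names rather than something this page states.

```python
from enum import Enum


class RunStatus(Enum):
    """Lifecycle values listed for the Status field."""
    NOT_STARTED = "Not Started"
    RUNNING = "Running"
    EVALUATING = "Evaluating"
    COMPLETED = "Completed"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


# Assumption: these three are the terminal states; the page does not list them.
TERMINAL_STATES = frozenset(
    {RunStatus.COMPLETED, RunStatus.FAILED, RunStatus.CANCELLED}
)


def is_terminal(status: RunStatus) -> bool:
    """True once a run can no longer change state (Completed At is set)."""
    return status in TERMINAL_STATES


def should_refresh(status: RunStatus) -> bool:
    """True while pressing Refresh can still show new progress."""
    return not is_terminal(status)


if __name__ == "__main__":
    for status in RunStatus:
        print(f"{status.value}: terminal={is_terminal(status)}")
```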

Verdict

| Field | Type | Description |
|---|---|---|
| Verdict | Enum | Aggregate AI verdict for this run. Values: Untested, Pass, Partial, Fail, Error. Colour-coded: Pass = green, Partial = amber, Fail / Error = red. Read-only. |
| Score | Decimal | Aggregate score (0–100) produced by the AI evaluator. Read-only. |
| Verdict Summary | Text[2048] | Short rationale extracted from the evaluator verdict. Read-only. |
| Verdict JSON | Text | Full evaluator verdict produced by the AI evaluator. Read-only. |
| Error Text | Text[2048] | Captured error message when Status = Failed. Read-only. |

Totals

| Field | Type | Description |
|---|---|---|
| Step Count | Integer | Number of steps in this run. Read-only. |
| Total Latency (ms) | Integer | Sum of per-step wall-clock latencies. Read-only. |
| Total Tool Calls | Integer | Total tool calls observed across every step. Read-only. |
| Total Credits | Decimal | Sum of credits charged across every step. Read-only. |
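
The Totals fields are plain sums over the run's steps. As a minimal sketch of that roll-up in Python, assuming hypothetical `StepResult` and `RunTotals` records that mirror the field names on this page (they are not the app's actual tables):

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Sequence


@dataclass
class StepResult:
    """One row of the Steps subpage (hypothetical shape for illustration)."""
    step_no: int
    latency_ms: int
    tool_calls: int
    credits: Decimal


@dataclass
class RunTotals:
    """The four values shown in the Totals group."""
    step_count: int
    total_latency_ms: int
    total_tool_calls: int
    total_credits: Decimal


def total_run(steps: Sequence[StepResult]) -> RunTotals:
    """Sum per-step latency, tool calls, and credits into run-level totals."""
    return RunTotals(
        step_count=len(steps),
        total_latency_ms=sum(s.latency_ms for s in steps),
        total_tool_calls=sum(s.tool_calls for s in steps),
        # Start from Decimal zero so the result stays a Decimal.
        total_credits=sum((s.credits for s in steps), Decimal("0")),
    )


if __name__ == "__main__":
    steps = [
        StepResult(1, 1200, 2, Decimal("0.5")),
        StepResult(2, 800, 0, Decimal("0.25")),
    ]
    print(total_run(steps))
```

Because latency is summed per step, Total Latency reflects time spent waiting on the agent, not necessarily the gap between Started At and Completed At.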

Steps

The Steps subpage is embedded directly on the card. See Test Run Steps below.

Actions

| Action | What it does |
|---|---|
| Refresh | Refresh the page to show the latest progress written by the background runner. |
| Re-evaluate | Run the AI evaluator again on this run's captured transcripts. Does not replay the prompts. |
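
The difference between running a test and re-evaluating it can be sketched in Python. Everything here is hypothetical and for illustration only: `Step`, `run`, and `re_evaluate` are invented names, the agent is a stub function, and simple string comparisons stand in for the AI evaluator. The point it shows is that re-evaluation grades the stored responses again and never calls the agent.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Agent = Callable[[str], str]                     # prompt -> actual response
Judge = Callable[[str, str], Tuple[str, float]]  # (expected, actual) -> (verdict, score)


@dataclass
class Step:
    prompt: str
    expected: str
    actual: str = ""           # captured response, kept after the run
    verdict: str = "Untested"
    score: float = 0.0


def evaluate(steps: List[Step], judge: Judge) -> None:
    """Grade every step from its stored response."""
    for step in steps:
        step.verdict, step.score = judge(step.expected, step.actual)


def run(steps: List[Step], agent: Agent, judge: Judge) -> None:
    """Replay each prompt to the agent, capture the answer, then grade."""
    for step in steps:
        step.actual = agent(step.prompt)
    evaluate(steps, judge)


def re_evaluate(steps: List[Step], judge: Judge) -> None:
    """Grade again using captured responses; the agent is not called."""
    evaluate(steps, judge)


if __name__ == "__main__":
    calls: List[str] = []

    def stub_agent(prompt: str) -> str:
        calls.append(prompt)
        return "Paris"

    def strict(expected: str, actual: str) -> Tuple[str, float]:
        return ("Pass", 100.0) if expected == actual else ("Fail", 0.0)

    def lenient(expected: str, actual: str) -> Tuple[str, float]:
        ok = expected.lower() in actual.lower()
        return ("Pass", 100.0) if ok else ("Fail", 0.0)

    steps = [Step(prompt="Capital of France?", expected="paris")]
    run(steps, stub_agent, strict)
    print(steps[0].verdict, len(calls))
    re_evaluate(steps, lenient)
    print(steps[0].verdict, len(calls))
```

In this sketch the agent is called once during `run`, and swapping the judge in `re_evaluate` changes the verdict without another agent call, so no new agent responses are generated.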

Test Run Steps (subpage)

Page type: ListPart (embedded in the Smart Agent Test Run card)
Source table: Smart Agent Test Run Step
Object: Page 72778354 "SA Test Run Steps Sub QUA" (Page.72778354.SATestRunStepsSub.al)

Shows the per-step breakdown for a single test run. Each row corresponds to one prompt replayed during the run and captures the actual agent response, the step-level verdict and score, latency, and any error that occurred. Compare this against the expected outcomes defined in the Test Case Steps subpage to understand where a run diverged from its baseline. All rows are read-only.

How to open it

This subpage is embedded in the Steps section of the Smart Agent Test Run card. It cannot be opened standalone.

Fields

| Field | Type | Description |
|---|---|---|
| Step No. | Integer | Order of this step. Read-only. |
| Prompt Preview | Text[250] | Short preview of the prompt sent. Read-only. |
| Actual Response Preview | Text[250] | Short preview of the actual agent response. Read-only. |
| Step Verdict | Enum | Per-step verdict produced by the AI evaluator. Values: Untested, Pass, Partial, Fail, Error. Colour-coded: Pass = green, Partial = amber, Fail / Error = red. Read-only. |
| Step Score | Decimal | Per-step score (0–100). Blank when zero. Read-only. |
| Actual Tool Call Count | Integer | Tool calls executed for this step. Read-only. |
| Latency (ms) | Integer | Wall-clock latency for this step. Read-only. |
| Credits | Decimal | Credits charged for this step. Blank when zero. Read-only. |
| Step Notes | Text[2048] | Per-step rationale produced by the AI evaluator. Read-only. |
| Error Text | Text[2048] | Error captured if this step failed during execution. Read-only. |

Actions

| Action | What it does |
|---|---|
| View Actual Response | Show the full actual response captured for this step. |
| View Actual Tool Calls | Show the snapshot of tool calls captured for this step. |

Notes

  • Use Refresh on the run card while a run is in progress to see updated Status, Progress, and Current Step No. values. The page does not refresh automatically.
  • Re-evaluate re-runs the AI evaluator using the responses already captured — it does not send new prompts to the agent. This is useful if evaluator settings or scoring weights have changed since the run completed.
  • A run with Status = Failed will show the error message in the Error Text field on the card. Individual steps that failed during execution show their own error in the Error Text column of the Steps subpage.
  • The Verdict colour-coding follows the same scheme throughout: Pass = green, Partial = amber, Fail or Error = red.
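
The colour scheme in the last note amounts to a small lookup table. A minimal Python sketch, assuming a hypothetical `verdict_colour` helper; the page does not state a colour for Untested, so the "none" fallback is an assumption:

```python
# Verdict-to-colour mapping as described on this page.
VERDICT_COLOURS = {
    "Pass": "green",
    "Partial": "amber",
    "Fail": "red",
    "Error": "red",
}


def verdict_colour(verdict: str) -> str:
    """Return the colour used for a run-level or step-level verdict."""
    # Assumption: Untested (or any unlisted value) gets no highlight.
    return VERDICT_COLOURS.get(verdict, "none")


if __name__ == "__main__":
    for v in ("Untested", "Pass", "Partial", "Fail", "Error"):
        print(v, "->", verdict_colour(v))
```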