Testing & Benchmarks | Smart Agents

Check that your agents behave as expected. Capture conversations as test cases, replay them as test runs, and compare models or configurations with benchmarks.

Manual	What it covers
Test Cases	Define expected agent behaviour, including saving a conversation as a test case.
Test Runs	Replay test cases and review the results step by step.
Benchmarks	Run benchmarks to compare models or configurations, and review their per-item results.

Related: test cases can be created directly from the chat window.