Prerequisites
Evaluation workflow
A typical CLI evaluation workflow follows four steps:
1. List available evaluations and datasets
2. Inspect an evaluation
3. Run the evaluation
Limit prompts per dataset
Set a custom timeout
Enable debug logging
4. View past results
View full evaluation history
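Taken together, the four steps can be sketched as one shell function. This is a minimal illustration rather than an official script: `run_workflow` is a hypothetical helper name, the evaluation ID is whatever `kya list` reports for your project, and only commands from the command reference are used.

```shell
#!/bin/sh
# Sketch of the four-step workflow for a single evaluation.
# run_workflow is a hypothetical helper, not part of the kya CLI.
# $1 is an evaluation ID taken from the output of `kya list`.
run_workflow() {
    kya list              # 1. list available evaluations and datasets
    kya describe "$1"     # 2. inspect the evaluation
    kya evaluate "$1"     # 3. run the evaluation
    rc=$?                 #    remember its exit code (0 = passed, 1 = failed or error)
    kya result "$1"       # 4. view past results for this evaluation
    kya history           #    and the full history across all evaluations
    return "$rc"          # report the evaluation outcome to the caller
}
```

Calling `run_workflow <id>` prints each step's output in order and returns the exit code of the evaluation step, so the caller can still react to a failed run after seeing its results.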
Command reference
| Command | Description |
|---|---|
| `kya doctor` | Validate DSN, test API connectivity |
| `kya target` | Show product configuration details |
| `kya list` (or `kya ls`) | List evaluations and datasets |
| `kya describe <id>` | Show evaluation details, linked datasets, recent runs |
| `kya evaluate <id>` | Run an evaluation with real-time progress |
| `kya result <id>` | Show run history for an evaluation |
| `kya history` | Show all runs across all evaluations |
Flags reference
| Flag | Commands | Description |
|---|---|---|
| `--max-prompts <n>` | `evaluate` | Max prompts per dataset |
| `--timeout <s>` | `evaluate` | Max wait time in seconds (default: 600) |
| `--debug` | `evaluate` | Enable debug logging |
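As a rough sketch, the three flags might be combined in a single `kya evaluate` call. That the flags can be used together, and that they may follow the evaluation ID, are assumptions here; `smoke_eval` is a hypothetical helper name and the values 5 and 120 are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: a short, verbose run of one evaluation.
# Assumes the flags can be combined and placed after the ID.
smoke_eval() {
    # At most 5 prompts per dataset, wait up to 120 seconds
    # (instead of the 600-second default), with debug logging on.
    kya evaluate "$1" --max-prompts 5 --timeout 120 --debug
}
```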
Scripting and CI/CD
Exit codes
| Code | Meaning |
|---|---|
| 0 | Evaluation passed (score >= threshold) |
| 1 | Evaluation failed or an error occurred |
Shell script example
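The sketch below shows one way a script could gate on the exit codes listed above. It is illustrative only: `ci_gate` is a hypothetical function name, the 900-second timeout is arbitrary, and treating a non-zero exit from `kya doctor` as a failed check is an assumption, since only the exit codes of `kya evaluate` are documented here.

```shell
#!/bin/sh
# Hypothetical CI gate: check the setup, then run one evaluation.
# $1 is the evaluation ID.
ci_gate() {
    # Assumption: `kya doctor` exits non-zero when a check fails.
    if ! kya doctor; then
        echo "kya doctor reported a problem; skipping evaluation" >&2
        return 1
    fi

    # Documented: exit code 0 means passed, 1 means failed or error.
    if kya evaluate "$1" --timeout 900; then
        echo "Evaluation $1 passed"
        return 0
    else
        echo "Evaluation $1 failed or an error occurred" >&2
        return 1
    fi
}
```

In a pipeline step, `ci_gate "$EVAL_ID" || exit 1` would stop the job when the evaluation does not pass, where `EVAL_ID` is a variable you define yourself.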
GitHub Actions example
Tips
Quick smoke test
Use `--max-prompts 5` for a fast sanity check before running the full evaluation.
Always run doctor first
If something seems wrong, `kya doctor` will tell you exactly which part of the pipeline is broken.
Combine with SDK
Use the CLI for ad-hoc testing and the SDK for automated pipelines. Both create runs visible in the dashboard.
Pipe output
CLI output is designed for both humans and machines. Combine it with `grep` or `jq` for custom reporting.
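A small filtering sketch, assuming each line of `kya history` output mentions the evaluation ID it belongs to. The output format is not specified in this section, so check it against real output before relying on this; `runs_for` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: keep only the history lines for one evaluation.
# Assumes every line printed by `kya history` contains the evaluation ID.
runs_for() {
    # -F matches the ID as a literal string, not a regular expression;
    # -- keeps an ID that starts with "-" from being read as an option.
    kya history | grep -F -- "$1"
}
```

Because the match is a plain substring test, an ID such as `eval-1` would also match lines for `eval-10`; anchor the pattern more tightly if your IDs overlap.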