CLI Evaluations - Know Your AI

The Know Your AI CLI lets you run evaluations, check results, and manage datasets directly from your terminal. It is ideal for quick testing, scripting, and CI/CD integration.

Prerequisites

Install the CLI

npm install -g @know-your-ai/cli

Set your DSN

export KNOW_YOUR_AI_DSN="https://kya_xxx:da2-xxx@host/product-id"

Get your DSN from the Know Your AI dashboard under Product Settings > API Keys.

Verify your setup

kya doctor

All six checks should show ✔.
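The setup steps above can be wrapped in a small preflight check for scripts. This is a minimal sketch, not part of the CLI: `preflight` is a hypothetical helper name, and it assumes that `kya doctor` exits with a non-zero status when a check fails, which this guide does not state.

```shell
# Preflight sketch: confirm the CLI and DSN are in place, then run `kya doctor`.
# Assumption: `kya doctor` exits non-zero when one of its checks fails.
preflight() {
  if ! command -v kya >/dev/null 2>&1; then
    echo "kya not found on PATH; install it with: npm install -g @know-your-ai/cli" >&2
    return 1
  fi
  if [ -z "${KNOW_YOUR_AI_DSN:-}" ]; then
    echo "KNOW_YOUR_AI_DSN is not set" >&2
    return 1
  fi
  # The function's status is whatever `kya doctor` returns.
  kya doctor
}

# Usage: preflight || exit 1
```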

Evaluation workflow

A typical CLI evaluation workflow follows four steps:

kya list  →  kya describe  →  kya evaluate  →  kya result

1. List available evaluations and datasets

kya list

Output:

Evaluations for product prod_xxxxxxxxx

  ID                    Name                    Category    Runs
  eval-abc-123          Jailbreak Resistance    security    12
  eval-def-456          PII Protection          privacy     8
  eval-ghi-789          Content Safety          safety      5

Datasets for product prod_xxxxxxxxx

  ID                    Name                    Category    Prompts
  ds-001                DAN Prompts             jailbreak   50
  ds-002                PAIR Attacks            injection   120
  ds-003                PII Extraction          privacy     75
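For scripting, the evaluation IDs can be pulled out of this listing. The sketch below assumes the output keeps the layout of the sample above (ID in the first column, evaluation IDs starting with `eval-`); `eval_ids` is a hypothetical helper, not a CLI command.

```shell
# Print evaluation IDs from `kya list` output read on stdin.
# Assumption: rows match the sample layout, with the ID as the first column
# and evaluation IDs prefixed with "eval-" (dataset rows are skipped).
eval_ids() {
  awk '$1 ~ /^eval-/ { print $1 }'
}

# Usage: kya list | eval_ids
```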

2. Inspect an evaluation

kya describe eval-abc-123

Output:

Evaluation: Jailbreak Resistance

  ID:              eval-abc-123
  Name:            Jailbreak Resistance
  Category:        security
  Judgment Model:  gemini-2.0-flash
  Threshold:       0.80
  Created:         2025-01-15T10:30:00Z

  Linked Datasets:
    ds-001    DAN Prompts       50 prompts
    ds-002    PAIR Attacks     120 prompts

  Recent Runs:
    Run ID              Status      Score     Date
    run-003             completed   94.5%     2025-01-18
    run-002             completed   97.0%     2025-01-15
    run-001             completed   92.0%     2025-01-10

3. Run the evaluation

kya evaluate eval-abc-123

The CLI shows real-time progress:

Starting evaluation run for eval-abc-123...
Run ID: run-004

  ████████████████████░░░░░░░░  72% | 122/170 tests

  Status: running
  Secure: 118
  Vulnerable: 4
  Elapsed: 2m 34s

When complete:

Evaluation complete!

  Run ID:       run-004
  Status:       completed
  Score:        96.5%
  Secure:       164
  Vulnerable:   6
  Total:        170
  Duration:     3m 42s

  Result: PASSED (threshold: 80%)
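If a script needs a single value from this summary, it can be extracted from captured output. The following is a sketch under the assumption that the summary keeps the "Label: value" layout shown above; `summary_field` is a hypothetical helper and only handles one-word labels with one-token values (Status, Score, Secure, Vulnerable, Total).

```shell
# Print the value of one summary field from `kya evaluate` output on stdin.
# Assumption: fields appear as "Label:   value" lines, as in the sample above.
# The last match wins, so progress lines printed earlier in the run
# (for example "Secure: 118") are overridden by the final summary.
summary_field() {
  awk -v label="$1:" '$1 == label { value = $2 } END { if (value != "") print value }'
}

# Usage: kya evaluate eval-abc-123 | summary_field Score
```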

Limit prompts per dataset

kya evaluate eval-abc-123 --max-prompts 10

This sends at most 10 prompts from each linked dataset, which is useful for quick smoke tests.
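To estimate how large a capped run will be, the per-dataset limit can be applied to each dataset size. This is a worked sketch only: `capped_total` is a hypothetical helper, and it assumes a dataset smaller than the cap is sent in full.

```shell
# Upper bound on prompts sent: each dataset contributes min(size, cap).
# Arguments: the cap first, then one prompt count per linked dataset.
capped_total() {
  local cap="$1" total=0 size
  shift
  for size in "$@"; do
    if [ "$size" -lt "$cap" ]; then
      total=$((total + size))
    else
      total=$((total + cap))
    fi
  done
  echo "$total"
}

# For the two datasets linked to eval-abc-123 (50 and 120 prompts):
# capped_total 10 50 120   prints 20
```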

Set a custom timeout

kya evaluate eval-abc-123 --timeout 1200

This waits up to 20 minutes (1200 seconds) for the evaluation to complete. The default is 10 minutes (600 seconds).
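Because `--timeout` takes seconds, a thin wrapper can accept minutes and do the conversion. This is a sketch; `run_eval_minutes` is a hypothetical helper name, not something the CLI provides.

```shell
# Run an evaluation with the timeout given in minutes.
# `--timeout` expects seconds, so the value is multiplied by 60.
run_eval_minutes() {
  local eval_id="$1"
  local minutes="${2:-10}"   # 10 minutes matches the documented default of 600 seconds
  kya evaluate "$eval_id" --timeout "$((minutes * 60))"
}

# Usage: run_eval_minutes eval-abc-123 20   # same as --timeout 1200
```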

Enable debug logging

kya evaluate eval-abc-123 --debug

Shows detailed HTTP request/response logs and internal state transitions.

4. View past results

kya result eval-abc-123

Output:

Run History for eval-abc-123 (Jailbreak Resistance)

  Run ID     Status      Score    Secure  Vulnerable  Total   Date
  run-004    completed   96.5%    164     6           170     2025-01-20
  run-003    completed   94.5%    161     9           170     2025-01-18
  run-002    completed   97.0%    165     5           170     2025-01-15
  run-001    completed   92.0%    156     14          170     2025-01-10
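The run history can also be used to spot regressions between runs. The sketch below assumes the table layout shown above, with the newest run listed first and the score in the third column; `score_delta` is a hypothetical helper.

```shell
# Print the score change between the two most recent runs (newest minus previous).
# Assumption: rows start with "run-", the newest run is listed first,
# and the score is the third column in "NN.N%" form.
# Exits with status 1 if fewer than two runs are present.
score_delta() {
  awk '$1 ~ /^run-/ { sub(/%$/, "", $3); score[++n] = $3 }
       END { if (n < 2) exit 1; printf "%.1f\n", score[1] - score[2] }'
}

# Usage: kya result eval-abc-123 | score_delta
```

A negative value means the latest run scored lower than the one before it.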

View full evaluation history

kya history

Shows all evaluation runs across all evaluations in your product:

All Evaluation Runs for prod_xxxxxxxxx

  Run ID     Evaluation           Score    Status      Date
  run-012    Content Safety       98.0%    completed   2025-01-20
  run-004    Jailbreak Resistance 96.5%    completed   2025-01-20
  run-011    PII Protection       91.0%    completed   2025-01-19
  ...

Command reference

Command	Description
`kya doctor`	Validate DSN, test API connectivity
`kya target`	Show product configuration details
`kya list` (or `kya ls`)	List evaluations and datasets
`kya describe <id>`	Show evaluation details, linked datasets, recent runs
`kya evaluate <id>`	Run an evaluation with real-time progress
`kya result <id>`	Show run history for an evaluation
`kya history`	Show all runs across all evaluations

Flags reference

Flag	Commands	Description
`--max-prompts <n>`	`evaluate`	Max prompts per dataset
`--timeout <s>`	`evaluate`	Max wait time in seconds (default: 600)
`--debug`	`evaluate`	Enable debug logging

Scripting and CI/CD

Exit codes

Code	Meaning
`0`	Evaluation passed (score >= threshold)
`1`	Evaluation failed or error occurred
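These exit codes make it possible to run several evaluations and report every failure, rather than stopping at the first one. The sketch below relies only on the exit codes in the table above; `run_all` is a hypothetical helper name.

```shell
# Run each evaluation ID in turn, keep going after a failure,
# and return non-zero if any evaluation did not exit 0.
run_all() {
  local failed="" eval_id
  for eval_id in "$@"; do
    if kya evaluate "$eval_id"; then
      echo "PASS $eval_id"
    else
      echo "FAIL $eval_id"
      failed="$failed $eval_id"
    fi
  done
  # Succeeds only when the list of failed IDs is empty.
  [ -z "$failed" ]
}

# Usage: run_all eval-abc-123 eval-def-456 || exit 1
```

Note that exit code 1 covers both a failed evaluation and an error, so this sketch cannot tell the two apart.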

Shell script example

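#!/usr/bin/env bash
# Stop at the first failing command; kya evaluate exits 1 on a failed evaluation or error.
set -euo pipefail

# KYA_DSN must be provided by the calling environment; abort with a message if it is missing.
export KNOW_YOUR_AI_DSN="${KYA_DSN:?KYA_DSN is not set}"

echo "Running security evaluation..."
kya evaluate eval-abc-123 --max-prompts 20 --timeout 300

echo "Running privacy evaluation..."
kya evaluate eval-def-456 --max-prompts 20 --timeout 300

# Reached only if every evaluation above exited 0.
echo "All evaluations passed!"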

GitHub Actions example

name: AI Security Check
on: [push]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm install -g @know-your-ai/cli
      - name: Run evaluations
        env:
          # Assumes a repository secret named KNOW_YOUR_AI_DSN (repo Settings > Secrets)
          KNOW_YOUR_AI_DSN: ${{ secrets.KNOW_YOUR_AI_DSN }}
        run: |
          kya doctor
          kya evaluate eval-abc-123 --max-prompts 50

Tips

Quick smoke test

Use --max-prompts 5 for a fast sanity check before running the full evaluation.

Always run doctor first

If something seems wrong, kya doctor will tell you exactly which part of the pipeline is broken.

Combine with SDK

Use the CLI for ad-hoc testing and the SDK for automated pipelines. Both create runs visible in the dashboard.

Pipe output

CLI output is designed for both humans and machines. Combine with grep or jq for custom reporting.
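As one example of custom reporting, the `kya history` table can be filtered for runs that scored below a chosen bar. This is a sketch under the assumption that the output keeps the layout shown in this guide; `runs_below` is a hypothetical helper. It searches each row for the percentage field instead of using a fixed column, because evaluation names contain spaces.

```shell
# Print "run-id score" for each run in `kya history` output (stdin)
# whose score is below the given bar.
# Assumption: run rows start with "run-" and contain one "NN.N%" field.
runs_below() {
  awk -v bar="$1" '
    $1 ~ /^run-/ {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^[0-9]+(\.[0-9]+)?%$/) {
          score = $i
          sub(/%$/, "", score)
          if (score + 0 < bar + 0) print $1, score
        }
    }'
}

# Usage: kya history | runs_below 95
```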
