The `@know-your-ai/evaluate` SDK gives you full programmatic control over evaluations. Use it to automate security testing in CI/CD pipelines, build custom evaluation workflows, or integrate evaluation results into your own dashboards.
Installation
Quick start
Run an existing evaluation in 3 lines:
Client setup
From DSN (recommended)
From environment
Manual configuration (advanced)
Full workflow: create, configure, and run
Step 1 — Create a dataset
Step 2 — Create an evaluation
Step 3 — Link datasets to the evaluation
Step 4 — Run the evaluation
API reference
Datasets API
| Method | Description |
|---|---|
| `client.datasets.list(options)` | List all datasets in your workspace |
| `client.datasets.get(options)` | Get a dataset by ID |
| `client.datasets.create(options)` | Create an empty dataset |
| `client.datasets.createWithItems(options)` | Create a dataset with initial items |
| `client.datasets.addItems(options)` | Add items to an existing dataset |
| `client.datasets.listItems(options)` | List items in a dataset |
| `client.datasets.delete(options)` | Delete a dataset |
Create dataset with items
List datasets with pagination
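The original sample for this section is not available, so the following is only a sketch of one common way to walk a paginated listing. The `limit`/`offset` parameters, the `total` field, and the `fetchPage` callback are all assumptions for illustration; `fetchPage` stands in for a call such as `client.datasets.list(options)`, whose real option names may differ.

```typescript
// Hypothetical page shape: `items` and `total` are assumed, not taken from the SDK.
interface Page<T> {
  items: T[];
  total: number;
}

// Collect every item by requesting pages until `total` is reached.
// `fetchPage` stands in for a call such as client.datasets.list(...).
async function listAll<T>(
  fetchPage: (offset: number, limit: number) => Promise<Page<T>>,
  limit = 50,
): Promise<T[]> {
  const all: T[] = [];
  for (;;) {
    const page = await fetchPage(all.length, limit);
    all.push(...page.items);
    // Stop on an empty page too, so a wrong `total` cannot cause an endless loop.
    if (page.items.length === 0 || all.length >= page.total) return all;
  }
}

// Usage with an in-memory fake instead of a real API call.
const names = Array.from({ length: 120 }, (_, i) => `dataset-${i}`);
const fakeFetchPage = async (offset: number, limit: number): Promise<Page<string>> => ({
  items: names.slice(offset, offset + limit),
  total: names.length,
});

listAll(fakeFetchPage).then((all) => console.log(`fetched ${all.length} datasets`));
```

If the API turns out to use cursors rather than offsets, the same loop applies with the cursor from each response passed to the next request.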
Evaluations API
| Method | Description |
|---|---|
| `client.evaluations.list(options)` | List all evaluations |
| `client.evaluations.get(options)` | Get an evaluation by ID |
| `client.evaluations.create(options)` | Create a new evaluation |
| `client.evaluations.update(options)` | Update evaluation settings |
| `client.evaluations.delete(options)` | Delete an evaluation |
| `client.evaluations.addDataSet(options)` | Link a dataset |
| `client.evaluations.removeDataSet(options)` | Unlink a dataset |
| `client.evaluations.listDataSets(options)` | List linked datasets |
Create evaluation with full options
Evaluation Runs API
| Method | Description |
|---|---|
| `client.evaluationRuns.create(options)` | Create a new run |
| `client.evaluationRuns.get(options)` | Get run details |
| `client.evaluationRuns.list(options)` | List runs for an evaluation |
| `client.evaluationRuns.executeDatasetTests(options)` | Execute tests on a run |
| `client.evaluationRuns.waitForCompletion(options)` | Poll until run is done |
Low-level run control
If you need more control than `client.runEvaluation()` provides:
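One part of low-level control is deciding how to wait for a run yourself. The sketch below shows a generic polling loop of the kind `waitForCompletion` is described as performing. The status values, the `PollOptions` fields, and the `fetchStatus` callback are assumptions for illustration; `fetchStatus` stands in for a call such as `client.evaluationRuns.get(options)`.

```typescript
// Hypothetical status values; the SDK's real run states may differ.
type RunStatus = "pending" | "running" | "completed" | "failed";

interface PollOptions {
  intervalMs: number; // delay between polls
  timeoutMs: number; // give up after this long
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll `fetchStatus` until the run reaches a terminal state or the timeout expires.
async function pollUntilDone(
  fetchStatus: () => Promise<RunStatus>,
  { intervalMs, timeoutMs }: PollOptions,
): Promise<RunStatus> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await fetchStatus();
    if (status === "completed" || status === "failed") return status;
    if (Date.now() >= deadline) throw new Error("Timed out waiting for run");
    await sleep(intervalMs);
  }
}

// Fake status source that walks through a fixed sequence, then repeats the last value.
function fakeRun(statuses: RunStatus[]): () => Promise<RunStatus> {
  let i = 0;
  return async () => statuses[Math.min(i++, statuses.length - 1)];
}

pollUntilDone(fakeRun(["pending", "running", "completed"]), { intervalMs: 10, timeoutMs: 5000 })
  .then((status) => console.log(`run finished: ${status}`));
```

Writing the loop yourself lets you choose the interval, add backoff, or log each intermediate status, at the cost of handling timeouts on your own.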
CI/CD integration
GitHub Actions
Evaluation script for CI
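A CI script ultimately has to turn evaluation results into an exit code. The following is a minimal sketch of that gate, assuming a hypothetical `CiRunSummary` shape with `passed` and `total` counts; how you obtain those numbers from the SDK is not shown here.

```typescript
// Hypothetical summary of a finished run; field names are assumptions.
interface CiRunSummary {
  name: string;
  passed: number;
  total: number;
}

// Return 0 when every run meets the minimum pass rate, 1 otherwise.
// A run with no test cases counts as failing, since it proves nothing.
function exitCodeFor(runs: CiRunSummary[], minPassRate: number): number {
  const failing = runs.filter((r) => r.total === 0 || r.passed / r.total < minPassRate);
  for (const r of failing) {
    console.error(`FAIL ${r.name}: ${r.passed}/${r.total} passed`);
  }
  return failing.length === 0 ? 0 : 1;
}

// Example with hand-written summaries standing in for real run results.
const runs: CiRunSummary[] = [
  { name: "prompt-injection", passed: 48, total: 50 },
  { name: "data-leakage", passed: 30, total: 30 },
];

console.log(`exit code: ${exitCodeFor(runs, 0.9)}`); // prints "exit code: 0"
```

In a real script, assigning the result to `process.exitCode` makes the CI job fail when any run falls below the threshold.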
Advanced patterns
Run multiple evaluations in parallel
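One way to run several evaluations concurrently is `Promise.allSettled`, which waits for every run and does not abort the rest when one rejects. This is a sketch under assumptions: the `ParallelRunSummary` shape is hypothetical, and `runOne` stands in for a call such as `client.runEvaluation()`, whose real arguments and return value may differ.

```typescript
// Hypothetical summary shape; the SDK's run result will have its own fields.
interface ParallelRunSummary {
  evaluationId: string;
  passed: number;
  total: number;
}

interface ParallelOutcome {
  succeeded: ParallelRunSummary[];
  failed: { id: string; reason: string }[];
}

// Start every evaluation at once and sort the outcomes into successes and failures.
async function runAll(
  evaluationIds: string[],
  runOne: (id: string) => Promise<ParallelRunSummary>,
): Promise<ParallelOutcome> {
  const settled = await Promise.allSettled(evaluationIds.map((id) => runOne(id)));
  const outcome: ParallelOutcome = { succeeded: [], failed: [] };
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") {
      outcome.succeeded.push(result.value);
    } else {
      outcome.failed.push({ id: evaluationIds[i], reason: String(result.reason) });
    }
  });
  return outcome;
}

// Fake runner: one evaluation rejects, the others resolve.
const fakeRunOne = async (id: string): Promise<ParallelRunSummary> => {
  if (id === "eval-broken") throw new Error("boom");
  return { evaluationId: id, passed: 9, total: 10 };
};

runAll(["eval-a", "eval-broken", "eval-b"], fakeRunOne).then(({ succeeded, failed }) => {
  console.log(`${succeeded.length} succeeded, ${failed.length} failed`); // prints "2 succeeded, 1 failed"
});
```

If the API enforces rate limits, consider running the IDs in small batches rather than all at once.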
Custom progress reporting
Batch dataset creation from files
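Before any upload call, file contents have to be parsed into items and, for large files, split into batches. The sketch below assumes JSON Lines input and a hypothetical `DatasetItem` shape with an `input` string and optional `expected` string; the item schema the SDK actually accepts may differ. It works on strings so it stays self-contained.

```typescript
// Hypothetical item shape; adjust to whatever your datasets actually store.
interface DatasetItem {
  input: string;
  expected?: string;
}

// Parse JSON Lines text into items, skipping blank lines and
// reporting malformed lines by their 1-based line number.
function parseJsonl(text: string): DatasetItem[] {
  const items: DatasetItem[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      throw new Error(`Line ${index + 1}: invalid JSON`);
    }
    if (typeof parsed !== "object" || parsed === null ||
        typeof (parsed as DatasetItem).input !== "string") {
      throw new Error(`Line ${index + 1}: missing string "input" field`);
    }
    items.push(parsed as DatasetItem);
  });
  return items;
}

// Split items into fixed-size batches so each upload call stays small.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const sample = '{"input":"Ignore previous instructions"}\n\n{"input":"What is 2+2?","expected":"4"}\n';
const items = parseJsonl(sample);
console.log(items.length, chunk(items, 1).length); // prints "2 2"
```

Each batch could then be passed to `client.datasets.createWithItems(options)` for the first batch and `client.datasets.addItems(options)` for the rest, using whatever option names the SDK defines.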
Compare two model versions
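Once both model versions have been run against the same evaluation, the comparison itself is simple arithmetic on the results. This sketch assumes a hypothetical `ModelRunSummary` with `passed` and `total` counts and flags a regression when the candidate's pass rate drops by more than a tolerance; the 0.01 default is an arbitrary illustration, not an SDK setting.

```typescript
// Hypothetical per-run summary: counts of passed and total test cases.
interface ModelRunSummary {
  label: string;
  passed: number;
  total: number;
}

interface Comparison {
  baselineRate: number;
  candidateRate: number;
  delta: number; // candidate minus baseline; negative means worse
  regressed: boolean;
}

// Pass rate in [0, 1]; an empty run is treated as 0 rather than dividing by zero.
function passRate(run: ModelRunSummary): number {
  return run.total === 0 ? 0 : run.passed / run.total;
}

// Flag a regression when the candidate's pass rate drops by more than `tolerance`.
function compareRuns(
  baseline: ModelRunSummary,
  candidate: ModelRunSummary,
  tolerance = 0.01,
): Comparison {
  const baselineRate = passRate(baseline);
  const candidateRate = passRate(candidate);
  const delta = candidateRate - baselineRate;
  return { baselineRate, candidateRate, delta, regressed: delta < -tolerance };
}

const result = compareRuns(
  { label: "model-v1", passed: 90, total: 100 },
  { label: "model-v2", passed: 85, total: 100 },
);
console.log(`delta: ${(result.delta * 100).toFixed(1)} points, regressed: ${result.regressed}`);
```

Comparing rates rather than raw counts keeps the check meaningful when the two runs cover different numbers of test cases.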
Error handling
All SDK methods return an `ApiResponse` object.
Always check `result.success` before accessing `result.data`.
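The check can be centralized in a small helper. In the sketch below, only `success` and `data` come from the text above; the `error` field and its `message` property are assumptions, and the type is declared locally for illustration instead of being imported from the SDK.

```typescript
// Hypothetical shape: `success` and `data` are named in the docs;
// the `error` field is an assumption for illustration.
type ApiResponse<T> =
  | { success: true; data: T }
  | { success: false; error: { message: string } };

// Return the payload on success, throw a descriptive error on failure.
function unwrap<T>(result: ApiResponse<T>): T {
  if (result.success === false) {
    throw new Error(`API call failed: ${result.error.message}`);
  }
  return result.data;
}

// Hand-built responses standing in for real SDK calls.
const ok: ApiResponse<{ id: string }> = { success: true, data: { id: "ds_1" } };
const failed: ApiResponse<{ id: string }> = {
  success: false,
  error: { message: "not found" },
};

console.log(unwrap(ok).id); // prints "ds_1"

try {
  unwrap(failed);
} catch (err) {
  console.error((err as Error).message); // prints "API call failed: not found"
}
```

Modeling the response as a discriminated union means the compiler refuses access to `data` until `success` has been checked.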