Documentation Index
Fetch the complete documentation index at: https://hydroxai.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Aegis is an AI safety dataset designed for evaluating content moderation and safety guardrails in large language models. It provides labeled examples across critical safety categories, enabling teams to benchmark their AI systems’ ability to detect and refuse harmful content. Know Your AI includes Aegis data in the Dataset Marketplace for comprehensive safety testing alongside other safety benchmarks like BeaverTails.Safety categories
Aegis covers key content safety risk areas:| Category | Description |
|---|---|
| Hate speech | Content targeting individuals or groups based on protected characteristics |
| Sexual content | Sexually explicit or suggestive material |
| Violence | Content depicting, promoting, or glorifying violence |
| Self-harm | Content promoting or instructing self-harm or suicide |
| Criminal activity | Content facilitating or instructing illegal activities |
| Dangerous content | Content that could lead to physical harm if followed |
| Harassment | Content intended to bully, intimidate, or harass individuals |
| Misinformation | Demonstrably false claims presented as fact |
Dataset characteristics
- Multi-label annotations — Each example can be tagged with multiple safety categories
- Severity grading — Violations are scored by severity to enable nuanced evaluation
- Diverse attack vectors — Includes direct requests, role-play scenarios, and obfuscated prompts
- Contextual variety — Spans conversational, instructional, and creative writing contexts
Use in Know Your AI
Aegis datasets in the Marketplace enable you to:- Test content moderation — Verify your AI correctly identifies and refuses unsafe content
- Evaluate guardrail coverage — Check for gaps across all safety categories
- Baseline comparison — Compare your model’s safety performance against standard benchmarks
- Firewall validation — Test whether the Know Your AI firewall catches safety violations your model misses
Evaluation workflow
- Add Aegis datasets to your workspace from the Marketplace
- Compose an evaluation selecting Aegis along with your other test datasets
- Monitor the real-time console as prompts are sent and judged
- Review per-category results to identify safety gaps
- Enable the firewall and re-evaluate to measure improvement
Aegis vs. BeaverTails
| Aspect | Aegis | BeaverTails |
|---|---|---|
| Focus | Content moderation & guardrail testing | Broad safety alignment |
| Categories | 8 core safety categories | 14 harm categories |
| Scale | Focused evaluation set | 300K+ annotated pairs |
| Primary use | Guardrail gap analysis | Safety benchmarking & RLHF |
| Annotation | Multi-label with severity | Binary safe/unsafe per category |
Resources
BeaverTails
Compare with BeaverTails safety dataset.
Datasets
Browse all datasets in the Marketplace.