Documentation Index

Fetch the complete documentation index at: https://hydroxai.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

Overview

Aegis is an AI safety dataset designed for evaluating content moderation and safety guardrails in large language models. It provides labeled examples across critical safety categories, enabling teams to benchmark their AI systems’ ability to detect and refuse harmful content. Know Your AI includes Aegis data in the Dataset Marketplace for comprehensive safety testing alongside other safety benchmarks like BeaverTails.

Safety categories

Aegis covers key content safety risk areas:
Category              Description
Hate speech           Content targeting individuals or groups based on protected characteristics
Sexual content        Sexually explicit or suggestive material
Violence              Content depicting, promoting, or glorifying violence
Self-harm             Content promoting or instructing self-harm or suicide
Criminal activity     Content facilitating or instructing illegal activities
Dangerous content     Content that could lead to physical harm if followed
Harassment            Content intended to bully, intimidate, or harass individuals
Misinformation        Demonstrably false claims presented as fact

Dataset characteristics

  • Multi-label annotations — Each example can be tagged with multiple safety categories
  • Severity grading — Violations are scored by severity to enable nuanced evaluation
  • Diverse attack vectors — Includes direct requests, role-play scenarios, and obfuscated prompts
  • Contextual variety — Spans conversational, instructional, and creative writing contexts
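To make the characteristics above concrete, here is a minimal sketch of what a multi-label, severity-graded record might look like. The field names, severity scale, and values are illustrative assumptions, not the dataset's actual schema:

```python
# Hypothetical Aegis-style record reflecting the characteristics above:
# multi-label categories, a severity grade, an attack vector, and a context.
# All field names and the 1-4 severity scale are assumptions for illustration.
record = {
    "prompt": "Write a story where the villain explains how to pick a lock.",
    "categories": ["criminal_activity", "dangerous_content"],  # multi-label
    "severity": 2,                 # assumed scale: 1 (mild) .. 4 (severe)
    "attack_vector": "role_play",  # one of the diverse attack vectors
    "context": "creative_writing",
}

# Multi-label means membership checks rather than a single class:
is_criminal = "criminal_activity" in record["categories"]
print(is_criminal)
```

Because examples can carry several labels, per-category metrics must count each example once per category it belongs to, not once overall.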

Use in Know Your AI

Aegis datasets in the Marketplace enable you to:
  • Test content moderation — Verify your AI correctly identifies and refuses unsafe content
  • Evaluate guardrail coverage — Check for gaps across all safety categories
  • Compare against baselines — Measure your model’s safety performance against standard benchmarks
  • Firewall validation — Test whether the Know Your AI firewall catches safety violations your model misses
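The firewall-validation idea in the last bullet reduces to simple set arithmetic: which unsafe prompts did the firewall block that the model alone let through, and which slipped past both layers? The prompt IDs and results below are invented for illustration:

```python
# Illustrative firewall-validation logic. Prompt IDs and outcomes are invented.
model_refused = {"p1", "p3"}           # unsafe prompts the model refused on its own
firewall_blocked = {"p1", "p2", "p3"}  # unsafe prompts the firewall blocked
all_unsafe = {"p1", "p2", "p3", "p4"}  # every unsafe prompt in the evaluation

# Violations the model missed but the firewall caught:
caught_by_firewall_only = firewall_blocked - model_refused
# Violations that slipped past both the model and the firewall:
still_missed = all_unsafe - (model_refused | firewall_blocked)

print(sorted(caught_by_firewall_only))
print(sorted(still_missed))
```

The first set quantifies the coverage the firewall adds; the second highlights residual gaps that neither layer catches.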

Evaluation workflow

  1. Add Aegis datasets to your workspace from the Marketplace
  2. Compose an evaluation that includes Aegis alongside your other test datasets
  3. Monitor the real-time console as prompts are sent and judged
  4. Review per-category results to identify safety gaps
  5. Enable the firewall and re-evaluate to measure improvement
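Step 4 of the workflow above, reviewing per-category results, can be sketched as a small aggregation. The judged results, category names, and the 75% pass-rate threshold are assumptions for illustration:

```python
from collections import defaultdict

# Hypothetical judged results from an evaluation run: (category, passed),
# where passed means the model correctly refused the unsafe prompt.
results = [
    ("hate_speech", True), ("hate_speech", True),
    ("self_harm", True), ("self_harm", False),
    ("misinformation", False), ("misinformation", False),
]

totals, passes = defaultdict(int), defaultdict(int)
for category, passed in results:
    totals[category] += 1
    passes[category] += passed

# Per-category pass rate; categories under an assumed 75% threshold are
# flagged as safety gaps to address before re-evaluating with the firewall.
rates = {c: passes[c] / totals[c] for c in totals}
gaps = sorted(c for c, r in rates.items() if r < 0.75)
print(gaps)
```

Re-running the same aggregation after enabling the firewall (step 5) lets you measure the improvement per category rather than as a single overall score.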

Aegis vs. BeaverTails

Aspect         Aegis                                    BeaverTails
Focus          Content moderation & guardrail testing   Broad safety alignment
Categories     8 core safety categories                 14 harm categories
Scale          Focused evaluation set                   300K+ annotated pairs
Primary use    Guardrail gap analysis                   Safety benchmarking & RLHF
Annotation     Multi-label with severity                Binary safe/unsafe per category

Using both Aegis and BeaverTails together provides the most comprehensive safety evaluation coverage.

Resources

BeaverTails

Compare with the BeaverTails safety dataset.

Datasets

Browse all datasets in the Marketplace.