Aegis

Overview

Aegis is an AI safety dataset designed for evaluating content moderation and safety guardrails in large language models. It provides labeled examples across critical safety categories, enabling teams to benchmark their AI systems’ ability to detect and refuse harmful content. Know Your AI includes Aegis data in the Dataset Marketplace for comprehensive safety testing alongside other safety benchmarks like BeaverTails.

Safety categories

Aegis covers key content safety risk areas:

Category	Description
Hate speech	Content targeting individuals or groups based on protected characteristics
Sexual content	Sexually explicit or suggestive material
Violence	Content depicting, promoting, or glorifying violence
Self-harm	Content promoting or instructing self-harm or suicide
Criminal activity	Content facilitating or instructing illegal activities
Dangerous content	Content that could lead to physical harm if followed
Harassment	Content intended to bully, intimidate, or harass individuals
Misinformation	Demonstrably false claims presented as fact

Dataset characteristics

Multi-label annotations — Each example can be tagged with multiple safety categories
Severity grading — Violations are scored by severity to enable nuanced evaluation
Diverse attack vectors — Includes direct requests, role-play scenarios, and obfuscated prompts
Contextual variety — Spans conversational, instructional, and creative writing contexts

Use in Know Your AI

Aegis datasets in the Marketplace enable you to:

Test content moderation — Verify your AI correctly identifies and refuses unsafe content
Evaluate guardrail coverage — Check for gaps across all safety categories
Baseline comparison — Compare your model’s safety performance against standard benchmarks
Firewall validation — Test whether the Know Your AI firewall catches safety violations your model misses

Evaluation workflow

Add Aegis datasets to your workspace from the Marketplace
Compose an evaluation selecting Aegis along with your other test datasets
Monitor the real-time console as prompts are sent and judged
Review per-category results to identify safety gaps
Enable the firewall and re-evaluate to measure improvement

Aegis vs. BeaverTails

Aspect	Aegis	BeaverTails
Focus	Content moderation & guardrail testing	Broad safety alignment
Categories	8 core safety categories	14 harm categories
Scale	Focused evaluation set	300K+ annotated pairs
Primary use	Guardrail gap analysis	Safety benchmarking & RLHF
Annotation	Multi-label with severity	Binary safe/unsafe per category

Using both Aegis and BeaverTails together provides the most comprehensive safety evaluation coverage.

Resources

Aegis AI Safety Research

BeaverTails

Compare with BeaverTails safety dataset.

Datasets

Browse all datasets in the Marketplace.

Privacy

OWASP

Frameworks & Standards

Safety Datasets

Overview

Safety categories

Dataset characteristics

Use in Know Your AI

Evaluation workflow

Aegis vs. BeaverTails

Resources

BeaverTails

Datasets

Privacy

OWASP

Frameworks & Standards

Safety Datasets

Documentation Index

​Overview

​Safety categories

​Dataset characteristics

​Use in Know Your AI

​Evaluation workflow

​Aegis vs. BeaverTails

​Resources

BeaverTails

Datasets

Overview

Safety categories

Dataset characteristics

Use in Know Your AI

Evaluation workflow

Aegis vs. BeaverTails

Resources