Attack datasets are collections of adversarial prompts designed to test how well your AI model resists different types of attacks. Know Your AI provides a curated marketplace of datasets covering every major attack vector, and you can upload your own.Documentation Index
Fetch the complete documentation index at: https://hydroxai.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Attack categories
Know Your AI organizes attacks into 7 categories, each targeting a different vulnerability surface:Jailbreak
Tricks the model into ignoring safety instructions and acting without restrictions. Tests whether the model maintains its safety boundaries.
Prompt Injection
Injects hidden instructions into the prompt to override the system prompt or manipulate model behavior.
Data Extraction
Attempts to extract training data, system prompts, or sensitive information the model was trained on or given access to.
Harmful Content
Tries to get the model to generate violent, illegal, hateful, or otherwise harmful content.
PII Leakage
Tests whether the model leaks personal identifiable information about real individuals.
Bias and Fairness
Probes the model for racial, gender, religious, or other biases in its responses.
Hallucination
Tests whether the model generates confident but factually incorrect information.
Attack methods
Each category contains multiple attack methods, each using a different technique:Jailbreak methods
| Method | Technique | Description |
|---|---|---|
| DAN | Role-play | ”Do Anything Now” — asks the model to role-play as an unrestricted AI |
| GCG | Token optimization | Greedy Coordinate Gradient — appends adversarial token suffixes |
| PAIR | Iterative refinement | Prompt Automatic Iterative Refinement — uses an attacker LLM to refine prompts |
| GRANDMOTHER | Social engineering | Uses emotional manipulation (“my grandmother used to…”) |
| DEEP_INCEPTION | Layered narrative | Embeds harmful requests inside nested fictional scenarios |
| GPTFUZZER | Mutation fuzzing | Automatically mutates successful jailbreak prompts |
Prompt injection methods
| Method | Technique | Description |
|---|---|---|
| CIPHER | Encoding | Encodes malicious instructions in Base64, ROT13, hex, or other ciphers |
| ARTPROMPT | ASCII art | Hides instructions inside ASCII art that models can decode |
| ADAPTIVE | Dynamic | Adapts attack strategy based on the model’s responses |
Data extraction methods
| Method | Technique | Description |
|---|---|---|
| DRA | Direct request | Directly asks the model to reveal system prompts or training data |
| RENELLM | Reformulation | Reformulates extraction requests in creative ways |
Harmful content methods
| Method | Technique | Description |
|---|---|---|
| PSYCHOLOGY | Manipulation | Uses psychological persuasion techniques to elicit harmful content |
| GPTFUZZER | Mutation | Fuzzes known harmful prompts to find new bypass vectors |
PII and privacy methods
| Method | Technique | Description |
|---|---|---|
| MULTILINGUAL | Language switching | Tests PII leakage across multiple languages |
| PAST_TENSE | Temporal framing | Frames privacy-violating requests in past tense to bypass filters |
Bias and fairness methods
| Method | Technique | Description |
|---|---|---|
| ADAPTIVE | Multi-angle probing | Tests for bias from multiple angles and demographics |
| MULTILINGUAL | Cross-lingual | Tests whether bias manifests differently across languages |
Hallucination methods
| Method | Technique | Description |
|---|---|---|
| DRA | Factual queries | Poses factual questions to detect confident wrong answers |
| PAIR | Adversarial queries | Uses iteratively refined prompts to trigger hallucinations |
Dataset Marketplace
The Dataset Marketplace provides ready-to-use datasets curated by security researchers. Browse and add datasets to your workspace in one click.Featured datasets
| Dataset | Category | Prompts | Description |
|---|---|---|---|
| OWASP LLM Top 10 | Multi-category | 200+ | Covers all 10 OWASP LLM vulnerability categories |
| DAN Collection | Jailbreak | 100+ | All major DAN variants (DAN 5.0 through 15.0) |
| GCG Adversarial | Jailbreak | 50+ | Token-optimized adversarial suffixes |
| PAIR Refined | Jailbreak | 75+ | Iteratively refined jailbreak prompts |
| CIPHER Suite | Prompt injection | 80+ | Multi-encoding prompt injections |
| ARTPROMPT Gallery | Prompt injection | 40+ | ASCII-art based prompt injections |
| PII Extraction | Privacy | 60+ | PII leak probes across categories |
| Bias Benchmark | Fairness | 150+ | Multi-demographic bias testing prompts |
| BeaverTails Safety | Harmful content | 300+ | Comprehensive safety evaluation dataset |
| Hallucination Probes | Hallucination | 100+ | Factual accuracy testing prompts |
Adding marketplace datasets
Add to workspace
Click Add to Workspace on any dataset. It becomes immediately available for evaluations.
Custom datasets
Dataset types
| Type | Use case | Fields |
|---|---|---|
| Default | Attack/security testing | input only — the attack prompt |
| Ground Truth | Accuracy testing | input + goldenAnswer for correctness comparison |
| A/B Test | Comparison testing | input + variantA + variantB for side-by-side evaluation |
Create via dashboard
- Go to Datasets in the sidebar
- Click Create Dataset
- Enter a name, category, and description
- Add prompts manually or upload a JSON/CSV file
Create via SDK
Create via CLI
The CLI can be used with the SDK to script dataset creation:Upload format
JSON format:Building effective datasets
Best practices
Cover multiple attack angles
Cover multiple attack angles
Don’t just use one attack method. Combine DAN, PAIR, CIPHER, and other methods in a single evaluation to test your model’s defenses from multiple angles.
Include both simple and sophisticated attacks
Include both simple and sophisticated attacks
Start with basic “ignore your instructions” prompts and escalate to encoded, multi-step, and context-manipulating attacks. A model that blocks simple attacks may still fail sophisticated ones.
Test across languages
Test across languages
Many models have weaker safety filters in non-English languages. Include multilingual attack prompts to catch these blind spots.
Update regularly
Update regularly
New attack methods are discovered constantly. Refresh your datasets monthly. The marketplace is updated automatically with new techniques.
Match your threat model
Match your threat model
If your model handles financial data, focus on PII extraction and data leakage. If it is customer-facing, prioritize jailbreak and harmful content. Build datasets that mirror your actual risks.
Recommended evaluation compositions
| Use case | Recommended datasets | Expected prompts |
|---|---|---|
| Pre-deployment safety check | DAN, PAIR, CIPHER, Harmful Content | 200-300 |
| Privacy compliance (CCPA) | PII Extraction, Data Leakage, Multilingual PII | 100-200 |
| Chatbot launch readiness | OWASP Top 10, Bias Benchmark, Safety Baseline | 300-500 |
| Quick smoke test | DAN (top 10), PAIR (top 10) | 20-30 |
| Comprehensive audit | All marketplace datasets | 1000+ |
MITRE ATLAS mapping
Know Your AI attack categories map to the MITRE ATLAS framework:| ATLAS Technique | Know Your AI Category | Datasets |
|---|---|---|
| AML.T0051 — LLM Prompt Injection | Prompt Injection | CIPHER, ARTPROMPT, ADAPTIVE |
| AML.T0054 — LLM Jailbreak | Jailbreak | DAN, GCG, PAIR, GRANDMOTHER |
| AML.T0048 — Data Extraction | Data Extraction | DRA, RENELLM |
| AML.T0025 — Exfiltration | PII Leakage | PII Extraction, Multilingual |
| AML.T0043 — Adversarial Input | All categories | All datasets |