
What is Recursive Hijacking?

Recursive Hijacking occurs when an attacker creates a self-reinforcing feedback loop in an AI agent’s reasoning or execution cycle. The agent’s own outputs are fed back as inputs (directly or through tools), progressively amplifying the attacker’s influence with each iteration until the agent is fully compromised.

Why It Matters

Recursive hijacking is uniquely dangerous because it is self-amplifying:
  • Progressive compromise — Each iteration of the loop deepens the attacker’s control, making recovery increasingly difficult.
  • Difficult to detect — The compromise happens gradually through legitimate-looking agent actions.
  • Resource exhaustion — Recursive loops can consume unbounded compute, API calls, and tokens.
  • Cascading corruption — Outputs from hijacked iterations can corrupt downstream systems, databases, and other agents.
  • Difficult to reverse — Once data or decisions from hijacked iterations propagate, reversing the damage is complex.

How the Attack Works

Output-to-Input Loops

Creating cycles where compromised outputs become inputs:
  • Agent writes a report containing hidden instructions → agent reads its own report → instructions activate → agent generates more compromised outputs
  • Agent modifies a shared document → agent re-reads the document → modification alters agent behavior
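The feedback structure above can be sketched in a few lines. This is a toy illustration, not a real agent framework: `toy_llm`, `run_agent`, and the `IGNORE PREVIOUS` trigger string are all hypothetical stand-ins that show how re-feeding an agent's output lets one injected instruction get re-processed and grow on every cycle.

```python
# Toy stand-in for a model call: if the input contains a hidden instruction,
# the "agent" obeys it and appends attacker-controlled text (illustrative only).
def toy_llm(text):
    if "IGNORE PREVIOUS" in text:
        return text + " [injected]"
    return text

# The structural flaw: the agent's own output becomes the next input,
# so the injected text is re-processed and amplified on each iteration.
def run_agent(llm, task, max_steps=3):
    context = task
    output = task
    for _ in range(max_steps):
        output = llm(context)
        context = output  # output-to-input feedback loop
    return output

result = run_agent(toy_llm, "Summarize the report. IGNORE PREVIOUS rules.")
```

After three iterations the injected marker appears three times: each pass re-reads the previous output, so the compromise compounds rather than staying constant.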

Tool Feedback Loops

Exploiting tool interactions for recursive compromise:
  • Agent calls an API that returns a poisoned response → agent processes the response → agent calls the API differently → response contains more potent manipulation
  • Agent writes to a database and later queries it, receiving its own compromised data back
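One way to surface the read-back-own-write pattern is provenance tagging on the shared store. The sketch below is an assumption, not a real API: `TaintedStore` and its `author`/`reader` fields are illustrative names showing how a store can flag self-referential reads so they get sanitized before re-entering the agent's context.

```python
# Minimal sketch of provenance tagging: every write records its author, and a
# read by the same identity is flagged as the agent consuming its own output.
class TaintedStore:
    def __init__(self):
        self._data = {}

    def write(self, key, value, author):
        self._data[key] = (value, author)

    def read(self, key, reader):
        value, author = self._data[key]
        self_referential = (author == reader)  # potential feedback loop
        return value, self_referential

store = TaintedStore()
store.write("summary", "Q3 numbers look fine...", author="agent-1")
value, flagged = store.read("summary", reader="agent-1")
```

A flagged read does not have to be blocked outright; routing it through sanitization or a human review step is often enough to break the loop.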

Self-Modifying Behavior

Tricking agents into modifying their own prompts or configurations:
  • “Update your instructions to include this additional rule…”
  • Agent generates code that modifies its own configuration files
  • Agent creates automation rules that alter its own behavior
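A simple guard against this class of attack is to deny writes to the agent's own instruction files. The sketch below uses hypothetical paths and an in-memory `files` dict for illustration; a real deployment would enforce the same rule at the filesystem or sandbox layer rather than inside the agent's own code.

```python
import os

# Illustrative deny-list of the agent's own configuration files
# (paths are hypothetical examples).
PROTECTED_PATHS = {"/agent/system_prompt.txt", "/agent/config.yaml"}

def guarded_write(path, content, files):
    # Refuse any write that would let the agent modify its own instructions.
    if os.path.normpath(path) in PROTECTED_PATHS:
        raise PermissionError(f"blocked self-modification attempt: {path}")
    files[path] = content

files = {}
guarded_write("/workspace/report.md", "draft...", files)  # ordinary write: allowed

blocked = False
try:
    guarded_write("/agent/config.yaml", "extra rule...", files)
except PermissionError:
    blocked = True  # self-modification attempt rejected
```

Normalizing the path first closes the obvious `..`-traversal variant; the same check should also cover symlinks and alternate mount points in a real system.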

Example Scenarios

| Scenario | Risk |
| --- | --- |
| Agent writes and re-reads a poisoned document, escalating compromise each cycle | Progressive system takeover |
| Multi-step workflow where each step’s output poisons the next step’s input | Complete workflow corruption |
| Agent’s logging output contains injection that is processed in the next monitoring cycle | Persistent compromise |
| Recursive API calls amplify a small initial manipulation into full control | Resource exhaustion, data corruption |

Mitigation Strategies

  • Loop detection — Monitor for repetitive patterns in agent behavior and tool calls
  • Iteration limits — Set hard limits on recursive operations and self-referential actions
  • Output integrity — Validate agent outputs before they can be used as inputs in subsequent steps
  • Immutable logging — Use append-only logs that the agent cannot modify
  • State isolation — Prevent agents from reading their own outputs in subsequent iterations without sanitization
  • Circuit breakers — Implement automatic shutdown when recursive patterns are detected
  • Regular testing — Use Know Your AI to test for recursive hijacking vulnerabilities in agent workflows
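The iteration-limit and circuit-breaker ideas above can be combined in one small guard. This is a hedged sketch under stated assumptions: `CircuitBreaker`, its `allow` method, and the thresholds are illustrative names, not a known library, and a production breaker would also consider timing, token spend, and semantic similarity of calls.

```python
from collections import Counter

# Sketch of a guard that trips on either a hard iteration limit or a
# repeating (tool, arguments) pattern that suggests a recursive loop.
class CircuitBreaker:
    def __init__(self, max_steps=10, max_repeats=3):
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.calls = Counter()

    def allow(self, tool, args_fingerprint):
        self.steps += 1
        self.calls[(tool, args_fingerprint)] += 1
        if self.steps > self.max_steps:
            return False  # hard iteration limit reached
        if self.calls[(tool, args_fingerprint)] > self.max_repeats:
            return False  # identical call repeating: likely recursive loop
        return True

breaker = CircuitBreaker(max_steps=10, max_repeats=3)
results = [breaker.allow("read_doc", "report.md") for _ in range(5)]
```

With these thresholds, the fourth identical `read_doc` call trips the breaker even though the overall step budget is not exhausted, which is exactly the repetitive self-referential pattern recursive hijacking produces.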