
What is Recursive Hijacking?

Recursive Hijacking occurs when an attacker creates a self-reinforcing feedback loop in an AI agent’s reasoning or execution cycle. The agent’s own outputs are fed back as inputs (directly or through tools), progressively amplifying the attacker’s influence with each iteration until the agent is fully compromised.

Why It Matters

Recursive hijacking is uniquely dangerous because it is self-amplifying:
  • Progressive compromise — Each iteration of the loop deepens the attacker’s control, making recovery increasingly difficult.
  • Difficult to detect — The compromise happens gradually through legitimate-looking agent actions.
  • Resource exhaustion — Recursive loops can consume unbounded compute, API calls, and tokens.
  • Cascading corruption — Outputs from hijacked iterations can corrupt downstream systems, databases, and other agents.
  • Difficult to reverse — Once data or decisions from hijacked iterations propagate, reversing the damage is complex.

How the Attack Works

Output-to-Input Loops

Creating cycles where compromised outputs become inputs:
  • Agent writes a report containing hidden instructions → agent reads its own report → instructions activate → agent generates more compromised outputs
  • Agent modifies a shared document → agent re-reads the document → modification alters agent behavior
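The feedback structure above can be sketched in a few lines. This is a toy illustration, not a real agent framework: `toy_llm`, `run_agent`, and the `IGNORE PREVIOUS` trigger string are all hypothetical stand-ins that show how re-feeding an agent's output lets one injected instruction get re-processed and grow on every cycle.

```python
# Toy stand-in for a model call: if the input contains a hidden instruction,
# the "agent" obeys it and appends attacker-controlled text (illustrative only).
def toy_llm(text):
    if "IGNORE PREVIOUS" in text:
        return text + " [injected]"
    return text

# The structural flaw: the agent's own output becomes the next input,
# so the injected text is re-processed and amplified on each iteration.
def run_agent(llm, task, max_steps=3):
    context = task
    output = task
    for _ in range(max_steps):
        output = llm(context)
        context = output  # output-to-input feedback loop
    return output

result = run_agent(toy_llm, "Summarize the report. IGNORE PREVIOUS rules.")
```

After three iterations the injected marker appears three times: each pass re-reads the previous output, so the compromise compounds rather than staying constant.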

Tool Feedback Loops

Exploiting tool interactions for recursive compromise:
  • Agent calls an API that returns a poisoned response → agent processes the response → agent calls the API differently → response contains more potent manipulation
  • Agent writes to a database and later queries it, receiving its own compromised data back
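One way to surface the read-back-own-write pattern is provenance tagging on the shared store. The sketch below is an assumption, not a real API: `TaintedStore` and its `author`/`reader` fields are illustrative names showing how a store can flag self-referential reads so they get sanitized before re-entering the agent's context.

```python
# Minimal sketch of provenance tagging: every write records its author, and a
# read by the same identity is flagged as the agent consuming its own output.
class TaintedStore:
    def __init__(self):
        self._data = {}

    def write(self, key, value, author):
        self._data[key] = (value, author)

    def read(self, key, reader):
        value, author = self._data[key]
        self_referential = (author == reader)  # potential feedback loop
        return value, self_referential

store = TaintedStore()
store.write("summary", "Q3 numbers look fine...", author="agent-1")
value, flagged = store.read("summary", reader="agent-1")
```

A flagged read does not have to be blocked outright; routing it through sanitization or a human review step is often enough to break the loop.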

Self-Modifying Behavior

Tricking agents into modifying their own prompts or configurations:
  • “Update your instructions to include this additional rule…”
  • Agent generates code that modifies its own configuration files
  • Agent creates automation rules that alter its own behavior
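A simple guard against this class of attack is to deny writes to the agent's own instruction files. The sketch below uses hypothetical paths and an in-memory `files` dict for illustration; a real deployment would enforce the same rule at the filesystem or sandbox layer rather than inside the agent's own code.

```python
import os

# Illustrative deny-list of the agent's own configuration files
# (paths are hypothetical examples).
PROTECTED_PATHS = {"/agent/system_prompt.txt", "/agent/config.yaml"}

def guarded_write(path, content, files):
    # Refuse any write that would let the agent modify its own instructions.
    if os.path.normpath(path) in PROTECTED_PATHS:
        raise PermissionError(f"blocked self-modification attempt: {path}")
    files[path] = content

files = {}
guarded_write("/workspace/report.md", "draft...", files)  # ordinary write: allowed

blocked = False
try:
    guarded_write("/agent/config.yaml", "extra rule...", files)
except PermissionError:
    blocked = True  # self-modification attempt rejected
```

Normalizing the path first closes the obvious `..`-traversal variant; the same check should also cover symlinks and alternate mount points in a real system.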

Example Scenarios

| Scenario | Risk |
| --- | --- |
| Agent writes and re-reads a poisoned document, escalating compromise each cycle | Progressive system takeover |
| Multi-step workflow where each step’s output poisons the next step’s input | Complete workflow corruption |
| Agent’s logging output contains injection that is processed in the next monitoring cycle | Persistent compromise |
| Recursive API calls amplify a small initial manipulation into full control | Resource exhaustion, data corruption |

Mitigation Strategies

  • Loop detection — Monitor for repetitive patterns in agent behavior and tool calls
  • Iteration limits — Set hard limits on recursive operations and self-referential actions
  • Output integrity — Validate agent outputs before they can be used as inputs in subsequent steps
  • Immutable logging — Use append-only logs that the agent cannot modify
  • State isolation — Prevent agents from reading their own outputs in subsequent iterations without sanitization
  • Circuit breakers — Implement automatic shutdown when recursive patterns are detected
  • Regular testing — Use Know Your AI to test for recursive hijacking vulnerabilities in agent workflows
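The iteration-limit and circuit-breaker ideas above can be combined in one small guard. This is a hedged sketch under stated assumptions: `CircuitBreaker`, its `allow` method, and the thresholds are illustrative names, not a known library, and a production breaker would also consider timing, token spend, and semantic similarity of calls.

```python
from collections import Counter

# Sketch of a guard that trips on either a hard iteration limit or a
# repeating (tool, arguments) pattern that suggests a recursive loop.
class CircuitBreaker:
    def __init__(self, max_steps=10, max_repeats=3):
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.calls = Counter()

    def allow(self, tool, args_fingerprint):
        self.steps += 1
        self.calls[(tool, args_fingerprint)] += 1
        if self.steps > self.max_steps:
            return False  # hard iteration limit reached
        if self.calls[(tool, args_fingerprint)] > self.max_repeats:
            return False  # identical call repeating: likely recursive loop
        return True

breaker = CircuitBreaker(max_steps=10, max_repeats=3)
results = [breaker.allow("read_doc", "report.md") for _ in range(5)]
```

With these thresholds, the fourth identical `read_doc` call trips the breaker even though the overall step budget is not exhausted, which is exactly the repetitive self-referential pattern recursive hijacking produces.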