All posts

Nested agents: what they mean for your prompt-injection risk

With a single agent, injected input redirects one actor. With nested agents, hostile input read by a sub-agent three layers down can redirect that layer while the orchestrator believes everything is fine. That widens prompt-injection risk in a specific way: the manipulation and the action can happen deep in the chain, far from the layer you are watching. This is a defensive post. You cannot guarantee any layer ignores hostile input, so the durable control is to bound what a redirected agent, at

Free White Paper

Prompt Injection Prevention + Risk-Based Access Control: The Complete Guide

Architecture patterns, implementation strategies, and security best practices. Delivered to your inbox.

Free. No spam. Unsubscribe anytime.

With a single agent, injected input redirects one actor. With nested agents, hostile input read by a sub-agent three layers down can redirect that layer while the orchestrator believes everything is fine. That widens prompt-injection risk in a specific way: the manipulation and the action can happen deep in the chain, far from the layer you are watching.

This is a defensive post. You cannot guarantee any layer ignores hostile input, so the durable control is to bound what a redirected agent, at any depth, can actually do on infrastructure.

Why nesting widens the risk

  • More surfaces. Every layer that reads external content is a place injection can land, and chains read a lot of content.
  • Distance from oversight. A sub-agent acting on injected instructions is several hops from the human or top-level logic, so manipulation is less likely to be noticed in time.
  • Self-reporting. The chain's own account of what happened is exactly what a redirected layer can distort.

The requirement: bound the action, outside the chain

The check on what an agent may do has to run on the access path, outside the agent chain, because any in-chain limit can be argued away by injected instructions or undermined by a compromised layer. A boundary the agents cannot reconfigure is the only one that holds when a deep layer is manipulated.

The point is to make the consequence small. A redirected sub-agent that can only reach a scoped, approved, recorded connection cannot turn a clever prompt into a production incident.

How a gateway contains it

hoop.dev is an open-source access gateway between identities and infrastructure. Every connection any agent in the chain makes passes through it, scoped by policy, with risky operations routed for human approval and every command recorded against a named principal. hoop.dev governs the infrastructure connection; it does not read the model's prompt or output, and it does not need to. If injected content redirects a sub-agent five layers deep, that agent still hits the gateway's scope, approval, and recording on the connection. The manipulation does not get a wider reach just because it happened deeper. See how approvals and scoping are configured in the getting-started guide.

Continue reading? Get the full guide.

Prompt Injection Prevention + Risk-Based Access Control: Architecture Patterns & Best Practices

Free. No spam. Unsubscribe anytime.

Controls that bound prompt-injection risk at depth

  1. Scope each connection tightly so a redirected layer reaches only what its task required.
  2. Route destructive operations for approval so a manipulated deep layer stalls at a human gate.
  3. Record every command per principal so distortion in the chain's self-report does not hide what actually ran.
  4. Time-box access so a redirected agent has a narrow window, not standing reach.

The identity-aware model is described on the hoop.dev site.

Why depth makes the external boundary non-optional

In a single agent, you might argue you can watch the one actor closely enough to catch a redirect. In a nested system that argument collapses. The layer that gets manipulated may be several hops down, spawned at runtime, and gone before you look. You cannot watch what you did not know would exist, and you cannot trust a manipulated layer to flag itself.

That is the precise reason the boundary has to sit on the connection rather than in oversight of the chain. The gateway does not need to know which layer issued a command or why; it only needs the command to pass through it to reach infrastructure, where scope, approval, and recording apply uniformly. A redirect at depth meets the same wall a redirect at the top would. The control does not get weaker as the chain gets deeper, which is exactly the property a defense against prompt-injection risk in nested agents requires.

FAQ

Can a gateway detect the injection itself?

No. It does not read the prompt or model output. It bounds and records what the agent does on infrastructure, which is the part you can enforce when injection succeeds anyway.

Why bound the action instead of trusting the chain?

Because a redirected or compromised layer cannot be trusted to enforce its own limits or report honestly. An external boundary on the connection does not depend on the chain behaving.

To contain prompt-injection risk across nested agents at the connection, read the open-source gateway on GitHub.

Open source

Save the open-source gateway for agent data access

Hoop is MIT-licensed infrastructure for controlling how AI agents reach production data. Star hoophq/hoop so you can inspect it, deploy it, or share it when your team starts governing agent access.

Star and save the repo →More posts