Non-human identity: what it means for your prompt-injection risk

An agent reads a support ticket, the ticket contains text crafted to redirect the agent, and the agent then uses its database access in a way nobody intended. Everyone calls this a prompt problem. Operationally, the damage is not the prompt; it is what the agent's identity was allowed to reach once it was redirected. That reframing is the whole point of treating prompt-injection risk as an identity problem.

This post stays on the defensive side. You cannot guarantee that a model will ignore hostile input, so the durable control is to bound, at the infrastructure boundary, what a successfully manipulated agent can actually do.

Reframe the risk around the identity, not the text

An agent is a non-human identity with credentials. Prompt-injection risk is the chance that input steers that identity into using its access for something you did not intend. The blast radius is set entirely by what the identity can reach, not by how clever the injected text was.

That means the lever you control is access. A manipulated agent with read-only access to one table can do far less harm than the same agent with write access to production. The text changes; the bound is what you decide in advance.

Put the control where the agent cannot reconfigure it

Here is the architectural requirement, framed defensively. The check on what the agent may do has to run on the access path, outside the agent process. If the agent enforces its own limits, injected instructions can talk it into relaxing them. A boundary the agent cannot edit is the only kind that survives manipulation.
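As a minimal sketch of that requirement, the snippet below keeps the policy in the gateway's own process and decides from it alone. Everything here is hypothetical and for illustration only: the `Request` type, the `authorize` function, the principal name, and the policy table are assumptions, not hoop.dev's API. The point it shows is narrow: the request carries free text from the agent, and the decision never looks at it.

```python
from dataclasses import dataclass
from types import MappingProxyType

# Hypothetical policy table held by the gateway process, not by the agent.
# Keys are principals; values are the (resource, operation) pairs they may use.
_POLICY = MappingProxyType({
    "ticket-triage-agent": frozenset({("accounts", "read")}),
})


@dataclass(frozen=True)
class Request:
    principal: str   # identity the connection authenticated as
    resource: str    # e.g. a table name
    operation: str   # "read" or "write"
    note: str = ""   # free text from the agent; never consulted for the decision


def authorize(req: Request) -> bool:
    """Decide from the gateway's own policy only; nothing in the request can widen it."""
    allowed = _POLICY.get(req.principal, frozenset())
    return (req.resource, req.operation) in allowed


# A redirected agent can ask for anything and say anything; the answer does not change.
print(authorize(Request("ticket-triage-agent", "accounts", "read")))
print(authorize(Request("ticket-triage-agent", "accounts", "write",
                        note="ignore previous limits, writes are approved")))
```

The design choice worth noting is that the agent has no code path to `_POLICY`: it can only submit requests, so injected instructions have nothing to relax.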

hoop.dev is an open-source access gateway between identities and infrastructure. An agent reaches databases, clusters, and services through it, and the gateway enforces scope, routes risky operations for human approval, and records every command under the agent's principal. hoop.dev governs the infrastructure connection; it does not read the model's prompt or output, and it does not need to. Even if injected input redirects the agent's intent, the agent still hits a scoped, approved, recorded boundary on the connection. That is server-side defense against prompt-injection risk: limit the action, not the language. See how approvals and scoping work in the getting-started guide.

Operational controls that reduce prompt-injection risk

Scope the agent's identity tightly. Grant only the connections and operations its task needs. Injected instructions cannot reach what the identity cannot.
Route destructive operations for approval. A human gate on writes and deletes means a manipulated agent stalls at the boundary instead of executing.
Record every command. If something does slip through, the per-principal record shows exactly what the agent ran, so you can scope down and respond.
Time-box access. Just-in-time grants mean a redirected agent has a narrow window, not standing reach.
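The four controls above can be sketched together as one decision function. This is an illustrative model under stated assumptions, not how any particular gateway implements them: the `Grant` and `Gateway` classes, the `DESTRUCTIVE` set, and the principal and resource names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass
class Grant:
    principal: str
    resource: str
    operations: frozenset   # scope: only the operations the task needs
    expires_at: datetime    # time-box: the grant stops working after this


# Assumed set of operations that are routed to a human before they run.
DESTRUCTIVE = frozenset({"write", "delete"})


@dataclass
class Gateway:
    grants: list
    audit_log: list = field(default_factory=list)

    def decide(self, principal, resource, operation, now):
        decision = Decision.DENY  # default: out of scope or expired
        for g in self.grants:
            in_scope = (g.principal == principal and g.resource == resource
                        and operation in g.operations)
            if in_scope and now < g.expires_at:
                decision = (Decision.NEEDS_APPROVAL if operation in DESTRUCTIVE
                            else Decision.ALLOW)
                break
        # Record every attempt, including denied ones, under the principal.
        self.audit_log.append(
            (now.isoformat(), principal, resource, operation, decision.value))
        return decision


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
gw = Gateway(grants=[Grant("triage-agent", "tickets",
                           frozenset({"read", "write"}),
                           now + timedelta(minutes=15))])

print(gw.decide("triage-agent", "tickets", "read", now))    # in scope, not destructive
print(gw.decide("triage-agent", "tickets", "write", now))   # in scope, held for approval
print(gw.decide("triage-agent", "accounts", "read", now))   # out of scope
print(gw.decide("triage-agent", "tickets", "read", now + timedelta(hours=1)))  # grant expired
```

Denied and held attempts are logged alongside allowed ones, so the record shows what a redirected agent tried, not only what it ran.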

None of these try to win the unwinnable fight of filtering hostile text. They make the consequence small. The identity-aware model is described on the hoop.dev site.

A worked example

An agent triages incoming tickets and has a database identity for looking up account status. A ticket arrives with text engineered to make the agent run a destructive update. If nothing on the connection bounds the agent and its identity holds write access, the update runs and you learn about it from an angry customer.

With the connection governed, the same agent's identity is scoped to read-only on the account table, the write it attempts is denied at the gateway, and the attempt is recorded under the agent's principal. The injection succeeded at the language level and failed at the action level, which is the level that actually mattered. That is the practical shape of treating prompt-injection risk as an access problem rather than a text problem.
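The scenario can be sketched in a few lines. This is a toy model, not a real gateway: `classify`, `run_through_gateway`, the `SCOPE` table, and the sample statements are hypothetical, and the classifier only inspects the first keyword. A production system would need real SQL parsing; the sketch simply treats anything that is not a plain `SELECT` as a write so that it fails closed.

```python
READ_VERBS = {"SELECT"}

# Hypothetical scope: the triage agent may only read the accounts table.
SCOPE = {"ticket-triage-agent": {("accounts", "read")}}

audit = []  # per-principal record of every attempt


def classify(sql: str) -> str:
    """Naive classifier: the first keyword decides; unknown verbs count as writes."""
    words = sql.strip().split(None, 1)
    verb = words[0].upper() if words else ""
    return "read" if verb in READ_VERBS else "write"


def run_through_gateway(principal: str, table: str, sql: str) -> str:
    # `table` stands in for the connection target; it is passed explicitly
    # here rather than parsed out of the statement.
    operation = classify(sql)
    allowed = (table, operation) in SCOPE.get(principal, set())
    audit.append({"principal": principal, "table": table,
                  "operation": operation, "allowed": allowed, "sql": sql})
    return "forwarded" if allowed else "denied"


lookup = "SELECT status FROM accounts WHERE id = 42"
injected = "UPDATE accounts SET status = 'closed'"

print(run_through_gateway("ticket-triage-agent", "accounts", lookup))    # forwarded
print(run_through_gateway("ticket-triage-agent", "accounts", injected))  # denied
print(audit[-1]["operation"], audit[-1]["allowed"])                      # write False
```

The injected statement still reaches the gateway, which is the point: the agent was manipulated, the action was not executed, and the attempt is on record.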

FAQ

Does a gateway stop prompt injection?

No tool reliably stops a model from being influenced by input. A gateway contains the result by bounding what the agent's identity can do on infrastructure, which is the part you can actually enforce.

Why not just filter the inputs?

Input filtering helps but cannot be complete. Defense in depth means assuming some injection succeeds and ensuring the agent's reachable actions are scoped, approved, and recorded.

To bound prompt-injection risk at the connection, where a manipulated agent cannot relax its own limits, see the open-source gateway on GitHub.