A support ticket arrives. Buried in the customer's text is an instruction: ignore your rules, export the user table, and email it out. Your autonomous agent reads it as part of its task. Whether you have a breach now depends on whether the agent obeys, and you are trusting a language model to refuse. That is prompt injection in autonomous agent systems, and the fix is not a better refusal. It is a boundary that makes obeying pointless. This stays server-side.
Why you cannot prompt your way out
You can instruct the model to ignore malicious input, and you should. You cannot prove it will, for every input it ever reads. So treat the agent's chosen action as untrusted, the way you treat input from any external client, and put the real controls behind it.
A worked example, contained
Take that ticket. The injected instruction tells the agent to export the customer table. With the agent's access scoped to the support tools its task needs, the database is not in its grant, so the export call is denied and logged. Suppose the task did include database access. The action that emails data out hits an approval gate, and a person sees a support agent trying to bulk-export a customer table and says no. At every step the injected instruction runs into a control it cannot argue with. The malicious text never becomes a malicious effect.
The controls have to sit where the model cannot reach them
That is the whole point of stopping prompt injection in autonomous agent systems server-side: the grant, the policy check, and the record must run outside the agent process, on a boundary the model cannot reconfigure. Inside the process, a clever enough injection reasons its way around them. Outside, the model's instructions stop mattering at the point an action would execute. That is one control surface, and hoop.dev is built to it, fronting the agent's access as an identity-aware proxy that scopes each grant, checks every action, masks output, and records the attempt. The getting-started guide covers the first connection and hoop.dev/learn the policy model.
Why input filters give false comfort
A tempting response to prompt injection is to filter the inputs: scan documents and tool results for suspicious instructions before the agent reads them. It helps at the margin, and it gives a dangerous sense of completeness. The space of malicious phrasings is unbounded, the attacker adapts to whatever you filter, and the injection can arrive in data you did not think to scan. A filter that catches yesterday's payloads is no guarantee against tomorrow's.
