An offboarded contractor still has a CI job that launches dozens of inference containers every night, each reusing the same service token. A data‑science team rolls out a new model and immediately grants a shared API key to every notebook, trusting that the key will be revoked later. In practice, dozens of short‑lived agents proliferate across the network, all pointing directly at the model endpoint without any visibility into who issued a request or what data was sent.
This uncontrolled growth, often called agent sprawl, widens the blast radius of every credential. When a token leaks, an attacker can fire inference calls without limit, exfiltrate proprietary prompts, or try to poison the model with crafted inputs. Because the agents talk straight to the inference service, there is no central log of which user triggered a request, no way to mask sensitive payloads, and no gate that can pause a risky call for human review.
The core need is a control plane that can limit the number of active agents, enforce just‑in‑time credentials, and require approval for high‑risk operations. The current setup lets agents reach the model, but it leaves every request unmediated: no audit trail, no inline masking, and no ability to block a dangerous query before it hits the model.
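To make the first two requirements concrete, here is a minimal sketch of a registry that caps the number of active agents and hands out tokens that expire on their own. It is illustrative only: `LeaseRegistry`, `Lease`, and every parameter name are hypothetical and are not part of hoop.dev or any other product.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    user: str          # the human or service identity that requested the agent
    token: str         # short-lived credential handed to the agent
    expires_at: float  # Unix timestamp after which the token is rejected


class LeaseRegistry:
    """Hypothetical control-plane piece: caps active agents, issues expiring tokens."""

    def __init__(self, max_active: int, ttl_seconds: float) -> None:
        self.max_active = max_active
        self.ttl_seconds = ttl_seconds
        self._leases: Dict[str, Lease] = {}

    def _purge_expired(self, now: float) -> None:
        # Expired leases no longer count against the cap.
        self._leases = {t: l for t, l in self._leases.items() if l.expires_at > now}

    def issue(self, user: str, now: Optional[float] = None) -> Lease:
        now = time.time() if now is None else now
        self._purge_expired(now)
        if len(self._leases) >= self.max_active:
            raise RuntimeError("active agent limit reached")
        lease = Lease(user, secrets.token_urlsafe(16), now + self.ttl_seconds)
        self._leases[lease.token] = lease
        return lease

    def is_valid(self, token: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        lease = self._leases.get(token)
        return lease is not None and lease.expires_at > now

    def revoke_user(self, user: str) -> int:
        """Drop every lease held by one user, e.g. on offboarding; returns the count."""
        before = len(self._leases)
        self._leases = {t: l for t, l in self._leases.items() if l.user != user}
        return before - len(self._leases)
```

Because each lease is tied to a user rather than to a shared service token, revoking the offboarded contractor in the opening scenario would invalidate every agent that contractor launched, and any lease that is never revoked still lapses once its TTL runs out.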
Agent sprawl in inference workloads
When every developer or automation script can spin up its own inference client, the environment quickly becomes noisy. Permissions are often granted at the service level, not the user level, so revoking a single user does not stop the agents they have already deployed. Without a unified entry point, security teams cannot answer simple questions such as “who queried the model at 03:00?” or “what data was returned to a particular client?” The lack of a single choke point also prevents the application of data‑masking policies that protect personally identifiable information that may appear in prompts or responses.
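For a sense of what a unified entry point makes answerable, the sketch below assumes each request leaves behind one attributed record. The record shape, field names, and sample data are all hypothetical; a real audit log would likely also carry a reference to the (masked) payload rather than just its size.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class AuditRecord:
    user: str            # identity resolved at the entry point, not a shared key
    client_id: str       # which agent, notebook, or job sent the request
    timestamp: datetime
    response_bytes: int  # size of what went back to the client


def queried_between(records: Iterable[AuditRecord],
                    start: datetime, end: datetime) -> List[str]:
    """Distinct users who queried in [start, end), in first-seen order."""
    seen: List[str] = []
    for r in records:
        if start <= r.timestamp < end and r.user not in seen:
            seen.append(r.user)
    return seen


def bytes_returned_to(records: Iterable[AuditRecord], client_id: str) -> int:
    """Total response volume delivered to one client."""
    return sum(r.response_bytes for r in records if r.client_id == client_id)


# Hypothetical sample log.
LOG = [
    AuditRecord("alice", "notebook-7",
                datetime(2024, 1, 5, 2, 58, tzinfo=timezone.utc), 512),
    AuditRecord("ci-bot", "nightly-job",
                datetime(2024, 1, 5, 3, 0, tzinfo=timezone.utc), 2048),
    AuditRecord("ci-bot", "nightly-job",
                datetime(2024, 1, 5, 3, 1, tzinfo=timezone.utc), 1024),
]

three_am = datetime(2024, 1, 5, 3, 0, tzinfo=timezone.utc)
five_past = datetime(2024, 1, 5, 3, 5, tzinfo=timezone.utc)
print(queried_between(LOG, three_am, five_past))
print(bytes_returned_to(LOG, "nightly-job"))
```

Neither query is possible when agents share one key and call the model directly, because no component ever sees the user and the request together.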
Why a data‑path gateway is required
To break the direct‑to‑model connection, the enforcement point must sit in the data path. Only a gateway that intercepts the wire‑level protocol can observe each request, apply policy, and record the interaction. Placing controls in the identity provider or in the client configuration does not stop a compromised agent from sending traffic straight to the model.
Introducing a data‑path gateway
hoop.dev provides exactly that gateway. It sits between any identity (OIDC, SAML, service accounts) and the inference endpoint. When a client attempts to call the model, hoop.dev authenticates the identity, checks the request against policy, and then forwards it only if the request complies. The gateway can:
- Issue just‑in‑time credentials that expire after a short window.
- Require an approval workflow for queries that contain high‑risk keywords.
- Mask sensitive fields in prompts or responses in real time.
- Record every inference session for replay and audit.
Because every request passes through hoop.dev, and it is the only component that sees the clear‑text request, policy is applied to each call before it reaches the model. If hoop.dev were removed, the agents would again talk directly to the model and none of the above protections would exist.
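The request flow described above can be sketched in a few lines. This is a conceptual model of a data‑path gateway, not hoop.dev's implementation or API: the `Gateway` class, the keyword list, the email‑only masking rule, and the `approver` callback are all assumptions made for illustration, and authentication is left out to keep the example short.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumed masking rule: redact anything that looks like an email address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed high-risk phrases; a real policy would be configurable.
HIGH_RISK_KEYWORDS = ("export all", "customer records")


def mask(text: str) -> str:
    return EMAIL_RE.sub("[MASKED_EMAIL]", text)


def needs_approval(prompt: str) -> bool:
    lowered = prompt.lower()
    return any(keyword in lowered for keyword in HIGH_RISK_KEYWORDS)


@dataclass
class Gateway:
    model: Callable[[str], str]            # the inference endpoint behind the gateway
    approver: Callable[[str, str], bool]   # stand-in for a human approval workflow
    sessions: List[Dict[str, str]] = field(default_factory=list)

    def handle(self, user: str, prompt: str) -> str:
        safe_prompt = mask(prompt)  # mask before anything is stored or forwarded
        if needs_approval(safe_prompt) and not self.approver(user, safe_prompt):
            self.sessions.append(
                {"user": user, "prompt": safe_prompt, "outcome": "denied"})
            raise PermissionError("request requires approval")
        response = mask(self.model(safe_prompt))  # mask the response as well
        self.sessions.append({"user": user, "prompt": safe_prompt,
                              "response": response, "outcome": "forwarded"})
        return response


# Usage with a fake model and an approver that rejects everything.
gw = Gateway(model=lambda p: "echo: " + p + " (contact bob@example.com)",
             approver=lambda user, prompt: False)
print(gw.handle("alice", "Summarize the ticket from jane@example.com"))
```

The point of the sketch is the ordering: masking, the approval check, and recording all happen inside `handle`, so a client that can only reach the model through the gateway cannot skip any of them.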
Practical steps to tame agent sprawl
1. Deploy the gateway in the same network segment as the inference service. The official getting‑started guide walks through a Docker Compose deployment that includes the required agent.
