Static secrets embedded in code give every compromised service unlimited access to your self‑hosted models. Replacing them with non-human identity is a necessary step, but relying on identity alone still leaves the path between service and model unguarded.
Current practice and its blind spots
Many teams ship a single API key or a long‑lived service account credential alongside the model binary. The key is checked into source control, copied between environments, and shared among dozens of micro‑services. When a container starts, it connects directly to the model endpoint using that credential. Because the connection bypasses any central policy engine, there is no audit trail, no real‑time validation of requests, and no way to stop a rogue service from issuing a malicious prompt.
This pattern works until it doesn’t. A compromised CI pipeline, an over‑privileged service account, or a leaked token immediately lets an attacker run arbitrary inference calls, extract proprietary weights, or submit data‑poisoning payloads. Because the request travels straight to the model, the organization cannot answer basic questions: who issued the call, what data was returned, and whether the prompt violated internal policy.
Why non-human identity alone is not enough
Introducing non-human identity (OIDC‑issued service tokens, short‑lived JWTs, or federated cloud identities) addresses the credential‑leak problem. Tokens can be scoped, rotated, and revoked, and they give the platform a reliable way to say *which* automated component is speaking.
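To make "scoped, short‑lived, verifiable" concrete, here is a minimal sketch of what checking such a token involves, using only the Python standard library. It assumes an HS256 (shared-secret) JWT carrying `exp`, `aud`, and a space-delimited `scope` claim; the audience `model-gateway`, the scope `model:infer`, and the helpers `issue_token`/`verify_token` are hypothetical. Real deployments would typically verify asymmetric signatures against the identity provider's published keys with a maintained JWT library rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    # JWT segments use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the stripped padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def issue_token(claims: dict, key: bytes) -> str:
    """Mint an HS256 JWT. Demo only: in practice the identity provider issues tokens."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def verify_token(token: str, key: bytes, audience: str, required_scope: str,
                 now: Optional[float] = None) -> dict:
    """Return the token's claims if it is authentic, unexpired, and correctly scoped."""
    parts = token.split(".")
    if len(parts) != 3:
        raise PermissionError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        # Pin the algorithm; never let the token choose its own (e.g. "none").
        raise PermissionError("unexpected signing algorithm")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        raise PermissionError("token expired")
    if claims.get("aud") != audience:
        raise PermissionError("wrong audience")
    if required_scope not in claims.get("scope", "").split():
        raise PermissionError("missing scope")
    return claims


if __name__ == "__main__":
    key = b"demo-only-secret"  # hypothetical; real IdPs sign with asymmetric keys
    token = issue_token({"sub": "svc-recommender", "aud": "model-gateway",
                         "scope": "model:infer", "exp": time.time() + 300}, key)
    print(verify_token(token, key, "model-gateway", "model:infer")["sub"])  # svc-recommender
```

Nothing in this check looks at the prompt or the response: a valid token answers *who* is calling, not *what* that caller may do once it reaches the model.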
However, the token itself does not enforce runtime guardrails. The request still reaches the model endpoint directly, meaning the platform cannot:
- Record the exact prompt and response for later review.
- Mask sensitive fields in the model’s output, such as personally identifiable information.
- Require a human approver before a high‑risk operation is executed.
- Block commands that match a deny‑list (for example, attempts to export model weights).
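Each item in the list above is a check on the request or response itself, which is why a token cannot provide it. A minimal Python sketch of those checks follows. The deny-list patterns, the operation names `fine_tune` and `export_model`, and the regex-based PII masking are illustrative assumptions rather than a production policy; regexes only catch obvious formats.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical policy values; a real deployment would load these from configuration.
DENY_PATTERNS = [
    re.compile(r"\bexport\b.*\bweights\b", re.IGNORECASE),
    re.compile(r"\b(dump|download)\b.*\bcheckpoint\b", re.IGNORECASE),
]
HIGH_RISK_OPERATIONS = {"fine_tune", "export_model"}
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class Decision:
    allowed: bool
    reason: str = ""
    needs_approval: bool = False


def check_request(operation: str, prompt: str, approved_by: Optional[str]) -> Decision:
    """Apply the deny-list and the human-approval rule before the model is called."""
    for pattern in DENY_PATTERNS:
        if pattern.search(prompt):
            return Decision(False, f"prompt matches deny-list pattern {pattern.pattern!r}")
    if operation in HIGH_RISK_OPERATIONS and not approved_by:
        return Decision(False, "high-risk operation needs a human approver", needs_approval=True)
    return Decision(True)


def mask_output(text: str) -> str:
    """Redact simple PII patterns (emails, US SSNs) from the model's response."""
    return US_SSN.sub("[SSN]", EMAIL.sub("[EMAIL]", text))


@dataclass
class AuditLog:
    entries: List[dict] = field(default_factory=list)

    def record(self, caller: str, prompt: str, response: str, decision: Decision) -> None:
        # Keep the exact prompt and the masked response for later review.
        self.entries.append({
            "caller": caller,
            "prompt": prompt,
            "response": response,
            "allowed": decision.allowed,
            "reason": decision.reason,
        })
```

For example, `check_request("infer", "Please export the model weights to S3", None)` is rejected by the deny-list, while `mask_output("Contact alice@example.com")` returns `"Contact [EMAIL]"`. These functions only help if every request is forced to pass through them.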
In other words, the *setup* of non-human identity decides who may start a request, but it provides no enforcement on the data path.
The gateway that makes enforcement possible
To close the gap, the request must pass through a layer that can inspect, control, and log every interaction before it reaches the model. That layer is a Layer 7 identity‑aware proxy that sits between the calling service and the model endpoint. By placing enforcement in the data path, the platform gains a single point where policy can be applied consistently.
hoop.dev fills that role. It receives the non-human token, validates it against the configured identity provider, and then proxies the traffic to the self‑hosted model. Because the gateway is the only path traffic takes to the model, it can enforce outcomes that would otherwise be impossible.
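The flow described above (validate identity, apply policy, forward, record) can be sketched as a single request handler. This is a minimal illustration of the pattern, not hoop.dev's implementation or API: the `Gateway` class and the injected callables `verify_identity`, `policy`, `call_model`, and `redact` are hypothetical stand-ins for the identity-provider check, the policy engine, the upstream model client, and output masking.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical collaborator signatures.
VerifyIdentity = Callable[[str], str]   # bearer token -> caller id; raises PermissionError
Policy = Callable[[str, str], bool]     # (caller, prompt) -> allowed?
CallModel = Callable[[str], str]        # prompt -> raw model response
Redact = Callable[[str], str]           # raw response -> response safe to return


@dataclass
class Gateway:
    verify_identity: VerifyIdentity
    policy: Policy
    call_model: CallModel
    redact: Redact
    audit: List[Dict[str, str]]

    def handle(self, bearer_token: str, prompt: str) -> str:
        # 1. Identity: reject unattributable traffic before anything else happens.
        try:
            caller = self.verify_identity(bearer_token)
        except PermissionError as exc:
            self.audit.append({"caller": "unknown", "prompt": prompt, "outcome": f"denied: {exc}"})
            raise
        # 2. Policy: is this caller allowed to send this prompt?
        if not self.policy(caller, prompt):
            self.audit.append({"caller": caller, "prompt": prompt, "outcome": "denied: policy"})
            raise PermissionError("request blocked by policy")
        # 3. Forward (assumes network rules let only the gateway reach the model),
        #    then mask the response before it leaves the gateway.
        response = self.redact(self.call_model(prompt))
        # 4. Audit: who called, what was asked, and what was returned (masked).
        self.audit.append({"caller": caller, "prompt": prompt,
                           "outcome": "allowed", "response": response})
        return response


if __name__ == "__main__":
    # Stubs standing in for the identity provider and the model server.
    def verify(token: str) -> str:
        if token != "demo-token":
            raise PermissionError("invalid token")
        return "svc-recommender"

    gateway = Gateway(
        verify_identity=verify,
        policy=lambda caller, prompt: "weights" not in prompt.lower(),
        call_model=lambda prompt: f"echo: {prompt}",
        redact=lambda response: response,
        audit=[],
    )
    print(gateway.handle("demo-token", "hello"))  # echo: hello
    print(len(gateway.audit))  # 1
```

The handler fails closed: a request that cannot be attributed to an identity, or that violates policy, is logged and rejected before the model is ever called. That guarantee only holds if the model endpoint accepts traffic from the gateway alone, for example by restricting it at the network layer.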
