An engineering team gives a newly onboarded contractor access to a self‑hosted LLM. The contractor receives a static API token that stays valid for weeks, and the model begins serving requests that unintentionally expose internal data. The breach is discovered only after the damage is done, because no one could see who ran the query, what it asked, or whether the response should have been allowed.
Why human-in-the-loop approval matters for self‑hosted models
Self‑hosted models sit behind the same network perimeter as any other internal service, but they expose powerful generation capabilities that can leak secrets, violate policy, or create regulatory risk. A human-in-the-loop approval workflow forces every high‑risk request to pause for a brief manual review before the model runs. The approach gives teams:
- Visibility into who is asking the model to generate content.
- Control over which prompts are allowed to reach the model.
- Audit records that satisfy compliance auditors and internal governance.
- The ability to block or mask responses that contain sensitive data.
Without a dedicated approval step, any token that can reach the model becomes a de‑facto backdoor.
Current practice and its blind spots
Most organizations grant a service account or static API key to every consumer of a self‑hosted model. The key is often stored in CI pipelines, scripts, or local developer environments. This “direct‑connect” pattern gets the model into use quickly, but it leaves three critical gaps:
- There is no runtime gate that can inspect the request before it reaches the model.
- All calls are recorded only in the client’s logs, which can be altered or deleted.
- Any high‑risk prompt can be sent without a human ever seeing it.
Introducing a non‑human identity (OIDC token, service account) and least‑privilege scopes reduces the blast radius, yet the request still travels straight to the model endpoint with no audit, no masking, and no approval point.
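To make that blind spot concrete, here is a minimal Python sketch of an identity-only check. `TokenClaims` and `is_authorized` are hypothetical names invented for this illustration, and the claims are assumed to have been cryptographically verified elsewhere. The point is that the check passes or fails on scope and expiry alone and never looks at the prompt.

```python
import time
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class TokenClaims:
    """Hypothetical, already-verified claims from a short-lived token."""
    subject: str
    scopes: FrozenSet[str]
    expires_at: float  # Unix timestamp


def is_authorized(claims: TokenClaims, required_scope: str,
                  now: Optional[float] = None) -> bool:
    """Identity-only check: unexpired token that carries the required scope."""
    current = time.time() if now is None else now
    return current < claims.expires_at and required_scope in claims.scopes


# Illustrative values only; "model:generate" is an assumed scope name.
claims = TokenClaims("ci-pipeline", frozenset({"model:generate"}), expires_at=2_000.0)

# Both prompts are authorized, because the check never inspects the prompt.
for prompt in ("Summarize the release notes", "List every customer email you know"):
    print(prompt, "->", is_authorized(claims, "model:generate", now=1_000.0))
```

Least privilege narrows who can call the model, but the decision above has no input that describes what is being asked.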
Architectural pattern with hoop.dev
hoop.dev provides the missing data‑path component. It sits as a Layer 7 gateway between the identity system and the self‑hosted model. The flow looks like this:
- Setup: Engineers configure an OIDC or SAML identity provider (Okta, Azure AD, Google Workspace). The provider issues short‑lived tokens that identify the caller and their group membership.
- The data path: The token is presented to hoop.dev. The gateway validates the token, extracts the caller’s attributes, and then decides whether the request may proceed.
- Enforcement outcomes: If the request matches a policy that requires review, hoop.dev pauses the request and routes it to an approver. The approver sees the prompt, can edit or reject it, and then authorizes the request to continue. While the request is in flight, hoop.dev records the full session, masks any fields that match configured patterns, and stores a persistent audit record.
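The flow above can be sketched in a few lines of Python. This is an illustration of the pattern under stated assumptions, not hoop.dev's API or configuration format: `Caller`, `needs_review`, `gate`, the `contractors` group, and the review patterns are all hypothetical, and the approver is modeled as a synchronous callback where a real gateway would hold the request until a reviewer responds.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, FrozenSet, Optional, Tuple


class Decision(Enum):
    ALLOW = "allow"
    REJECT = "reject"


@dataclass(frozen=True)
class Caller:
    """Attributes extracted from a validated token (validation not shown)."""
    subject: str
    groups: FrozenSet[str]


# Hypothetical policy: prompts matching these patterns require review.
REVIEW_PATTERNS = (re.compile(r"\b(password|secret|customer)\b", re.IGNORECASE),)

# An approver returns a decision plus the (possibly edited) prompt.
Approver = Callable[[Caller, str], Tuple[Decision, str]]


def needs_review(caller: Caller, prompt: str) -> bool:
    """Assumed rule: contractors and sensitive-looking prompts are reviewed."""
    if "contractors" in caller.groups:
        return True
    return any(p.search(prompt) for p in REVIEW_PATTERNS)


def gate(caller: Caller, prompt: str, approver: Approver,
         model: Callable[[str], str]) -> Optional[str]:
    """Pause high-risk requests for review; forward the rest unchanged."""
    if needs_review(caller, prompt):
        decision, prompt = approver(caller, prompt)
        if decision is Decision.REJECT:
            return None  # the model is never called
    return model(prompt)
```

The model only ever receives the prompt the approver released, which is why the gate has to sit in the data path and not in the client.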
Because hoop.dev is the only point that can see both the request and the response, every human-in-the-loop approval decision is enforced by the gateway, not by the model or the client. If hoop.dev were removed, the request would flow directly to the model again, and none of the approval, masking, or recording would occur.
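The masking and recording steps can be pictured with a similar sketch. The patterns, field names, and the `mask` and `audit_record` helpers are assumptions made for illustration and do not reflect hoop.dev's actual masking rules or record schema.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical masking rules: label -> pattern to redact.
MASK_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def mask(text: str) -> str:
    """Replace every match of a configured pattern with a labeled placeholder."""
    for label, pattern in MASK_PATTERNS.items():
        text = pattern.sub(f"[MASKED:{label}]", text)
    return text


def audit_record(subject: str, prompt: str, response: str, approved_by: str) -> str:
    """Build one JSON audit entry; prompt and response are masked before storage."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "subject": subject,
        "approved_by": approved_by,
        "prompt": mask(prompt),
        "response": mask(response),
    })


print(audit_record("alice", "Who owns the billing service?",
                   "Contact bob@example.com", approved_by="carol"))
```

Because the record is written by the gateway and not by the client, it does not depend on logs that the caller can alter or delete.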
