When teams back every inference request with a documented, repeatable access review, they can trust that AI outputs are generated only by authorized users under the right conditions.
In many organizations, teams launch inference workloads with a single service‑account key that is kept in a repository, stored in a CI secret store, or even hard‑coded in application code. The key grants unrestricted access to the model endpoint, so any developer, script, or compromised container can issue prompts without oversight.
Because the request flows directly from the client to the model, there is no centralized log of who asked what, no way to block dangerous prompts, and no mechanism to hide sensitive data that the model might return.
Why access reviews matter for inference
The root problem is that we know the identity that initiates the request, but we lack an enforcement point. Even if you provision a non‑human identity with the least‑privilege scopes, the request still reaches the inference service directly, bypassing any real gate that could verify whether the user should be allowed to run that particular prompt. Without a gate, you cannot enforce just‑in‑time approval, cannot mask returned personally identifiable information, and cannot retain a replayable session for auditors. The result is a blind spot that defeats the purpose of least‑privilege and leaves the organization exposed to data leakage, model misuse, and compliance gaps.
Introducing a data‑path gateway for inference
hoop.dev provides the missing enforcement layer. It sits between the caller and the inference endpoint, acting as a Layer 7 gateway that inspects each request and response in real time.
The gateway receives an OIDC token or SAML assertion, validates the identity, and then applies policy before the request reaches the model.
Setup – Identity providers issue short‑lived tokens for service accounts, CI pipelines, or AI agents. These tokens convey who is making the call and what group memberships they have. hoop.dev consumes the token but does not grant access on its own; it merely identifies the caller.
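As a rough illustration of the identification step, the sketch below turns a set of token claims into a caller identity. It assumes the token signature has already been verified elsewhere. `sub` and `exp` are standard OIDC claims, while the `groups` claim name, the `Caller` type, and the `caller_from_claims` function are hypothetical and not part of hoop.dev.

```python
import time
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Caller:
    subject: str
    groups: Tuple[str, ...]


def caller_from_claims(claims: dict, now: Optional[float] = None) -> Caller:
    """Build a caller identity from already-verified token claims.

    This only identifies the caller; it makes no access decision.
    """
    now = time.time() if now is None else now
    if "sub" not in claims or "exp" not in claims:
        raise ValueError("token is missing 'sub' or 'exp'")
    # Short-lived tokens: reject anything past its expiry time.
    if claims["exp"] <= now:
        raise PermissionError("token has expired")
    # 'groups' is a common but non-standard claim name (assumption).
    return Caller(subject=claims["sub"], groups=tuple(claims.get("groups", [])))
```

Keeping identification separate from authorization mirrors the point above: the token says who is calling, and policy decides what that caller may do.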
The data path – All inference traffic passes through hoop.dev. Because the gateway is the only place the request can travel, it becomes the exclusive point where policy can be enforced.
Enforcement outcomes – hoop.dev records every inference session, masks sensitive fields in model responses, and requires just‑in‑time approval for high‑risk prompts. It can also block prompts that match a denylist before they are sent to the model. These outcomes exist only because hoop.dev occupies the data path; remove the gateway and the audit, masking, and approval capabilities disappear.
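A denylist check of the kind described above can be pictured as a simple pattern match that runs before a request is forwarded. The patterns and the `is_blocked` function below are hypothetical examples for illustration, not hoop.dev's rule syntax.

```python
import re

# Hypothetical denylist patterns; real rules would come from policy configuration.
DENYLIST = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"\bdump\b.*\bcustomer table\b", re.IGNORECASE),
]


def is_blocked(prompt: str) -> bool:
    """Return True if the prompt matches any denylist pattern."""
    return any(pattern.search(prompt) for pattern in DENYLIST)
```

A gateway that sits in the data path could refuse to forward any request for which such a check returns True.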
High‑level steps to add access reviews to inference
- Define a non‑human identity for each automation that needs to run inference. Use OIDC or SAML so the token contains the caller’s group information.
- Register the inference endpoint in hoop.dev as a connection. The gateway stores the model credentials; the client never sees them.
- Configure a policy that requires an access review for any prompt that contains regulated keywords or exceeds a token‑count threshold. The policy triggers a just‑in‑time approval workflow.
- Enable session recording so every prompt and response is archived for replay. This provides the evidence needed for audits and incident investigations.
- Turn on inline response masking for fields such as credit‑card numbers, SSNs, or proprietary identifiers. The gateway applies the mask before the response leaves.
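To make the masking step concrete, here is a minimal sketch that redacts card‑like numbers and US SSN‑formatted values from a response string. The regular expressions are deliberately simple assumptions (they do not validate checksums or cover every format), and `mask_response` is a hypothetical name.

```python
import re

# 13 to 16 digits, optionally separated by single spaces or hyphens (rough heuristic).
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
# US Social Security number in the common 3-2-4 layout.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_response(text: str) -> str:
    """Replace card-like and SSN-like values with fixed placeholders."""
    text = CARD.sub("[REDACTED-CARD]", text)
    return SSN.sub("[REDACTED-SSN]", text)


print(mask_response("Card 4111 1111 1111 1111 and SSN 123-45-6789."))
# prints: Card [REDACTED-CARD] and SSN [REDACTED-SSN].
```

Pattern‑based masking like this is easy to reason about but can miss unusual formats, so proprietary identifiers would need their own patterns.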
Together, these steps verify the caller's identity, inspect the request, let a reviewer approve or deny it, log the interaction, and strip sensitive data before the response reaches downstream consumers. That is a complete access‑review loop.
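The loop can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the keyword list, the token threshold, the whitespace‑based token count, and the `needs_review` and `handle_request` functions are hypothetical and do not reflect hoop.dev's policy engine.

```python
from typing import Callable, List, Optional

REGULATED_KEYWORDS = ("ssn", "diagnosis", "account number")  # hypothetical
MAX_PROMPT_TOKENS = 50  # hypothetical threshold


def needs_review(prompt: str) -> bool:
    """Flag prompts with regulated keywords or an excessive token count."""
    lowered = prompt.lower()
    # Substring matching is crude; a real policy would match more carefully.
    if any(keyword in lowered for keyword in REGULATED_KEYWORDS):
        return True
    # Whitespace splitting is a rough stand-in for a real tokenizer.
    return len(prompt.split()) > MAX_PROMPT_TOKENS


def handle_request(
    caller: str,
    prompt: str,
    approve: Callable[[str, str], bool],
    call_model: Callable[[str], str],
    mask: Callable[[str], str],
    audit_log: List[dict],
) -> Optional[str]:
    """Review, forward, mask, and log a single inference request."""
    if needs_review(prompt) and not approve(caller, prompt):
        audit_log.append({"caller": caller, "prompt": prompt, "decision": "denied"})
        return None
    # The response is masked before it is logged or returned.
    response = mask(call_model(prompt))
    audit_log.append(
        {"caller": caller, "prompt": prompt, "decision": "allowed", "response": response}
    )
    return response
```

Passing the approver, model call, and masker in as functions keeps the sketch self‑contained; in a real deployment the gateway would supply all three.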
Practical guidance
Start with the getting‑started guide to spin up the gateway in a container or Kubernetes pod. The guide walks you through connecting an OIDC provider, registering a target, and enabling the core guardrails. Once the gateway is running, use the learn section to explore policy syntax for access reviews, response masking, and approval workflows. Because hoop.dev is open source, you can inspect the code, contribute improvements, or extend the policy engine to match your organization’s risk model.
FAQ
Do I need to change my existing inference client?
No code changes are required. The client sends the same requests as before, but its configured endpoint address now points to the hoop.dev gateway, which holds the model credentials. The gateway applies policy and then forwards the request to the model.
Can I still use my existing service‑account keys?
Yes, but the keys are stored inside the gateway, not in the client. This eliminates credential sprawl and ensures that every request is subject to the same access‑review process.
How does hoop.dev help with compliance?
hoop.dev records each inference session, capturing who asked what, when, and what the model returned. Those logs satisfy audit‑trail requirements for standards such as SOC 2 and can be used as evidence for broader compliance programs.
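As an illustration of what such a record might hold, the sketch below models one session as a small data class and serializes it to a JSON line. The field names and the `SessionRecord`, `make_record`, and `to_json_line` names are assumptions for this example, not hoop.dev's actual log schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SessionRecord:
    caller: str      # who asked
    prompt: str      # what was asked
    response: str    # what the model returned (after masking)
    timestamp: str   # when, as an ISO 8601 string in UTC


def make_record(caller: str, prompt: str, response: str,
                when: Optional[datetime] = None) -> SessionRecord:
    """Create a record, defaulting the timestamp to the current UTC time."""
    when = when or datetime.now(timezone.utc)
    return SessionRecord(caller, prompt, response, when.isoformat())


def to_json_line(record: SessionRecord) -> str:
    """Serialize a record as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

An append‑only, one‑record‑per‑line format is one common choice because it is easy to ship to existing log pipelines and to replay in order.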
Explore the source code and contribute at the official GitHub repository: https://github.com/hoophq/hoop.