RBAC for Inference

Every inference request is answered only after the system confirms that the caller’s role permits the specific model and data it wants to use.

In many organizations the inference layer is exposed through a single API endpoint protected by a shared secret or a generic service account. Anyone who can invoke the endpoint can run any model, see any output, and potentially exfiltrate proprietary data. The result is a blast radius that expands with each new model or dataset added to the platform.

Why traditional RBAC often fails for inference workloads

Classic role‑based access control works well for static resources like files or database tables, where permissions can be expressed as simple read/write flags. Inference services are different: they are dynamic, they accept arbitrary prompts, and they may call downstream services (vector stores, feature pipelines, or external APIs). When the enforcement point lives inside the model server, the server itself must be trusted not to bypass the policy, and the audit trail is tied to the same process that could be compromised.

The typical starting state looks like this:

  • A single API key is stored in the CI/CD pipeline and distributed to multiple services.
  • Developers and automated jobs all share the same credential, giving them equal power over every model.
  • There is no per‑request visibility – logs only show that "some client" called the endpoint.
  • Data returned from the model is never inspected for compliance, so sensitive fields can leak unnoticed.

Those conditions answer the “who can start a request” question but leave the “what can they do” question unanswered. A caller can reach the endpoint, but nothing in the path enforces fine‑grained policies, masks outputs, or requires human approval for high‑risk prompts.

Adding a data‑path enforcement point

The missing piece is a gateway that sits between the caller and the inference engine. The gateway receives the caller’s OIDC or SAML token, extracts role information, and then decides, on a per‑request basis, whether the request should be allowed, transformed, or blocked. As long as the network is configured so that the gateway is the only path to the model server, no request can skip the policy check.

Key capabilities of such a gateway include:

  • Mapping roles to allowed models, datasets, and parameter ranges.
  • Real‑time evaluation of policy rules before the request reaches the model server.
  • Inline masking of responses that contain regulated data (PII, trade secrets, etc.).
  • Just‑in‑time approval workflows for high‑risk prompts.
  • Session recording and replay for post‑incident analysis.
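The allow, transform, or block decision described above can be sketched in a few lines of Python. This is a minimal illustration, not a real gateway's API: the `RoleRule` shape, the `max_tokens` cap, and the `decide` function are hypothetical. Clamping a parameter is only one possible transform; the masking of responses listed above is another.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ALLOW = "allow"
    TRANSFORM = "transform"
    BLOCK = "block"


@dataclass(frozen=True)
class RoleRule:
    # Hypothetical rule shape: models a role may call and a cap on one parameter.
    models: frozenset
    max_tokens: int


# Illustrative rules only; the caps and the admin role name are made up.
RULES = {
    "data-analyst": RoleRule(frozenset({"sales-forecast"}), max_tokens=512),
    "admin": RoleRule(frozenset({"sales-forecast", "customer-churn"}), max_tokens=4096),
}


def decide(role, model, params):
    """Return (outcome, params) for one request; unknown roles are blocked."""
    rule = RULES.get(role)
    if rule is None or model not in rule.models:
        return Outcome.BLOCK, None
    requested = params.get("max_tokens", rule.max_tokens)
    if requested > rule.max_tokens:
        # Transform: clamp the out-of-range parameter instead of rejecting.
        return Outcome.TRANSFORM, {**params, "max_tokens": rule.max_tokens}
    return Outcome.ALLOW, dict(params)
```

Blocking is the default for anything the rules do not mention, which keeps a newly added model inaccessible until a role explicitly lists it.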

How to build RBAC for inference without reinventing the wheel

1. Identity layer. Use an enterprise IdP (Okta, Azure AD (now Microsoft Entra ID), Google Workspace, …) that issues short‑lived OIDC tokens. The token carries the user’s group memberships, which you translate into application‑specific roles.
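As a rough sketch of the translation step, the function below maps group names from an already verified token to application roles. It assumes signature, issuer, and audience checks happened earlier. The claim name `groups` and every group name in `GROUP_TO_ROLE` are assumptions, since IdPs differ in how they expose memberships.

```python
import time

# Hypothetical mapping from IdP group names to application roles.
GROUP_TO_ROLE = {
    "ml-readers": "model-viewer",
    "ml-prompt-users": "restricted-prompt-executor",
    "ml-platform-admins": "admin-inference-operator",
}


def roles_from_claims(claims, now=None):
    """Map verified token claims to a sorted list of application roles.

    Assumes the token signature, issuer, and audience were validated upstream.
    """
    now = time.time() if now is None else now
    # 'exp' is the standard JWT expiry claim, in seconds since the epoch.
    if claims.get("exp", 0) <= now:
        raise PermissionError("token expired")
    # The claim name 'groups' is an assumption; check what your IdP emits.
    groups = claims.get("groups", [])
    return sorted({GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE})
```

Groups with no mapping are ignored rather than passed through, so an unexpected group in the directory never becomes an implicit role.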

2. Role definition. Create a catalog of roles such as model‑viewer, restricted‑prompt‑executor, and admin‑inference‑operator. Each role lists the models it may invoke, the datasets it may reference, and any output‑masking rules.
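One way to picture such a catalog is a small set of immutable records. The role names below come from the text; the models, datasets, and masked fields attached to them are placeholders invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    models: frozenset = frozenset()         # models the role may invoke
    datasets: frozenset = frozenset()       # datasets the role may reference
    masked_fields: frozenset = frozenset()  # output fields hidden from the role


# Role names are from the article; everything attached to them is hypothetical.
CATALOG = {
    role.name: role
    for role in (
        Role("model-viewer", models=frozenset({"sales-forecast"})),
        Role(
            "restricted-prompt-executor",
            models=frozenset({"sales-forecast"}),
            datasets=frozenset({"sales-history"}),
            masked_fields=frozenset({"customer_email"}),
        ),
        Role(
            "admin-inference-operator",
            models=frozenset({"sales-forecast", "customer-churn"}),
            datasets=frozenset({"sales-history", "crm-export"}),
        ),
    )
}


def allowed_models(role_names):
    """Union of models across a caller's roles; unknown role names grant nothing."""
    models = set()
    for name in role_names:
        role = CATALOG.get(name)
        if role is not None:
            models |= role.models
    return models
```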

3. Policy engine. Store the role‑to‑resource mapping in a central configuration store. The engine evaluates a request by checking the caller’s role against the requested model and data.
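A minimal sketch of that evaluation follows, assuming the mapping arrives as a JSON document. The JSON layout, the `*` wildcard, and the dataset names are assumptions made for this example, and the string literal stands in for whatever the configuration store returns.

```python
import json

# Stand-in for a document fetched from a central configuration store.
POLICY_JSON = """
{
  "data-analyst": {"models": ["sales-forecast"], "datasets": ["sales-history"]},
  "admin-inference-operator": {"models": ["*"], "datasets": ["*"]}
}
"""


def load_policy(text):
    """Parse the JSON mapping into {role: {"models": set, "datasets": set}}."""
    return {
        role: {kind: set(names) for kind, names in grants.items()}
        for role, grants in json.loads(text).items()
    }


def _matches(granted, requested):
    # "*" is a wildcard convention assumed for this sketch.
    return "*" in granted or requested in granted


def evaluate(policy, roles, model, dataset=None):
    """Return (allowed, reason); deny unless one role grants the model and dataset."""
    for role in roles:
        grants = policy.get(role)
        if grants is None:
            continue
        if not _matches(grants.get("models", set()), model):
            continue
        if dataset is not None and not _matches(grants.get("datasets", set()), dataset):
            continue
        return True, f"granted by role {role}"
    return False, "no role grants this model and dataset"
```

Returning a reason alongside the boolean gives the enforcement point something concrete to write into its decision log.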

4. Enforcement point. Deploy a layer‑7 gateway that intercepts the HTTP/gRPC traffic destined for the inference service. The gateway performs the policy lookup, applies masking, and logs the decision.
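Stripped of the network layer, the enforcement point reduces to one function that wraps the call to the model server. The sketch below is a simplification under stated assumptions: the request is a plain dict, `upstream` is any callable standing in for the inference service, and the `POLICY` table and field names are hypothetical.

```python
import json
import logging

logger = logging.getLogger("gateway")

# Hypothetical per-role policy: allowed models and response fields to mask.
POLICY = {
    "data-analyst": {"models": {"sales-forecast"}, "mask": {"customer_email"}},
}


def handle(request, upstream):
    """Gate one request.

    `request` is a dict with 'user', 'role', 'model', and 'payload' keys.
    `upstream` is a callable (model, payload) -> dict that forwards the call.
    """
    rule = POLICY.get(request["role"])
    allowed = rule is not None and request["model"] in rule["models"]
    # Log the decision before any traffic reaches the model server.
    logger.info(json.dumps({
        "user": request["user"],
        "role": request["role"],
        "model": request["model"],
        "decision": "allow" if allowed else "deny",
    }))
    if not allowed:
        return {"status": 403, "body": {"error": "forbidden"}}
    response = upstream(request["model"], request["payload"])
    # Replace masked field values; other fields pass through unchanged.
    masked = {k: ("***" if k in rule["mask"] else v) for k, v in response.items()}
    return {"status": 200, "body": masked}
```

A denied request returns before `upstream` is ever called, which is the property that makes the gateway, rather than the model server, the place where policy is enforced.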

5. Audit and replay. Every request and response is recorded by the gateway. The logs include the identity, the requested model, the applied policy, and the final output (masked if needed). This audit trail satisfies compliance requirements and enables forensic investigations.
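The record described here could look like the following sketch, which writes one JSON line per request with the fields named above. The hash chain linking each line to the previous one is an addition for tamper evidence, not something the text prescribes. The field layout and function names are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str  # ISO 8601 string supplied by the caller
    identity: str
    model: str
    policy: str
    decision: str
    output: str     # the final output, already masked if masking applied


def append_record(log, record):
    """Append one record as a JSON line chained to the previous line's hash."""
    prev_hash = json.loads(log[-1])["hash"] if log else ""
    body = asdict(record)
    body["prev_hash"] = prev_hash
    canonical = json.dumps(body, sort_keys=True)
    body["hash"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    log.append(json.dumps(body, sort_keys=True))
    return body["hash"]


def verify(log):
    """Recompute every hash; True only if no line was altered or reordered."""
    prev_hash = ""
    for line in log:
        body = json.loads(line)
        claimed = body.pop("hash")
        if body["prev_hash"] != prev_hash:
            return False
        canonical = json.dumps(body, sort_keys=True)
        if hashlib.sha256(canonical.encode("utf-8")).hexdigest() != claimed:
            return False
        prev_hash = claimed
    return True
```

Here `log` is an in-memory list for brevity; in practice the lines would go to append-only storage outside the gateway process.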

hoop.dev as the enforcement gateway

hoop.dev is an open‑source, layer‑7 gateway designed exactly for this pattern. It sits between identities and infrastructure, verifies OIDC tokens, and then enforces fine‑grained policies on the traffic that passes through it. Because hoop.dev sits in the data path, it can:

  • Enforce RBAC for inference requests, allowing only the models and datasets permitted for each role.
  • Mask sensitive fields in model responses before they reach the caller.
  • Require a human approver for prompts that match a high‑risk pattern.
  • Record every inference session for replay and audit.
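To make the human-approval idea concrete, here is a generic sketch of holding high-risk prompts until a second person releases them. It is not hoop.dev's API or configuration format; the `ApprovalQueue` class, the regex patterns, and the no-self-approval rule are all assumptions chosen for illustration.

```python
import itertools
import re

# Hypothetical high-risk patterns; a real deployment would tune these.
HIGH_RISK = [
    re.compile(r"\b(dump|export)\b.*\b(all|every)\b", re.IGNORECASE),
    re.compile(r"\b(ssn|social security)\b", re.IGNORECASE),
]


class ApprovalQueue:
    """Holds high-risk prompts until someone other than the requester approves."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}

    def submit(self, user, prompt):
        """Return ('forward', prompt) for ordinary prompts, else ('pending', ticket)."""
        if not any(pattern.search(prompt) for pattern in HIGH_RISK):
            return "forward", prompt
        ticket = next(self._ids)
        self._pending[ticket] = (user, prompt)
        return "pending", ticket

    def approve(self, ticket, approver):
        """Release a held prompt; requesters cannot approve their own tickets."""
        user, prompt = self._pending[ticket]
        if approver == user:
            raise PermissionError("self-approval is not allowed")
        del self._pending[ticket]
        return prompt
```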

The gateway runs as a Docker Compose service for quick trials, or as a Kubernetes deployment for production. Identity is handled via OIDC; the gateway reads the token, extracts group claims, and maps them to the role catalog you define. All of this happens outside the inference server, so the server never sees a credential it could misuse.

Practical adoption steps

Start by cloning the open‑source repository and following the getting‑started guide. Deploy the gateway in the same network segment as your model server. Then:

  1. Configure OIDC authentication with your IdP.
  2. Define role‑to‑model mappings in the gateway’s configuration (e.g., the data‑analyst role can call sales‑forecast but not customer‑churn).
  3. Enable inline masking for fields such as credit‑card numbers or social security numbers.
  4. Turn on session recording and point the logs to a secure storage bucket.
  5. Test the flow with a few users, verify that denied requests are blocked, and that allowed requests are logged with the correct role information.

For deeper details on masking, approval workflows, and replay, explore the learn section of the documentation.

FAQ

Can I use hoop.dev with an existing inference platform?

Yes. hoop.dev is protocol‑agnostic at layer 7, so it can proxy HTTP, gRPC, or custom inference APIs without requiring changes to the model server.

Does hoop.dev store any credentials?

The gateway holds the service‑account credential used to reach the inference server, but it never exposes that credential to the caller. Callers authenticate through the IdP.

How does hoop.dev help with compliance?

Because every request is evaluated against RBAC policies, masked as needed, and recorded, you have a complete audit trail that can be presented to auditors for standards such as SOC 2 or internal governance frameworks.

Implementing effective role‑based access control for inference is achievable without building a custom proxy from scratch. By placing a layer‑7 gateway in the data path, you gain the enforcement point needed to turn “anyone can call any model” into “only authorized roles can invoke approved models, with full audit and masking.” hoop.dev provides that gateway out of the box, letting you focus on defining roles and policies rather than plumbing the enforcement yourself.

Explore the open‑source repository on GitHub to get started.
