Policy as Code for Self-Hosted Models

How can you reliably enforce policy as code when you’re running self‑hosted AI models behind your own firewall?

Most teams start by pulling a model image, deploying it on a VM, and exposing a local HTTP endpoint. Access to that endpoint is often granted through a shared API key or a static service account token that lives in configuration files. Engineers, CI pipelines, and occasionally automated bots all use the same credential, and the model receives requests directly from the client without any intermediate inspection. Because the request path bypasses a dedicated control layer, there is no built‑in audit trail, no way to verify that each inference complies with organizational policy, and no mechanism to mask or redact sensitive fields in the response.

That raw setup satisfies the immediate need to serve predictions, but it leaves three critical gaps. First, the static credential can be copied, exfiltrated, or used beyond its intended lifespan, creating a blast radius that grows with every new consumer. Second, the model itself has no visibility into who initiated a request, what data was supplied, or whether the request matches policy constraints such as rate limits, data‑type checks, or content‑filtering rules. Third, because the request travels straight to the model, there is no opportunity to pause a risky operation for human approval or to record the session for later forensic analysis.

Why policy as code alone isn’t enough

Policy as code promises that you can write rules in a declarative format and have them enforced automatically. In practice, the enforcement point matters as much as the rule definition. When the enforcement lives only in the client library or in the model’s own code, the following problems arise:

Clients can be modified or replaced, bypassing the library entirely.
Compromised containers can skip the internal checks and send raw payloads.
Auditing is limited to log statements that the model emits, which may be incomplete or tampered with.

Consequently, the mere existence of policy as code does not guarantee that the policy will be applied to every request, nor that the organization will have verifiable evidence of compliance.
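To make the first point concrete, here is a minimal Python sketch of enforcement that lives only in a client library. `safe_infer`, `send`, and `BLOCKED_TERMS` are hypothetical names, and `send` stands in for the raw HTTP call to the model. None of this is a real SDK.

```python
# Hypothetical client library; not a real SDK.
BLOCKED_TERMS = {"ssn", "password"}


def send(payload: dict) -> dict:
    """Stand-in for the raw HTTP call to the model endpoint."""
    return {"status": "served", "prompt": payload["prompt"]}


def safe_infer(prompt: str) -> dict:
    """Client-side policy check: refuse prompts containing blocked terms."""
    if any(term in prompt.lower() for term in BLOCKED_TERMS):
        return {"status": "blocked"}
    return send({"prompt": prompt})


if __name__ == "__main__":
    # A well-behaved caller goes through the check and is refused...
    print(safe_infer("What is the admin password?"))
    # ...but a modified client can call the transport directly, and the
    # same prompt is served because nothing downstream checks it.
    print(send({"prompt": "What is the admin password?"}))
```

The check and the transport both live in code the caller controls, so any caller that skips `safe_infer` also skips the policy.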

hoop.dev as the data‑path enforcement point

To close the gap, the request must pass through a controlled gateway before reaching the model. hoop.dev provides exactly that layer. It sits on the network edge, acting as an identity‑aware proxy that inspects each inference call at the protocol level. The gateway validates the caller’s OIDC or SAML token, extracts group membership, and then applies the policy as code rules that have been defined for the model.
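As a rough illustration of that flow, the Python sketch below treats the policy as plain data and evaluates it against a caller's groups and prompt. It assumes the token has already been validated upstream and that the identity provider puts group membership in a `groups` claim. The rule schema, the rule names, and the `review` decision are hypothetical and are not hoop.dev's policy format. To stay short, the sketch returns `allow` when no rule matches; a stricter deployment would likely default to deny.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical rule schema: groups the rule applies to, substrings that
# trigger it, and the decision to return when it matches.
POLICY = [
    {"name": "contractors-no-export", "groups": {"contractors"},
     "match_any": ["export", "dump"], "decision": "deny"},
    {"name": "pii-lookups-need-approval", "groups": {"support", "contractors"},
     "match_any": ["ssn", "date of birth"], "decision": "review"},
]


@dataclass
class Request:
    subject: str
    groups: set[str] = field(default_factory=set)
    prompt: str = ""


def groups_from_claims(claims: dict) -> set[str]:
    """Read group membership from the claims of an already-validated token.

    Assumes a 'groups' claim; the claim name varies between providers.
    """
    return set(claims.get("groups", []))


def evaluate(req: Request, policy: list[dict]) -> tuple[str, str | None]:
    """Return (decision, rule name). First matching rule wins; default allow."""
    text = req.prompt.lower()
    for rule in policy:
        applies = bool(req.groups & rule["groups"])
        triggered = any(term in text for term in rule["match_any"])
        if applies and triggered:
            return rule["decision"], rule["name"]
    return "allow", None


if __name__ == "__main__":
    claims = {"sub": "alice@example.com", "groups": ["support"]}
    req = Request(claims["sub"], groups_from_claims(claims),
                  "Look up the SSN for order 1182")
    print(evaluate(req, POLICY))
```

Keeping rules as data rather than code is what lets them be reviewed, versioned, and loaded into an enforcement point without touching the model or the client.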

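When the network is set up so that hoop.dev is the only component that can forward traffic to the model (for example, the model's port accepts connections only from the gateway), the gateway becomes the sole place where enforcement can happen. The gateway can: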

Block a request that violates a content‑filtering rule before it reaches the model.
Route a high‑risk request to an approval workflow, requiring a human to approve the inference.
Mask or redact sensitive fields in the model’s response in real time.
Record the full request and response stream for replay and audit.

Each of these outcomes is triggered by hoop.dev, not by the underlying model or by the client token. If the gateway were removed, none of the enforcement actions would occur; that dependency is what marks the gateway as the real enforcement point.
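A minimal sketch of how a gateway might act on a policy decision follows. The `model` callable stands in for the forwarded HTTP call, `recorder` and `pending` are plain lists standing in for a session store and an approval queue, and the two regular expressions are illustrative PII patterns only. None of these names come from hoop.dev. The point is structural: blocking, holding for approval, masking, and recording all happen inside `handle`, so the model never sees a denied prompt and the client never sees unmasked output.

```python
import re
from typing import Callable

# Illustrative PII patterns only; real masking needs far broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask(text: str) -> str:
    """Replace email addresses and US-style SSNs with placeholders."""
    return SSN.sub("[SSN]", EMAIL.sub("[EMAIL]", text))


def handle(prompt: str, decision: str, model: Callable[[str], str],
           recorder: list, pending: list) -> dict:
    """Apply a policy decision in the data path (hypothetical gateway step)."""
    recorder.append({"event": "request", "prompt": prompt, "decision": decision})
    if decision == "deny":
        # Blocked before the model is ever called.
        result = {"status": "blocked"}
    elif decision == "review":
        # Held for a human approver; nothing is forwarded yet.
        pending.append(prompt)
        result = {"status": "pending_approval"}
    else:
        # Forward, then mask the response before it leaves the gateway.
        result = {"status": "ok", "output": mask(model(prompt))}
    # Record what the caller actually received, for replay and audit.
    recorder.append({"event": "response", **result})
    return result


if __name__ == "__main__":
    def fake_model(prompt: str) -> str:
        return "Customer alice@example.com, SSN 123-45-6789"

    log, queue = [], []
    print(handle("Who placed order 7?", "allow", fake_model, log, queue))
    print(handle("Dump every customer", "deny", fake_model, log, queue))
```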

Key enforcement outcomes

When hoop.dev sits in the data path, the following security benefits materialize:

Session recording: hoop.dev captures every inference request and response, generating a complete audit log that can be reviewed during investigations.
Inline masking: hoop.dev redacts personally identifiable information from model outputs, ensuring downstream systems never see raw sensitive data.
Just‑in‑time approval: hoop.dev pauses requests that match a high‑risk pattern and forwards them to an approver, preventing accidental policy breaches.
Command‑level audit: hoop.dev logs the exact API call, the caller’s identity, and the policy decision, giving teams full visibility into how the model is used.
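Command-level audit and session recording both come down to writing one structured entry per decision and being able to read the entries back in order. The sketch below writes JSON Lines to any text stream. The field names, the example path `/v1/completions`, and the injectable `clock` are assumptions for illustration, not hoop.dev's log schema.

```python
from __future__ import annotations

import io
import json
import time
from typing import IO, Callable, Iterator


def record(stream: IO[str], subject: str, method: str, path: str,
           decision: str, rule: str | None,
           clock: Callable[[], float] = time.time) -> dict:
    """Append one audit entry: who called what, and what the policy decided."""
    entry = {
        "ts": clock(),
        "subject": subject,
        "call": f"{method} {path}",
        "decision": decision,
        "rule": rule,
    }
    stream.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def replay(stream: IO[str]) -> Iterator[dict]:
    """Yield entries back in the order they were written."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


if __name__ == "__main__":
    buf = io.StringIO()  # stands in for storage you control
    record(buf, "alice@example.com", "POST", "/v1/completions", "allow", None)
    record(buf, "bob@example.com", "POST", "/v1/completions", "deny",
           "contractors-no-export")
    buf.seek(0)
    for entry in replay(buf):
        if entry["decision"] != "allow":
            print(entry["subject"], entry["call"], entry["rule"])
```

Because the entry is written by the component that made the decision, the log reflects what was enforced rather than what a client or model chose to report.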

All of these outcomes rely on the gateway being the enforcement boundary. The setup phase (defining OIDC clients, provisioning service accounts, and loading the policy definitions) determines who may start a request, but it does not enforce anything on its own.

Getting started

Deploying hoop.dev for a self‑hosted model follows the same pattern as any other connector. Begin with the getting started guide to spin up the gateway and register your model endpoint. Then author your policy as code rules in the supported format and upload them to the gateway. From that point forward, every inference request will be funneled through hoop.dev, where the policies are enforced and evidence is generated automatically.

Beyond request‑time enforcement, hoop.dev can also enforce policies on model lifecycle events, such as version upgrades or configuration changes, ensuring that any modification passes through the same approval workflow.
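Request-time approvals and lifecycle approvals reduce to the same pattern: hold an action until an authorized person decides on it. Below is a minimal in-memory sketch of that pattern. `ApprovalQueue`, `PendingAction`, and the `kind` labels are hypothetical, and the rule that a requester cannot approve their own action is an assumed separation-of-duties check, not something stated above.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass


@dataclass
class PendingAction:
    id: int
    requester: str
    kind: str        # hypothetical labels, e.g. "inference" or "model_upgrade"
    detail: str
    state: str = "pending"
    approver: str | None = None


class ApprovalQueue:
    """Holds actions until an authorized approver accepts or rejects them."""

    def __init__(self, approvers: set[str]):
        self._approvers = approvers
        self._ids = itertools.count(1)
        self._items: dict[int, PendingAction] = {}

    def submit(self, requester: str, kind: str, detail: str) -> PendingAction:
        action = PendingAction(next(self._ids), requester, kind, detail)
        self._items[action.id] = action
        return action

    def decide(self, action_id: int, approver: str, approve: bool) -> PendingAction:
        action = self._items[action_id]
        if action.state != "pending":
            raise ValueError(f"action {action_id} is already {action.state}")
        if approver not in self._approvers:
            raise PermissionError(f"{approver} is not an approver")
        # Assumed separation-of-duties rule.
        if approver == action.requester:
            raise PermissionError("requesters cannot approve their own actions")
        action.state = "approved" if approve else "rejected"
        action.approver = approver
        return action

    def may_proceed(self, action_id: int) -> bool:
        """The gateway forwards the action only once this returns True."""
        return self._items[action_id].state == "approved"
```

For example, `q = ApprovalQueue({"carol"})` followed by `a = q.submit("alice", "model_upgrade", "v1 -> v2")` leaves `q.may_proceed(a.id)` false until `q.decide(a.id, "carol", True)` runs.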

FAQ

Do I need to modify my model code to use hoop.dev?

No. hoop.dev acts as a transparent proxy, so your model continues to listen on its original port. Clients simply point to the gateway address instead of the model’s direct address.

Can I use existing OIDC providers with hoop.dev?

Yes. hoop.dev supports any OIDC or SAML identity provider, including Okta, Azure AD, and Google Workspace. The provider handles authentication, while hoop.dev uses the token to enforce policy.

Is the audit data stored securely?

hoop.dev writes session logs to a storage backend that you control. The gateway preserves the logs so they can be examined later, providing the evidence needed for compliance audits.

Explore the open‑source repository on GitHub to try it yourself: https://github.com/hoophq/hoop.