All posts

Self-Hosted Models and AI Governance: What to Know

When a self‑hosted model leaks proprietary data, the financial and reputational damage can quickly eclipse any cost savings from avoiding cloud services, making ai governance a critical concern. Most organizations that run models on their own servers treat the model endpoint like any other internal API: a shared API key or static credential is distributed to developers, CI pipelines, and sometimes third‑party scripts. The key lives in configuration files, environment variables, or secret stores

Free White Paper

AI Tool Use Governance + Self-Service Access Portals: The Complete Guide

Architecture patterns, implementation strategies, and security best practices. Delivered to your inbox.

Free. No spam. Unsubscribe anytime.

When a self‑hosted model leaks proprietary data, the financial and reputational damage can quickly eclipse any cost savings from avoiding cloud services, making ai governance a critical concern.

Most organizations that run models on their own servers treat the model endpoint like any other internal API: a shared API key or static credential is distributed to developers, CI pipelines, and sometimes third‑party scripts. The key lives in configuration files, environment variables, or secret stores that are not centrally audited. Because the request travels directly to the model, there is no record of who asked what, no way to hide sensitive fields in the response, and no mechanism to pause a risky inference for human review.

Adding an identity layer, such as OIDC or SAML tokens, solves the first part of the problem. It tells the system who is making the call and can enforce least‑privilege scopes. However, the request still reaches the model without any gatekeeper in the data path. That means the organization still lacks command‑level audit, inline data masking, just‑in‑time approval, or the ability to block a dangerous prompt before it is processed.

AI governance challenges for self‑hosted models

Effective ai governance for on‑premise models requires three capabilities. hoop.dev ties every inference request to an identity and records it in an immutable audit trail. The gateway must also be able to inspect the request and response payloads, redact or mask confidential fields, and optionally route suspicious queries to a reviewer. Finally, the system should record the entire session so that security teams can replay it later, investigate anomalies, and produce evidence for audits.

These capabilities can only be guaranteed when the enforcement point sits between the caller and the model serving process. That is where hoop.dev enters the architecture.

How hoop.dev provides the missing enforcement layer

hoop.dev is a Layer 7 gateway that proxies connections to infrastructure, including self‑hosted model servers. It terminates the client connection, authenticates the caller via OIDC/SAML, and then forwards the request to the model with its own service credentials. Because the model never sees the user’s token, the gateway can apply policy checks on every request.

Continue reading? Get the full guide.

AI Tool Use Governance + Self-Service Access Portals: Architecture Patterns & Best Practices

Free. No spam. Unsubscribe anytime.

When a request arrives, hoop.dev can:

  • Record the full request and response, creating a replayable audit log.
  • Mask sensitive fields in the model’s output according to policy.
  • Require a human approver for queries that match a risk profile, providing just‑in‑time access control.
  • Block commands that contain prohibited patterns before they reach the model.

All of these outcomes are produced by hoop.dev because it sits in the data path; the surrounding identity setup alone cannot enforce them.

Deploying a governance‑ready gateway

To bring these controls to a self‑hosted model, start with the quick‑start deployment. A Docker Compose file runs the gateway locally, connects it to an OIDC provider, and enables masking and guardrails out of the box. After the gateway is running, register the model endpoint as a connection and supply the service credentials that the gateway will use to talk to the model. Policies are then defined in the dashboard or via declarative files, specifying which fields to mask, which request patterns need approval, and which users or groups may invoke the model.

For step‑by‑step instructions, see the getting started guide. To explore the full set of policy features, visit the learn section of the documentation.

FAQ

What if my model is already behind a load balancer?

hoop.dev can be placed in front of the load balancer or behind it, as long as all traffic to the model passes through the gateway. The gateway’s agent runs on the same network segment, ensuring that the model’s credentials never leave the controlled environment.

Can hoop.dev work with automated pipelines?

Yes. Service accounts can obtain OIDC tokens from your CI system, and hoop.dev will still enforce masking, logging, and approval policies on every inference request generated by a pipeline.

Does using hoop.dev satisfy any compliance standards?

hoop.dev generates the audit evidence that auditors look for in ai governance programs, such as per‑user request logs and immutable session recordings. It does not claim compliance on its own, but the data it produces can be used to support compliance audits.

Explore the open‑source code on GitHub to see how you can extend or customize the gateway for your organization’s specific governance needs.

Open source

Save the open-source gateway for agent data access

Hoop is MIT-licensed infrastructure for controlling how AI agents reach production data. Star hoophq/hoop so you can inspect it, deploy it, or share it when your team starts governing agent access.

Star and save the repo →More posts