When a self‑hosted model leaks proprietary data, the financial and reputational damage can quickly eclipse any cost savings from avoiding cloud services, which makes AI governance a critical concern.
Most organizations that run models on their own servers treat the model endpoint like any other internal API: a shared API key or static credential is distributed to developers, CI pipelines, and sometimes third‑party scripts. The key lives in configuration files, environment variables, or secret stores that are not centrally audited. Because the request travels directly to the model, there is no record of who asked what, no way to hide sensitive fields in the response, and no mechanism to pause a risky inference for human review.
Adding an identity layer, such as OIDC or SAML tokens, solves the first part of the problem. It tells the system who is making the call and can enforce least‑privilege scopes. However, the request still reaches the model without any gatekeeper in the data path. That means the organization still lacks command‑level audit, inline data masking, just‑in‑time approval, or the ability to block a dangerous prompt before it is processed.
AI governance challenges for self‑hosted models
Effective AI governance for on‑premises models requires three capabilities. First, a gateway must tie every inference request to an identity and record it in an immutable audit trail. Second, the gateway must be able to inspect the request and response payloads, redact or mask confidential fields, and optionally route suspicious queries to a reviewer. Finally, the system should record the entire session so that security teams can replay it later, investigate anomalies, and produce evidence for audits.
These capabilities can only be guaranteed when the enforcement point sits between the caller and the model serving process. That is where hoop.dev enters the architecture.
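To make the capabilities described above more concrete, here is a minimal Python sketch of what an enforcement point might do with each request. It is not hoop.dev's implementation or API. The names `mask`, `needs_review`, and `AuditLog`, the email pattern, and the blocked phrase are all hypothetical. The hash chain makes the log tamper-evident rather than truly immutable; real immutability would also need write-once storage.

```python
import hashlib
import json
import re
from dataclasses import dataclass, field

# Hypothetical rules for illustration; a real policy would be far richer.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
REVIEW_PHRASES = ("dump all customer records",)


def mask(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def needs_review(prompt: str) -> bool:
    """Flag prompts containing a phrase that should go to a human reviewer."""
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in REVIEW_PHRASES)


def _digest(record: dict) -> str:
    """Stable SHA-256 over a record (keys sorted so the hash is repeatable)."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


@dataclass
class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""
    entries: list = field(default_factory=list)

    def append(self, identity: str, prompt: str, response: str) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        record = {"identity": identity, "prompt": prompt,
                  "response": response, "prev": prev}
        record["hash"] = _digest(record)  # hash covers the four fields above
        self.entries.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks verification."""
        prev = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

In this sketch the gateway would call `needs_review` before forwarding a prompt, apply `mask` to the response on the way back, and append both to the `AuditLog` under the caller's identity.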
How hoop.dev provides the missing enforcement layer
hoop.dev is a Layer 7 gateway that proxies connections to infrastructure, including self‑hosted model servers. It terminates the client connection, authenticates the caller via OIDC/SAML, and then forwards the request to the model with its own service credentials. Because every request passes through the gateway, it can apply policy checks to each one, and the model never sees the user’s token.
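The terminate, authenticate, and forward flow can be illustrated with a small Python sketch. This is a generic illustration of the pattern, not hoop.dev's code. The `handle_request` function, the `model:infer` scope, and the `SERVICE_CREDENTIAL` placeholder are assumptions, and token verification is injected as a callable because real OIDC/SAML validation is out of scope here.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Placeholder only; a real gateway would load this from a secret store.
SERVICE_CREDENTIAL = "gateway-service-credential"
REQUIRED_SCOPE = "model:infer"  # hypothetical scope name


@dataclass(frozen=True)
class Caller:
    subject: str
    scopes: frozenset


def handle_request(
    user_token: str,
    prompt: str,
    verify_token: Callable[[str], Optional[Caller]],
    call_model: Callable[[str, str], str],
) -> dict:
    """Authenticate the caller, check scope, then forward with the gateway's
    own credential so the user's token never reaches the model."""
    caller = verify_token(user_token)  # stands in for OIDC/SAML validation
    if caller is None:
        return {"status": 401, "body": "invalid token"}
    if REQUIRED_SCOPE not in caller.scopes:
        return {"status": 403, "body": "missing scope"}
    # The model is called with the service credential, not user_token.
    body = call_model(SERVICE_CREDENTIAL, prompt)
    return {"status": 200, "subject": caller.subject, "body": body}
```

Passing `verify_token` and `call_model` in as parameters keeps the sketch independent of any particular identity provider or model server, and makes the credential swap easy to check in a test.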
