Why inference needs strict policy enforcement
Uncontrolled inference calls can expose proprietary data, generate disallowed content, or trigger downstream attacks. When a model is invoked directly from a script or a CI pipeline, the request bypasses any review, and the response lands in logs or user interfaces without safeguards.
The current reality of inference pipelines
Most teams embed a model endpoint URL and an API key in application code or environment variables. Engineers, CI jobs, and automated bots reach the model over HTTPS, sending prompts and receiving completions. This pattern gives each caller standing access, meaning the model can be queried at any time, from any host that holds the secret. The connection runs directly from the caller to the endpoint, so there is no central point that can inspect the payload, enforce content policies, or record who asked what. Auditors therefore see only the logs that the application chooses to write, which often omit the actual prompt or mask it inconsistently.
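The direct-call pattern described above can be sketched in a few lines of Python. The endpoint URL, the environment variable names, and the request body shape are all hypothetical placeholders, not taken from any particular model provider. The request is built but not sent, so the sketch runs offline.

```python
import json
import os
import urllib.request

# Hypothetical names: the variables and the URL do not refer to a real service.
ENDPOINT = os.environ.get("MODEL_ENDPOINT", "https://models.example.internal/v1/complete")
API_KEY = os.environ.get("MODEL_API_KEY", "sk-placeholder")


def build_direct_request(prompt: str) -> urllib.request.Request:
    """Build the HTTPS request a caller would send straight to the model."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Any host that holds API_KEY can build and send this request at any time.
# The shared key does not identify the person or job behind the call, and
# nothing inspects the prompt before it leaves the process.
request = build_direct_request("Summarize the attached customer records")
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

The only control in this path is possession of the key, which is why the prompt itself is never reviewed or recorded unless the application happens to log it.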
What identity controls alone do not solve
Introducing an identity provider or rotating API keys limits who can obtain credentials, but it does not stop a legitimate user from sending a risky prompt. The request still travels straight to the model endpoint, bypassing any gate that could apply real‑time rules, mask sensitive fields in the response, or require a human approval step before execution. In other words, the setup defines *who* may start a request, but it does not define *what* the request is allowed to do.
Putting the gateway in the data path
hoop.dev acts as an identity‑aware, layer‑7 proxy that sits between callers and the inference service. It receives the request, validates the caller’s OIDC token, and then applies the configured policy set before forwarding the payload to the model. Because every request and response passes through the gateway, it becomes the natural place to enforce rules.
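The ordering described here (validate identity, apply policy, then forward) can be illustrated with a minimal Python sketch. It is not hoop.dev's implementation. It assumes the token's signature has already been verified and its claims decoded into a dictionary, and the issuer, audience, and function names are hypothetical.

```python
import time
from typing import Callable, Optional

EXPECTED_ISSUER = "https://idp.example.com"  # hypothetical identity provider
EXPECTED_AUDIENCE = "inference-gateway"      # hypothetical audience value


def validate_claims(claims: dict, now: Optional[float] = None) -> str:
    """Check decoded, signature-verified OIDC claims and return the subject."""
    now = time.time() if now is None else now
    if claims.get("iss") != EXPECTED_ISSUER:
        raise PermissionError("unexpected issuer")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if EXPECTED_AUDIENCE not in audiences:
        raise PermissionError("unexpected audience")
    if claims.get("exp", 0) <= now:
        raise PermissionError("token expired")
    subject = claims.get("sub")
    if not subject:
        raise PermissionError("missing subject")
    return subject


def handle_request(
    claims: dict,
    prompt: str,
    is_allowed: Callable[[str, str], bool],
    forward: Callable[[str], str],
) -> dict:
    """Identity first, policy second, and only then forward to the model."""
    try:
        user = validate_claims(claims)
    except PermissionError as exc:
        return {"status": 401, "error": str(exc)}
    if not is_allowed(user, prompt):
        return {"status": 403, "error": "blocked by policy"}
    return {"status": 200, "user": user, "completion": forward(prompt)}
```

The point of the sketch is the order of operations: `forward` is never called unless both the identity check and the policy check pass, which is only possible when the proxy sits in the data path.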
How hoop.dev enforces policy on inference
When a request arrives, hoop.dev extracts the user identity from the token and checks it against the policy catalog. The policy can specify allowed prompt patterns, maximum token length, or required approval for certain topics. If the request matches a blocked pattern, hoop.dev rejects it before it reaches the model and returns an error to the caller. For requests that need review, hoop.dev routes the payload to an approval workflow where a designated reviewer can approve or reject the operation. After the model generates a completion, hoop.dev can mask fields that match sensitive data patterns before the response is returned to the client. Every interaction, including the request, the decision, and the masked response, is recorded in a session log that can be replayed later for audit or forensic analysis.
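The flow above (decide, optionally route to review, mask, record) can be sketched as follows. This is an illustration under stated assumptions, not hoop.dev's policy format or API: the `Policy` fields, the regular expressions, the whitespace-based token count, and the log entry shape are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Policy:
    blocked_patterns: List[str]    # prompts matching these are rejected
    approval_patterns: List[str]   # prompts matching these need a reviewer
    max_prompt_tokens: int         # crude limit, counted by whitespace split


# Hypothetical sensitive-data patterns: US SSN-like numbers and email addresses.
SENSITIVE = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
]


def decide(policy: Policy, prompt: str) -> str:
    """Return 'block', 'review', or 'allow' for a prompt."""
    if len(prompt.split()) > policy.max_prompt_tokens:
        return "block"
    if any(re.search(p, prompt, re.IGNORECASE) for p in policy.blocked_patterns):
        return "block"
    if any(re.search(p, prompt, re.IGNORECASE) for p in policy.approval_patterns):
        return "review"
    return "allow"


def mask(text: str) -> str:
    """Replace every sensitive match with a fixed placeholder."""
    for pattern in SENSITIVE:
        text = pattern.sub("[MASKED]", text)
    return text


def process(
    user: str,
    prompt: str,
    policy: Policy,
    model: Callable[[str], str],
    approve: Callable[[str, str], bool],
    session_log: list,
) -> Optional[str]:
    """Apply the policy, call the model if permitted, mask, and record."""
    decision = decide(policy, prompt)
    if decision == "review":
        # A reviewer's answer turns 'review' into a final allow or block.
        decision = "allow" if approve(user, prompt) else "block"
    response = mask(model(prompt)) if decision == "allow" else None
    session_log.append(
        {"user": user, "request": prompt, "decision": decision, "response": response}
    )
    return response
```

Note that the log stores only the masked response, so replaying a session does not reintroduce the sensitive values that masking removed. A production policy engine would count tokens with the model's tokenizer rather than a whitespace split.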
