Many assume that adding a reviewer after a Retrieval-Augmented Generation (RAG) pipeline finishes is enough to guarantee safe output, but that is not true human-in-the-loop approval. Once the language model receives a prompt, it can leak proprietary data or generate harmful content before any human sees it. A post‑generation check does not stop the model from accessing the underlying knowledge base, nor does it provide an audit trail of what was asked.
RAG systems typically stitch together a vector store, an LLM, and an application layer. Engineers often wire the components together with static API keys, give the service account broad read access to the vector database, and let the LLM run unchecked. The result is a fast prototype, but the organization loses visibility into who queried what, when, and whether the response needed extra scrutiny.
What teams really need is a control point that can enforce human-in-the-loop approval *before* the model returns a response, while still preserving the developer experience. The control point must be able to see the full request, pause execution for an approver, optionally mask sensitive fields, and record the entire exchange for later review. Importantly, the control must sit on the data path, not merely in an external logging system.
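To make those four responsibilities concrete, here is a minimal Python sketch of such a control point. It is an illustration under stated assumptions, not any product's implementation: the `Gate` class, the `approver` and `model` callables, and the email-only masking rule are all hypothetical stand-ins. The approver is consulted before the model is ever called, the response is masked on the way out, and every exchange is recorded.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption: for this sketch, "sensitive" means anything shaped like an email address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_sensitive(text: str) -> str:
    """Replace email-like substrings with a placeholder."""
    return EMAIL_RE.sub("[REDACTED]", text)


@dataclass
class Gate:
    """Hypothetical control point that sits between the caller and the model."""
    approver: Callable[[str, str], bool]  # (user, prompt) -> approved?
    model: Callable[[str], str]           # prompt -> raw response
    log: List[dict] = field(default_factory=list)

    def handle(self, user: str, prompt: str) -> str:
        # Pause for a human decision before the model sees the prompt.
        if not self.approver(user, prompt):
            self.log.append(
                {"user": user, "prompt": prompt, "approved": False, "response": None}
            )
            return "Request denied by approver."

        # Only approved prompts reach the model; mask the output before returning it.
        response = mask_sensitive(self.model(prompt))
        self.log.append(
            {"user": user, "prompt": prompt, "approved": True, "response": response}
        )
        return response
```

In a real deployment the approver would block on a human decision (a chat message, a ticket, a web form) rather than return immediately, and the log would go to durable storage instead of an in-memory list.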
Why the existing setup falls short
In a typical ungated deployment, a RAG service runs inside a private subnet with a service account that has read‑only access to the vector store and unrestricted invoke rights on the LLM. The service account is created once, stored in a secret manager, and shared across multiple micro‑services. No per‑request authentication or authorization is performed. When a user triggers a query, the request travels directly from the application to the LLM, bypassing any gate that could require a human decision.
Even when organizations add an external approval workflow (say, a ticket that a reviewer signs off on after the fact), the LLM has already processed the prompt. If the model inadvertently returns a confidential snippet, the damage is done. Moreover, because the request never passes through a central enforcement point, there is no reliable audit log that ties the query to a specific identity, nor is there any guarantee that sensitive fields are redacted before they leave the system.
Defining the precondition for safe RAG
The core requirement is a request‑level guard that can evaluate each incoming prompt, enforce human-in-the-loop approval, and optionally mask or block content. This guard must understand the user’s identity, enforce least‑privilege access, and be positioned where it can intercept traffic before the LLM sees it. The guard does not replace the identity provider; it consumes the identity token to decide whether the request is eligible for automatic execution or needs a manual sign‑off.
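The routing decision described above can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the group names, the keyword list, and the presence of a `groups` claim (common in practice, but not part of the core OIDC claim set). The sketch also assumes the token's signature, expiry, and audience were already verified upstream, so `decide` only looks at the resulting claims.

```python
from enum import Enum
from typing import Mapping


class Decision(Enum):
    AUTO = "auto"                      # run without a human in the loop
    NEEDS_APPROVAL = "needs_approval"  # pause for manual sign-off
    DENY = "deny"                      # caller is not eligible at all


# Hypothetical policy values, chosen only for this example.
AUTO_GROUPS = frozenset({"rag-trusted"})
ALLOWED_GROUPS = frozenset({"rag-trusted", "rag-users"})
SENSITIVE_TERMS = ("salary", "contract", "password")


def decide(claims: Mapping[str, object], prompt: str) -> Decision:
    """Route a request using already-verified identity claims and the prompt text."""
    raw = claims.get("groups", [])
    groups = set(raw) if isinstance(raw, (list, tuple, set, frozenset)) else set()

    # Least privilege: callers outside the allowed groups never reach the model.
    if not groups & ALLOWED_GROUPS:
        return Decision.DENY

    # Trusted callers run automatically unless the prompt touches a sensitive topic.
    touches_sensitive = any(term in prompt.lower() for term in SENSITIVE_TERMS)
    if groups & AUTO_GROUPS and not touches_sensitive:
        return Decision.AUTO

    return Decision.NEEDS_APPROVAL
```

For example, `decide({"groups": ["rag-users"]}, "Summarize the release notes")` yields `Decision.NEEDS_APPROVAL`, because that group is allowed to ask but not to bypass review. Keyword matching is a deliberately crude placeholder; a real policy would likely classify prompts more carefully.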
At this stage, the system still lacks a concrete enforcement mechanism. The request still reaches the LLM directly, without any real‑time approval, without any guarantee that the response will be inspected, and without a tamper‑evident record of the exchange. The precondition therefore solves the “who can ask” problem but leaves the “what happens to the request” problem open.
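What "tamper‑evident" means in practice can be shown with a small sketch. The approach below, a hash chain built with SHA‑256, is one common technique and is used here as an assumption; the text does not say which mechanism any particular gateway uses. The function names and record fields are hypothetical. Each entry stores the hash of the previous entry, so editing or removing an earlier record breaks verification.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry
FIELDS = ("identity", "prompt", "response", "prev")


def _digest(body: dict) -> str:
    """Hash a record body deterministically (keys sorted before serializing)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(chain: List[dict], identity: str, prompt: str, response: str) -> dict:
    """Append a record that is linked to the previous record by its hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"identity": identity, "prompt": prompt, "response": response, "prev": prev_hash}
    entry = {**body, "hash": _digest(body)}
    chain.append(entry)
    return entry


def verify(chain: List[dict]) -> bool:
    """Return True only if every record is intact and correctly linked."""
    prev_hash = GENESIS
    for entry in chain:
        body = {key: entry[key] for key in FIELDS}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects modification but does not prevent it. Someone who can rewrite the whole log can also recompute every hash, so the latest hash would still need to be anchored somewhere the writer cannot alter.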
Introducing a data‑path gateway as the solution
Enter hoop.dev. It is an identity‑aware proxy that sits between the RAG application and the language model. By placing hoop.dev on the data path, every prompt and response passes through a single enforcement point. hoop.dev reads the OIDC token presented by the caller, checks the caller’s group membership, and then decides whether to allow the request to proceed automatically or to pause for a human approver.
Because hoop.dev is the active component in the path, it can enforce the following outcomes:
