A common misconception is that policy as code can be dropped onto a RAG pipeline and keep it safe without any runtime guardrails. In reality, the moment a language model reaches out to a vector store or an external API, the request bypasses any static policy file and runs unchecked.
Why policy as code matters for RAG
Most teams build Retrieval‑Augmented Generation (RAG) systems by stitching together three moving parts: a large language model (LLM) service, a vector database that holds embeddings, and a data‑ingestion job that feeds in new documents. Engineers typically embed API keys for the LLM and credentials for the vector store directly in application code or environment files. Those secrets are long‑lived, often shared across services, and rarely rotated. Requests travel straight from the application to the vector store and the LLM endpoint, with no central point that can inspect, approve, or log the interaction.
This approach leaves two glaring gaps. First, the policy definitions that developers write (rules such as “do not return PII” or “limit queries to 10 tokens per minute”) sit in a repository and never see live traffic. Second, even if a policy engine were consulted, the request still reaches the target directly; nothing in the path can block a dangerous query, redact a response, or capture a replayable session for later audit.
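The two example rules differ in kind. A PII rule can be checked against a single response, but a per‑minute budget only means something to a component that sees every request and keeps a running count, which a file in a repository cannot do. The following is a minimal sketch of such a counter, assuming a sliding window per caller. The `SlidingWindowBudget` class and its `cost` argument are hypothetical illustrations, not part of any hoop.dev API. Use a cost of 1 for a query‑count limit, or the request's token count for a token budget.

```python
import time
from collections import defaultdict, deque


class SlidingWindowBudget:
    """Allow each caller at most `budget` units of cost within any `window`-second span."""

    def __init__(self, budget: int, window: float = 60.0, clock=time.monotonic):
        self.budget = budget
        self.window = window
        self.clock = clock  # injectable so behaviour can be tested deterministically
        self._events = defaultdict(deque)  # caller -> deque of (timestamp, cost)
        self._spent = defaultdict(int)     # caller -> cost currently inside the window

    def allow(self, caller: str, cost: int = 1) -> bool:
        now = self.clock()
        events = self._events[caller]
        # Forget events that have aged out of the window.
        while events and now - events[0][0] >= self.window:
            _, old_cost = events.popleft()
            self._spent[caller] -= old_cost
        if self._spent[caller] + cost > self.budget:
            return False  # over budget: the request should be rejected
        events.append((now, cost))
        self._spent[caller] += cost
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = SlidingWindowBudget(budget=10, window=60.0, clock=lambda: fake_now[0])
    print(limiter.allow("ci-bot", 4), limiter.allow("ci-bot", 4))  # True True
    print(limiter.allow("ci-bot", 4))  # False: 12 would exceed the budget of 10
    fake_now[0] = 60.0
    print(limiter.allow("ci-bot", 4))  # True: earlier usage has aged out
```

The state lives in process memory here. A real gateway would need to share it across replicas, which is another reason the check belongs in one place on the data path.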
Where the gap persists
Introducing policy as code into a RAG workflow fixes the *definition* of what is allowed, but it does not automatically enforce those rules at the point of execution. The request still travels from the client to the LLM or vector store over the public internet or a private VPC link. Without a runtime enforcement layer, a malicious actor who compromises a service account can still issue unrestricted queries, extract confidential embeddings, or coax the model into revealing sensitive data. The system also lacks a single source of truth for who asked what, when, and with what result.
hoop.dev as the enforcement layer
hoop.dev provides the missing data‑path component. It sits as a Layer 7 gateway between every identity (human engineers, CI pipelines, or AI agents) and the RAG infrastructure. When a client asks the LLM to generate a response, the request first passes through hoop.dev. The gateway reads the caller’s OIDC token, validates group membership, and then applies the policy‑as‑code rules that have been authored for the RAG pipeline.
Because hoop.dev is the proxy that actually opens the connection, it can block a query that violates a rate limit, mask any PII that appears in the model’s answer, and route a high‑risk request to a human approver before it reaches the LLM. The gateway also records the entire session, including the original prompt and the final response, and writes the log to a store that only the gateway can write to, which provides a reliable audit trail.
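To make “mask any PII” concrete, here is a minimal sketch of regex‑based redaction applied to a model’s answer before it is returned to the caller. The pattern set and the `mask_pii` name are hypothetical illustrations, not hoop.dev’s masking implementation. Regular expressions alone will miss many forms of PII.

```python
import re

# Illustrative patterns only; production PII detection needs a broader, tested rule set.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US Social Security number format
}


def mask_pii(text: str) -> tuple[str, list[str]]:
    """Replace each PII match with a typed placeholder and report which types were found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            found.append(label)
    return text, found


if __name__ == "__main__":
    answer = "Contact jane.doe@example.com, SSN 123-45-6789."
    print(mask_pii(answer))  # ('Contact [EMAIL], SSN [SSN].', ['EMAIL', 'SSN'])
```

Returning the list of detected types alongside the masked text lets the caller record in the audit log that masking happened, without storing the sensitive values themselves.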
How the enforcement works
Identity is handled by an OIDC or SAML provider such as Okta, Entra, or Google Workspace. hoop.dev acts as the relying party, verifying the token and extracting claims that drive policy decisions. The gateway holds the LLM API key and the vector‑store credentials; callers never see them. A lightweight agent runs inside the same network as the vector store, ensuring that the connection to the database is always mediated by the gateway.
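Once the identity provider’s token has been verified, the access decision can be a plain function of its claims. The following is a minimal sketch built on several assumptions: the token’s signature, issuer, audience and expiry have already been checked, and group membership arrives as a list of strings in a `groups` claim, which is a common but provider‑specific convention. `EndpointRule` and `authorize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointRule:
    endpoint: str
    allowed_groups: frozenset


def authorize(claims: dict, rule: EndpointRule) -> bool:
    """Decide access from the claims of an already-verified token.

    Deny by default: a token with no `groups` claim, or with no group
    in common with the rule, gets no access.
    """
    groups = claims.get("groups") or []
    return not rule.allowed_groups.isdisjoint(groups)


if __name__ == "__main__":
    rag_search = EndpointRule("rag-search", frozenset({"rag-readers", "ml-engineers"}))
    print(authorize({"sub": "alice", "groups": ["ml-engineers"]}, rag_search))  # True
    print(authorize({"sub": "ci-bot"}, rag_search))  # False
```

The caller never handles the LLM key or the vector‑store credential in this model. It only presents an identity, and the rule decides whether the gateway will use its stored credential on the caller’s behalf.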
When a request arrives, hoop.dev evaluates the policy‑as‑code script associated with that RAG endpoint. If the script permits the operation, the gateway forwards the request, injecting the stored credential on the fly. If the script rejects the request, hoop.dev returns an error to the caller without ever contacting the downstream service. In either case, the full request‑response pair is written to an audit log that can be queried later for compliance evidence.
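The allow‑or‑reject path can be sketched as a single handler. This is a minimal illustration, not hoop.dev’s code. It assumes hypothetical `policy` and `upstream` callables and uses an in‑memory list as a stand‑in for the audit store. The credential is injected only on the allowed path and is never written to the log.

```python
import json
import time
from typing import Callable

# Stand-in for an append-only audit store that only the gateway writes to.
AUDIT_LOG: list = []


def handle(caller: str, request: dict,
           policy: Callable[[str, dict], bool],
           upstream: Callable[[dict, str], dict],
           credential: str) -> dict:
    if policy(caller, request):
        # The gateway supplies the stored credential; the caller never sees it.
        response = upstream(request, credential)
    else:
        # Rejected requests never reach the downstream service.
        response = {"error": "denied by policy"}
    # Record the request/response pair either way (the credential is not logged).
    AUDIT_LOG.append(json.dumps({
        "ts": time.time(),
        "caller": caller,
        "request": request,
        "response": response,
    }))
    return response


if __name__ == "__main__":
    calls = []

    def fake_llm(req: dict, key: str) -> dict:
        calls.append(key)
        return {"answer": f"echo: {req['prompt']}"}

    def short_prompts_only(caller: str, req: dict) -> bool:
        return len(req["prompt"]) <= 50

    print(handle("alice", {"prompt": "hi"}, short_prompts_only, fake_llm, "sk-demo"))
    print(handle("alice", {"prompt": "x" * 100}, short_prompts_only, fake_llm, "sk-demo"))
    print(len(calls), len(AUDIT_LOG))  # 1 2
```

Logging on both branches is the design choice that matters here. Denied attempts are often the most useful audit evidence.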
What to watch for when adopting policy as code for RAG
- Policy granularity: Write rules that match the protocol level you are protecting. For LLM calls, filter on prompt length, token budget, and the presence of patterns matched by regular expressions. For vector‑store queries, limit the number of returned vectors and enforce namespace scoping.
- Version control: Store policy files in a version‑controlled repository and tie each version to a deployment of hoop.dev. This ensures that the policy governing a request can be traced back to a commit hash.
- False positives: Start with a monitoring mode that logs violations without blocking them. Review the logs, refine the rules, then enable enforcement.
- Audit readiness: Use hoop.dev’s session recordings, which are kept in the gateway’s audit log, to give auditors evidence of who accessed which data and when.
- Just‑in‑time access: Combine policy enforcement with hoop.dev’s approval workflow. High‑risk queries can be paused for manual sign‑off, reducing the blast radius of a compromised service account.
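Several of these practices can be combined into one evaluation function: protocol‑level rules, a monitoring mode that logs instead of blocking, and approval for high‑risk queries. The following is a minimal sketch. It assumes a request shaped as a dict with hypothetical `prompt`, `max_tokens`, `top_k` and `namespace` fields. Every threshold and namespace name is illustrative, and none of this is hoop.dev’s policy language.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


@dataclass
class RagRules:
    max_prompt_chars: int = 4000
    max_tokens: int = 1024      # token budget per LLM call
    max_top_k: int = 20         # max vectors returned per query
    allowed_namespaces: frozenset = frozenset({"public-docs", "finance"})
    high_risk_namespaces: frozenset = frozenset({"finance"})
    enforce: bool = False       # start in monitor mode: report, do not block


def evaluate(req: dict, rules: RagRules) -> tuple:
    """Return (decision, violations) for one RAG request."""
    violations = []
    if len(req.get("prompt", "")) > rules.max_prompt_chars:
        violations.append("prompt too long")
    if req.get("max_tokens", 0) > rules.max_tokens:
        violations.append("max_tokens above budget")
    if req.get("top_k", 0) > rules.max_top_k:
        violations.append("top_k above limit")
    ns = req.get("namespace")
    if ns not in rules.allowed_namespaces:
        violations.append(f"namespace {ns!r} not allowed")
    if violations and rules.enforce:
        return Decision.DENY, violations
    if ns in rules.high_risk_namespaces:
        # Pause for a human sign-off instead of forwarding immediately.
        return Decision.NEEDS_APPROVAL, violations
    # In monitor mode, violations are reported but the request still proceeds.
    return Decision.ALLOW, violations


if __name__ == "__main__":
    req = {"prompt": "q", "top_k": 50, "namespace": "public-docs"}
    print(evaluate(req, RagRules()))
    print(evaluate(req, RagRules(enforce=True)))
    print(evaluate({"prompt": "q", "namespace": "finance"}, RagRules(enforce=True)))
```

Moving from monitoring to enforcement is a single flag change. This makes it easy to tie each mode switch to a reviewed commit in the policy repository.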
Getting started
Deploy the gateway using the quick‑start Docker Compose file, configure OIDC authentication, and register your LLM and vector‑store endpoints as connections. The getting‑started guide walks you through each step. Once the connections are in place, author your policy‑as‑code rules and upload them through the learn portal. From that point forward, every RAG request will be inspected, logged, and, if necessary, masked or blocked by hoop.dev.
FAQ
Does hoop.dev store my LLM API keys?
Yes. The gateway holds the credentials securely and injects them only when a request passes policy evaluation. Callers never see the raw keys.
Can I still use existing CI pipelines with hoop.dev?
Absolutely. CI jobs authenticate to hoop.dev with OIDC tokens, just as a human would. The same policy‑as‑code rules apply to automated workloads.
How does hoop.dev help with compliance audits?
hoop.dev records each session, including the original request and the final response, and stores them in a log that only the gateway can write. Those records provide the evidence auditors look for when verifying access controls and data‑handling policies.
Explore the open‑source code on GitHub: https://github.com/hoophq/hoop.