An offboarded contractor still has a CI job that writes prompts to a shared LLM endpoint. The job’s service account can read from the same internal vector store used by the production RAG pipeline, which creates a clear path for data exfiltration. Because the contractor’s token was never revoked, the job can issue queries that return proprietary documents, then push the raw responses to an external webhook. The organization discovers the leak only after its logs show a large volume of outbound traffic to the external endpoint.
This scenario illustrates a common and largely unguarded pattern: Retrieval‑Augmented Generation (RAG) systems often expose internal knowledge bases directly to language models, and the surrounding automation trusts service accounts that have broad read access. The data path is typically a straight TCP connection from the LLM client to the vector database, with no visibility into which queries are issued or what results are returned. Because the connection is unmediated, there is no audit trail, no inline filtering, and no way to intervene when a query attempts to extract sensitive text.
Why data exfiltration is a hidden risk in RAG
RAG pipelines combine a large language model with a retrieval layer that pulls documents from a knowledge store. The model then synthesizes an answer that may contain verbatim excerpts. If an attacker or a misconfigured job can issue arbitrary queries, they can construct prompts that coax the system into returning confidential sections, source code, or customer PII. Since the retrieval layer often runs on the same network as internal services, the threat surface includes any identity that can reach the store, including CI agents, bots, and over‑privileged human users.
Because the retrieval step happens at runtime, traditional static data‑loss‑prevention tools that scan files at rest miss the problem. The exfiltration happens in‑flight, embedded in natural‑language responses, and can be silently forwarded to external endpoints via webhooks, logs, or even copy‑paste by a malicious operator.
What a typical RAG deployment lacks
Most teams rely on a setup that grants a service account read permission on the vector database and trusts the LLM client to honor internal policies. The identity provider may issue a token, and the token’s claims are used to allow the connection. This setup decides who can start a request, but it does not enforce any guardrails on the request itself. The request still travels directly to the database, bypassing any inspection or approval step.
Consequently, the following gaps remain:
- No real‑time inspection of queries or responses.
- No just‑in‑time approval for risky retrieval patterns.
- No session recording that could be replayed for forensic analysis.
- No inline masking of sensitive fields that appear in LLM answers.
Without a control surface that sits in the data path, these gaps cannot be closed.
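To make the inline-masking gap concrete, here is a minimal sketch of what a masking step in the data path might look like. The pattern set and the `mask_sensitive` helper are hypothetical names invented for this illustration. They are not part of any specific product, and a real deployment would likely rely on a vetted detection library rather than two regular expressions.

```python
import re

# Hypothetical, deliberately small pattern set for illustration only.
SENSITIVE_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_sensitive(text: str) -> str:
    """Replace every match of a known sensitive pattern with a labeled placeholder."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text


if __name__ == "__main__":
    answer = "Contact jane.doe@example.com, SSN 123-45-6789."
    print(mask_sensitive(answer))
```

A step like this only helps if it runs on every response before the response leaves the trusted boundary, which is why it needs to sit in the data path rather than inside the client.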
Introducing a gateway that sits in the data path
hoop.dev provides a Layer 7 gateway that proxies every RAG connection. By placing hoop.dev between the identity layer and the vector store, the gateway becomes the only place where enforcement can happen. The gateway inspects each request, applies policies, and records the entire session.
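The inspect, apply-policy, and record loop described above can be sketched in a few lines. This is a generic illustration of a mediating proxy, not hoop.dev's implementation or API. The `RetrievalGateway` class, its `blocked_terms` policy, and the `backend` callable standing in for the vector store are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AuditRecord:
    identity: str
    query: str
    decision: str
    results_returned: int


@dataclass
class RetrievalGateway:
    # Hypothetical stand-in for the vector store: takes a query, returns documents.
    backend: Callable[[str], List[str]]
    max_results: int = 5
    blocked_terms: Tuple[str, ...] = ("password", "api key")
    audit_log: List[AuditRecord] = field(default_factory=list)

    def handle(self, identity: str, query: str) -> List[str]:
        """Inspect the query, enforce policy, forward it, and record the outcome."""
        lowered = query.lower()
        if any(term in lowered for term in self.blocked_terms):
            self.audit_log.append(AuditRecord(identity, query, "denied", 0))
            raise PermissionError("query denied by policy")
        # Cap the number of documents returned to limit bulk extraction.
        results = self.backend(query)[: self.max_results]
        self.audit_log.append(
            AuditRecord(identity, query, "allowed", len(results))
        )
        return results


if __name__ == "__main__":
    def fake_store(query: str) -> List[str]:
        return [f"doc-{i}" for i in range(10)]

    gateway = RetrievalGateway(backend=fake_store)
    print(gateway.handle("ci-job", "quarterly roadmap"))
    print(gateway.audit_log)
```

The point of the sketch is structural: because every request passes through `handle`, the policy check and the audit record cannot be skipped by a client that already holds a valid token.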
