An offboarded contractor leaves behind a CI job that stitches together a large language model, a vector store, and a private API key. Weeks later a junior engineer runs a test query, the model dutifully returns a snippet of code that includes the forgotten key, and the secret lands in a public GitHub gist. The leak happens because nothing in the pipeline questioned whether that piece of data should leave the internal network. It is a preventable case of credential leakage.
Retrieval‑augmented generation (RAG) pipelines are built to pull context from external sources – databases, document stores, or internal APIs – and feed it to an LLM. The same request that asks for a product description can also surface credentials that were stored alongside the data, especially when developers embed secrets in configuration tables or expose them through poorly scoped endpoints.
Understanding credential leakage in RAG
Credential leakage occurs when a secret – API token, database password, or cloud credential – escapes the controlled boundary of a system and becomes observable by an unintended party. In a RAG flow the leakage vector is often the retrieval step. The retrieval component runs a SQL query or an HTTP request, receives rows that contain both business data and hidden tokens, and passes the raw response to the LLM. The model has no notion of which parts of its context are confidential, so it may reproduce the secret verbatim in its generated text.
Because the LLM treats the retrieved text as ordinary content, the downstream user sees the secret without any indication that it originated from a protected source. The problem is amplified when the output is logged, shared, or stored in a location that lacks the same security controls as the original data store.
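The retrieval-to-prompt path described above can be reproduced in a few lines. What follows is a minimal sketch, not a real pipeline: the `products` table, the `integration_token` column, the `sk_`/`tok_` token format, and the helper names are all assumptions made for illustration, and an in-memory SQLite database stands in for the backend.

```python
import re
import sqlite3

# Hypothetical pattern for secret-looking strings. Real scanners use many
# more rules; this one only matches the made-up token format used below.
SECRET_PATTERN = re.compile(r"\b(?:sk|tok)_[A-Za-z0-9]{16,}\b")


def retrieve(conn: sqlite3.Connection, product_id: int) -> str:
    """Naive retrieval: SELECT * returns business data and the token alike."""
    row = conn.execute(
        "SELECT * FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    return "\n".join(f"{key}: {row[key]}" for key in row.keys())


def build_prompt(question: str, context: str) -> str:
    """Concatenate retrieved text into the prompt without any filtering."""
    return f"Context:\n{context}\n\nQuestion: {question}"


def find_secrets(text: str) -> list:
    """Return every substring that looks like a credential."""
    return SECRET_PATTERN.findall(text)


# Stand-in backend with a secret stored alongside ordinary product data.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, name TEXT, description TEXT, integration_token TEXT)"
)
conn.execute(
    "INSERT INTO products VALUES "
    "(1, 'Widget', 'A small widget.', 'sk_EXAMPLEONLY0123456789')"
)

prompt = build_prompt("Describe the widget.", retrieve(conn, 1))
leaked = find_secrets(prompt)  # the token is already inside the prompt text
```

In this sketch the token reaches the prompt before the model is ever called, so anything the model, a log line, or a cache does with that prompt can expose it. Scanning with a regular expression only detects the known token format; it does not replace keeping the secret out of the retrieved rows in the first place.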
Why the existing setup is insufficient
Most teams rely on a setup where identity providers (Okta, Azure AD, etc.) issue tokens that allow a service account to call the retrieval layer. The service account is granted a broad role that can read many tables, and the credentials are baked into environment variables or configuration files. This arrangement answers the question of who can start a session, but it enforces nothing on the actual data path.
Even when teams adopt least‑privilege roles, the request still travels directly from the RAG application to the database or API. No component inspects the payload, no policy decides whether a particular column should be exposed, and no audit record captures the exact query that caused the leak. The result is a blind spot: the system can authenticate the request, yet it cannot guarantee that the response respects confidentiality requirements.
Why a data‑path gateway is required
The only place to enforce fine‑grained controls is the data path itself – the network hop where the request leaves the application and reaches the target resource. By placing a proxy at this boundary, an organization can:
- Inspect each query or HTTP request before it reaches the backend.
- Apply inline masking to fields that contain secrets.
- Require just‑in‑time approval for operations that match risky patterns.
- Record the full session for later replay and audit.
Without such a gateway, any attempt to add these controls would have to be woven into every client library, increasing complexity and opening new attack surfaces.
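The four controls listed above can be sketched in a single wrapper. This is a simplified, in-process illustration rather than a network proxy, and every name in it is an assumption: the `Gateway` class, the `MASKED_COLUMNS` policy, the `RISKY` pattern, and the `approver` callback are hypothetical, and SQLite again stands in for the backend.

```python
import re
import sqlite3
import time
from typing import Any, Callable

# Assumed policy: columns whose values must never leave the gateway unmasked.
MASKED_COLUMNS = {"integration_token"}
# Assumed definition of "risky": statements that modify or destroy data.
RISKY = re.compile(r"\b(drop|delete|update)\b", re.IGNORECASE)


class ApprovalDenied(Exception):
    """Raised when a risky statement is not approved."""


class Gateway:
    def __init__(self, conn: sqlite3.Connection, approver: Callable[[str], bool]):
        self.conn = conn
        self.approver = approver  # stand-in for a just-in-time approval flow
        self.audit_log = []       # stand-in for durable session recording

    def query(self, identity: str, sql: str, params: tuple = ()) -> list[dict[str, Any]]:
        # Record the request first so denied attempts are audited too.
        entry = {"ts": time.time(), "identity": identity, "sql": sql,
                 "decision": "allowed"}
        self.audit_log.append(entry)

        # Inspect the statement before it reaches the backend.
        if RISKY.search(sql) and not self.approver(sql):
            entry["decision"] = "denied"
            raise ApprovalDenied(sql)

        cursor = self.conn.execute(sql, params)
        columns = [d[0] for d in cursor.description or []]

        # Mask sensitive columns inline, before the caller sees the rows.
        rows = [
            {c: ("***MASKED***" if c in MASKED_COLUMNS else v)
             for c, v in zip(columns, raw)}
            for raw in cursor.fetchall()
        ]
        entry["masked_columns"] = sorted(MASKED_COLUMNS & set(columns))
        return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, integration_token TEXT)"
)
conn.execute("INSERT INTO products VALUES (1, 'Widget', 'sk_EXAMPLEONLY0123456789')")

# In this demo the approver rejects every risky statement.
gateway = Gateway(conn, approver=lambda sql: False)
rows = gateway.query("rag-service", "SELECT * FROM products WHERE id = ?", (1,))
```

Here the RAG application receives the product row with the token replaced by a placeholder, and a `DELETE` issued through the same gateway would raise `ApprovalDenied` while still leaving an audit entry. Masking by column name is deliberately naive: a query that aliases the column would bypass it, and the audit log captures statements rather than a replayable session, so a real gateway would need stronger inspection and recording than this sketch shows.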
