How can you prevent sensitive data from leaking when you embed it for AI models?
Embedding services turn raw text, tables, or code into dense vectors that downstream models consume. The conversion often happens behind a public API or a shared inference server, so if a request includes personally identifiable information, credit‑card numbers, or proprietary code, the raw payload can be written to logs, held in caches, or inadvertently returned to another consumer. Organizations typically try to scrub data upstream, but manual redaction is error‑prone and does not scale across dozens of micro‑services.
Even when developers add a pre‑processor that removes obvious patterns, anything it misses is still encoded in the vector, and embedding inversion techniques can sometimes reconstruct fragments of the input from the embedding space, creating a covert channel for data exfiltration. The fundamental problem is that the transformation from raw input to vector happens inside a trusted component that also has direct access to the underlying resource. Without a transparent enforcement point, you cannot guarantee that every piece of sensitive text is consistently masked before it ever reaches the model.
What you need is a boundary that sits between the caller and the embedding engine, where policies can be inspected and applied in real time. That boundary must be identity‑aware, so it knows which user or service is making the request, and it must operate at the protocol layer, so it can modify the payload without requiring changes to the client or the embedding service.
Data masking for embeddings
Data masking is the practice of replacing or redacting sensitive fields in a data stream while preserving the overall structure needed for downstream processing. In the context of embeddings, masking typically targets raw text segments that match patterns such as social security numbers, email addresses, or proprietary identifiers. The goal is to ensure that the vector generation step never sees the original secret, so the secret cannot end up in the resulting vectors, in logs, or in cache layers.
Effective masking must satisfy three requirements:
- Policy‑driven. Rules are defined centrally and can be updated without redeploying the embedding service.
- Inline. The transformation occurs on the fly, so the original payload never leaves the gateway.
- Auditable. Every masking decision is recorded for later review, providing evidence for compliance audits.
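The three requirements above can be illustrated with a minimal Python sketch. It is an assumption-laden example, not a production masker: the `RULES` table, the `mask_text` function, and the placeholder format are hypothetical names chosen for illustration, and the two regular expressions cover only simple SSN and email formats.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rule table. In a real deployment the rules would be loaded
# from a central policy store so they can change without a redeploy.
RULES = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


@dataclass
class MaskResult:
    text: str                                   # masked text, safe to forward
    audit: list = field(default_factory=list)   # one record per rule that fired


def mask_text(text, rules=RULES):
    """Replace every rule match with a placeholder and record what was masked."""
    audit = []
    for name, pattern in rules.items():
        # subn returns the rewritten text and the number of substitutions made
        text, count = pattern.subn(f"[{name.upper()}]", text)
        if count:
            audit.append({"rule": name, "matches": count})
    return MaskResult(text, audit)


result = mask_text("Contact jane@example.com, SSN 123-45-6789.")
print(result.text)
print(result.audit)
```

The audit records hold only the rule name and a match count, never the matched value, so the audit trail does not become a second copy of the sensitive data.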
Setup – who can request an embedding
The first line of defense is identity. Users, CI pipelines, or AI agents authenticate against an OIDC or SAML provider. The token they present conveys who they are and what groups they belong to. This step decides whether a request is allowed to proceed at all, but it does not enforce any masking. It is a necessary prerequisite because the gateway needs to know the requester's context before applying policy.
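As a rough sketch of this admission step, the function below decides whether a request may proceed based on claims from a token that has already been verified. Signature and expiry validation are assumed to have happened elsewhere. The `groups` claim name and the group names are assumptions for illustration; identity providers differ in how they expose group membership.

```python
# Hypothetical group names; a real policy would come from central configuration.
ALLOWED_GROUPS = {"ml-engineers", "ci-pipelines"}


def may_request_embedding(claims, allowed_groups=ALLOWED_GROUPS):
    """Return True if the verified identity belongs to at least one allowed group.

    `claims` is the decoded payload of an already-verified token.
    This check only admits or rejects the request; it performs no masking.
    """
    if not claims.get("sub"):
        # No subject means there is no identity to attribute the request to.
        return False
    groups = claims.get("groups", [])
    return bool(allowed_groups.intersection(groups))


print(may_request_embedding({"sub": "alice", "groups": ["ml-engineers"]}))
print(may_request_embedding({"sub": "bob", "groups": ["sales"]}))
```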
The data path – where enforcement lives
Once the identity is verified, the request is handed to a Layer 7 gateway that proxies the connection to the embedding service. This gateway is the only place where the raw payload can be inspected and altered. By placing the gateway in the data path, and making it the only route to the embedding service, you ensure that no caller can bypass the masking logic.
hoop.dev implements exactly this pattern. It sits between the caller and the target, reads the OIDC token, and then applies inline data masking according to the policies you define. Because the gateway holds the credential for the embedding service, the client never sees it, and the gateway can rewrite the request before forwarding it.
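The general pattern can be sketched in a few lines of Python. This is a generic illustration, not hoop.dev's API or configuration: `handle_request`, the `input` field name, the single SSN rule, and the stand-in embedding service are all hypothetical.

```python
import re

# Illustrative rule only: US social security number format.
SECRET_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def handle_request(payload, forward, service_credential):
    """Mask the payload, attach the gateway-held credential, and forward it.

    `forward` is any callable that sends (body, headers) to the embedding
    service. The client never supplies or sees `service_credential`.
    """
    masked = dict(payload)  # shallow copy so the caller's dict is not mutated
    masked["input"] = SECRET_PATTERN.sub("[SSN]", payload["input"])
    headers = {"Authorization": f"Bearer {service_credential}"}
    return forward(masked, headers)


# Stand-in for the real embedding service, recording what it receives.
seen = []


def fake_embedding_service(body, headers):
    seen.append((body, headers))
    return {"embedding": [0.0, 0.0, 0.0]}


response = handle_request(
    {"input": "SSN 123-45-6789"},
    fake_embedding_service,
    "gateway-held-secret",
)
print(seen[0][0]["input"])
```

Because the credential is injected by the gateway, a client that tried to call the embedding service directly would have nothing to authenticate with, which is what keeps the gateway on the only usable path.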
