RAG and LGPD Compliance

When a Retrieval‑Augmented Generation (RAG) pipeline leaks personal data, the organization faces heavy fines, loss of customer trust, and costly remediation. Under Brazil’s General Data Protection Law (LGPD), every access to personal information must be justified, recorded, and protected against accidental exposure.

Most teams build RAG solutions by stitching together a large language model, a vector store, and a backend database. Engineers often use a single service account that has read‑write rights on the entire data lake. The application connects directly to the database, and the logs show only generic error messages. No one sees which user triggered a particular query, no field‑level masking is applied, and there is no workflow to approve a request that touches sensitive records. In practice, the system provides no audit trail, no real‑time data protection, and no way to demonstrate compliance if an auditor asks for evidence.

LGPD requires that personal data be accessed only by authorized identities, that the purpose of each access be documented, and that any disclosure be limited to the minimum necessary. Adding an identity provider or tightening IAM policies is a necessary first step, but it leaves the request path unchanged: the application still talks straight to the database, bypassing any inline checks, masking, or logging. Without a control point in the data path, the organization cannot prove who read which record, cannot mask identifiers on the fly, and cannot require a human approval before a high‑risk query runs.

How LGPD requirements map to RAG pipelines

To satisfy LGPD, a RAG deployment must provide three core capabilities:

  • Just‑in‑time (JIT) access that grants the minimum privilege for the duration of a query.
  • Inline masking of personal identifiers in query results, ensuring that downstream LLM prompts never contain raw PII.
  • Immutable audit evidence that records who accessed what data, when, and under which approval.

These capabilities can only be guaranteed when the enforcement point sits between the identity layer and the target infrastructure. That is where a Layer 7 gateway becomes essential.
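
As a rough illustration of the first capability, the sketch below models a time-bound grant that is valid for one user, one resource, and a short window. The names `JitGrant` and `issue_grant`, the 15-minute default, and the resource label are all hypothetical. They show the idea of just-in-time access and are not hoop.dev's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class JitGrant:
    """Hypothetical record of a short-lived, approved access grant."""
    user: str
    resource: str
    approved_by: str
    expires_at: datetime

    def allows(self, user: str, resource: str, now: datetime) -> bool:
        # The grant only covers the named user and resource, and only until it expires.
        return (
            user == self.user
            and resource == self.resource
            and now < self.expires_at
        )


def issue_grant(user: str, resource: str, approved_by: str, now: datetime,
                ttl: timedelta = timedelta(minutes=15)) -> JitGrant:
    # The TTL is an assumed default; a real policy would set it per resource.
    return JitGrant(user, resource, approved_by, now + ttl)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    grant = issue_grant("ana", "customers_db", approved_by="dpo", now=now)
    print(grant.allows("ana", "customers_db", now))                          # inside the window
    print(grant.allows("ana", "customers_db", now + timedelta(hours=1)))    # after expiry
```

Keeping the approver on the grant itself means the same record can later serve as evidence of who authorized the access.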

Why hoop.dev is the only place enforcement can happen

hoop.dev is a Layer 7, identity‑aware proxy that sits in the data path for every RAG request. It receives the user’s OIDC token, validates the identity, and then proxies the connection to the underlying database, vector store, or HTTP service. Because the request travels through hoop.dev, the gateway applies the LGPD controls directly on the wire.

When a query reaches the gateway, hoop.dev can:

  • Block the request until a designated approver signs off, providing a JIT approval workflow.
  • Mask fields such as CPF, email, or phone number in the response before the data reaches the LLM, ensuring that the model never sees raw identifiers.
  • Record the full session, including the original query, the masked response, and the identity of the requester, and store the log for replay during an audit.
  • Enforce role‑based policies that limit which collections or tables a user may query, reducing the blast radius of a compromised credential.

All of these enforcement outcomes exist only because hoop.dev sits in the data path. If the gateway were removed, the same IAM setup would still allow the application to talk directly to the database, bypassing masking, approvals, and session recording.
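
To make the masking step concrete, here is a minimal sketch of field-level masking applied to a result row before it is placed in an LLM prompt. It is not hoop.dev's implementation: the column names, the placeholder strings, and the regular expressions are assumptions chosen for illustration, and a production masker would need broader patterns and validation.

```python
import re

# Assumed patterns: CPF as 11 digits with optional punctuation, plus a simple email shape.
CPF_RE = re.compile(r"\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Hypothetical list of columns that are always masked in full.
SENSITIVE_COLUMNS = {"cpf", "email", "phone"}


def scrub_text(text: str) -> str:
    """Replace identifiers that appear inside free text."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return CPF_RE.sub("[CPF]", text)


def mask_row(row: dict) -> dict:
    """Return a copy of the row that is safe to pass downstream."""
    masked = {}
    for column, value in row.items():
        if column.lower() in SENSITIVE_COLUMNS:
            masked[column] = "***"               # whole field is hidden
        elif isinstance(value, str):
            masked[column] = scrub_text(value)   # free text is scrubbed
        else:
            masked[column] = value               # non-text values pass through
    return masked


if __name__ == "__main__":
    row = {"id": 7, "cpf": "123.456.789-09",
           "notes": "Contact ana@example.com, CPF 12345678909"}
    print(mask_row(row))
    # {'id': 7, 'cpf': '***', 'notes': 'Contact [EMAIL], CPF [CPF]'}
```

Masking by column name handles structured results, while the regex pass covers identifiers embedded in retrieved text chunks, which is the common case in RAG.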

Continue reading? Get the full guide.

LGPD (Brazil): Architecture Patterns & Best Practices

Free. No spam. Unsubscribe anytime.

Setup: identity and provisioning

The first layer of protection is the identity configuration. Organizations integrate hoop.dev with an OIDC or SAML provider (Okta, Azure AD, Google Workspace, etc.). The gateway reads group membership and translates it into fine‑grained permissions for each RAG component. This setup determines who may start a request, but it does not enforce any data‑level policy on its own.

After the identity provider is linked, administrators register each RAG resource (PostgreSQL, Elasticsearch, or an HTTP vector store) in hoop.dev. The gateway holds the service credentials, so engineers never see them. Provisioning follows the getting‑started guide, which walks through creating connections and assigning policy scopes.
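
The translation from identity-provider groups to per-resource permissions can be pictured as a lookup table. The sketch below is only an assumption about the shape of such a mapping: the group names, resource names, and the `permissions_for` and `is_allowed` helpers are hypothetical and do not reflect hoop.dev's configuration format.

```python
# Hypothetical mapping: IdP group -> resource -> allowed actions.
GROUP_POLICIES = {
    "data-science": {"vector_store": {"read"}},
    "support": {"customers_db": {"read"}},
    "dba": {
        "customers_db": {"read", "write"},
        "vector_store": {"read", "write"},
    },
}


def permissions_for(groups: list[str]) -> dict[str, set[str]]:
    """Merge the policies of every group the user belongs to."""
    merged: dict[str, set[str]] = {}
    for group in groups:
        # Unknown groups contribute nothing (deny by default).
        for resource, actions in GROUP_POLICIES.get(group, {}).items():
            merged.setdefault(resource, set()).update(actions)
    return merged


def is_allowed(groups: list[str], resource: str, action: str) -> bool:
    return action in permissions_for(groups).get(resource, set())


if __name__ == "__main__":
    print(is_allowed(["data-science"], "vector_store", "read"))   # True
    print(is_allowed(["data-science"], "customers_db", "read"))   # False
```

Defaulting to deny for unknown groups and unregistered resources keeps a misconfigured identity from gaining access by accident.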

Data path: the gateway in action

When a data scientist runs a retrieval query, the flow is:

  1. The client presents an OIDC token to hoop.dev.
  2. hoop.dev validates the token and extracts the user’s groups.
  3. hoop.dev forwards the request to the target database.
  4. Before the response leaves the gateway, hoop.dev masks any LGPD‑protected columns.
  5. The masked result returns to the client, and the entire exchange is logged.

This single hop gives the organization full visibility and control over every piece of personal data that moves through the RAG pipeline.
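
The five steps above can be sketched as a single function. This is a simplified model rather than the gateway's code: token verification, query execution, and masking are passed in as callables, and every name here (`handle_request`, `AccessDenied`, the `sub` and `groups` claim keys, the log fields) is an assumption made for the example.

```python
from typing import Callable, Optional


class AccessDenied(Exception):
    """Raised when the caller's token cannot be validated."""


def handle_request(
    token: str,
    query: str,
    *,
    verify_token: Callable[[str], Optional[dict]],
    run_query: Callable[[str], list],
    mask_row: Callable[[dict], dict],
    audit_log: list,
) -> list:
    # Steps 1-2: validate the presented token and read the caller's identity.
    claims = verify_token(token)
    if claims is None:
        audit_log.append({"user": None, "query": query, "outcome": "denied"})
        raise AccessDenied("invalid or expired token")

    # Step 3: forward the query to the target database.
    rows = run_query(query)

    # Step 4: mask protected columns before anything leaves the proxy.
    masked = [mask_row(row) for row in rows]

    # Step 5: log the exchange, then return only the masked result.
    audit_log.append({
        "user": claims["sub"],
        "groups": claims.get("groups", []),
        "query": query,
        "rows_returned": len(masked),
        "outcome": "ok",
    })
    return masked
```

Because the caller only ever receives the return value of this function, there is no code path that hands back unmasked rows or skips the log entry.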

Enforcement outcomes that generate LGPD evidence

Because hoop.dev records each session, auditors can retrieve a chronological trail that shows exactly which user accessed which record, under what approval, and with what data transformation applied. The gateway’s masking logs show that raw identifiers never left the controlled environment, which supports LGPD’s data‑minimization requirement. JIT approvals demonstrate that a data‑privacy officer reviewed high‑risk queries before execution, providing the “purpose” evidence LGPD demands.

All of these artifacts are generated automatically; teams do not need to build custom logging or masking layers. hoop.dev stores the evidence and can export it for audit submissions.
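
One common way to make an audit trail tamper-evident is to chain entries with a hash, so that editing any past record invalidates everything after it. The sketch below illustrates that general technique with Python's standard library. It is not a description of how hoop.dev stores its logs, and the entry fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for the first record


def _digest(prev_hash: str, entry: dict) -> str:
    # Serialize with sorted keys so the same entry always hashes identically.
    payload = json.dumps(entry, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, entry: dict) -> dict:
    """Append an entry whose hash covers both its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    record = {"entry": entry, "prev_hash": prev_hash,
              "hash": _digest(prev_hash, entry)}
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered record breaks the chain."""
    prev_hash = GENESIS_HASH
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True


if __name__ == "__main__":
    log: list = []
    append_entry(log, {"user": "ana", "query": "SELECT ...", "approved_by": "dpo"})
    append_entry(log, {"user": "bob", "query": "SELECT ...", "approved_by": None})
    print(verify_chain(log))            # chain is intact
    log[0]["entry"]["user"] = "eve"     # simulate tampering
    print(verify_chain(log))            # tampering is detected
```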

Getting started

To try this approach, follow the getting‑started guide to deploy the gateway and connect your RAG components. The learn section contains deeper explanations of policy configuration, masking rules, and approval workflows.

FAQ

Does hoop.dev replace my existing database?

No. hoop.dev proxies connections to the database; the underlying storage remains unchanged. The gateway simply adds a control layer without requiring schema changes.

How does hoop.dev generate audit evidence for LGPD?

hoop.dev records every request, the identity that initiated it, any approval steps, and the masked response. These logs can be exported to satisfy LGPD’s accountability and traceability obligations.

Can I use hoop.dev with an existing RAG pipeline?

Yes. Because hoop.dev works with standard client tools (psql, curl, kubectl, etc.), you can point your existing code at the gateway endpoint and immediately gain masking, JIT approvals, and session recording.

Explore the source code and contribute on GitHub.