
LGPD for Vector Databases

When a contractor leaves a project, the team often forgets to rotate the static API key that powers the vector search service, exposing LGPD‑covered personal data. The key lives in a CI pipeline, in a developer’s local config file, and in a shared secrets vault. Even after the contract ends, the credential can still be used to pull or insert embeddings that contain personal data.

That scenario illustrates a broader reality: many organizations treat vector databases like any other data store, granting long‑lived credentials to services and people without a central enforcement point. The connection is made directly from the client to the database, the traffic is opaque to anyone besides the client, and no record exists of who queried which vector or when. When the data includes personal information, the lack of visibility becomes a compliance risk.

What LGPD demands for vector databases

LGPD (Lei Geral de Proteção de Dados), Brazil's general data protection law, requires that personal data be processed only for legitimate, specific purposes, that processing be limited to the minimum necessary, and that organizations keep records of their processing operations so they can show who accessed the data, when, and under what authority. For a vector database, the law translates into three concrete expectations:

  • Purpose limitation: queries that retrieve or store personal embeddings must be justified and documented.
  • Access control: only identities with a defined need‑to‑know should be able to run similarity searches or insert new vectors.
  • Auditability: every read or write operation must be logged in a way that cannot be altered so auditors can verify compliance.

Meeting these requirements is impractical when the database sits behind a static credential that bypasses any policy engine: nothing in the path can check purpose, restrict access per identity, or record the operation.
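The auditability expectation above asks for logs that cannot be silently altered. One common way to approximate that is a hash chain, where each entry's hash covers the previous entry's hash. The following is a minimal sketch of that idea, not a description of how any particular product stores its logs; the function names and the entry fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, entry: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, entry: dict) -> None:
    """Append an entry whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "prev": prev_hash, "hash": _digest(prev_hash, entry)})


def verify_chain(log: list) -> bool:
    """Return True only if every entry still links to its predecessor."""
    prev_hash = GENESIS
    for record in log:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True
```

Editing or reordering any stored entry makes `verify_chain` return `False`. A chain like this does not detect truncation of the most recent entries on its own, so in practice the latest hash would also need to be anchored somewhere the writer cannot modify.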

Why traditional setups fall short

Most teams rely on a combination of service accounts, environment variables, and secret‑management tools to connect to a vector store. This approach satisfies the setup layer: identity is represented by a token, and the token is scoped to a set of permissions. However, the data path – the actual network hop that carries the query – remains uncontrolled. The result is a gap where the following problems persist:

  • There is no real‑time check that a query aligns with an approved purpose.
  • Sensitive fields in query results (for example, user identifiers stored in the metadata alongside vectors) are returned in clear text.
  • Session activity is not recorded, so a later audit cannot answer who extracted a specific embedding.

In other words, the setup gives you a user identity, but without a gateway in the data path you cannot enforce LGPD’s core controls.

How hoop.dev creates the compliance evidence

hoop.dev is designed to sit in the data path between the requester and the vector database. By proxying every connection, hoop.dev becomes the only place where enforcement can happen. The platform provides three LGPD‑focused outcomes:

  • Session recording: hoop.dev records each query and response, timestamps the interaction, and stores the log outside the client process. Auditors can replay a session to verify that the purpose claim matches the actual operation.
  • Inline masking: hoop.dev can redact personal identifiers from query results before they reach the client, ensuring that downstream tools never see raw personal data.
  • Just‑in‑time approval: for high‑risk queries, hoop.dev routes the request to a human approver. The approval decision is attached to the audit record, documenting the purpose justification that LGPD expects.
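To make the inline masking idea concrete, here is a minimal sketch of redacting personal identifiers from similarity-search results before they are handed to the client. It is an illustration only and does not reflect hoop.dev's actual masking rules or configuration. The result shape (a list of matches with a `metadata` dict) and the key names in `SENSITIVE_KEYS` are assumptions.

```python
import copy

# Hypothetical metadata keys treated as personal identifiers.
SENSITIVE_KEYS = {"user_id", "email", "cpf"}
MASK = "[REDACTED]"


def mask_matches(matches: list, sensitive_keys=SENSITIVE_KEYS) -> list:
    """Return a copy of the matches with sensitive metadata values replaced."""
    masked = copy.deepcopy(matches)  # never mutate the upstream response
    for match in masked:
        metadata = match.get("metadata", {})
        for key in metadata:
            if key in sensitive_keys:
                metadata[key] = MASK
    return masked
```

Matching on key names is the simplest possible rule. A real deployment would likely also need pattern-based detection for identifiers that appear inside free-text fields.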

These outcomes exist only because hoop.dev sits in the data path as the gateway. Removing it would eliminate the masking, the approval workflow, and the immutable session log.
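The approval workflow can be pictured as a small decision function that a gateway runs before forwarding a request. The sketch below is hypothetical: the `QueryRequest` fields, the risk rule in `is_high_risk`, and the `ask_approver` callback are all assumptions made for illustration, not hoop.dev's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QueryRequest:
    requester: str
    operation: str  # e.g. "query" or "upsert"
    top_k: int
    purpose: str


@dataclass
class Decision:
    allowed: bool
    approver: Optional[str]  # None when no human approved the request
    reason: str


def is_high_risk(req: QueryRequest, max_top_k: int = 100) -> bool:
    # Hypothetical rule: all writes and bulk reads need human review.
    return req.operation != "query" or req.top_k > max_top_k


def gate(req: QueryRequest,
         ask_approver: Callable[[QueryRequest], Optional[str]]) -> Decision:
    """Allow low-risk requests; route high-risk ones to a human approver."""
    if not req.purpose.strip():
        return Decision(False, None, "missing purpose")
    if not is_high_risk(req):
        return Decision(True, None, "low risk")
    approver = ask_approver(req)  # approver's identity, or None if denied
    if approver is None:
        return Decision(False, None, "approval denied")
    return Decision(True, approver, "approved")
```

Returning the approver's identity as part of the decision is what lets the approval be attached to the audit record later.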

Setting up hoop.dev starts with the standard getting‑started guide. The gateway is deployed as a Docker Compose stack or a Kubernetes pod, and it authenticates users via OIDC or SAML providers. Once the gateway is running, the vector database is registered as a connection. From that point forward, every client – whether a CI job, a data scientist’s notebook, or an AI‑driven service – talks to the database through hoop.dev. Further details on configuring masking and approval workflows are available in the hoop.dev documentation.

Because the gateway holds the database credentials, the client never sees them. This separation reinforces the principle of least privilege: the client’s identity is known, but the secret that actually talks to the database stays behind the gateway.

Generating LGPD‑ready evidence

When an auditor asks for proof of compliance, hoop.dev can export the session logs in a structured format that includes:

  • Requester identity (derived from the OIDC token).
  • Timestamp of each operation.
  • Purpose tag attached during the approval step.
  • Masking actions applied to the response.

These logs give organizations the concrete evidence LGPD expects, without requiring additional third‑party tooling.
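As an illustration of what a structured record with the four fields listed above might look like, the sketch below serializes one audit entry as a JSON line. The field names and the `AuditRecord` type are assumptions for this example and are not hoop.dev's actual export schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    requester: str  # e.g. the subject claim taken from the OIDC token
    timestamp: str  # ISO 8601, UTC
    purpose: str    # tag attached during the approval step
    masked_fields: list = field(default_factory=list)  # fields redacted in the response


def utc_now() -> str:
    """Current time as a timezone-aware ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def to_json_line(record: AuditRecord) -> str:
    """Serialize one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

One JSON object per line keeps the export easy to filter by requester or time range with ordinary log tooling.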

FAQ

Does hoop.dev make a vector database LGPD‑compliant?

No. hoop.dev generates the audit evidence and enforces the controls that LGPD requires. Full compliance also depends on broader data‑handling policies, data‑subject rights processes, and organizational governance.

Can hoop.dev mask personal data in query results?

Yes. The gateway can apply inline masking rules to any field in the response, ensuring that downstream applications never receive raw personal identifiers.

How does hoop.dev handle non‑human identities?

hoop.dev integrates with OIDC and SAML providers, allowing service accounts, CI pipelines, and AI agents to authenticate using short‑lived tokens. The gateway then applies the same LGPD controls to those non‑human identities.

Ready to see the architecture in action? Explore the open‑source repository on GitHub and start protecting your vector data today.
