LGPD for Chunking

An offboarded contractor’s nightly data‑processing job still runs, pulling large chunks of customer records from a warehouse. The script uses a static credential stored in a CI secret and writes the raw rows to a log file that no one monitors. When a regulator asks for proof that the organization respects the Brazilian General Data Protection Law, the team can’t point to any record of who accessed which data, how it was filtered, or whether the extraction was approved.

LGPD requires that any personal data processing be documented, that data minimisation be enforced, and that individuals’ rights to access, correction and deletion be demonstrable. For chunking workloads (batch jobs that retrieve slices of a database or data lake), this translates into three concrete obligations:

  • Every request that extracts personal data must be tied to a verified identity.
  • Sensitive fields must be masked or redacted before they leave the controlled environment.
  • A tamper‑evident audit trail must capture who asked for the chunk, when, what size, and whether an approval workflow was satisfied.
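
One common way to make an audit trail tamper‑evident is to chain entries by hash, so that editing any earlier record invalidates every later one. The sketch below is illustrative only and is not hoop.dev's implementation; the `AuditLog` class and its field names are hypothetical, chosen to mirror the who, when, size and approval fields listed above.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _digest(record):
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class AuditLog:
    """Append-only log in which each entry commits to the previous entry's hash."""
    entries: list = field(default_factory=list)

    def append(self, identity, timestamp, row_count, approved):
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        record = {
            "identity": identity,      # who asked for the chunk
            "timestamp": timestamp,    # when (ISO 8601 string)
            "row_count": row_count,    # what size
            "approved": approved,      # whether the approval workflow was satisfied
            "prev_hash": prev_hash,    # link to the previous entry
        }
        record["hash"] = _digest(record)
        self.entries.append(record)
        return record

    def verify(self):
        """Return True only if no entry has been altered, removed or reordered."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```

A chain like this detects modification but does not prevent it. In practice the latest hash would also need to be anchored somewhere the pipeline operator cannot rewrite, such as write‑once storage.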

Most teams build their pipelines with a handful of service accounts, grant those accounts broad read privileges, and let the job run unattended. The connection goes straight from the compute node to the database, bypassing any central policy engine. The result is a blind spot: access control decides who can start the job, but nothing enforces policy on the data path, and there is no reliable evidence that LGPD‑required controls were applied.

The missing piece is a gateway that sits between the identity layer and the chunking target. The gateway must be the only place where request validation, masking, approval and logging occur. Without that data‑path enforcement, any audit the organization produces would be incomplete, and a regulator could easily deem the practice non‑compliant.

hoop.dev is a layer‑7 gateway that proxies connections to databases, storage services and other chunkable resources. It sits in the data path, intercepts each request, and applies policy before the traffic reaches the target. The system integrates with standard OIDC or SAML identity providers, so the same tokens that grant access to the CI system also authenticate to the gateway.

How LGPD defines evidence for chunking

LGPD defines personal data as information relating to an identified or identifiable natural person. When a chunking job requests a subset of rows, the organization should be able to show:

  • Identity of the requester – captured from the verified token.
  • Purpose and scope of the extraction – recorded in an approval record.
  • Data minimisation – enforced by masking or column‑level filters before data leaves the controlled zone.
  • Retention of the audit log – immutable evidence that can be presented to auditors.

hoop.dev provides each of these elements directly in the data path. It reads the caller’s identity, checks the request against a policy that defines allowed tables, columns and row limits, and, if the request exceeds a pre‑defined threshold, routes it to a human approver. Once approved, hoop.dev masks any fields marked as personal data, then forwards the sanitized chunk to the downstream job.
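
The decision flow described above (check tables, columns and row limits, then escalate large requests to an approver) can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not hoop.dev's policy format or API: `ChunkPolicy`, `ChunkRequest` and `evaluate` are hypothetical names, and the two‑tier limit (a hard ceiling plus a lower approval threshold) is one possible design.

```python
from dataclasses import dataclass


@dataclass
class ChunkPolicy:
    allowed_columns: dict     # table name -> set of column names that may be read
    max_rows: int             # hard ceiling; larger requests are always denied
    approval_threshold: int   # requests above this size need a human approver


@dataclass
class ChunkRequest:
    identity: str             # taken from the verified token
    table: str
    columns: tuple
    row_limit: int


def evaluate(policy, request):
    """Return 'deny', 'needs_approval' or 'allow' for a chunk request."""
    allowed = policy.allowed_columns.get(request.table)
    if allowed is None:
        return "deny"                      # table is not registered in the policy
    if not set(request.columns) <= allowed:
        return "deny"                      # at least one column is off-limits
    if request.row_limit > policy.max_rows:
        return "deny"                      # exceeds the hard ceiling
    if request.row_limit > policy.approval_threshold:
        return "needs_approval"            # route to a human approver
    return "allow"
```

Keeping the decision separate from masking and forwarding makes it easy to log the outcome for every request, including the denied ones.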

Setup: identity and least‑privilege grants

The first line of defence is the authentication layer. Organizations configure an OIDC provider (Okta, Azure AD, Google Workspace, etc.) and define groups that map to specific data‑processing roles. hoop.dev verifies the token, extracts the group membership, and uses that information to decide whether the request can proceed. This step establishes who the requester is, but it does not enforce any data‑level rules on its own.
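
As a rough illustration of the group‑to‑role step, the sketch below maps group names from an already verified token to data‑processing roles. Everything here is an assumption for the example: the `groups` claim name varies by identity provider, the group and role names are invented, and signature verification is presumed to have happened before these functions are called.

```python
# Hypothetical mapping from identity-provider groups to pipeline roles.
ROLE_BY_GROUP = {
    "data-eng-batch": "chunk_reader",
    "data-privacy": "chunk_approver",
}


def roles_for(claims):
    """Translate the groups in verified token claims into known roles."""
    groups = claims.get("groups", [])
    # Unknown groups grant nothing, which keeps the default least-privilege.
    return {ROLE_BY_GROUP[g] for g in groups if g in ROLE_BY_GROUP}


def can_request_chunk(claims):
    """Only identities holding the reader role may start an extraction."""
    return "chunk_reader" in roles_for(claims)
```

A token with no recognised group yields an empty role set, so the request is refused rather than falling back to a shared service account.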

The data path: hoop.dev as the enforcement boundary

All chunking traffic is forced through hoop.dev. Because the gateway terminates the protocol, it can inspect SQL statements, REST calls or other query languages before they reach the database. This is the only place where policy can be applied reliably; the downstream service never receives a request that has not passed policy. hoop.dev therefore becomes the single enforcement point.

Enforcement outcomes generated by hoop.dev

  • hoop.dev records each chunking session, logging the requester's identity, timestamp, query text and result size.
  • hoop.dev masks personal columns in real time, ensuring that downstream logs and files never contain raw identifiers.
  • hoop.dev requires just‑in‑time approval for large extracts, providing a human‑in‑the‑loop checkpoint that supports LGPD’s purpose‑limitation principle.
  • hoop.dev blocks disallowed commands, such as full table scans on tables that contain sensitive data, reducing the risk of accidental over‑exposure.
  • hoop.dev stores session recordings that can be replayed for forensic analysis, giving auditors a complete picture of what happened.

Because every enforcement outcome originates from the gateway, the organization can generate the evidence LGPD demands without stitching together logs from disparate systems.

Getting started with hoop.dev

To adopt this approach, begin with the getting started guide. Deploy the gateway using the provided Docker Compose file or Kubernetes manifest, register your chunkable resources, and configure your identity provider. The learn section contains detailed policy examples for masking, approval thresholds and audit‑log retention that map directly to LGPD requirements.

FAQ

Does hoop.dev store personal data itself?

No. hoop.dev only proxies traffic and records metadata about the request. Any personal data that passes through is either masked or discarded before it is written to storage.

Can I retroactively audit chunking jobs that ran before hoop.dev was installed?

hoop.dev can only capture events that flow through it. For historical jobs, you would need to rely on existing database logs or enable logging on the source system.

How does masking work for LGPD‑sensitive fields?

Policies define which columns are considered personal data. hoop.dev replaces those values with a deterministic placeholder or removes them entirely before the response is sent to the client. This ensures downstream systems never see raw identifiers.
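
To make "deterministic placeholder" concrete, here is a minimal sketch that uses a keyed hash (HMAC‑SHA256) so the same input always maps to the same token. It is an assumption about one way such masking could work, not a description of hoop.dev's internals; `mask_value`, `mask_row`, the `masked_` prefix and the 16‑character truncation are all hypothetical choices.

```python
import hashlib
import hmac


def mask_value(value, key):
    """Replace a value with a stable placeholder derived from a secret key."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return "masked_" + digest[:16]


def mask_row(row, personal_columns, key, drop=()):
    """Mask personal columns and remove dropped columns from one result row."""
    masked = {}
    for column, value in row.items():
        if column in drop:
            continue                                   # removed entirely
        if column in personal_columns and value is not None:
            masked[column] = mask_value(value, key)    # deterministic placeholder
        else:
            masked[column] = value                     # non-personal data passes through
    return masked
```

Because the placeholder is deterministic, downstream jobs can still join or deduplicate on the masked column. The same property means the output is pseudonymised rather than anonymised: anyone holding the key can recompute the mapping, so the key has to stay inside the controlled environment.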

Ready to see how the gateway can close the LGPD evidence gap for your chunking workloads? Explore the source code on GitHub and start building a compliant data‑processing pipeline today.
