Nested agents: what they mean for data exfiltration risk on BigQuery

When every query to BigQuery is inspected, any attempt to siphon data out of the warehouse is logged and either blocked or held for explicit approval. Engineers can still run analytics, and a compromised service account can no longer silently copy tables to an external bucket.

In practice, many teams reach that ideal by inserting a control point that watches traffic between identities and the data warehouse. The control point validates the caller, checks policy, and then forwards the request. Without that layer, nested agents create a blind spot.

Why nested agents increase data exfiltration risk

Many modern data pipelines use a chain of service accounts. A CI job authenticates with a short‑lived token, which then invokes a data‑processing microservice. That microservice, in turn, runs a BigQuery client using a static service account credential stored in a container image. The credential is shared across environments and rarely rotated. Because the microservice has a direct network path to the BigQuery API, the request goes straight to the warehouse with nothing in between.

Two problems emerge. First, the original identity, the CI job, is lost once the request hops to the microservice. The warehouse sees only the static service account, making it impossible to attribute a query to a human or a pipeline step. Second, the microservice can issue any statement the service account is allowed to run, including EXPORT DATA statements that write entire tables to Cloud Storage. If an attacker compromises the microservice, they inherit the service account’s full read scope and can exfiltrate data without tripping any control in the request path.
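To make the attribution problem concrete, here is a minimal Python sketch. It is a toy simulation, not BigQuery or any real client library, and every name in it (`Request`, `hop`, `warehouse_log_entry`, the principal strings) is hypothetical. It models a hop that re-authenticates as its own identity and shows that the warehouse-side record keeps only the last principal unless the caller chain is carried explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    sql: str
    principal: str  # identity this request is authenticated as
    on_behalf_of: list[str] = field(default_factory=list)  # upstream callers, if carried


def hop(req: Request, new_principal: str, propagate: bool) -> Request:
    """Forward a request as a different identity, optionally keeping the caller chain."""
    chain = req.on_behalf_of + [req.principal] if propagate else []
    return Request(sql=req.sql, principal=new_principal, on_behalf_of=chain)


def warehouse_log_entry(req: Request) -> dict:
    """The warehouse can only record what arrives with the request."""
    return {"principal": req.principal, "on_behalf_of": req.on_behalf_of, "sql": req.sql}


ci_request = Request(sql="SELECT * FROM sales.orders", principal="ci-job-1234")

# Without propagation, the CI job disappears from the record.
lost = warehouse_log_entry(hop(ci_request, "static-sa@project", propagate=False))

# With propagation, the originating identity survives the hop.
kept = warehouse_log_entry(hop(ci_request, "static-sa@project", propagate=True))
```

In the first case the log entry names only the shared service account, which is the situation the paragraph above describes.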

Because the request bypasses any enforcement layer, there is no audit trail of who initiated the query, no inline masking of sensitive columns, and no approval workflow for high‑risk operations. The organization is left with a model in which whoever holds the credential gets everything it allows, with no evidence for auditors and no guardrails against insider threats.

What a data‑path gateway must provide

The missing piece is a gateway that sits in the data path between every nested agent and BigQuery. The gateway must be the only place where policy can be enforced, because the upstream identities and the downstream warehouse cannot be trusted to perform the checks themselves. The gateway’s responsibilities are:

Record each query and the identity that originated it, even when the request passes through multiple agents.
Apply inline masking to columns that contain personally identifiable information, so that downstream consumers only see redacted data.
Require just‑in‑time approval for queries that match a high‑risk pattern, such as exporting more than a threshold number of rows.
Block statements that are known to be dangerous, for example DROP TABLE or EXPORT DATA, unless they have prior approval.
Store a replayable session log that auditors can review to prove compliance with data‑exfiltration controls.
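The decision logic behind these responsibilities can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not any particular gateway's implementation: the patterns, the `ROW_THRESHOLD` value, the `estimated_rows` input, and the `evaluate` function are all hypothetical, and regular expressions are used only for brevity (a real policy engine would parse the SQL).

```python
import re
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


# Hypothetical patterns and threshold. Regexes are easy to evade and are
# no substitute for a SQL parser; they only keep the sketch short.
BLOCKED = [re.compile(r"^\s*DROP\s+TABLE\b", re.IGNORECASE)]
HIGH_RISK = [re.compile(r"^\s*EXPORT\s+DATA\b", re.IGNORECASE)]
ROW_THRESHOLD = 100_000


@dataclass
class AuditRecord:
    origin: str         # identity that started the chain
    via: tuple          # intermediate agents the request passed through
    sql: str
    decision: Decision


def evaluate(sql, origin, via=(), estimated_rows=0, approved=False):
    """Return an audit record carrying the decision for one query."""
    if approved:
        decision = Decision.ALLOW  # a prior approval lifts both checks
    elif any(p.search(sql) for p in BLOCKED):
        decision = Decision.BLOCK
    elif any(p.search(sql) for p in HIGH_RISK) or estimated_rows > ROW_THRESHOLD:
        decision = Decision.REQUIRE_APPROVAL
    else:
        decision = Decision.ALLOW
    return AuditRecord(origin, tuple(via), sql, decision)
```

Every call returns a record, including allowed queries, so the log covers the whole session and not only the rejected statements.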

All of these outcomes exist only because the gateway is positioned where it can see the full request before it reaches BigQuery. If the gateway is removed, the enforcement disappears and the raw service account regains unrestricted access.

Introducing hoop.dev as the enforcement point

hoop.dev is an open‑source Layer 7 gateway that fulfills exactly these requirements. It authenticates users and agents via OIDC or SAML, then proxies the connection to BigQuery through a network‑resident agent. Because hoop.dev sits in the data path, it becomes the sole authority that can mask, approve, block, and record every query.

When a request arrives, hoop.dev extracts the original caller’s identity from the token, maps it to a policy, and decides whether the query may proceed. If the query matches a data‑exfiltration risk pattern, hoop.dev either prompts an approver or rejects the request outright. For allowed queries, hoop.dev can rewrite the response stream to hide sensitive fields, ensuring that downstream tools never see raw PII.
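As a rough illustration of masking a response stream, the Python sketch below redacts sensitive columns row by row. It is a generic example and does not reflect hoop.dev's actual implementation or configuration; the `PII_COLUMNS` set, the `MASK` value, and the `mask_rows` function are assumptions made for the example.

```python
from typing import Iterable, Iterator

# Hypothetical set of sensitive column names. A real deployment would derive
# this from policy or data classification, not from a hard-coded set.
PII_COLUMNS = {"email", "ssn"}
MASK = "****"


def mask_rows(rows: Iterable[dict], pii_columns=PII_COLUMNS) -> Iterator[dict]:
    """Yield a redacted copy of each row, one row at a time."""
    for row in rows:
        yield {
            col: (MASK if col in pii_columns and val is not None else val)
            for col, val in row.items()
        }


rows = [
    {"id": 1, "email": "a@example.com", "country": "DE"},
    {"id": 2, "email": None, "country": "BR"},
]
masked = list(mask_rows(rows))
```

Because `mask_rows` is a generator, rows are redacted as they stream through and the full result set never has to be held in memory. The original rows are left unmodified.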

Every session is captured in an audit log that includes the originating identity, the exact SQL statement, and the outcome of any masking or approval step. This log can be exported to a SIEM or used directly by auditors to demonstrate that the organization has concrete evidence of controls against data‑exfiltration.
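An audit event with those fields could be serialized as one JSON line per statement, a shape most SIEM tools can ingest. The field names and the `audit_line` function below are hypothetical and do not describe hoop.dev's actual log schema.

```python
import json
from datetime import datetime, timezone


def audit_line(origin, sql, outcome, masked_columns=(), approver=None, now=None):
    """Serialize one session event as a single JSON line."""
    ts = now or datetime.now(timezone.utc)
    event = {
        "timestamp": ts.isoformat(),
        "origin": origin,                       # originating identity
        "sql": sql,                             # exact statement as received
        "outcome": outcome,                     # e.g. "allowed", "blocked"
        "masked_columns": sorted(masked_columns),
        "approver": approver,                   # None when no approval was needed
    }
    return json.dumps(event, sort_keys=True)


line = audit_line(
    "ci-job-1234",
    "SELECT id, email FROM crm.users",
    "allowed",
    masked_columns={"email"},
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

Sorting the keys keeps the output stable, which makes lines easier to diff and to index downstream.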

Getting started

To protect BigQuery workloads from nested‑agent leakage, deploy hoop.dev alongside your existing compute environment. The getting‑started guide walks you through the Docker Compose quickstart, OIDC configuration, and registration of a BigQuery connection. The learn section provides deeper coverage of policy definition, inline masking, and approval workflows.

All configuration lives in declarative YAML files, and the gateway runs as a stateless service that can be scaled horizontally. Because hoop.dev is MIT‑licensed, you can inspect the source, contribute improvements, or embed it in a private air‑gapped network.

Frequently asked questions

Does hoop.dev replace existing service‑account keys? No. hoop.dev still needs a credential to talk to BigQuery, but that credential is owned by the gateway, not by individual agents. Agents never see the key, eliminating the primary vector for credential leakage.

Can I still use my CI pipeline to run queries? Yes. Your pipeline authenticates to hoop.dev with an OIDC token, and hoop.dev enforces the same policies you would expect for a human user. The pipeline gains the same audit trail and masking guarantees.

What happens if an attacker compromises a microservice? The attacker would still need a valid OIDC token to reach hoop.dev. Even with a token, any attempt to run a high‑risk query would trigger an approval step or be blocked, and every attempt would be logged.

By placing hoop.dev in the data path, organizations gain a concrete defense against data‑exfiltration that survives the complexity of nested agents and shared service accounts.

Explore the source code on GitHub to see how the gateway is built and to contribute your own extensions.