
MCP gateways: what they mean for your data exfiltration (on BigQuery)

When data exfiltration from BigQuery is fully contained, every query is approved, masked, or blocked before its results leave the warehouse, and you have an immutable record of who attempted what.

Current practice: open BigQuery access

Most teams give engineers and automated jobs a static service‑account key or a long‑lived OAuth token that lets a client connect directly to the BigQuery endpoint. The client talks straight to the Google API, and the request bypasses any central control point. Because the credential is stored on a workstation or in a CI pipeline, anyone who obtains it can run arbitrary SELECT statements, export tables, or copy data to external storage without additional oversight. Auditing is limited to the logs that Google Cloud generates. Those logs show which credential ran a query, but when a key is shared they cannot tie that query to the person who decided to run it.

Why that model leaves data exfiltration open

Even when it is configured well, the direct connection model covers only the first two layers of identity: an OIDC token proves who the caller is, and IAM policies limit the caller to the least‑privilege permissions needed for the job. Those layers do not provide a place to enforce runtime guardrails. The request reaches BigQuery unchanged, so there is no opportunity to inspect the SQL, to require an approval step for large result sets, or to mask columns that contain personally identifiable information before results reach the client. In short, the setup decides who may start a session, but it does not control what the session does.

Introducing an MCP gateway as the data‑path control

Placing an MCP (Model Context Protocol) gateway between the identity layer and BigQuery creates a single, enforceable boundary. The gateway runs on a host that can reach the BigQuery endpoint, and every client, whether a human, a CI job, or an AI‑assisted tool, must route its queries through it. Because the gateway terminates the client connection, it can examine each SQL statement before it reaches the warehouse.
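As a rough illustration of that inspection step, the sketch below extracts the leading statement keyword and the datasets named in backtick‑quoted table references. It is not hoop.dev's implementation: `inspect_sql`, `InspectedStatement`, and the regular expressions are hypothetical names chosen for this example, and a production gateway would need a real SQL parser rather than pattern matching.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class InspectedStatement:
    kind: str            # leading keyword(s), e.g. "SELECT" or "EXPORT DATA"
    datasets: frozenset  # dataset names found in backtick-quoted table references


# Hypothetical pattern: matches `project.dataset.table` or `dataset.table`
# and captures the dataset part. Unquoted references are not detected.
_TABLE_REF = re.compile(r"`(?:[\w-]+\.)?(\w+)\.\w+`")

# Strips "-- line" and "/* block */" comments. This is naive: it would also
# strip comment-like text inside string literals.
_COMMENTS = re.compile(r"--[^\n]*|/\*.*?\*/", flags=re.DOTALL)


def inspect_sql(sql: str) -> InspectedStatement:
    """Return the statement kind and referenced datasets for one SQL string."""
    cleaned = _COMMENTS.sub(" ", sql)
    tokens = cleaned.split()
    kind = tokens[0].upper() if tokens else ""
    # EXPORT DATA is a two-word statement in BigQuery SQL, so keep both words.
    if kind == "EXPORT" and len(tokens) > 1 and tokens[1].upper() == "DATA":
        kind = "EXPORT DATA"
    datasets = frozenset(m.group(1) for m in _TABLE_REF.finditer(cleaned))
    return InspectedStatement(kind=kind, datasets=datasets)
```

Calling `inspect_sql` on a statement that starts with `export data` and reads from `` `my-proj.sales.orders` `` yields the kind `"EXPORT DATA"` and the dataset set `{"sales"}`, which is the information a policy check needs before the request is forwarded.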

How the gateway enforces protection

  • hoop.dev requires just‑in‑time approval for queries that exceed a configurable result‑size threshold, sending the request to a designated approver before it is forwarded.
  • hoop.dev masks sensitive columns in query results in real time, replacing values that match policy‑defined patterns with placeholder text.
  • hoop.dev blocks prohibited operations such as EXPORT DATA statements, copies to Cloud Storage, or any statement that references a denylisted dataset.
  • hoop.dev records every session, storing the full request, the decision path (approved, blocked, or masked), and the identity that initiated it for replay and audit.
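The first three behaviours in the list above can be sketched as a small policy check plus a masking pass. Everything here is an assumption made for illustration: `Policy`, `decide`, `mask_row`, the `pii_raw` dataset name, the row‑count threshold, and the email pattern are hypothetical and do not reflect hoop.dev's actual configuration format or API. The source does not say how result size is measured, so the sketch takes an estimated row count as an input.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Hypothetical defaults, chosen only to make the example concrete.
    blocked_kinds: frozenset = frozenset({"EXPORT DATA"})
    denied_datasets: frozenset = frozenset({"pii_raw"})
    approval_row_threshold: int = 10_000
    # Values matching any of these patterns are replaced in results.
    mask_patterns: tuple = (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),)  # email-like


def decide(policy: Policy, kind: str, datasets, estimated_rows: int) -> str:
    """Return "blocked", "needs_approval", or "allowed" for one statement."""
    if kind in policy.blocked_kinds:
        return "blocked"
    if policy.denied_datasets & set(datasets):
        return "blocked"
    if estimated_rows > policy.approval_row_threshold:
        return "needs_approval"
    return "allowed"


def mask_row(policy: Policy, row: dict) -> dict:
    """Replace pattern matches in string values with a placeholder."""
    masked = {}
    for column, value in row.items():
        if isinstance(value, str):
            for pattern in policy.mask_patterns:
                value = pattern.sub("[MASKED]", value)
        masked[column] = value
    return masked
```

With the default policy, `decide(Policy(), "SELECT", {"sales"}, 50_000)` returns `"needs_approval"`, and `mask_row(Policy(), {"id": 7, "email": "ana@example.com"})` returns `{"id": 7, "email": "[MASKED]"}`. Checking blocks before approvals means a prohibited statement is never sent to an approver.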

All of these outcomes exist only because the gateway sits in the data path. The identity configuration that grants a token does not, by itself, prevent a rogue SELECT that pulls an entire table. By routing traffic through the gateway, you gain a single point where policy can be applied consistently, regardless of the client language or automation framework.

Getting started with hoop.dev

Deploy the gateway using the Docker Compose quick‑start, configure the BigQuery connection, and point your bq client or any compatible library at the proxy address. The official getting‑started guide walks you through the minimal steps, and the learn section provides deeper insights into masking policies, approval workflows, and session replay. Because hoop.dev is open source, you can review the implementation or contribute enhancements directly from the repository.

FAQ

Q: Does the gateway add latency to every query?
A: The gateway processes the SQL only once per request, performing lightweight policy checks before forwarding. In most workloads the added latency is measured in milliseconds and is outweighed by the security benefit.

Q: Can I still use existing service‑account keys for authentication?
A: Yes. Clients authenticate to the gateway with OIDC tokens issued by your identity provider, and those tokens can be obtained with the same service‑account keys you already use. The gateway never exposes the raw key to the client.

Q: How do I prove to auditors that exfiltration attempts were blocked?
A: hoop.dev’s session logs contain a record of every query, the decision taken (approved, masked, or blocked), and the identity that issued it. Those logs can serve as the evidence that most data‑protection standards ask for.
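One common way to make such a record tamper‑evident is to chain entries by hash, so that editing any earlier entry invalidates everything after it. The sketch below shows that general technique only. It is not hoop.dev's log format, and `append_record`, `verify`, and the field names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record
FIELDS = ("identity", "sql", "decision", "prev")


def _digest(body: dict) -> str:
    # Serialize with sorted keys so the hash does not depend on dict ordering.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(log: list, identity: str, sql: str, decision: str) -> dict:
    """Append one audit record that commits to the hash of the previous one."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"identity": identity, "sql": sql, "decision": decision, "prev": prev_hash}
    record = {**body, "hash": _digest(body)}
    log.append(record)
    return record


def verify(log: list) -> bool:
    """Return True only if every record's hash and back-link are intact."""
    prev_hash = GENESIS
    for record in log:
        body = {key: record[key] for key in FIELDS}
        if record["prev"] != prev_hash or record["hash"] != _digest(body):
            return False
        prev_hash = record["hash"]
    return True
```

An auditor who holds the hash of the latest record can rerun `verify` over an exported log and detect whether a past decision, for example a `"blocked"` changed to `"approved"`, was altered after the fact.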

Explore the open‑source code on GitHub: https://github.com/hoophq/hoop.

