
PII/PHI redaction for MCP servers on BigQuery

It is easy to assume that an MCP server keeps data safe by default. It does not: the server streams raw rows straight to the client, and any personally identifiable or protected health information travels unchanged unless you add a masking layer. Without that layer, PII/PHI redaction is missing at the point where it matters most. Teams often rely on a shared Google service‑account key to let the MCP runtime talk to BigQuery. That key grants broad read access, and without a dedicated guardrail the same credential can be used by any process that reaches the server.

When a data‑science notebook, an automated pipeline, or an AI‑augmented assistant issues a query, the response can contain names, social security numbers, or medical codes. If a developer logs the result, copies it to a temporary bucket, or shares a screen, the exposure spreads quickly. The core problem is that the access path, client → MCP server → BigQuery, contains no inline inspection point. The request passes identity checks, the query runs, and the raw payload returns to the caller.

Why PII/PHI redaction matters for BigQuery via MCP servers

Regulatory frameworks treat any export of protected data as a compliance event. Auditors expect evidence that each access was scoped, that sensitive fields were masked, and that the activity can be replayed. When each team builds its own masking logic, policy becomes fragmented, coverage gaps appear, and new services slip through unchecked.

The missing control in a typical MCP‑to‑BigQuery flow

A typical flow looks like this: a user authenticates to an identity provider, receives an OIDC token, and the MCP server uses a static Google service‑account key to call the BigQuery API. The token proves who can start the MCP process, but it does not influence what the BigQuery query returns. The service account grants the same privileges to every request, and no component inspects the result before it reaches the user’s console. Consequently, masking, approval, and audit remain optional and scattered across code bases.

Introducing hoop.dev as the data‑path gateway

hoop.dev sits on the only segment where every query and every response must pass: the network layer between the MCP client and the BigQuery endpoint. By proxying the connection, hoop.dev applies policy decisions in real time, without requiring any changes to the MCP server or the client tools.

Setup – identity and least‑privilege

First, you configure OIDC or SAML authentication for the gateway. Users present tokens from their corporate IdP, and hoop.dev validates those tokens, extracts group membership and user attributes, and decides who may start a session. For BigQuery, hoop.dev can request per‑user OAuth tokens when GCP IAM federation is enabled, or fall back to a tightly scoped service‑account key that only allows the specific dataset needed for the workflow. The identity layer controls only session start; it does not enforce data‑level rules.
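To make the session-start decision concrete, here is a minimal Python sketch. It is not hoop.dev's implementation: it assumes the IdP token has already been validated (signature, issuer, expiry) and decoded into a `claims` dictionary, and the `groups` claim name, the `CONNECTION_GROUPS` mapping, and the connection name `bigquery-analytics` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionDecision:
    allowed: bool
    reason: str


# Hypothetical mapping: connection name -> IdP groups allowed to open it.
CONNECTION_GROUPS = {
    "bigquery-analytics": {"data-science", "analytics-eng"},
}


def may_start_session(claims: dict, connection: str) -> SessionDecision:
    """Decide whether already-validated token claims may open a connection.

    This covers only the identity step; it says nothing about which
    rows or columns the session may later see.
    """
    allowed_groups = CONNECTION_GROUPS.get(connection)
    if allowed_groups is None:
        return SessionDecision(False, f"unknown connection {connection!r}")

    user_groups = set(claims.get("groups", []))
    matched = sorted(user_groups & allowed_groups)
    if not matched:
        return SessionDecision(False, "no matching group")
    return SessionDecision(True, f"matched group {matched[0]}")
```

A caller would pass the decoded claims and the requested connection, then refuse to open the session when `allowed` is false. Data-level rules such as masking still have to be enforced separately in the data path.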

Data path – where masking happens

All traffic between the MCP client and BigQuery routes through hoop.dev’s gateway. The gateway parses the BigQuery response stream, identifies columns that match configured PII/PHI patterns, and replaces the values with redacted placeholders before the data reaches the client. Because the gateway operates at the protocol layer, the masking applies regardless of the client language, the MCP runtime version, or any custom code the user writes. The gateway is the only place where enforcement can occur.
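The masking step described above can be illustrated with a short Python sketch. This is a simplified illustration, not hoop.dev's actual engine or policy language: the column names in `SENSITIVE_COLUMNS`, the regular expressions in `VALUE_PATTERNS`, and the placeholder strings are assumptions chosen for the example, and real PII/PHI detection needs far broader patterns.

```python
import re

# Hypothetical policy: columns masked by name, plus patterns matched in values.
SENSITIVE_COLUMNS = {"ssn", "patient_name", "diagnosis_code"}
VALUE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}
PLACEHOLDER = "[REDACTED]"


def redact_row(row: dict) -> tuple[dict, list[str]]:
    """Return a masked copy of one result row and the redaction decisions made."""
    masked = {}
    decisions = []
    for column, value in row.items():
        # Rule 1: mask the whole value when the column name is sensitive.
        if column.lower() in SENSITIVE_COLUMNS:
            masked[column] = PLACEHOLDER
            decisions.append(f"{column}:column")
            continue
        # Rule 2: scan free-text values for sensitive patterns.
        if isinstance(value, str):
            for name, pattern in VALUE_PATTERNS.items():
                value, count = pattern.subn(f"[REDACTED:{name}]", value)
                if count:
                    decisions.append(f"{column}:{name}")
        masked[column] = value
    return masked, decisions
```

Running every row through a function like this before it is forwarded means the client only ever receives the masked copy, and the list of decisions can feed the audit trail.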

Enforcement outcomes – what hoop.dev provides

  • hoop.dev masks PII/PHI fields inline, ensuring that no raw sensitive value leaves the data path.
  • hoop.dev records each query, each redaction decision, and the user identity that triggered the session, creating a replayable audit trail.
  • hoop.dev requires a just‑in‑time approval step for queries that request high‑risk columns, pausing execution until an authorized reviewer grants consent.
  • hoop.dev never exposes the underlying service‑account key to the MCP process; the hoop.dev agent holds the credential instead.

Each of these outcomes exists only because hoop.dev occupies the data‑path gateway. Removing the gateway returns the system to the original state where raw rows flow unchecked.
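The approval and audit outcomes listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than hoop.dev's real record format: `HIGH_RISK_COLUMNS`, the field names in the record, and both function names are hypothetical. Note that the record holds the query text and the redaction decisions, never the result rows.

```python
import json
from datetime import datetime, timezone

# Hypothetical list of columns that require just-in-time approval.
HIGH_RISK_COLUMNS = {"ssn", "diagnosis_code"}


def needs_approval(requested_columns):
    """Return the high-risk columns a query touches (empty list means no approval)."""
    requested = {column.lower() for column in requested_columns}
    return sorted(requested & HIGH_RISK_COLUMNS)


def audit_record(user, query, decisions, approved_by=None, now=None):
    """Serialize one audit entry as JSON; result rows are deliberately excluded."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    record = {
        "timestamp": timestamp,
        "user": user,
        "query": query,
        "redactions": list(decisions),
        "approved_by": approved_by,
    }
    return json.dumps(record, sort_keys=True)
```

A gateway following this pattern would call `needs_approval` before executing the query, pause until a reviewer responds if the list is non-empty, and write the audit entry once the masked response has been sent.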

Getting started

To try this pattern, follow the getting‑started guide for deploying the gateway and registering a BigQuery connection. The documentation walks you through configuring OIDC, defining a masking policy for common PII/PHI fields, and enabling session recording. All of the heavy lifting lives in hoop.dev; you only need to point your MCP client at the gateway endpoint.

For deeper insight into how masking rules are expressed and how audit data is stored, explore the learn section. It explains the policy language, the built‑in patterns for health‑care identifiers, and how to integrate approval workflows with your existing ticketing system.

FAQ

Does hoop.dev store any raw data?

No. The gateway records only the fact that a query ran, the redaction decisions made, and the user identity. The original payload never persists.

Can I use per‑user OAuth instead of a service‑account key?

Yes. When GCP IAM federation is enabled, hoop.dev exchanges the caller’s OIDC token for a short‑lived OAuth token that scopes to the exact dataset needed. This reduces the blast radius of any compromised credential.

Is the masking performed on the client side?

No. hoop.dev performs all masking inside the gateway before the data travels back to the client. The client never sees unredacted values, regardless of the language or library it uses.

Explore the open‑source repository on GitHub to see how the gateway is built and to contribute enhancements.

Open source

Save the open-source gateway for agent data access

Hoop is MIT-licensed infrastructure for controlling how AI agents reach production data. Star hoophq/hoop so you can inspect it, deploy it, or share it when your team starts governing agent access.
