An analytics agent runs SELECT * FROM users.accounts LIMIT 1000 to summarize signups, and BigQuery hands back a thousand rows of email addresses, phone numbers, and billing details. The agent only needed counts by region. It now holds, in its context window and quite possibly in a downstream prompt or log, a pile of raw personal data it had no business seeing.
Data masking is the control that stops this. For an AI agent on BigQuery, data masking means sensitive columns are redacted before the result ever reaches the agent, so the agent works with usable data and never touches the raw values.
Why masking at the agent is too late
You can ask the agent to filter columns. You can write a prompt that says never select PII. Neither is a control, because both depend on the agent doing what you asked, and an agent that is buggy, jailbroken, or simply over-eager will run the broad query anyway. By the time the agent could redact anything, BigQuery has already returned the raw rows and the exposure has happened.
Redaction has to occur on the path back from BigQuery, before the result reaches the agent, in a layer the agent does not control.
Why data masking belongs on the connection, not in the query
There is a tempting alternative: write careful queries that never select sensitive columns, or build views that exclude them. Those help, but they are not data masking and they do not hold up under an autonomous agent. A view depends on the agent querying the view and not the base table. A careful query depends on the agent staying careful. The first time an agent runs an exploratory SELECT * to understand a schema, the raw values are out.
Data masking on the connection removes that dependency. It does not matter which table the agent hits or how broad its query is, because redaction happens to the result stream regardless. The control is a property of the path, not of the agent's good behavior, and that is exactly why it survives an agent that misbehaves.
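To make the "property of the path" idea concrete, here is a minimal sketch in Python. It is not hoop.dev's implementation. The names `through_gateway`, `fake_warehouse`, and `hide_email` are hypothetical, and the `[REDACTED]` placeholder is an assumption. The point is only that the wrapper never inspects the SQL, so a broad query and a narrow one are masked the same way.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]


def through_gateway(
    run_query: Callable[[str], Iterable[Row]],
    mask_row: Callable[[Row], Row],
) -> Callable[[str], Iterator[Row]]:
    """Wrap a query executor so every result row is masked, whatever the SQL."""

    def guarded(sql: str) -> Iterator[Row]:
        # The SQL text is passed through untouched; masking acts on the results.
        for row in run_query(sql):
            yield mask_row(row)

    return guarded


# Hypothetical stand-in for the warehouse: returns the same raw row for any SQL.
def fake_warehouse(sql: str) -> Iterable[Row]:
    return [{"email": "ana@example.com", "region": "EMEA"}]


# Hypothetical masking rule; a real deployment would delegate to a DLP provider.
def hide_email(row: Row) -> Row:
    return {col: ("[REDACTED]" if col == "email" else val) for col, val in row.items()}
```

Calling `through_gateway(fake_warehouse, hide_email)` gives the agent a query function whose results are always masked. Nothing the agent puts in the SQL string changes that.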
How inline masking works on the connection
hoop.dev proxies the connection to BigQuery, so the result set flows back through the gateway. With masking configured, hoop.dev streams that content to a DLP provider (Presidio or Google DLP) for classification and redacts the matched fields inline before the results return to the agent. The agent receives a masked result. The raw values never leave the boundary.
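The classify-then-redact flow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the gateway's code. Two regular expressions stand in for the DLP provider, which in practice detects far more than a regex can. The class names `EMAIL` and `PHONE`, the bracketed placeholders, and the function names are all hypothetical.

```python
import re

# Hypothetical stand-in for a DLP provider: one pattern per data class.
# Presidio and Google DLP use much richer detection than these regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_value(value, enabled_classes):
    """Replace every matched span in a string cell; pass other types through."""
    if not isinstance(value, str):
        return value
    for name in enabled_classes:
        value = PATTERNS[name].sub(f"[{name}]", value)
    return value


def mask_stream(rows, enabled_classes):
    """Yield rows one at a time with matches for the enabled classes redacted."""
    for row in rows:
        yield {col: redact_value(val, enabled_classes) for col, val in row.items()}
```

Because `mask_stream` is a generator, rows are redacted as they pass through rather than after the whole result set has been buffered. The `enabled_classes` argument mirrors the per-connection choice of which data classes to redact.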
Masking on BigQuery connections is configured per connection rather than on by default, so you turn it on with a DLP provider attached and decide which classes of data get redacted.
- Run the hoop.dev agent near your GCP project, connecting outbound to the gateway.
- Create a BigQuery connection with CLOUDSDK_CORE_PROJECT set, and enable GCP IAM federation for per-user OAuth.
- Attach a DLP provider (Presidio or Google DLP) to the connection and turn on masking, choosing the data classes to redact.
- Route the agent's bq queries through the gateway.
# the agent gets masked output; raw PII never reaches it
bq query --use_legacy_sql=false \
'SELECT email, phone, region FROM users.accounts LIMIT 1000'
# email/phone return redacted; region returns intact
Verify the redaction
As the agent, run a query that selects a known PII column, and confirm that the returned values are redacted while non-sensitive columns pass through. Then confirm that the same query, run without masking, exposes the raw data, so you know the gateway, not the agent, did the work.
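One way to automate that check is sketched below in Python. It assumes you have already parsed the query output into a list of row dictionaries, and that you hold a small fixture of known raw rows to compare against. The detectors, the function names, and the `[REDACTED]` placeholder used in the usage note are hypothetical. Check what your configured DLP provider actually emits.

```python
import re

# Hypothetical detectors for the check; tune these to the PII you expect.
RAW_PII = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def find_leaks(rows):
    """Return (row_index, column, kind) for every cell that still holds raw PII."""
    leaks = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            if not isinstance(value, str):
                continue
            for kind, pattern in RAW_PII.items():
                if pattern.search(value):
                    leaks.append((index, column, kind))
    return leaks


def verify(masked_rows, known_rows, passthrough):
    """True when nothing leaks and the passthrough columns are unchanged."""
    if len(masked_rows) != len(known_rows):
        return False
    intact = all(
        masked[col] == known[col]
        for masked, known in zip(masked_rows, known_rows)
        for col in passthrough
    )
    return not find_leaks(masked_rows) and intact
```

Running `find_leaks` on the unmasked fixture should report the sensitive columns, and `verify(masked_rows, known_rows, ["region"])` should return `True` for output that came back through the gateway. Together the two results show that the redaction happened on the path and not in the agent.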
Pitfalls
- Do not assume masking is on by default for BigQuery. It is configured per connection and needs a DLP provider attached.
- Do not rely on column-level prompts to the agent. A prompt is guidance, not a boundary.
- Do not mask only the obvious fields. Configure the DLP classes to catch the long tail as well, such as free-text columns that carry names and identifiers.
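The last pitfall is easy to see in a small Python sketch. It compares a hypothetical name-based rule, which masks only columns called `email` or `phone`, with a content-based scan of every string cell. The single email regex stands in for a DLP class. Detecting personal names in free text needs a real DLP provider and is beyond this sketch.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical name-based rule: only these column names are treated as sensitive.
SENSITIVE_COLUMNS = {"email", "phone"}


def mask_by_column_name(row):
    """Mask whole cells, but only in columns whose name is on the list."""
    return {
        col: ("[REDACTED]" if col in SENSITIVE_COLUMNS else val)
        for col, val in row.items()
    }


def mask_by_content(row):
    """Scan every string cell and redact matches wherever they appear."""
    return {
        col: (EMAIL.sub("[EMAIL]", val) if isinstance(val, str) else val)
        for col, val in row.items()
    }
```

Given a row with a `notes` column that reads "Customer asked to cc bo@example.org on invoices", the name-based rule leaves that address in place, while the content scan redacts it.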
hoop.dev is open source, so you can verify where redaction happens before you route real data through it. See the getting started guide and how masking supports PII and PHI redaction for AI agents on BigQuery. Get the source at github.com/hoophq/hoop and test masking against a known sensitive column.
FAQ
Does data masking change my BigQuery tables?
No. The tables are untouched. hoop.dev redacts in the result stream on the way back to the agent, so the stored data is unchanged and the agent simply never receives the raw values.
Is masking automatic on every BigQuery query?
Masking is configured per connection with a DLP provider, not enabled by default. Once configured, it applies inline to the results that flow through the gateway.