Editorial · May 27, 2026 · 9 min read

How to Connect Your AI Agents to BigQuery

Coleman Nye

You can connect an AI agent like Devin, Claude Code, or Cursor to BigQuery three ways: a GCP service account, the engineer's personal gcloud credentials, or a BigQuery MCP server.
All three work on day one. All three create credential sprawl, audit-trail gaps, and rollback risk by day 60.
The pattern that ships in production is an inline gateway between the agent and BigQuery that federates user identity, masks query responses, gates destructive writes, and analyzes query intent. The same access patterns and failure modes apply when connecting agents to Snowflake, Redshift, or Databricks.

A data engineer connects Devin to BigQuery for the first time. The first query, one that would have taken her an hour, comes back in two minutes. By lunch she has shipped three tickets. The session ends with her asking the agent to backfill a missing dimension table.

By the end of the week, two more engineers want the same setup. By the end of the month, the security team has questions. By the end of the quarter, the connection has been rolled back and the data lead is on a call with the CISO trying to figure out what to do next.

If you have not lived this, it's coming. The productivity unlock from connecting AI agents to your warehouse is real, immediate, and visible to leadership. The wheels come off between day 14 and day 90.

This post is about what changes between those two points, and the architectural pattern that lets the productivity stick.

What you unlock by connecting AI agents to BigQuery

Start with what works, because it is the part that gets remembered.

The incident-to-impact loop. A production incident fires. An engineer reads the error in the logs, traces it to a deploy, opens a runbook. With the agent connected to the warehouse, the same engineer asks one question: "What was the revenue impact of the failed approvals between 14:00 and 14:35 UTC?" The agent reads the runbook, writes the query, runs it against the warehouse, returns the dollar number and the affected merchant list. The incident report writes itself.

Without the connection, the same engineer spends twenty minutes opening Data Studio, writing the query, copying results back, pasting them into the postmortem. The agent does the analysis but cannot do the lookup. Every incident has this same shape. Multiply by your incident rate.

The feature-launch monitor. Engineering ships a change. Approval rates and error rates are visible in the warehouse a few minutes later. The agent watches both, surfaces anomalies, pings the on-call before the alerting threshold trips. The data team stops being a request queue and starts being an oversight function.

The data analyst pattern. Coding agents are shipping data-analyst modes. Devin already has one. Cursor and Claude Code are converging on the same surface. The mode is useless without warehouse access. With it, every engineer on the team is suddenly half-fluent in your schema. You stop needing to be a SQL secretary for the rest of the org.

The merchant-launch view. A new customer goes live. The agent watches the first six hours of production traffic against the warehouse: approval rate, error rate, P99 latency, anomalies in the payment-method mix. The engineering team that shipped the integration sees the same dashboard the data team would have built on Wednesday. Except it is still Monday morning.

The pattern across all four: BigQuery stops being a tool the data team uses on behalf of everyone else. It becomes a tool the whole engineering organization can reach through their agent, in the language they already speak. The same logic applies to Snowflake, Redshift, and Databricks. BigQuery is the example because it's where most teams hit this first.

This is what your leadership is reading about. This is what the AI mandate is asking for. The question is not whether to do it. The question is how to do it without it getting pulled.

How teams connect AI agents to BigQuery today (and why it breaks)

Three paths work on day one. All three break the same way by day 60.

Pattern	Day 1 setup	Day 60 reality
Service account	GCP service account with `BigQuery Data Viewer`. JSON key in agent config.	Three teams, three undocumented service accounts, two with editor scope. Security cannot list them.
Personal token	Agent runs as the engineer with their `gcloud` credentials.	Credential cached in agent storage and session logs. Shared sessions inherit the launcher's permissions.
MCP server	BigQuery MCP server with OAuth.	Same access semantics. Same audit gap. Protocol cleanliness without policy.

All three paths produce the same four failure modes by month two. None of them are theoretical.

The over-permission problem. The identity the agent connects with is broader than the engineer who launched it. Nobody wants to be the person who narrows the scope and then gets a ticket on Friday. Permissions widen, never tighten.

The credential-sprawl problem. Tokens and keys end up in agent runtimes, session configs, and platform logs. The credential outlives the session it was created for. Shared agent sessions inherit the original launcher's scope.

The compute-it-anyway problem. Column-level permissions catch what the requester touches. They cannot catch what the requester is trying to compute. An engineer with access to revenue and cost-of-goods-sold can ask the agent for gross profit. The warehouse sees two permitted queries. The governance posture sees a leaked KPI. This is the failure mode nobody sees coming.
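
A minimal Python sketch of why a column-level check misses this case. The column names, the grant set, and both helper functions are hypothetical and exist only for illustration; this is not how BigQuery implements access control.

```python
# Hypothetical column grants for one engineer (illustrative names only).
GRANTED_COLUMNS = {"orders.order_id", "orders.revenue", "orders.cogs"}


def column_check(columns_read: set[str]) -> bool:
    """Column-level view: allow if every column read is granted."""
    return columns_read <= GRANTED_COLUMNS


# Two queries that are each permitted on their own.
revenue_query_columns = {"orders.revenue"}
cogs_query_columns = {"orders.cogs"}
both_allowed = column_check(revenue_query_columns) and column_check(cogs_query_columns)

# Nothing above knows that revenue minus COGS is a restricted KPI.
# A derived-metric rule (an assumption of this sketch) has to name it.
RESTRICTED_COMBINATIONS = {
    "gross_profit": {"orders.revenue", "orders.cogs"},
}


def derived_metrics_exposed(columns_seen: set[str]) -> list[str]:
    """Return restricted metrics computable from the columns seen so far."""
    return sorted(
        name
        for name, needed in RESTRICTED_COMBINATIONS.items()
        if needed <= columns_seen
    )


exposed = derived_metrics_exposed(revenue_query_columns | cogs_query_columns)
print(both_allowed, exposed)  # True ['gross_profit']
```

Both queries pass the column check, yet together they expose the metric. The check has no concept of what the columns are being combined to produce.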

The audit-trail problem. Three months later, compliance pulls the query history. Every agent query shows up under one service account, or distributed across personal credentials with no consistent label. The audit log describes the credential, not the human, not the prompt, not the approver if there was one. The connection gets rolled back not because the agent did something wrong, but because nobody can prove it did not.
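
To make the gap concrete, here is a rough Python sketch comparing a record that only knows the credential with one that carries the attribution compliance asks for. The field names and both record types are assumptions for illustration, not the schema of any real audit log.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class CredentialOnlyRecord:
    """What a log tied only to the credential can say (illustrative)."""
    principal: str   # e.g. a shared service account
    query_text: str


@dataclass(frozen=True)
class AttributedRecord:
    """Hypothetical fields needed to reconstruct who asked for what."""
    principal: str
    query_text: str
    human_user: str                 # the person who launched the agent
    agent_session: str
    prompt: str
    approver: Optional[str] = None  # set only when a human approved the query


def unanswered_questions(record) -> list[str]:
    """List the attribution fields a record is missing or leaves empty."""
    fields = asdict(record)
    wanted = ["human_user", "agent_session", "prompt"]
    return [name for name in wanted if not fields.get(name)]


shared = CredentialOnlyRecord(
    principal="agent-sa@example-project.iam.gserviceaccount.com",
    query_text="SELECT COUNT(*) FROM orders",
)
attributed = AttributedRecord(
    principal="agent-sa@example-project.iam.gserviceaccount.com",
    query_text="SELECT COUNT(*) FROM orders",
    human_user="alice@example.com",
    agent_session="session-123",
    prompt="How many orders did we take yesterday?",
)
print(unanswered_questions(shared))      # ['human_user', 'agent_session', 'prompt']
print(unanswered_questions(attributed))  # []
```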

The gateway pattern: how to connect AI agents to BigQuery in production

The fix is not a different agent. It is not a different warehouse. It is not a new permission system to maintain in parallel.

The gateway sits inline between the agent and BigQuery and adds layers that source-side permissioning cannot give you. Your warehouse permissions stay where they are. The gateway adds policy enforcement in flight, where the failure modes live.

Four layers. Between them, they close the four failure modes above.

Identity federation, not credential sharing. The agent's first query against the gateway prompts the user for SSO. Your identity provider (Okta, Entra ID, whichever) returns a short-lived token tied to that user. The token expires. The agent re-prompts. No service accounts with broad scope. No personal tokens cached in agent storage. Shared sessions stop working at the next expiry. The audit log answers "who, on whose behalf, against what data" because the user identity is in every event.
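
The expiry behaviour can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `UserSession` type, the 15-minute lifetime, and the `issue_session` function standing in for the identity provider. A real deployment would validate signed tokens from the IdP rather than trust a local object.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class UserSession:
    user: str
    expires_at: datetime


def issue_session(
    user: str, now: datetime, ttl: timedelta = timedelta(minutes=15)
) -> UserSession:
    """Stand-in for the IdP returning a short-lived token after SSO."""
    return UserSession(user=user, expires_at=now + ttl)


def authorize(session: UserSession, now: datetime) -> str:
    """Return the user to attribute the query to, or raise to force re-auth."""
    if now >= session.expires_at:
        raise PermissionError("session expired: re-authenticate via SSO")
    return session.user


start = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
session = issue_session("alice@example.com", start)

# Within the lifetime, every query is attributed to the user.
print(authorize(session, start + timedelta(minutes=5)))  # alice@example.com

# After expiry, the query is refused until the user signs in again.
try:
    authorize(session, start + timedelta(minutes=20))
except PermissionError as error:
    print(error)
```

Because the session carries the user rather than a shared credential, a copied or inherited session stops working at the next expiry.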

Response-time data masking. PII, PCI, financial fields. The gateway inspects the query response before it reaches the agent's context. Sensitive values in whatever the agent gets back are redacted unless the requester is explicitly cleared to see them in the clear for that resource. This works in addition to warehouse-native masking, not in place of it. The warehouse catches the data it has tagged. The gateway catches the data it has not, including in JSON blobs, free-text comments, and federated query results from outside the warehouse. The catalog cannot keep up with the rate at which engineers add columns. The gateway does not have to.
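
As a rough illustration of masking by content rather than by tagged column, the Python sketch below walks a response row, including nested JSON, and redacts anything that looks like an email address or a card number. The two patterns and the placeholder format are assumptions; production-grade detection needs far more than two regular expressions.

```python
import re

# Illustrative patterns only. The card pattern will also match other long
# digit runs, which a real detector would need to rule out.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def mask_text(text: str) -> str:
    """Replace every match of every pattern with a labelled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text


def mask_value(value):
    """Walk strings, dicts, and lists so nested JSON is covered too."""
    if isinstance(value, str):
        return mask_text(value)
    if isinstance(value, dict):
        return {key: mask_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [mask_value(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged


row = {
    "note": "Customer jane@example.com paid with 4111 1111 1111 1111",
    "meta": {"contacts": ["bob@example.org"]},
    "amount": 42,
}
masked = mask_value(row)
print(masked["note"])  # Customer [EMAIL REDACTED] paid with [CARD REDACTED]
```

The free-text `note` field and the nested `contacts` list are exactly the places a column tag would not reach.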

Approval workflows for writes. When the agent wants to run a DELETE, UPDATE, or MERGE against a production table, the gateway holds the query, analyzes the risk with an LLM, and routes it to Slack with the full query text and the analysis so a data engineer can approve or reject it in 30 seconds. The agent gets the rejection reason back as context, which is often more useful than the approval. "Don't compute gross profit, ask the finance agent" is the kind of feedback that updates an agent's behavior for the rest of the session.
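
The hold-or-pass decision might look something like the Python sketch below. It is deliberately naive: it inspects only the leading keyword of a single statement, and the `Decision` type and `gate` function are hypothetical. The LLM risk analysis and the Slack routing are out of scope here.

```python
import re
from dataclasses import dataclass

# DELETE, UPDATE, and MERGE come from the text above; everything else in
# this sketch is an assumption.
HELD_STATEMENTS = {"DELETE", "UPDATE", "MERGE"}

# Matches SQL line comments and block comments.
_COMMENT = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)


def leading_keyword(sql: str) -> str:
    """First keyword of the statement, ignoring SQL comments and case."""
    stripped = _COMMENT.sub(" ", sql).strip()
    return stripped.split(None, 1)[0].upper() if stripped else ""


@dataclass(frozen=True)
class Decision:
    action: str  # "pass" or "hold"
    reason: str


def gate(sql: str) -> Decision:
    """Hold gated write statements for human approval; pass the rest."""
    keyword = leading_keyword(sql)
    if keyword in HELD_STATEMENTS:
        return Decision("hold", f"{keyword} requires human approval")
    return Decision("pass", "statement type is not gated")


print(gate("SELECT * FROM prod.orders").action)                     # pass
print(gate("DELETE FROM prod.orders WHERE id = 1").action)          # hold
print(gate("-- cleanup\nupdate prod.orders SET flag = 1").action)   # hold
```

A keyword check like this misses multi-statement scripts and other write paths, so a real gateway would need to parse the SQL properly before deciding.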

Inline intent analysis. Before a query runs, a separate model reads it and judges what it is trying to do. Not just what tables it touches. What it is for. If the intent matches a rule (computing a financial KPI the requester does not own, scanning a customer table for PII export, anything you write a policy for), the gateway blocks the query and tells the agent why. This is the layer that catches the compute-it-anyway case. Column-level permissions cannot reason about intent. A second model can.
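
The control flow of an intent gate can be sketched in Python as follows. The important caveat: `stub_classifier` is a keyword check standing in for the separate model, and the rule format, intent label, and group names are all invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class IntentRule:
    intent: str         # label the classifier may return
    allowed_group: str  # only this group may run queries with that intent
    message: str        # explanation handed back to the agent


RULES = [
    IntentRule(
        intent="compute_gross_profit",
        allowed_group="finance",
        message="Gross profit is owned by finance; ask the finance agent.",
    ),
]


def stub_classifier(sql: str) -> Optional[str]:
    """Placeholder for the second model: a naive keyword check."""
    text = sql.lower()
    if "revenue" in text and "cogs" in text:
        return "compute_gross_profit"
    return None


def check_intent(
    sql: str,
    user_groups: set[str],
    classify: Callable[[str], Optional[str]] = stub_classifier,
) -> tuple[bool, str]:
    """Return (allowed, message); the message goes back to the agent."""
    intent = classify(sql)
    for rule in RULES:
        if rule.intent == intent and rule.allowed_group not in user_groups:
            return False, rule.message
    return True, "ok"


query = "SELECT SUM(revenue) - SUM(cogs) FROM orders"
print(check_intent(query, {"engineering"}))
print(check_intent(query, {"finance"}))
```

Keeping the classifier behind a plain function argument means the stub can be swapped for a model call without touching the rule evaluation.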

Together, these four layers turn the four failure modes from rollback triggers into non-events. Over-permissioning is replaced by short-lived user identity. Credential sprawl is replaced by SSO-federated sessions that expire. The compute-it-anyway case gets caught at the intent layer. The audit trail reconstructs the human chain of responsibility for every agent query.

The agent stays fast. The 95% of queries that are fine pass through with no human in the loop. Only the 5% that would have been the rollback story get held.

How to set up a BigQuery gateway for AI agents

You can stand up the gateway pattern in front of your existing BigQuery setup without changing the warehouse, the agent platform, or your IdP.

1. Deploy the gateway. It runs in front of BigQuery as a connection target.
2. Connect your identity provider. The same Okta or Entra ID your engineers use for everything else.
3. Add the gateway as an MCP connector in Devin, Claude Code, Cursor, or your agent of choice. Point it at your gateway's URL.
4. Define your masking rules, approval rules, and intent rules. Start permissive. Tighten as you watch what the agents actually do.
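
One way to picture "start permissive, then tighten" is a policy document where each layer begins in an observe-only mode and is switched to enforcement one at a time. The Python sketch below uses an invented schema; the layer names, the `observe` and `enforce` modes, and both helpers are assumptions, not the configuration format of any particular gateway.

```python
import copy

# Hypothetical starter policy: every layer logs what it would have done
# but blocks nothing.
starter_policy = {
    "masking": {"mode": "observe", "types": ["EMAIL", "CARD"]},
    "approvals": {"mode": "observe", "statements": ["DELETE", "UPDATE", "MERGE"]},
    "intent": {"mode": "observe", "rules": ["compute_gross_profit"]},
}


def tighten(policy: dict, layer: str) -> dict:
    """Return a copy of the policy with one layer switched to enforce."""
    updated = copy.deepcopy(policy)
    updated[layer]["mode"] = "enforce"
    return updated


def enforced_layers(policy: dict) -> list[str]:
    """Names of the layers that currently block or redact."""
    return sorted(name for name, cfg in policy.items() if cfg["mode"] == "enforce")


week_one = tighten(starter_policy, "approvals")
week_two = tighten(week_one, "masking")

print(enforced_layers(starter_policy))  # []
print(enforced_layers(week_one))        # ['approvals']
print(enforced_layers(week_two))        # ['approvals', 'masking']
```

Returning a copy rather than mutating in place keeps each stage of the rollout available for comparison or rollback.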

The first agent connects in under an hour. You iterate on the masking and intent rules over the following week as you see the shape of the queries your engineers actually ask for.

Try it against your own BigQuery instance: hoop.dev/docs/getting-started.

Frequently asked questions

Can I connect Devin to BigQuery? Yes. Three ways work on day one: a GCP service account, the engineer's personal gcloud credentials, or a BigQuery MCP server. All three create credential sprawl and audit gaps by day 60. Using an inline gateway between Devin and BigQuery handles identity, masking, approvals, and audit in one layer.

How do I give an AI agent access to BigQuery securely? The secure pattern is identity federation through an inline gateway. The agent prompts the user for SSO, receives a short-lived token tied to that user's identity, and queries BigQuery through the gateway. The gateway masks sensitive data in responses, gates destructive writes through Slack approval, and analyzes query intent before execution. No service accounts with broad scope. No personal tokens cached anywhere.

Can I use a service account for an AI agent in BigQuery? You can, and it works on day one. The problem is that service accounts hold broader permissions than the engineer launching the agent, accumulate across teams, and produce an audit trail tied to the service account rather than the human who initiated the query. By month two, most teams have three or more undocumented service accounts and a security team that cannot list them.

What permissions does an AI agent need for BigQuery? The minimum is whatever set of roles lets the agent run the queries the user intends. BigQuery Data Viewer on the relevant datasets for read-only analytics, plus BigQuery Job User on the project, because Data Viewer alone does not let an identity run query jobs. BigQuery Data Editor for writes. The harder question is which identity the agent should hold those permissions under. A shared service account leaks scope across users. The engineer's personal token leaks the credential into agent storage. Federating identity through a gateway lets the agent inherit each user's permissions per session.

Does BigQuery dynamic data masking work for AI agents? Partially. BigQuery's policy tags and column-level access controls apply when the agent queries through a session tied to a user identity. They do not catch sensitive data in untagged columns, JSON blobs, free-text fields, or federated queries that reach outside BigQuery. An inline gateway that masks responses by content type, not just by tagged column, closes that gap.

How do I audit which AI agent ran which BigQuery query? Without a gateway, the audit log shows the credential that ran the query (usually a shared service account or a personal token) and cannot reconstruct which human prompted the agent or which session the query belonged to. With identity federation through a gateway, every query is logged against the originating user, the agent session, and (where applicable) the approver.

Does Claude Code work with BigQuery? Yes, through the same patterns as Devin and Cursor: service account, personal gcloud credentials, or an MCP server. The failure modes are identical. The gateway pattern works the same way regardless of which coding agent is on the other end.

What is a BigQuery MCP server? An MCP (Model Context Protocol) server that exposes BigQuery as a tool for an AI agent. It handles the OAuth flow and gives the agent a standardized interface to query the warehouse. It improves the protocol layer but does not, by itself, add identity federation, response masking, write approvals, or intent analysis. Those layers belong to the gateway sitting in front of the warehouse.

Does this pattern work for Snowflake or Redshift? Yes. The four failure modes (over-permission, credential sprawl, compute-it-anyway, audit-trail) are warehouse-agnostic. The gateway pattern works the same way against Snowflake, Redshift, Databricks, or self-hosted Postgres. BigQuery is the example in this post because it is where most teams hit the access problem first.

AI agents are going to keep getting connected to your data warehouse. The teams winning right now are not the ones who locked it down. They are the ones who made it safe to leave open.

If your engineering org has already started this, or your security team has already started worrying about it, the gateway pattern is the conversation worth having this week, not next quarter.
