Configuring AI coding agents' access to BigQuery with session recording

When an AI coding agent runs queries against a data warehouse without any visibility or session recording, a single stray SELECT can expose millions of rows, trigger unexpected billing, or even violate privacy policies. The cost of that blind execution is measured not only in dollars but also in regulatory risk and loss of trust.

Today many teams hand a service‑account key to an LLM‑driven tool and point it directly at BigQuery. The key is stored in the CI pipeline, shared across dozens of jobs, and never rotated. No one sees which query was run, when, or by which model instance. If the model hallucinates a query that extracts sensitive customer identifiers, the organization has no forensic record to prove what happened.

Even when a team adds a logging sidecar or enables BigQuery’s own audit logs, the logs are tied to the service account, not to the individual AI request. The result is a gap: the request reaches the warehouse, but there is no per‑session evidence, no ability to replay the interaction, and no way to enforce data‑masking policies in real time.

What the organization really needs is a control point that sits between the AI agent and BigQuery, captures every request, and can apply policies before the query touches the data. The control point must be able to record the full session, enforce masking, and require human approval for risky operations, all without exposing the underlying credential to the agent.

Why session recording matters for AI agents

Session recording provides an immutable trail of every command the agent issues. For auditors, this is the decisive evidence that demonstrates who asked for what data and when. For security teams, it enables replay of a suspicious query to understand intent and impact. For developers, it surfaces unexpected model behavior that can be corrected in prompts or training data.

Because AI agents can generate queries on the fly, traditional static access controls are insufficient. A model might request a table it has never touched before, or combine columns in a way that reveals personally identifiable information. Session recording, combined with inline masking, ensures that even dynamically generated queries are subject to the same scrutiny as human‑written SQL.

Setting up the identity and access foundation

The first step is to define a non‑human identity that the AI agent will use. This identity is provisioned in the identity provider (for example, Azure AD, Google Workspace, or Okta) and granted a minimal set of groups that represent the data domains the agent is allowed to explore. The identity provider issues an OIDC token that the agent presents when it initiates a connection.
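
As a minimal sketch of the group-to-domain idea above, the snippet below turns already-verified token claims into the set of datasets an agent may explore. The `groups` claim name, the group names, and the `GROUP_TO_DATASETS` mapping are all assumptions for illustration; identity providers differ in how they expose group membership, and signature verification is assumed to have happened before this function runs.

```python
from dataclasses import dataclass

# Hypothetical mapping from identity-provider groups to BigQuery datasets.
GROUP_TO_DATASETS = {
    "ai-agents-analytics": {"analytics"},
    "ai-agents-billing": {"billing"},
}


@dataclass(frozen=True)
class AgentIdentity:
    subject: str
    datasets: frozenset


def identity_from_claims(claims: dict) -> AgentIdentity:
    """Build an identity from OIDC claims that were verified elsewhere."""
    datasets = set()
    for group in claims.get("groups", []):
        # Unknown groups grant nothing, which keeps the default minimal.
        datasets |= GROUP_TO_DATASETS.get(group, set())
    return AgentIdentity(subject=claims["sub"], datasets=frozenset(datasets))
```

Treating unknown groups as empty keeps the identity at least privilege when the provider adds groups that the mapping does not yet know about.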

At this stage, the token only proves who the request is coming from. It does not enforce any policy on the query itself. The token is validated by the gateway, but the actual enforcement must happen later in the data path.

Placing the gateway in the data path

hoop.dev is deployed as a Layer 7 gateway that proxies the connection between the AI agent and BigQuery. The gateway runs alongside a network‑resident agent that holds the service‑account credential, so the AI agent never sees the secret. When the AI agent opens a session, it authenticates with its OIDC token, and hoop.dev validates that token against the identity provider.

Because hoop.dev sits in the data path, it can inspect the SQL payload before it reaches BigQuery. It records the entire session, applies inline masking to any columns that match a sensitive data pattern, and can pause the request for a human approver if the query exceeds a predefined cost or data‑exposure threshold. These controls are possible only because the gateway is the one point where all traffic can be examined and acted upon.
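
The approval threshold described above can be pictured as a small decision function. This is an illustrative sketch, not hoop.dev's policy engine: the `Thresholds` fields, the 10 GiB default, and the idea of feeding in a byte estimate and a count of sensitive columns are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical limits; real values would come from the policy store.
    max_bytes: int = 10 * 1024**3
    max_sensitive_columns: int = 0


def decide(estimated_bytes: int, sensitive_columns: int,
           limits: Thresholds = Thresholds()) -> Decision:
    """Pause for a human approver when either limit is exceeded."""
    if (estimated_bytes > limits.max_bytes
            or sensitive_columns > limits.max_sensitive_columns):
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW
```

The byte estimate is taken as an input here; how it is obtained (for example from a dry run of the query) is left outside the sketch.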

How session recording works in practice

When the AI agent sends a query, hoop.dev captures the request metadata (timestamp, user identity, originating IP) and the raw SQL text, and writes both to an audit store that resides outside the BigQuery environment. This keeps the logs independent of both the agent and the target service. The recorded session can later be replayed in a sandboxed environment for forensic analysis.
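
To make the recorded fields concrete, here is a minimal sketch of one audit entry serialized as a single JSON line. The field names, the `session_id` value, and the JSON-lines format are assumptions for illustration; the actual audit schema used by the gateway is not described in this article.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    session_id: str   # hypothetical identifier tying queries to one session
    identity: str     # subject from the validated OIDC token
    source_ip: str
    sql: str          # raw SQL text exactly as the agent sent it
    timestamp: str    # UTC, ISO 8601


def new_record(session_id: str, identity: str, source_ip: str, sql: str) -> AuditRecord:
    """Stamp the request with the current UTC time."""
    now = datetime.now(timezone.utc).isoformat()
    return AuditRecord(session_id, identity, source_ip, sql, now)


def to_json_line(record: AuditRecord) -> str:
    """Serialize one record as a single line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

One record per line keeps the log easy to append to and to replay in order, since `json.dumps` escapes any newlines inside the SQL text.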

Because the gateway controls the flow, any attempt by the model to run a destructive command, such as DROP TABLE or a massive export, can be blocked automatically. If the query passes the policy checks, hoop.dev forwards it to BigQuery using the stored service‑account credential. The response from BigQuery is then inspected; any fields that contain sensitive values are masked in real time before they are returned to the AI agent.
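
The two checks in this paragraph, blocking destructive statements on the way in and masking sensitive values on the way out, can be sketched as follows. The keyword list, the email pattern, and the `[MASKED]` placeholder are assumptions chosen for the example; a production gateway would parse the SQL rather than match a prefix, and would cover more data types than email addresses.

```python
import re

# Naive prefix check; a real gateway would use a SQL parser.
BLOCKED = re.compile(r"^\s*(DROP|TRUNCATE|DELETE|EXPORT\s+DATA)\b", re.IGNORECASE)

# Hypothetical sensitive-value pattern: anything that looks like an email.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_blocked(sql: str) -> bool:
    """Return True when the statement starts with a blocked keyword."""
    return bool(BLOCKED.match(sql))


def mask_row(row: dict) -> dict:
    """Replace email-like substrings in string fields before returning a row."""
    return {
        key: EMAIL.sub("[MASKED]", value) if isinstance(value, str) else value
        for key, value in row.items()
    }
```

Masking by value pattern rather than by column name means an aliased or renamed column is still covered, at the cost of scanning every string field in the response.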

Operational considerations

Credential management: The service‑account key lives only on the gateway’s agent. Rotating the key is a matter of updating the agent configuration, not the AI code.
Policy definition: Masking rules and approval thresholds are defined once in the gateway’s policy store. Changes take effect immediately for all sessions.
Scalability: The gateway can be run in a container orchestration platform, allowing it to handle many concurrent AI agents without becoming a bottleneck.

Getting started

To try this architecture, follow the getting‑started guide that walks you through deploying the gateway, registering a BigQuery connection, and configuring OIDC authentication for your AI service account. The learn section provides deeper coverage of session recording, inline masking, and approval workflows.

FAQ

Does session recording add latency to queries? The gateway adds only the time needed to inspect the payload and write the audit record, which is typically a few milliseconds and is negligible compared with BigQuery’s own execution time.

Can I disable recording for certain low‑risk queries? Policy rules can be scoped by query type, cost estimate, or data domain, allowing you to bypass recording for trivial SELECTs while still protecting high‑risk operations.

How do I retrieve recorded sessions? Recorded sessions are stored in the audit backend configured for the gateway. They can be queried through the dashboard or exported via the API for downstream analysis.

Take the next step

Explore the source code, contribute improvements, or spin up your own instance by visiting the GitHub repository. The community and documentation will help you integrate session recording for AI agents accessing BigQuery in a secure, auditable way.