Non-human identity for AI coding agents on BigQuery

An automated code-generation pipeline spins up a large-language-model-driven agent that writes SQL and runs it against the company’s data warehouse. The agent has no human user behind it, so it needs a non-human identity of its own. In this scenario it instead authenticates with a static Google service-account key that is checked into the CI repository. Every night the job executes dozens of queries, some of which pull personally identifiable information for model fine-tuning. Because every build uses the same key, no query can be attributed to a particular run, and no audit trail exists to show whether the data was accessed appropriately. The shared key also grants unrestricted read-write access across all datasets, so a misbehaving model can exfiltrate or corrupt data with no gatekeeper in the way.

This situation illustrates the core challenge of providing non-human identity for AI coding agents. The goal is to replace a monolithic service account with an identity that can be scoped, audited, and revoked per execution. Doing so limits the blast radius, satisfies compliance expectations around who touched which data, and makes it possible to enforce policies such as masking of sensitive columns. However, simply issuing a per-run OAuth token does not solve the full problem. The token still travels directly to BigQuery, bypassing any visibility or control layer. Without a gateway, the system has no real-time command blocking, session recording, or inline data masking. In short, the request reaches the target, but the organization still has no guardrails, no recorded justification for each query, and no way to stop a rogue agent in flight.

Why non-human identity matters for AI coding agents

AI agents are non‑human by definition, but they still need an identity that behaves like a human user in the access‑control system. This identity must be:

Issued on demand, so that each pipeline run gets a fresh token that can be revoked after the job finishes.
Bound to a policy that limits which datasets and tables the agent may query.
Auditable, providing a record that ties every query back to the specific pipeline execution.
Inspectable in transit, so that sensitive fields can be masked before they reach the agent.

When these requirements are met, the organization can treat AI‑generated workloads with the same confidence it gives to human engineers. The identity becomes a first‑class citizen in the IAM system, and the downstream data platform sees only the permissions that the policy grants.
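
The first three requirements above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RunIdentity`, `issue_identity`, and the in-memory audit list are hypothetical names invented for this example, not hoop.dev or GCP types, and a real deployment would rely on the IAM system for issuance and revocation.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class RunIdentity:
    """Hypothetical per-run identity (illustrative, not a hoop.dev or GCP type)."""

    run_id: str
    allowed_tables: frozenset[str]  # policy: fully qualified "dataset.table" names
    expires_at: datetime
    revoked: bool = False
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def is_valid(self, now: datetime | None = None) -> bool:
        """An identity is usable only until it expires or is revoked."""
        now = now or datetime.now(timezone.utc)
        return not self.revoked and now < self.expires_at

    def authorize(self, table: str, sql: str) -> bool:
        """Check the policy and record the decision against this run."""
        allowed = self.is_valid() and table in self.allowed_tables
        self.audit_log.append((self.run_id, sql, "allowed" if allowed else "denied"))
        return allowed


def issue_identity(allowed_tables: set[str], ttl_minutes: int = 30) -> RunIdentity:
    """Issue a fresh, short-lived identity for one pipeline run."""
    return RunIdentity(
        run_id=str(uuid.uuid4()),
        allowed_tables=frozenset(allowed_tables),
        expires_at=datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes),
    )
```

Each pipeline run would call `issue_identity` once, use `authorize` before every query, and set `revoked = True` when the job finishes, so every audit entry carries the `run_id` of exactly one execution.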

How hoop.dev enforces non-human identity for BigQuery

hoop.dev acts as a Layer 7 gateway that sits between the AI agent and BigQuery. The gateway receives the OAuth token that the agent obtains via GCP IAM federation, validates it, and then forwards the request to the BigQuery service using its own short‑lived credential. Because the gateway is the only component that can speak to BigQuery, it becomes the exclusive point where enforcement can occur.

hoop.dev records each query, capturing the user‑provided token, the SQL statement, and the execution timestamp. This session log is stored outside the agent’s process, ensuring that the audit trail cannot be tampered with by the agent itself. The gateway also applies inline masking rules to any column marked as sensitive, so that even if the agent tries to retrieve raw personally identifiable information, the response is scrubbed before it reaches the agent.
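
To make the recording and masking steps concrete, here is a minimal sketch of what a gateway-side hook might do with a query and its result rows. The function names, the `"***"` placeholder, and the use of a run identifier in place of the caller's token are assumptions made for this example; they do not describe hoop.dev's actual log format or masking engine.

```python
from datetime import datetime, timezone

MASK = "***"  # hypothetical placeholder value for masked fields


def audit_record(run_id, sql):
    """Build a log entry tying a SQL statement to one run and a UTC timestamp."""
    return {
        "run_id": run_id,
        "sql": sql,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def mask_rows(rows, sensitive_columns):
    """Return copies of the rows with sensitive column values replaced."""
    return [
        {col: (MASK if col in sensitive_columns else val) for col, val in row.items()}
        for row in rows
    ]
```

Because `mask_rows` builds new dictionaries, the original result set is left untouched and only the scrubbed copy would be returned to the agent.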

When a query exceeds a predefined risk threshold, such as a SELECT that scans an entire dataset, hoop.dev pauses the request and routes it to a human approver. The approver can grant a one‑time exception or reject the operation, providing just‑in‑time approval that aligns with the principle of least privilege.
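
The approval flow can be illustrated with a small decision function. This sketch assumes the risk signal is an estimate of bytes scanned (a BigQuery dry run can report the bytes a query would process) and uses a hypothetical 10 GiB threshold; how hoop.dev actually scores risk is not specified here.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"


# Hypothetical threshold: queries estimated to scan more than 10 GiB need review.
MAX_BYTES_WITHOUT_APPROVAL = 10 * 1024**3


def evaluate_risk(estimated_bytes: int, threshold: int = MAX_BYTES_WITHOUT_APPROVAL) -> Decision:
    """Classify a query by its estimated scan size."""
    if estimated_bytes > threshold:
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW


def resolve(decision: Decision, approver_granted: bool) -> bool:
    """Return True when the query may run.

    Low-risk queries pass automatically; risky ones run only if a human
    approver granted a one-time exception.
    """
    if decision is Decision.ALLOW:
        return True
    return approver_granted
```

Keeping the classification (`evaluate_risk`) separate from the outcome (`resolve`) means the request can be held in the `NEEDS_APPROVAL` state while waiting for a human, without re-evaluating the query.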

Because hoop.dev is the sole data path, it can also block commands that are known to be destructive, such as DROP TABLE or DELETE without a WHERE clause. The blocking happens before the command reaches BigQuery, preventing accidental or malicious data loss.
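
A deliberately naive version of such a check is shown below. It is a sketch, not hoop.dev's rule engine: the function name `is_blocked` is hypothetical, and the regex approach can be fooled by semicolons or the word WHERE inside string literals and comments, so a production gateway would more plausibly parse the SQL.

```python
import re

# Patterns are matched per statement, case-insensitively.
_DROP_TABLE = re.compile(r"^\s*DROP\s+TABLE\b", re.IGNORECASE)
_DELETE = re.compile(r"^\s*DELETE\b", re.IGNORECASE)
_WHERE = re.compile(r"\bWHERE\b", re.IGNORECASE)


def is_blocked(sql: str) -> bool:
    """Return True if any statement is DROP TABLE or a DELETE with no WHERE."""
    for statement in sql.split(";"):
        if _DROP_TABLE.search(statement):
            return True
        if _DELETE.search(statement) and not _WHERE.search(statement):
            return True
    return False
```

For example, `is_blocked("DELETE FROM analytics.events")` returns `True`, while the same statement with `WHERE id = 1` appended returns `False`. The check runs on the request text alone, which is what allows it to reject a command before anything is sent to BigQuery.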

All of these enforcement outcomes (session recording, inline masking, just-in-time approval, and command blocking) exist only because hoop.dev sits in the data path. If the AI agent connected directly to BigQuery, none of these controls could be enforced.

Getting started

To adopt this pattern, begin with the getting‑started guide. The guide walks you through deploying the gateway, configuring a BigQuery connection, and enabling GCP IAM federation for per‑run tokens. The learn section provides deeper coverage of masking policies, approval workflows, and session replay.

FAQ

Do I need to change my existing BigQuery queries?
No. The agent continues to use standard client libraries. hoop.dev intercepts the traffic transparently, so the query syntax remains unchanged.

Can I still use a shared service account for legacy jobs?
Yes, but those jobs will not benefit from the audit and masking guarantees that hoop.dev provides. Migrate legacy workloads to the non-human identity model as soon as practical.

How is the audit data protected?
The audit records are stored by hoop.dev outside the agent’s runtime, making them immutable from the agent’s perspective. Access to the audit store is governed by the same identity‑aware policies that protect the data path.

Next steps

Explore the source code, contribute improvements, or file an issue on the project’s GitHub repository: hoop.dev on GitHub. The repository contains the full reference implementation and examples for extending the gateway to other data platforms.
