An automated code‑generation pipeline spins up an agent, driven by a large language model, that writes SQL and runs it against the company’s data warehouse. Because no human user stands behind the agent, it needs a non-human identity of its own. Today it authenticates with a static Google service‑account key that is checked into the CI repository. Every night the job executes dozens of queries, some of which pull personally identifiable information for model fine‑tuning. Because every build shares the same key, no query can be attributed to a particular run, and no audit trail exists to show whether the data was accessed appropriately. The shared key also grants unrestricted read‑write access across all datasets, so a misbehaving model could exfiltrate or corrupt data with no gatekeeper in the way.
This situation illustrates the core challenge of providing non-human identity for AI coding agents: replacing a monolithic service account with an identity that can be scoped, audited, and revoked per execution. Doing so limits the blast radius, satisfies compliance expectations about who touched which data, and makes it possible to enforce policies such as masking of sensitive columns. However, simply issuing a per‑user OAuth token does not solve the whole problem, because the token still travels directly to BigQuery and bypasses any visibility or control layer. Without a gateway, the system has no real‑time command blocking, session recording, or inline data masking. The request reaches its target, but the organization still has no guardrails, no justification for each query, and no way to stop a rogue agent mid‑run.
Why non-human identity matters for AI coding agents
AI agents are non‑human by definition, but the access‑control system must still treat each one as a distinct, accountable principal, just as it treats a human user. This identity must be:
- Issued on demand, so that each pipeline run gets a fresh token that can be revoked after the job finishes.
- Bound to a policy that limits which datasets and tables the agent may query.
- Auditable, providing a record that ties every query back to the specific pipeline execution.
- Inspectable, so that the queries and results tied to it can be examined and sensitive fields masked before they leave the warehouse.
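The four requirements can be pictured as a small data structure. The following is a minimal Python sketch, not hoop.dev's or Google Cloud's API: `RunIdentity`, `issue_run_identity`, and the one‑hour default TTL are hypothetical names and values chosen for illustration, and the token is a random string standing in for a real OAuth access token.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass, field


@dataclass
class RunIdentity:
    """Hypothetical per-run identity for one pipeline execution (illustrative only)."""

    run_id: str                       # auditable: ties each query to this run
    allowed_datasets: frozenset[str]  # policy-bound: datasets the agent may query
    masked_columns: frozenset[str]    # inspectable: columns to mask in results
    expires_at: float                 # epoch seconds after which the token is dead
    token: str = field(default_factory=lambda: secrets.token_urlsafe(32))
    revoked: bool = False

    def is_valid(self, now: float | None = None) -> bool:
        now = time.time() if now is None else now
        return not self.revoked and now < self.expires_at

    def may_query(self, dataset: str) -> bool:
        return self.is_valid() and dataset in self.allowed_datasets

    def revoke(self) -> None:
        self.revoked = True


def issue_run_identity(run_id: str, datasets: set[str], masked: set[str],
                       ttl_seconds: int = 3600) -> RunIdentity:
    """Issued on demand: mint a fresh identity when a pipeline run starts."""
    return RunIdentity(
        run_id=run_id,
        allowed_datasets=frozenset(datasets),
        masked_columns=frozenset(masked),
        expires_at=time.time() + ttl_seconds,
    )


if __name__ == "__main__":
    ident = issue_run_identity("build-1234", {"analytics"}, {"email"})
    print(ident.may_query("analytics"))  # True
    print(ident.may_query("hr"))         # False: outside the policy
    ident.revoke()                       # revoke once the job finishes
    print(ident.may_query("analytics"))  # False
```

Both expiry and explicit revocation are modeled, so a token that leaks after the job ends is dead either way.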
When these requirements are met, the organization can treat AI‑generated workloads with the same confidence it places in human engineers. The identity becomes a first‑class citizen in the IAM system, and the downstream data platform sees only the permissions the policy grants.
How hoop.dev enforces non-human identity for BigQuery
hoop.dev acts as a Layer 7 gateway that sits between the AI agent and BigQuery. The gateway receives the OAuth token that the agent obtains via GCP IAM federation, validates it, and then forwards the request to the BigQuery service using its own short‑lived credential. Because the gateway is the only component that can speak to BigQuery, it becomes the exclusive point where enforcement can occur.
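A rough sketch of such an enforcement point follows. It assumes, for illustration only, that validating the federated OAuth token has already been reduced to a lookup from token to grant; `AgentGrant`, `Gateway`, `PolicyError`, and `BLOCKED_PREFIXES` are hypothetical names, not hoop.dev internals. The `backend` callable marks where a real gateway would call BigQuery with its own short‑lived credential.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AgentGrant:
    """What the gateway knows about a validated agent token (hypothetical shape)."""
    run_id: str
    allowed_datasets: frozenset[str]
    expires_at: float


class PolicyError(Exception):
    """Raised when the gateway refuses to forward a request."""


# Naive read-only guard: statement prefixes the agent may never send.
# Illustrative, not exhaustive; a real gateway would parse the SQL.
BLOCKED_PREFIXES = ("INSERT", "UPDATE", "DELETE", "MERGE",
                    "CREATE", "ALTER", "DROP", "TRUNCATE")

# Backend signature: (dataset, sql) -> rows. In production this would call
# BigQuery with the gateway's own credential, never the agent's token.
Backend = Callable[[str, str], list]


class Gateway:
    def __init__(self, grants: dict[str, AgentGrant], backend: Backend) -> None:
        self._grants = grants    # token -> grant; stands in for real OAuth validation
        self._backend = backend

    def handle(self, token: str, dataset: str, sql: str) -> list:
        grant = self._grants.get(token)
        if grant is None or time.time() >= grant.expires_at:
            raise PolicyError("unknown or expired token")
        if dataset not in grant.allowed_datasets:
            raise PolicyError(f"run {grant.run_id} may not query {dataset}")
        if sql.lstrip().upper().startswith(BLOCKED_PREFIXES):
            raise PolicyError(f"run {grant.run_id}: write statement blocked")
        # Only here does the request reach the warehouse.
        return self._backend(dataset, sql)
```

The prefix blocklist is deliberately naive: a multi-statement script such as `SELECT 1; DROP TABLE t` would pass it, so a production gateway would need to parse the SQL rather than inspect only its first keyword.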
hoop.dev records each query, capturing the identity carried by the agent’s token, the SQL statement, and the execution timestamp. This session log is stored outside the agent’s process, so the agent cannot tamper with the audit trail. The gateway also applies inline masking rules to any column marked as sensitive, so even if the agent tries to retrieve raw personally identifiable information, the response is scrubbed before it reaches the agent.
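Session recording and inline masking can be sketched together. In this minimal, hypothetical Python example, `AuditLog`, `mask_rows`, and `record_and_mask` are illustrative names rather than hoop.dev's API; query results are assumed to arrive as a list of dicts keyed by column name, and masking replaces each sensitive value with a fixed placeholder.

```python
from __future__ import annotations

import json
import time
from dataclasses import asdict, dataclass
from typing import Any


@dataclass(frozen=True)
class AuditRecord:
    run_id: str       # identity resolved from the agent's token
    sql: str          # statement exactly as the agent sent it
    timestamp: float  # epoch seconds when the gateway saw the query


class AuditLog:
    """Append-only log kept by the gateway, outside the agent's process."""

    def __init__(self) -> None:
        self._records: list[AuditRecord] = []

    def append(self, record: AuditRecord) -> None:
        self._records.append(record)

    def export(self) -> str:
        # One JSON object per line, ready to ship to external storage.
        return "\n".join(json.dumps(asdict(r)) for r in self._records)


def mask_rows(rows: list[dict[str, Any]], sensitive: set[str],
              placeholder: str = "***") -> list[dict[str, Any]]:
    """Replace values in sensitive columns before results reach the agent."""
    return [
        {col: placeholder if col in sensitive else val for col, val in row.items()}
        for row in rows
    ]


def record_and_mask(log: AuditLog, run_id: str, sql: str,
                    rows: list[dict[str, Any]], sensitive: set[str]) -> list[dict[str, Any]]:
    """Log the query first, then return only the masked rows."""
    log.append(AuditRecord(run_id=run_id, sql=sql, timestamp=time.time()))
    return mask_rows(rows, sensitive)
```

The sketch records the run ID resolved from the token rather than the raw bearer token, so the log itself does not hold a replayable credential. Real masking rules are often richer (partial redaction, hashing); a fixed placeholder keeps the example short.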
